Payao: a community platform for SBML pathway model curation
Matsuoka, Yukiko; Ghosh, Samik; Kikuchi, Norihiro; Kitano, Hiroaki
2010-01-01
Summary: Payao is a community-based, collaborative web service platform for gene-regulatory and biochemical pathway model curation. The system combines Web 2.0 technologies and online model visualization functions to enable a collaborative community to annotate and curate biological models. Payao reads the models in Systems Biology Markup Language format, displays them with CellDesigner, a process diagram editor, which complies with the Systems Biology Graphical Notation, and provides an interface for model enrichment (adding tags and comments to the models) for the access-controlled community members. Availability and implementation: Freely available for model curation service at http://www.payaologue.org. Web site implemented in Seasar Framework 2.0 with S2Flex2, MySQL 5.0 and Tomcat 5.5, with all major browsers supported. Contact: kitano@sbi.jp PMID:20371497
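Payao's core input is an SBML file rendered through CellDesigner. As a rough illustration of the kind of SBML parsing such a platform performs (not Payao's own Java/Flex code), the sketch below uses the python-libsbml bindings; the file name is a placeholder.

```python
# Minimal sketch of reading an SBML model with python-libsbml; illustrative
# only, not part of Payao. The file name is a placeholder.
import libsbml

doc = libsbml.readSBML("pathway_model.xml")
if doc.getNumErrors() > 0:
    doc.printErrors()

model = doc.getModel()
print("species:", model.getNumSpecies(), "reactions:", model.getNumReactions())

# List each reaction with its reactants and products
for rxn in model.getListOfReactions():
    reactants = [ref.getSpecies() for ref in rxn.getListOfReactants()]
    products = [ref.getSpecies() for ref in rxn.getListOfProducts()]
    print(rxn.getId(), ":", " + ".join(reactants), "->", " + ".join(products))
```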
Plant Reactome: a resource for plant pathways and comparative analysis
Naithani, Sushma; Preece, Justin; D'Eustachio, Peter; Gupta, Parul; Amarasinghe, Vindhya; Dharmawardhana, Palitha D.; Wu, Guanming; Fabregat, Antonio; Elser, Justin L.; Weiser, Joel; Keays, Maria; Fuentes, Alfonso Munoz-Pomer; Petryszak, Robert; Stein, Lincoln D.; Ware, Doreen; Jaiswal, Pankaj
2017-01-01
Plant Reactome (http://plantreactome.gramene.org/) is a free, open-source, curated plant pathway database portal, provided as part of the Gramene project. The database provides intuitive bioinformatics tools for the visualization, analysis and interpretation of pathway knowledge to support genome annotation, genome analysis, modeling, systems biology, basic research and education. Plant Reactome employs the structural framework of a plant cell to show metabolic, transport, genetic, developmental and signaling pathways. We manually curate molecular details of pathways in these domains for reference species Oryza sativa (rice) supported by published literature and annotation of well-characterized genes. Two hundred twenty-two rice pathways, 1025 reactions associated with 1173 proteins, 907 small molecules and 256 literature references have been curated to date. These reference annotations were used to project pathways for 62 model, crop and evolutionarily significant plant species based on gene homology. Database users can search and browse various components of the database, visualize curated baseline expression of pathway-associated genes provided by the Expression Atlas and upload and analyze their Omics datasets. The database also offers data access via Application Programming Interfaces (APIs) and in various standardized pathway formats, such as SBML and BioPAX. PMID:27799469
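The abstract mentions programmatic data access via APIs. A hedged sketch of such a query is shown below; the base URL comes from the abstract, but the ContentService path and the 'stId'/'displayName' fields are assumptions modeled on the main Reactome ContentService and may differ in the live Plant Reactome deployment.

```python
# Hedged sketch of programmatic access to Plant Reactome. The ContentService
# path and response fields are assumptions modeled on Reactome's API.
from urllib.parse import quote
import requests

BASE = "http://plantreactome.gramene.org/ContentService"
species = "Oryza sativa"  # the curated reference species named in the abstract

resp = requests.get(f"{BASE}/data/pathways/top/{quote(species)}", timeout=30)
resp.raise_for_status()
for pathway in resp.json():
    print(pathway.get("stId"), pathway.get("displayName"))
```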
Foerster, Hartmut; Bombarely, Aureliano; Battey, James N D; Sierro, Nicolas; Ivanov, Nikolai V; Mueller, Lukas A
2018-01-01
SolCyc is the entry portal to pathway/genome databases (PGDBs) for major species of the Solanaceae family hosted at the Sol Genomics Network. Currently, SolCyc comprises six organism-specific PGDBs for tomato, potato, pepper, petunia, tobacco and one Rubiaceae, coffee. The metabolic networks of those PGDBs have been computationally predicted by the PathoLogic component of the Pathway Tools software using the manually curated multi-domain database MetaCyc (http://www.metacyc.org/) as reference. SolCyc has been recently extended by taxon-specific databases, i.e. the family-specific SolanaCyc database, containing only curated data pertinent to species of the nightshade family, and NicotianaCyc, a genus-specific database that stores all relevant metabolic data of the Nicotiana genus. Through manual curation of the published literature, new metabolic pathways have been created in those databases, which are complemented by the continuously updated, relevant species-specific pathways from MetaCyc. At present, SolanaCyc comprises 199 pathways and 29 superpathways and NicotianaCyc accounts for 72 pathways and 13 superpathways. Curator-maintained, taxon-specific databases such as SolanaCyc and NicotianaCyc are characterized by an enrichment of data specific to these taxa and free of falsely predicted pathways. Both databases have been used to update recently created Nicotiana-specific databases for Nicotiana tabacum, Nicotiana benthamiana, Nicotiana sylvestris and Nicotiana tomentosiformis by propagating verifiable data into those PGDBs. In addition, in-depth curation of the pathways in N. tabacum has been carried out, which resulted in the elimination of 156 pathways from the 569 pathways predicted by Pathway Tools. Together, in-depth curation of the predicted pathway network and the supplementation with curated data from taxon-specific databases has substantially improved the curation status of the species-specific N. tabacum PGDB. The implementation of this strategy will significantly advance the curation status of all organism-specific databases in SolCyc, resulting in improved database accuracy, data analysis and visualization of biochemical networks in those species. Database URL: https://solgenomics.net/tools/solcyc/ PMID:29762652
Soto, Axel J; Zerva, Chrysoula; Batista-Navarro, Riza; Ananiadou, Sophia
2018-04-15
Pathway models are valuable resources that help us understand the various mechanisms underpinning complex biological processes. Their curation is typically carried out through manual inspection of published scientific literature to find information relevant to a model, which is a laborious and knowledge-intensive task. Furthermore, models curated manually cannot be easily updated and maintained with new evidence extracted from the literature without automated support. We have developed LitPathExplorer, a visual text analytics tool that integrates advanced text mining, semi-supervised learning and interactive visualization, to facilitate the exploration and analysis of pathway models using statements (i.e. events) extracted automatically from the literature and organized according to levels of confidence. LitPathExplorer supports pathway modellers and curators alike by: (i) extracting events from the literature that corroborate existing models with evidence; (ii) discovering new events which can update models; and (iii) providing a confidence value for each event that is automatically computed based on linguistic features and article metadata. Our evaluation of event extraction showed a precision of 89% and a recall of 71%. Evaluation of our confidence measure, when used for ranking sampled events, showed an average precision ranging between 61 and 73%, which can be improved to 95% when the user is involved in the semi-supervised learning process. Qualitative evaluation using pair analytics based on the feedback of three domain experts confirmed the utility of our tool within the context of pathway model exploration. LitPathExplorer is available at http://nactem.ac.uk/LitPathExplorer_BI/. Contact: sophia.ananiadou@manchester.ac.uk. Supplementary data are available at Bioinformatics online.
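For reference, the reported event-extraction precision (89%) and recall (71%) correspond to a balanced F1 of roughly 0.79; the check below is purely illustrative and not part of LitPathExplorer.

```python
# Worked check of the F1 implied by the reported precision and recall.
def f1_score(precision: float, recall: float) -> float:
    return 2 * precision * recall / (precision + recall)

print(round(f1_score(0.89, 0.71), 3))  # ~0.79
```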
Pathway Tools version 13.0: integrated software for pathway/genome informatics and systems biology
Karp, Peter D.; Paley, Suzanne M.; Krummenacker, Markus; Latendresse, Mario; Dale, Joseph M.; Lee, Thomas J.; Kaipa, Pallavi; Gilham, Fred; Spaulding, Aaron; Popescu, Liviu; Altman, Tomer; Paulsen, Ian; Keseler, Ingrid M.; Caspi, Ron
2010-01-01
Pathway Tools is a production-quality software environment for creating a type of model-organism database called a Pathway/Genome Database (PGDB). A PGDB such as EcoCyc integrates the evolving understanding of the genes, proteins, metabolic network and regulatory network of an organism. This article provides an overview of Pathway Tools capabilities. The software performs multiple computational inferences including prediction of metabolic pathways, prediction of metabolic pathway hole fillers and prediction of operons. It enables interactive editing of PGDBs by DB curators. It supports web publishing of PGDBs, and provides a large number of query and visualization tools. The software also supports comparative analyses of PGDBs, and provides several systems biology analyses of PGDBs including reachability analysis of metabolic networks, and interactive tracing of metabolites through a metabolic network. More than 800 PGDBs have been created using Pathway Tools by scientists around the world, many of which are curated DBs for important model organisms. Those PGDBs can be exchanged using a peer-to-peer DB sharing system called the PGDB Registry. PMID:19955237
Gene regulation knowledge commons: community action takes care of DNA binding transcription factors
Tripathi, Sushil; Vercruysse, Steven; Chawla, Konika; Christie, Karen R.; Blake, Judith A.; Huntley, Rachael P.; Orchard, Sandra; Hermjakob, Henning; Thommesen, Liv; Lægreid, Astrid; Kuiper, Martin
2016-01-01
A large gap remains between the amount of knowledge in scientific literature and the fraction that gets curated into standardized databases, despite many curation initiatives. Yet the availability of comprehensive knowledge in databases is crucial for exploiting existing background knowledge, both for designing follow-up experiments and for interpreting new experimental data. Structured resources also underpin the computational integration and modeling of regulatory pathways, which further aids our understanding of regulatory dynamics. We argue how cooperation between the scientific community and professional curators can increase the capacity of capturing precise knowledge from literature. We demonstrate this with a project in which we mobilize biological domain experts who curate large amounts of DNA binding transcription factors, and show that they, although new to the field of curation, can make valuable contributions by harvesting reported knowledge from scientific papers. Such community curation can enhance the scientific epistemic process. Database URL: http://www.tfcheckpoint.org PMID:27270715
Overview of the Cancer Genetics and Pathway Curation tasks of BioNLP Shared Task 2013
2015-01-01
Background: Since their introduction in 2009, the BioNLP Shared Task events have been instrumental in advancing the development of methods and resources for the automatic extraction of information from the biomedical literature. In this paper, we present the Cancer Genetics (CG) and Pathway Curation (PC) tasks, two event extraction tasks introduced in the BioNLP Shared Task 2013. The CG task focuses on cancer, emphasizing the extraction of physiological and pathological processes at various levels of biological organization, and the PC task targets reactions relevant to the development of biomolecular pathway models, defining its extraction targets on the basis of established pathway representations and ontologies. Results: Six groups participated in the CG task and two groups in the PC task, together applying a wide range of extraction approaches including both established state-of-the-art systems and newly introduced extraction methods. The best-performing systems achieved F-scores of 55% on the CG task and 53% on the PC task, demonstrating a level of performance comparable to the best results achieved in similar previously proposed tasks. Conclusions: The results indicate that existing event extraction technology can generalize to meet the novel challenges represented by the CG and PC task settings, suggesting that extraction methods are capable of supporting the construction of knowledge bases on the molecular mechanisms of cancer and the curation of biomolecular pathway models. The CG and PC tasks continue as open challenges for all interested parties, with data, tools and resources available from the shared task homepage. PMID:26202570
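Both tasks distribute annotations in the BioNLP Shared Task standoff format, where event lines pair an event type and trigger with role:argument slots. The small parser below is an illustrative sketch under that assumption; the example line is invented.

```python
# Illustrative parser for a BioNLP shared-task standoff event line (.a2):
# "<ID>\t<Type>:<TriggerID> <Role>:<ArgID> ...". The example line is invented.
def parse_event(line: str) -> dict:
    event_id, body = line.rstrip("\n").split("\t")
    parts = body.split()
    event_type, trigger = parts[0].split(":")
    args = dict(part.split(":") for part in parts[1:])
    return {"id": event_id, "type": event_type, "trigger": trigger, "args": args}

print(parse_event("E1\tPositive_regulation:T3 Theme:T1 Cause:T2"))
```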
A computational platform to maintain and migrate manual functional annotations for BioCyc databases.
Walsh, Jesse R; Sen, Taner Z; Dickerson, Julie A
2014-10-12
BioCyc databases are an important resource for information on biological pathways and genomic data. Such databases represent the accumulation of biological data, some of which has been manually curated from literature. An essential feature of these databases is the continuing data integration as new knowledge is discovered. As functional annotations are improved, scalable methods are needed for curators to manage annotations without detailed knowledge of the specific design of the BioCyc database. We have developed CycTools, a software tool which allows curators to maintain functional annotations in a model organism database. This tool builds on existing software to improve and simplify annotation data imports of user provided data into BioCyc databases. Additionally, CycTools automatically resolves synonyms and alternate identifiers contained within the database into the appropriate internal identifiers. Automating steps in the manual data entry process can improve curation efforts for major biological databases. The functionality of CycTools is demonstrated by transferring GO term annotations from MaizeCyc to matching proteins in CornCyc, both maize metabolic pathway databases available at MaizeGDB, and by creating strain specific databases for metabolic engineering.
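The synonym-resolution step described above can be pictured as a lookup from user-supplied names and alternate identifiers to a database's internal frame IDs. The sketch below is a generic illustration with invented identifiers, not CycTools' actual (Java) implementation.

```python
# Generic illustration of resolving synonyms/alternate identifiers to
# internal frame IDs; names and IDs below are invented placeholders.
from typing import Optional

synonym_to_frame_id = {
    "adh1": "GENE-0001",
    "GRMZM2G442658": "GENE-0001",  # alternate identifier for the same frame
    "zmm28": "GENE-0002",
}

def resolve(identifier: str) -> Optional[str]:
    """Return the internal frame ID for a name or synonym, or None if unknown."""
    return synonym_to_frame_id.get(identifier.strip())

for name in ("adh1", "GRMZM2G442658", "unknown-gene"):
    print(name, "->", resolve(name))
```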
NetPath: a public resource of curated signal transduction pathways
2010-01-01
We have developed NetPath as a resource of curated human signaling pathways. As an initial step, NetPath provides detailed maps of a number of immune signaling pathways, which include approximately 1,600 reactions annotated from the literature and more than 2,800 instances of transcriptionally regulated genes - all linked to over 5,500 published articles. We anticipate NetPath to become a consolidated resource for human signaling pathways that should enable systems biology approaches. PMID:20067622
Modeling central metabolism and energy biosynthesis across microbial life.
Edirisinghe, Janaka N; Weisenhorn, Pamela; Conrad, Neal; Xia, Fangfang; Overbeek, Ross; Stevens, Rick L; Henry, Christopher S
2016-08-08
Automatically generated bacterial metabolic models, and even some curated models, lack accuracy in predicting energy yields due to poor representation of key pathways in energy biosynthesis and the electron transport chain (ETC). Further compounding the problem, complex interlinking pathways in genome-scale metabolic models, and the need for extensive gapfilling to support complex biomass reactions, often results in predicting unrealistic yields or unrealistic physiological flux profiles. To overcome this challenge, we developed methods and tools ( http://coremodels.mcs.anl.gov ) to build high quality core metabolic models (CMM) representing accurate energy biosynthesis based on a well studied, phylogenetically diverse set of model organisms. We compare these models to explore the variability of core pathways across all microbial life, and by analyzing the ability of our core models to synthesize ATP and essential biomass precursors, we evaluate the extent to which the core metabolic pathways and functional ETCs are known for all microbes. 6,600 (80 %) of our models were found to have some type of aerobic ETC, whereas 5,100 (62 %) have an anaerobic ETC, and 1,279 (15 %) do not have any ETC. Using our manually curated ETC and energy biosynthesis pathways with no gapfilling at all, we predict accurate ATP yields for nearly 5586 (70 %) of the models under aerobic and anaerobic growth conditions. This study revealed gaps in our knowledge of the central pathways that result in 2,495 (30 %) CMMs being unable to produce ATP under any of the tested conditions. We then established a methodology for the systematic identification and correction of inconsistent annotations using core metabolic models coupled with phylogenetic analysis. We predict accurate energy yields based on our improved annotations in energy biosynthesis pathways and the implementation of diverse ETC reactions across the microbial tree of life. We highlighted missing annotations that were essential to energy biosynthesis in our models. We examine the diversity of these pathways across all microbial life and enable the scientific community to explore the analyses generated from this large-scale analysis of over 8000 microbial genomes.
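The ATP-yield tests described above can be reproduced in spirit with a constraint-based solver. The sketch below uses COBRApy; the model file and the "ATPM"/"EX_o2_e" reaction identifiers are assumptions (BiGG-style conventions), not the authors' actual pipeline at http://coremodels.mcs.anl.gov.

```python
# Hedged sketch of an ATP-yield check on a core model using COBRApy.
# Model file and reaction IDs (ATPM, EX_o2_e) are assumed BiGG-style names.
from cobra.io import read_sbml_model

model = read_sbml_model("core_model.xml")   # placeholder core metabolic model
model.objective = "ATPM"                    # maximize flux through ATP maintenance

model.reactions.get_by_id("EX_o2_e").lower_bound = -20.0   # aerobic: allow O2 uptake
print("aerobic ATP flux:", model.optimize().objective_value)

model.reactions.get_by_id("EX_o2_e").lower_bound = 0.0     # anaerobic: block O2 uptake
print("anaerobic ATP flux:", model.optimize().objective_value)
```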
Hosmani, Prashant S.; Villalobos-Ayala, Krystal; Miller, Sherry; Shippy, Teresa; Flores, Mirella; Rosendale, Andrew; Cordola, Chris; Bell, Tracey; Mann, Hannah; DeAvila, Gabe; DeAvila, Daniel; Moore, Zachary; Buller, Kyle; Ciolkevich, Kathryn; Nandyal, Samantha; Mahoney, Robert; Van Voorhis, Joshua; Dunlevy, Megan; Farrow, David; Hunter, David; Morgan, Taylar; Shore, Kayla; Guzman, Victoria; Izsak, Allison; Dixon, Danielle E.; Cridge, Andrew; Cano, Liliana; Cao, Xiaolong; Jiang, Haobo; Leng, Nan; Johnson, Shannon; Cantarel, Brandi L.; Richards, Stephen; English, Adam; Shatters, Robert G.; Childers, Chris; Chen, Mei-Ju; Hunter, Wayne; Cilia, Michelle; Mueller, Lukas A.; Munoz-Torres, Monica; Nelson, David; Poelchau, Monica F.; Benoit, Joshua B.; Wiersma-Koch, Helen; D’Elia, Tom; Brown, Susan J.
2017-01-01
The Asian citrus psyllid (Diaphorina citri Kuwayama) is the insect vector of the bacterium Candidatus Liberibacter asiaticus (CLas), the pathogen associated with citrus Huanglongbing (HLB, citrus greening). HLB threatens citrus production worldwide. Suppression or reduction of the insect vector using chemical insecticides has been the primary method to inhibit the spread of citrus greening disease. Accurate structural and functional annotation of the Asian citrus psyllid genome, as well as a clear understanding of the interactions between the insect and CLas, are required for development of new molecular-based HLB control methods. A draft assembly of the D. citri genome has been generated and annotated with automated pipelines. However, knowledge transfer from well-curated reference genomes such as that of Drosophila melanogaster to newly sequenced ones is challenging due to the complexity and diversity of insect genomes. To identify and improve gene models as potential targets for pest control, we manually curated several gene families with a focus on genes that have key functional roles in D. citri biology and CLas interactions. This community effort produced 530 manually curated gene models across developmental, physiological, RNAi regulatory and immunity-related pathways. As previously shown in the pea aphid, RNAi machinery genes putatively involved in the microRNA pathway have been specifically duplicated. A comprehensive transcriptome enabled us to identify a number of gene families that are either missing or misassembled in the draft genome. In order to develop biocuration as a training experience, we included undergraduate and graduate students from multiple institutions, as well as experienced annotators from the insect genomics research community. The resulting gene set (OGS v1.0) combines both automatically predicted and manually curated gene models. Database URL: https://citrusgreening.org/ PMID:29220441
Gramene 2016: comparative plant genomics and pathway resources
Tello-Ruiz, Marcela K.; Stein, Joshua; Wei, Sharon; Preece, Justin; Olson, Andrew; Naithani, Sushma; Amarasinghe, Vindhya; Dharmawardhana, Palitha; Jiao, Yinping; Mulvaney, Joseph; Kumari, Sunita; Chougule, Kapeel; Elser, Justin; Wang, Bo; Thomason, James; Bolser, Daniel M.; Kerhornou, Arnaud; Walts, Brandon; Fonseca, Nuno A.; Huerta, Laura; Keays, Maria; Tang, Y. Amy; Parkinson, Helen; Fabregat, Antonio; McKay, Sheldon; Weiser, Joel; D'Eustachio, Peter; Stein, Lincoln; Petryszak, Robert; Kersey, Paul J.; Jaiswal, Pankaj; Ware, Doreen
2016-01-01
Gramene (http://www.gramene.org) is an online resource for comparative functional genomics in crops and model plant species. Its two main frameworks are genomes (collaboration with Ensembl Plants) and pathways (The Plant Reactome and archival BioCyc databases). Since our last NAR update, the database website adopted a new Drupal management platform. The genomes section features 39 fully assembled reference genomes that are integrated using ontology-based annotation and comparative analyses, and accessed through both visual and programmatic interfaces. Additional community data, such as genetic variation, expression and methylation, are also mapped for a subset of genomes. The Plant Reactome pathway portal (http://plantreactome.gramene.org) provides a reference resource for analyzing plant metabolic and regulatory pathways. In addition to ∼200 curated rice reference pathways, the portal hosts gene homology-based pathway projections for 33 plant species. Both the genome and pathway browsers interface with the EMBL-EBI's Expression Atlas to enable the projection of baseline and differential expression data from curated expression studies in plants. Gramene's archive website (http://archive.gramene.org) continues to provide previously reported resources on comparative maps, markers and QTL. To further aid our users, we have also introduced a live monthly educational webinar series and a Gramene YouTube channel carrying video tutorials. PMID:26553803
EcoCyc: a comprehensive database resource for Escherichia coli
Keseler, Ingrid M.; Collado-Vides, Julio; Gama-Castro, Socorro; Ingraham, John; Paley, Suzanne; Paulsen, Ian T.; Peralta-Gil, Martín; Karp, Peter D.
2005-01-01
The EcoCyc database (http://EcoCyc.org/) is a comprehensive source of information on the biology of the prototypical model organism Escherichia coli K12. The mission for EcoCyc is to contain both computable descriptions of, and detailed comments describing, all genes, proteins, pathways and molecular interactions in E.coli. Through ongoing manual curation, extensive information such as summary comments, regulatory information, literature citations and evidence types has been extracted from 8862 publications and added to Version 8.5 of the EcoCyc database. The EcoCyc database can be accessed through a World Wide Web interface, while the downloadable Pathway Tools software and data files enable computational exploration of the data and provide enhanced querying capabilities that web interfaces cannot support. For example, EcoCyc contains carefully curated information that can be used as training sets for bioinformatics prediction of entities such as promoters, operons, genetic networks, transcription factor binding sites, metabolic pathways, functionally related genes, protein complexes and protein–ligand interactions. PMID:15608210
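Beyond the web interface and downloadable files, EcoCyc objects can also be retrieved programmatically through the BioCyc web services. The sketch below is hedged: the getxml endpoint form and the TRPSYN-PWY (tryptophan biosynthesis) identifier follow documented BioCyc conventions but should be verified, and live access may require a BioCyc account.

```python
# Hedged sketch of fetching an EcoCyc pathway record via the BioCyc web
# services; endpoint form and object identifier are assumptions to verify.
import requests
import xml.etree.ElementTree as ET

resp = requests.get("https://websvc.biocyc.org/getxml?ECOLI:TRPSYN-PWY", timeout=30)
resp.raise_for_status()

root = ET.fromstring(resp.content)
for child in root:          # quick look at the top-level elements of the record
    print(child.tag)
```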
Gramene 2013: comparative plant genomics resources.
Monaco, Marcela K; Stein, Joshua; Naithani, Sushma; Wei, Sharon; Dharmawardhana, Palitha; Kumari, Sunita; Amarasinghe, Vindhya; Youens-Clark, Ken; Thomason, James; Preece, Justin; Pasternak, Shiran; Olson, Andrew; Jiao, Yinping; Lu, Zhenyuan; Bolser, Dan; Kerhornou, Arnaud; Staines, Dan; Walts, Brandon; Wu, Guanming; D'Eustachio, Peter; Haw, Robin; Croft, David; Kersey, Paul J; Stein, Lincoln; Jaiswal, Pankaj; Ware, Doreen
2014-01-01
Gramene (http://www.gramene.org) is a curated online resource for comparative functional genomics in crops and model plant species, currently hosting 27 fully and 10 partially sequenced reference genomes in its build number 38. Its strength derives from the application of a phylogenetic framework for genome comparison and the use of ontologies to integrate structural and functional annotation data. Whole-genome alignments complemented by phylogenetic gene family trees help infer syntenic and orthologous relationships. Genetic variation data, sequences and genome mappings available for 10 species, including Arabidopsis, rice and maize, help infer putative variant effects on genes and transcripts. The pathways section also hosts 10 species-specific metabolic pathways databases developed in-house or by our collaborators using Pathway Tools software, which facilitates searches for pathway, reaction and metabolite annotations, and allows analyses of user-defined expression datasets. Recently, we released a Plant Reactome portal featuring 133 curated rice pathways. This portal will be expanded for Arabidopsis, maize and other plant species. We continue to provide genetic and QTL maps and marker datasets developed by crop researchers. The project provides a unique community platform to support scientific research in plant genomics including studies in evolution, genetics, plant breeding, molecular biology, biochemistry and systems biology.
WikiPathways: a multifaceted pathway database bridging metabolomics to other omics research.
Slenter, Denise N; Kutmon, Martina; Hanspers, Kristina; Riutta, Anders; Windsor, Jacob; Nunes, Nuno; Mélius, Jonathan; Cirillo, Elisa; Coort, Susan L; Digles, Daniela; Ehrhart, Friederike; Giesbertz, Pieter; Kalafati, Marianthi; Martens, Marvin; Miller, Ryan; Nishida, Kozo; Rieswijk, Linda; Waagmeester, Andra; Eijssen, Lars M T; Evelo, Chris T; Pico, Alexander R; Willighagen, Egon L
2018-01-04
WikiPathways (wikipathways.org) captures the collective knowledge represented in biological pathways. By providing a database in a curated, machine readable way, omics data analysis and visualization is enabled. WikiPathways and other pathway databases are used to analyze experimental data by research groups in many fields. Due to the open and collaborative nature of the WikiPathways platform, our content keeps growing and is getting more accurate, making WikiPathways a reliable and rich pathway database. Previously, however, the focus was primarily on genes and proteins, leaving many metabolites with only limited annotation. Recent curation efforts focused on improving the annotation of metabolism and metabolic pathways by associating unmapped metabolites with database identifiers and providing more detailed interaction knowledge. Here, we report the outcomes of the continued growth and curation efforts, such as a doubling of the number of annotated metabolite nodes in WikiPathways. Furthermore, we introduce an OpenAPI documentation of our web services and the FAIR (Findable, Accessible, Interoperable and Reusable) annotation of resources to increase the interoperability of the knowledge encoded in these pathways and experimental omics data. New search options, monthly downloads, more links to metabolite databases, and new portals make pathway knowledge more effortlessly accessible to individual researchers and research communities.
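The web services mentioned above can be queried over REST. The sketch below follows the documented WikiPathways webservice conventions, but the endpoint, parameters and JSON field names are assumptions here and may vary between releases.

```python
# Hedged sketch of a WikiPathways webservice text search; endpoint and
# response field names are assumptions to check against the OpenAPI docs.
import requests

resp = requests.get(
    "https://webservice.wikipathways.org/findPathwaysByText",
    params={"query": "cholesterol biosynthesis", "format": "json"},
    timeout=30,
)
resp.raise_for_status()
for hit in resp.json().get("result", []):
    print(hit.get("id"), hit.get("name"), hit.get("species"))
```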
PathNER: a tool for systematic identification of biological pathway mentions in the literature
2013-01-01
Background: Biological pathways are central to many biomedical studies and are frequently discussed in the literature. Several curated databases have been established to collate the knowledge of molecular processes constituting pathways. Yet, there has been little focus on enabling systematic detection of pathway mentions in the literature. Results: We developed a tool, named PathNER (Pathway Named Entity Recognition), for the systematic identification of pathway mentions in the literature. PathNER is based on soft dictionary matching and rules, with the dictionary generated from public pathway databases. The rules utilise general pathway-specific keywords, syntactic information and gene/protein mentions. Detection results from both components are merged. On a gold-standard corpus, PathNER achieved an F1-score of 84%. To illustrate its potential, we applied PathNER on a collection of articles related to Alzheimer's disease to identify associated pathways, highlighting cases that can complement an existing manually curated knowledgebase. Conclusions: In contrast to existing text-mining efforts that target the automatic reconstruction of pathway details from molecular interactions mentioned in the literature, PathNER focuses on identifying specific named pathway mentions. These mentions can be used to support large-scale curation and pathway-related systems biology applications, as demonstrated in the example of Alzheimer's disease. PathNER is implemented in Java and made freely available online at http://sourceforge.net/projects/pathner/. PMID:24555844
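The combination of dictionary matching and keyword rules can be pictured with a toy matcher like the one below. This is a generic illustration of the idea, not PathNER's Java implementation, and the dictionary entries and regular expression are invented for the example.

```python
# Toy illustration of dictionary-plus-rule pathway mention detection
# (not PathNER's actual implementation).
import re

pathway_dictionary = {"mapk signaling pathway", "wnt signaling pathway"}
rule = re.compile(r"\b([A-Z][A-Za-z0-9-]*\s+(?:signal(?:ing|ling)\s+)?pathway)\b")

def find_pathway_mentions(sentence: str) -> set:
    mentions = {name for name in pathway_dictionary if name in sentence.lower()}
    mentions |= {match.group(1) for match in rule.finditer(sentence)}
    return mentions

print(find_pathway_mentions(
    "Amyloid processing engages the MAPK signaling pathway and the Notch pathway."))
```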
The BioGRID Interaction Database: 2011 update
Stark, Chris; Breitkreutz, Bobby-Joe; Chatr-aryamontri, Andrew; Boucher, Lorrie; Oughtred, Rose; Livstone, Michael S.; Nixon, Julie; Van Auken, Kimberly; Wang, Xiaodong; Shi, Xiaoqi; Reguly, Teresa; Rust, Jennifer M.; Winter, Andrew; Dolinski, Kara; Tyers, Mike
2011-01-01
The Biological General Repository for Interaction Datasets (BioGRID) is a public database that archives and disseminates genetic and protein interaction data from model organisms and humans (http://www.thebiogrid.org). BioGRID currently holds 347 966 interactions (170 162 genetic, 177 804 protein) curated from both high-throughput data sets and individual focused studies, as derived from over 23 000 publications in the primary literature. Complete coverage of the entire literature is maintained for budding yeast (Saccharomyces cerevisiae), fission yeast (Schizosaccharomyces pombe) and thale cress (Arabidopsis thaliana), and efforts to expand curation across multiple metazoan species are underway. The BioGRID houses 48 831 human protein interactions that have been curated from 10 247 publications. Current curation drives are focused on particular areas of biology to enable insights into conserved networks and pathways that are relevant to human health. The BioGRID 3.0 web interface contains new search and display features that enable rapid queries across multiple data types and sources. An automated Interaction Management System (IMS) is used to prioritize, coordinate and track curation across international sites and projects. BioGRID provides interaction data to several model organism databases, resources such as Entrez-Gene and other interaction meta-databases. The entire BioGRID 3.0 data collection may be downloaded in multiple file formats, including PSI MI XML. Source code for BioGRID 3.0 is freely available without any restrictions. PMID:21071413
Parallel labeling experiments for pathway elucidation and (13)C metabolic flux analysis.
Antoniewicz, Maciek R
2015-12-01
Metabolic pathway models provide the foundation for quantitative studies of cellular physiology through the measurement of intracellular metabolic fluxes. For model organisms metabolic models are well established, with many manually curated genome-scale model reconstructions, gene knockout studies and stable-isotope tracing studies. However, for non-model organisms a similar level of knowledge is often lacking. Compartmentation of cellular metabolism in eukaryotic systems also presents significant challenges for quantitative (13)C-metabolic flux analysis ((13)C-MFA). Recently, innovative (13)C-MFA approaches have been developed based on parallel labeling experiments, the use of multiple isotopic tracers and integrated data analysis, that allow more rigorous validation of pathway models and improved quantification of metabolic fluxes. Applications of these approaches open new research directions in metabolic engineering, biotechnology and medicine.
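Underlying any 13C-MFA workflow is a stoichiometric steady-state balance on the intracellular metabolites. The toy example below illustrates only that balancing step for an invented three-flux branch point; the isotopomer fitting that defines 13C-MFA proper is omitted.

```python
# Toy stoichiometric balance: A -v1-> B, B -v2-> C, B -v3-> D.
# Steady state on B gives v1 - v2 - v3 = 0; with v1 and v2 measured,
# least squares recovers v3. Numbers are invented for illustration.
import numpy as np

S = np.array([[1.0, -1.0, -1.0]])        # balance row for metabolite B (columns v1..v3)
A = np.vstack([S, [1.0, 0.0, 0.0], [0.0, 1.0, 0.0]])   # add measurement rows
b = np.array([0.0, 10.0, 6.0])           # steady state, v1 = 10, v2 = 6

v, *_ = np.linalg.lstsq(A, b, rcond=None)
print(dict(zip(["v1", "v2", "v3"], np.round(v, 3))))    # v3 comes out as 4
```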
Pathway Tools version 19.0 update: software for pathway/genome informatics and systems biology
Karp, Peter D.; Latendresse, Mario; Paley, Suzanne M.; Krummenacker, Markus; Ong, Quang D.; Billington, Richard; Kothari, Anamika; Weaver, Daniel; Lee, Thomas; Subhraveti, Pallavi; Spaulding, Aaron; Fulcher, Carol; Keseler, Ingrid M.; Caspi, Ron
2016-01-01
Pathway Tools is a bioinformatics software environment with a broad set of capabilities. The software provides genome-informatics tools such as a genome browser, sequence alignments, a genome-variant analyzer and comparative-genomics operations. It offers metabolic-informatics tools, such as metabolic reconstruction, quantitative metabolic modeling, prediction of reaction atom mappings and metabolic route search. Pathway Tools also provides regulatory-informatics tools, such as the ability to represent and visualize a wide range of regulatory interactions. This article outlines the advances in Pathway Tools in the past 5 years. Major additions include components for metabolic modeling, metabolic route search, computation of atom mappings and estimation of compound Gibbs free energies of formation; addition of editors for signaling pathways, for genome sequences and for cellular architecture; storage of gene essentiality data and phenotype data; display of multiple alignments, and of signaling and electron-transport pathways; and development of Python and web-services application programming interfaces. Scientists around the world have created more than 9800 Pathway/Genome Databases by using Pathway Tools, many of which are curated databases for important model organisms. PMID:26454094
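The Python API noted above is exposed through the PythonCyc package, which talks to a locally running Pathway Tools server. The call and method names below follow the PythonCyc documentation as assumed here (including the organism identifier), so treat them as a sketch to verify rather than a tested recipe.

```python
# Hedged sketch of the PythonCyc API; requires Pathway Tools running locally
# with its Python/API server enabled. Names below are assumptions to verify.
import pythoncyc

ecocyc = pythoncyc.select_organism("ecoli")   # attach to the EcoCyc PGDB (orgid assumed)
pathways = ecocyc.all_pathways()              # list of pathway frame IDs
print(len(pathways), "pathways, e.g.", pathways[:3])
```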
The Reactome pathway knowledgebase
Croft, David; Mundo, Antonio Fabregat; Haw, Robin; Milacic, Marija; Weiser, Joel; Wu, Guanming; Caudy, Michael; Garapati, Phani; Gillespie, Marc; Kamdar, Maulik R.; Jassal, Bijay; Jupe, Steven; Matthews, Lisa; May, Bruce; Palatnik, Stanislav; Rothfels, Karen; Shamovsky, Veronica; Song, Heeyeon; Williams, Mark; Birney, Ewan; Hermjakob, Henning; Stein, Lincoln; D'Eustachio, Peter
2014-01-01
Reactome (http://www.reactome.org) is a manually curated open-source open-data resource of human pathways and reactions. The current version 46 describes 7088 human proteins (34% of the predicted human proteome), participating in 6744 reactions based on data extracted from 15 107 research publications with PubMed links. The Reactome Web site and analysis tool set have been completely redesigned to increase speed, flexibility and user friendliness. The data model has been extended to support annotation of disease processes due to infectious agents and to mutation. PMID:24243840
Whipworm kinomes reflect a unique biology and adaptation to the host animal.
Stroehlein, Andreas J; Young, Neil D; Korhonen, Pasi K; Chang, Bill C H; Nejsum, Peter; Pozio, Edoardo; La Rosa, Giuseppe; Sternberg, Paul W; Gasser, Robin B
2017-11-01
Roundworms belong to a diverse phylum (Nematoda) which is comprised of many parasitic species including whipworms (genus Trichuris). These worms have adapted to a biological niche within the host and exhibit unique morphological characteristics compared with other nematodes. Although these adaptations are known, the underlying molecular mechanisms remain elusive. The availability of genomes and transcriptomes of some whipworms now enables detailed studies of their molecular biology. Here, we defined and curated the full complement of an important class of enzymes, the protein kinases (kinomes) of two species of Trichuris, using an advanced and integrated bioinformatic pipeline. We investigated the transcription of Trichuris suis kinase genes across developmental stages, sexes and tissues, and reveal that selectively transcribed genes can be linked to central roles in developmental and reproductive processes. We also classified and functionally annotated the curated kinomes by integrating evidence from structural modelling and pathway analyses, and compared them with other curated kinomes of phylogenetically diverse nematode species. Our findings suggest unique adaptations in signalling processes governing worm morphology and biology, and provide an important resource that should facilitate experimental investigations of kinases and the biology of signalling pathways in nematodes.
Reactome graph database: Efficient access to complex pathway data
Fabregat, Antonio; Korninger, Florian; Viteri, Guilherme; Sidiropoulos, Konstantinos; Marin-Garcia, Pablo; Ping, Peipei; Wu, Guanming; Stein, Lincoln; D’Eustachio, Peter; Hermjakob, Henning
2018-01-01
Reactome is a free, open-source, open-data, curated and peer-reviewed knowledgebase of biomolecular pathways. One of its main priorities is to provide easy and efficient access to its high quality curated data. At present, biological pathway databases typically store their contents in relational databases. This limits access efficiency because there are performance issues associated with queries traversing highly interconnected data. The same data in a graph database can be queried more efficiently. Here we present the rationale behind the adoption of a graph database (Neo4j) as well as the new ContentService (REST API) that provides access to these data. The Neo4j graph database and its query language, Cypher, provide efficient access to the complex Reactome data model, facilitating easy traversal and knowledge discovery. The adoption of this technology greatly improved query efficiency, reducing the average query time by 93%. The web service built on top of the graph database provides programmatic access to Reactome data by object oriented queries, but also supports more complex queries that take advantage of the new underlying graph-based data storage. By adopting graph database technology we are providing a high performance pathway data resource to the community. The Reactome graph database use case shows the power of NoSQL database engines for complex biological data types. PMID:29377902
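The graph model can be queried directly in Cypher, for example through the official neo4j Python driver against a local copy of the Reactome graph database. The node labels, relationship type and property names below follow the published Reactome schema but are assumptions here, and the connection details are placeholders.

```python
# Hedged sketch of a Cypher query against a local Reactome graph database
# using the neo4j Python driver; schema names and credentials are assumed.
from neo4j import GraphDatabase

driver = GraphDatabase.driver("bolt://localhost:7687", auth=("neo4j", "password"))

CYPHER = """
MATCH (p:Pathway {speciesName: 'Homo sapiens'})-[:hasEvent]->(e:ReactionLikeEvent)
RETURN p.displayName AS pathway, count(e) AS reactions
ORDER BY reactions DESC LIMIT 5
"""

with driver.session() as session:
    for record in session.run(CYPHER):
        print(record["pathway"], record["reactions"])

driver.close()
```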
PathwayAccess: CellDesigner plugins for pathway databases.
Van Hemert, John L; Dickerson, Julie A
2010-09-15
CellDesigner provides a user-friendly interface for graphical biochemical pathway description. Many pathway databases are not directly exportable to CellDesigner models. PathwayAccess is an extensible suite of CellDesigner plugins, which connect CellDesigner directly to pathway databases using respective Java application programming interfaces. The process is streamlined for creating new PathwayAccess plugins for specific pathway databases. Three PathwayAccess plugins, MetNetAccess, BioCycAccess and ReactomeAccess, directly connect CellDesigner to the pathway databases MetNetDB, BioCyc and Reactome. PathwayAccess plugins enable CellDesigner users to expose pathway data to analytical CellDesigner functions, curate their pathway databases and visually integrate pathway data from different databases using standard Systems Biology Markup Language and Systems Biology Graphical Notation. Implemented in Java, PathwayAccess plugins run with CellDesigner version 4.0.1 and were tested on Ubuntu Linux, Windows XP and 7, and MacOSX. Source code, binaries, documentation and video walkthroughs are freely available at http://vrac.iastate.edu/~jlv.
Using uncertainty to link and rank evidence from biomedical literature for model curation
Zerva, Chrysoula; Batista-Navarro, Riza; Day, Philip; Ananiadou, Sophia
2017-01-01
Motivation: In recent years, there has been great progress in the field of automated curation of biomedical networks and models, aided by text mining methods that provide evidence from literature. Such methods must not only extract snippets of text that relate to model interactions, but also be able to contextualize the evidence and provide additional confidence scores for the interaction in question. Although various approaches calculating confidence scores have focused primarily on the quality of the extracted information, there has been little work on exploring the textual uncertainty conveyed by the author. Despite textual uncertainty being acknowledged in biomedical text mining as an attribute of text mined interactions (events), it is significantly understudied as a means of providing a confidence measure for interactions in pathways or other biomedical models. In this work, we focus on improving identification of textual uncertainty for events and explore how it can be used as an additional measure of confidence for biomedical models. Results: We present a novel method for extracting uncertainty from the literature using a hybrid approach that combines rule induction and machine learning. Variations of this hybrid approach are then discussed, alongside their advantages and disadvantages. We use subjective logic theory to combine multiple uncertainty values extracted from different sources for the same interaction. Our approach achieves F-scores of 0.76 and 0.88 based on the BioNLP-ST and Genia-MK corpora, respectively, making considerable improvements over previously published work. Moreover, we evaluate our proposed system on pathways related to two different areas, namely leukemia and melanoma cancer research. Availability and implementation: The leukemia pathway model used is available in Pathway Studio while the Ras model is available via PathwayCommons. Online demonstration of the uncertainty extraction system is available for research purposes at http://argo.nactem.ac.uk/test. The related code is available on https://github.com/c-zrv/uncertainty_components.git. Details on the above are available in the Supplementary Material. Contact: sophia.ananiadou@manchester.ac.uk. Supplementary information: Supplementary data are available at Bioinformatics online. PMID:29036627
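The subjective-logic combination step can be illustrated with Jøsang's cumulative-fusion (consensus) operator over (belief, disbelief, uncertainty) triples; the abstract does not state which operator the authors use, so the sketch below is a generic formulation rather than their implementation.

```python
# Generic sketch of cumulative fusion of two subjective-logic opinions
# (belief, disbelief, uncertainty), each triple summing to 1.
def fuse(op1, op2):
    b1, d1, u1 = op1
    b2, d2, u2 = op2
    k = u1 + u2 - u1 * u2                     # normaliser; assumes u1, u2 > 0
    return ((b1 * u2 + b2 * u1) / k,
            (d1 * u2 + d2 * u1) / k,
            (u1 * u2) / k)

# Two text-mined mentions of the same interaction with different uncertainty
print(fuse((0.7, 0.1, 0.2), (0.5, 0.2, 0.3)))  # fused opinion has lower uncertainty
```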
Walsh, Jesse R.; Schaeffer, Mary L.; Zhang, Peifen; ...
2016-11-29
As metabolic pathway resources become more commonly available, researchers have unprecedented access to information about their organism of interest. Despite efforts to ensure consistency between various resources, information content and quality can vary widely. Two maize metabolic pathway resources for the B73 inbred line, CornCyc 4.0 and MaizeCyc 2.2, are based on the same gene model set and were developed using Pathway Tools software. These resources differ in their initial enzymatic function assignments and in the extent of manual curation. Here, we present an in-depth comparison between CornCyc and MaizeCyc to demonstrate the effect of initial computational enzymatic function assignments on the quality and content of metabolic pathway resources.
Walsh, Jesse R.; Schaeffer, Mary L.; Zhang, Peifen
As metabolic pathway resources become more commonly available, researchers have unprecedented access to information about their organism of interest. Despite efforts to ensure consistency between various resources, information content and quality can vary widely. Two maize metabolic pathway resources for the B73 inbred line, CornCyc 4.0 and MaizeCyc 2.2, are based on the same gene model set and were developed using Pathway Tools software. These resources differ in their initial enzymatic function assignments and in the extent of manual curation. Here, we present an in-depth comparison between CornCyc and MaizeCyc to demonstrate the effect of initial computational enzymatic function assignments on the quality and content of metabolic pathway resources.
Wang, Han-I; Smith, Alexandra; Aas, Eline; Roman, Eve; Crouch, Simon; Burton, Cathy; Patmore, Russell
2017-03-01
Diffuse large B-cell lymphoma (DLBCL) is the commonest non-Hodgkin lymphoma. Previous studies examining the cost of treating DLBCL have generally focused on a specific first-line therapy alone, meaning that their findings can neither be extrapolated to the general patient population nor to other points along the treatment pathway. Based on empirical data from a representative population-based patient cohort, the objective of this study was to develop a simulation model that could predict costs and life expectancy of treating DLBCL. All patients newly diagnosed with DLBCL in the UK's population-based Haematological Malignancy Research Network (www.hmrn.org) in 2007 were followed until 2013 (n = 271). Mapped treatment pathways, alongside cost information derived from the National Tariff 2013/14, were incorporated into a patient-level simulation model in order to reflect the heterogeneities of patient characteristics and treatment options. The NHS and social services perspective was adopted, and all outcomes were discounted at 3.5% per annum. Overall, the expected total medical costs were £22,122 for those treated with curative intent, and £2930 for those managed palliatively. For curative chemotherapy, the predicted medical costs were £14,966, £23,449 and £7376 for first-, second- and third-line treatments, respectively. The estimated annual cost for treating DLBCL across the UK was around £88-92 million. This is the first cost modelling study using empirical data to provide 'real world' evidence throughout the DLBCL treatment pathway. Future application of the model could include evaluation of new technologies/treatments to support healthcare decision makers, especially in the era of personalised medicine.
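The model above discounts all costs and outcomes at 3.5% per annum. Purely as an illustration of that convention (the cost figures below are made up, not taken from the study), discounting a stream of yearly costs can be written as:

    # Illustrative only: discount yearly costs at 3.5% per annum; year 0 is undiscounted.
    def discounted_total(yearly_costs, rate=0.035):
        return sum(cost / (1 + rate) ** year for year, cost in enumerate(yearly_costs))

    print(round(discounted_total([10000, 5000, 2000]), 2))  # hypothetical yearly costs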
Grohar: Automated Visualization of Genome-Scale Metabolic Models and Their Pathways.
Moškon, Miha; Zimic, Nikolaj; Mraz, Miha
2018-05-01
Genome-scale metabolic models (GEMs) have become a powerful tool for the investigation of the entire metabolism of an organism in silico. These models are, however, often extremely hard to reconstruct and also difficult to apply to the selected problem. Visualization of a GEM allows us to comprehend the model more easily, to perform its graphical analysis, to find and correct faulty relations, to identify the parts of the system with a designated function, etc. Even though several approaches for the automatic visualization of GEMs have been proposed, metabolic maps are still manually drawn or at least require a large amount of manual curation. We present Grohar, a computational tool for automatic identification and visualization of GEM (sub)networks and their metabolic fluxes. These (sub)networks can be specified directly by listing the metabolites of interest or indirectly by providing reference metabolic pathways from different sources, such as KEGG, SBML or MATLAB files. These pathways are identified within the GEM using three different pathway alignment algorithms. Grohar also supports the visualization of model adjustments (e.g., activation or inhibition of metabolic reactions) after perturbations are induced.
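Grohar itself is not reproduced here, but the underlying step of pulling a (sub)network around user-listed metabolites out of an SBML-encoded GEM can be sketched with python-libsbml (assumed to be installed); the file path and metabolite identifiers below are placeholders whose form depends entirely on the model being explored.

    # Sketch: collect the reactions of an SBML model that touch any metabolite of interest.
    import libsbml

    def subnetwork(sbml_path, metabolites_of_interest):
        doc = libsbml.readSBML(sbml_path)
        model = doc.getModel()
        if model is None:
            raise SystemExit("could not read a model from " + sbml_path)
        hits = {}
        for i in range(model.getNumReactions()):
            rxn = model.getReaction(i)
            species = {rxn.getReactant(j).getSpecies() for j in range(rxn.getNumReactants())}
            species |= {rxn.getProduct(j).getSpecies() for j in range(rxn.getNumProducts())}
            if species & set(metabolites_of_interest):
                hits[rxn.getId()] = sorted(species)
        return hits

    # Placeholder file name and species identifiers; adjust to the GEM at hand.
    print(subnetwork("model.xml", {"M_glc__D_c", "M_pyr_c"}))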
The adverse outcome pathway (AOP) framework provides a way of organizing knowledge related to the key biological events that result in a particular health outcome. For the majority of environmental chemicals, the availability of curated pathways characterizing potential toxicity ...
Nag, Ambarish; Karpinets, Tatiana V; Chang, Christopher H; Bar-Peled, Maor
2012-01-01
Understanding how cellular metabolism works and is regulated requires that the underlying biochemical pathways be adequately represented and integrated with large metabolomic data sets to establish a robust network model. Genetically engineering energy crops to be less recalcitrant to saccharification requires detailed knowledge of plant polysaccharide structures and a thorough understanding of the metabolic pathways involved in forming and regulating cell-wall synthesis. Nucleotide-sugars are building blocks for synthesis of cell wall polysaccharides. The biosynthesis of nucleotide-sugars is catalyzed by a multitude of enzymes that reside in different subcellular organelles, and precise representation of these pathways requires accurate capture of this biological compartmentalization. The lack of simple localization cues in genomic sequence data and annotations, however, leads to missing compartmentalization information for eukaryotes in automatically generated databases, such as the Pathway-Genome Databases (PGDBs) of the SRI Pathway Tools software that drives much biochemical knowledge representation on the internet. In this report, we provide an informal mechanism using the existing Pathway Tools framework to integrate protein and metabolite sub-cellular localization data with the existing representation of the nucleotide-sugar metabolic pathways in a prototype PGDB for Populus trichocarpa. The enhanced pathway representations have been successfully used to map SNP abundance data to individual nucleotide-sugar biosynthetic genes in the PGDB. The manually curated pathway representations are more conducive to the construction of a computational platform that will allow the simulation of natural and engineered nucleotide-sugar precursor fluxes into specific recalcitrant polysaccharide(s). Database URL: The curated Populus PGDB is available in the BESC public portal at http://cricket.ornl.gov/cgi-bin/beocyc_home.cgi and the nucleotide-sugar biosynthetic pathways can be directly accessed at http://cricket.ornl.gov:1555/PTR/new-image?object=SUGAR-NUCLEOTIDES.
Nag, Ambarish; Karpinets, Tatiana V.; Chang, Christopher H.; Bar-Peled, Maor
2012-01-01
Understanding how cellular metabolism works and is regulated requires that the underlying biochemical pathways be adequately represented and integrated with large metabolomic data sets to establish a robust network model. Genetically engineering energy crops to be less recalcitrant to saccharification requires detailed knowledge of plant polysaccharide structures and a thorough understanding of the metabolic pathways involved in forming and regulating cell-wall synthesis. Nucleotide-sugars are building blocks for synthesis of cell wall polysaccharides. The biosynthesis of nucleotide-sugars is catalyzed by a multitude of enzymes that reside in different subcellular organelles, and precise representation of these pathways requires accurate capture of this biological compartmentalization. The lack of simple localization cues in genomic sequence data and annotations, however, leads to missing compartmentalization information for eukaryotes in automatically generated databases, such as the Pathway-Genome Databases (PGDBs) of the SRI Pathway Tools software that drives much biochemical knowledge representation on the internet. In this report, we provide an informal mechanism using the existing Pathway Tools framework to integrate protein and metabolite sub-cellular localization data with the existing representation of the nucleotide-sugar metabolic pathways in a prototype PGDB for Populus trichocarpa. The enhanced pathway representations have been successfully used to map SNP abundance data to individual nucleotide-sugar biosynthetic genes in the PGDB. The manually curated pathway representations are more conducive to the construction of a computational platform that will allow the simulation of natural and engineered nucleotide-sugar precursor fluxes into specific recalcitrant polysaccharide(s). Database URL: The curated Populus PGDB is available in the BESC public portal at http://cricket.ornl.gov/cgi-bin/beocyc_home.cgi and the nucleotide-sugar biosynthetic pathways can be directly accessed at http://cricket.ornl.gov:1555/PTR/new-image?object=SUGAR-NUCLEOTIDES. PMID:22465851
Burns, Gully A P C; Dasigi, Pradeep; de Waard, Anita; Hovy, Eduard H
2016-01-01
Automated machine-reading biocuration systems typically use sentence-by-sentence information extraction to construct meaning representations for use by curators. This does not directly reflect the typical discourse structure used by scientists to construct an argument from the experimental data available within an article, and is therefore less likely to correspond to representations typically used in biomedical informatics systems (let alone to the mental models that scientists have). In this study, we develop Natural Language Processing methods to locate, extract, and classify the individual passages of text from articles' Results sections that refer to experimental data. In our domain of interest (molecular biology studies of cancer signal transduction pathways), individual articles may contain as many as 30 small-scale individual experiments describing a variety of findings, upon which authors base their overall research conclusions. Our system automatically classifies discourse segments in these texts into seven categories (fact, hypothesis, problem, goal, method, result, implication) with an F-score of 0.68. These segments describe the essential building blocks of scientific discourse to (i) provide context for each experiment, (ii) report experimental details and (iii) explain the data's meaning in context. We evaluate our system on text passages from articles that were curated in molecular biology databases (the Pathway Logic Datum repository, the Molecular Interaction MINT and INTACT databases) linking individual experiments in articles to the type of assay used (coprecipitation, phosphorylation, translocation, etc.). We use supervised machine learning techniques on text passages containing unambiguous references to experiments to obtain baseline F1 scores of 0.59 for MINT, 0.71 for INTACT and 0.63 for Pathway Logic. Although preliminary, these results support the notion that targeting information extraction methods to experimental results could provide accurate, automated methods for biocuration. We also suggest the need for finer-grained curation of experimental methods used when constructing molecular biology databases. © The Author(s) 2016. Published by Oxford University Press.
Knowledge-guided fuzzy logic modeling to infer cellular signaling networks from proteomic data
Liu, Hui; Zhang, Fan; Mishra, Shital Kumar; Zhou, Shuigeng; Zheng, Jie
2016-01-01
Modeling of signaling pathways is crucial for understanding and predicting cellular responses to drug treatments. However, canonical signaling pathways curated from literature are seldom context-specific and thus can hardly predict cell type-specific response to external perturbations; purely data-driven methods also have drawbacks such as limited biological interpretability. Therefore, hybrid methods that can integrate prior knowledge and real data for network inference are highly desirable. In this paper, we propose a knowledge-guided fuzzy logic network model to infer signaling pathways by exploiting both prior knowledge and time-series data. In particular, the dynamic time warping algorithm is employed to measure the goodness of fit between experimental and predicted data, so that our method can model temporally-ordered experimental observations. We evaluated the proposed method on a synthetic dataset and two real phosphoproteomic datasets. The experimental results demonstrate that our model can uncover drug-induced alterations in signaling pathways in cancer cells. Compared with existing hybrid models, our method can model feedback loops so that the dynamical mechanisms of signaling networks can be uncovered from time-series data. By calibrating generic models of signaling pathways against real data, our method supports precise predictions of context-specific anticancer drug effects, which is an important step towards precision medicine. PMID:27774993
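The goodness-of-fit measure named above, dynamic time warping, is a standard algorithm; the sketch below is the textbook formulation applied to a predicted and an experimental time course (values invented), not the authors' own code.

    # Textbook dynamic time warping distance between two 1-D time courses.
    def dtw_distance(a, b):
        n, m = len(a), len(b)
        inf = float("inf")
        cost = [[inf] * (m + 1) for _ in range(n + 1)]
        cost[0][0] = 0.0
        for i in range(1, n + 1):
            for j in range(1, m + 1):
                d = abs(a[i - 1] - b[j - 1])
                cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
        return cost[n][m]

    predicted = [0.0, 0.2, 0.6, 0.9, 1.0]
    experimental = [0.0, 0.1, 0.3, 0.8, 1.0, 1.0]
    print(dtw_distance(predicted, experimental))  # lower values indicate a better fit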
Kinetic Modeling using BioPAX ontology
Ruebenacker, Oliver; Moraru, Ion I.; Schaff, James C.; Blinov, Michael L.
2010-01-01
Thousands of biochemical interactions are available for download from curated databases such as Reactome, Pathway Interaction Database and other sources in the Biological Pathways Exchange (BioPAX) format. However, the BioPAX ontology does not encode the necessary information for kinetic modeling and simulation. The current standard for kinetic modeling is the Systems Biology Markup Language (SBML), but only a small number of models are available in SBML format in public repositories. Additionally, reusing and merging SBML models presents a significant challenge, because often each element has a value only in the context of the given model, and information encoding biological meaning is absent. We describe a software system that enables a variety of operations facilitating the use of BioPAX data to create kinetic models that can be visualized, edited, and simulated using the Virtual Cell (VCell), including improved conversion to SBML (for use with other simulation tools that support this format). PMID:20862270
Xenbase: Core features, data acquisition, and data processing.
James-Zorn, Christina; Ponferrada, Virgillio G; Burns, Kevin A; Fortriede, Joshua D; Lotay, Vaneet S; Liu, Yu; Brad Karpinka, J; Karimi, Kamran; Zorn, Aaron M; Vize, Peter D
2015-08-01
Xenbase, the Xenopus model organism database (www.xenbase.org), is a cloud-based, web-accessible resource that integrates the diverse genomic and biological data from Xenopus research. Xenopus frogs are one of the major vertebrate animal models used for biomedical research, and Xenbase is the central repository for the enormous amount of data generated using this model tetrapod. The goal of Xenbase is to accelerate discovery by enabling investigators to make novel connections between molecular pathways in Xenopus and human disease. Our relational database and user-friendly interface make these data easy to query and allows investigators to quickly interrogate and link different data types in ways that would otherwise be difficult, time consuming, or impossible. Xenbase also enhances the value of these data through high-quality gene expression curation and data integration, by providing bioinformatics tools optimized for Xenopus experiments, and by linking Xenopus data to other model organisms and to human data. Xenbase draws in data via pipelines that download data, parse the content, and save them into appropriate files and database tables. Furthermore, Xenbase makes these data accessible to the broader biomedical community by continually providing annotated data updates to organizations such as NCBI, UniProtKB, and Ensembl. Here, we describe our bioinformatics, genome-browsing tools, data acquisition and sharing, our community submitted and literature curation pipelines, text-mining support, gene page features, and the curation of gene nomenclature and gene models. © 2015 Wiley Periodicals, Inc.
BiGG Models: A platform for integrating, standardizing and sharing genome-scale models
King, Zachary A.; Lu, Justin; Dräger, Andreas; ...
2015-10-17
Genome-scale metabolic models are mathematically structured knowledge bases that can be used to predict metabolic pathway usage and growth phenotypes. Furthermore, they can generate and test hypotheses when integrated with experimental data. To maximize the value of these models, centralized repositories of high-quality models must be established, models must adhere to established standards and model components must be linked to relevant databases. Tools for model visualization further enhance their utility. To meet these needs, we present BiGG Models (http://bigg.ucsd.edu), a completely redesigned Biochemical, Genetic and Genomic knowledge base. BiGG Models contains more than 75 high-quality, manually-curated genome-scale metabolic models. On the website, users can browse, search and visualize models. BiGG Models connects genome-scale models to genome annotations and external databases. Reaction and metabolite identifiers have been standardized across models to conform to community standards and enable rapid comparison across models. Furthermore, BiGG Models provides a comprehensive application programming interface for accessing BiGG Models with modeling and analysis tools. As a resource for highly curated, standardized and accessible models of metabolism, BiGG Models will facilitate diverse systems biology studies and support knowledge-based analysis of diverse experimental data.
BiGG Models: A platform for integrating, standardizing and sharing genome-scale models
King, Zachary A.; Lu, Justin; Dräger, Andreas; Miller, Philip; Federowicz, Stephen; Lerman, Joshua A.; Ebrahim, Ali; Palsson, Bernhard O.; Lewis, Nathan E.
2016-01-01
Genome-scale metabolic models are mathematically-structured knowledge bases that can be used to predict metabolic pathway usage and growth phenotypes. Furthermore, they can generate and test hypotheses when integrated with experimental data. To maximize the value of these models, centralized repositories of high-quality models must be established, models must adhere to established standards and model components must be linked to relevant databases. Tools for model visualization further enhance their utility. To meet these needs, we present BiGG Models (http://bigg.ucsd.edu), a completely redesigned Biochemical, Genetic and Genomic knowledge base. BiGG Models contains more than 75 high-quality, manually-curated genome-scale metabolic models. On the website, users can browse, search and visualize models. BiGG Models connects genome-scale models to genome annotations and external databases. Reaction and metabolite identifiers have been standardized across models to conform to community standards and enable rapid comparison across models. Furthermore, BiGG Models provides a comprehensive application programming interface for accessing BiGG Models with modeling and analysis tools. As a resource for highly curated, standardized and accessible models of metabolism, BiGG Models will facilitate diverse systems biology studies and support knowledge-based analysis of diverse experimental data. PMID:26476456
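BiGG Models exposes its content through an application programming interface, as noted above. The sketch below queries a model-listing endpoint; the /api/v2/models path and the response field names are assumptions meant to match the site's public API documentation and should be verified before use.

    # Sketch: list a few genome-scale models from the BiGG Models web API.
    import json
    import urllib.request

    with urllib.request.urlopen("http://bigg.ucsd.edu/api/v2/models") as resp:  # assumed endpoint
        listing = json.load(resp)

    print(listing.get("results_count"))                 # field names assumed; access defensively
    for entry in listing.get("results", [])[:5]:
        print(entry.get("bigg_id"), entry.get("organism"))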
Classification of Chemical Compounds to Support Complex Queries in a Pathway Database
Weidemann, Andreas; Kania, Renate; Peiss, Christian; Rojas, Isabel
2004-01-01
Data quality in biological databases has become a topic of great discussion. To provide high-quality data and to deal with the vast amount of biochemical data, annotators and curators need to be supported by software that carries out part of their work in a (semi-)automatic manner. The detection of errors and inconsistencies is a part that requires the knowledge of domain experts, thus in most cases it is done manually, making it very expensive and time-consuming. This paper presents two tools to partially support the curation of data on biochemical pathways. The first tool enables the automatic classification of chemical compounds based on their respective SMILES strings. Such classification allows the querying and visualization of biochemical reactions at different levels of abstraction, according to the level of detail at which the reaction participants are described. Chemical compounds can be classified in a flexible manner based on different criteria. The second tool supports the process of data curation by facilitating the detection of compounds that are identified as different but that are actually the same. This is also used to identify similar reactions and, in turn, pathways. PMID:18629066
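The paper's classifier is not available here, but the general idea of assigning compound classes from SMILES strings can be sketched with RDKit (assumed to be installed); the SMARTS rules and class names below are illustrative and are not the classification criteria used in the paper.

    # Sketch: rule-based classification of compounds from SMILES via substructure matching.
    from rdkit import Chem

    RULES = {  # example classes defined by SMARTS patterns
        "carboxylic acid": "C(=O)[OH]",
        "phosphate": "P(=O)(O)O",
        "amine": "[NX3;H2,H1;!$(NC=O)]",
    }

    def classify(smiles):
        mol = Chem.MolFromSmiles(smiles)
        if mol is None:
            return ["unparsable"]
        labels = [name for name, patt in RULES.items()
                  if mol.HasSubstructMatch(Chem.MolFromSmarts(patt))]
        return labels or ["unclassified"]

    print(classify("CC(=O)O"))               # acetic acid: carboxylic acid
    print(classify("OCC1OC(O)C(O)C(O)C1O"))  # a hexose: unclassified under these rules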
How accurate is automated gap filling of metabolic models?
Karp, Peter D; Weaver, Daniel; Latendresse, Mario
2018-06-19
Reaction gap filling is a computational technique for proposing the addition of reactions to genome-scale metabolic models to permit those models to run correctly. Gap filling completes what are otherwise incomplete models that lack fully connected metabolic networks. The models are incomplete because they are derived from annotated genomes in which not all enzymes have been identified. Here we compare the results of applying an automated likelihood-based gap filler within the Pathway Tools software with the results of manually gap filling the same metabolic model. Both gap-filling exercises were applied to the same genome-derived qualitative metabolic reconstruction for Bifidobacterium longum subsp. longum JCM 1217, and to the same modeling conditions: anaerobic growth under four nutrients, producing 53 biomass metabolites. The solution computed by the gap-filling program GenDev contained 12 reactions, but closer examination showed that the solution was not minimal; two of the 12 reactions can be removed to yield a set of ten reactions that enable model growth. The manually curated solution contained 13 reactions, eight of which were shared with the 12-reaction computed solution. Thus, GenDev achieved recall of 61.5% and precision of 66.6%. These results suggest that although computational gap fillers are populating metabolic models with significant numbers of correct reactions, automatically gap-filled metabolic models also contain significant numbers of incorrect reactions. Our conclusion is that manual curation of gap-filler results is needed to obtain high-accuracy models. Many of the differences between the manual and automatic solutions resulted from using expert biological knowledge to direct the choice of reactions within the curated solution, such as reactions specific to the anaerobic lifestyle of B. longum.
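The recall and precision figures quoted above follow directly from the eight reactions shared between the 13-reaction manual solution and the 12-reaction computed solution; a quick check of the arithmetic:

    # Checking the reported figures from the counts given in the abstract.
    shared, manual, computed = 8, 13, 12
    recall = shared / manual       # 0.615, i.e. 61.5%
    precision = shared / computed  # 0.667, reported as 66.6%
    print(f"recall = {recall:.3f}, precision = {precision:.3f}")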
Caspi, Ron; Altman, Tomer; Dale, Joseph M.; Dreher, Kate; Fulcher, Carol A.; Gilham, Fred; Kaipa, Pallavi; Karthikeyan, Athikkattuvalasu S.; Kothari, Anamika; Krummenacker, Markus; Latendresse, Mario; Mueller, Lukas A.; Paley, Suzanne; Popescu, Liviu; Pujar, Anuradha; Shearer, Alexander G.; Zhang, Peifen; Karp, Peter D.
2010-01-01
The MetaCyc database (MetaCyc.org) is a comprehensive and freely accessible resource for metabolic pathways and enzymes from all domains of life. The pathways in MetaCyc are experimentally determined, small-molecule metabolic pathways and are curated from the primary scientific literature. With more than 1400 pathways, MetaCyc is the largest collection of metabolic pathways currently available. Pathway reactions are linked to one or more well-characterized enzymes, and both pathways and enzymes are annotated with reviews, evidence codes, and literature citations. BioCyc (BioCyc.org) is a collection of more than 500 organism-specific Pathway/Genome Databases (PGDBs). Each BioCyc PGDB contains the full genome and predicted metabolic network of one organism. The network, which is predicted by the Pathway Tools software using MetaCyc as a reference, consists of metabolites, enzymes, reactions and metabolic pathways. BioCyc PGDBs also contain additional features, such as predicted operons, transport systems, and pathway hole-fillers. The BioCyc Web site offers several tools for the analysis of the PGDBs, including Omics Viewers that enable visualization of omics datasets on two different genome-scale diagrams and tools for comparative analysis. The BioCyc PGDBs generated by SRI are offered for adoption by any party interested in curation of metabolic, regulatory, and genome-related information about an organism. PMID:19850718
The BioCyc collection of microbial genomes and metabolic pathways.
Karp, Peter D; Billington, Richard; Caspi, Ron; Fulcher, Carol A; Latendresse, Mario; Kothari, Anamika; Keseler, Ingrid M; Krummenacker, Markus; Midford, Peter E; Ong, Quang; Ong, Wai Kit; Paley, Suzanne M; Subhraveti, Pallavi
2017-08-17
BioCyc.org is a microbial genome Web portal that combines thousands of genomes with additional information inferred by computer programs, imported from other databases and curated from the biomedical literature by biologist curators. BioCyc also provides an extensive range of query tools, visualization services and analysis software. Recent advances in BioCyc include an expansion in the content of BioCyc in terms of both the number of genomes and the types of information available for each genome; an expansion in the amount of curated content within BioCyc; and new developments in the BioCyc software tools including redesigned gene/protein pages and metabolite pages; new search tools; a new sequence-alignment tool; a new tool for visualizing groups of related metabolic pathways; and a facility called SmartTables, which enables biologists to perform analyses that previously would have required a programmer's assistance. © The Author 2017. Published by Oxford University Press. All rights reserved. For Permissions, please email: journals.permissions@oup.com.
PyPathway: Python Package for Biological Network Analysis and Visualization.
Xu, Yang; Luo, Xiao-Chun
2018-05-01
Life science studies represent one of the biggest generators of large data sets, mainly because of rapid sequencing technological advances. Biological networks including interactive networks and human curated pathways are essential to understand these high-throughput data sets. Biological network analysis offers a method to explore systematically not only the molecular complexity of a particular disease but also the molecular relationships among apparently distinct phenotypes. Currently, several packages for the Python community have been developed, such as BioPython and Goatools. However, tools to perform comprehensive network analysis and visualization are still needed. Here, we have developed PyPathway, an extensible, free and open-source Python package for functional enrichment analysis, network modeling, and network visualization. The network process module supports various interaction network and pathway databases such as Reactome, WikiPathway, STRING, and BioGRID. The network analysis module implements overrepresentation analysis, gene set enrichment analysis, network-based enrichment, and de novo network modeling. Finally, the visualization and data publishing modules enable users to share their analysis by using an easy web application. For package availability, see the first Reference.
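PyPathway's enrichment module implements, among other things, over-representation analysis. Independently of PyPathway's own API (which is not shown here), the underlying test is a hypergeometric tail probability, sketched below with invented gene identifiers:

    # Generic over-representation analysis: probability of seeing at least this much
    # overlap between a hit list and a pathway, given the background size.
    from scipy.stats import hypergeom

    def ora_pvalue(pathway_genes, hit_genes, background_size):
        pathway, hits = set(pathway_genes), set(hit_genes)
        overlap = len(pathway & hits)
        return hypergeom.sf(overlap - 1, background_size, len(pathway), len(hits))

    pathway = {f"G{i}" for i in range(40)}               # hypothetical 40-gene pathway
    hits = {f"G{i}" for i in range(10)} | {"X1", "X2"}   # 10 of 12 hits fall in the pathway
    print(ora_pvalue(pathway, hits, background_size=20000))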
Minervini, Giovanni; Panizzoni, Elisabetta; Giollo, Manuel; Masiero, Alessandro; Ferrari, Carlo; Tosatto, Silvio C. E.
2014-01-01
Von Hippel-Lindau (VHL) syndrome is a hereditary condition predisposing to the development of different cancer forms, related to germline inactivation of the homonymous tumor suppressor pVHL. The best characterized function of pVHL is the ubiquitination dependent degradation of Hypoxia Inducible Factor (HIF) via the proteasome. It is also involved in several cellular pathways acting as a molecular hub and interacting with more than 200 different proteins. Molecular details of pVHL plasticity remain in large part unknown. Here, we present a novel manually curated Petri Net (PN) model of the main pVHL functional pathways. The model was built using functional information derived from the literature. It includes all major pVHL functions and is able to credibly reproduce VHL syndrome at the molecular level. The reliability of the PN model also allowed in silico knockout experiments, driven by previous model analysis. Interestingly, PN analysis suggests that the variability of different VHL manifestations is correlated with the concomitant inactivation of different metabolic pathways. PMID:24886840
Minervini, Giovanni; Panizzoni, Elisabetta; Giollo, Manuel; Masiero, Alessandro; Ferrari, Carlo; Tosatto, Silvio C E
2014-01-01
Von Hippel-Lindau (VHL) syndrome is a hereditary condition predisposing to the development of different cancer forms, related to germline inactivation of the homonymous tumor suppressor pVHL. The best characterized function of pVHL is the ubiquitination dependent degradation of Hypoxia Inducible Factor (HIF) via the proteasome. It is also involved in several cellular pathways acting as a molecular hub and interacting with more than 200 different proteins. Molecular details of pVHL plasticity remain in large part unknown. Here, we present a novel manually curated Petri Net (PN) model of the main pVHL functional pathways. The model was built using functional information derived from the literature. It includes all major pVHL functions and is able to credibly reproduce VHL syndrome at the molecular level. The reliability of the PN model also allowed in silico knockout experiments, driven by previous model analysis. Interestingly, PN analysis suggests that the variability of different VHL manifestations is correlated with the concomitant inactivation of different metabolic pathways.
Global Metabolic Reconstruction and Metabolic Gene Evolution in the Cattle Genome
Kim, Woonsu; Park, Hyesun; Seo, Seongwon
2016-01-01
The sequence of the cattle genome provided a valuable opportunity to systematically link genetic and metabolic traits of cattle. The objectives of this study were 1) to reconstruct genome-scale cattle-specific metabolic pathways based on the most recent and updated cattle genome build and 2) to identify duplicated metabolic genes in the cattle genome for better understanding of metabolic adaptations in cattle. A bioinformatic pipeline for amalgamating an organism's genomic annotations from multiple sources was updated. Using this, an amalgamated cattle genome database based on UMD_3.1 was created. The amalgamated cattle genome database is composed of a total of 33,292 genes: 19,123 consensus genes between NCBI and Ensembl databases, 8,410 and 5,493 genes only found in NCBI or Ensembl, respectively, and 266 genes from NCBI scaffolds. A metabolic reconstruction of the cattle genome and cattle pathway genome database (PGDB) was also developed using Pathway Tools, followed by an intensive manual curation. The manual curation filled or revised 68 pathway holes, deleted 36 metabolic pathways, and added 23 metabolic pathways. Consequently, the curated cattle PGDB contains 304 metabolic pathways, 2,460 reactions including 2,371 enzymatic reactions, and 4,012 enzymes. Furthermore, this study identified eight duplicated genes in 12 metabolic pathways in the cattle genome compared to human and mouse. Some of these duplicated genes are related with specific hormone biosynthesis and detoxifications. The updated genome-scale metabolic reconstruction is a useful tool for understanding biology and metabolic characteristics in cattle. There have been significant improvements in the quality of cattle genome annotations and the MetaCyc database. The duplicated metabolic genes in the cattle genome compared to human and mouse imply evolutionary changes in the cattle genome and provide useful information for further research on understanding metabolic adaptations of cattle. PMID:26992093
A human functional protein interaction network and its application to cancer data analysis
2010-01-01
Background: One challenge facing biologists is to tease out useful information from massive data sets for further analysis. A pathway-based analysis may shed light by projecting candidate genes onto protein functional relationship networks. We are building such a pathway-based analysis system. Results: We have constructed a protein functional interaction network by extending curated pathways with non-curated sources of information, including protein-protein interactions, gene coexpression, protein domain interaction, Gene Ontology (GO) annotations and text-mined protein interactions, which cover close to 50% of the human proteome. By applying this network to two glioblastoma multiforme (GBM) data sets and projecting cancer candidate genes onto the network, we found that the majority of GBM candidate genes form a cluster and are closer than expected by chance, and the majority of GBM samples have sequence-altered genes in two network modules, one mainly comprising genes whose products are localized in the cytoplasm and plasma membrane, and another comprising gene products in the nucleus. Both modules are highly enriched in known oncogenes, tumor suppressors and genes involved in signal transduction. Similar network patterns were also found in breast, colorectal and pancreatic cancers. Conclusions: We have built a highly reliable functional interaction network upon expert-curated pathways and applied this network to the analysis of two genome-wide GBM and several other cancer data sets. The network patterns revealed from our results suggest common mechanisms in the cancer biology. Our system should provide a foundation for a network or pathway-based analysis platform for cancer and other diseases. PMID:20482850
New challenges for text mining: mapping between text and manually curated pathways
Oda, Kanae; Kim, Jin-Dong; Ohta, Tomoko; Okanohara, Daisuke; Matsuzaki, Takuya; Tateisi, Yuka; Tsujii, Jun'ichi
2008-01-01
Background: Associating literature with pathways poses new challenges to the Text Mining (TM) community. There are three main challenges to this task: (1) the identification of the mapping position of a specific entity or reaction in a given pathway, (2) the recognition of the causal relationships among multiple reactions, and (3) the formulation and implementation of required inferences based on biological domain knowledge. Results: To address these challenges, we constructed new resources to link the text with a model pathway; they are the GENIA pathway corpus with event annotation and the NF-kB pathway. Through their detailed analysis, we address the untapped resource, ‘bio-inference,’ as well as the differences between text and pathway representation. Here, we show the precise comparisons of their representations and the nine classes of ‘bio-inference’ schemes observed in the pathway corpus. Conclusions: We believe that the creation of such rich resources and their detailed analysis is a significant first step towards accelerating research on the automatic construction of pathways from text. PMID:18426550
Canto: an online tool for community literature curation.
Rutherford, Kim M; Harris, Midori A; Lock, Antonia; Oliver, Stephen G; Wood, Valerie
2014-06-15
Detailed curation of published molecular data is essential for any model organism database. Community curation enables researchers to contribute data from their papers directly to databases, supplementing the activity of professional curators and improving coverage of a growing body of literature. We have developed Canto, a web-based tool that provides an intuitive curation interface for both curators and researchers, to support community curation in the fission yeast database, PomBase. Canto supports curation using OBO ontologies, and can be easily configured for use with any species. Canto code and documentation are available under an Open Source license from http://curation.pombase.org/. Canto is a component of the Generic Model Organism Database (GMOD) project (http://www.gmod.org/). © The Author 2014. Published by Oxford University Press.
Systematic reconstruction of TRANSPATH data into Cell System Markup Language
Nagasaki, Masao; Saito, Ayumu; Li, Chen; Jeong, Euna; Miyano, Satoru
2008-01-01
Background: Many biological repositories store information based on experimental study of the biological processes within a cell, such as protein-protein interactions, metabolic pathways, signal transduction pathways, or regulations of transcription factors and miRNA. Unfortunately, it is difficult to directly use such information when generating simulation-based models. Thus, modeling rules for encoding biological knowledge into system-dynamics-oriented standardized formats would be very useful for fully understanding cellular dynamics at the system level. Results: We selected the TRANSPATH database, a manually curated high-quality pathway database, which provides a plentiful source of cellular events in humans, mice, and rats, collected from over 31,500 publications. In this work, we have developed 16 modeling rules based on hybrid functional Petri net with extension (HFPNe), which is suitable for graphically representing and simulating biological processes. In the modeling rules, each Petri net element is incorporated with Cell System Ontology (CSO) to enable semantic interoperability of models. As a formal ontology for biological pathway modeling with dynamics, CSO also defines biological terminology and corresponding icons. By combining HFPNe with the CSO features, it is possible to convert TRANSPATH data into simulation-based and semantically valid models. The results are encoded into a biological pathway format, Cell System Markup Language (CSML), which eases the exchange and integration of biological data and models. Conclusion: By using the 16 modeling rules, 97% of the reactions in TRANSPATH are converted into simulation-based models represented in CSML. This reconstruction demonstrates that it is possible to use our rules to generate quantitative models from static pathway descriptions. PMID:18570683
Systematic reconstruction of TRANSPATH data into cell system markup language.
Nagasaki, Masao; Saito, Ayumu; Li, Chen; Jeong, Euna; Miyano, Satoru
2008-06-23
Many biological repositories store information based on experimental study of the biological processes within a cell, such as protein-protein interactions, metabolic pathways, signal transduction pathways, or regulations of transcription factors and miRNA. Unfortunately, it is difficult to directly use such information when generating simulation-based models. Thus, modeling rules for encoding biological knowledge into system-dynamics-oriented standardized formats would be very useful for fully understanding cellular dynamics at the system level. We selected the TRANSPATH database, a manually curated high-quality pathway database, which provides a plentiful source of cellular events in humans, mice, and rats, collected from over 31,500 publications. In this work, we have developed 16 modeling rules based on hybrid functional Petri net with extension (HFPNe), which is suitable for graphically representing and simulating biological processes. In the modeling rules, each Petri net element is incorporated with Cell System Ontology (CSO) to enable semantic interoperability of models. As a formal ontology for biological pathway modeling with dynamics, CSO also defines biological terminology and corresponding icons. By combining HFPNe with the CSO features, it is possible to convert TRANSPATH data into simulation-based and semantically valid models. The results are encoded into a biological pathway format, Cell System Markup Language (CSML), which eases the exchange and integration of biological data and models. By using the 16 modeling rules, 97% of the reactions in TRANSPATH are converted into simulation-based models represented in CSML. This reconstruction demonstrates that it is possible to use our rules to generate quantitative models from static pathway descriptions.
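HFPNe adds hybrid and functional extensions that are not reproduced here, but the basic Petri-net style of representation that the modeling rules build on can be illustrated with a plain discrete token game (a toy reaction A + B -> C, not a TRANSPATH entry):

    # Toy discrete Petri net: fire transitions while their input places hold enough tokens.
    marking = {"A": 2, "B": 1, "C": 0}
    transitions = [({"A": 1, "B": 1}, {"C": 1})]  # (input arc weights, output arc weights)

    def fire(marking, inputs, outputs):
        if all(marking[p] >= w for p, w in inputs.items()):
            for p, w in inputs.items():
                marking[p] -= w
            for p, w in outputs.items():
                marking[p] += w
            return True
        return False

    while any(fire(marking, ins, outs) for ins, outs in transitions):
        print(marking)  # marking after each firing; stops when no transition is enabled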
PathText: a text mining integrator for biological pathway visualizations
Kemper, Brian; Matsuzaki, Takuya; Matsuoka, Yukiko; Tsuruoka, Yoshimasa; Kitano, Hiroaki; Ananiadou, Sophia; Tsujii, Jun'ichi
2010-01-01
Motivation: Metabolic and signaling pathways are an increasingly important part of organizing knowledge in systems biology. They serve to integrate collective interpretations of facts scattered throughout literature. Biologists construct a pathway by reading a large number of articles and interpreting them as a consistent network, but most of the models constructed currently lack direct links to those articles. Biologists who want to check the original articles have to spend substantial amounts of time to collect relevant articles and identify the sections relevant to the pathway. Furthermore, with the scientific literature expanding by several thousand papers per week, keeping a model relevant requires a continuous curation effort. In this article, we present a system designed to integrate a pathway visualizer, text mining systems and annotation tools into a seamless environment. This will enable biologists to freely move between parts of a pathway and relevant sections of articles, as well as identify relevant papers from large text bases. The system, PathText, is developed by Systems Biology Institute, Okinawa Institute of Science and Technology, National Centre for Text Mining (University of Manchester) and the University of Tokyo, and is being used by groups of biologists from these locations. Contact: brian@monrovian.com. PMID:20529930
Wang, Shur-Jen; Laulederkind, Stanley J F; Hayman, G Thomas; Petri, Victoria; Smith, Jennifer R; Tutaj, Marek; Nigam, Rajni; Dwinell, Melinda R; Shimoyama, Mary
2016-08-01
Cardiovascular diseases are complex diseases caused by a combination of genetic and environmental factors. To facilitate progress in complex disease research, the Rat Genome Database (RGD) provides the community with a disease portal where genome objects and biological data related to cardiovascular diseases are systematically organized. The purpose of this study is to present biocuration at RGD, including disease, genetic, and pathway data. The RGD curation team uses controlled vocabularies/ontologies to organize data curated from the published literature or imported from disease and pathway databases. These organized annotations are associated with genes, strains, and quantitative trait loci (QTLs), thus linking functional annotations to genome objects. Screen shots from the web pages are used to demonstrate the organization of annotations at RGD. The human cardiovascular disease genes identified by annotations were grouped according to data sources and their annotation profiles were compared by in-house tools and other enrichment tools available to the public. The analysis results show that the imported cardiovascular disease genes from ClinVar and OMIM are functionally different from the RGD manually curated genes in terms of pathway and Gene Ontology annotations. The inclusion of disease genes from other databases enriches the collection of disease genes not only in quantity but also in quality. Copyright © 2016 the American Physiological Society.
MET network in PubMed: a text-mined network visualization and curation system.
Dai, Hong-Jie; Su, Chu-Hsien; Lai, Po-Ting; Huang, Ming-Siang; Jonnagaddala, Jitendra; Rose Jue, Toni; Rao, Shruti; Chou, Hui-Jou; Milacic, Marija; Singh, Onkar; Syed-Abdul, Shabbir; Hsu, Wen-Lian
2016-01-01
Metastasis is the dissemination of a cancer/tumor from one organ to another, and it is the most dangerous stage during cancer progression, causing more than 90% of cancer deaths. Improving the understanding of the complicated cellular mechanisms underlying metastasis requires investigations of the signaling pathways. To this end, we developed a METastasis (MET) network visualization and curation tool to assist metastasis researchers retrieve network information of interest while browsing through the large volume of studies in PubMed. MET can recognize relations among genes, cancers, tissues and organs of metastasis mentioned in the literature through text-mining techniques, and then produce a visualization of all mined relations in a metastasis network. To facilitate the curation process, MET is developed as a browser extension that allows curators to review and edit concepts and relations related to metastasis directly in PubMed. PubMed users can also view the metastatic networks integrated from the large collection of research papers directly through MET. For the BioCreative 2015 interactive track (IAT), a curation task was proposed to curate metastatic networks among PubMed abstracts. Six curators participated in the proposed task and a post-IAT task, curating 963 unique metastatic relations from 174 PubMed abstracts using MET. Database URL: http://btm.tmu.edu.tw/metastasisway. © The Author(s) 2016. Published by Oxford University Press.
Curation of inhibitor-target data: process and impact on pathway analysis.
Devidas, Sreenivas
2009-01-01
The past decade has seen a significant emergence in the availability and use of pathway analysis tools. The workflow supported by most pathway analysis tools is limited to either of the following: (a) a network of genes based on the input data set, or (b) the resultant network filtered down by a few criteria such as (but not limited to) (i) disease association of the genes in the network; (ii) targets known to be the target of one or more launched drugs; (iii) targets known to be the target of one or more compounds in clinical trials; and (iv) targets reasonably known to be potential candidate or clinical biomarkers. Almost all the tools in use today are biased towards the biological side and contain little, if any, information on the chemical inhibitors associated with the components of a given biological network. The limitations are as follows: the number of inhibitors that have been published or patented is probably several fold (probably greater than 10-fold) more than the number of published protein-protein interactions, and curation of such data is both expensive and time consuming and could impact ROI significantly; the non-standardization associated with protein and gene names makes mapping reasonably non-straightforward; the number of patented and published inhibitors across target classes increases by over a million per year, so keeping the databases current becomes a monumental problem; and modifications are required in the product architectures to accommodate chemistry-related content. GVK Bio has, over the past 7 years, curated the compound-target data that is necessary for the addition of such compound-centric workflows. This chapter focuses on identification, curation and utility of such data.
Annotating Cancer Variants and Anti-Cancer Therapeutics in Reactome
Milacic, Marija; Haw, Robin; Rothfels, Karen; Wu, Guanming; Croft, David; Hermjakob, Henning; D’Eustachio, Peter; Stein, Lincoln
2012-01-01
Reactome describes biological pathways as chemical reactions that closely mirror the actual physical interactions that occur in the cell. Recent extensions of our data model accommodate the annotation of cancer and other disease processes. First, we have extended our class of protein modifications to accommodate annotation of changes in amino acid sequence and the formation of fusion proteins to describe the proteins involved in disease processes. Second, we have added a disease attribute to reaction, pathway, and physical entity classes that uses disease ontology terms. To support the graphical representation of “cancer” pathways, we have adapted our Pathway Browser to display disease variants and events in a way that allows comparison with the wild type pathway, and shows connections between perturbations in cancer and other biological pathways. The curation of pathways associated with cancer, coupled with our efforts to create other disease-specific pathways, will interoperate with our existing pathway and network analysis tools. Using the Epidermal Growth Factor Receptor (EGFR) signaling pathway as an example, we show how Reactome annotates and presents the altered biological behavior of EGFR variants due to their altered kinase and ligand-binding properties, and the mode of action and specificity of anti-cancer therapeutics. PMID:24213504
The Pathway Coexpression Network: Revealing pathway relationships
Tanzi, Rudolph E.
2018-01-01
A goal of genomics is to understand the relationships between biological processes. Pathways contribute to functional interplay within biological processes through complex but poorly understood interactions. However, limited functional references for global pathway relationships exist. Pathways from databases such as KEGG and Reactome provide discrete annotations of biological processes. Their relationships are currently either inferred from gene set enrichment within specific experiments, or by simple overlap, linking pathway annotations that have genes in common. Here, we provide a unifying interpretation of functional interaction between pathways by systematically quantifying coexpression between 1,330 canonical pathways from the Molecular Signatures Database (MSigDB) to establish the Pathway Coexpression Network (PCxN). We estimated the correlation between canonical pathways valid in a broad context using a curated collection of 3,207 microarrays from 72 normal human tissues. PCxN accounts for shared genes between annotations to estimate significant correlations between pathways with related functions rather than with similar annotations. We demonstrate that PCxN provides novel insight into mechanisms of complex diseases using an Alzheimer’s Disease (AD) case study. PCxN retrieved pathways significantly correlated with an expert curated AD gene list. These pathways have known associations with AD and were significantly enriched for genes independently associated with AD. As a further step, we show how PCxN complements the results of gene set enrichment methods by revealing relationships between enriched pathways, and by identifying additional highly correlated pathways. PCxN revealed that correlated pathways from an AD expression profiling study include functional clusters involved in cell adhesion and oxidative stress. PCxN provides expanded connections to pathways from the extracellular matrix. PCxN provides a powerful new framework for interrogation of global pathway relationships. Comprehensive exploration of PCxN can be performed at http://pcxn.org/. PMID:29554099
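PCxN's published estimator involves corrections beyond what is shown here, but the basic quantity it builds on, a correlation between pathway-level expression summaries that discounts shared genes, can be sketched as follows (random data stand in for the curated microarray collection):

    # Simplified pathway coexpression: correlate per-sample mean expression of two
    # pathways after excluding genes the annotations share.
    import numpy as np

    def pathway_correlation(expr, genes_a, genes_b):
        """expr maps gene id -> 1-D array of expression values across samples."""
        shared = set(genes_a) & set(genes_b)
        summary_a = np.mean([expr[g] for g in genes_a if g not in shared], axis=0)
        summary_b = np.mean([expr[g] for g in genes_b if g not in shared], axis=0)
        return np.corrcoef(summary_a, summary_b)[0, 1]

    rng = np.random.default_rng(0)
    expr = {f"G{i}": rng.normal(size=20) for i in range(6)}    # 6 genes, 20 samples
    print(pathway_correlation(expr, ["G0", "G1", "G2"], ["G2", "G3", "G4"]))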
Subramani, Suresh; Kalpana, Raja; Monickaraj, Pankaj Moses; Natarajan, Jeyakumar
2015-04-01
The knowledge on protein-protein interactions (PPI) and their related pathways are equally important to understand the biological functions of the living cell. Such information on human proteins is highly desirable to understand the mechanism of several diseases such as cancer, diabetes, and Alzheimer's disease. Because much of that information is buried in biomedical literature, an automated text mining system for visualizing human PPI and pathways is highly desirable. In this paper, we present HPIminer, a text mining system for visualizing human protein interactions and pathways from biomedical literature. HPIminer extracts human PPI information and PPI pairs from biomedical literature, and visualize their associated interactions, networks and pathways using two curated databases HPRD and KEGG. To our knowledge, HPIminer is the first system to build interaction networks from literature as well as curated databases. Further, the new interactions mined only from literature and not reported earlier in databases are highlighted as new. A comparative study with other similar tools shows that the resultant network is more informative and provides additional information on interacting proteins and their associated networks. Copyright © 2015 Elsevier Inc. All rights reserved.
The DrugAge database of aging-related drugs.
Barardo, Diogo; Thornton, Daniel; Thoppil, Harikrishnan; Walsh, Michael; Sharifi, Samim; Ferreira, Susana; Anžič, Andreja; Fernandes, Maria; Monteiro, Patrick; Grum, Tjaša; Cordeiro, Rui; De-Souza, Evandro Araújo; Budovsky, Arie; Araujo, Natali; Gruber, Jan; Petrascheck, Michael; Fraifeld, Vadim E; Zhavoronkov, Alexander; Moskalev, Alexey; de Magalhães, João Pedro
2017-06-01
Aging is a major worldwide medical challenge. Not surprisingly, identifying drugs and compounds that extend lifespan in model organisms is a growing research area. Here, we present DrugAge (http://genomics.senescence.info/drugs/), a curated database of lifespan-extending drugs and compounds. At the time of writing, DrugAge contains 1316 entries featuring 418 different compounds from studies across 27 model organisms, including worms, flies, yeast and mice. Data were manually curated from 324 publications. Using drug-gene interaction data, we also performed a functional enrichment analysis of targets of lifespan-extending drugs. Enriched terms include various functional categories related to glutathione and antioxidant activity, ion transport and metabolic processes. In addition, we found a modest but significant overlap between targets of lifespan-extending drugs and known aging-related genes, suggesting that some but not most aging-related pathways have been targeted pharmacologically in longevity studies. DrugAge is freely available online for the scientific community and will be an important resource for biogerontologists. © 2017 The Authors. Aging Cell published by the Anatomical Society and John Wiley & Sons Ltd.
Morris, Melody K.; Saez-Rodriguez, Julio; Clarke, David C.; Sorger, Peter K.; Lauffenburger, Douglas A.
2011-01-01
Predictive understanding of cell signaling network operation based on general prior knowledge but consistent with empirical data in a specific environmental context is a current challenge in computational biology. Recent work has demonstrated that Boolean logic can be used to create context-specific network models by training proteomic pathway maps to dedicated biochemical data; however, the Boolean formalism is restricted to characterizing protein species as either fully active or inactive. To advance beyond this limitation, we propose a novel form of fuzzy logic sufficiently flexible to model quantitative data but also sufficiently simple to efficiently construct models by training pathway maps on dedicated experimental measurements. Our new approach, termed constrained fuzzy logic (cFL), converts a prior knowledge network (obtained from literature or interactome databases) into a computable model that describes graded values of protein activation across multiple pathways. We train a cFL-converted network to experimental data describing hepatocytic protein activation by inflammatory cytokines and demonstrate the application of the resultant trained models for three important purposes: (a) generating experimentally testable biological hypotheses concerning pathway crosstalk, (b) establishing capability for quantitative prediction of protein activity, and (c) prediction and understanding of the cytokine release phenotypic response. Our methodology systematically and quantitatively trains a protein pathway map summarizing curated literature to context-specific biochemical data. This process generates a computable model yielding successful prediction of new test data and offering biological insight into complex datasets that are difficult to fully analyze by intuition alone. PMID:21408212
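Constrained fuzzy logic replaces Boolean on/off updates with graded activities; one common way to express this, sketched below with illustrative (untrained) parameters rather than values from the paper, is to pass gated inputs through a Hill-type transfer function normalized to the unit interval.

    # Sketch of a graded logic update: fuzzy AND/OR gates followed by a normalized Hill curve.
    def normalized_hill(x, n=3.0, k=0.5):
        # maps [0, 1] to [0, 1] with f(0) = 0 and f(1) = 1
        return (x ** n) * (1 + k ** n) / (x ** n + k ** n)

    def gate_and(a, b):
        return min(a, b)

    def gate_or(a, b):
        return max(a, b)

    upstream_a, upstream_b = 0.9, 0.4
    downstream = normalized_hill(gate_and(upstream_a, upstream_b))
    print(round(downstream, 3))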
Reconstruction of metabolic pathways for the cattle genome
Seo, Seongwon; Lewin, Harris A
2009-01-01
Background: Metabolic reconstruction of microbial, plant and animal genomes is a necessary step toward understanding the evolutionary origins of metabolism and species-specific adaptive traits. The aims of this study were to reconstruct conserved metabolic pathways in the cattle genome and to identify metabolic pathways with missing genes and proteins. The MetaCyc database and Pathway Tools software suite were chosen for this work because they are widely used and easy to implement. Results: An amalgamated cattle genome database was created using the NCBI and Ensembl cattle genome databases (based on build 3.1) as data sources. Pathway Tools was used to create a cattle-specific pathway genome database, which was followed by comprehensive manual curation for the reconstruction of metabolic pathways. The curated database, CattleCyc 1.0, consists of 217 metabolic pathways. A total of 64 mammalian-specific metabolic pathways were modified from the reference pathways in MetaCyc, and two pathways previously identified but missing from MetaCyc were added. Comparative analysis of metabolic pathways revealed the absence of mammalian genes for 22 metabolic enzymes whose activity was reported in the literature. We also identified six human metabolic protein-coding genes for which the cattle ortholog is missing from the sequence assembly. Conclusion: CattleCyc is a powerful tool for understanding the biology of ruminants and other cetartiodactyl species. In addition, the approach used to develop CattleCyc provides a framework for the metabolic reconstruction of other newly sequenced mammalian genomes. It is clear that metabolic pathway analysis strongly reflects the quality of the underlying genome annotations. Thus, having well-annotated genomes from many mammalian species hosted in BioCyc will facilitate the comparative analysis of metabolic pathways among different species and a systems approach to comparative physiology. PMID:19284618
NASA Astrophysics Data System (ADS)
Williams, J. W.; Grimm, E. C.; Ashworth, A. C.; Blois, J.; Charles, D. F.; Crawford, S.; Davis, E.; Goring, S. J.; Graham, R. W.; Miller, D. A.; Smith, A. J.; Stryker, M.; Uhen, M. D.
2017-12-01
The Neotoma Paleoecology Database supports global change research at the intersection of geology and ecology by providing a high-quality, community-curated data repository for paleoecological data. These data are widely used to study biological responses and feedbacks to past environmental change at local to global scales. The Neotoma data model is flexible and can store multiple kinds of fossil, biogeochemical, or physical variables measured from sedimentary archives. Data additions to Neotoma are growing and include >3.5 million observations, >16,000 datasets, and >8,500 sites. Dataset types include fossil pollen, vertebrates, diatoms, ostracodes, macroinvertebrates, plant macrofossils, insects, testate amoebae, geochronological data, and the recently added organic biomarkers, stable isotopes, and specimen-level data. Neotoma data can be found and retrieved in multiple ways, including the Explorer map-based interface, a RESTful Application Programming Interface, the neotoma R package, and digital object identifiers. Neotoma has partnered with the Paleobiology Database to produce a common data portal for paleobiological data, called the Earth Life Consortium. A new embargo management system is designed to allow investigators to put their data into Neotoma and then make use of Neotoma's value-added services. Neotoma's distributed scientific governance model is flexible and scalable, with many open pathways for welcoming new members, data contributors, stewards, and research communities. As the volume and variety of scientific data grow, community-curated data resources such as Neotoma have become foundational infrastructure for big data science.
ADAGE signature analysis: differential expression analysis with data-defined gene sets.
Tan, Jie; Huyck, Matthew; Hu, Dongbo; Zelaya, René A; Hogan, Deborah A; Greene, Casey S
2017-11-22
Gene set enrichment analysis and overrepresentation analyses are commonly used methods to determine the biological processes affected by a differential expression experiment. This approach requires biologically relevant gene sets, which are currently curated manually, limiting their availability and accuracy in many organisms without extensively curated resources. New feature learning approaches can now be paired with existing data collections to directly extract functional gene sets from big data. Here we introduce a method to identify perturbed processes. In contrast with methods that use curated gene sets, this approach uses signatures extracted from public expression data. We first extract expression signatures from public data using ADAGE, a neural network-based feature extraction approach. We next identify signatures that are differentially active under a given treatment. Our results demonstrate that these signatures represent biological processes that are perturbed by the experiment. Because these signatures are directly learned from data without supervision, they can identify uncurated or novel biological processes. We implemented ADAGE signature analysis for the bacterial pathogen Pseudomonas aeruginosa. For the convenience of different user groups, we implemented both an R package (ADAGEpath) and a web server ( http://adage.greenelab.com ) to run these analyses. Both are open-source to allow easy expansion to other organisms or signature generation methods. We applied ADAGE signature analysis to an example dataset in which wild-type and ∆anr mutant cells were grown as biofilms on the Cystic Fibrosis genotype bronchial epithelial cells. We mapped active signatures in the dataset to KEGG pathways and compared with pathways identified using GSEA. The two approaches generally return consistent results; however, ADAGE signature analysis also identified a signature that revealed the molecularly supported link between the MexT regulon and Anr. We designed ADAGE signature analysis to perform gene set analysis using data-defined functional gene signatures. This approach addresses an important gap for biologists studying non-traditional model organisms and those without extensive curated resources available. We built both an R package and web server to provide ADAGE signature analysis to the community.
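A minimal sketch of the general idea (not the ADAGEpath API or web server): score a learned signature's activity per sample as a weighted combination of gene expression, then test for differential activity between two conditions. The weights and expression values below are random placeholders.

```python
import numpy as np
from scipy import stats

def signature_activity(expression, weights):
    """Per-sample activity of one learned signature: a weighted sum of
    gene expression values (genes x samples) using gene weights (genes,)."""
    return weights @ expression

def differentially_active(expr_a, expr_b, weights, alpha=0.05):
    """Two-sample t-test on per-sample signature activities."""
    act_a = signature_activity(expr_a, weights)
    act_b = signature_activity(expr_b, weights)
    t, p = stats.ttest_ind(act_a, act_b)
    return p < alpha, t, p

# Toy data: 5 genes x 4 samples per condition, one random signature.
rng = np.random.default_rng(0)
expr_wt = rng.normal(0.0, 1.0, (5, 4))
expr_mut = rng.normal(1.0, 1.0, (5, 4))
print(differentially_active(expr_wt, expr_mut, rng.random(5)))
```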
Becnel, Lauren B; Ochsner, Scott A; Darlington, Yolanda F; McOwiti, Apollo; Kankanamge, Wasula H; Dehart, Michael; Naumov, Alexey; McKenna, Neil J
2017-04-25
We previously developed a web tool, Transcriptomine, to explore expression profiling data sets involving small-molecule or genetic manipulations of nuclear receptor signaling pathways. We describe advances in biocuration, query interface design, and data visualization that enhance the discovery of uncharacterized biology in these pathways using this tool. Transcriptomine currently contains about 45 million data points encompassing more than 2000 experiments in a reference library of nearly 550 data sets retrieved from public archives and systematically curated. To make the underlying data points more accessible to bench biologists, we classified experimental small molecules and gene manipulations into signaling pathways and experimental tissues and cell lines into physiological systems and organs. Incorporation of these mappings into Transcriptomine enables the user to readily evaluate tissue-specific regulation of gene expression by nuclear receptor signaling pathways. Data points from animal and cell model experiments and from clinical data sets elucidate the roles of nuclear receptor pathways in gene expression events accompanying various normal and pathological cellular processes. In addition, data sets targeting non-nuclear receptor signaling pathways highlight transcriptional cross-talk between nuclear receptors and other signaling pathways. We demonstrate with specific examples how data points that exist in isolation in individual data sets validate each other when connected and made accessible to the user in a single interface. In summary, Transcriptomine allows bench biologists to routinely develop research hypotheses, validate experimental data, or model relationships between signaling pathways, genes, and tissues. Copyright © 2017, American Association for the Advancement of Science.
Exploring Genetic, Genomic, and Phenotypic Data at the Rat Genome Database
Laulederkind, Stanley J. F.; Hayman, G. Thomas; Wang, Shur-Jen; Lowry, Timothy F.; Nigam, Rajni; Petri, Victoria; Smith, Jennifer R.; Dwinell, Melinda R.; Jacob, Howard J.; Shimoyama, Mary
2013-01-01
The laboratory rat, Rattus norvegicus, is an important model of human health and disease, and experimental findings in the rat have relevance to human physiology and disease. The Rat Genome Database (RGD, http://rgd.mcw.edu) is a model organism database that provides access to a wide variety of curated rat data including disease associations, phenotypes, pathways, molecular functions, biological processes and cellular components for genes, quantitative trait loci, and strains. We present an overview of the database followed by specific examples that can be used to gain experience in employing RGD to explore the wealth of functional data available for the rat. PMID:23255149
KERIS: kaleidoscope of gene responses to inflammation between species
Li, Peng; Tompkins, Ronald G; Xiao, Wenzhong
2017-01-01
A cornerstone of modern biomedical research is the use of animal models to study disease mechanisms and to develop new therapeutic approaches. To help the research community better explore the similarities and differences in genomic response between human inflammatory diseases and murine models, we developed KERIS: kaleidoscope of gene responses to inflammation between species (available at http://www.igenomed.org/keris/). As of June 2016, KERIS includes comparisons of the genomic response of six human inflammatory diseases (burns, trauma, infection, sepsis, endotoxin and acute respiratory distress syndrome) and matched mouse models, using 2257 curated samples from the Inflammation and the Host Response to Injury Glue Grant studies and other representative studies in Gene Expression Omnibus. A researcher can browse, query, visualize and compare the response patterns of genes, pathways and functional modules across different diseases and corresponding murine models. The database is expected to help biologists choose models when studying the mechanisms of particular genes and pathways in a disease and to prioritize the translation of findings from disease models into clinical studies. PMID:27789704
Stavrakas, Vassilis; Melas, Ioannis N; Sakellaropoulos, Theodore; Alexopoulos, Leonidas G
2015-01-01
Modeling of signal transduction pathways is instrumental for understanding cell function. Signaling pathway models aim to represent the signaling events inside the cell's biochemical microenvironment in a way that is meaningful to biologists. In this article, we propose a method to interrogate such pathways in order to produce cell-specific signaling models. We integrate available prior knowledge of protein connectivity, in the form of a Prior Knowledge Network (PKN), with phosphoproteomic data to construct predictive models of the protein connectivity of the interrogated cell type. Several computational methodologies focusing on logic modeling of pathways using optimization formulations or machine learning algorithms have been published on this front over the past few years. Here, we introduce a light and fast approach that uses a breadth-first traversal of the graph to identify the shortest pathways and score proteins in the PKN, fitting the dependencies extracted from the experimental design. The pathways are then combined through a heuristic formulation to produce a final topology that handles inconsistencies between the PKN and the experimental scenarios. Our results show that the algorithm we developed is efficient and accurate for the construction of medium- and large-scale signaling networks. We demonstrate the applicability of the proposed approach by interrogating a manually curated interaction graph model of EGF/TNFA stimulation against synthetic experimental data. To avoid the possibility of erroneous predictions, we performed a cross-validation analysis. Finally, we validate that the introduced approach generates predictive topologies comparable to the ILP formulation. Overall, an efficient approach based on graph theory is presented herein to interrogate protein-protein interaction networks and to provide meaningful biological insights.
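For illustration, a minimal breadth-first search over a toy prior knowledge network, in the spirit of the traversal described above; the edge list and node names are hypothetical, and the published method's scoring and heuristic combination steps are not reproduced here.

```python
from collections import deque

def bfs_shortest_path(edges, source, target):
    """Breadth-first search over a directed prior knowledge network,
    returning one shortest path from source to target (or None)."""
    graph = {}
    for u, v in edges:
        graph.setdefault(u, []).append(v)
    queue = deque([[source]])
    seen = {source}
    while queue:
        path = queue.popleft()
        node = path[-1]
        if node == target:
            return path
        for nxt in graph.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(path + [nxt])
    return None

# Toy PKN: stimulus -> receptor -> kinase -> readout, plus a longer branch.
pkn = [("EGF", "EGFR"), ("EGFR", "RAS"), ("RAS", "ERK"),
       ("EGFR", "PI3K"), ("PI3K", "AKT"), ("AKT", "ERK")]
print(bfs_shortest_path(pkn, "EGF", "ERK"))
```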
Reconstruction of Tissue-Specific Metabolic Networks Using CORDA
Schultz, André; Qutub, Amina A.
2016-01-01
Human metabolism involves thousands of reactions and metabolites. To interpret this complexity, computational modeling becomes an essential experimental tool. One of the most popular techniques to study human metabolism as a whole is genome scale modeling. A key challenge to applying genome scale modeling is identifying critical metabolic reactions across diverse human tissues. Here we introduce a novel algorithm called Cost Optimization Reaction Dependency Assessment (CORDA) to build genome scale models in a tissue-specific manner. CORDA performs more efficiently computationally, shows better agreement to experimental data, and displays better model functionality and capacity when compared to previous algorithms. CORDA also returns reaction associations that can greatly assist in any manual curation to be performed following the automated reconstruction process. Using CORDA, we developed a library of 76 healthy and 20 cancer tissue-specific reconstructions. These reconstructions identified which metabolic pathways are shared across diverse human tissues. Moreover, we identified changes in reactions and pathways that are differentially included and present different capacity profiles in cancer compared to healthy tissues, including up-regulation of folate metabolism, the down-regulation of thiamine metabolism, and tight regulation of oxidative phosphorylation. PMID:26942765
Aung, Hnin W.; Henry, Susan A.
2013-01-01
Genome-scale metabolic models are built using information from an organism's annotated genome and, correspondingly, information on reactions catalyzed by the set of metabolic enzymes encoded by the genome. These models have been successfully applied to guide metabolic engineering to increase production of metabolites of industrial interest. Congruity between simulated and experimental metabolic behavior is influenced by the accuracy of the representation of the metabolic network in the model. In the interest of applying the consensus model of Saccharomyces cerevisiae metabolism for increased productivity of triglycerides, we manually evaluated the representation of fatty acid, glycerophospholipid, and glycerolipid metabolism in the consensus model (Yeast v6.0). These areas of metabolism were chosen because they are tightly interconnected with triglyceride synthesis. Manual curation was facilitated by custom MATLAB functions that return information contained in the model for reactions associated with genes and metabolites within the stated areas of metabolism. Through manual curation, we have identified inconsistencies between information contained in the model and literature knowledge. These inconsistencies include incorrect gene-reaction associations, improper definition of substrates/products in reactions, inappropriate assignments of reaction directionality, nonfunctional β-oxidation pathways, and missing reactions relevant to the synthesis and degradation of triglycerides. Suggestions to amend these inconsistencies in the Yeast v6.0 model can be implemented through a MATLAB script provided in the Supplementary Materials, Supplementary Data S1 (Supplementary Data are available online at www.liebertpub.com/ind). PMID:24678285
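As an analogous sketch (in Python rather than the MATLAB functions the authors describe), the helpers below look up the reactions associated with a gene or metabolite in a toy model dictionary; the model content shown is illustrative, not Yeast v6.0.

```python
# Toy stand-in for a genome-scale model: each reaction lists its
# gene associations and its substrate/product metabolites.
model = {
    "r_TAG_synthesis": {"genes": ["DGA1", "LRO1"],
                        "metabolites": ["diacylglycerol", "triglyceride"]},
    "r_beta_oxidation": {"genes": ["POX1", "FOX2"],
                         "metabolites": ["fatty_acid", "acetyl-CoA"]},
}

def reactions_for_gene(model, gene):
    """Return reaction IDs whose gene association mentions the gene."""
    return [rid for rid, rxn in model.items() if gene in rxn["genes"]]

def reactions_for_metabolite(model, metabolite):
    """Return reaction IDs that consume or produce the metabolite."""
    return [rid for rid, rxn in model.items() if metabolite in rxn["metabolites"]]

print(reactions_for_gene(model, "DGA1"))
print(reactions_for_metabolite(model, "fatty_acid"))
```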
Fourches, Denis; Muratov, Eugene; Tropsha, Alexander
2010-01-01
Molecular modelers and cheminformaticians typically analyze experimental data generated by other scientists. Consequently, when it comes to data accuracy, cheminformaticians are always at the mercy of data providers who may inadvertently publish (partially) erroneous data. Thus, dataset curation is crucial for any cheminformatics analysis such as similarity searching, clustering, QSAR modeling, virtual screening, etc., especially now that the availability of chemical datasets in the public domain has skyrocketed. Despite the obvious importance of this preliminary step in the computational analysis of any dataset, there appears to be no commonly accepted guidance or set of procedures for chemical data curation. The main objective of this paper is to emphasize the need for a standardized chemical data curation strategy that should be followed at the onset of any molecular modeling investigation. Herein, we discuss several simple but important steps for cleaning chemical records in a database including the removal of a fraction of the data that cannot be appropriately handled by conventional cheminformatics techniques. Such steps include the removal of inorganic and organometallic compounds, counterions, salts and mixtures; structure validation; ring aromatization; normalization of specific chemotypes; curation of tautomeric forms; and the deletion of duplicates. To emphasize the importance of data curation as a mandatory step in data analysis, we discuss several case studies where chemical curation of the original “raw” database enabled a successful modeling study (specifically, QSAR analysis) or resulted in a significant improvement of the model's prediction accuracy. We also demonstrate that in some cases rigorously developed QSAR models could even be used to correct erroneous biological data associated with chemical compounds. We believe that good practices for curation of chemical records outlined in this paper will be of value to all scientists working in the fields of molecular modeling, cheminformatics, and QSAR studies. PMID:20572635
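A hedged sketch of a few of the curation steps listed above (counterion stripping, structure validation, metal filtering, normalization and duplicate removal) using RDKit; the metal list, largest-fragment heuristic and example SMILES are assumptions for illustration, not the authors' protocol.

```python
from rdkit import Chem

METALS = {"Na", "K", "Li", "Ca", "Mg", "Fe", "Zn", "Cu", "Pt", "Pd"}

def curate(smiles_list):
    """Very small curation pass: keep the largest fragment (a crude
    salt/counterion strip), drop records containing metals, and
    de-duplicate by canonical SMILES."""
    seen, curated = set(), []
    for smi in smiles_list:
        largest = max(smi.split("."), key=len)        # strip counterions
        mol = Chem.MolFromSmiles(largest)
        if mol is None:                               # structure validation
            continue
        if any(a.GetSymbol() in METALS for a in mol.GetAtoms()):
            continue                                  # inorganic/organometallic
        canonical = Chem.MolToSmiles(mol)             # normalization + dedup key
        if canonical not in seen:
            seen.add(canonical)
            curated.append(canonical)
    return curated

print(curate(["CCO", "OCC", "CC(=O)[O-].[Na+]", "not_a_smiles"]))
```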
Miwa, Makoto; Ohta, Tomoko; Rak, Rafal; Rowley, Andrew; Kell, Douglas B.; Pyysalo, Sampo; Ananiadou, Sophia
2013-01-01
Motivation: To create, verify and maintain pathway models, curators must discover and assess knowledge distributed over the vast body of biological literature. Methods supporting these tasks must understand both the pathway model representations and the natural language in the literature. These methods should identify and order documents by relevance to any given pathway reaction. No existing system has addressed all aspects of this challenge. Method: We present novel methods for associating pathway model reactions with relevant publications. Our approach extracts the reactions directly from the models and then turns them into queries for three text mining-based MEDLINE literature search systems. These queries are executed, and the resulting documents are combined and ranked according to their relevance to the reactions of interest. We manually annotate document-reaction pairs with the relevance of the document to the reaction and use this annotation to study several ranking methods, using various heuristic and machine-learning approaches. Results: Our evaluation shows that the annotated document-reaction pairs can be used to create a rule-based document ranking system, and that machine learning can be used to rank documents by their relevance to pathway reactions. We find that a Support Vector Machine-based system outperforms several baselines and matches the performance of the rule-based system. The successful query extraction and ranking methods were used to update our existing pathway search system, PathText. Availability: An online demonstration of PathText 2 and the annotated corpus are available for research purposes at http://www.nactem.ac.uk/pathtext2/. Contact: makoto.miwa@manchester.ac.uk Supplementary information: Supplementary data are available at Bioinformatics online. PMID:23813008
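As a simplified stand-in for the ranking task (not PathText 2's actual pipeline), the sketch below turns a reaction's participants into a keyword query and ranks abstracts by TF-IDF cosine similarity; in the published system a trained SVM ranker operating on richer features would replace this scoring.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

def rank_documents(reaction_terms, abstracts):
    """Rank candidate abstracts by TF-IDF cosine similarity to a query
    built from a reaction's participant names."""
    query = " ".join(reaction_terms)
    vec = TfidfVectorizer(stop_words="english")
    matrix = vec.fit_transform(abstracts + [query])
    scores = cosine_similarity(matrix[-1], matrix[:-1]).ravel()
    return sorted(zip(scores, abstracts), reverse=True)

abstracts = [
    "EGFR phosphorylates and activates the adaptor protein GRB2.",
    "Soil moisture affects crop yield in dryland farming systems.",
]
for score, text in rank_documents(["EGFR", "GRB2", "phosphorylation"], abstracts):
    print(round(score, 3), text[:50])
```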
Feldmesser, Ester; Rosenwasser, Shilo; Vardi, Assaf; Ben-Dor, Shifra
2014-02-22
The advent of Next Generation Sequencing technologies and corresponding bioinformatics tools allows the definition of transcriptomes in non-model organisms. Non-model organisms are of great ecological and biotechnological significance, and consequently the understanding of their unique metabolic pathways is essential. Several methods that integrate de novo assembly with genome-based assembly have been proposed. Yet, there are many open challenges in defining genes, particularly where genomes are not available or incomplete. Despite the large numbers of transcriptome assemblies that have been performed, quality control of the transcript building process, particularly on the protein level, is rarely performed if ever. To test and improve the quality of the automated transcriptome reconstruction, we used manually defined and curated genes, several of them experimentally validated. Several approaches to transcript construction were utilized, based on the available data: a draft genome, high quality RNAseq reads, and ESTs. In order to maximize the contribution of the various data, we integrated methods including de novo and genome based assembly, as well as EST clustering. After each step a set of manually curated genes was used for quality assessment of the transcripts. The interplay between the automated pipeline and the quality control indicated which additional processes were required to improve the transcriptome reconstruction. We discovered that E. huxleyi has a very high percentage of non-canonical splice junctions, and relatively high rates of intron retention, which caused unique issues with the currently available tools. While individual tools missed genes and artificially joined overlapping transcripts, combining the results of several tools improved the completeness and quality considerably. The final collection, created from the integration of several quality control and improvement rounds, was compared to the manually defined set both on the DNA and protein levels, and resulted in an improvement of 20% versus any of the read-based approaches alone. To the best of our knowledge, this is the first time that an automated transcript definition is subjected to quality control using manually defined and curated genes and thereafter the process is improved. We recommend using a set of manually curated genes to troubleshoot transcriptome reconstruction.
Data Curation: Improving Environmental Health Data Quality.
Yang, Lin; Li, Jiao; Hou, Li; Qian, Qing
2015-01-01
With the growing recognition of the influence of climate change on human health, scientists are increasingly turning their attention to analyzing the relationship between meteorological factors and adverse health effects. However, the paucity of high quality integrated data is one of the great challenges, especially when scientific studies rely on data-intensive computing. This paper aims to design an appropriate curation process to address this problem. We present a data curation workflow that: (i) follows the guidance of the DCC Curation Lifecycle Model; (ii) combines manual curation with automatic curation; (iii) and addresses the environmental health data curation problem. The workflow was applied to a medical knowledge service system and showed that it was capable of improving work efficiency and data quality.
The EcoCyc database: reflecting new knowledge about Escherichia coli K-12.
Keseler, Ingrid M; Mackie, Amanda; Santos-Zavaleta, Alberto; Billington, Richard; Bonavides-Martínez, César; Caspi, Ron; Fulcher, Carol; Gama-Castro, Socorro; Kothari, Anamika; Krummenacker, Markus; Latendresse, Mario; Muñiz-Rascado, Luis; Ong, Quang; Paley, Suzanne; Peralta-Gil, Martin; Subhraveti, Pallavi; Velázquez-Ramírez, David A; Weaver, Daniel; Collado-Vides, Julio; Paulsen, Ian; Karp, Peter D
2017-01-04
EcoCyc (EcoCyc.org) is a freely accessible, comprehensive database that collects and summarizes experimental data for Escherichia coli K-12, the best-studied bacterial model organism. New experimental discoveries about gene products, their function and regulation, new metabolic pathways, enzymes and cofactors are regularly added to EcoCyc. New SmartTable tools allow users to browse collections of related EcoCyc content. SmartTables can also serve as repositories for user- or curator-generated lists. EcoCyc now supports running and modifying E. coli metabolic models directly on the EcoCyc website. © The Author(s) 2016. Published by Oxford University Press on behalf of Nucleic Acids Research.
Gramene database: navigating plant comparative genomics resources
USDA-ARS's Scientific Manuscript database
Gramene (http://www.gramene.org) is an online, open source, curated resource for plant comparative genomics and pathway analysis designed to support researchers working in plant genomics, breeding, evolutionary biology, system biology, and metabolic engineering. It exploits phylogenetic relationship...
Caveat emptor: limitations of the automated reconstruction of metabolic pathways in Plasmodium.
Ginsburg, Hagai
2009-01-01
The functional reconstruction of metabolic pathways from an annotated genome is a tedious and demanding enterprise. Automation of this endeavor using bioinformatics algorithms could cope with the ever-increasing number of sequenced genomes and accelerate the process. Here, the manual reconstruction of metabolic pathways in the functional genomic database of Plasmodium falciparum--Malaria Parasite Metabolic Pathways--is described and compared with pathways generated automatically as they appear in PlasmoCyc, metaSHARK and the Kyoto Encyclopedia for Genes and Genomes. A critical evaluation of this comparison discloses that the automatic reconstruction of pathways generates manifold paths that need an expert manual verification to accept some and reject most others based on manually curated gene annotation.
Integrating publicly-available data to generate computationally ...
The adverse outcome pathway (AOP) framework provides a way of organizing knowledge related to the key biological events that result in a particular health outcome. For the majority of environmental chemicals, the availability of curated pathways characterizing potential toxicity is limited. Methods are needed to assimilate large amounts of available molecular data and quickly generate putative AOPs for further testing and use in hazard assessment. A graph-based workflow was used to facilitate the integration of multiple data types to generate computationally-predicted (cp) AOPs. Edges between graph entities were identified through direct experimental or literature information or computationally inferred using frequent itemset mining. Data from the TG-GATEs and ToxCast programs were used to channel large-scale toxicogenomics information into a cpAOP network (cpAOPnet) of over 20,000 relationships describing connections between chemical treatments, phenotypes, and perturbed pathways measured by differential gene expression and high-throughput screening targets. Sub-networks of cpAOPs for a reference chemical (carbon tetrachloride, CCl4) and outcome (hepatic steatosis) were extracted using the network topology. Comparison of the cpAOP subnetworks to published mechanistic descriptions for both CCl4 toxicity and hepatic steatosis demonstrate that computational approaches can be used to replicate manually curated AOPs and identify pathway targets that lack genomic mar
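To illustrate the frequent itemset mining step in isolation (not the published cpAOP workflow), the sketch below counts co-occurrences of entities across chemical treatments and keeps pairs above a support threshold as candidate edges; the entity names and threshold are made up.

```python
from itertools import combinations
from collections import Counter

def frequent_pairs(transactions, min_support=0.5):
    """Return entity pairs whose co-occurrence frequency across
    treatments (transactions) meets the support threshold."""
    counts = Counter()
    for items in transactions:
        for pair in combinations(sorted(set(items)), 2):
            counts[pair] += 1
    n = len(transactions)
    return {pair: c / n for pair, c in counts.items() if c / n >= min_support}

# Each transaction: entities observed together for one chemical treatment.
treatments = [
    {"CCl4", "oxidative_stress_pathway", "hepatic_steatosis"},
    {"CCl4", "oxidative_stress_pathway"},
    {"chemX", "oxidative_stress_pathway", "hepatic_steatosis"},
]
print(frequent_pairs(treatments, min_support=0.6))
```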
Colak, Emine; Ustuner, Mehmet Cengiz; Tekin, Neslihan; Colak, Ertugrul; Burukoglu, Dilek; Degirmenci, Irfan; Gunes, Hasan Veysi
2016-01-01
Cynara scolymus is a pharmacologically important medicinal plant containing phenolic acids and flavonoids. Experimental studies indicate antioxidant and hepatoprotective effects of C. scolymus, but its therapeutic effects in liver disease have not yet been studied. In the present study, hepatocurative effects of C. scolymus leaf extract on carbon tetrachloride (CCl4)-induced oxidative stress and hepatic injury in rats were investigated by measuring serum hepatic enzyme levels, an oxidative stress indicator (malondialdehyde, MDA), endogenous antioxidants, DNA fragmentation, p53, caspase 3 and histopathology. Animals were divided into six groups: control, olive oil, CCl4, C. scolymus leaf extract, recovery and curative. CCl4 was administered at a dose of 0.2 mL/kg twice daily in the CCl4, recovery and curative groups. Cynara scolymus extract was given orally for 2 weeks at a dose of 1.5 g/kg after CCl4 application in the curative group. A significant decrease in serum alanine aminotransferase (ALT) and aspartate aminotransferase (AST) levels was determined in the curative group. MDA levels were significantly lower in the curative group. A significant increase in superoxide dismutase (SOD) and catalase (CAT) activity was determined in the curative group. In the curative group, C. scolymus leaf extract application shifted the DNA fragmentation (%), p53 and caspase 3 levels of liver tissues towards the normal range. Our results indicated that C. scolymus leaf extract has hepatocurative effects on CCl4-induced oxidative stress and hepatic injury, reducing lipid peroxidation and restoring affected antioxidant systems towards the normal range. It also had positive effects on the regulatory pathway allowing repair of DNA damage in CCl4-induced hepatotoxicity.
CARD 2017: expansion and model-centric curation of the comprehensive antibiotic resistance database
Jia, Baofeng; Raphenya, Amogelang R.; Alcock, Brian; Waglechner, Nicholas; Guo, Peiyao; Tsang, Kara K.; Lago, Briony A.; Dave, Biren M.; Pereira, Sheldon; Sharma, Arjun N.; Doshi, Sachin; Courtot, Mélanie; Lo, Raymond; Williams, Laura E.; Frye, Jonathan G.; Elsayegh, Tariq; Sardar, Daim; Westman, Erin L.; Pawlowski, Andrew C.; Johnson, Timothy A.; Brinkman, Fiona S.L.; Wright, Gerard D.; McArthur, Andrew G.
2017-01-01
The Comprehensive Antibiotic Resistance Database (CARD; http://arpcard.mcmaster.ca) is a manually curated resource containing high quality reference data on the molecular basis of antimicrobial resistance (AMR), with an emphasis on the genes, proteins and mutations involved in AMR. CARD is ontologically structured, model centric, and spans the breadth of AMR drug classes and resistance mechanisms, including intrinsic, mutation-driven and acquired resistance. It is built upon the Antibiotic Resistance Ontology (ARO), a custom built, interconnected and hierarchical controlled vocabulary allowing advanced data sharing and organization. Its design allows the development of novel genome analysis tools, such as the Resistance Gene Identifier (RGI) for resistome prediction from raw genome sequence. Recent improvements include extensive curation of additional reference sequences and mutations, development of a unique Model Ontology and accompanying AMR detection models to power sequence analysis, new visualization tools, and expansion of the RGI for detection of emergent AMR threats. CARD curation is updated monthly based on an interplay of manual literature curation, computational text mining, and genome analysis. PMID:27789705
Reducing Recon 2 for steady-state flux analysis of HEK cell culture.
Quek, Lake-Ee; Dietmair, Stefanie; Hanscho, Michael; Martínez, Verónica S; Borth, Nicole; Nielsen, Lars K
2014-08-20
A representative stoichiometric model is essential to perform metabolic flux analysis (MFA) using experimentally measured consumption (or production) rates as constraints. For Human Embryonic Kidney (HEK) cell culture, there is the opportunity to use an extremely well-curated and annotated human genome-scale model Recon 2 for MFA. Performing MFA using Recon 2 without any modification would have implied that cells have access to all functionality encoded by the genome, which is not realistic. The majority of intracellular fluxes are poorly determined as only extracellular exchange rates are measured. This is compounded by the fact that there is no suitable metabolic objective function to suppress non-specific fluxes. We devised a heuristic to systematically reduce Recon 2 to emphasize flux through core metabolic reactions. This implies that cells would engage these dominant metabolic pathways to grow, and any significant changes in gross metabolic phenotypes would have invoked changes in these pathways. The reduced metabolic model becomes a functionalized version of Recon 2 used for identifying significant metabolic changes in cells by flux analysis. Copyright © 2014 Elsevier B.V. All rights reserved.
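A toy illustration of steady-state metabolic flux analysis with measured exchange rates as constraints, unrelated to Recon 2 or the reduction heuristic itself; the three-reaction network and the measured uptake rate are invented.

```python
import numpy as np

# Toy network: A_ext -> A -> B -> B_ext, with v1 uptake and v3 secretion.
# Rows: internal metabolites A, B; columns: reactions v1, v2, v3.
S = np.array([
    [1.0, -1.0,  0.0],   # A: produced by v1, consumed by v2
    [0.0,  1.0, -1.0],   # B: produced by v2, consumed by v3
])

measured = {0: 2.0}       # measured uptake rate for v1 (index: value)

# Stack steady-state constraints (S v = 0) with measurement constraints.
rows = [S]
rhs = [np.zeros(S.shape[0])]
for idx, value in measured.items():
    e = np.zeros(S.shape[1])
    e[idx] = 1.0
    rows.append(e[None, :])
    rhs.append([value])

A = np.vstack(rows)
b = np.concatenate(rhs)
flux, *_ = np.linalg.lstsq(A, b, rcond=None)
print(dict(zip(["v1", "v2", "v3"], np.round(flux, 3))))
```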
SignaLink 2 – a signaling pathway resource with multi-layered regulatory networks
2013-01-01
Background: Signaling networks in eukaryotes are made up of upstream and downstream subnetworks. The upstream subnetwork contains the intertwined network of signaling pathways, while the downstream regulatory part contains transcription factors and their binding sites on the DNA as well as microRNAs and their mRNA targets. Currently, most signaling and regulatory databases contain only a subsection of this network, making comprehensive analyses highly time-consuming and dependent on specific data handling expertise. The need for detailed mapping of signaling systems is also supported by the fact that several drug development failures were caused by undiscovered cross-talk or regulatory effects of drug targets. We previously created a uniformly curated signaling pathway resource, SignaLink, to facilitate the analysis of pathway cross-talks. Here, we present SignaLink 2, which significantly extends the coverage and applications of its predecessor. Description: We developed a novel concept to integrate and utilize different subsections (i.e., layers) of the signaling network. The multi-layered (onion-like) database structure is made up of signaling pathways, their pathway regulators (e.g., scaffold and endocytotic proteins) and modifier enzymes (e.g., phosphatases, ubiquitin ligases), as well as transcriptional and post-transcriptional regulators of all of these components. The user-friendly website allows the interactive exploration of how each signaling protein is regulated. The customizable download page enables the analysis of any user-specified part of the signaling network. Compared to other signaling resources, distinctive features of SignaLink 2 are the following: 1) it involves experimental data not only from humans but from two invertebrate model organisms, C. elegans and D. melanogaster; 2) combines manual curation with large-scale datasets; 3) provides confidence scores for each interaction; 4) operates a customizable download page with multiple file formats (e.g., BioPAX, Cytoscape, SBML). Non-profit users can access SignaLink 2 free of charge at http://SignaLink.org. Conclusions: With SignaLink 2 as a single resource, users can effectively analyze signaling pathways, scaffold proteins, modifier enzymes, transcription factors and miRNAs that are important in the regulation of signaling processes. This integrated resource allows the systems-level examination of how cross-talks and signaling flow are regulated, as well as providing data for cross-species comparisons and drug discovery analyses. PMID:23331499
SignaLink 2 - a signaling pathway resource with multi-layered regulatory networks.
Fazekas, Dávid; Koltai, Mihály; Türei, Dénes; Módos, Dezső; Pálfy, Máté; Dúl, Zoltán; Zsákai, Lilian; Szalay-Bekő, Máté; Lenti, Katalin; Farkas, Illés J; Vellai, Tibor; Csermely, Péter; Korcsmáros, Tamás
2013-01-18
Signaling networks in eukaryotes are made up of upstream and downstream subnetworks. The upstream subnetwork contains the intertwined network of signaling pathways, while the downstream regulatory part contains transcription factors and their binding sites on the DNA as well as microRNAs and their mRNA targets. Currently, most signaling and regulatory databases contain only a subsection of this network, making comprehensive analyses highly time-consuming and dependent on specific data handling expertise. The need for detailed mapping of signaling systems is also supported by the fact that several drug development failures were caused by undiscovered cross-talk or regulatory effects of drug targets. We previously created a uniformly curated signaling pathway resource, SignaLink, to facilitate the analysis of pathway cross-talks. Here, we present SignaLink 2, which significantly extends the coverage and applications of its predecessor. We developed a novel concept to integrate and utilize different subsections (i.e., layers) of the signaling network. The multi-layered (onion-like) database structure is made up of signaling pathways, their pathway regulators (e.g., scaffold and endocytotic proteins) and modifier enzymes (e.g., phosphatases, ubiquitin ligases), as well as transcriptional and post-transcriptional regulators of all of these components. The user-friendly website allows the interactive exploration of how each signaling protein is regulated. The customizable download page enables the analysis of any user-specified part of the signaling network. Compared to other signaling resources, distinctive features of SignaLink 2 are the following: 1) it involves experimental data not only from humans but from two invertebrate model organisms, C. elegans and D. melanogaster; 2) combines manual curation with large-scale datasets; 3) provides confidence scores for each interaction; 4) operates a customizable download page with multiple file formats (e.g., BioPAX, Cytoscape, SBML). Non-profit users can access SignaLink 2 free of charge at http://SignaLink.org. With SignaLink 2 as a single resource, users can effectively analyze signaling pathways, scaffold proteins, modifier enzymes, transcription factors and miRNAs that are important in the regulation of signaling processes. This integrated resource allows the systems-level examination of how cross-talks and signaling flow are regulated, as well as provide data for cross-species comparisons and drug discovery analyses.
Inferring gene and protein interactions using PubMed citations and consensus Bayesian networks.
Deeter, Anthony; Dalman, Mark; Haddad, Joseph; Duan, Zhong-Hui
2017-01-01
The PubMed database offers an extensive set of publication data that can be useful, yet inherently complex to use without automated computational techniques. Data repositories such as the Genomic Data Commons (GDC) and the Gene Expression Omnibus (GEO) offer experimental data storage and retrieval as well as curated gene expression profiles. Genetic interaction databases, including Reactome and Ingenuity Pathway Analysis, offer pathway and experiment data analysis using data curated from these publications and data repositories. We have created a method to generate and analyze consensus networks, inferring potential gene interactions, using large numbers of Bayesian networks generated by data mining publications in the PubMed database. Through the concept of network resolution, these consensus networks can be tailored to represent possible genetic interactions. We designed a set of experiments to confirm that our method is stable across variation in both sample and topological input sizes. Using gene product interactions from the KEGG pathway database and data mining PubMed publication abstracts, we verify that regardless of the network resolution or the inferred consensus network, our method is capable of inferring meaningful gene interactions through consensus Bayesian network generation with multiple, randomized topological orderings. Our method can not only confirm the existence of currently accepted interactions, but has the potential to hypothesize new ones as well. We show our method confirms the existence of known gene interactions such as JAK-STAT-PI3K-AKT-mTOR, infers novel gene interactions such as RAS-Bcl-2 and RAS-AKT, and found significant pathway-pathway interactions between the JAK-STAT signaling and Cardiac Muscle Contraction KEGG pathways.
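As a schematic of the consensus step only (not the authors' PubMed mining or Bayesian structure learning), the sketch below counts how often each edge appears across many learned networks and keeps edges above a chosen resolution threshold; the candidate edges and their frequencies are simulated.

```python
import random
from collections import Counter

def consensus_network(edge_sets, resolution=0.6):
    """Keep directed edges that appear in at least `resolution`
    fraction of the individually learned networks."""
    counts = Counter(e for edges in edge_sets for e in set(edges))
    n = len(edge_sets)
    return {e for e, c in counts.items() if c / n >= resolution}

# Simulate 200 learned networks over a small gene set; in practice each
# network would be a Bayesian network fit to one mined document set.
random.seed(1)
probs = {("JAK", "STAT"): 0.95, ("PI3K", "AKT"): 0.9,
         ("AKT", "mTOR"): 0.85, ("RAS", "AKT"): 0.4}
learned = [[e for e, p in probs.items() if random.random() < p]
           for _ in range(200)]
print(consensus_network(learned, resolution=0.7))
```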
Guo, Yu-qing; Han, Xin-min; Zhu, Xian-kang; Zhou, Zheng; Ma, Bing-xiang; Zhang, Bao-qing; Li, Yan-ning; Feng, Yu-lin; Xue, Zheng; Wang, Yong-hong; Li, Yi-min; Jiang, Zhi-mei; Xu, Jin-xing; Yue, Wei-zhen; Xiang, Xi-xiong
2015-12-01
To evaluate the application effect of the Chinese medical clinical pathway for treating attention-deficit hyperactivity disorder (ADHD), and to provide evidence for further improving clinical pathways. A total of 270 children with ADHD were recruited and treated at the pediatric clinics of 9 cooperating hospitals from December 2011 to December 2012. The treatment course for all patients was 3 months. Scores on the attention deficit and hyperactivity rating scale, behavior scores, the Conners index of hyperactivity (CIH), and Chinese medical syndrome scores were compared before and after treatment. Differences in efficacy by sex, age, and disease course were evaluated using the judging standards for Chinese medical syndrome and for ADHD. Fifteen children who entered the clinical pathway dropped out, and the remaining 255 completed the trial. Compared with before treatment, total scores on the attention deficit and hyperactivity rating scale, scores on the attention deficit and hyperactivity rating scale, CIH, and Chinese medical syndrome scores decreased markedly (all P < 0.01). The total effective rate in disease efficacy was 87.8% (224/255 cases), and the total effective rate in Chinese medical syndrome curative effect was 87.5% (223/255 cases). The clinical curative effect was not influenced by age, gender, or course of disease when analyzed statistically by either the judging standards for Chinese medical syndrome or those for disease efficacy. Intervention by the Chinese medical clinical pathway could improve ADHD patients' symptoms, and its efficacy was not influenced by sex, age, or course of disease.
Guidelines for the functional annotation of microRNAs using the Gene Ontology
D'Eustachio, Peter; Smith, Jennifer R.; Zampetaki, Anna
2016-01-01
MicroRNA regulation of developmental and cellular processes is a relatively new field of study, and the available research data have not been organized to enable its inclusion in pathway and network analysis tools. The association of gene products with terms from the Gene Ontology is an effective method to analyze functional data, but until recently there has been no substantial effort dedicated to applying Gene Ontology terms to microRNAs. Consequently, when performing functional analysis of microRNA data sets, researchers have had to rely instead on the functional annotations associated with the genes encoding microRNA targets. In consultation with experts in the field of microRNA research, we have created comprehensive recommendations for the Gene Ontology curation of microRNAs. This curation manual will enable provision of a high-quality, reliable set of functional annotations for the advancement of microRNA research. Here we describe the key aspects of the work, including development of the Gene Ontology to represent this data, standards for describing the data, and guidelines to support curators making these annotations. The full microRNA curation guidelines are available on the GO Consortium wiki (http://wiki.geneontology.org/index.php/MicroRNA_GO_annotation_manual). PMID:26917558
Advancing the application of systems thinking in health: why cure crowds out prevention.
Bishai, David; Paina, Ligia; Li, Qingfeng; Peters, David H; Hyder, Adnan A
2014-06-16
This paper presents a system dynamics computer simulation model to illustrate unintended consequences of apparently rational allocations to curative and preventive services. A modeled population is subject to only two diseases. Disease A is a curable disease that can be shortened by curative care. Disease B is an instantly fatal but preventable disease. Curative care workers are financed by public spending and private fees to cure disease A. Non-personal, preventive services are delivered by public health workers supported solely by public spending to prevent disease B. Each type of worker tries to tilt the balance of government spending towards their interests. Their influence on the government is proportional to their accumulated revenue. The model demonstrates effects on lost disability-adjusted life years and costs over the course of several epidemics of each disease. Policy interventions are tested including: i) an outside donor rationally donates extra money to each type of disease precisely in proportion to the size of epidemics of each disease; ii) lobbying is eliminated; iii) fees for personal health services are eliminated; iv) the government continually rebalances the funding for prevention by ring-fencing it to protect it from lobbying. The model exhibits a "spend more get less" equilibrium in which higher revenue by the curative sector is used to influence government allocations away from prevention towards cure. Spending more on curing disease A leads paradoxically to a higher overall disease burden of unprevented cases of disease B. This paradoxical behavior of the model can be stopped by eliminating lobbying, eliminating fees for curative services, and ring-fencing public health funding. We have created an artificial system as a laboratory to gain insights about the trade-offs between curative and preventive health allocations, and the effect of indicative policy interventions. The underlying dynamics of this artificial system resemble features of modern health systems where a self-perpetuating industry has grown up around disease-specific curative programs like HIV/AIDS or malaria. The model shows how the growth of curative care services can crowd out both fiscal and policy space for the practice of population level prevention work, requiring dramatic interventions to overcome these trends.
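A minimal difference-equation sketch of the feedback loop described above, with invented parameter values: curative revenue buys influence that erodes the prevention share of the budget, which raises the preventable-disease burden unless the share is ring-fenced.

```python
def simulate(years=20, budget=100.0, fee_income_rate=0.5, lobbying=True):
    """Toy difference-equation model: the prevention share of the public
    budget erodes in proportion to the curative sector's accumulated
    revenue when lobbying is enabled."""
    prevention_share, curative_revenue, burden = 0.5, 0.0, []
    for _ in range(years):
        curative_budget = budget * (1.0 - prevention_share)
        prevention_budget = budget * prevention_share
        curative_revenue += curative_budget * (1.0 + fee_income_rate)
        # Preventable-disease burden falls with prevention spending.
        burden.append(100.0 / (1.0 + 0.1 * prevention_budget))
        if lobbying:  # influence proportional to accumulated revenue
            prevention_share = max(0.05, prevention_share - 1e-4 * curative_revenue)
    return round(sum(burden), 1)

print("with lobbying: ", simulate(lobbying=True))
print("ring-fenced:   ", simulate(lobbying=False))
```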
A statistical approach to identify, monitor, and manage incomplete curated data sets.
Howe, Douglas G
2018-04-02
Many biological knowledge bases gather data through expert curation of published literature. High data volume, selective partial curation, delays in access, and publication of data prior to the ability to curate it can result in incomplete curation of published data. Knowing which data sets are incomplete and how incomplete they are remains a challenge. Awareness that a data set may be incomplete is important for proper interpretation and to avoid flawed hypothesis generation, and can justify further exploration of published literature for additional relevant data. Computational methods to assess data set completeness are needed. One such method is presented here. In this work, a multivariate linear regression model was used to identify genes in the Zebrafish Information Network (ZFIN) Database having incomplete curated gene expression data sets. Starting with 36,655 gene records from ZFIN, data aggregation, cleansing, and filtering reduced the set to 9870 gene records suitable for training and testing the model to predict the number of expression experiments per gene. Feature engineering and selection identified the following predictive variables: the number of journal publications; the number of journal publications already attributed for gene expression annotation; the percent of journal publications already attributed for expression data; the gene symbol; and the number of transgenic constructs associated with each gene. Twenty-five percent of the gene records (2483 genes) were used to train the model. The remaining 7387 genes were used to test the model. One hundred and twenty-two and 165 of the 7387 tested genes were identified as missing expression annotations based on their residuals being outside the model's lower or upper 95% confidence interval, respectively. The model had precision of 0.97 and recall of 0.71 at the negative 95% confidence interval and precision of 0.76 and recall of 0.73 at the positive 95% confidence interval. This method can be used to identify data sets that are incompletely curated, as demonstrated using the gene expression data set from ZFIN. This information can help both database resources and data consumers gauge when it may be useful to look further for published data to augment the existing expertly curated information.
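A hedged sketch of the general approach on simulated data (not ZFIN's actual features or fitted model): fit a linear regression predicting curated-annotation counts from publication features and flag genes whose residuals fall outside a roughly 95% band of the residual distribution.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(42)
n = 500
n_pubs = rng.poisson(20, n)                       # publications per gene
n_attributed = (n_pubs * rng.uniform(0.3, 0.9, n)).astype(int)
X = np.column_stack([n_pubs, n_attributed])
y = 0.6 * n_attributed + rng.normal(0, 1.5, n)    # curated expression experiments

model = LinearRegression().fit(X, y)
residuals = y - model.predict(X)
lo, hi = np.percentile(residuals, [2.5, 97.5])

# Genes far below the expected count are candidates for incomplete curation;
# genes far above may also warrant review.
under = np.where(residuals < lo)[0]
over = np.where(residuals > hi)[0]
print(len(under), "genes flagged below and", len(over), "above the 95% band")
```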
The transprofessional model: blending intents in terminal care of AIDS.
Cherin, D A; Simmons, W J; Hillary, K
1998-01-01
Current terminal care services present dying patients and their families with a dichotomy in service delivery and intent of care between curative treatments and palliative treatments. This arbitrary dichotomy reduces patients' quality of life in many cases and robs patients and families of benefiting from the psychosocial aspects of treatment until the last few weeks of life. This article presents a blended model of care, the Transprofessional Model, in which patients receive both curative and palliative services throughout their care process. The blended intent model differs from traditional home care in that services are provided by a care coordination team composed of nurses and social workers; the traditional model of care is often case managed by a single registered nurse. The combination of the multi-disciplinary approach to care coordination and training in both curative and palliative services in the Transprofessional Model demonstrates that this blended model of care produces a bio-psychosocial focus in terminal care, compared with the primarily curative focus of the traditional model of home care.
Medullary thyroid cancer: the functions of raf-1 and human achaete-scute homologue-1.
Chen, Herbert; Kunnimalaiyaan, Muthusamy; Van Gompel, Jamie J
2005-06-01
Medullary thyroid cancer (MTC) is a prototypic neuroendocrine tumor of the thyroid C cells. Other than surgery, there are no curative therapies for MTC. In this review, we detail recent studies that suggest that targeting specific signaling pathways may be a viable strategy to control MTC tumor progression. Specifically, we discuss the role of the raf-1 and achaete-scute homologue-1 pathways in the MTC tumor growth and differentiation.
Literature Mining of Pathogenesis-Related Proteins in Human Pathogens for Database Annotation
2009-10-01
Salmonella, and Shigella. In most cases the host is human, but may also include other mammal species. 2. Negative literature set of PH-PPIs. Of...cis.udel.edu The objective of Gallus Reactome is to provide a curated set of metabolic and signaling pathways for the chicken. To assist annotators...interested in papers that document pathways in the chicken, abstracts are classified according to the species that were the source of the experimental
Ory, Benjamin; Charrier, Céline; Brion, Régis; Blanchard, Frederic; Redini, Françoise; Heymann, Dominique
2014-01-01
Osteosarcoma is the most common primary malignant bone tumour characterized by osteoid production and/or osteolytic lesions of bone. A lack of response to chemotherapeutic treatments shows the importance of exploring new therapeutic methods. Imatinib mesylate (Gleevec, Novartis Pharma), a tyrosine kinase inhibitor, was originally developed for the treatment of chronic myeloid leukemia. Several studies revealed that imatinib mesylate inhibits osteoclast differentiation through the M-CSFR pathway and activates osteoblast differentiation through the PDGFR pathway, two key cell types involved in the vicious cycle controlling the tumour development. The present study investigated the in vitro effects of imatinib mesylate on the proliferation, apoptosis, cell cycle, and migration ability of five osteosarcoma cell lines (human: MG-63, HOS; rat: OSRGA; mice: MOS-J, POS-1). Imatinib mesylate was also assessed as a curative and preventive treatment in two syngeneic osteosarcoma models: MOS-J (mixed osteoblastic/osteolytic osteosarcoma) and POS-1 (undifferentiated osteosarcoma). Imatinib mesylate exhibited a dose-dependent anti-proliferative effect in all cell lines studied. The drug induced a G0/G1 cell cycle arrest in most cell lines, except for POS-1 and HOS cells that were blocked in the S phase. In addition, imatinib mesylate induced cell death and strongly inhibited osteosarcoma cell migration. In the MOS-J osteosarcoma model, oral administration of imatinib mesylate significantly inhibited the tumour development in both preventive and curative approaches. A phospho-receptor tyrosine kinase array kit revealed that PDGFRα, among 7 other receptors (PDGFRβ, Axl, RYK, EGFR, EphA2 and 10, IGF1R), appears as one of the main molecular targets for imatinib mesylate. In the light of the present study and the literature, it would be particularly interesting to revisit therapeutic evaluation of imatinib mesylate in osteosarcoma according to the tyrosine-kinase receptor status of patients. PMID:24599309
NASA Astrophysics Data System (ADS)
Radhakrishnan, A.; Balaji, V.; Schweitzer, R.; Nikonov, S.; O'Brien, K.; Vahlenkamp, H.; Burger, E. F.
2016-12-01
There are distinct phases in the development cycle of an Earth system model. During the model development phase, scientists make changes to code and parameters and require rapid access to results for evaluation. During the production phase, scientists may make an ensemble of runs with different settings, and produce large quantities of output, that must be further analyzed and quality controlled for scientific papers and submission to international projects such as the Climate Model Intercomparison Project (CMIP). During this phase, provenance is a key concern: being able to track back from outputs to inputs. We will discuss one of the paths taken at GFDL in delivering tools across this lifecycle, offering on-demand analysis of data by integrating the use of GFDL's in-house FRE-Curator, Unidata's THREDDS and NOAA PMEL's Live Access Servers (LAS). Experience over this lifecycle suggests that a major difficulty in developing analysis capabilities is only partly the scientific content; much of the effort is devoted to answering the questions "where is the data?" and "how do I get to it?". "FRE-Curator" is the name of a database-centric paradigm used at NOAA GFDL to ingest information about the model runs into an RDBMS (Curator database). The components of FRE-Curator are integrated into the Flexible Runtime Environment workflow and can be invoked during climate model simulation. The front end to FRE-Curator, known as the Model Development Database Interface (MDBI), provides in-house web-based access to GFDL experiments: metadata, analysis output and more. In order to provide on-demand visualization, MDBI uses Live Access Servers, a highly configurable web server designed to provide flexible access to geo-referenced scientific data that makes use of OPeNDAP. Model output saved in GFDL's tape archive, the size of the database and experiments, and continuous model development initiatives with more dynamic configurations all add complexity and challenges to providing an on-demand visualization experience for GFDL users.
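As an illustrative access pattern only (the URL and variable name are placeholders, not actual GFDL endpoints), the snippet below opens a remote dataset over OPeNDAP with netCDF4 so that only the requested slice is transferred from the server.

```python
# A minimal sketch of OPeNDAP-based remote access; the URL and variable
# name below are placeholders, not real endpoints.
from netCDF4 import Dataset

OPENDAP_URL = "https://example.gov/thredds/dodsC/some_experiment/atmos_monthly.nc"

with Dataset(OPENDAP_URL) as ds:            # opens the remote dataset lazily
    tas = ds.variables["tas"]               # e.g., near-surface air temperature
    # Subsetting happens server side: only the requested slice is transferred.
    first_map = tas[0, :, :]
    print(tas.dimensions, first_map.shape)
```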
Integrated analysis of breast cancer cell lines reveals unique signaling pathways
DOE Office of Scientific and Technical Information (OSTI.GOV)
Heiser, Laura M.; Wang, Nicholas J.; Talcott, Carolyn L.
Cancer is a heterogeneous disease resulting from the accumulation of genetic defects that negatively impact control of cell division, motility, adhesion and apoptosis. Deregulation in signaling along the EGFR-MAPK pathway is common in breast cancer, though the manner in which deregulation occurs varies between both individuals and cancer subtypes. We were interested in identifying subnetworks within the EGFR-MAPK pathway that are similarly deregulated across subsets of breast cancers. To that end, we mapped genomic, transcriptional and proteomic profiles for 30 breast cancer cell lines onto a curated Pathway Logic symbolic systems model of EGFR-MEK signaling. This model was composed of 539 molecular states and 396 rules governing signaling between active states. We analyzed these models and identified several subtype specific subnetworks, including one that suggested PAK1 is particularly important in regulating the MAPK cascade when it is over-expressed. We hypothesized that PAK1 overexpressing cell lines would have increased sensitivity to MEK inhibitors. We tested this experimentally by measuring quantitative responses of 20 breast cancer cell lines to three MEK inhibitors. We found that PAK1 over-expressing luminal breast cancer cell lines are significantly more sensitive to MEK inhibition as compared to those that express PAK1 at low levels. This indicates that PAK1 over-expression may be a useful clinical marker to identify patient populations that may be sensitive to MEK inhibitors. All together, our results support the utility of symbolic system biology models for identification of therapeutic approaches that will be effective against breast cancer subsets.
Integrated analysis of breast cancer cell lines reveals unique signaling pathways.
Heiser, Laura M; Wang, Nicholas J; Talcott, Carolyn L; Laderoute, Keith R; Knapp, Merrill; Guan, Yinghui; Hu, Zhi; Ziyad, Safiyyah; Weber, Barbara L; Laquerre, Sylvie; Jackson, Jeffrey R; Wooster, Richard F; Kuo, Wen Lin; Gray, Joe W; Spellman, Paul T
2009-01-01
Cancer is a heterogeneous disease resulting from the accumulation of genetic defects that negatively impact control of cell division, motility, adhesion and apoptosis. Deregulation in signaling along the EgfR-MAPK pathway is common in breast cancer, though the manner in which deregulation occurs varies between both individuals and cancer subtypes. We were interested in identifying subnetworks within the EgfR-MAPK pathway that are similarly deregulated across subsets of breast cancers. To that end, we mapped genomic, transcriptional and proteomic profiles for 30 breast cancer cell lines onto a curated Pathway Logic symbolic systems model of EgfR-MAPK signaling. This model was composed of 539 molecular states and 396 rules governing signaling between active states. We analyzed these models and identified several subtype-specific subnetworks, including one that suggested Pak1 is particularly important in regulating the MAPK cascade when it is over-expressed. We hypothesized that Pak1 over-expressing cell lines would have increased sensitivity to Mek inhibitors. We tested this experimentally by measuring quantitative responses of 20 breast cancer cell lines to three Mek inhibitors. We found that Pak1 over-expressing luminal breast cancer cell lines are significantly more sensitive to Mek inhibition compared to those that express Pak1 at low levels. This indicates that Pak1 over-expression may be a useful clinical marker to identify patient populations that may be sensitive to Mek inhibitors. All together, our results support the utility of symbolic system biology models for identification of therapeutic approaches that will be effective against breast cancer subsets.
PathScore: a web tool for identifying altered pathways in cancer data.
Gaffney, Stephen G; Townsend, Jeffrey P
2016-12-01
PathScore quantifies the level of enrichment of somatic mutations within curated pathways, applying a novel approach that identifies pathways enriched across patients. The application provides several user-friendly, interactive graphic interfaces for data exploration, including tools for comparing pathway effect sizes, significance, gene-set overlap and enrichment differences between projects. Web application available at pathscore.publichealth.yale.edu. Site implemented in Python and MySQL, with all major browsers supported. Source code available at: github.com/sggaffney/pathscore with a GPLv3 license. stephen.gaffney@yale.edu. © The Author 2016. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com.
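As a point of reference for the kind of enrichment statistic such tools build on, the short Python sketch below runs a plain hypergeometric test of whether mutated genes are over-represented in a curated pathway. It is a generic illustration only, not PathScore's patient-level statistic, and the gene lists and background size are invented placeholders.

# Hedged sketch: generic hypergeometric enrichment of mutated genes in a pathway.
# This is NOT PathScore's patient-weighted statistic; the gene lists are placeholders.
from scipy.stats import hypergeom

def pathway_enrichment(mutated_genes, pathway_genes, background_size):
    """P-value that the pathway contains >= the observed number of mutated genes by chance."""
    mutated = set(mutated_genes)
    pathway = set(pathway_genes)
    k = len(mutated & pathway)          # mutated genes inside the pathway
    M = background_size                 # all genes considered
    n = len(pathway)                    # pathway size
    N = len(mutated)                    # total mutated genes
    # Survival function at k-1 gives P(X >= k) for X ~ Hypergeom(M, n, N)
    return k, hypergeom.sf(k - 1, M, n, N)

# Toy example with made-up gene symbols
k, p = pathway_enrichment(
    mutated_genes=["TP53", "PIK3CA", "MAP2K1", "GATA3"],
    pathway_genes=["MAP2K1", "MAPK1", "RAF1", "BRAF", "EGFR"],
    background_size=20000,
)
print(f"{k} pathway genes mutated, hypergeometric p = {p:.3g}")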
Co-Curate: Working with Schools and Communities to Add Value to Open Collections
ERIC Educational Resources Information Center
Cotterill, Simon; Hudson, Martyn; Lloyd, Katherine; Outterside, James; Peterson, John; Coburn, John; Thomas, Ulrike; Tiplady, Lucy; Robinson, Phil; Heslop, Phil
2016-01-01
Co-Curate North East is a cross-disciplinary initiative involving Newcastle University and partner organisations, working with schools and community groups in the North East of England. Co-curation builds on the concept of the "ecomuseum" model for heritage based around a virtual territory, social memory and participative input from the…
Results from rodent and non-rodent prenatal developmental toxicity tests for over 300 chemicals have been curated into the relational database ToxRefDB. These same chemicals have been run in concentration-response format through over 500 high-throughput screening assays assessin...
Inferring gene and protein interactions using PubMed citations and consensus Bayesian networks
Dalman, Mark; Haddad, Joseph; Duan, Zhong-Hui
2017-01-01
The PubMed database offers an extensive set of publication data that can be useful, yet inherently complex to use without automated computational techniques. Data repositories such as the Genomic Data Commons (GDC) and the Gene Expression Omnibus (GEO) offer experimental data storage and retrieval as well as curated gene expression profiles. Genetic interaction databases, including Reactome and Ingenuity Pathway Analysis, offer pathway and experiment data analysis using data curated from these publications and data repositories. We have created a method to generate and analyze consensus networks, inferring potential gene interactions, using large numbers of Bayesian networks generated by data mining publications in the PubMed database. Through the concept of network resolution, these consensus networks can be tailored to represent possible genetic interactions. We designed a set of experiments to confirm that our method is stable across variation in both sample and topological input sizes. Using gene product interactions from the KEGG pathway database and data mining PubMed publication abstracts, we verify that regardless of the network resolution or the inferred consensus network, our method is capable of inferring meaningful gene interactions through consensus Bayesian network generation with multiple, randomized topological orderings. Our method can not only confirm the existence of currently accepted interactions but also hypothesize new ones. We show our method confirms the existence of known gene interactions such as JAK-STAT-PI3K-AKT-mTOR, infers novel gene interactions such as RAS-Bcl-2 and RAS-AKT, and finds significant pathway-pathway interactions between the JAK-STAT signaling and Cardiac Muscle Contraction KEGG pathways. PMID:29049295
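The consensus step described above can be illustrated with a small sketch: given an ensemble of inferred directed networks, keep the edges supported by at least a chosen fraction of them. This is only a minimal illustration of consensus-building, not the authors' algorithm or their notion of network resolution; the gene names and edge lists are invented, and networkx is assumed to be available.

# Hedged sketch: build a consensus network from many inferred directed networks
# by keeping edges that appear in at least a given fraction of them.
# Illustrates the general idea only; edge lists below are invented.
from collections import Counter
import networkx as nx

def consensus_network(networks, min_fraction=0.6):
    counts = Counter(edge for g in networks for edge in g.edges())
    consensus = nx.DiGraph()
    for (u, v), c in counts.items():
        frac = c / len(networks)
        if frac >= min_fraction:
            consensus.add_edge(u, v, support=frac)
    return consensus

# Toy ensemble of three inferred networks over made-up gene names
ensemble = []
for edges in [[("JAK1", "STAT3"), ("STAT3", "PIK3CA")],
              [("JAK1", "STAT3"), ("PIK3CA", "AKT1")],
              [("JAK1", "STAT3"), ("STAT3", "PIK3CA"), ("PIK3CA", "AKT1")]]:
    g = nx.DiGraph()
    g.add_edges_from(edges)
    ensemble.append(g)

cons = consensus_network(ensemble, min_fraction=0.6)
for u, v, d in cons.edges(data=True):
    print(f"{u} -> {v}  support={d['support']:.2f}")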
Advancing the application of systems thinking in health: why cure crowds out prevention
2014-01-01
Introduction This paper presents a system dynamics computer simulation model to illustrate unintended consequences of apparently rational allocations to curative and preventive services. Methods A modeled population is subject to only two diseases. Disease A is a curable disease that can be shortened by curative care. Disease B is an instantly fatal but preventable disease. Curative care workers are financed by public spending and private fees to cure disease A. Non-personal, preventive services are delivered by public health workers supported solely by public spending to prevent disease B. Each type of worker tries to tilt the balance of government spending towards their interests. Their influence on the government is proportional to their accumulated revenue. Results The model demonstrates effects on lost disability-adjusted life years and costs over the course of several epidemics of each disease. Policy interventions are tested, including: i) an outside donor rationally donates extra money to each type of disease precisely in proportion to the size of epidemics of each disease; ii) lobbying is eliminated; iii) fees for personal health services are eliminated; iv) the government continually rebalances the funding for prevention by ring-fencing it to protect it from lobbying. The model exhibits a “spend more get less” equilibrium in which higher revenue by the curative sector is used to influence government allocations away from prevention towards cure. Spending more on curing disease A leads paradoxically to a higher overall disease burden of unprevented cases of disease B. This paradoxical behavior of the model can be stopped by eliminating lobbying, eliminating fees for curative services, and ring-fencing public health funding. Conclusions We have created an artificial system as a laboratory to gain insights about the trade-offs between curative and preventive health allocations, and the effect of indicative policy interventions. The underlying dynamics of this artificial system resemble features of modern health systems where a self-perpetuating industry has grown up around disease-specific curative programs like HIV/AIDS or malaria. The model shows how the growth of curative care services can crowd out both fiscal and policy space for the practice of population-level prevention work, requiring dramatic interventions to overcome these trends. PMID:24935344
Smart Mobility Stakeholders - Curating Urban Data & Models
DOE Office of Scientific and Technical Information (OSTI.GOV)
Sperling, Joshua
This presentation provides an overview of the curation of urban data and models through engaging SMART mobility stakeholders. SMART Mobility Urban Science Efforts are helping to expose key data sets, models, and roles for the U.S. Department of Energy in engaging across stakeholders to ensure useful insights. This will help to support other Urban Science and broader SMART initiatives.
Building an efficient curation workflow for the Arabidopsis literature corpus
Li, Donghui; Berardini, Tanya Z.; Muller, Robert J.; Huala, Eva
2012-01-01
TAIR (The Arabidopsis Information Resource) is the model organism database (MOD) for Arabidopsis thaliana, a model plant with a literature corpus of about 39 000 articles in PubMed, with over 4300 new articles added in 2011. We have developed a literature curation workflow incorporating both automated and manual elements to cope with this flood of new research articles. The current workflow can be divided into two phases: article selection and curation. Structured controlled vocabularies, such as the Gene Ontology and Plant Ontology, are used to capture free-text information in the literature as succinct ontology-based annotations suitable for the application of computational analysis methods. We also describe our curation platform and the use of text mining tools in our workflow. Database URL: www.arabidopsis.org PMID:23221298
Mansouri, K; Grulke, C M; Richard, A M; Judson, R S; Williams, A J
2016-11-01
The increasing availability of large collections of chemical structures and associated experimental data provides an opportunity to build robust QSAR models for applications in different fields. One common concern is the quality of both the chemical structure information and the associated experimental data. Here we describe the development of an automated KNIME workflow to curate and correct errors in the structure and identity of chemicals using the publicly available PHYSPROP physicochemical properties and environmental fate datasets. The workflow first assembles structure-identity pairs using up to four provided chemical identifiers, including chemical name, CASRNs, SMILES, and MolBlock. Problems detected included errors and mismatches in chemical structure formats and identifiers, as well as structure validation issues such as hypervalency and stereochemistry descriptions. Subsequently, a machine learning procedure was applied to evaluate the impact of this curation process. The performance of QSAR models built on only the highest-quality subset of the original dataset was compared with that of models built on the larger curated and corrected dataset. The latter showed statistically improved predictive performance. The final workflow was used to curate the full list of PHYSPROP datasets, and is being made publicly available for further usage and integration by the scientific community.
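Two of the checks this kind of structure-identity curation relies on, canonicalising a structure representation and validating a CAS Registry Number check digit, can be sketched in a few lines of Python. The sketch below uses RDKit and invented example records; it is not the published KNIME workflow.

# Hedged sketch of two curation checks of the kind described above:
# (1) canonicalise a SMILES string with RDKit, (2) validate a CASRN check digit.
# Not the published KNIME workflow; the example records are invented.
from rdkit import Chem

def canonical_smiles(smiles):
    """Return a canonical SMILES, or None if the structure does not parse."""
    mol = Chem.MolFromSmiles(smiles)
    return Chem.MolToSmiles(mol) if mol is not None else None

def casrn_is_valid(casrn):
    """Check the CAS Registry Number check digit (its last digit)."""
    digits = casrn.replace("-", "")
    if not digits.isdigit() or len(digits) < 5:
        return False
    body, check = digits[:-1], int(digits[-1])
    # Weighted sum of the body digits, rightmost digit has weight 1
    total = sum(int(d) * w for w, d in enumerate(reversed(body), start=1))
    return total % 10 == check

records = [
    {"name": "bisphenol A", "casrn": "80-05-7", "smiles": "CC(C)(c1ccc(O)cc1)c1ccc(O)cc1"},
    {"name": "bad record", "casrn": "80-05-8", "smiles": "C1CC"},  # bad check digit, unparsable SMILES
]
for r in records:
    print(r["name"], casrn_is_valid(r["casrn"]), canonical_smiles(r["smiles"]))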
DOE Office of Scientific and Technical Information (OSTI.GOV)
Lemaître, Nadine; Liang, Xiaofei; Najeeb, Javaria
ABSTRACT The infectious diseases caused by multidrug-resistant bacteria pose serious threats to humankind. It has been suggested that an antibiotic targeting LpxC of the lipid A biosynthetic pathway in Gram-negative bacteria is a promising strategy for curing Gram-negative bacterial infections. However, experimental proof of this concept is lacking. Here, we describe our discovery and characterization of a biphenylacetylene-based inhibitor of LpxC, an essential enzyme in the biosynthesis of the lipid A component of the outer membrane of Gram-negative bacteria. The compound LPC-069 has no known adverse effects in mice and is effective in vitro against a broad panel of Gram-negative clinical isolates, including several multiresistant and extremely drug-resistant strains involved in nosocomial infections. Furthermore, LPC-069 is curative in a murine model of one of the most severe human diseases, bubonic plague, which is caused by the Gram-negative bacterium Yersinia pestis. Our results demonstrate the safety and efficacy of LpxC inhibitors as a new class of antibiotic against fatal infections caused by extremely virulent pathogens. The present findings also highlight the potential of LpxC inhibitors for clinical development as therapeutics for infections caused by multidrug-resistant bacteria. IMPORTANCE The rapid spread of antimicrobial resistance among Gram-negative bacilli highlights the urgent need for new antibiotics. Here, we describe a new class of antibiotics lacking cross-resistance with conventional antibiotics. The compounds inhibit LpxC, a key enzyme in the lipid A biosynthetic pathway in Gram-negative bacteria, and are active in vitro against a broad panel of clinical isolates of Gram-negative bacilli involved in nosocomial and community infections. The present study also constitutes the first demonstration of the curative treatment of bubonic plague by a novel, broad-spectrum antibiotic targeting LpxC. Hence, the data highlight the therapeutic potential of LpxC inhibitors against a wide variety of Gram-negative bacterial infections, including the most severe ones caused by Y. pestis and by multidrug-resistant and extensively drug-resistant carbapenemase-producing strains.
Lemaître, Nadine; Liang, Xiaofei; Najeeb, Javaria; Lee, Chul-Jin; Titecat, Marie; Leteurtre, Emmanuelle; Simonet, Michel; Toone, Eric J; Zhou, Pei; Sebbane, Florent
2017-07-25
The infectious diseases caused by multidrug-resistant bacteria pose serious threats to humankind. It has been suggested that an antibiotic targeting LpxC of the lipid A biosynthetic pathway in Gram-negative bacteria is a promising strategy for curing Gram-negative bacterial infections. However, experimental proof of this concept is lacking. Here, we describe our discovery and characterization of a biphenylacetylene-based inhibitor of LpxC, an essential enzyme in the biosynthesis of the lipid A component of the outer membrane of Gram-negative bacteria. The compound LPC-069 has no known adverse effects in mice and is effective in vitro against a broad panel of Gram-negative clinical isolates, including several multiresistant and extremely drug-resistant strains involved in nosocomial infections. Furthermore, LPC-069 is curative in a murine model of one of the most severe human diseases, bubonic plague, which is caused by the Gram-negative bacterium Yersinia pestis. Our results demonstrate the safety and efficacy of LpxC inhibitors as a new class of antibiotic against fatal infections caused by extremely virulent pathogens. The present findings also highlight the potential of LpxC inhibitors for clinical development as therapeutics for infections caused by multidrug-resistant bacteria. IMPORTANCE The rapid spread of antimicrobial resistance among Gram-negative bacilli highlights the urgent need for new antibiotics. Here, we describe a new class of antibiotics lacking cross-resistance with conventional antibiotics. The compounds inhibit LpxC, a key enzyme in the lipid A biosynthetic pathway in Gram-negative bacteria, and are active in vitro against a broad panel of clinical isolates of Gram-negative bacilli involved in nosocomial and community infections. The present study also constitutes the first demonstration of the curative treatment of bubonic plague by a novel, broad-spectrum antibiotic targeting LpxC. Hence, the data highlight the therapeutic potential of LpxC inhibitors against a wide variety of Gram-negative bacterial infections, including the most severe ones caused by Y. pestis and by multidrug-resistant and extensively drug-resistant carbapenemase-producing strains. Copyright © 2017 Lemaître et al.
TGF-beta signaling proteins and the Protein Ontology.
Arighi, Cecilia N; Liu, Hongfang; Natale, Darren A; Barker, Winona C; Drabkin, Harold; Blake, Judith A; Smith, Barry; Wu, Cathy H
2009-05-06
The Protein Ontology (PRO) is designed as a formal and principled Open Biomedical Ontologies (OBO) Foundry ontology for proteins. The components of PRO extend from a classification of proteins on the basis of evolutionary relationships at the homeomorphic level to the representation of the multiple protein forms of a gene, including those resulting from alternative splicing, cleavage and/or post-translational modifications. Focusing specifically on the TGF-beta signaling proteins, we describe the building, curation, usage and dissemination of PRO. PRO is manually curated on the basis of PrePRO, an automatically generated file with content derived from standard protein data sources. Manual curation ensures that the treatment of the protein classes and the internal and external relationships conform to the PRO framework. The current release of PRO is based upon experimental data from mouse and human proteins wherein equivalent protein forms are represented by single terms. In addition to the PRO ontology, the annotation of PRO terms is released as a separate PRO association file, which contains, for each given PRO term, an annotation from the experimentally characterized sub-types as well as the corresponding database identifiers and sequence coordinates. The annotations are added in the form of relationship to other ontologies. Whenever possible, equivalent forms in other species are listed to facilitate cross-species comparison. Splice and allelic variants, gene fusion products and modified protein forms are all represented as entities in the ontology. Therefore, PRO provides for the representation of protein entities and a resource for describing the associated data. This makes PRO useful both for proteomics studies where isoforms and modified forms must be differentiated, and for studies of biological pathways, where representations need to take account of the different ways in which the cascade of events may depend on specific protein modifications. PRO provides a framework for the formal representation of protein classes and protein forms in the OBO Foundry. It is designed to enable data retrieval and integration and machine reasoning at the molecular level of proteins, thereby facilitating cross-species comparisons, pathway analysis, disease modeling and the generation of new hypotheses.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Zuniga, Cristal; Li, Chien-Ting; Huelsman, Tyler
The green microalga Chlorella vulgaris has been widely recognized as a promising candidate for biofuel production due to its ability to store high lipid content and its natural metabolic versatility. Compartmentalized genome-scale metabolic models constructed from genome sequences enable quantitative insight into the transport and metabolism of compounds within a target organism. These metabolic models have long been utilized to generate optimized design strategies for an improved production process. Here, we describe the reconstruction, validation, and application of a genome-scale metabolic model for C. vulgaris UTEX 395, iCZ843. The reconstruction represents the most comprehensive model for any eukaryotic photosynthetic organism to date, based on the genome size and number of genes in the reconstruction. The highly curated model accurately predicts phenotypes under photoautotrophic, heterotrophic, and mixotrophic conditions. The model was validated against experimental data and lays the foundation for model-driven strain design and medium alteration to improve yield. Calculated flux distributions under different trophic conditions show that a number of key pathways are affected by nitrogen starvation conditions, including central carbon metabolism and amino acid, nucleotide, and pigment biosynthetic pathways. Moreover, model prediction of growth rates under various medium compositions and subsequent experimental validation showed an increased growth rate with the addition of tryptophan and methionine.
Zuniga, Cristal; Li, Chien-Ting; Huelsman, Tyler; ...
2016-07-02
The green microalga Chlorella vulgaris has been widely recognized as a promising candidate for biofuel production due to its ability to store high lipid content and its natural metabolic versatility. Compartmentalized genome-scale metabolic models constructed from genome sequences enable quantitative insight into the transport and metabolism of compounds within a target organism. These metabolic models have long been utilized to generate optimized design strategies for an improved production process. Here, we describe the reconstruction, validation, and application of a genome-scale metabolic model for C. vulgaris UTEX 395, iCZ843. The reconstruction represents the most comprehensive model for any eukaryotic photosynthetic organism to date, based on the genome size and number of genes in the reconstruction. The highly curated model accurately predicts phenotypes under photoautotrophic, heterotrophic, and mixotrophic conditions. The model was validated against experimental data and lays the foundation for model-driven strain design and medium alteration to improve yield. Calculated flux distributions under different trophic conditions show that a number of key pathways are affected by nitrogen starvation conditions, including central carbon metabolism and amino acid, nucleotide, and pigment biosynthetic pathways. Moreover, model prediction of growth rates under various medium compositions and subsequent experimental validation showed an increased growth rate with the addition of tryptophan and methionine.
Zuñiga, Cristal; Li, Chien-Ting; Huelsman, Tyler; Levering, Jennifer; Zielinski, Daniel C; McConnell, Brian O; Long, Christopher P; Knoshaug, Eric P; Guarnieri, Michael T; Antoniewicz, Maciek R; Betenbaugh, Michael J; Zengler, Karsten
2016-09-01
The green microalga Chlorella vulgaris has been widely recognized as a promising candidate for biofuel production due to its ability to store high lipid content and its natural metabolic versatility. Compartmentalized genome-scale metabolic models constructed from genome sequences enable quantitative insight into the transport and metabolism of compounds within a target organism. These metabolic models have long been utilized to generate optimized design strategies for an improved production process. Here, we describe the reconstruction, validation, and application of a genome-scale metabolic model for C. vulgaris UTEX 395, iCZ843. The reconstruction represents the most comprehensive model for any eukaryotic photosynthetic organism to date, based on the genome size and number of genes in the reconstruction. The highly curated model accurately predicts phenotypes under photoautotrophic, heterotrophic, and mixotrophic conditions. The model was validated against experimental data and lays the foundation for model-driven strain design and medium alteration to improve yield. Calculated flux distributions under different trophic conditions show that a number of key pathways are affected by nitrogen starvation conditions, including central carbon metabolism and amino acid, nucleotide, and pigment biosynthetic pathways. Furthermore, model prediction of growth rates under various medium compositions and subsequent experimental validation showed an increased growth rate with the addition of tryptophan and methionine. © 2016 American Society of Plant Biologists. All rights reserved.
Zuñiga, Cristal; Li, Chien-Ting; Zielinski, Daniel C.; Guarnieri, Michael T.; Antoniewicz, Maciek R.; Zengler, Karsten
2016-01-01
The green microalga Chlorella vulgaris has been widely recognized as a promising candidate for biofuel production due to its ability to store high lipid content and its natural metabolic versatility. Compartmentalized genome-scale metabolic models constructed from genome sequences enable quantitative insight into the transport and metabolism of compounds within a target organism. These metabolic models have long been utilized to generate optimized design strategies for an improved production process. Here, we describe the reconstruction, validation, and application of a genome-scale metabolic model for C. vulgaris UTEX 395, iCZ843. The reconstruction represents the most comprehensive model for any eukaryotic photosynthetic organism to date, based on the genome size and number of genes in the reconstruction. The highly curated model accurately predicts phenotypes under photoautotrophic, heterotrophic, and mixotrophic conditions. The model was validated against experimental data and lays the foundation for model-driven strain design and medium alteration to improve yield. Calculated flux distributions under different trophic conditions show that a number of key pathways are affected by nitrogen starvation conditions, including central carbon metabolism and amino acid, nucleotide, and pigment biosynthetic pathways. Furthermore, model prediction of growth rates under various medium compositions and subsequent experimental validation showed an increased growth rate with the addition of tryptophan and methionine. PMID:27372244
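A minimal sketch of the kind of flux balance analysis used to predict phenotypes from such a reconstruction is shown below, using COBRApy. The SBML file name and the exchange-reaction identifiers are assumptions for illustration, not identifiers taken from the iCZ843 model.

# Hedged sketch: flux balance analysis of a genome-scale model with COBRApy.
# The SBML file name ("iCZ843.xml") and the exchange-reaction IDs are placeholders.
import cobra

# Load the reconstruction (file name is a placeholder)
model = cobra.io.read_sbml_model("iCZ843.xml")

# Mimic a shift in trophic condition by editing uptake bounds in the medium.
# The exchange-reaction IDs below are assumptions; adjust to the model's own IDs.
medium = dict(model.medium)
medium.pop("EX_glc__D_e", None)          # close glucose uptake if it is open
if "EX_co2_e" in medium:
    medium["EX_co2_e"] = 10.0            # allow more CO2 uptake
model.medium = medium

solution = model.optimize()              # FBA: maximise the model's growth objective
print("predicted growth rate:", solution.objective_value)
print(solution.fluxes.sort_values().head())   # most negative (uptake) fluxes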
2013-01-01
Background Molecular biology knowledge can be formalized and systematically represented in a computer-readable form as a comprehensive map of molecular interactions. There exist an increasing number of maps of molecular interactions containing detailed and step-wise description of various cell mechanisms. It is difficult to explore these large maps, to organize discussion of their content and to maintain them. Several efforts were recently made to combine these capabilities together in one environment, and NaviCell is one of them. Results NaviCell is a web-based environment for exploiting large maps of molecular interactions, created in CellDesigner, allowing their easy exploration, curation and maintenance. It is characterized by a combination of three essential features: (1) efficient map browsing based on Google Maps; (2) semantic zooming for viewing different levels of details or of abstraction of the map and (3) integrated web-based blog for collecting community feedback. NaviCell can be easily used by experts in the field of molecular biology for studying molecular entities of interest in the context of signaling pathways and crosstalk between pathways within a global signaling network. NaviCell allows both exploration of detailed molecular mechanisms represented on the map and a more abstract view of the map up to a top-level modular representation. NaviCell greatly facilitates curation, maintenance and updating the comprehensive maps of molecular interactions in an interactive and user-friendly fashion due to an imbedded blogging system. Conclusions NaviCell provides user-friendly exploration of large-scale maps of molecular interactions, thanks to Google Maps and WordPress interfaces, with which many users are already familiar. Semantic zooming which is used for navigating geographical maps is adopted for molecular maps in NaviCell, making any level of visualization readable. In addition, NaviCell provides a framework for community-based curation of maps. PMID:24099179
Kuperstein, Inna; Cohen, David P A; Pook, Stuart; Viara, Eric; Calzone, Laurence; Barillot, Emmanuel; Zinovyev, Andrei
2013-10-07
Molecular biology knowledge can be formalized and systematically represented in a computer-readable form as a comprehensive map of molecular interactions. There exist an increasing number of maps of molecular interactions containing detailed and step-wise description of various cell mechanisms. It is difficult to explore these large maps, to organize discussion of their content and to maintain them. Several efforts were recently made to combine these capabilities together in one environment, and NaviCell is one of them. NaviCell is a web-based environment for exploiting large maps of molecular interactions, created in CellDesigner, allowing their easy exploration, curation and maintenance. It is characterized by a combination of three essential features: (1) efficient map browsing based on Google Maps; (2) semantic zooming for viewing different levels of details or of abstraction of the map and (3) integrated web-based blog for collecting community feedback. NaviCell can be easily used by experts in the field of molecular biology for studying molecular entities of interest in the context of signaling pathways and crosstalk between pathways within a global signaling network. NaviCell allows both exploration of detailed molecular mechanisms represented on the map and a more abstract view of the map up to a top-level modular representation. NaviCell greatly facilitates curation, maintenance and updating the comprehensive maps of molecular interactions in an interactive and user-friendly fashion due to an imbedded blogging system. NaviCell provides user-friendly exploration of large-scale maps of molecular interactions, thanks to Google Maps and WordPress interfaces, with which many users are already familiar. Semantic zooming which is used for navigating geographical maps is adopted for molecular maps in NaviCell, making any level of visualization readable. In addition, NaviCell provides a framework for community-based curation of maps.
Schlatzer, Daniela M.; Dazard, Jean-Eudes; Ewing, Rob M.; Ilchenko, Serguei; Tomcheko, Sara E.; Eid, Saada; Ho, Vincent; Yanik, Greg; Chance, Mark R.; Cooke, Kenneth R.
2012-01-01
Allogeneic hematopoietic stem cell transplantation (SCT) is the only curative therapy for many malignant and nonmalignant conditions. Idiopathic pneumonia syndrome (IPS) is a frequently fatal complication that limits successful outcomes. Preclinical models suggest that IPS represents an immune-mediated attack on the lung involving elements of both the adaptive and the innate immune system. However, the etiology of IPS in humans is less well understood. To explore the disease pathway and uncover potential biomarkers of disease, we performed two separate label-free proteomics experiments defining the plasma protein profiles of allogeneic SCT patients with IPS. Samples obtained from SCT recipients without complications served as controls. The initial discovery study, intended to explore the disease pathway in humans, identified a set of 81 IPS-associated proteins. These data revealed similarities between the known IPS pathways in mice and the condition in humans, in particular in the acute phase response. In addition, pattern recognition pathways were judged to be significant as a function of development of IPS, and from this pathway we chose the lipopolysaccharide-binding protein (LBP) as a candidate molecular diagnostic for IPS, and verified its increase as a function of disease using an ELISA assay. In a separately designed study, we identified protein-based classifiers that could predict, at day 0 of SCT, patients who: 1) progress to IPS and 2) respond to cytokine neutralization therapy. Using cross-validation strategies, we built highly predictive classifier models of both disease progression and therapeutic response. In sum, data generated in this report confirm previous clinical and experimental findings, provide new insights into the pathophysiology of IPS, identify potential molecular classifiers of the condition, and uncover a set of markers potentially of interest for patient stratification as a basis for individualized therapy. PMID:22337588
BioModels: expanding horizons to include more modelling approaches and formats
Nguyen, Tung V N; Graesslin, Martin; Hälke, Robert; Ali, Raza; Schramm, Jochen; Wimalaratne, Sarala M; Kothamachu, Varun B; Rodriguez, Nicolas; Swat, Maciej J; Eils, Jurgen; Eils, Roland; Laibe, Camille; Chelliah, Vijayalakshmi
2018-01-01
Abstract BioModels serves as a central repository of mathematical models representing biological processes. It offers a platform to make mathematical models easily shareable across the systems modelling community, thereby supporting model reuse. To facilitate hosting a broader range of model formats derived from diverse modelling approaches and tools, a new infrastructure for BioModels has been developed that is available at http://www.ebi.ac.uk/biomodels. This new system allows submitting and sharing of a wide range of models with improved support for formats other than SBML. It also offers a version-control backed environment in which authors and curators can work collaboratively to curate models. This article summarises the features available in the current system and discusses the potential benefit they offer to the users over the previous system. In summary, the new portal broadens the scope of models accepted in BioModels and supports collaborative model curation which is crucial for model reproducibility and sharing. PMID:29106614
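A first-pass consistency check of a submitted SBML model, of the sort a curator might run before deeper review, can be sketched with python-libsbml as below. This is only an assumed illustration of basic validation, not BioModels' actual curation pipeline, and the file name is a placeholder.

# Hedged sketch: a basic SBML consistency check of the kind a curator might run.
# Not BioModels' own pipeline; "candidate_model.xml" is a placeholder file name.
import libsbml

doc = libsbml.readSBMLFromFile("candidate_model.xml")

# Report parsing errors first, then run libsbml's internal consistency checks
n_read_errors = doc.getNumErrors()
n_consistency_issues = doc.checkConsistency()
print(f"read errors: {n_read_errors}, consistency issues: {n_consistency_issues}")

for i in range(doc.getNumErrors()):
    err = doc.getError(i)
    print(err.getSeverityAsString(), err.getMessage())

model = doc.getModel()
if model is not None:
    print("species:", model.getNumSpecies(), "reactions:", model.getNumReactions())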
Álvarez-Yela, Astrid Catalina; Gómez-Cano, Fabio; Zambrano, María Mercedes; Husserl, Johana; Danies, Giovanna; Restrepo, Silvia; González-Barrios, Andrés Fernando
2017-01-01
Soil microbial communities are responsible for a wide range of ecological processes and have an important economic impact in agriculture. Determining the metabolic processes performed by microbial communities is crucial for understanding and managing ecosystem properties. Metagenomic approaches allow the elucidation of the main metabolic processes that determine the performance of microbial communities under different environmental conditions and perturbations. Here we present the first compartmentalized metabolic reconstruction at a metagenomics scale of a microbial ecosystem. This systematic approach conceives a meta-organism without boundaries between individual organisms and allows the in silico evaluation of the effect of agricultural intervention on soils at a metagenomics level. To characterize the microbial ecosystems, topological properties, taxonomic and metabolic profiles, as well as a Flux Balance Analysis (FBA) were considered. Furthermore, topological and optimization algorithms were implemented to carry out the curation of the models, to ensure the continuity of the fluxes between the metabolic pathways, and to confirm the metabolite exchange between subcellular compartments. The proposed models provide specific information about ecosystems that is generally overlooked in non-compartmentalized or non-curated networks, like the influence of transport reactions in the metabolic processes, especially the important effect on mitochondrial processes, as well as provide more accurate results of the fluxes used to optimize the metabolic processes within the microbial community. PMID:28767679
[Care pathway of patients with hepatocellular carcinoma in France: State of play in 2017].
Costentin, Charlotte; Ganne-Carrié, Nathalie; Rousseau, Benoit; Gérolami, René; Barbare, Jean-Claude
2017-09-01
Hepatocellular carcinoma is a major public health problem with one of the highest overall mortality compared to other cancers. The median overall survival in France in a hospital population with hepatocellular carcinoma is 9.4 months. Several publications reported a positive impact of hepatocellular carcinoma screening on diagnosis at an early-stage, eligibility for curative treatment and overall survival. However, the identification of patients to be included in a hepatocellular carcinoma screening program and the application of screening recommendations are not optimal. Other studies suggest a potentially negative impact of delayed diagnosis or treatment initiation on the patient's prognosis. Finally, marked variations between French regions and departments have been described in terms of access to curative treatment and overall survival. In this review article, we propose a state of play of the hepatocellular carcinoma patient's care pathway in France with the aim of identifying potential breaking points with negative impact on prognosis and of developing proposals for improvement. Copyright © 2017 Société Française du Cancer. Published by Elsevier Masson SAS. All rights reserved.
Biocuration workflows and text mining: overview of the BioCreative 2012 Workshop Track II.
Lu, Zhiyong; Hirschman, Lynette
2012-01-01
Manual curation of data from the biomedical literature is a rate-limiting factor for many expert curated databases. Despite the continuing advances in biomedical text mining and the pressing needs of biocurators for better tools, few existing text-mining tools have been successfully integrated into production literature curation systems such as those used by the expert curated databases. To close this gap and better understand all aspects of literature curation, we invited submissions of written descriptions of curation workflows from expert curated databases for the BioCreative 2012 Workshop Track II. We received seven qualified contributions, primarily from model organism databases. Based on these descriptions, we identified commonalities and differences across the workflows, the common ontologies and controlled vocabularies used and the current and desired uses of text mining for biocuration. Compared to a survey done in 2009, our 2012 results show that many more databases are now using text mining in parts of their curation workflows. In addition, the workshop participants identified text-mining aids for finding gene names and symbols (gene indexing), prioritization of documents for curation (document triage) and ontology concept assignment as those most desired by the biocurators. DATABASE URL: http://www.biocreative.org/tasks/bc-workshop-2012/workflow/.
The exploration of contrasting pathways in Triple Negative Breast Cancer (TNBC).
Narrandes, Shavira; Huang, Shujun; Murphy, Leigh; Xu, Wayne
2018-01-04
Triple Negative Breast Cancers (TNBCs) lack the appropriate targets for currently used breast cancer therapies, conferring an aggressive phenotype, more frequent relapse and poorer survival rates. The biological heterogeneity of TNBC complicates the clinical treatment further. We have explored and compared the biological pathways in TNBC and other subtypes of breast cancers, using an in silico approach and the hypothesis that two opposing effects (Yin and Yang) pathways in cancer cells determine the fate of cancer cells. Identifying breast subgroup-specific components of these opposing pathways may aid in selecting potential therapeutic targets as well as further classifying the heterogeneous TNBC subtype. Gene expression and patient clinical data from The Cancer Genome Atlas (TCGA) and the Molecular Taxonomy of Breast Cancer International Consortium (METABRIC) were used for this study. Gene Set Enrichment Analysis (GSEA) was used to identify the more active pathways in cancer (Yin) than in normal and the more active pathways in normal (Yang) than in cancer. The clustering analysis was performed to compare pathways of TNBC with other types of breast cancers. The association of pathway-classified TNBC sub-groups to clinical outcomes was tested using a Cox regression model. Among 4729 curated canonical pathways in the GSEA database, 133 Yin pathways (FDR < 0.05) and 71 Yang pathways (p-value < 0.05) were discovered in TNBC. FOXM1 is the top Yin pathway, while PPARα is the top Yang pathway in TNBC. The TNBC and other types of breast cancers showed different pathway enrichment significance profiles. Using top Yin and Yang pathways as classifiers, the TNBC can be further subtyped into six sub-groups each having different clinical outcomes. We first reported that the FOXM1 pathway is the most upregulated and the PPARα pathway is the most downregulated pathway in TNBC. These two pathways could be simultaneously targeted in further studies. Also, the pathway classification we performed in this study provided insight into the TNBC heterogeneity.
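The final step of relating pathway-defined sub-groups to clinical outcome with a Cox model can be sketched as follows, using the lifelines package on a synthetic data frame. The covariate names and data are placeholders standing in for per-patient pathway scores and survival outcomes; this is not the authors' exact analysis.

# Hedged sketch: test whether pathway scores are associated with survival using a
# Cox proportional-hazards model (lifelines). The data frame is synthetic and the
# column names are invented stand-ins for per-patient pathway scores and outcomes.
import numpy as np
import pandas as pd
from lifelines import CoxPHFitter

rng = np.random.default_rng(1)
n = 200
df = pd.DataFrame({
    "foxm1_score": rng.normal(size=n),              # stand-in "Yin" pathway score
    "ppara_score": rng.normal(size=n),              # stand-in "Yang" pathway score
    "time": rng.exponential(scale=60, size=n),      # months to event or censoring
    "event": rng.integers(0, 2, size=n),            # 1 = death observed
})

cph = CoxPHFitter()
cph.fit(df, duration_col="time", event_col="event")
cph.print_summary()                                  # hazard ratios per covariate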
Role of the NFκB-signaling pathway in cancer
Zhou, Yujuan; Lin, Jingguan; Wang, Heran; Oyang, Linda; Tian, Yutong; Liu, Lu; Su, Min; Wang, Hui; Cao, Deliang; Liao, Qianjin
2018-01-01
Cancer is a group of cells that malignantly grow and proliferate uncontrollably. At present, treatment modes for cancer mainly comprise surgery, chemotherapy, radiotherapy, molecularly targeted therapy, gene therapy, and immunotherapy. However, the curative effects of these treatments have been limited thus far by specific characteristics of tumors. Abnormal activation of signaling pathways is involved in tumor pathogenesis and plays critical roles in growth, progression, and relapse of cancers. Targeted therapies against effectors in oncogenic signaling have improved the outcomes of cancer patients. NFκB is an important signaling pathway involved in pathogenesis and treatment of cancers. Excessive activation of the NFκB-signaling pathway has been documented in various tumor tissues, and studies on this signaling pathway for targeted cancer therapy have become a hot topic. In this review, we update current understanding of the NFκB-signaling pathway in cancer. PMID:29695914
The Hippo pathway in hepatocellular carcinoma: Non-coding RNAs in action.
Shi, Xuan; Zhu, Hai-Rong; Liu, Tao-Tao; Shen, Xi-Zhong; Zhu, Ji-Min
2017-08-01
Hepatocellular carcinoma (HCC) is the sixth most common cancer and the third leading cause of cancer-related death worldwide. However, current strategies curing HCC are far from satisfaction. The Hippo pathway is an evolutionarily conserved tumor suppressive pathway that plays crucial roles in organ size control and tissue homeostasis. Its dysregulation is commonly observed in various types of cancer including HCC. Recently, the prominent role of non-coding RNAs in the Hippo pathway during normal development and neoplastic progression is also emerging in liver. Thus, further investigation into the regulatory network between non-coding RNAs and the Hippo pathway and their connections with HCC may provide new therapeutic avenues towards developing an effective preventative or perhaps curative treatment for HCC. Herein we summarize the role of non-coding RNAs in the Hippo pathway, with an emphasis on their contribution to carcinogenesis, diagnosis, treatment and prognosis of HCC. Copyright © 2017 Elsevier B.V. All rights reserved.
Directly e-mailing authors of newly published papers encourages community curation
Bunt, Stephanie M.; Grumbling, Gary B.; Field, Helen I.; Marygold, Steven J.; Brown, Nicholas H.; Millburn, Gillian H.
2012-01-01
Much of the data within Model Organism Databases (MODs) comes from manual curation of the primary research literature. Given limited funding and an increasing density of published material, a significant challenge facing all MODs is how to efficiently and effectively prioritize the most relevant research papers for detailed curation. Here, we report recent improvements to the triaging process used by FlyBase. We describe an automated method to directly e-mail corresponding authors of new papers, requesting that they list the genes studied and indicate (‘flag’) the types of data described in the paper using an online tool. Based on the author-assigned flags, papers are then prioritized for detailed curation and channelled to appropriate curator teams for full data extraction. The overall response rate has been 44% and the flagging of data types by authors is sufficiently accurate for effective prioritization of papers. In summary, we have established a sustainable community curation program, with the result that FlyBase curators now spend less time triaging and can devote more effort to the specialized task of detailed data extraction. Database URL: http://flybase.org/ PMID:22554788
Benson, Helen E; Sharman, Joanna L; Mpamhanga, Chido P; Parton, Andrew; Southan, Christopher; Harmar, Anthony J; Ghazal, Peter
2017-01-01
Background and Purpose An ever‐growing wealth of information on current drugs and their pharmacological effects is available from online databases. As our understanding of systems biology increases, we have the opportunity to predict, model and quantify how drug combinations can be introduced that outperform conventional single‐drug therapies. Here, we explore the feasibility of such systems pharmacology approaches with an analysis of the mevalonate branch of the cholesterol biosynthesis pathway. Experimental Approach Using open online resources, we assembled a computational model of the mevalonate pathway and compiled a set of inhibitors directed against targets in this pathway. We used computational optimization to identify combination and dose options that show not only maximal efficacy of inhibition on the cholesterol producing branch but also minimal impact on the geranylation branch, known to mediate the side effects of pharmaceutical treatment. Key Results We describe serious impediments to systems pharmacology studies arising from limitations in the data, incomplete coverage and inconsistent reporting. By curating a more complete dataset, we demonstrate the utility of computational optimization for identifying multi‐drug treatments with high efficacy and minimal off‐target effects. Conclusion and Implications We suggest solutions that facilitate systems pharmacology studies, based on the introduction of standards for data capture that increase the power of experimental data. We propose a systems pharmacology workflow for the refinement of data and the generation of future therapeutic hypotheses. PMID:28910500
2014-01-01
Background Uncovering the complex transcriptional regulatory networks (TRNs) that underlie plant and animal development remains a challenge. However, a vast amount of data from public microarray experiments is available, which can be subject to inference algorithms in order to recover reliable TRN architectures. Results In this study we present a simple bioinformatics methodology that uses public, carefully curated microarray data and the mutual information algorithm ARACNe in order to obtain a database of transcriptional interactions. We used data from Arabidopsis thaliana root samples to show that the transcriptional regulatory networks derived from this database successfully recover previously identified root transcriptional modules and to propose new transcription factors for the SHORT ROOT/SCARECROW and PLETHORA pathways. We further show that these networks are a powerful tool to integrate and analyze high-throughput expression data, as exemplified by our analysis of a SHORT ROOT induction time-course microarray dataset, and are a reliable source for the prediction of novel root gene functions. In particular, we used our database to predict novel genes involved in root secondary cell-wall synthesis and identified the MADS-box TF XAL1/AGL12 as an unexpected participant in this process. Conclusions This study demonstrates that network inference using carefully curated microarray data yields reliable TRN architectures. In contrast to previous efforts to obtain root TRNs, that have focused on particular functional modules or tissues, our root transcriptional interactions provide an overview of the transcriptional pathways present in Arabidopsis thaliana roots and will likely yield a plethora of novel hypotheses to be tested experimentally. PMID:24739361
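The core mutual-information step behind ARACNe-style inference can be sketched in a few lines: discretise expression, score all gene pairs, and keep the strongest. The sketch below uses random data and omits ARACNe's data-processing-inequality pruning and permutation-based thresholding, so it illustrates the idea rather than the actual algorithm; the gene names are illustrative only.

# Hedged sketch of the mutual-information step behind ARACNe-style inference:
# discretise expression, score all gene pairs, keep the strongest.
# Omits ARACNe's DPI pruning and permutation thresholds; data below are random.
import numpy as np
from sklearn.metrics import mutual_info_score

rng = np.random.default_rng(0)
genes = ["SHR", "SCR", "PLT1", "XAL1", "G5"]
expr = rng.normal(size=(len(genes), 200))        # genes x samples (toy data)
expr[1] = expr[0] + 0.3 * rng.normal(size=200)   # make "SCR" track "SHR"

def discretise(x, bins=10):
    return np.digitize(x, np.histogram_bin_edges(x, bins=bins))

disc = np.array([discretise(row) for row in expr])

edges = []
for i in range(len(genes)):
    for j in range(i + 1, len(genes)):
        mi = mutual_info_score(disc[i], disc[j])
        edges.append((genes[i], genes[j], mi))

for u, v, mi in sorted(edges, key=lambda e: -e[2])[:3]:
    print(f"{u} -- {v}: MI = {mi:.3f}")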
Ye, Chao; Xu, Nan; Dong, Chuan; Ye, Yuannong; Zou, Xuan; Chen, Xiulai; Guo, Fengbiao; Liu, Liming
2017-04-07
Genome-scale metabolic models (GSMMs) constitute a platform that combines genome sequences and detailed biochemical information to quantify microbial physiology at the system level. To improve the unity, integrity, correctness, and format of data in published GSMMs, a consensus IMGMD database was built in the LAMP (Linux + Apache + MySQL + PHP) system by integrating and standardizing 328 GSMMs constructed for 139 microorganisms. The IMGMD database can help microbial researchers download manually curated GSMMs, rapidly reconstruct standard GSMMs, design pathways, and identify metabolic targets for strategies on strain improvement. Moreover, the IMGMD database facilitates the integration of wet-lab and in silico data to gain an additional insight into microbial physiology. The IMGMD database is freely available, without any registration requirements, at http://imgmd.jiangnan.edu.cn/database.
DDRprot: a database of DNA damage response-related proteins.
Andrés-León, Eduardo; Cases, Ildefonso; Arcas, Aida; Rojas, Ana M
2016-01-01
The DNA Damage Response (DDR) signalling network is an essential system that protects the genome's integrity. The DDRprot database presented here is a resource that integrates manually curated information on the human DDR network and its sub-pathways. For each particular DDR protein, we present detailed information about its function. If involved in post-translational modifications (PTMs) with each other, we depict the position of the modified residue(s) in the three-dimensional structures, when resolved structures are available for the proteins. All this information is linked to the original publication from where it was obtained. Phylogenetic information is also shown, including time of emergence and conservation across 47 selected species, family trees and sequence alignments of homologues. The DDRprot database can be queried by different criteria: pathways, species, evolutionary age or involvement in PTMs. Sequence searches using hidden Markov models can also be used. Database URL: http://ddr.cbbio.es. © The Author(s) 2016. Published by Oxford University Press.
Adenosine signaling promotes regeneration of pancreatic β-cells in vivo
Andersson, Olov; Adams, Bruce A.; Yoo, Daniel; Ellis, Gregory C.; Gut, Philipp; Anderson, Ryan M.; German, Michael S.; Stainier, Didier Y. R.
2012-01-01
Diabetes can be controlled with insulin injections, but a curative approach that restores the number of insulin-producing β-cells is still needed. Using a zebrafish model of diabetes, we screened ~7000 small molecules to identify enhancers of β-cell regeneration. The compounds we identified converge on the adenosine signaling pathway and include exogenous agonists and compounds that inhibit degradation of endogenously produced adenosine. The most potent enhancer of β-cell regeneration was the adenosine agonist 5′-N-Ethylcarboxamidoadenosine (NECA), which acting through the adenosine receptor A2aa increased β-cell proliferation and accelerated restoration of normoglycemia in zebrafish. Despite markedly stimulating β-cell proliferation during regeneration, NECA had only a modest effect during development. The proliferative and glucose-lowering effect of NECA was confirmed in diabetic mice, suggesting an evolutionarily conserved role for adenosine in β-cell regeneration. With this whole-organism screen, we identified components of the adenosine pathway that could be therapeutically targeted for the treatment of diabetes. PMID:22608007
Silbergeld, Ellen K.; Contreras, Elizabeth Q.; Hartung, Thomas; Hirsch, Cordula; Hogberg, Helena; Jachak, Ashish C.; Jordan, William; Landsiedel, Robert; Morris, Jeffery; Patri, Anil; Pounds, Joel G.; de Vizcaya Ruiz, Andrea; Shvedova, Anna; Tanguay, Robert; Tatarazako, Norihasa; van Vliet, Erwin; Walker, Nigel J.; Wiesner, Mark; Wilcox, Neil; Zurlo, Joanne
2014-01-01
Summary In October 2010, a group of experts met as part of the transatlantic think tank for toxicology (t4) to exchange ideas about the current status and future of safety testing of nanomaterials. At present, there is no widely accepted path forward to assure appropriate and effective hazard identification for engineered nanomaterials. The group discussed needs for characterization of nanomaterials and identified testing protocols that incorporate the use of innovative alternative whole models such as zebrafish or C. elegans, as well as in vitro or alternative methods to examine specific functional pathways and modes of action. The group proposed elements of a potential testing scheme for nanomaterials that works towards an integrated testing strategy, incorporating the goals of the NRC report Toxicity Testing in the 21st Century: A Vision and a Strategy by focusing on pathways of toxic response, and utilizing an evidence-based strategy for developing the knowledge base for safety assessment. Finally, the group recommended that a reliable, open, curated database be developed that interfaces with existing databases to enable sharing of information. PMID:21993959
The art and science of data curation: Lessons learned from constructing a virtual collection
NASA Astrophysics Data System (ADS)
Bugbee, Kaylin; Ramachandran, Rahul; Maskey, Manil; Gatlin, Patrick
2018-03-01
A digital, or virtual, collection is a value-added service developed by libraries that curates information and resources around a topic, theme or organization. Adoption of the virtual collection concept as an Earth science data service improves the discoverability, accessibility and usability of data not only within individual data centers but also across data centers and disciplines. In this paper, we introduce a methodology for systematically and rigorously curating Earth science data and information into a cohesive virtual collection. This methodology builds on the geocuration model of searching, selecting and synthesizing Earth science data, metadata and other information into a single and useful collection. We present our experiences curating a virtual collection for one of NASA's twelve Distributed Active Archive Centers (DAACs), the Global Hydrology Resource Center (GHRC), and describe lessons learned as a result of this curation effort. We also provide recommendations and best practices for data centers and data providers who wish to curate virtual collections for the Earth sciences.
Dissecting Germ Cell Metabolism through Network Modeling.
Whitmore, Leanne S; Ye, Ping
2015-01-01
Metabolic pathways are increasingly postulated to be vital in programming cell fate, including stemness, differentiation, proliferation, and apoptosis. The commitment to meiosis is a critical fate decision for mammalian germ cells, and requires a metabolic derivative of vitamin A, retinoic acid (RA). Recent evidence showed that a pulse of RA is generated in the testis of male mice thereby triggering meiotic commitment. However, enzymes and reactions that regulate this RA pulse have yet to be identified. We developed a mouse germ cell-specific metabolic network with a curated vitamin A pathway. Using this network, we implemented flux balance analysis throughout the initial wave of spermatogenesis to elucidate important reactions and enzymes for the generation and degradation of RA. Our results indicate that primary RA sources in the germ cell include RA import from the extracellular region, release of RA from binding proteins, and metabolism of retinal to RA. Further, in silico knockouts of genes and reactions in the vitamin A pathway predict that deletion of Lipe, hormone-sensitive lipase, disrupts the RA pulse thereby causing spermatogenic defects. Examination of other metabolic pathways reveals that the citric acid cycle is the most active pathway. In addition, we discover that fatty acid synthesis/oxidation are the primary energy sources in the germ cell. In summary, this study predicts enzymes, reactions, and pathways important for germ cell commitment to meiosis. These findings enhance our understanding of the metabolic control of germ cell differentiation and will help guide future experiments to improve reproductive health.
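The in silico knockout analysis mentioned above can be sketched with COBRApy's deletion utilities, as below. The model file name and the gene identifier are placeholders; whether a given gene symbol maps directly onto a model gene ID is an assumption for illustration.

# Hedged sketch: in silico single-gene knockouts on a metabolic model with COBRApy,
# of the kind used to predict that deleting Lipe disrupts the RA pulse.
# The model file name and gene identifiers are placeholders.
import cobra
from cobra.flux_analysis import single_gene_deletion

model = cobra.io.read_sbml_model("germ_cell_model.xml")

# Knock out every gene one at a time and compare the predicted objective value
results = single_gene_deletion(model)
print(results.sort_values("growth").head())

# Knock out one gene of interest (ID is an assumption) and re-optimise
gene_id = "Lipe"
if model.genes.has_id(gene_id):
    with model:                       # changes are reverted on exiting the block
        model.genes.get_by_id(gene_id).knock_out()
        print(gene_id, "KO objective:", model.optimize().objective_value)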
Simon, Frank; Bockhorn, Maximilian; Praha, Christian; Baba, Hideo A; Broelsch, Christoph E; Frilling, Andrea; Weber, Frank
2010-04-01
The aim of this study was to elucidate the role of HIF1A expression in hepatocellular carcinoma (HCC) and the corresponding non-malignant liver tissue and to correlate it with the clinical outcome of HCC patients after curative liver resection. HIF1A expression was determined by quantitative RT-PCR in HCC and corresponding non-malignant liver tissue of 53 patients surgically treated for HCC. High-density gene expression analysis and pathway analysis were performed on a selected subset of patients with high and low HIF1A expression in the non-malignant liver tissue. HIF1A over-expression in the apparently non-malignant liver tissue was a predictor of tumor recurrence and survival. The estimated 1-year and 5-year disease-free survival rates were significantly better in patients with low HIF1A expression in the non-malignant liver tissue when compared to those patients with high HIF1A expression (88.9% vs. 67.9% and 61.0% vs. 22.6%, respectively, p = 0.008). Based on molecular pathway analysis utilizing high-density gene-expression profiling, HIF1A-related molecular networks were identified that contained genes involved in cell migration, cell homing, and cell-cell interaction. Our study identified a potential novel mechanism contributing to prognosis of HCC. The deregulation of HIF1A and its related pathways in the apparently non-malignant liver tissue provides for a modulated environment that potentially enhances or allows for HCC recurrence after curative resection.
With the increasing need to leverage data and models to perform cutting edge analyses within the environmental science community, collection and organization of that data into a readily accessible format for consumption is a pressing need. The EPA CompTox chemical dashboard is i...
Lemaître, Nadine; Liang, Xiaofei; Najeeb, Javaria; Lee, Chul-Jin; Titecat, Marie; Leteurtre, Emmanuelle; Simonet, Michel; Toone, Eric J.
2017-01-01
ABSTRACT The infectious diseases caused by multidrug-resistant bacteria pose serious threats to humankind. It has been suggested that an antibiotic targeting LpxC of the lipid A biosynthetic pathway in Gram-negative bacteria is a promising strategy for curing Gram-negative bacterial infections. However, experimental proof of this concept is lacking. Here, we describe our discovery and characterization of a biphenylacetylene-based inhibitor of LpxC, an essential enzyme in the biosynthesis of the lipid A component of the outer membrane of Gram-negative bacteria. The compound LPC-069 has no known adverse effects in mice and is effective in vitro against a broad panel of Gram-negative clinical isolates, including several multiresistant and extremely drug-resistant strains involved in nosocomial infections. Furthermore, LPC-069 is curative in a murine model of one of the most severe human diseases, bubonic plague, which is caused by the Gram-negative bacterium Yersinia pestis. Our results demonstrate the safety and efficacy of LpxC inhibitors as a new class of antibiotic against fatal infections caused by extremely virulent pathogens. The present findings also highlight the potential of LpxC inhibitors for clinical development as therapeutics for infections caused by multidrug-resistant bacteria. PMID:28743813
Gama-Castro, Socorro; Salgado, Heladia; Santos-Zavaleta, Alberto; Ledezma-Tejeida, Daniela; Muñiz-Rascado, Luis; García-Sotelo, Jair Santiago; Alquicira-Hernández, Kevin; Martínez-Flores, Irma; Pannier, Lucia; Castro-Mondragón, Jaime Abraham; Medina-Rivera, Alejandra; Solano-Lira, Hilda; Bonavides-Martínez, César; Pérez-Rueda, Ernesto; Alquicira-Hernández, Shirley; Porrón-Sotelo, Liliana; López-Fuentes, Alejandra; Hernández-Koutoucheva, Anastasia; Moral-Chávez, Víctor Del; Rinaldi, Fabio; Collado-Vides, Julio
2016-01-01
RegulonDB (http://regulondb.ccg.unam.mx) is one of the most useful and important resources on bacterial gene regulation, as it integrates the scattered scientific knowledge of the best-characterized organism, Escherichia coli K-12, in a database that organizes large amounts of data. Its electronic format enables researchers to compare their results with the legacy of previous knowledge and supports bioinformatics tools and model building. Here, we summarize our progress with RegulonDB since our last Nucleic Acids Research publication describing RegulonDB, in 2013. In addition to keeping curation up to date, we report a collection of 232 interactions with small RNAs affecting 192 genes, and the complete repertoire of 189 Elementary Genetic Sensory-Response units (GENSOR units), integrating the signal, regulatory interactions, and metabolic pathways they govern. These additions represent major progress toward a higher-level understanding of regulated processes. We have updated the computationally predicted transcription factors, which total 304 (184 with experimental evidence and 120 from computational predictions); we updated our position-weight matrices and have included tools for clustering them in evolutionary families. We describe our semiautomatic strategy to accelerate curation, including datasets from high-throughput experiments, a novel coexpression distance to search for ‘neighborhood’ genes to known operons and regulons, and computational developments. PMID:26527724
MaizeGDB: New tools and resource
USDA-ARS?s Scientific Manuscript database
MaizeGDB, the USDA-ARS genetics and genomics database, is a highly curated, community-oriented informatics service to researchers focused on the crop plant and model organism Zea mays. MaizeGDB facilitates maize research by curating, integrating, and maintaining a database that serves as the central...
Distilling Design Patterns From Agile Curation Case Studies
NASA Astrophysics Data System (ADS)
Benedict, K. K.; Lenhardt, W. C.; Young, J. W.
2016-12-01
In previous work the authors have argued that there is a need to take a new look at the data management lifecycle. Our core argument is that the data management lifecycle needs to be, in essence, deconstructed and rebuilt, and that much can be gained from applying ideas, concepts, and principles from agile software development methods. We are not arguing for a rote application of agile software approaches; rather, given various trends related to data and technology, it is imperative to update our thinking about how to approach the data management lifecycle, to recognize differing project scales and the corresponding variations in structure, and to consider alternative models for solving the problems of scientific data curation. In this paper we describe what we term agile curation design patterns, borrowing the concept of design patterns from the software world, and present some initial thoughts on these patterns as informed by a sample of data curation case studies solicited from participants in agile data curation meeting sessions conducted in 2015-16.
Howe, Douglas G.; Bradford, Yvonne M.; Eagle, Anne; Fashena, David; Frazer, Ken; Kalita, Patrick; Mani, Prita; Martin, Ryan; Moxon, Sierra Taylor; Paddock, Holly; Pich, Christian; Ramachandran, Sridhar; Ruzicka, Leyla; Schaper, Kevin; Shao, Xiang; Singer, Amy; Toro, Sabrina; Van Slyke, Ceri; Westerfield, Monte
2017-01-01
The Zebrafish Model Organism Database (ZFIN; http://zfin.org) is the central resource for zebrafish (Danio rerio) genetic, genomic, phenotypic and developmental data. ZFIN curators provide expert manual curation and integration of comprehensive data involving zebrafish genes, mutants, transgenic constructs and lines, phenotypes, genotypes, gene expression, morpholinos, TALENs, CRISPRs, antibodies, anatomical structures, models of human disease and publications. We integrate curated, directly submitted, and collaboratively generated data, making these data available to the zebrafish research community. Among the vertebrate model organisms, zebrafish are superbly suited for rapid generation of sequence-targeted mutant lines, characterization of phenotypes including gene expression patterns, and generation of human disease models. The recent rapid adoption of zebrafish as human disease models is making management of these data particularly important to both the research and clinical communities. Here, we describe recent enhancements to ZFIN including use of the zebrafish experimental conditions ontology, ‘Fish’ records in the ZFIN database, support for gene expression phenotypes, models of human disease, mutation details at the DNA, RNA and protein levels, and updates to the ZFIN single box search. PMID:27899582
The BioGRID interaction database: 2013 update.
Chatr-Aryamontri, Andrew; Breitkreutz, Bobby-Joe; Heinicke, Sven; Boucher, Lorrie; Winter, Andrew; Stark, Chris; Nixon, Julie; Ramage, Lindsay; Kolas, Nadine; O'Donnell, Lara; Reguly, Teresa; Breitkreutz, Ashton; Sellam, Adnane; Chen, Daici; Chang, Christie; Rust, Jennifer; Livstone, Michael; Oughtred, Rose; Dolinski, Kara; Tyers, Mike
2013-01-01
The Biological General Repository for Interaction Datasets (BioGRID: http://thebiogrid.org) is an open access archive of genetic and protein interactions that are curated from the primary biomedical literature for all major model organism species. As of September 2012, BioGRID houses more than 500 000 manually annotated interactions from more than 30 model organisms. BioGRID maintains complete curation coverage of the literature for the budding yeast Saccharomyces cerevisiae, the fission yeast Schizosaccharomyces pombe and the model plant Arabidopsis thaliana. A number of themed curation projects in areas of biomedical importance are also supported. BioGRID has established collaborations and/or shares data records for the annotation of interactions and phenotypes with most major model organism databases, including Saccharomyces Genome Database, PomBase, WormBase, FlyBase and The Arabidopsis Information Resource. BioGRID also actively engages with the text-mining community to benchmark and deploy automated tools to expedite curation workflows. BioGRID data are freely accessible through both a user-defined interactive interface and in batch downloads in a wide variety of formats, including PSI-MI2.5 and tab-delimited files. BioGRID records can also be interrogated and analyzed with a series of new bioinformatics tools, which include a post-translational modification viewer, a graphical viewer, a REST service and a Cytoscape plugin.
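For readers who work with the batch downloads mentioned above, the sketch below summarizes a BioGRID tab-delimited export with pandas. The file name and column headers are assumptions about the export format and should be checked against the header of the file actually downloaded.

# Sketch: summarizing a BioGRID tab-delimited download with pandas.
# "BIOGRID-ALL.tab2.txt" and the column names below are assumed; adjust them
# to match the header of the file you actually retrieve.
import pandas as pd

df = pd.read_csv("BIOGRID-ALL.tab2.txt", sep="\t", low_memory=False)

# Count physical vs. genetic interactions per organism (assumed column names).
summary = (
    df.groupby(["Organism Interactor A", "Experimental System Type"])
      .size()
      .unstack(fill_value=0)
)
print(summary.head())

# Pull out all interactions involving a gene of interest.
gene = "CDC28"
hits = df[(df["Official Symbol Interactor A"] == gene) |
          (df["Official Symbol Interactor B"] == gene)]
print(len(hits), "interactions involving", gene)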
Mining and integration of pathway diagrams from imaging data.
Kozhenkov, Sergey; Baitaluk, Michael
2012-03-01
Pathway diagrams from PubMed and the World Wide Web (WWW) contain valuable, highly curated information that is difficult to access without tools specifically designed and customized for the biological semantics and high-content density of the images. There is currently no search engine or tool that can analyze pathway images, extract their pathway components (molecules, genes, proteins, organelles, cells, organs, etc.) and indicate their relationships. Here, we describe a resource of pathway diagrams retrieved from article and web-page images through optical character recognition, in conjunction with data mining and data integration methods. The recognized pathways are integrated into the BiologicalNetworks research environment, linking them to a wealth of data available in the BiologicalNetworks' knowledgebase, which integrates data from >100 public data sources and the biomedical literature. Multiple search and analytical tools are available that allow the recognized cellular pathways, molecular networks and cell/tissue/organ diagrams to be studied in the context of integrated knowledge, experimental data and the literature. BiologicalNetworks software and the pathway repository are freely available at www.biologicalnetworks.org. Supplementary data are available at Bioinformatics online.
A comprehensive map of the influenza A virus replication cycle
2013-01-01
Background Influenza is a common infectious disease caused by influenza viruses. Annual epidemics cause severe illnesses, deaths, and economic loss around the world. To better defend against influenza viral infection, it is essential to understand its mechanisms and associated host responses. Many studies have been conducted to elucidate these mechanisms, however, the overall picture remains incompletely understood. A systematic understanding of influenza viral infection in host cells is needed to facilitate the identification of influential host response mechanisms and potential drug targets. Description We constructed a comprehensive map of the influenza A virus (‘IAV’) life cycle (‘FluMap’) by undertaking a literature-based, manual curation approach. Based on information obtained from publicly available pathway databases, updated with literature-based information and input from expert virologists and immunologists, FluMap is currently composed of 960 factors (i.e., proteins, mRNAs etc.) and 456 reactions, and is annotated with ~500 papers and curation comments. In addition to detailing the type of molecular interactions, isolate/strain specific data are also available. The FluMap was built with the pathway editor CellDesigner in standard SBML (Systems Biology Markup Language) format and visualized as an SBGN (Systems Biology Graphical Notation) diagram. It is also available as a web service (online map) based on the iPathways+ system to enable community discussion by influenza researchers. We also demonstrate computational network analyses to identify targets using the FluMap. Conclusion The FluMap is a comprehensive pathway map that can serve as a graphically presented knowledge-base and as a platform to analyze functional interactions between IAV and host factors. Publicly available webtools will allow continuous updating to ensure the most reliable representation of the host-virus interaction network. The FluMap is available at http://www.influenza-x.org/flumap/. PMID:24088197
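Because FluMap is distributed in SBML, it can be inspected programmatically. The sketch below uses python-libsbml to load an SBML file and list its reactions; "flumap.xml" is a placeholder file name and the script is a generic reader, not part of the FluMap project.

# Sketch: inspecting an SBML pathway map (such as a CellDesigner export) with python-libsbml.
import libsbml

doc = libsbml.readSBMLFromFile("flumap.xml")  # placeholder path
if doc.getNumErrors() > 0:
    doc.printErrors()

model = doc.getModel()
print("species:  ", model.getNumSpecies())
print("reactions:", model.getNumReactions())

# List the reactants and products of each reaction.
for reaction in model.getListOfReactions():
    reactants = [s.getSpecies() for s in reaction.getListOfReactants()]
    products = [s.getSpecies() for s in reaction.getListOfProducts()]
    print(reaction.getId(), ":", reactants, "->", products)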
Iyappan, Anandhi; Kawalia, Shweta Bagewadi; Raschka, Tamara; Hofmann-Apitius, Martin; Senger, Philipp
2016-07-08
Neurodegenerative diseases are incurable and debilitating indications with huge social and economic impact, where much is still to be learnt about the underlying molecular events. Mechanistic disease models could offer a knowledge framework to help decipher the complex interactions that occur at molecular and cellular levels. This motivates the need for an approach that integrates highly curated and heterogeneous data into a disease model spanning different regulatory data layers. Although several disease models exist, they often do not consider the quality of the underlying data. Moreover, even with current advancements in semantic web technology, we still do not have a cure for complex diseases like Alzheimer's disease. One key reason for this could be the increasing gap between generated data and the derived knowledge. In this paper, we describe an approach, called NeuroRDF, to develop an integrative framework for modeling curated knowledge in the area of complex neurodegenerative diseases. The core of this strategy lies in the use of well-curated and context-specific data for integration into one single semantic web-based framework, RDF. This increases the probability that the derived knowledge is novel and reliable in a specific disease context. This infrastructure integrates highly curated data from databases (Bind, IntAct, etc.), literature (PubMed), and gene expression resources (such as GEO and ArrayExpress). We illustrate the effectiveness of our approach by asking real-world biomedical questions that link these resources to prioritize plausible biomarker candidates. Among the 13 prioritized candidate genes, we identified MIF as a potential emerging candidate due to its role as a pro-inflammatory cytokine. We additionally report on the effort and challenges faced during generation of such an indication-specific knowledge base comprising curated and quality-controlled data. Although many alternative approaches have been proposed and practiced for modeling diseases, semantic web technology is a flexible and well-established solution for harmonized aggregation. The benefit of this work, using high-quality and context-specific data, becomes apparent in highlighting previously overlooked biomarker candidates around a well-known mechanism, which can then be leveraged for experimental investigation.
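The sketch below illustrates the general pattern of RDF-based integration described above using rdflib: several curated sources are merged into one graph and queried with SPARQL. The file names, namespace and predicates are invented for illustration and are not the NeuroRDF vocabulary.

# Sketch: merging curated interaction and expression data into one RDF graph and
# querying it. All identifiers below are illustrative assumptions.
from rdflib import Graph, Namespace

EX = Namespace("http://example.org/neuro#")

g = Graph()
g.bind("ex", EX)
for source in ["intact_subset.ttl", "bind_subset.ttl", "geo_expression.ttl"]:
    g.parse(source, format="turtle")

# Crude biomarker-prioritization query: genes that interact with a known disease
# gene and are differentially expressed in the disease context.
query = """
PREFIX ex: <http://example.org/neuro#>
SELECT DISTINCT ?gene WHERE {
    ?gene ex:interactsWith ex:APP .
    ?gene ex:differentiallyExpressedIn ex:AlzheimersDisease .
}
"""
for row in g.query(query):
    print(row.gene)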
CARD 2017: expansion and model-centric curation of the Comprehensive Antibiotic Resistance Database
USDA-ARS?s Scientific Manuscript database
The Comprehensive Antibiotic Resistance Database (CARD; http://arpcard.mcmaster.ca) is a manually curated resource containing high quality reference data on the molecular basis of antimicrobial resistance (AMR), with an emphasis on the genes, proteins, and mutations involved in AMR. CARD is ontologi...
A rat model of hypohidrotic ectodermal dysplasia carries a missense mutation in the Edaradd gene
2011-01-01
Background Hypohidrotic ectodermal dysplasia (HED) is a congenital disorder characterized by sparse hair, oligodontia, and inability to sweat. It is caused by mutations in any of three Eda pathway genes: ectodysplasin (Eda), Eda receptor (Edar), and Edar-associated death domain (Edaradd), which encode ligand, receptor, and intracellular adaptor molecule, respectively. The Eda signaling pathway activates NF-κB, which is central to ectodermal differentiation. Although the causative genes and the molecular pathway affecting HED have been identified, no curative treatment for HED has been established. Previously, we found a rat spontaneous mutation that caused defects in hair follicles and named it sparse-and-wavy (swh). Here, we have established the swh rat as the first rat model of HED and successfully identified the swh mutation. Results The swh/swh rat showed sparse hair, abnormal morphology of teeth, and absence of sweat glands. The ectoderm-derived glands, meibomian, preputial, and tongue glands, were absent. We mapped the swh mutation to the most telomeric part of rat Chr 7 and found a Pro153Ser missense mutation in the Edaradd gene. This mutation was located in the death domain of EDARADD, which is crucial for signal transduction and resulted in failure to activate NF-κB. Conclusions These findings suggest that swh is a loss-of-function mutation in the rat Edaradd and indicate that the swh/swh rat would be an excellent animal model of HED that could be used to investigate the pathological basis of the disease and the development of new therapies. PMID:22013926
Tang, Hongwei; Wei, Peng; Duell, Eric J; Risch, Harvey A; Olson, Sara H; Bueno-de-Mesquita, H Bas; Gallinger, Steven; Holly, Elizabeth A; Petersen, Gloria; Bracci, Paige M; McWilliams, Robert R; Jenab, Mazda; Riboli, Elio; Tjønneland, Anne; Boutron-Ruault, Marie Christine; Kaaks, Rudolph; Trichopoulos, Dimitrios; Panico, Salvatore; Sund, Malin; Peeters, Petra H M; Khaw, Kay-Tee; Amos, Christopher I; Li, Donghui
2014-05-01
Cigarette smoking is the best-established modifiable risk factor for pancreatic cancer. Genetic factors that underlie smoking-related pancreatic cancer have previously not been examined at the genome-wide level. Taking advantage of the existing Genome-wide association study (GWAS) genotype and risk factor data from the Pancreatic Cancer Case Control Consortium, we conducted a discovery study in 2028 cases and 2109 controls to examine gene-smoking interactions at the pathway, gene and single nucleotide polymorphism (SNP) levels. Using the likelihood ratio test nested in logistic regression models and ingenuity pathway analysis (IPA), we examined 172 KEGG (Kyoto Encyclopedia of Genes and Genomes) pathways, 3 manually curated gene sets, 3 nicotine dependency gene ontology pathways, 17 912 genes and 468 114 SNPs. None of the individual pathways/genes/SNPs showed a significant interaction with smoking after adjusting for multiple comparisons. Six KEGG pathways showed nominal interactions (P < 0.05) with smoking, and the top two were the pancreatic secretion and salivary secretion pathways (major contributing genes: RAB8A, PLCB and CTRB1). Nine genes (ZBED2, EXO1, PSG2, SLC36A1, CLSTN1, MTHFSD, FAT2, IL10RB and ATXN2) had P for interaction < 0.0005. Five intergenic region SNPs and two SNPs of the EVC and KCNIP4 genes had P for interaction < 0.00003. In IPA analysis of genes with nominal interactions with smoking, axonal guidance signaling (P = 2.12 × 10^-7) and α-adrenergic signaling (P = 2.52 × 10^-5) were significantly overrepresented canonical pathways. Genes contributing to the axon guidance signaling pathway included the SLIT/ROBO signaling genes that were frequently altered in pancreatic cancer. These observations need to be confirmed in additional data sets. Once confirmed, they will open a new avenue to unveiling the etiology of smoking-associated pancreatic cancer.
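The sketch below shows the general form of the likelihood ratio test for a SNP-by-smoking interaction used in such analyses: two nested logistic models are fitted and the difference in log-likelihoods is referred to a chi-squared distribution. Variable names and the toy data are illustrative only.

# Sketch of a likelihood ratio test for a SNP x smoking interaction on case status.
import numpy as np
import pandas as pd
import statsmodels.api as sm
from scipy.stats import chi2

def interaction_lrt(df: pd.DataFrame) -> float:
    """Return the LRT p-value for a SNP x smoking interaction term (1 df)."""
    base_cols = ["snp", "smoking", "age", "sex"]
    X0 = sm.add_constant(df[base_cols])            # null model: main effects only
    X1 = X0.copy()
    X1["snp_x_smoking"] = df["snp"] * df["smoking"]  # full model adds interaction

    null_fit = sm.Logit(df["case"], X0).fit(disp=0)
    full_fit = sm.Logit(df["case"], X1).fit(disp=0)

    lr_stat = 2 * (full_fit.llf - null_fit.llf)
    return chi2.sf(lr_stat, df=1)

# Toy data just to show the call.
rng = np.random.default_rng(0)
toy = pd.DataFrame({
    "case": rng.integers(0, 2, 500),
    "snp": rng.integers(0, 3, 500),       # genotype coded 0/1/2
    "smoking": rng.integers(0, 2, 500),
    "age": rng.normal(60, 8, 500),
    "sex": rng.integers(0, 2, 500),
})
print("interaction LRT p-value:", interaction_lrt(toy))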
Drug-Path: a database for drug-induced pathways
Zeng, Hui; Cui, Qinghua
2015-01-01
Several databases of drug-associated pathways have been built and are publicly available. However, the pathways curated in most of these databases are drug-action or drug-metabolism pathways. In recent years, high-throughput technologies such as microarrays and RNA-sequencing have produced large numbers of drug-induced gene expression profiles. Interestingly, drug-induced gene expression profiles frequently show distinct patterns, indicating that drugs normally induce the activation or repression of distinct pathways. These pathways therefore help in studying drug mechanisms and in drug repurposing. Here, we present Drug-Path, a database of drug-induced pathways, which was generated by KEGG pathway enrichment analysis of drug-induced upregulated and downregulated genes based on drug-induced gene expression datasets in Connectivity Map. Drug-Path provides user-friendly interfaces to retrieve, visualize and download the drug-induced pathway data in the database. In addition, the genes deregulated by a given drug are highlighted in the pathways. All data were organized using SQLite. The web site was implemented using Django, a Python web framework. We believe this database will be useful for related research. Database URL: http://www.cuilab.cn/drugpath PMID:26130661
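A minimal sketch of the enrichment step behind a resource of this kind is shown below: a hypergeometric test of the overlap between a drug-induced gene list and a pathway gene set. The gene sets and background size are invented for illustration.

# Sketch: hypergeometric pathway enrichment for a drug-induced gene list.
from scipy.stats import hypergeom

def enrichment_pvalue(pathway_genes, deregulated_genes, background_size):
    """P(overlap >= observed) under the hypergeometric null."""
    overlap = len(set(pathway_genes) & set(deregulated_genes))
    return hypergeom.sf(overlap - 1,              # at least `overlap` successes
                        background_size,           # genes measured in total
                        len(set(pathway_genes)),
                        len(set(deregulated_genes)))

pathway = {"TP53", "CDKN1A", "MDM2", "BAX", "GADD45A"}          # toy pathway
upregulated = {"CDKN1A", "MDM2", "GADD45A", "FOS", "JUN", "EGR1"}  # toy drug signature
print(enrichment_pvalue(pathway, upregulated, background_size=20000))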
Drug-Path: a database for drug-induced pathways.
Zeng, Hui; Qiu, Chengxiang; Cui, Qinghua
2015-01-01
Several databases of drug-associated pathways have been built and are publicly available. However, the pathways curated in most of these databases are drug-action or drug-metabolism pathways. In recent years, high-throughput technologies such as microarrays and RNA-sequencing have produced large numbers of drug-induced gene expression profiles. Interestingly, drug-induced gene expression profiles frequently show distinct patterns, indicating that drugs normally induce the activation or repression of distinct pathways. These pathways therefore help in studying drug mechanisms and in drug repurposing. Here, we present Drug-Path, a database of drug-induced pathways, which was generated by KEGG pathway enrichment analysis of drug-induced upregulated and downregulated genes based on drug-induced gene expression datasets in Connectivity Map. Drug-Path provides user-friendly interfaces to retrieve, visualize and download the drug-induced pathway data in the database. In addition, the genes deregulated by a given drug are highlighted in the pathways. All data were organized using SQLite. The web site was implemented using Django, a Python web framework. We believe this database will be useful for related research. © The Author(s) 2015. Published by Oxford University Press.
Seaver, Samuel M. D.; Gerdes, Svetlana; Frelin, Océane; Lerma-Ortiz, Claudia; Bradbury, Louis M. T.; Zallot, Rémi; Hasnain, Ghulam; Niehaus, Thomas D.; El Yacoubi, Basma; Pasternak, Shiran; Olson, Robert; Pusch, Gordon; Overbeek, Ross; Stevens, Rick; de Crécy-Lagard, Valérie; Ware, Doreen; Hanson, Andrew D.; Henry, Christopher S.
2014-01-01
The increasing number of sequenced plant genomes is placing new demands on the methods applied to analyze, annotate, and model these genomes. Today’s annotation pipelines result in inconsistent gene assignments that complicate comparative analyses and prevent efficient construction of metabolic models. To overcome these problems, we have developed the PlantSEED, an integrated, metabolism-centric database to support subsystems-based annotation and metabolic model reconstruction for plant genomes. PlantSEED combines SEED subsystems technology, first developed for microbial genomes, with refined protein families and biochemical data to assign fully consistent functional annotations to orthologous genes, particularly those encoding primary metabolic pathways. Seamless integration with its parent, the prokaryotic SEED database, makes PlantSEED a unique environment for cross-kingdom comparative analysis of plant and bacterial genomes. The consistent annotations imposed by PlantSEED permit rapid reconstruction and modeling of primary metabolism for all plant genomes in the database. This feature opens the unique possibility of model-based assessment of the completeness and accuracy of gene annotation and thus allows computational identification of genes and pathways that are restricted to certain genomes or need better curation. We demonstrate the PlantSEED system by producing consistent annotations for 10 reference genomes. We also produce a functioning metabolic model for each genome, gapfilling to identify missing annotations and proposing gene candidates for missing annotations. Models are built around an extended biomass composition representing the most comprehensive published to date. To our knowledge, our models are the first to be published for seven of the genomes analyzed. PMID:24927599
Seaver, Samuel M D; Gerdes, Svetlana; Frelin, Océane; Lerma-Ortiz, Claudia; Bradbury, Louis M T; Zallot, Rémi; Hasnain, Ghulam; Niehaus, Thomas D; El Yacoubi, Basma; Pasternak, Shiran; Olson, Robert; Pusch, Gordon; Overbeek, Ross; Stevens, Rick; de Crécy-Lagard, Valérie; Ware, Doreen; Hanson, Andrew D; Henry, Christopher S
2014-07-01
The increasing number of sequenced plant genomes is placing new demands on the methods applied to analyze, annotate, and model these genomes. Today's annotation pipelines result in inconsistent gene assignments that complicate comparative analyses and prevent efficient construction of metabolic models. To overcome these problems, we have developed the PlantSEED, an integrated, metabolism-centric database to support subsystems-based annotation and metabolic model reconstruction for plant genomes. PlantSEED combines SEED subsystems technology, first developed for microbial genomes, with refined protein families and biochemical data to assign fully consistent functional annotations to orthologous genes, particularly those encoding primary metabolic pathways. Seamless integration with its parent, the prokaryotic SEED database, makes PlantSEED a unique environment for cross-kingdom comparative analysis of plant and bacterial genomes. The consistent annotations imposed by PlantSEED permit rapid reconstruction and modeling of primary metabolism for all plant genomes in the database. This feature opens the unique possibility of model-based assessment of the completeness and accuracy of gene annotation and thus allows computational identification of genes and pathways that are restricted to certain genomes or need better curation. We demonstrate the PlantSEED system by producing consistent annotations for 10 reference genomes. We also produce a functioning metabolic model for each genome, gapfilling to identify missing annotations and proposing gene candidates for missing annotations. Models are built around an extended biomass composition representing the most comprehensive published to date. To our knowledge, our models are the first to be published for seven of the genomes analyzed.
Hood, Heather M.; Ocasio, Linda R.; Sachs, Matthew S.; Galagan, James E.
2013-01-01
The filamentous fungus Neurospora crassa played a central role in the development of twentieth-century genetics, biochemistry and molecular biology, and continues to serve as a model organism for eukaryotic biology. Here, we have reconstructed a genome-scale model of its metabolism. This model consists of 836 metabolic genes, 257 pathways, 6 cellular compartments, and is supported by extensive manual curation of 491 literature citations. To aid our reconstruction, we developed three optimization-based algorithms, which together comprise Fast Automated Reconstruction of Metabolism (FARM). These algorithms are: LInear MEtabolite Dilution Flux Balance Analysis (limed-FBA), which predicts flux while linearly accounting for metabolite dilution; One-step functional Pruning (OnePrune), which removes blocked reactions with a single compact linear program; and Consistent Reproduction Of growth/no-growth Phenotype (CROP), which reconciles differences between in silico and experimental gene essentiality faster than previous approaches. Against an independent test set of more than 300 essential/non-essential genes that were not used to train the model, the model displays 93% sensitivity and specificity. We also used the model to simulate the biochemical genetics experiments originally performed on Neurospora by comprehensively predicting nutrient rescue of essential genes and synthetic lethal interactions, and we provide detailed pathway-based mechanistic explanations of our predictions. Our model provides a reliable computational framework for the integration and interpretation of ongoing experimental efforts in Neurospora, and we anticipate that our methods will substantially reduce the manual effort required to develop high-quality genome-scale metabolic models for other organisms. PMID:23935467
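The sketch below shows how predicted gene essentiality can be scored against experimental essential and non-essential gene lists, the comparison underlying the sensitivity and specificity reported above; the gene sets are placeholders.

# Sketch: sensitivity and specificity of in silico essentiality calls.
def sensitivity_specificity(pred_essential, exp_essential, exp_nonessential):
    pred = set(pred_essential)
    tp = len(pred & set(exp_essential))          # correctly predicted essential
    fn = len(set(exp_essential) - pred)          # missed essential genes
    tn = len(set(exp_nonessential) - pred)       # correctly predicted non-essential
    fp = len(pred & set(exp_nonessential))       # wrongly predicted essential
    return tp / (tp + fn), tn / (tn + fp)

predicted = {"g1", "g2", "g5"}                   # placeholder gene sets
essential = {"g1", "g2", "g3"}
nonessential = {"g4", "g5", "g6", "g7"}
sens, spec = sensitivity_specificity(predicted, essential, nonessential)
print(f"sensitivity={sens:.2f} specificity={spec:.2f}")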
Efficient Workflows for Curation of Heterogeneous Data Supporting Modeling of U-Nb Alloy Aging
DOE Office of Scientific and Technical Information (OSTI.GOV)
Ward, Logan Timothy; Hackenberg, Robert Errol
These are slides from a presentation summarizing a graduate research associate's summer project. The following topics are covered in these slides: data challenges in materials, aging in U-Nb Alloys, Building an Aging Model, Different Phase Trans. in U-Nb, the Challenge, Storing Materials Data, Example Data Source, Organizing Data: What is a Schema?, What does a "XML Schema" look like?, Our Data Schema: Nice and Simple, Storing Data: Materials Data Curation System (MDCS), Problem with MDCS: Slow Data Entry, Getting Literature into MDCS, Staging Data in Excel Document, Final Result: MDCS Records, Analyzing Image Data, Process for Making TTT Diagram, Bottleneck Number 1: Image Analysis, Fitting a TTP Boundary, Fitting a TTP Curve: Comparable Results, How Does it Compare to Our Data?, Image Analysis Workflow, Curating Hardness Records, Hardness Data: Two Key Decisions, Before Peak Age? - Automation, Interactive Viz, Which Transformation?, Microstructure-Informed Model, Tracking the Entire Process, General Problem with Property Models, Pinyon: Toolkit for Managing Model Creation, Tracking Individual Decisions, Jupyter: Docs and Code in One File, Hardness Analysis Workflow, Workflow for Aging Models, and conclusions.
Mouse Tumor Biology (MTB): a database of mouse models for human cancer.
Bult, Carol J; Krupke, Debra M; Begley, Dale A; Richardson, Joel E; Neuhauser, Steven B; Sundberg, John P; Eppig, Janan T
2015-01-01
The Mouse Tumor Biology (MTB; http://tumor.informatics.jax.org) database is a unique online compendium of mouse models for human cancer. MTB provides online access to expertly curated information on diverse mouse models for human cancer and interfaces for searching and visualizing data associated with these models. The information in MTB is designed to facilitate the selection of strains for cancer research and is a platform for mining data on tumor development and patterns of metastases. MTB curators acquire data through manual curation of peer-reviewed scientific literature and from direct submissions by researchers. Data in MTB are also obtained from other bioinformatics resources including PathBase, the Gene Expression Omnibus and ArrayExpress. Recent enhancements to MTB improve the association between mouse models and human genes commonly mutated in a variety of cancers as identified in large-scale cancer genomics studies, provide new interfaces for exploring regions of the mouse genome associated with cancer phenotypes and incorporate data and information related to Patient-Derived Xenograft models of human cancers. © The Author(s) 2014. Published by Oxford University Press on behalf of Nucleic Acids Research.
Ponce-de-León, Miguel; Montero, Francisco; Peretó, Juli
2013-10-31
Metabolic reconstruction is the computational process that aims to elucidate the network of metabolites interconnected through reactions catalyzed by activities assigned to one or more genes. Reconstructed models may contain inconsistencies that appear as gap metabolites and blocked reactions. Although automatic methods for solving this problem have been developed previously, there are many situations where manual curation is still needed. We introduce a general definition of gap metabolite that allows its detection in a straightforward manner. Moreover, we propose a method for the detection of Unconnected Modules, defined as isolated sets of blocked reactions connected through gap metabolites. The method has been successfully applied to the curation of iCG238, the genome-scale metabolic model for the bacterium Blattabacterium cuenoti, obligate endosymbiont of cockroaches. We found the proposed approach to be a valuable tool for the curation of genome-scale metabolic models. The outcome of its application to the genome-scale model B. cuenoti iCG238 is a more accurate model version, named B. cuenoti iMP240.
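As a rough illustration, the sketch below flags candidate dead-end metabolites and blocked reactions in a genome-scale model with COBRApy. The file name is a placeholder, and the simple dead-end check is only an approximation of the gap-metabolite definition proposed in the paper.

# Sketch: candidate dead-end metabolites and blocked reactions with COBRApy.
# "iCG238.xml" is a placeholder file name.
import cobra
from cobra.flux_analysis import find_blocked_reactions

model = cobra.io.read_sbml_model("iCG238.xml")

# A metabolite that is only ever produced, or only ever consumed, by irreversible
# reactions is a dead end (a crude approximation of a gap metabolite).
dead_ends = []
for met in model.metabolites:
    coeffs = [rxn.metabolites[met] for rxn in met.reactions]
    irreversible = not any(rxn.reversibility for rxn in met.reactions)
    only_produced = irreversible and all(c > 0 for c in coeffs)
    only_consumed = irreversible and all(c < 0 for c in coeffs)
    if only_produced or only_consumed:
        dead_ends.append(met.id)

blocked = find_blocked_reactions(model)
print(len(dead_ends), "dead-end metabolites;", len(blocked), "blocked reactions")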
Curating and Integrating Data from Multiple Sources to Support Healthcare Analytics.
Ng, Kenney; Kakkanatt, Chris; Benigno, Michael; Thompson, Clay; Jackson, Margaret; Cahan, Amos; Zhu, Xinxin; Zhang, Ping; Huang, Paul
2015-01-01
As the volume and variety of healthcare related data continues to grow, the analysis and use of this data will increasingly depend on the ability to appropriately collect, curate and integrate disparate data from many different sources. We describe our approach to and highlight our experiences with the development of a robust data collection, curation and integration infrastructure that supports healthcare analytics. This system has been successfully applied to the processing of a variety of data types including clinical data from electronic health records and observational studies, genomic data, microbiomic data, self-reported data from surveys and self-tracked data from wearable devices from over 600 subjects. The curated data is currently being used to support healthcare analytic applications such as data visualization, patient stratification and predictive modeling.
BC4GO: a full-text corpus for the BioCreative IV GO Task
USDA-ARS?s Scientific Manuscript database
Gene function curation via Gene Ontology (GO) annotation is a common task among Model Organism Database (MOD) groups. Due to its manual nature, this task is time-consuming and labor-intensive, and thus considered one of the bottlenecks in literature curation. There have been many previous attempts a...
The systematic annotation of the three main GPCR families in Reactome.
Jassal, Bijay; Jupe, Steven; Caudy, Michael; Birney, Ewan; Stein, Lincoln; Hermjakob, Henning; D'Eustachio, Peter
2010-07-29
Reactome is an open-source, freely available database of human biological pathways and processes. A major goal of our work is to provide an integrated view of cellular signalling processes that spans from ligand-receptor interactions to molecular readouts at the level of metabolic and transcriptional events. To this end, we have built the first catalogue of all human G protein-coupled receptors (GPCRs) known to bind endogenous or natural ligands. The UniProt database has records for 797 proteins classified as GPCRs and sorted into families A/1, B/2 and C/3 on the basis of amino acid sequence. To these records we have added details from the IUPHAR database and our own manual curation of relevant literature to create reactions in which 563 GPCRs bind ligands and also interact with specific G-proteins to initiate signalling cascades. We believe the remaining 234 GPCRs are true orphans. The Reactome GPCR pathway can be viewed as a detailed interactive diagram and can be exported in many forms. It provides a template for the orthology-based inference of GPCR reactions for diverse model organism species, and can be overlaid with protein-protein interaction and gene expression datasets to facilitate overrepresentation studies and other forms of pathway analysis. Database URL: http://www.reactome.org.
Disease model curation improvements at Mouse Genome Informatics
Bello, Susan M.; Richardson, Joel E.; Davis, Allan P.; Wiegers, Thomas C.; Mattingly, Carolyn J.; Dolan, Mary E.; Smith, Cynthia L.; Blake, Judith A.; Eppig, Janan T.
2012-01-01
Optimal curation of human diseases requires an ontology or structured vocabulary that contains terms familiar to end users, is robust enough to support multiple levels of annotation granularity, is limited to disease terms and is stable enough to avoid extensive reannotation following updates. At Mouse Genome Informatics (MGI), we currently use disease terms from Online Mendelian Inheritance in Man (OMIM) to curate mouse models of human disease. While OMIM provides highly detailed disease records that are familiar to many in the medical community, it lacks structure to support multilevel annotation. To improve disease annotation at MGI, we evaluated the merged Medical Subject Headings (MeSH) and OMIM disease vocabulary created by the Comparative Toxicogenomics Database (CTD) project. Overlaying MeSH onto OMIM provides hierarchical access to broad disease terms, a feature missing from the OMIM. We created an extended version of the vocabulary to meet the genetic disease-specific curation needs at MGI. Here we describe our evaluation of the CTD application, the extensions made by MGI and discuss the strengths and weaknesses of this approach. Database URL: http://www.informatics.jax.org/ PMID:22434831
Kim, Sun; Chatr-aryamontri, Andrew; Chang, Christie S.; Oughtred, Rose; Rust, Jennifer; Wilbur, W. John; Comeau, Donald C.; Dolinski, Kara; Tyers, Mike
2017-01-01
A great deal of information on the molecular genetics and biochemistry of model organisms has been reported in the scientific literature. However, this data is typically described in free text form and is not readily amenable to computational analyses. To this end, the BioGRID database systematically curates the biomedical literature for genetic and protein interaction data. This data is provided in a standardized computationally tractable format and includes structured annotation of experimental evidence. BioGRID curation necessarily involves substantial human effort by expert curators who must read each publication to extract the relevant information. Computational text-mining methods offer the potential to augment and accelerate manual curation. To facilitate the development of practical text-mining strategies, a new challenge was organized in BioCreative V for the BioC task, the collaborative Biocurator Assistant Task. This was a non-competitive, cooperative task in which the participants worked together to build BioC-compatible modules into an integrated pipeline to assist BioGRID curators. As an integral part of this task, a test collection of full text articles was developed that contained both biological entity annotations (gene/protein and organism/species) and molecular interaction annotations (protein–protein and genetic interactions (PPIs and GIs)). This collection, which we call the BioC-BioGRID corpus, was annotated by four BioGRID curators over three rounds of annotation and contains 120 full text articles curated in a dataset representing two major model organisms, namely budding yeast and human. The BioC-BioGRID corpus contains annotations for 6409 mentions of genes and their Entrez Gene IDs, 186 mentions of organism names and their NCBI Taxonomy IDs, 1867 mentions of PPIs and 701 annotations of PPI experimental evidence statements, 856 mentions of GIs and 399 annotations of GI evidence statements. The purpose, characteristics and possible future uses of the BioC-BioGRID corpus are detailed in this report. Database URL: http://bioc.sourceforge.net/BioC-BioGRID.html PMID:28077563
Sharing Responsibility for Data Stewardship Between Scientists and Curators
NASA Astrophysics Data System (ADS)
Hedstrom, M. L.
2012-12-01
Data stewardship is becoming increasingly important to support accurate conclusions from new forms of data, integration of and computation across heterogeneous data types, interactions between models and data, replication of results, data governance and long-term archiving. In addition to increasing recognition of the importance of data management, data science, and data curation by US and international scientific agencies, the National Academies of Science Board on Research Data and Information is sponsoring a study on Data Curation Education and Workforce Issues. Effective data stewardship requires a distributed effort among scientists who produce data, IT staff and/or vendors who provide data storage and computational facilities and services, and curators who enhance data quality, manage data governance, provide access to third parties, and assume responsibility for long-term archiving of data. The expertise necessary for scientific data management includes a mix of knowledge of the scientific domain; an understanding of domain data requirements, standards, ontologies and analytical methods; facility with leading edge information technology; and knowledge of data governance, standards, and best practices for long-term preservation and access that rarely are found in a single individual. Rather than developing data science and data curation as new and distinct occupations, this paper examines the set of tasks required for data stewardship. The paper proposes an alternative model that embeds data stewardship in scientific workflows and coordinates hand-offs between instruments, repositories, analytical processing, publishers, distributors, and archives. This model forms the basis for defining knowledge and skill requirements for specific actors in the processes required for data stewardship and the corresponding educational and training needs.
Wu, Honghan; Oellrich, Anika; Girges, Christine; de Bono, Bernard; Hubbard, Tim J P; Dobson, Richard J B
2017-01-01
Neurodegenerative disorders such as Parkinson's and Alzheimer's disease are devastating and costly illnesses, a source of major global burden. In order to provide successful interventions for patients and reduce costs, both causes and pathological processes need to be understood. The ApiNATOMY project aims to contribute to our understanding of neurodegenerative disorders by manually curating and abstracting data from the vast body of literature amassed on these illnesses. As curation is labour-intensive, we aimed to speed up the process by automatically highlighting those parts of the PDF document of primary importance to the curator. Using techniques similar to those of summarisation, we developed an algorithm that relies on linguistic, semantic and spatial features. Employing this algorithm on a test set manually corrected for tool imprecision, we achieved a macro F1-measure of 0.51, which is an increase of 132% compared to the best bag-of-words baseline model. A user-based evaluation was also conducted to assess the usefulness of the methodology on 40 unseen publications, which reveals that in 85% of cases all highlighted sentences are relevant to the curation task and in about 65% of the cases, the highlights are sufficient to support the knowledge curation task without needing to consult the full text. In conclusion, we believe that these are promising results for a step in automating the recognition of curation-relevant sentences. Refining our approach to pre-digest papers will lead to faster processing and cost reduction in the curation process. https://github.com/KHP-Informatics/NapEasy. © The Author(s) 2017. Published by Oxford University Press.
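For reference, the macro F1-measure reported above can be computed as follows with scikit-learn; the labels are toy values where 1 marks a curation-relevant sentence.

# Sketch: macro F1 over toy sentence-level labels.
from sklearn.metrics import f1_score

y_true = [1, 0, 1, 1, 0, 0, 1, 0]   # gold labels (1 = curation-relevant)
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]   # highlighter output
print("macro F1:", f1_score(y_true, y_pred, average="macro"))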
Oellrich, Anika; Girges, Christine; de Bono, Bernard; Hubbard, Tim J.P.; Dobson, Richard J.B.
2017-01-01
Neurodegenerative disorders such as Parkinson’s and Alzheimer’s disease are devastating and costly illnesses, a source of major global burden. In order to provide successful interventions for patients and reduce costs, both causes and pathological processes need to be understood. The ApiNATOMY project aims to contribute to our understanding of neurodegenerative disorders by manually curating and abstracting data from the vast body of literature amassed on these illnesses. As curation is labour-intensive, we aimed to speed up the process by automatically highlighting those parts of the PDF document of primary importance to the curator. Using techniques similar to those of summarisation, we developed an algorithm that relies on linguistic, semantic and spatial features. Employing this algorithm on a test set manually corrected for tool imprecision, we achieved a macro F1-measure of 0.51, which is an increase of 132% compared to the best bag-of-words baseline model. A user-based evaluation was also conducted to assess the usefulness of the methodology on 40 unseen publications, which reveals that in 85% of cases all highlighted sentences are relevant to the curation task and in about 65% of the cases, the highlights are sufficient to support the knowledge curation task without needing to consult the full text. In conclusion, we believe that these are promising results for a step in automating the recognition of curation-relevant sentences. Refining our approach to pre-digest papers will lead to faster processing and cost reduction in the curation process. Database URL: https://github.com/KHP-Informatics/NapEasy PMID:28365743
Temperature-Dependent Estimation of Gibbs Energies Using an Updated Group-Contribution Method.
Du, Bin; Zhang, Zhen; Grubner, Sharon; Yurkovich, James T; Palsson, Bernhard O; Zielinski, Daniel C
2018-06-05
Reaction-equilibrium constants determine the metabolite concentrations necessary to drive flux through metabolic pathways. Group-contribution methods offer a way to estimate reaction-equilibrium constants at wide coverage across the metabolic network. Here, we present an updated group-contribution method with (1) additional curated thermodynamic data used in fitting and (2) capabilities to calculate equilibrium constants as a function of temperature. We first collected and curated aqueous thermodynamic data, including reaction-equilibrium constants, enthalpies of reaction, Gibbs free energies of formation, enthalpies of formation, entropy changes of formation of compounds, and proton- and metal-ion-binding constants. Next, we formulated the calculation of equilibrium constants as a function of temperature and calculated the standard entropy change of formation (Δ_fS°) using a model based on molecular properties. The median absolute error in estimating Δ_fS° was 0.013 kJ/K/mol. We also estimated magnesium binding constants for 618 compounds using a linear regression model validated against measured data. We demonstrate the improved performance of the current method (8.17 kJ/mol in median absolute residual) over the current state-of-the-art method (11.47 kJ/mol) in estimating the 185 new reactions added in this work. The efforts here fill in gaps for thermodynamic calculations under various conditions, specifically different temperatures and metal-ion concentrations. These, to our knowledge, new capabilities empower the study of thermodynamic driving forces underlying the metabolic function of organisms living under diverse conditions. Copyright © 2018 Biophysical Society. Published by Elsevier Inc. All rights reserved.
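As a simplified illustration of temperature dependence, the sketch below shifts an equilibrium constant to a new temperature using the van't Hoff relation, assuming a temperature-independent reaction enthalpy. The paper's group-contribution model is more elaborate (it estimates entropy changes from molecular properties), and the numbers here are placeholders.

# Simplified illustration: temperature dependence of an equilibrium constant via
# the van't Hoff relation, ln(K2/K1) = -(dH/R) * (1/T2 - 1/T1).
import numpy as np

R = 8.314462618e-3  # gas constant in kJ / (K * mol)

def k_at_temperature(k_ref, t_ref, t_new, delta_h):
    """Shift K from t_ref to t_new (kelvin), with delta_h in kJ/mol."""
    return k_ref * np.exp(-(delta_h / R) * (1.0 / t_new - 1.0 / t_ref))

k_298 = 2.5          # equilibrium constant measured at 298.15 K (placeholder)
delta_h = -20.0      # reaction enthalpy in kJ/mol (placeholder)
print(k_at_temperature(k_298, 298.15, 310.15, delta_h))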
Gama-Castro, Socorro; Salgado, Heladia; Santos-Zavaleta, Alberto; Ledezma-Tejeida, Daniela; Muñiz-Rascado, Luis; García-Sotelo, Jair Santiago; Alquicira-Hernández, Kevin; Martínez-Flores, Irma; Pannier, Lucia; Castro-Mondragón, Jaime Abraham; Medina-Rivera, Alejandra; Solano-Lira, Hilda; Bonavides-Martínez, César; Pérez-Rueda, Ernesto; Alquicira-Hernández, Shirley; Porrón-Sotelo, Liliana; López-Fuentes, Alejandra; Hernández-Koutoucheva, Anastasia; Del Moral-Chávez, Víctor; Rinaldi, Fabio; Collado-Vides, Julio
2016-01-04
RegulonDB (http://regulondb.ccg.unam.mx) is one of the most useful and important resources on bacterial gene regulation, as it integrates the scattered scientific knowledge of the best-characterized organism, Escherichia coli K-12, in a database that organizes large amounts of data. Its electronic format enables researchers to compare their results with the legacy of previous knowledge and supports bioinformatics tools and model building. Here, we summarize our progress with RegulonDB since our last Nucleic Acids Research publication describing RegulonDB, in 2013. In addition to keeping curation up to date, we report a collection of 232 interactions with small RNAs affecting 192 genes, and the complete repertoire of 189 Elementary Genetic Sensory-Response units (GENSOR units), integrating the signal, regulatory interactions, and metabolic pathways they govern. These additions represent major progress toward a higher-level understanding of regulated processes. We have updated the computationally predicted transcription factors, which total 304 (184 with experimental evidence and 120 from computational predictions); we updated our position-weight matrices and have included tools for clustering them in evolutionary families. We describe our semiautomatic strategy to accelerate curation, including datasets from high-throughput experiments, a novel coexpression distance to search for 'neighborhood' genes to known operons and regulons, and computational developments. © The Author(s) 2015. Published by Oxford University Press on behalf of Nucleic Acids Research.
Zhou, Hufeng; Jin, Jingjing; Zhang, Haojun; Yi, Bo; Wozniak, Michal; Wong, Limsoon
2012-01-01
Pathway data are important for understanding the relationship between genes, proteins and many other molecules in living organisms. Pathway gene relationships are crucial information for guidance, prediction, reference and assessment in biochemistry, computational biology, and medicine. Many well-established databases--e.g., KEGG, WikiPathways, and BioCyc--are dedicated to collecting pathway data for public access. However, the effectiveness of these databases is hindered by issues such as incompatible data formats, inconsistent molecular representations, inconsistent molecular relationship representations, inconsistent referrals to pathway names, and incomplete data coverage across databases. In this paper, we overcome these issues through extraction, normalization and integration of pathway data from several major public databases (KEGG, WikiPathways, BioCyc, etc). We build a database that not only hosts our integrated pathway gene relationship data for public access but also maintains the necessary updates in the long run. This public repository is named IntPath (Integrated Pathway gene relationship database for model organisms and important pathogens). Four organisms--S. cerevisiae, M. tuberculosis H37Rv, H. sapiens and M. musculus--are included in this version (V2.0) of IntPath. IntPath uses the "full unification" approach to ensure no deletion and no introduced noise in this process. Therefore, IntPath contains much richer pathway-gene and pathway-gene pair relationships and a much larger number of non-redundant genes and gene pairs than any of the single-source databases. The relationships per gene in each pathway (measured by average node degree) are significantly richer. The gene relationships in each pathway (measured by average number of gene pairs per pathway) are also considerably richer in the integrated pathways. Moderate manual curation was involved to remove errors and noise from the source data (e.g., the gene ID errors in WikiPathways and relationship errors in KEGG). We turn complicated and incompatible XML data formats and inconsistent gene and gene relationship representations from different source databases into normalized and unified pathway-gene and pathway-gene pair relationships neatly recorded in simple tab-delimited text format and MySQL tables, which facilitates convenient automatic computation and large-scale referencing in many related studies. IntPath data can be downloaded in text format or MySQL dump. IntPath data can also be retrieved and analyzed conveniently through the web service by local programs or through the web interface by mouse clicks. Several useful analysis tools are also provided in IntPath. We have overcome in IntPath the issues of compatibility, consistency, and comprehensiveness that often hamper effective use of pathway databases. We have included four organisms in the current release of IntPath. Our methodology and programs described in this work can be easily applied to other organisms, and we will include more model organisms and important pathogens in future releases of IntPath. IntPath maintains regular updates and is freely available at http://compbio.ddns.comp.nus.edu.sg:8080/IntPath.
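The two richness metrics mentioned above can be computed directly from a pathway-gene pair table; the sketch below does so with pandas, under assumed file and column names.

# Sketch: per-pathway gene counts, gene-pair counts and average node degree from a
# tab-delimited pathway-gene pair table. File and column names are assumptions.
import pandas as pd

pairs = pd.read_csv("intpath_gene_pairs.tsv", sep="\t",
                    names=["pathway", "gene_a", "gene_b"])

def pathway_stats(df: pd.DataFrame) -> pd.DataFrame:
    rows = []
    for pathway, grp in df.groupby("pathway"):
        genes = pd.unique(grp[["gene_a", "gene_b"]].values.ravel())
        n_pairs = len(grp.drop_duplicates(["gene_a", "gene_b"]))
        rows.append({
            "pathway": pathway,
            "genes": len(genes),
            "gene_pairs": n_pairs,
            # each pair contributes one edge to two genes
            "avg_node_degree": 2 * n_pairs / len(genes) if len(genes) else 0.0,
        })
    return pd.DataFrame(rows)

print(pathway_stats(pairs).head())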
Ci4SeR--curation interface for semantic resources--evaluation with adverse drug reactions.
Souvignet, Julien; Asfari, Hadyl; Declerck, Gunnar; Lardon, Jérémy; Trombert-Paviot, Béatrice; Jaulent, Marie-Christine; Bousquet, Cédric
2014-01-01
Evaluation and validation have become a crucial problem in the development of semantic resources. We developed Ci4SeR, a graphical user interface to optimize curation work (structural aspects excluded), suitable for any type of resource with lightweight description logic. We tested it on OntoADR, an ontology of adverse drug reactions. A single curator reviewed 326 terms (1020 axioms) in an estimated 120 hours (2.71 concepts and 8.5 axioms reviewed per hour) and added 1874 new axioms (15.6 axioms per hour). Compared with previous manual efforts, the interface increases the rate of concept review by 68% and of axiom addition by 486%. Wider use of Ci4SeR would support the curation of semantic resources and improve the completeness of knowledge modelling.
BioSurfDB: knowledge and algorithms to support biosurfactants and biodegradation studies
Oliveira, Jorge S.; Araújo, Wydemberg; Lopes Sales, Ana Isabela; de Brito Guerra, Alaine; da Silva Araújo, Sinara Carla; de Vasconcelos, Ana Tereza Ribeiro; Agnez-Lima, Lucymara F.; Freitas, Ana Teresa
2015-01-01
Crude oil extraction, transportation and use cause the contamination of countless ecosystems. Bioremediation through surfactant mobilization or biodegradation is therefore an important subject, both economically and environmentally. Bioremediation research has received a great boost from recent advances in metagenomics, which enable the sequencing of uncultured microorganisms and provide new insights into surfactant-producing and/or oil-degrading bacteria. Many studies are making available genomic data from unknown organisms obtained by metagenomic analysis of oil-contaminated environmental samples. These new datasets demand new tools and data repositories tailored to biological analysis in the context of bioremediation. This work presents BioSurfDB, www.biosurfdb.org, a curated relational information system integrating data on: (i) metagenomes; (ii) organisms; (iii) biodegradation-relevant genes, proteins and their metabolic pathways; (iv) bioremediation experiment results, including treatment efficiencies for specific pollutants by surfactant-producing organisms; and (v) a curated list of biosurfactants, grouped by producing organism, surfactant name, class and reference. The main goal of this repository is to gather information on the characterization of biological compounds and mechanisms involved in biosurfactant production and/or biodegradation, and to make it available in a curated form together with a number of computational tools to support studies of genomic and metagenomic data. Database URL: www.biosurfdb.org PMID:25833955
Data Curation Education Grounded in Earth Sciences and the Science of Data
NASA Astrophysics Data System (ADS)
Palmer, C. L.
2015-12-01
This presentation looks back over ten years of experience advancing data curation education at two Information Schools, highlighting the vital role of earth science case studies, expertise, and collaborations in development of curriculum and internships. We also consider current data curation practices and workforce demand in data centers in the geosciences, drawing on studies conducted in the Data Curation Education in Research Centers (DCERC) initiative and the Site-Based Data Curation project. Outcomes from this decade of data curation research and education have reinforced the importance of key areas of information science in preparing data professionals to respond to the needs of user communities, provide services across disciplines, invest in standards and interoperability, and promote open data practices. However, a serious void remains in principles to guide education and practice that are specific to the development of data systems and services that meet both local and global aims. We identify principles emerging from recent empirical studies on the reuse value of data in the earth sciences and propose an approach for advancing data curation education that depends on systematic coordination with data-intensive research and propagation of current best practices from data centers into curriculum. This collaborative model can increase both domain-based and cross-disciplinary expertise among data professionals, ultimately improving data systems and services in our universities and data centers while building the new base of knowledge needed for a foundational science of data.
Tappenden, Paul; Chilcott, Jim; Brennan, Alan; Squires, Hazel; Glynne-Jones, Rob; Tappenden, Janine
2013-06-01
The aim of this study was to assess the feasibility and value of simulating whole disease and treatment pathways within a single model to provide a common economic basis for informing resource allocation decisions. A patient-level simulation model was developed, intended to be capable of evaluating multiple topics within the National Institute for Health and Clinical Excellence's colorectal cancer clinical guideline. The model simulates disease and treatment pathways from preclinical disease through to detection, diagnosis, adjuvant/neoadjuvant treatments, follow-up, curative/palliative treatments for metastases, supportive care, and eventual death. The model parameters were informed by meta-analyses, randomized trials, observational studies, health utility studies, audit data, costing sources, and expert opinion. Unobservable natural history parameters were calibrated against external data using Bayesian Markov chain Monte Carlo methods. Economic analysis was undertaken using conventional cost-utility decision rules within each guideline topic and constrained maximization rules across multiple topics. Under usual processes for guideline development, piecewise economic modeling would have been used to evaluate between one and three topics. The Whole Disease Model was capable of evaluating 11 of 15 guideline topics, ranging from alternative diagnostic technologies through to treatments for metastatic disease. The constrained maximization analysis identified a configuration of colorectal services that is expected to maximize quality-adjusted life-year gains without exceeding current expenditure levels. This study indicates that Whole Disease Model development is feasible and can allow for the economic analysis of most interventions across a disease service within a consistent conceptual and mathematical infrastructure. This disease-level modeling approach may be of particular value in providing an economic basis to support other clinical guidelines. Copyright © 2013 International Society for Pharmacoeconomics and Outcomes Research (ISPOR). Published by Elsevier Inc. All rights reserved.
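The constrained-maximization step can be pictured as choosing the set of service changes that maximizes total QALY gain within the current budget. The sketch below brute-forces this choice over a handful of invented interventions; in practice the QALY gains and costs would come from the whole-disease model's outputs.

# Sketch: pick the subset of interventions maximizing QALY gain within a budget.
# Names, QALY gains and costs below are invented for illustration only.
from itertools import combinations

interventions = {            # name: (QALY gain, incremental cost in GBP)
    "Screening uptake programme": (1200.0, 4_000_000),
    "CT colonography triage":      (300.0, 1_500_000),
    "Laparoscopic surgery":        (450.0, 2_500_000),
    "Intensive follow-up":         (150.0, 3_000_000),
}
budget = 8_000_000

best_set, best_qalys = (), 0.0
names = list(interventions)
for r in range(len(names) + 1):
    for subset in combinations(names, r):
        qalys = sum(interventions[n][0] for n in subset)
        cost = sum(interventions[n][1] for n in subset)
        if cost <= budget and qalys > best_qalys:
            best_set, best_qalys = subset, qalys

print("best configuration:", best_set, "QALYs gained:", best_qalys)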
Kilicoglu, Halil; Shin, Dongwook; Rindflesch, Thomas C.
2014-01-01
Gene regulatory networks are a crucial aspect of systems biology in describing molecular mechanisms of the cell. Various computational models rely on random gene selection to infer such networks from microarray data. While incorporation of prior knowledge into data analysis has been deemed important, in practice, it has generally been limited to referencing genes in probe sets and using curated knowledge bases. We investigate the impact of augmenting microarray data with semantic relations automatically extracted from the literature, with the view that relations encoding gene/protein interactions eliminate the need for random selection of components in non-exhaustive approaches, producing a more accurate model of cellular behavior. A genetic algorithm is then used to optimize the strength of interactions using microarray data and an artificial neural network fitness function. The result is a directed and weighted network providing the individual contribution of each gene to its target. For testing, we used invasive ductal carcinoma of the breast to query the literature and a microarray set containing gene expression changes in these cells over several time points. Our model demonstrates significantly better fitness than the state-of-the-art model, which relies on an initial random selection of genes. Comparison to the component pathways of the KEGG Pathways in Cancer map reveals that the resulting networks contain both known and novel relationships. The p53 pathway results were manually validated in the literature. 60% of non-KEGG relationships were supported (74% for highly weighted interactions). The method was then applied to yeast data and our model again outperformed the comparison model. Our results demonstrate the advantage of combining gene interactions extracted from the literature in the form of semantic relations with microarray analysis in generating contribution-weighted gene regulatory networks. This methodology can make a significant contribution to understanding the complex interactions involved in cellular behavior and molecular physiology. PMID:24921649
Chen, Guocai; Cairelli, Michael J; Kilicoglu, Halil; Shin, Dongwook; Rindflesch, Thomas C
2014-06-01
Gene regulatory networks are a crucial aspect of systems biology in describing molecular mechanisms of the cell. Various computational models rely on random gene selection to infer such networks from microarray data. While incorporation of prior knowledge into data analysis has been deemed important, in practice, it has generally been limited to referencing genes in probe sets and using curated knowledge bases. We investigate the impact of augmenting microarray data with semantic relations automatically extracted from the literature, with the view that relations encoding gene/protein interactions eliminate the need for random selection of components in non-exhaustive approaches, producing a more accurate model of cellular behavior. A genetic algorithm is then used to optimize the strength of interactions using microarray data and an artificial neural network fitness function. The result is a directed and weighted network providing the individual contribution of each gene to its target. For testing, we used invasive ductal carcinoma of the breast to query the literature and a microarray set containing gene expression changes in these cells over several time points. Our model demonstrates significantly better fitness than the state-of-the-art model, which relies on an initial random selection of genes. Comparison to the component pathways of the KEGG Pathways in Cancer map reveals that the resulting networks contain both known and novel relationships. The p53 pathway results were manually validated in the literature. 60% of non-KEGG relationships were supported (74% for highly weighted interactions). The method was then applied to yeast data and our model again outperformed the comparison model. Our results demonstrate the advantage of combining gene interactions extracted from the literature in the form of semantic relations with microarray analysis in generating contribution-weighted gene regulatory networks. This methodology can make a significant contribution to understanding the complex interactions involved in cellular behavior and molecular physiology.
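As a rough illustration of the optimization described in the two records above, the sketch below runs a small genetic algorithm over interaction weights restricted to a mask of literature-supported gene pairs, scoring candidates by how well a simple neural-network-style update reproduces the next expression time point. The expression data, interaction mask, and fitness function are simplified stand-ins and are not the authors' implementation.

```python
import numpy as np

# Minimal sketch: a genetic algorithm searches for an interaction-weight matrix W
# over allowed (literature-supported) gene pairs, scoring each candidate by how
# well tanh(W @ x_t) predicts the next expression time point. Data are synthetic.
rng = np.random.default_rng(0)
n_genes, n_time = 8, 6
expr = rng.random((n_time, n_genes))             # rows = time points (placeholder data)
allowed = rng.random((n_genes, n_genes)) < 0.3   # mask of literature-supported interactions

def fitness(W):
    pred = np.tanh(expr[:-1] @ W.T)              # one-step-ahead prediction
    return -np.mean((pred - expr[1:]) ** 2)      # higher is better

pop = [rng.normal(0, 1, (n_genes, n_genes)) * allowed for _ in range(40)]
for _ in range(200):                             # generations
    pop.sort(key=fitness, reverse=True)
    parents = pop[:10]
    children = []
    for _ in range(30):
        a, b = rng.choice(10, 2, replace=False)  # pick two parents
        child = np.where(rng.random((n_genes, n_genes)) < 0.5, parents[a], parents[b])
        child = child + rng.normal(0, 0.1, child.shape) * allowed   # mutation
        children.append(child * allowed)
    pop = parents + children

best = max(pop, key=fitness)
print("best fitness:", round(fitness(best), 4))
```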
Traditional Chinese medical therapy for erectile dysfunction
Li, Hao; Jiang, Hongyang
2017-01-01
Traditional Chinese medicine (TCM), including acupuncture and Chinese herbs, is used as an alternative therapy to increase the curative effect for erectile dysfunction (ED). A large number of studies have been conducted to investigate the effect and mechanism of TCM for treating ED. The therapeutic effect of acupuncture on ED is still controversial at present. However, some Chinese herbs have shown satisfactory outcomes and may improve erectile function by activating the nitric oxide synthase (NOS)-cyclic guanosine monophosphate (cGMP) pathway, increasing cyclic adenosine monophosphate (cAMP) expression, elevating testosterone level, reducing intracellular Ca2+ concentration, down-regulating the transforming growth factor β1 (TGFβ1)/Smad2 signaling pathway, or ameliorating oxidative stress. PMID:28540226
Latino, Diogo A R S; Wicker, Jörg; Gütlein, Martin; Schmid, Emanuel; Kramer, Stefan; Fenner, Kathrin
2017-03-22
Developing models for the prediction of microbial biotransformation pathways and half-lives of trace organic contaminants in different environments requires easily accessible and sufficiently large collections of biotransformation data, annotated with metadata on study conditions, as training data. Here, we present the Eawag-Soil package, a public database that has been developed to contain all freely accessible regulatory data on pesticide degradation in laboratory soil simulation studies for pesticides registered in the EU (282 degradation pathways, 1535 reactions, 1619 compounds and 4716 biotransformation half-life values with corresponding metadata on study conditions). We provide a thorough description of this novel data resource, and discuss important features of the pesticide soil degradation data that are relevant for model development. Most notably, the variability of half-life values for individual compounds is large and only about one order of magnitude lower than the entire range of median half-life values spanned by all compounds, demonstrating the need to consider study conditions in the development of more accurate models for biotransformation prediction. We further show how the data can be used to find missing rules relevant for predicting soil biotransformation pathways. From this analysis, we present eight examples of reaction types that should trigger the formulation of new biotransformation rules, e.g., Ar-OH methylation, or the extension of existing rules, e.g., hydroxylation in aliphatic rings. The data were also used to explore, by way of example, the dependence of half-lives of different amide pesticides on chemical class and experimental parameters. This analysis highlighted the value of considering initial transformation reactions for the development of meaningful quantitative structure-biotransformation relationships (QSBR), which is a novel opportunity offered by the simultaneous encoding of transformation reactions and corresponding half-lives in Eawag-Soil. Overall, Eawag-Soil provides an unprecedentedly rich collection of manually extracted and curated biotransformation data, which should be useful in a great variety of applications.
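A minimal sketch of the kind of within-compound half-life variability analysis described above is shown below; the column names and DT50 values are assumptions for illustration, not records from the Eawag-Soil package.

```python
import math
import pandas as pd

# Illustrative sketch: summarise per-compound variability of soil half-lives (DT50).
# The data frame stands in for values extracted from regulatory soil studies.
df = pd.DataFrame({
    "compound": ["atrazine", "atrazine", "atrazine", "metolachlor", "metolachlor"],
    "halflife_days": [35.0, 60.0, 146.0, 15.0, 67.0],
})
stats = df.groupby("compound")["halflife_days"].agg(["median", "min", "max", "count"])
# Spread expressed in orders of magnitude (log10 of max/min) per compound.
stats["spread_log10"] = (stats["max"] / stats["min"]).map(lambda r: round(math.log10(r), 2))
print(stats)
```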
Reconciled rat and human metabolic networks for comparative toxicogenomics and biomarker predictions
Blais, Edik M.; Rawls, Kristopher D.; Dougherty, Bonnie V.; Li, Zhuo I.; Kolling, Glynis L.; Ye, Ping; Wallqvist, Anders; Papin, Jason A.
2017-01-01
The laboratory rat has been used as a surrogate to study human biology for more than a century. Here we present the first genome-scale network reconstruction of Rattus norvegicus metabolism, iRno, and a significantly improved reconstruction of human metabolism, iHsa. These curated models comprehensively capture metabolic features known to distinguish rats from humans including vitamin C and bile acid synthesis pathways. After reconciling network differences between iRno and iHsa, we integrate toxicogenomics data from rat and human hepatocytes, to generate biomarker predictions in response to 76 drugs. We validate comparative predictions for xanthine derivatives with new experimental data and literature-based evidence delineating metabolite biomarkers unique to humans. Our results provide mechanistic insights into species-specific metabolism and facilitate the selection of biomarkers consistent with rat and human biology. These models can serve as powerful computational platforms for contextualizing experimental data and making functional predictions for clinical and basic science applications. PMID:28176778
Li, Zhao; Li, Jin; Yu, Peng
2018-01-01
Abstract Metadata curation has become increasingly important for biological discovery and biomedical research because a large amount of heterogeneous biological data is currently freely available. To facilitate efficient metadata curation, we developed an easy-to-use web-based curation application, GEOMetaCuration, for curating the metadata of Gene Expression Omnibus datasets. It can eliminate mechanical operations that consume precious curation time and can help coordinate curation efforts among multiple curators. It improves the curation process by introducing various features that are critical to metadata curation, such as a back-end curation management system and a curator-friendly front-end. The application is based on a commonly used web development framework of Python/Django and is open-sourced under the GNU General Public License V3. GEOMetaCuration is expected to benefit the biocuration community and to contribute to computational generation of biological insights using large-scale biological data. An example use case can be found at the demo website: http://geometacuration.yubiolab.org. Database URL: https://bitbucket.com/yubiolab/GEOMetaCuration PMID:29688376
OntoMate: a text-mining tool aiding curation at the Rat Genome Database
Liu, Weisong; Laulederkind, Stanley J. F.; Hayman, G. Thomas; Wang, Shur-Jen; Nigam, Rajni; Smith, Jennifer R.; De Pons, Jeff; Dwinell, Melinda R.; Shimoyama, Mary
2015-01-01
The Rat Genome Database (RGD) is the premier repository of rat genomic, genetic and physiologic data. Converting data from free text in the scientific literature to a structured format is one of the main tasks of all model organism databases. RGD spends considerable effort manually curating gene, Quantitative Trait Locus (QTL) and strain information. The rapidly growing volume of biomedical literature and the active research in the biological natural language processing (bioNLP) community have given RGD the impetus to adopt text-mining tools to improve curation efficiency. Recently, RGD has initiated a project to use OntoMate, an ontology-driven, concept-based literature search engine developed at RGD, as a replacement for the PubMed (http://www.ncbi.nlm.nih.gov/pubmed) search engine in the gene curation workflow. OntoMate tags abstracts with gene names, gene mutations, organism name and most of the 16 ontologies/vocabularies used at RGD. All terms/entities tagged to an abstract are listed with the abstract in the search results. All listed terms are linked both to data entry boxes and a term browser in the curation tool. OntoMate also provides user-activated filters for species, date and other parameters relevant to the literature search. Using the system for literature search and import has streamlined the process compared to using PubMed. The system was built with a scalable and open architecture, including features specifically designed to accelerate the RGD gene curation process. With the use of bioNLP tools, RGD has added more automation to its curation workflow. Database URL: http://rgd.mcw.edu PMID:25619558
Convergent genetic and expression data implicate immunity in Alzheimer's disease
Jones, Lesley; Lambert, Jean-Charles; Wang, Li-San; Choi, Seung-Hoan; Harold, Denise; Vedernikov, Alexey; Escott-Price, Valentina; Stone, Timothy; Richards, Alexander; Bellenguez, Céline; Ibrahim-Verbaas, Carla A; Naj, Adam C; Sims, Rebecca; Gerrish, Amy; Jun, Gyungah; DeStefano, Anita L; Bis, Joshua C; Beecham, Gary W; Grenier-Boley, Benjamin; Russo, Giancarlo; Thornton-Wells, Tricia A; Jones, Nicola; Smith, Albert V; Chouraki, Vincent; Thomas, Charlene; Ikram, M Arfan; Zelenika, Diana; Vardarajan, Badri N; Kamatani, Yoichiro; Lin, Chiao-Feng; Schmidt, Helena; Kunkle, Brian; Dunstan, Melanie L; Ruiz, Agustin; Bihoreau, Marie-Thérèse; Reitz, Christiane; Pasquier, Florence; Hollingworth, Paul; Hanon, Olivier; Fitzpatrick, Annette L; Buxbaum, Joseph D; Campion, Dominique; Crane, Paul K; Becker, Tim; Gudnason, Vilmundur; Cruchaga, Carlos; Craig, David; Amin, Najaf; Berr, Claudine; Lopez, Oscar L; De Jager, Philip L; Deramecourt, Vincent; Johnston, Janet A; Evans, Denis; Lovestone, Simon; Letteneur, Luc; Kornhuber, Johanes; Tárraga, Lluís; Rubinsztein, David C; Eiriksdottir, Gudny; Sleegers, Kristel; Goate, Alison M; Fiévet, Nathalie; Huentelman, Matthew J; Gill, Michael; Emilsson, Valur; Brown, Kristelle; Kamboh, M Ilyas; Keller, Lina; Barberger-Gateau, Pascale; McGuinness, Bernadette; Larson, Eric B; Myers, Amanda J; Dufouil, Carole; Todd, Stephen; Wallon, David; Love, Seth; Kehoe, Pat; Rogaeva, Ekaterina; Gallacher, John; George-Hyslop, Peter St; Clarimon, Jordi; Lleὀ, Alberti; Bayer, Anthony; Tsuang, Debby W; Yu, Lei; Tsolaki, Magda; Bossù, Paola; Spalletta, Gianfranco; Proitsi, Petra; Collinge, John; Sorbi, Sandro; Garcia, Florentino Sanchez; Fox, Nick; Hardy, John; Naranjo, Maria Candida Deniz; Razquin, Cristina; Bosco, Paola; Clarke, Robert; Brayne, Carol; Galimberti, Daniela; Mancuso, Michelangelo; Moebus, Susanne; Mecocci, Patrizia; del Zompo, Maria; Maier, Wolfgang; Hampel, Harald; Pilotto, Alberto; Bullido, Maria; Panza, Francesco; Caffarra, Paolo; Nacmias, Benedetta; Gilbert, John R; Mayhaus, Manuel; Jessen, Frank; Dichgans, Martin; Lannfelt, Lars; Hakonarson, Hakon; Pichler, Sabrina; Carrasquillo, Minerva M; Ingelsson, Martin; Beekly, Duane; Alavarez, Victoria; Zou, Fanggeng; Valladares, Otto; Younkin, Steven G; Coto, Eliecer; Hamilton-Nelson, Kara L; Mateo, Ignacio; Owen, Michael J; Faber, Kelley M; Jonsson, Palmi V; Combarros, Onofre; O'Donovan, Michael C; Cantwell, Laura B; Soininen, Hilkka; Blacker, Deborah; Mead, Simon; Mosley, Thomas H; Bennett, David A; Harris, Tamara B; Fratiglioni, Laura; Holmes, Clive; de Bruijn, Renee FAG; Passmore, Peter; Montine, Thomas J; Bettens, Karolien; Rotter, Jerome I; Brice, Alexis; Morgan, Kevin; Foroud, Tatiana M; Kukull, Walter A; Hannequin, Didier; Powell, John F; Nalls, Michael A; Ritchie, Karen; Lunetta, Kathryn L; Kauwe, John SK; Boerwinkle, Eric; Riemenschneider, Matthias; Boada, Mercè; Hiltunen, Mikko; Martin, Eden R; Pastor, Pau; Schmidt, Reinhold; Rujescu, Dan; Dartigues, Jean-François; Mayeux, Richard; Tzourio, Christophe; Hofman, Albert; Nöthen, Markus M; Graff, Caroline; Psaty, Bruce M; Haines, Jonathan L; Lathrop, Mark; Pericak-Vance, Margaret A; Launer, Lenore J; Farrer, Lindsay A; van Duijn, Cornelia M; Van Broekhoven, Christine; Ramirez, Alfredo; Schellenberg, Gerard D; Seshadri, Sudha; Amouyel, Philippe; Holmans, Peter A
2015-01-01
Background Late-onset Alzheimer's disease (AD) is heritable with 20 genes showing genome-wide association in the International Genomics of Alzheimer's Project (IGAP). To identify the biology underlying the disease we extended these genetic data in a pathway analysis. Methods The ALIGATOR and GSEA algorithms were used in the IGAP data to identify associated functional pathways and correlated gene expression networks in human brain. Results ALIGATOR identified an excess of curated biological pathways showing enrichment of association. Enriched areas of biology included the immune response (p = 3.27×10^-12 after multiple testing correction for pathways), regulation of endocytosis (p = 1.31×10^-11), cholesterol transport (p = 2.96×10^-9) and proteasome-ubiquitin activity (p = 1.34×10^-6). Correlated gene expression analysis identified four significant network modules, all related to the immune response (corrected p = 0.002–0.05). Conclusions The immune response, regulation of endocytosis, cholesterol transport and protein ubiquitination represent prime targets for AD therapeutics. PMID:25533204
Convergent genetic and expression data implicate immunity in Alzheimer's disease.
2015-06-01
Late-onset Alzheimer's disease (AD) is heritable with 20 genes showing genome-wide association in the International Genomics of Alzheimer's Project (IGAP). To identify the biology underlying the disease, we extended these genetic data in a pathway analysis. The ALIGATOR and GSEA algorithms were used in the IGAP data to identify associated functional pathways and correlated gene expression networks in human brain. ALIGATOR identified an excess of curated biological pathways showing enrichment of association. Enriched areas of biology included the immune response (P = 3.27 × 10(-12) after multiple testing correction for pathways), regulation of endocytosis (P = 1.31 × 10(-11)), cholesterol transport (P = 2.96 × 10(-9)), and proteasome-ubiquitin activity (P = 1.34 × 10(-6)). Correlated gene expression analysis identified four significant network modules, all related to the immune response (corrected P = .002-.05). The immune response, regulation of endocytosis, cholesterol transport, and protein ubiquitination represent prime targets for AD therapeutics. Copyright © 2015. Published by Elsevier Inc.
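The enrichment statistics reported in the two records above can be related to a basic over-representation test. The sketch below is not ALIGATOR or GSEA themselves (both add gene-wide significance thresholds, resampling, and correction for gene correlation); it only shows the hypergeometric tail calculation that underlies this style of pathway analysis, with illustrative counts rather than IGAP numbers.

```python
from scipy.stats import hypergeom

# Over-representation sketch: of N background genes, K belong to a pathway; of the
# n disease-associated genes, k fall in that pathway. The enrichment P value is the
# hypergeometric upper tail P(X >= k). Counts below are placeholders.
N, K, n, k = 20000, 300, 600, 25
p_value = hypergeom.sf(k - 1, N, K, n)    # sf(k-1) gives P(X >= k)
print(f"enrichment p = {p_value:.2e}")
```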
Kayala, Matthew A; Baldi, Pierre
2012-10-22
Proposing reasonable mechanisms and predicting the course of chemical reactions is important to the practice of organic chemistry. Approaches to reaction prediction have historically used obfuscating representations and manually encoded patterns or rules. Here we present ReactionPredictor, a machine learning approach to reaction prediction that models elementary, mechanistic reactions as interactions between approximate molecular orbitals (MOs). A training data set of productive reactions known to occur at reasonable rates and yields and verified by inclusion in the literature or textbooks is derived from an existing rule-based system and expanded upon with manual curation from graduate level textbooks. Using this training data set of complex polar, hypervalent, radical, and pericyclic reactions, a two-stage machine learning prediction framework is trained and validated. In the first stage, filtering models trained at the level of individual MOs are used to reduce the space of possible reactions to consider. In the second stage, ranking models over the filtered space of possible reactions are used to order the reactions such that the productive reactions are the top ranked. The resulting model, ReactionPredictor, perfectly ranks polar reactions 78.1% of the time and recovers all productive reactions 95.7% of the time when allowing for small numbers of errors. Pericyclic and radical reactions are perfectly ranked 85.8% and 77.0% of the time, respectively, rising to >93% recovery for both reaction types with a small number of allowed errors. Decisions about which of the polar, pericyclic, or radical reaction type ranking models to use can be made with >99% accuracy. Finally, for multistep reaction pathways, we implement the first mechanistic pathway predictor using constrained tree-search to discover a set of reasonable mechanistic steps from given reactants to given products. Webserver implementations of both the single step and pathway versions of ReactionPredictor are available via the chemoinformatics portal http://cdb.ics.uci.edu/.
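The two-stage structure described above (filter, then rank) can be sketched with placeholder scores standing in for the trained molecular-orbital-level and reaction-level models; the candidate reactions, score values, and threshold below are illustrative assumptions, not ReactionPredictor's actual models.

```python
# Filter-then-rank sketch in the spirit of ReactionPredictor. Each candidate pairs a
# source (filled) MO score with a sink (unfilled) MO score from hypothetical models.
candidates = [
    {"reaction": "enolate + alkyl halide", "src_score": 0.92, "sink_score": 0.88},
    {"reaction": "amine + ester",          "src_score": 0.35, "sink_score": 0.81},
    {"reaction": "alkene + HBr",           "src_score": 0.74, "sink_score": 0.69},
]

# Stage 1: filtering models prune MOs unlikely to react, shrinking the search space.
FILTER_THRESHOLD = 0.5
survivors = [c for c in candidates
             if c["src_score"] >= FILTER_THRESHOLD and c["sink_score"] >= FILTER_THRESHOLD]

# Stage 2: a ranking model orders the surviving source/sink combinations so that
# productive reactions appear first (here, simply the product of the two scores).
ranked = sorted(survivors, key=lambda c: c["src_score"] * c["sink_score"], reverse=True)
for c in ranked:
    print(c["reaction"], round(c["src_score"] * c["sink_score"], 3))
```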
Drug Evaluation in the Plasmodium Falciparum - Aotus Model
1986-10-01
blood schizonticidal/curative activity of experimental antimalarial drugs. WR 245082, an acridineamine, at similar doses cured infections of chloroquine ...Guinea - Chesson strain). The curative activity of WR 245082, an acridineamine, for chloroquine-sensitive and chloroquine-resistant strains of P...antimalarial activity of two analogues of the amino acid histidine was assessed against infections of the Uganda Palo Alto strain. WR 251853, 2-fluoro-l
BiGG: a Biochemical Genetic and Genomic knowledgebase of large scale metabolic reconstructions
2010-01-01
Background Genome-scale metabolic reconstructions under the Constraint Based Reconstruction and Analysis (COBRA) framework are valuable tools for analyzing the metabolic capabilities of organisms and interpreting experimental data. As the number of such reconstructions and analysis methods increases, there is a greater need for data uniformity and ease of distribution and use. Description We describe BiGG, a knowledgebase of Biochemically, Genetically and Genomically structured genome-scale metabolic network reconstructions. BiGG integrates several published genome-scale metabolic networks into one resource with standard nomenclature which allows components to be compared across different organisms. BiGG can be used to browse model content, visualize metabolic pathway maps, and export SBML files of the models for further analysis by external software packages. Users may follow links from BiGG to several external databases to obtain additional information on genes, proteins, reactions, metabolites and citations of interest. Conclusions BiGG addresses a need in the systems biology community to have access to high quality curated metabolic models and reconstructions. It is freely available for academic use at http://bigg.ucsd.edu. PMID:20426874
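One common downstream use of the SBML files exported from BiGG is analysis with the COBRApy package; the sketch below assumes a model file (here named e_coli_core.xml, a hypothetical local download) is already available, and simply loads it and runs flux balance analysis.

```python
import cobra

# Load an SBML model exported from BiGG and run flux balance analysis with COBRApy.
# The filename is an assumption; any BiGG SBML export would work the same way.
model = cobra.io.read_sbml_model("e_coli_core.xml")
solution = model.optimize()                     # maximise the model's objective (e.g. growth)
print("objective value:", solution.objective_value)
print("number of reactions:", len(model.reactions))
print("number of metabolites:", len(model.metabolites))
```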
Development of Computational Tools for Metabolic Model Curation, Flux Elucidation and Strain Design
DOE Office of Scientific and Technical Information (OSTI.GOV)
Maranas, Costas D
An overarching goal of the Department of Energy mission is the efficient deployment and engineering of microbial and plant systems to enable biomass conversion in pursuit of high energy density liquid biofuels. This has spurred the pace at which new organisms are sequenced and annotated. This torrent of genomic information has opened the door to understanding metabolism in not just skeletal pathways and a handful of microorganisms but for truly genome-scale reconstructions derived for hundreds of microbes and plants. Understanding and redirecting metabolism is crucial because metabolic fluxes are unique descriptors of cellular physiology that directly assess the current cellular state and quantify the effect of genetic engineering interventions. At the same time, however, trying to keep pace with the rate of genomic data generation has ushered in a number of modeling and computational challenges related to (i) the automated assembly, testing and correction of genome-scale metabolic models, (ii) metabolic flux elucidation using labeled isotopes, and (iii) comprehensive identification of engineering interventions leading to the desired metabolism redirection.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Tseng, Yolanda D., E-mail: ydt2@uw.edu; Chen, Yu-Hui; Catalano, Paul J.
Purpose: To evaluate the response rate (RR) and time to local recurrence (TTLR) among patients who received salvage radiation therapy for relapsed or refractory aggressive non-Hodgkin lymphoma (NHL) and investigate whether RR and TTLR differed according to disease characteristics. Methods and Materials: A retrospective review was performed for all patients who completed a course of salvage radiation therapy between January 2001 and May 2011 at Brigham and Women's Hospital/Dana-Farber Cancer Institute. Separate analyses were conducted for patients treated with palliative and curative intent. Predictors of RR for each subgroup were assessed using a generalized estimating equation model. For patients treated with curative intent, local control (LC) and progression-free survival were estimated with the Kaplan-Meier method; predictors for TTLR were evaluated using a Cox proportional hazards regression model. Results: Salvage radiation therapy was used to treat 110 patients to 121 sites (76 curative, 45 palliative). Salvage radiation therapy was given as part of consolidation in 18% of patients treated with curative intent. Median dose was 37.8 Gy, with 58% and 36% of curative and palliative patients, respectively, receiving 39.6 Gy or higher. The RR was high (86% curative, 84% palliative). With a median follow-up of 4.8 years among living patients, 5-year LC and progression-free survival for curative patients were 66% and 34%, respectively. Refractory disease (hazard ratio 3.3; P=.024) and lack of response to initial chemotherapy (hazard ratio 4.3; P=.007) but not dose (P=.93) were associated with shorter TTLR. Despite doses of 39.6 Gy or higher, 2-year LC was only 61% for definitive patients with refractory disease or disease that did not respond to initial chemotherapy. Conclusions: Relapsed or refractory aggressive NHL is responsive to salvage radiation therapy, and durable LC can be achieved in some cases. However, refractory disease is associated with a shorter TTLR, suggesting that radiation dose escalation, addition of radiosensitizers, or a combination of both may be indicated in these patients.
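For readers unfamiliar with the survival methods named above, the sketch below shows the general shape of such an analysis using the lifelines library. The data frame, column names, and values are hypothetical placeholders rather than the study data, and the original study also used a generalized estimating equation model for response rate, which is not shown here.

```python
import pandas as pd
from lifelines import KaplanMeierFitter, CoxPHFitter

# Hypothetical time-to-local-recurrence data: durations in months, an event flag,
# and two binary predictors mirroring those discussed in the abstract.
df = pd.DataFrame({
    "months_to_recurrence": [6, 14, 30, 48, 60, 9, 22, 55, 40, 12],
    "recurred":             [1, 1, 0, 0, 0, 1, 1, 0, 0, 1],
    "refractory_disease":   [1, 1, 0, 0, 0, 1, 0, 1, 1, 0],
    "no_response_to_chemo": [1, 0, 0, 0, 1, 1, 1, 0, 0, 0],
})

kmf = KaplanMeierFitter()
kmf.fit(df["months_to_recurrence"], event_observed=df["recurred"])
print(kmf.survival_function_.tail(1))           # Kaplan-Meier estimate of local control

cph = CoxPHFitter()
cph.fit(df, duration_col="months_to_recurrence", event_col="recurred")
cph.print_summary()                             # hazard ratios for the two predictors
```

lifelines is used only because it is a widely available survival-analysis package; the record does not state which software the authors used.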
Signaling Network Map of Endothelial TEK Tyrosine Kinase
Sandhya, Varot K.; Singh, Priyata; Parthasarathy, Deepak; Kumar, Awinav; Gattu, Rudrappa; Mathur, Premendu Prakash; Mac Gabhann, F.; Pandey, Akhilesh
2014-01-01
TEK tyrosine kinase is primarily expressed on endothelial cells and is most commonly referred to as TIE2. TIE2 is a receptor tyrosine kinase modulated by its ligands, angiopoietins, to regulate the development and remodeling of the vascular system. It is also one of the critical pathways associated with tumor angiogenesis and familial venous malformations. Apart from the vascular system, TIE2 signaling is also associated with postnatal hematopoiesis. Despite the involvement of the TIE2-angiopoietin system in several diseases, the downstream molecular events of TIE2-angiopoietin signaling are not reported in any pathway repository. Therefore, carrying out a detailed review of published literature, we have documented molecular signaling events mediated by TIE2 in response to angiopoietins and developed a network map of TIE2 signaling. The pathway information is freely available to the scientific community through NetPath, a manually curated resource of signaling pathways. We hope that this pathway resource will provide an in-depth view of TIE2-angiopoietin signaling and will lead to identification of potential therapeutic targets for TIE2-angiopoietin associated disorders. PMID:25371820
A genome-scale metabolic model of the lipid-accumulating yeast Yarrowia lipolytica
2012-01-01
Background Yarrowia lipolytica is an oleaginous yeast which has emerged as an important microorganism for several biotechnological processes, such as the production of organic acids, lipases and proteases. It is also considered a good candidate for single-cell oil production. Although some of its metabolic pathways are well studied, its metabolic engineering is hindered by the lack of a genome-scale model that integrates the current knowledge about its metabolism. Results Combining in silico tools and expert manual curation, we have produced an accurate genome-scale metabolic model for Y. lipolytica. Using a scaffold derived from a functional metabolic model of the well-studied but phylogenetically distant yeast S. cerevisiae, we mapped conserved reactions, rewrote gene associations, added species-specific reactions and inserted specialized copies of scaffold reactions to account for species-specific expansion of protein families. We used physiological measures obtained under lab conditions to validate our predictions. Conclusions Y. lipolytica iNL895 represents the first well-annotated metabolic model of an oleaginous yeast, providing a base for future metabolic improvement, and a starting point for the metabolic reconstruction of other species in the Yarrowia clade and other oleaginous yeasts. PMID:22558935
2012-01-01
Background Pathway data are important for understanding the relationship between genes, proteins and many other molecules in living organisms. Pathway gene relationships are crucial information for guidance, prediction, reference and assessment in biochemistry, computational biology, and medicine. Many well-established databases--e.g., KEGG, WikiPathways, and BioCyc--are dedicated to collecting pathway data for public access. However, the effectiveness of these databases is hindered by issues such as incompatible data formats, inconsistent molecular representations, inconsistent molecular relationship representations, inconsistent referrals to pathway names, and incomprehensive data from different databases. Results In this paper, we overcome these issues through extraction, normalization and integration of pathway data from several major public databases (KEGG, WikiPathways, BioCyc, etc). We build a database that not only hosts our integrated pathway gene relationship data for public access but also maintains the necessary updates in the long run. This public repository is named IntPath (Integrated Pathway gene relationship database for model organisms and important pathogens). Four organisms--S. cerevisiae, M. tuberculosis H37Rv, H. sapiens and M. musculus--are included in this version (V2.0) of IntPath. IntPath uses the "full unification" approach to ensure no deletion and no introduced noise in this process. Therefore, IntPath contains much richer pathway-gene and pathway-gene pair relationships and a much larger number of non-redundant genes and gene pairs than any of the single-source databases. The gene relationships of each gene (measured by average node degree) per pathway are significantly richer. The gene relationships in each pathway (measured by average number of gene pairs per pathway) are also considerably richer in the integrated pathways. Moderate manual curation was involved to remove errors and noise from the source data (e.g., the gene ID errors in WikiPathways and relationship errors in KEGG). We turn complicated and incompatible XML data formats and inconsistent gene and gene relationship representations from different source databases into normalized and unified pathway-gene and pathway-gene pair relationships neatly recorded in simple tab-delimited text format and MySQL tables, which facilitates convenient automatic computation and large-scale referencing in many related studies. IntPath data can be downloaded in text format or as a MySQL dump. IntPath data can also be retrieved and analyzed conveniently through a web service by local programs or through a web interface by mouse clicks. Several useful analysis tools are also provided in IntPath. Conclusions We have overcome in IntPath the issues of compatibility, consistency, and comprehensiveness that often hamper effective use of pathway databases. We have included four organisms in the current release of IntPath. Our methodology and programs described in this work can be easily applied to other organisms; and we will include more model organisms and important pathogens in future releases of IntPath. IntPath maintains regular updates and is freely available at http://compbio.ddns.comp.nus.edu.sg:8080/IntPath. PMID:23282057
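Because IntPath distributes pathway-gene relationships as simple tab-delimited text, downstream scripts can consume them directly. The sketch below assumes a hypothetical two-column file (pathway, gene) and summarizes per-pathway gene counts; the file name and exact column layout are assumptions for illustration, not IntPath's published schema.

```python
import csv
from collections import defaultdict

# Read a hypothetical tab-delimited pathway-gene file and summarise each pathway.
genes_per_pathway = defaultdict(set)
with open("intpath_pathway_gene.txt", newline="") as handle:
    for pathway, gene in csv.reader(handle, delimiter="\t"):   # assumes two columns per row
        genes_per_pathway[pathway].add(gene)

for pathway, genes in sorted(genes_per_pathway.items()):
    n = len(genes)
    max_pairs = n * (n - 1) // 2            # upper bound on gene pairs within the pathway
    print(f"{pathway}\t{n} genes\t<= {max_pairs} gene pairs")
```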
Standards-based curation of a decade-old digital repository dataset of molecular information.
Harvey, Matthew J; Mason, Nicholas J; McLean, Andrew; Murray-Rust, Peter; Rzepa, Henry S; Stewart, James J P
2015-01-01
The desirable curation of 158,122 molecular geometries derived from the NCI set of reference molecules together with associated properties computed using the MOPAC semi-empirical quantum mechanical method and originally deposited in 2005 into the Cambridge DSpace repository as a data collection is reported. The procedures involved in the curation included annotation of the original data using new MOPAC methods, updating the syntax of the CML documents used to express the data to ensure schema conformance and adding new metadata describing the entries, together with an XML schema transformation to map the metadata schema to that used by the DataCite organisation. We have adopted a granularity model in which a DataCite persistent identifier (DOI) is created for each individual molecule to enable data discovery and data metrics at this level using DataCite tools. We recommend that the future research data management (RDM) of the scientific and chemical data components associated with journal articles (the "supporting information") should be conducted in a manner that facilitates automatic periodic curation. Graphical abstract: Standards and metadata-based curation of a decade-old digital repository dataset of molecular information.
ChlamyCyc: an integrative systems biology database and web-portal for Chlamydomonas reinhardtii.
May, Patrick; Christian, Jan-Ole; Kempa, Stefan; Walther, Dirk
2009-05-04
The unicellular green alga Chlamydomonas reinhardtii is an important eukaryotic model organism for the study of photosynthesis and plant growth. In the era of modern high-throughput technologies there is an imperative need to integrate large-scale data sets from high-throughput experimental techniques using computational methods and database resources to provide comprehensive information about the molecular and cellular organization of a single organism. In the framework of the German Systems Biology initiative GoFORSYS, a pathway database and web-portal for Chlamydomonas (ChlamyCyc) was established, which currently features about 250 metabolic pathways with associated genes, enzymes, and compound information. ChlamyCyc was assembled using an integrative approach combining the recently published genome sequence, bioinformatics methods, and experimental data from metabolomics and proteomics experiments. We analyzed and integrated a combination of primary and secondary database resources, such as existing genome annotations from JGI, EST collections, orthology information, and MapMan classification. ChlamyCyc provides a curated and integrated systems biology repository that will enable and assist in systematic studies of fundamental cellular processes in Chlamydomonas. The ChlamyCyc database and web-portal is freely available under http://chlamycyc.mpimp-golm.mpg.de.
The Biomolecular Interaction Network Database and related tools 2005 update
Alfarano, C.; Andrade, C. E.; Anthony, K.; Bahroos, N.; Bajec, M.; Bantoft, K.; Betel, D.; Bobechko, B.; Boutilier, K.; Burgess, E.; Buzadzija, K.; Cavero, R.; D'Abreo, C.; Donaldson, I.; Dorairajoo, D.; Dumontier, M. J.; Dumontier, M. R.; Earles, V.; Farrall, R.; Feldman, H.; Garderman, E.; Gong, Y.; Gonzaga, R.; Grytsan, V.; Gryz, E.; Gu, V.; Haldorsen, E.; Halupa, A.; Haw, R.; Hrvojic, A.; Hurrell, L.; Isserlin, R.; Jack, F.; Juma, F.; Khan, A.; Kon, T.; Konopinsky, S.; Le, V.; Lee, E.; Ling, S.; Magidin, M.; Moniakis, J.; Montojo, J.; Moore, S.; Muskat, B.; Ng, I.; Paraiso, J. P.; Parker, B.; Pintilie, G.; Pirone, R.; Salama, J. J.; Sgro, S.; Shan, T.; Shu, Y.; Siew, J.; Skinner, D.; Snyder, K.; Stasiuk, R.; Strumpf, D.; Tuekam, B.; Tao, S.; Wang, Z.; White, M.; Willis, R.; Wolting, C.; Wong, S.; Wrong, A.; Xin, C.; Yao, R.; Yates, B.; Zhang, S.; Zheng, K.; Pawson, T.; Ouellette, B. F. F.; Hogue, C. W. V.
2005-01-01
The Biomolecular Interaction Network Database (BIND) (http://bind.ca) archives biomolecular interaction, reaction, complex and pathway information. Our aim is to curate the details about molecular interactions that arise from published experimental research and to provide this information, as well as tools to enable data analysis, freely to researchers worldwide. BIND data are curated into a comprehensive machine-readable archive of computable information and provides users with methods to discover interactions and molecular mechanisms. BIND has worked to develop new methods for visualization that amplify the underlying annotation of genes and proteins to facilitate the study of molecular interaction networks. BIND has maintained an open database policy since its inception in 1999. Data growth has proceeded at a tremendous rate, approaching over 100 000 records. New services provided include a new BIND Query and Submission interface, a Standard Object Access Protocol service and the Small Molecule Interaction Database (http://smid.blueprint.org) that allows users to determine probable small molecule binding sites of new sequences and examine conserved binding residues. PMID:15608229
DEOP: a database on osmoprotectants and associated pathways
Bougouffa, Salim; Radovanovic, Aleksandar; Essack, Magbubah; Bajic, Vladimir B.
2014-01-01
Microorganisms are known to counteract salt stress through salt influx or by the accumulation of osmoprotectants (also called compatible solutes). Understanding the pathways that synthesize and/or break down these osmoprotectants is of interest to studies of crop halotolerance and to biotechnology applications that use microbes as cell factories for production of biomass or commercial chemicals. To facilitate the exploration of osmoprotectants, we have developed the first online resource, ‘Dragon Explorer of Osmoprotection associated Pathways’ (DEOP), that gathers and presents curated information about osmoprotectants, complemented by information about reactions and pathways that use or affect them. A combined total of 141 compounds were confirmed osmoprotectants, which were matched to 1883 reactions and 834 pathways. DEOP can also be used to map genes or microbial genomes to potential osmoprotection-associated pathways, and thus link genes and genomes to other associated osmoprotection information. Moreover, DEOP provides a text-mining utility to search deeper into the scientific literature for supporting evidence or for new associations of osmoprotectants to pathways, reactions, enzymes, genes or organisms. Two case studies are provided to demonstrate the usefulness of DEOP. The system can be accessed at the address below. Database URL: http://www.cbrc.kaust.edu.sa/deop/ PMID:25326239
NemaPath: online exploration of KEGG-based metabolic pathways for nematodes
Wylie, Todd; Martin, John; Abubucker, Sahar; Yin, Yong; Messina, David; Wang, Zhengyuan; McCarter, James P; Mitreva, Makedonka
2008-01-01
Background Nematode.net is a web-accessible resource for investigating gene sequences from parasitic and free-living nematode genomes. Beyond the well-characterized model nematode C. elegans, over 500,000 expressed sequence tags (ESTs) and nearly 600,000 genome survey sequences (GSSs) have been generated from 36 nematode species as part of the Parasitic Nematode Genomics Program undertaken by the Genome Center at Washington University School of Medicine. However, these sequencing data are not present in most publicly available protein databases, which only include sequences in Swiss-Prot. Swiss-Prot, in turn, relies on GenBank/Embl/DDJP for predicted proteins from complete genomes or full-length proteins. Description Here we present the NemaPath pathway server, a web-based pathway-level visualization tool for navigating putative metabolic pathways for over 30 nematode species, including 27 parasites. The NemaPath approach consists of two parts: 1) a backend tool to align and evaluate nematode genomic sequences (curated EST contigs) against the annotated Kyoto Encyclopedia of Genes and Genomes (KEGG) protein database; 2) a web viewing application that displays annotated KEGG pathway maps based on desired confidence levels of primary sequence similarity as defined by a user. NemaPath also provides cross-referenced access to nematode genome information provided by other tools available on Nematode.net, including: detailed NemaGene EST cluster information; putative translations; GBrowse EST cluster views; links from nematode data to external databases for corresponding synonymous C. elegans counterparts, subject matches in KEGG's gene database, and also KEGG Ontology (KO) identification. Conclusion The NemaPath server hosts metabolic pathway mappings for 30 nematode species and is available on the World Wide Web at . The nematode source sequences used for the metabolic pathway mappings are available via FTP, as provided by the Genome Center at Washington University School of Medicine. PMID:18983679
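The NemaPath mapping step, keeping alignments against KEGG that pass a user-chosen confidence level, can be sketched as below. The contig identifiers, KO numbers, and E-values are illustrative placeholders for parsed alignment output, not NemaPath's internal code.

```python
# Sketch of confidence-filtered EST-to-KEGG mapping: retain alignments whose E-value
# passes the user's cutoff, then collect the KEGG Orthology (KO) identifiers that
# would be used to colour pathway maps. Tuples are (contig, KO, E-value) placeholders.
hits = [
    ("contig_0001", "K00844", 1e-45),
    ("contig_0002", "K01810", 3e-12),
    ("contig_0003", "K00850", 2e-03),
]

def map_to_kegg(hits, evalue_cutoff):
    """Return the KO identifiers supported at the requested confidence level."""
    return sorted({ko for _, ko, evalue in hits if evalue <= evalue_cutoff})

print(map_to_kegg(hits, evalue_cutoff=1e-5))   # stricter mapping
print(map_to_kegg(hits, evalue_cutoff=1e-2))   # more permissive mapping
```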
NASA Astrophysics Data System (ADS)
Palmer, C. L.; Mayernik, M. S.; Weber, N.; Baker, K. S.; Kelly, K.; Marlino, M. R.; Thompson, C. A.
2013-12-01
The need for data curation is being recognized in numerous institutional settings as national research funding agencies extend data archiving mandates to cover more types of research grants. Data curation, however, is not only a practical challenge. It presents many conceptual and theoretical challenges that must be investigated to design appropriate technical systems, social practices and institutions, policies, and services. This presentation reports on outcomes from an investigation of research problems in data curation conducted as part of the Data Curation Education in Research Centers (DCERC) program. DCERC is developing a new model for educating data professionals to contribute to scientific research. The program is organized around foundational courses and field experiences in research and data centers for both master's and doctoral students. The initiative is led by the Graduate School of Library and Information Science at the University of Illinois at Urbana-Champaign, in collaboration with the School of Information Sciences at the University of Tennessee, and library and data professionals at the National Center for Atmospheric Research (NCAR). At the doctoral level DCERC is educating future faculty and researchers in data curation and establishing a research agenda to advance the field. The doctoral seminar, Research Problems in Data Curation, was developed and taught in 2012 by the DCERC principal investigator and two doctoral fellows at the University of Illinois. It was designed to define the problem space of data curation, examine relevant concepts and theories related to both technical and social perspectives, and articulate research questions that are either unexplored or under theorized in the current literature. There was a particular emphasis on the Earth and environmental sciences, with guest speakers brought in from NCAR, National Snow and Ice Data Center (NSIDC), and Rensselaer Polytechnic Institute. Through the assignments, students constructed dozens of research questions informed by class readings, presentations, and discussions. A technical report is in progress on the resulting research agenda covering: data standards; infrastructure; research context; data reuse; sharing and access; preservation; and conceptual foundations. This presentation will discuss the agenda and its importance for the geosciences, highlighting high priority research questions. It will also introduce the related research to be undertaken by two DCERC doctoral students at NCAR during the 2013-2014 academic year and other data curation research in progress by the doctoral DCERC team.
Wright, Felicity A; Bebawy, Mary; O'Brien, Tracey A
2015-01-01
Hematopoietic stem cell transplantation is a high-risk procedure that is offered, with curative intent, to patients with malignant and nonmalignant disease. The clinical benefits of personalization of therapy by genotyping have been demonstrated by the reduction in transplant-related mortality from donor-recipient HLA matching. However, defining the relationship between genotype and transplant conditioning agents is yet to be translated into clinical practice. A number of the therapeutic agents used in stem cell transplant preparative regimens have pharmacokinetic parameters that predict benefit of incorporating pharmacogenomic data into dosing strategies. Busulfan, cyclophosphamide, thio-TEPA and etoposide have well-described drug metabolism pathways; however, candidate gene studies have identified a gap in the pharmacogenomic data needed to improve transplant outcomes. Incorporating pharmacogenomics into pharmacokinetic modeling may demonstrate the therapeutic benefits of genotyping in transplant preparative regimen agents.
PDS4 - Some Principles for Agile Data Curation
NASA Astrophysics Data System (ADS)
Hughes, J. S.; Crichton, D. J.; Hardman, S. H.; Joyner, R.; Algermissen, S.; Padams, J.
2015-12-01
PDS4, a research data management and curation system for NASA's Planetary Science Archive, was developed using principles that promote the characteristics of agile development. The result is an efficient system that produces better research data products while using fewer resources (time, effort, and money) and maximizes their usefulness for current and future scientists. The key principle is architectural. The PDS4 information architecture is developed and maintained independently of the infrastructure's process, application and technology architectures. The information architecture is based on an ontology-based information model developed to leverage best practices from standard reference models for digital archives, digital object registries, and metadata registries and to capture domain knowledge from a panel of planetary science domain experts. The information model provides a sharable, stable, and formal set of information requirements for the system and is the primary source for information to configure most system components, including the product registry, search engine, validation and display tools, and production pipelines. Multi-level governance also allows for the effective management of the informational elements at the common, discipline, and project levels. This presentation will describe the development principles, components, and uses of the information model and how an information model-driven architecture exhibits characteristics of agile curation including early delivery, evolutionary development, adaptive planning, continuous improvement, and rapid and flexible response to change.
Gaussian graphical modeling reconstructs pathway reactions from high-throughput metabolomics data
2011-01-01
Background With the advent of high-throughput targeted metabolic profiling techniques, the question of how to interpret and analyze the resulting vast amount of data becomes more and more important. In this work we address the reconstruction of metabolic reactions from cross-sectional metabolomics data, that is without the requirement for time-resolved measurements or specific system perturbations. Previous studies in this area mainly focused on Pearson correlation coefficients, which however are generally incapable of distinguishing between direct and indirect metabolic interactions. Results In our new approach we propose the application of a Gaussian graphical model (GGM), an undirected probabilistic graphical model estimating the conditional dependence between variables. GGMs are based on partial correlation coefficients, that is pairwise Pearson correlation coefficients conditioned against the correlation with all other metabolites. We first demonstrate the general validity of the method and its advantages over regular correlation networks with computer-simulated reaction systems. Then we estimate a GGM on data from a large human population cohort, covering 1020 fasting blood serum samples with 151 quantified metabolites. The GGM is much sparser than the correlation network, shows a modular structure with respect to metabolite classes, and is stable to the choice of samples in the data set. On the example of human fatty acid metabolism, we demonstrate for the first time that high partial correlation coefficients generally correspond to known metabolic reactions. This feature is evaluated both manually by investigating specific pairs of high-scoring metabolites, and then systematically on a literature-curated model of fatty acid synthesis and degradation. Our method detects many known reactions along with possibly novel pathway interactions, representing candidates for further experimental examination. Conclusions In summary, we demonstrate strong signatures of intracellular pathways in blood serum data, and provide a valuable tool for the unbiased reconstruction of metabolic reactions from large-scale metabolomics data sets. PMID:21281499
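A minimal version of the partial-correlation calculation at the heart of a Gaussian graphical model is sketched below on simulated data: partial correlations are read off the inverse covariance (precision) matrix, so an indirect association that an ordinary Pearson correlation picks up largely disappears in the partial correlation. The simulated variables stand in for metabolite concentrations; a real analysis of many metabolites would typically add regularization.

```python
import numpy as np

# Gaussian graphical model sketch: partial correlations from the precision matrix,
# P_ij = -Theta_ij / sqrt(Theta_ii * Theta_jj). Data below are simulated.
rng = np.random.default_rng(1)
samples = rng.normal(size=(1000, 5))
samples[:, 1] += 0.8 * samples[:, 0]          # direct association: variable 0 -> 1
samples[:, 2] += 0.8 * samples[:, 1]          # indirect association: 0 -> 2 only via 1

precision = np.linalg.inv(np.cov(samples, rowvar=False))
d = np.sqrt(np.diag(precision))
partial_corr = -precision / np.outer(d, d)
np.fill_diagonal(partial_corr, 1.0)

print("Pearson corr(0, 2):", round(np.corrcoef(samples, rowvar=False)[0, 2], 2))
print("Partial corr(0, 2):", round(partial_corr[0, 2], 2))   # near zero: link is indirect
```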
Fishing for causes and cures of motor neuron disorders
Patten, Shunmoogum A.; Armstrong, Gary A. B.; Lissouba, Alexandra; Kabashi, Edor; Parker, J. Alex; Drapeau, Pierre
2014-01-01
Motor neuron disorders (MNDs) are a clinically heterogeneous group of neurological diseases characterized by progressive degeneration of motor neurons, and share some common pathological pathways. Despite remarkable advances in our understanding of these diseases, no curative treatment for MNDs exists. To better understand the pathogenesis of MNDs and to help develop new treatments, the establishment of animal models that can be studied efficiently and thoroughly is paramount. The zebrafish (Danio rerio) is increasingly becoming a valuable model for studying human diseases and in screening for potential therapeutics. In this Review, we highlight recent progress in using zebrafish to study the pathology of the most common MNDs: spinal muscular atrophy (SMA), amyotrophic lateral sclerosis (ALS) and hereditary spastic paraplegia (HSP). These studies indicate the power of zebrafish as a model to study the consequences of disease-related genes, because zebrafish homologues of human genes have conserved functions with respect to the aetiology of MNDs. Zebrafish also complement other animal models for the study of pathological mechanisms of MNDs and are particularly advantageous for the screening of compounds with therapeutic potential. We present an overview of their potential usefulness in MND drug discovery, which is just beginning and holds much promise for future therapeutic development. PMID:24973750
Won, Young-Woong; Joo, Jungnam; Yun, Tak; Lee, Geon-Kook; Han, Ji-Youn; Kim, Heung Tae; Lee, Jin Soo; Kim, Moon Soo; Lee, Jong Mog; Lee, Hyun-Sung; Zo, Jae Ill; Kim, Sohee
2015-05-01
Development of brain metastasis results in a significant reduction in overall survival. However, there is no effective tool to predict brain metastasis in non-small cell lung cancer (NSCLC) patients. We conducted this study to develop a feasible nomogram that can predict metastasis to the brain as the first relapse site in patients with curatively resected NSCLC. A retrospective review of NSCLC patients who had received curative surgery at National Cancer Center (Goyang, South Korea) between 2001 and 2008 was performed. We chose metastasis to the brain as the first relapse site after curative surgery as the primary endpoint of the study. A nomogram was modeled using logistic regression. Among 1218 patients, brain metastasis as the first relapse developed in 87 patients (7.14%) during the median follow-up of 43.6 months. Occurrence rates of brain metastasis were higher in patients with adenocarcinoma or those with a high pT and pN stage. Younger age appeared to be associated with brain metastasis, but this result was not statistically significant. The final prediction model included histology, smoking status, pT stage, and the interaction between adenocarcinoma and pN stage. The model showed fairly good discriminatory ability with a C-statistic of 69.3% and 69.8% for predicting brain metastasis within 2 years and 5 years, respectively. Internal validation using 2000 bootstrap samples resulted in C-statistics of 67.0% and 67.4%, which still indicated good discriminatory performances. The nomogram presented here provides the individual risk estimate of developing metastasis to the brain as the first relapse site in patients with NSCLC who have undergone curative surgery. Surveillance programs or preventive treatment strategies for brain metastasis could be established based on this nomogram. Copyright © 2015 Elsevier Ireland Ltd. All rights reserved.
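The modelling strategy described above, a logistic regression summarized by its C-statistic and internally validated by bootstrap, can be sketched on simulated data as follows. The predictors, coefficients, and resampling scheme are simplified assumptions; in particular, the model is not refitted on each bootstrap sample, so this is not a full optimism-corrected validation.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score

# Simulated cohort: three hypothetical predictors and a binary brain-metastasis outcome.
rng = np.random.default_rng(7)
n = 500
X = np.column_stack([
    rng.integers(0, 2, n),      # adenocarcinoma histology (0/1)
    rng.integers(1, 5, n),      # pT stage
    rng.integers(0, 3, n),      # pN stage
])
logit = -3.0 + 0.8 * X[:, 0] + 0.3 * X[:, 1] + 0.6 * X[:, 2]
y = rng.random(n) < 1 / (1 + np.exp(-logit))    # simulate outcomes from assumed coefficients

model = LogisticRegression().fit(X, y)

# C-statistic (AUC for a binary outcome) on the full sample and on bootstrap resamples.
c_stats = []
for _ in range(200):
    idx = rng.integers(0, n, n)
    c_stats.append(roc_auc_score(y[idx], model.predict_proba(X[idx])[:, 1]))
print("apparent C-statistic:", round(roc_auc_score(y, model.predict_proba(X)[:, 1]), 3))
print("bootstrap mean C-statistic:", round(float(np.mean(c_stats)), 3))
```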
Interoperability Across the Stewardship Spectrum in the DataONE Repository Federation
NASA Astrophysics Data System (ADS)
Jones, M. B.; Vieglais, D.; Wilson, B. E.
2016-12-01
Thousands of earth and environmental science repositories serve many researchers and communities, each with their own community and legal mandates, sustainability models, and historical infrastructure. These repositories span the stewardship spectrum from highly curated collections that employ large numbers of staff members to review and improve data, to small, minimal budget repositories that accept data caveat emptor and where all responsibility for quality lies with the submitter. Each repository fills a niche, providing services that meet the stewardship tradeoffs of one or more communities. We have reviewed these stewardship tradeoffs for several DataONE member repositories ranging from minimally (KNB) to highly curated (Arctic Data Center), as well as general purpose (Dryad) to highly discipline or project specific (NEON). The rationale behind different levels of stewardship reflect resolution of these tradeoffs. Some repositories aim to encourage extensive uptake by keeping processes simple and minimizing the amount of information collected, but this limits the long-term utility of the data and the search, discovery, and integration systems that are possible. Other repositories require extensive metadata input, review, and assessment, allowing for excellent preservation, discovery, and integration but at the cost of significant time for submitters and expense for curatorial staff. DataONE recognizes these different levels of curation, and attempts to embrace them to create a federation that is useful across the stewardship spectrum. DataONE provides a tiered model for repositories with growing utility of DataONE services at higher tiers of curation. The lowest tier supports read-only access to data and requires little more than title and contact metadata. Repositories can gradually phase in support for higher levels of metadata and services as needed. These tiered capabilities are possible through flexible support for multiple metadata standards and services, where repositories can incrementally increase their requirements as they want to satisfy more use cases. Within DataONE, metadata search services support minimal metadata models, but significantly expanded precision and recall become possible when repositories provide more extensively curated metadata.
Zhang, Xi-Mei; Guo, Lin; Chi, Mei-Hua; Sun, Hong-Mei; Chen, Xiao-Wen
2015-03-07
Obesity-induced chronic inflammation plays a fundamental role in the pathogenesis of metabolic syndrome (MS). Recently, a growing body of evidence indicates that miRNAs are largely dysregulated in obesity and that specific miRNAs regulate obesity-associated inflammation. We applied an approach aiming to identify active miRNA-TF-gene regulatory pathways in obesity. Firstly, we detected differentially expressed genes (DEGs) and differentially expressed miRNAs (DEmiRs) from mRNA and miRNA expression profiles, respectively. Secondly, by mapping the DEGs and DEmiRs to the curated miRNA-TF-gene regulatory network as active seed nodes and connecting them with their immediate neighbors, we obtained the potential active miRNA-TF-gene regulatory subnetwork in obesity. Thirdly, using a Breadth-First-Search (BFS) algorithm, we identified potential active miRNA-TF-gene regulatory pathways in obesity. Finally, through the hypergeometric test, we identified the active miRNA-TF-gene regulatory pathways that were significantly related to obesity. The potential active pathways with FDR < 0.0005 were considered to be the active miRNA-TF-gene regulatory pathways in obesity. The union of the active pathways was visualized, and identical nodes of the active pathways were merged. We identified 23 active miRNA-TF-gene regulatory pathways that were significantly related to obesity-related inflammation.
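The subnetwork-and-BFS step described above can be sketched with the networkx library; the regulatory edges, seed nodes, and neighborhood rule below are illustrative assumptions rather than the curated miRNA-TF-gene network used in the study.

```python
import networkx as nx

# Sketch of subnetwork extraction: seed nodes (differentially expressed genes/miRNAs)
# plus their immediate neighbours form the active subnetwork, and breadth-first search
# from each seed enumerates candidate regulatory paths. Names and edges are placeholders.
network = nx.DiGraph([
    ("miR-155", "SOCS1"), ("SOCS1", "STAT3"), ("STAT3", "IL6"),
    ("NFKB1", "IL6"), ("miR-21", "PTEN"), ("PTEN", "AKT1"),
])
seeds = {"miR-155", "NFKB1"}                      # differentially expressed seed nodes

# Active subnetwork: seeds plus their immediate neighbours (in either direction).
neighbours = {n for s in seeds
              for n in list(network.successors(s)) + list(network.predecessors(s))}
subnetwork = network.subgraph(seeds | neighbours)

# BFS from each seed enumerates downstream regulatory paths within the subnetwork.
for seed in seeds:
    print(seed, "->", list(nx.bfs_tree(subnetwork, seed).edges()))
```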
A guide for building biological pathways along with two case studies: hair and breast development.
Trindade, Daniel; Orsine, Lissur A; Barbosa-Silva, Adriano; Donnard, Elisa R; Ortega, J Miguel
2015-03-01
Genomic information is increasingly being highlighted in the form of biological pathways. Building these biological pathways is an ongoing demand and benefits from methods for extracting information from biomedical literature with the aid of text-mining tools. Here we aim to guide you through building a customized pathway or chart representation of a system. Our manual is based on a group of software tools designed to look at biointeractions in a set of abstracts retrieved from PubMed. However, they aim to support the work of someone with a biological background, who does not need to be an expert on the subject and will play the role of manual curator while designing the representation of the system, the pathway. We therefore illustrate with two challenging case studies: hair and breast development. They were chosen because they focus on recent acquisitions of human evolution. We produced sub-pathways for each study, representing different phases of development. Unlike most charts present in current databases, we present detailed descriptions, which will additionally guide PESCADOR users along the process. The implementation as a web interface makes PESCADOR a unique tool for guiding the user along the biointeractions, which will constitute a novel pathway. Copyright © 2014 Elsevier Inc. All rights reserved.
Advanced Curation Preparation for Mars Sample Return and Cold Curation
NASA Technical Reports Server (NTRS)
Fries, M. D.; Harrington, A. D.; McCubbin, F. M.; Mitchell, J.; Regberg, A. B.; Snead, C.
2017-01-01
NASA Curation is tasked with the care and distribution of NASA's sample collections, such as the Apollo lunar samples and cometary material collected by the Stardust spacecraft. Curation is also mandated to perform Advanced Curation research and development, which includes improving the curation of existing collections as well as preparing for future sample return missions. Advanced Curation has identified a suite of technologies and techniques that will require attention ahead of Mars sample return (MSR) and missions with cold curation (CCur) requirements, perhaps including comet sample return missions.
Curation of food-relevant chemicals in ToxCast.
Karmaus, Agnes L; Trautman, Thomas D; Krishan, Mansi; Filer, Dayne L; Fix, Laurel A
2017-05-01
High-throughput in vitro assays and exposure prediction efforts are paving the way for modeling chemical risk; however, the utility of such extensive datasets can be limited or misleading when annotation fails to capture current chemical usage. To address this data gap and provide context for food-use in the United States (US), manual curation of food-relevant chemicals in ToxCast was conducted. Chemicals were categorized into three food-use categories: (1) direct food additives, (2) indirect food additives, or (3) pesticide residues. Manual curation resulted in 30% of chemicals having new annotation as well as the removal of 319 chemicals, most due to cancellation or only foreign usage. These results highlight that manual curation of chemical use information provided significant insight affecting the overall inventory and chemical categorization. In total, 1211 chemicals were confirmed as current-day food-use in the US by manual curation; 1154 of these chemicals were also identified as food-related in the globally sourced chemical use information from the Chemical/Product Categories database (CPCat). The refined list of food-use chemicals and the sources highlighted for compiling annotated information required to confirm food-use are valuable resources for providing needed context when evaluating large-scale inventories such as ToxCast. Copyright © 2017 The Authors. Published by Elsevier Ltd. All rights reserved.
Talking Cure Models: A Framework of Analysis
Marx, Christopher; Benecke, Cord; Gumz, Antje
2017-01-01
Psychotherapy is commonly described as a “talking cure,” a treatment method that operates through linguistic action and interaction. The operative specifics of therapeutic language use, however, are insufficiently understood, mainly due to a multitude of disparate approaches that advance different notions of what “talking” means and what “cure” implies in the respective context. Accordingly, a clarification of the basic theoretical structure of “talking cure models,” i.e., models that describe therapeutic processes with a focus on language use, is a desideratum of language-oriented psychotherapy research. Against this background the present paper suggests a theoretical framework of analysis which distinguishes four basic components of “talking cure models”: (1) a foundational theory (which suggests how linguistic activity can affect and transform human experience), (2) an experiential problem state (which defines the problem or pathology of the patient), (3) a curative linguistic activity (which defines linguistic activities that are supposed to effectuate a curative transformation of the experiential problem state), and (4) a change mechanism (which defines the processes and effects involved in such transformations). The purpose of the framework is to establish a terminological foundation that allows for systematically reconstructing basic properties and operative mechanisms of “talking cure models.” To demonstrate the applicability and utility of the framework, five distinct “talking cure models” which spell out the details of curative “talking” processes in terms of (1) catharsis, (2) symbolization, (3) narrative, (4) metaphor, and (5) neurocognitive inhibition are introduced and discussed in terms of the framework components. In summary, we hope that our framework will prove useful for the objective of clarifying the theoretical underpinnings of language-oriented psychotherapy research and help to establish a more comprehensive understanding of how curative language use contributes to the process of therapeutic change. PMID:28955286
Jung, Da Hyun; Lee, Yong Chan; Kim, Jie-Hyun; Lee, Sang Kil; Shin, Sung Kwan; Park, Jun Chul; Chung, Hyunsoo; Park, Jae Jun; Youn, Young Hoon; Park, Hyojin
2017-03-01
Endoscopic resection (ER) is accepted as a curative treatment option for selected cases of early gastric cancer (EGC). Although additional surgery is often recommended for patients who have undergone non-curative ER, clinicians are cautious when managing elderly patients with GC because of comorbid conditions. The aim of the study was to investigate clinical outcomes in elderly patients following non-curative ER with and without additive treatment. Subjects included 365 patients (>75 years old) who were diagnosed with EGC and underwent ER between 2007 and 2015. Clinical outcomes of three patient groups [curative ER (n = 246), non-curative ER with additive treatment (n = 37), non-curative ER without additive treatment (n = 82)] were compared. Among the patients who underwent non-curative ER with additive treatment, 28 received surgery, three received a repeat ER, and six experienced argon plasma coagulation. Patients who underwent non-curative ER alone were significantly older than those who underwent additive treatment. Overall 5-year survival rates in the curative ER, non-curative ER with treatment, and non-curative ER without treatment groups were 84, 86, and 69 %, respectively. No significant difference in overall survival was found between patients in the curative ER and non-curative ER with additive treatment groups. The non-curative ER groups were categorized by lymph node metastasis risk factors to create a high-risk group that exhibited positive lymphovascular invasion or deep submucosal invasion greater than SM2 and a low-risk group without risk factors. Overall 5-year survival rate was lowest (60 %) in the high-risk group with non-curative ER and no additive treatment. Elderly patients who underwent non-curative ER with additive treatment showed better survival outcome than those without treatment. Therefore, especially with LVI or deep submucosal invasion, additive treatment is recommended in patients undergoing non-curative ER, even if they are older than 75 years.
Linder, Gustav; Sandin, Fredrik; Johansson, Jan; Lindblad, Mats; Lundell, Lars; Hedberg, Jakob
2018-02-01
Low socioeconomic status and poor education elevate the risk of developing esophageal- and junctional cancer. High education level also increases survival after curative surgery. The present study aimed to investigate associations, if any, between patient education level and treatment allocation after diagnosis of esophageal- and junctional cancer and its subsequent impact on survival. A nation-wide cohort study was undertaken. Data from a Swedish national quality register for esophageal cancer (NREV) was linked to the National Cancer Register, National Patient Register, Prescribed Drug Register, Cause of Death Register and educational data from Statistics Sweden. The effect of education level (low, ≤9 years; intermediate, 10-12 years; and high, >12 years) on the probability of allocation to curative treatment was analyzed with logistic regression. The Kaplan-Meier method and Cox proportional hazard models were used to assess the effect of education on survival. A total of 4112 patients were included. In a multivariate logistic regression model, high education level was associated with greater probability of allocation to curative treatment (adjusted OR: 1.48, 95% CI: 1.08-2.03, p = 0.014), as was adherence to a multidisciplinary treatment conference (adjusted OR: 3.13, 95% CI: 2.40-4.08, p < 0.001). High education level was associated with improved survival in the patients allocated to curative treatment (HR: 0.82, 95% CI: 0.69-0.99, p = 0.036). In this nation-wide cohort of esophageal- and junctional cancer patients, including data regarding many confounders, high education level was associated with greater probability of being offered curative treatment and improved survival. Copyright © 2017 Elsevier Ltd. All rights reserved.
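As a concrete illustration of the kind of model described here (allocation to curative treatment by education level, reported as adjusted odds ratios with 95% confidence intervals), the following Python sketch fits a logistic regression on synthetic data. The variable names, effect sizes and data are entirely hypothetical and are not drawn from the NREV registry.

```python
# Illustrative sketch only: a logistic regression of allocation to curative
# treatment on education level and age, reported as odds ratios with 95%
# confidence intervals. The data are synthetic and the coefficients are
# arbitrary assumptions; nothing here comes from the NREV registry.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(0)
n = 500
df = pd.DataFrame({
    "education": rng.choice(["low", "intermediate", "high"], size=n),
    "age": rng.normal(70, 8, size=n),
})
# Synthetic outcome: higher education modestly raises the odds of curative intent
logit_p = (-0.5 + 0.4 * (df.education == "high")
           + 0.2 * (df.education == "intermediate") - 0.02 * (df.age - 70))
df["curative"] = rng.binomial(1, 1 / (1 + np.exp(-logit_p)))

model = smf.logit("curative ~ C(education, Treatment('low')) + age", data=df)
res = model.fit(disp=False)

odds_ratios = pd.concat([np.exp(res.params), np.exp(res.conf_int())], axis=1)
odds_ratios.columns = ["OR", "2.5%", "97.5%"]
print(odds_ratios.round(2))
```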
Curating NASA's Past, Present, and Future Extraterrestrial Sample Collections
NASA Technical Reports Server (NTRS)
McCubbin, F. M.; Allton, J. H.; Evans, C. A.; Fries, M. D.; Nakamura-Messenger, K.; Righter, K.; Zeigler, R. A.; Zolensky, M.; Stansbery, E. K.
2016-01-01
The Astromaterials Acquisition and Curation Office (henceforth referred to herein as NASA Curation Office) at NASA Johnson Space Center (JSC) is responsible for curating all of NASA's extraterrestrial samples. Under the governing document, NASA Policy Directive (NPD) 7100.10E "Curation of Extraterrestrial Materials", JSC is charged with "...curation of all extra-terrestrial material under NASA control, including future NASA missions." The Directive goes on to define Curation as including "...documentation, preservation, preparation, and distribution of samples for research, education, and public outreach." Here we describe some of the past, present, and future activities of the NASA Curation Office.
Anderson, Abigail M.; Bailetti, Alessandro A.; Rodkin, Elizabeth; De, Atish; Bach, Erika A.
2017-01-01
A gain-of-function mutation in the tyrosine kinase JAK2 (JAK2V617F) causes human myeloproliferative neoplasms (MPNs). These patients present with high numbers of myeloid lineage cells and have numerous complications. Since current MPN therapies are not curative, there is a need to find new regulators and targets of Janus kinase/Signal transducer and activator of transcription (JAK/STAT) signaling that may represent additional clinical interventions. Drosophila melanogaster offers a low-complexity model to study MPNs as JAK/STAT signaling is simplified with only one JAK [Hopscotch (Hop)] and one STAT (Stat92E). hopTumorous-lethal (hopTum-l) is a gain-of-function mutation that causes dramatic expansion of myeloid cells, which then form lethal melanotic tumors. Through an F1 deficiency (Df) screen, we identified 11 suppressors and 35 enhancers of melanotic tumors in hopTum-l animals. Dfs that uncover the Hippo (Hpo) pathway genes expanded (ex) and warts (wts) strongly enhanced the hopTum-l tumor burden, as did mutations in ex, wts, and other Hpo pathway genes. Target genes of the Hpo pathway effector Yorkie (Yki) were significantly upregulated in hopTum-l blood cells, indicating that Yki signaling was increased. Ectopic hematopoietic activation of Yki in otherwise wild-type animals increased hemocyte proliferation but did not induce melanotic tumors. However, hematopoietic depletion of Yki significantly reduced the hopTum-l tumor burden, demonstrating that Yki is required for melanotic tumors in this background. These results support a model in which elevated Yki signaling increases the number of hemocytes, which become melanotic tumors as a result of elevated JAK/STAT signaling. PMID:28620086
NASA Astrophysics Data System (ADS)
Hedstrom, M. L.; Kumar, P.; Myers, J.; Plale, B. A.
2012-12-01
In data science, the most common sequence of steps for data curation is to 1) curate data, 2) enable data discovery, and 3) provide for data reuse. The Sustainable Environments - Actionable Data (SEAD) project, funded through NSF's DataNet program, is creating an environment for sustainability scientists to discover data first, reuse data next, and curate data through an on-going process that we call Active and Social Curation. For active curation we are developing tools and services that support data discovery, data management, and data enhancement for the community while the data is still being used actively for research. We are creating an Active Content Repository, using Dropbox, semantic web technologies, and a Flickr-like interface for researchers to "drop" data into a repository where it will be replicated and minimally discoverable. For social curation, we are deploying a social networking tool, VIVO, which will allow researchers to discover data-publications-people (e.g. expertise) through a route that can start at any of those entry points. The other dimension of social curation is developing mechanisms to open data for community input, for example, using ranking and commenting mechanisms for data sets and a community-sourcing capability to add tags, clean up and validate data sets. SEAD's strategies and services are aimed at the sustainability science community, which faces numerous challenges including discovery of useful data, cleaning noisy observational data, synthesizing data of different types, defining appropriate models, managing and preserving their research data, and conveying holistic results to colleagues, students, decision makers, and the public. Sustainability researchers make significant use of centrally managed data from satellites and national sensor networks, national scientific and statistical agencies, and data archives. At the same time, locally collected data and custom derived data products that combine observations and measurements from local, national, and global sources are critical resources that have disproportionately high value relative to their size. Sustainability science includes a diverse and growing community of domain scientists, policy makers, private sector investors, green manufacturers, citizen scientists, and informed consumers. These communities need actionable data in order to assess the impacts of alternate scenarios, evaluate the cost-benefit tradeoffs of different solutions, and defend their recommendations and decisions. SEAD's goal is to extend its services to other communities in the "long tail" that may benefit from new approaches to infrastructure development which take into account the social and economic characteristics of diverse and dispersed data producers and consumers. For example, one barrier to data reuse is the difficulty of discovering data that might be valuable for a particular study, model, or decision. Making data minimally discoverable saves the community time expended on futile searches and creates a market, of sorts, for the data. Creating very low barriers to entry to a network where data can be discovered and acted upon vastly reduces this disincentive to sharing data. SEAD's approach allows communities to make small incremental improvements in data curation based on their own priorities and needs.
Lamontagne, Maxime; Timens, Wim; Hao, Ke; Bossé, Yohan; Laviolette, Michel; Steiling, Katrina; Campbell, Joshua D; Couture, Christian; Conti, Massimo; Sherwood, Karen; Hogg, James C; Brandsma, Corry-Anke; van den Berge, Maarten; Sandford, Andrew; Lam, Stephen; Lenburg, Marc E; Spira, Avrum; Paré, Peter D; Nickle, David; Sin, Don D; Postma, Dirkje S
2014-11-01
COPD is a complex chronic disease with poorly understood pathogenesis. Integrative genomic approaches have the potential to elucidate the biological networks underlying COPD and lung function. We recently combined genome-wide genotyping and gene expression in 1111 human lung specimens to map expression quantitative trait loci (eQTL). To determine causal associations between COPD and lung function-associated single nucleotide polymorphisms (SNPs) and lung tissue gene expression changes in our lung eQTL dataset. We evaluated causality between SNPs and gene expression for three COPD phenotypes: FEV(1)% predicted, FEV(1)/FVC and COPD as a categorical variable. Different models were assessed in the three cohorts independently and in a meta-analysis. SNPs associated with a COPD phenotype and gene expression were subjected to causal pathway modelling and manual curation. In silico analyses evaluated functional enrichment of biological pathways among newly identified causal genes. Biologically relevant causal genes were validated in two separate gene expression datasets of lung tissues and bronchial airway brushings. High reliability causal relations were found in SNP-mRNA-phenotype triplets for FEV(1)% predicted (n=169) and FEV(1)/FVC (n=80). Several genes of potential biological relevance for COPD were revealed. eQTL-SNPs upregulating cystatin C (CST3) and CD22 were associated with worse lung function. Signalling pathways enriched with causal genes included xenobiotic metabolism, apoptosis, protease-antiprotease and oxidant-antioxidant balance. By using integrative genomics and analysing the relationships of COPD phenotypes with SNPs and gene expression in lung tissue, we identified CST3 and CD22 as potential causal genes for airflow obstruction. This study also augmented the understanding of previously described COPD pathways. Published by the BMJ Publishing Group Limited. For permission to use (where not already granted under a licence) please go to http://group.bmj.com/group/rights-licensing/permissions.
NASA Astrophysics Data System (ADS)
Kawata, Y.; Niki, N.; Ohmatsu, H.; Satake, M.; Kusumoto, M.; Tsuchida, T.; Aokage, K.; Eguchi, K.; Kaneko, M.; Moriyama, N.
2014-03-01
In this work, we investigate the potential usefulness of a topic model-based categorization of lung cancers as quantitative CT biomarkers for predicting the recurrence risk after curative resection. The elucidation of the subcategorization of a pulmonary nodule type in CT images is an important preliminary step towards developing nodule management strategies that are specific to each patient. We categorize lung cancers by analyzing volumetric distributions of CT values within lung cancers via a topic model such as latent Dirichlet allocation. Through applying our scheme to 3D CT images of non-small-cell lung cancer (maximum lesion size of 3 cm), we demonstrate the potential usefulness of the topic model-based categorization of lung cancers as quantitative CT biomarkers.
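To make the topic-model idea concrete, the sketch below treats each nodule as a "document" whose "words" are binned CT values and fits latent Dirichlet allocation with scikit-learn. The bin counts are randomly generated and purely illustrative; this is not the authors' pipeline, and the number of topics is an arbitrary assumption.

```python
# Minimal sketch of the idea, not the authors' pipeline: treat each nodule as a
# "document" whose "words" are binned CT values, then fit latent Dirichlet
# allocation so each nodule gets a topic mixture usable as a quantitative
# feature. The bin counts are random and the topic number is an assumption.
import numpy as np
from sklearn.decomposition import LatentDirichletAllocation

rng = np.random.default_rng(42)
n_nodules, n_bins = 20, 32              # e.g. 32 histogram bins over the HU range
counts = rng.poisson(lam=rng.uniform(1, 20, size=(n_nodules, n_bins)))

lda = LatentDirichletAllocation(n_components=3, random_state=0)
topic_mixture = lda.fit_transform(counts)   # rows sum to ~1: per-nodule topic weights

# These mixtures could then feed a downstream recurrence-risk classifier.
print(topic_mixture[:5].round(3))
```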
PomBase: a comprehensive online resource for fission yeast
Wood, Valerie; Harris, Midori A.; McDowall, Mark D.; Rutherford, Kim; Vaughan, Brendan W.; Staines, Daniel M.; Aslett, Martin; Lock, Antonia; Bähler, Jürg; Kersey, Paul J.; Oliver, Stephen G.
2012-01-01
PomBase (www.pombase.org) is a new model organism database established to provide access to comprehensive, accurate, and up-to-date molecular data and biological information for the fission yeast Schizosaccharomyces pombe to effectively support both exploratory and hypothesis-driven research. PomBase encompasses annotation of genomic sequence and features, comprehensive manual literature curation and genome-wide data sets, and supports sophisticated user-defined queries. The implementation of PomBase integrates a Chado relational database that houses manually curated data with Ensembl software that supports sequence-based annotation and web access. PomBase will provide user-friendly tools to promote curation by experts within the fission yeast community. This will make a key contribution to shaping its content and ensuring its comprehensiveness and long-term relevance. PMID:22039153
BCL-2 Antagonism to Target the Intrinsic Mitochondrial Pathway of Apoptosis
Gibson, Christopher J.; Davids, Matthew S.
2015-01-01
Despite significant improvements in treatment, cure rates for many cancers remain suboptimal. The rise of cytotoxic chemotherapy has led to curative therapy for a subset of cancers, though intrinsic treatment resistance is difficult to predict for individual patients. The recent wave of molecularly targeted therapies has focused on druggable activating mutations, and is thus limited to specific subsets of patients. The lessons learned from these two disparate approaches suggest the need for therapies that borrow aspects of both, targeting biological properties of cancer that are at once distinct from normal cells and yet common enough to make the drugs widely applicable across a range of cancer subtypes. The intrinsic mitochondrial pathway of apoptosis represents one such promising target for new therapies, and successfully targeting this pathway has the potential to alter the therapeutic landscape for a variety of cancers. Here, we discuss the biology of the intrinsic pathway of apoptosis, an assay known as BH3 profiling that can interrogate this pathway, early attempts to target BCL-2 clinically, and the recent promising results with the BCL-2 antagonist venetoclax (ABT-199) in clinical trials in hematologic malignancies. PMID:26567361
The Listeria monocytogenes strain 10403S BioCyc database
Orsi, Renato H.; Bergholz, Teresa M.; Wiedmann, Martin; Boor, Kathryn J.
2015-01-01
Listeria monocytogenes is a food-borne pathogen of humans and other animals. The striking ability to survive several stresses usually used for food preservation makes L. monocytogenes one of the biggest concerns to the food industry, while the high mortality of listeriosis in specific groups of humans makes it a great concern for public health. Previous studies have shown that a regulatory network involving alternative sigma (σ) factors and transcription factors is pivotal to stress survival. However, few studies have evaluated the metabolic networks controlled by these regulatory mechanisms. The L. monocytogenes BioCyc database uses the strain 10403S as a model. Computer-generated initial annotation for all genes also allowed for identification, annotation and display of predicted reactions and pathways carried out by a single cell. Further ongoing manual curation based on published data as well as database mining for selected genes allowed the more refined annotation of functions, which, in turn, allowed for annotation of new pathways and fine-tuning of previously defined pathways to more L. monocytogenes-specific pathways. Using RNA-Seq data, several transcription start sites and promoter regions were mapped to the 10403S genome and annotated within the database. Additionally, the identification of promoter regions and a comprehensive review of available literature allowed the annotation of several regulatory interactions involving σ factors and transcription factors. The L. monocytogenes 10403S BioCyc database is a new resource for researchers studying Listeria and related organisms. It allows users to (i) have a comprehensive view of all reactions and pathways predicted to take place within the cell in the cellular overview, as well as to (ii) upload their own data, such as differential expression data, to visualize the data in the scope of predicted pathways and regulatory networks and to carry out enrichment analyses using several different annotations available within the database. Database URL: http://biocyc.org/organism-summary?object=10403S_RAST PMID:25819074
The Listeria monocytogenes strain 10403S BioCyc database.
Orsi, Renato H; Bergholz, Teresa M; Wiedmann, Martin; Boor, Kathryn J
2015-01-01
Listeria monocytogenes is a food-borne pathogen of humans and other animals. The striking ability to survive several stresses usually used for food preservation makes L. monocytogenes one of the biggest concerns to the food industry, while the high mortality of listeriosis in specific groups of humans makes it a great concern for public health. Previous studies have shown that a regulatory network involving alternative sigma (σ) factors and transcription factors is pivotal to stress survival. However, few studies have evaluated the metabolic networks controlled by these regulatory mechanisms. The L. monocytogenes BioCyc database uses the strain 10403S as a model. Computer-generated initial annotation for all genes also allowed for identification, annotation and display of predicted reactions and pathways carried out by a single cell. Further ongoing manual curation based on published data as well as database mining for selected genes allowed the more refined annotation of functions, which, in turn, allowed for annotation of new pathways and fine-tuning of previously defined pathways to more L. monocytogenes-specific pathways. Using RNA-Seq data, several transcription start sites and promoter regions were mapped to the 10403S genome and annotated within the database. Additionally, the identification of promoter regions and a comprehensive review of available literature allowed the annotation of several regulatory interactions involving σ factors and transcription factors. The L. monocytogenes 10403S BioCyc database is a new resource for researchers studying Listeria and related organisms. It allows users to (i) have a comprehensive view of all reactions and pathways predicted to take place within the cell in the cellular overview, as well as to (ii) upload their own data, such as differential expression data, to visualize the data in the scope of predicted pathways and regulatory networks and to carry out enrichment analyses using several different annotations available within the database. © The Author(s) 2015. Published by Oxford University Press.
BioModels.net Web Services, a free and integrated toolkit for computational modelling software.
Li, Chen; Courtot, Mélanie; Le Novère, Nicolas; Laibe, Camille
2010-05-01
Exchanging and sharing scientific results are essential for researchers in the field of computational modelling. BioModels.net defines agreed-upon standards for model curation. A fundamental one, MIRIAM (Minimum Information Requested in the Annotation of Models), standardises the annotation and curation process of quantitative models in biology. To support this standard, MIRIAM Resources maintains a set of standard data types for annotating models, and provides services for manipulating these annotations. Furthermore, BioModels.net creates controlled vocabularies, such as SBO (Systems Biology Ontology), which strictly indexes, defines and links terms used in Systems Biology. Finally, BioModels Database provides a free, centralised, publicly accessible database for storing, searching and retrieving curated and annotated computational models. Each resource provides a web interface to submit, search, retrieve and display its data. In addition, the BioModels.net team provides a set of Web Services which allows the community to programmatically access the resources. A user is then able to perform remote queries, such as retrieving a model and resolving all its MIRIAM annotations, as well as getting the details about the associated SBO terms. These web services use established standards. Communications rely on SOAP (Simple Object Access Protocol) messages and the available queries are described in a WSDL (Web Services Description Language) file. Several libraries are provided in order to simplify the development of client software. BioModels.net Web Services take researchers one step further towards simulating and understanding the entirety of a biological system, by allowing them to retrieve biological models in their own tools, combine queries in workflows and efficiently analyse models.
WormBase 2014: new views of curated biology
Harris, Todd W.; Baran, Joachim; Bieri, Tamberlyn; Cabunoc, Abigail; Chan, Juancarlos; Chen, Wen J.; Davis, Paul; Done, James; Grove, Christian; Howe, Kevin; Kishore, Ranjana; Lee, Raymond; Li, Yuling; Muller, Hans-Michael; Nakamura, Cecilia; Ozersky, Philip; Paulini, Michael; Raciti, Daniela; Schindelman, Gary; Tuli, Mary Ann; Auken, Kimberly Van; Wang, Daniel; Wang, Xiaodong; Williams, Gary; Wong, J. D.; Yook, Karen; Schedl, Tim; Hodgkin, Jonathan; Berriman, Matthew; Kersey, Paul; Spieth, John; Stein, Lincoln; Sternberg, Paul W.
2014-01-01
WormBase (http://www.wormbase.org/) is a highly curated resource dedicated to supporting research using the model organism Caenorhabditis elegans. With an electronic history predating the World Wide Web, WormBase contains information ranging from the sequence and phenotype of individual alleles to genome-wide studies generated using next-generation sequencing technologies. In recent years, we have expanded the contents to include data on additional nematodes of agricultural and medical significance, bringing the knowledge of C. elegans to bear on these systems and providing support for underserved research communities. Manual curation of the primary literature remains a central focus of the WormBase project, providing users with reliable, up-to-date and highly cross-linked information. In this update, we describe efforts to organize the original atomized and highly contextualized curated data into integrated syntheses of discrete biological topics. Next, we discuss our experiences coping with the vast increase in available genome sequences made possible through next-generation sequencing platforms. Finally, we describe some of the features and tools of the new WormBase Web site that help users better find and explore data of interest. PMID:24194605
Curating NASA's Extraterrestrial Samples - Past, Present, and Future
NASA Technical Reports Server (NTRS)
Allen, Carlton; Allton, Judith; Lofgren, Gary; Righter, Kevin; Zolensky, Michael
2011-01-01
Curation of extraterrestrial samples is the critical interface between sample return missions and the international research community. The Astromaterials Acquisition and Curation Office at the NASA Johnson Space Center (JSC) is responsible for curating NASA s extraterrestrial samples. Under the governing document, NASA Policy Directive (NPD) 7100.10E "Curation of Extraterrestrial Materials", JSC is charged with ". . . curation of all extraterrestrial material under NASA control, including future NASA missions." The Directive goes on to define Curation as including "documentation, preservation, preparation, and distribution of samples for research, education, and public outreach."
Curating NASA's Extraterrestrial Samples - Past, Present, and Future
NASA Technical Reports Server (NTRS)
Allen, Carlton; Allton, Judith; Lofgren, Gary; Righter, Kevin; Zolensky, Michael
2010-01-01
Curation of extraterrestrial samples is the critical interface between sample return missions and the international research community. The Astromaterials Acquisition and Curation Office at the NASA Johnson Space Center (JSC) is responsible for curating NASA's extraterrestrial samples. Under the governing document, NASA Policy Directive (NPD) 7100.10E "Curation of Extraterrestrial Materials," JSC is charged with ". . . curation of all extraterrestrial material under NASA control, including future NASA missions." The Directive goes on to define Curation as including documentation, preservation, preparation, and distribution of samples for research, education, and public outreach.
Melroy-Greif, Whitney E; Simonson, Matthew A; Corley, Robin P; Lutz, Sharon M; Hokanson, John E; Ehringer, Marissa A
2017-04-01
Cigarette smoking is a physiologically harmful habit. Nicotinic acetylcholine receptors (nAChRs) are bound by nicotine and upregulated in response to chronic exposure to nicotine. It is known that upregulation of these receptors is not due to a change in mRNA levels of these genes; however, more precise details on the process are still uncertain, with several plausible hypotheses describing how nAChRs are upregulated. We have manually curated a set of genes believed to play a role in nicotine-induced nAChR upregulation. Here, we test the hypothesis that these genes are associated with and contribute risk for nicotine dependence (ND) and the number of cigarettes smoked per day (CPD). Studies with genotypic data on European and African Americans (EAs and AAs, respectively) were collected and a gene-based test was run to test for an association between each gene and ND and CPD. Although several novel genes were associated with CPD and ND at P < 0.05 in EAs and AAs, these associations did not survive correction for multiple testing. Previous associations between CHRNA3, CHRNA5, CHRNB4 and CPD in EAs were replicated. Our hypothesis-driven approach avoided many of the limitations inherent in pathway analyses and provided nominal evidence for association between cholinergic-related genes and nicotine behaviors. We evaluated the evidence for association between a manually curated set of genes and nicotine behaviors in European and African Americans. Although no genes were associated after multiple testing correction, this study has several strengths: by manually curating a set of genes we circumvented the limitations inherent in many pathway analyses and tested several genes that had not yet been examined in a human genetic study; gene-based tests are a useful way to test for association with a set of genes; and these genes were collected based on literature review and conversations with experts, highlighting the importance of scientific collaboration. © The Author 2016. Published by Oxford University Press on behalf of the Society for Research on Nicotine and Tobacco. All rights reserved. For permissions, please e-mail: journals.permissions@oup.com.
Szostak, Justyna; Martin, Florian; Talikka, Marja; Peitsch, Manuel C; Hoeng, Julia
2016-01-01
The cellular and molecular mechanisms behind the process of atherosclerotic plaque destabilization are complex, and molecular data from aortic plaques are difficult to interpret. Biological network models may overcome these difficulties and precisely quantify the molecular mechanisms impacted during disease progression. The atherosclerosis plaque destabilization biological network model was constructed with the semiautomated curation pipeline, BELIEF. Cellular and molecular mechanisms promoting plaque destabilization or rupture were captured in the network model. Public transcriptomic data sets were used to demonstrate the specificity of the network model and to capture the different mechanisms that were impacted in ApoE -/- mouse aorta at 6 and 32 weeks. We concluded that network models combined with the network perturbation amplitude algorithm provide a sensitive, quantitative method to follow disease progression at the molecular level. This approach can be used to investigate and quantify molecular mechanisms during plaque progression.
Curating NASA's Future Extraterrestrial Sample Collections: How Do We Achieve Maximum Proficiency?
NASA Technical Reports Server (NTRS)
McCubbin, Francis; Evans, Cynthia; Zeigler, Ryan; Allton, Judith; Fries, Marc; Righter, Kevin; Zolensky, Michael
2016-01-01
The Astromaterials Acquisition and Curation Office (henceforth referred to herein as NASA Curation Office) at NASA Johnson Space Center (JSC) is responsible for curating all of NASA's extraterrestrial samples. Under the governing document, NASA Policy Directive (NPD) 7100.10E "Curation of Extraterrestrial Materials", JSC is charged with "The curation of all extraterrestrial material under NASA control, including future NASA missions." The Directive goes on to define Curation as including "... documentation, preservation, preparation, and distribution of samples for research, education, and public outreach." Here we describe some of the ongoing efforts to ensure that the future activities of the NASA Curation Office are working towards a state of maximum proficiency.
Steuer, Ralf
2017-01-01
Anabaena sp. PCC 7120 is a nitrogen-fixing filamentous cyanobacterium. Under nitrogen-limiting conditions, a fraction of the vegetative cells in each filament terminally differentiate to nongrowing heterocysts. Heterocysts are metabolically and structurally specialized to enable O2-sensitive nitrogen fixation. The functionality of the filament, as an association of vegetative cells and heterocysts, is postulated to depend on metabolic exchange of electrons, carbon, and fixed nitrogen. In this study, we compile and evaluate a comprehensive curated stoichiometric model of this two-cell system, with the objective function based on the growth of the filament under diazotrophic conditions. The predicted growth rate under nitrogen-replete and -deplete conditions, as well as the effect of external carbon and nitrogen sources, was thereafter verified. Furthermore, the model was utilized to comprehensively evaluate the optimality of putative metabolic exchange reactions between heterocysts and vegetative cells. The model suggested that optimal growth requires at least four exchange metabolites. Several combinations of exchange metabolites resulted in predicted growth rates that are higher than growth rates achieved by only considering exchange of metabolites previously suggested in the literature. The curated model of the metabolic network of Anabaena sp. PCC 7120 enhances our ability to understand the metabolic organization of multicellular cyanobacteria and provides a platform for further study and engineering of their metabolism. PMID:27899536
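The growth-maximisation framing used for this curated stoichiometric model is an instance of flux balance analysis. The toy linear program below shows the formalism on an invented three-reaction network (it is not the Anabaena sp. PCC 7120 model): maximise a growth flux subject to steady-state mass balance and flux bounds.

```python
# Toy flux-balance sketch, not the curated Anabaena sp. PCC 7120 model: maximise
# a growth objective subject to steady-state mass balance S.v = 0 and flux
# bounds. The three-reaction network and the bounds are invented.
import numpy as np
from scipy.optimize import linprog

# Columns: v1 uptake (-> A), v2 conversion (A -> B), v3 growth drain (B ->)
S = np.array([
    [1, -1,  0],   # metabolite A
    [0,  1, -1],   # metabolite B
])
bounds = [(0, 10), (0, 10), (0, None)]   # uptake capped at 10 flux units
c = np.array([0, 0, -1])                 # linprog minimises, so negate growth flux

res = linprog(c, A_eq=S, b_eq=np.zeros(2), bounds=bounds, method="highs")
print("optimal growth flux:", -res.fun)  # 10.0, limited by the uptake bound
print("flux distribution:", res.x)
```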
Molecular classification of gastric cancer: Towards a pathway-driven targeted therapy
Espinoza, Jaime A.; Weber, Helga; García, Patricia; Nervi, Bruno; Garrido, Marcelo; Corvalán, Alejandro H.; Roa, Juan Carlos; Bizama, Carolina
2015-01-01
Gastric cancer (GC) is the third leading cause of cancer mortality worldwide. Although surgical resection is a potentially curative approach for localized cases of GC, most cases of GC are diagnosed in an advanced, non-curable stage and the response to traditional chemotherapy is limited. Fortunately, recent advances in our understanding of the molecular mechanisms that mediate GC hold great promise for the development of more effective treatment strategies. In this review, an overview of the morphological classification, current treatment approaches, and molecular alterations that have been characterized for GC is provided. In particular, the most recent molecular classification of GC and alterations identified in relevant signaling pathways, including ErbB, VEGF, PI3K/AKT/mTOR, and HGF/MET signaling pathways, are described, as well as inhibitors of these pathways. An overview of the completed and active clinical trials related to these signaling pathways is also provided. Finally, insights regarding emerging stem cell pathways are described, and may provide additional novel markers for the development of therapeutic agents against GC. The development of more effective agents and the identification of biomarkers that can be used for the diagnosis, prognosis, and individualized therapy for GC patients have the potential to improve the efficacy, safety, and cost-effectiveness of GC treatments. PMID:26267324
Song, Zhenhua; Zhang, Chi; He, Lingxiao; Sui, Yanfang; Lin, Xiafei; Pan, Jingjing
2018-06-12
Osteoarthritis (OA) is the most common form of joint disease. The development of inflammation has been considered to play a key role in the progression of OA. Regulatory pathways are known to play crucial roles in many pathogenic processes. Thus, deciphering these risk regulatory pathways is critical for elucidating the mechanisms underlying OA. We constructed an OA-specific regulatory network by integrating comprehensively curated transcriptional and post-transcriptional resources involving transcription factors (TFs) and microRNAs (miRNAs). To deepen our understanding of the underlying molecular mechanisms of OA, we developed an integrated systems approach to identify OA-specific risk regulatory pathways. In this study, we identified 89 significantly differentially expressed genes between normal and inflamed areas of OA patients. We found that the OA-specific regulatory network was a standard scale-free network with small-world properties. It was significantly enriched in many immune response-related functions, including leukocyte differentiation, myeloid differentiation and T cell activation. Finally, 141 risk regulatory pathways were identified based on the OA-specific regulatory network, which contain some known regulators of OA. The risk regulatory pathways may provide clues for the etiology of OA and be a potential resource for the discovery of novel OA-associated disease genes. Copyright © 2018 Elsevier Inc. All rights reserved.
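The structural claims above (scale-free degree distribution, small-world organisation) are the kind of properties that standard graph metrics can check. The sketch below runs those checks on a toy preferential-attachment graph with networkx; the graph is synthetic and merely stands in for the OA-specific network.

```python
# Quick structural check on a toy preferential-attachment graph, not the
# OA-specific network: inspect hub degrees, clustering and average path length,
# the quantities behind "scale-free" and "small-world" descriptions.
import networkx as nx

G = nx.barabasi_albert_graph(500, 3, seed=1)   # toy scale-free graph
rand = nx.gnm_random_graph(G.number_of_nodes(), G.number_of_edges(), seed=1)

degrees = sorted((d for _, d in G.degree()), reverse=True)
print("top hub degrees:", degrees[:5],
      "| mean degree:", round(sum(degrees) / len(degrees), 2))

print("clustering (toy vs random):",
      round(nx.average_clustering(G), 3), round(nx.average_clustering(rand), 3))

# Use the largest connected component of the random graph in case of isolates.
lcc = max(nx.connected_components(rand), key=len)
print("avg shortest path (toy vs random):",
      round(nx.average_shortest_path_length(G), 2),
      round(nx.average_shortest_path_length(rand.subgraph(lcc)), 2))
```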
The Role of Community-Driven Data Curation for Enterprises
NASA Astrophysics Data System (ADS)
Curry, Edward; Freitas, Andre; O'Riáin, Sean
With increased utilization of data within their operational and strategic processes, enterprises need to ensure data quality and accuracy. Data curation is a process that can ensure the quality of data and its fitness for use. Traditional approaches to curation are struggling with increased data volumes and near real-time demands for curated data. In response, curation teams have turned to community crowd-sourcing and semi-automated metadata tools for assistance. This chapter provides an overview of data curation, discusses the business motivations for curating data and investigates the role of community-based data curation, focusing on internal communities and pre-competitive data collaborations. The chapter is supported by case studies from Wikipedia, The New York Times, Thomson Reuters, Protein Data Bank and ChemSpider, upon which best practices for both social and technical aspects of community-driven data curation are described.
Australian contemporary management of synchronous metastatic colorectal cancer.
Malouf, Phillip; Gibbs, Peter; Shapiro, Jeremy; Sockler, Jim; Bell, Stephen
2018-01-01
This article outlines the current Australian multidisciplinary treatment of synchronous metastatic colorectal adenocarcinoma and assesses the factors that influence patient outcome. This is a retrospective analysis of the prospective 'Treatment of Recurrent and Advanced Colorectal Cancer' registry, describing the patient treatment pathway and documenting the extent of disease, resection of the colorectal primary and metastases, chemotherapy and biological therapy use. Cox regression models for progression-free and overall survival were constructed with a comprehensive set of clinical variables. Analysis was intention-to-treat, quantifying the effect of treatment intent decided at the multidisciplinary team meeting (MDT). One thousand one hundred and nine patients presented with synchronous metastatic disease between July 2009 and November 2015. Median follow-up was 15.8 months; 4.4% (group 1) had already undergone curative resections of the primary and metastases prior to MDT, 22.2% (group 2) were considered curative but were referred to MDT for opinion and/or medical oncology treatment prior to resection and 70.2% were considered palliative at MDT (group 3). Overall, 83% received chemotherapy, 55% had their primary resected and 23% had their metastases resected; 13% of resections were synchronous, 20% were staged with the primary resected first and 62% had only the colorectal primary managed surgically. Performance status, metastasis resection (R0 versus R1 versus R2 versus no resection), resection of the colorectal primary and treatment intent determined at MDT were the most significant factors for progression-free and overall survival. This is the largest Australian series of synchronous metastatic colorectal adenocarcinoma and offers insight into the nature and utility of contemporary practice. © 2016 Royal Australasian College of Surgeons.
Zhang, Peifen; Dreher, Kate; Karthikeyan, A.; Chi, Anjo; Pujar, Anuradha; Caspi, Ron; Karp, Peter; Kirkup, Vanessa; Latendresse, Mario; Lee, Cynthia; Mueller, Lukas A.; Muller, Robert; Rhee, Seung Yon
2010-01-01
Metabolic networks reconstructed from sequenced genomes or transcriptomes can help visualize and analyze large-scale experimental data, predict metabolic phenotypes, discover enzymes, engineer metabolic pathways, and study metabolic pathway evolution. We developed a general approach for reconstructing metabolic pathway complements of plant genomes. Two new reference databases were created and added to the core of the infrastructure: a comprehensive, all-plant reference pathway database, PlantCyc, and a reference enzyme sequence database, RESD, for annotating metabolic functions of protein sequences. PlantCyc (version 3.0) includes 714 metabolic pathways and 2,619 reactions from over 300 species. RESD (version 1.0) contains 14,187 literature-supported enzyme sequences from across all kingdoms. We used RESD, PlantCyc, and MetaCyc (an all-species reference metabolic pathway database), in conjunction with the pathway prediction software Pathway Tools, to reconstruct a metabolic pathway database, PoplarCyc, from the recently sequenced genome of Populus trichocarpa. PoplarCyc (version 1.0) contains 321 pathways with 1,807 assigned enzymes. Comparing PoplarCyc (version 1.0) with AraCyc (version 6.0, Arabidopsis [Arabidopsis thaliana]) showed comparable numbers of pathways distributed across all domains of metabolism in both databases, except for a higher number of AraCyc pathways in secondary metabolism and a 1.5-fold increase in carbohydrate metabolic enzymes in PoplarCyc. Here, we introduce these new resources and demonstrate the feasibility of using them to identify candidate enzymes for specific pathways and to analyze metabolite profiling data through concrete examples. These resources can be searched by text or BLAST, browsed, and downloaded from our project Web site (http://plantcyc.org). PMID:20522724
Miyata, Tatsunori; Yamashita, Yo-Ichi; Yamao, Takanobu; Umezaki, Naoki; Tsukamoto, Masayo; Kitano, Yuki; Yamamura, Kensuke; Arima, Kota; Kaida, Takayoshi; Nakagawa, Shigeki; Imai, Katsunori; Hashimoto, Daisuke; Chikamoto, Akira; Ishiko, Takatoshi; Baba, Hideo
2017-06-01
Postoperative complications are an indicator of poor prognosis in patients with several gastroenterological cancers after curative operations. We herein examined the prognostic impact of postoperative complications in patients with intrahepatic cholangiocarcinoma after curative operations. We retrospectively analyzed 60 patients with intrahepatic cholangiocarcinoma who underwent primary curative operations from June 2002 to February 2016. The prognostic impact of postoperative complications was analyzed using the log-rank test and a Cox proportional hazards model. Postoperative complications (Clavien-Dindo classification grade 3 or more) occurred in 13 patients (21.7%). Overall survival of patients without postoperative complications was significantly better than that of patients with postoperative complications (p = 0.025). Postoperative complications were an independent prognostic factor for overall survival (hazard ratio 3.02; p = 0.030). In addition, bile duct resection and reconstruction (odds ratio 59.1; p = 0.002), hepatitis C virus antibody positivity (odds ratio 7.14; p = 0.022), and lymph node dissection (odds ratio 6.28; p = 0.040) were independent predictors of postoperative complications. Postoperative complications may be an independent predictor of poorer survival in patients with intrahepatic cholangiocarcinoma after curative operations. Lymph node dissection and bile duct resection and reconstruction were risk factors for postoperative complications; therefore, particular attention should be paid when performing lymph node dissection or bile duct resection and reconstruction in patients with intrahepatic cholangiocarcinoma.
2013-01-01
Background: The objective of this study was to compare the socioeconomic and family characteristics of underprivileged schoolchildren with and without curative dental needs participating in a dental health program. Methods: A random sample of 1411 8-to-10-year-old Brazilian schoolchildren was examined and two sample groups were included in the cross-sectional study: 544 presented curative dental needs and the other 867 schoolchildren were without curative dental needs. The schoolchildren were examined for the presence of caries lesions using the DMFT index and their parents were asked to answer questions about socioenvironmental characteristics of their families. Logistic regression models were adjusted estimating the Odds Ratios (OR), their 95% confidence intervals (CI), and significance levels. Results: After adjusting for potential confounders, it was found that families earning more than one Brazilian minimum wage, having fewer than four residents in the house, families living in homes owned by them, and children living with both biological parents were protective factors for the presence of dental caries, and consequently, curative dental needs. Conclusions: Socioeconomic status and family structure influence the curative dental needs of children from underprivileged communities. In this sense, dental health programs should plan and implement strategic efforts to reduce inequities in oral health status and access to oral health services of vulnerable schoolchildren and their families. PMID:24138683
Zhao, Min; Li, Zhe; Qu, Hong
2015-01-01
Metastasis suppressor genes (MS genes) are genes that play important roles in inhibiting the process of cancer metastasis without preventing growth of the primary tumor. Identification of these genes and understanding their functions are critical for investigation of cancer metastasis. Recent studies on cancer metastasis have identified many new susceptibility MS genes. However, a comprehensive illustration of the diverse cellular processes regulated by metastasis suppressors during the metastasis cascade is lacking. Thus, the relationship between MS genes and cancer risk is still unclear. To unveil the cellular complexity of MS genes, we have constructed MSGene (http://MSGene.bioinfo-minzhao.org/), the first literature-based gene resource for exploring human MS genes. In total, we manually curated 194 experimentally verified MS genes and mapped them to 1448 homologous genes from 17 model species. Follow-up functional analyses associated the 194 human MS genes with epithelium/tissue morphogenesis and epithelial cell proliferation. In addition, pathway analysis highlights the prominent role of MS genes in activation of platelets and the coagulation system in the tumor metastatic cascade. Moreover, the global mutation pattern of MS genes across multiple cancers may reveal common cancer metastasis mechanisms. All these results illustrate the importance of MSGene to our understanding of cell development and cancer metastasis. PMID:26486520
Metabolic pathways for the whole community.
Hanson, Niels W; Konwar, Kishori M; Hawley, Alyse K; Altman, Tomer; Karp, Peter D; Hallam, Steven J
2014-07-22
A convergence of high-throughput sequencing and computational power is transforming biology into information science. Despite these technological advances, converting bits and bytes of sequence information into meaningful insights remains a challenging enterprise. Biological systems operate on multiple hierarchical levels from genomes to biomes. Holistic understanding of biological systems requires agile software tools that permit comparative analyses across multiple information levels (DNA, RNA, protein, and metabolites) to identify emergent properties, diagnose system states, or predict responses to environmental change. Here we adopt the MetaPathways annotation and analysis pipeline and Pathway Tools to construct environmental pathway/genome databases (ePGDBs) that describe microbial community metabolism using MetaCyc, a highly curated database of metabolic pathways and components covering all domains of life. We evaluate Pathway Tools' performance on three datasets with different complexity and coding potential, including simulated metagenomes, a symbiotic system, and the Hawaii Ocean Time-series. We define accuracy and sensitivity relationships between read length, coverage and pathway recovery and evaluate the impact of taxonomic pruning on ePGDB construction and interpretation. Resulting ePGDBs provide interactive metabolic maps, predict emergent metabolic pathways associated with biosynthesis and energy production and differentiate between genomic potential and phenotypic expression across defined environmental gradients. This multi-tiered analysis provides the user community with specific operating guidelines, performance metrics and prediction hazards for more reliable ePGDB construction and interpretation. Moreover, it demonstrates the power of Pathway Tools in predicting metabolic interactions in natural and engineered ecosystems.
Ding, Yi; Qiao, Youbei; Wang, Min; Zhang, Huinan; Li, Liang; Zhang, Yikai; Ge, Jie; Song, Ying; Li, Yuwen; Wen, Aidong
2016-08-01
Acetyl-11-keto-β-boswellic acid (AKBA), a main active constituent from Boswellia serrata resin, is a novel candidate for therapy of cerebral ischemia-reperfusion (I/R) injury. Nevertheless, its poor solubility in aqueous solvents, low bioavailability, and rapid clearance limit its curative efficacy. To enhance its potency, an AKBA-loaded o-carboxymethyl chitosan nanoparticle (AKBA-NP) delivery system was synthesized in our study. Transmission electron microscope images of AKBA-NPs suggested that the particle size was 132 ± 18 nm and that the particles were spherical in shape with smooth morphology. In the pharmacokinetic study, AKBA-NPs apparently increased the area under the plasma concentration-time curve and prolonged the half-life compared with AKBA. The tissue distribution study confirmed that AKBA-NPs had better brain delivery efficacy in comparison with AKBA. The results from our pharmacodynamic studies showed that AKBA-NPs provide better neuroprotection than AKBA in primary neurons subjected to oxygen-glucose deprivation (OGD) and in animals with middle cerebral artery occlusion (MCAO). Additionally, AKBA-NPs modulate antioxidant and anti-inflammatory pathways more effectively than AKBA by increasing nuclear erythroid 2-related factor 2 and heme oxygenase-1 expression, and by decreasing nuclear factor-kappa B and 5-lipoxygenase expression. Collectively, our results suggest that AKBA-NPs serve as a potent delivery vehicle for AKBA in cerebral ischemic therapy.
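The two pharmacokinetic quantities mentioned above, the area under the plasma concentration-time curve (AUC) and the half-life, can be computed from sampled concentrations as in the short sketch below. The time points and concentrations are made up for illustration and are not the reported AKBA or AKBA-NP values.

```python
# Illustration with made-up concentration-time data, not the reported AKBA or
# AKBA-NP values: compute the plasma AUC by the trapezoidal rule and estimate
# the terminal half-life from the log-linear elimination phase.
import numpy as np

t = np.array([0.25, 0.5, 1, 2, 4, 8, 12, 24])                 # hours post-dose
conc = np.array([1.8, 2.6, 2.2, 1.5, 0.9, 0.45, 0.22, 0.05])  # mg/L, hypothetical

auc = float(np.sum(np.diff(t) * (conc[1:] + conc[:-1]) / 2))  # trapezoidal AUC(0-24h)
slope, _ = np.polyfit(t[-4:], np.log(conc[-4:]), 1)           # terminal log-linear fit
t_half = np.log(2) / -slope

print(f"AUC(0-24h) = {auc:.2f} mg*h/L, terminal half-life = {t_half:.1f} h")
```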
Prostate Cancer Radiation Therapy and Risk of Thromboembolic Events
DOE Office of Scientific and Technical Information (OSTI.GOV)
Bosco, Cecilia, E-mail: Cecilia.t.bosco@kcl.ac.uk; Garmo, Hans; Regional Cancer Centre, Uppsala, Akademiska Sjukhuset, Uppsala
Purpose: To investigate the risk of thromboembolic disease (TED) after radiation therapy (RT) with curative intent for prostate cancer (PCa). Patients and Methods: We identified all men who received RT as curative treatment (n=9410) and grouped according to external beam RT (EBRT) or brachytherapy (BT). By comparing with an age- and county-matched comparison cohort of PCa-free men (n=46,826), we investigated risk of TED after RT using Cox proportional hazard regression models. The model was adjusted for tumor characteristics, demographics, comorbidities, PCa treatments, and known risk factors of TED, such as recent surgery and disease progression. Results: Between 2006 and 2013, 6232 men with PCa received EBRT, and 3178 underwent BT. A statistically significant association was found between EBRT and BT and risk of pulmonary embolism in the crude analysis. However, upon adjusting for known TED risk factors these associations disappeared. No significant associations were found between BT or EBRT and deep venous thrombosis. Conclusion: Curative RT for prostate cancer using contemporary methodologies was not associated with an increased risk of TED.
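For readers unfamiliar with the adjusted analysis described here, the following sketch fits a Cox proportional hazards model with the lifelines package on synthetic data. The covariates, effect sizes and censoring scheme are invented; this is only an illustration of the model class, not a reanalysis of the registry data.

```python
# Synthetic sketch of the model class used above: a Cox proportional hazards
# regression of time to thromboembolic disease on a treatment indicator and
# confounders, fitted with the lifelines package. Covariates, effect sizes and
# censoring are invented; this is not a reanalysis of the registry data.
import numpy as np
import pandas as pd
from lifelines import CoxPHFitter

rng = np.random.default_rng(7)
n = 2000
df = pd.DataFrame({
    "ebrt": rng.integers(0, 2, n),              # 1 = received external beam RT
    "age": rng.normal(70, 7, n),
    "recent_surgery": rng.binomial(1, 0.1, n),
})
# Hazard depends on age and recent surgery but not on treatment (a null effect)
rate = 0.01 * np.exp(0.03 * (df.age - 70) + 0.8 * df.recent_surgery)
df["time"] = rng.exponential(1 / rate)
df["event"] = (df["time"] < 8).astype(int)      # administrative censoring at 8 years
df.loc[df.event == 0, "time"] = 8.0

cph = CoxPHFitter()
cph.fit(df, duration_col="time", event_col="event")
cph.print_summary()                             # hazard ratios with 95% CIs
```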
A Rich-Club Organization in Brain Ischemia Protein Interaction Network
Alawieh, Ali; Sabra, Zahraa; Sabra, Mohammed; Tomlinson, Stephen; Zaraket, Fadi A.
2015-01-01
Ischemic stroke involves multiple pathophysiological mechanisms with complex interactions. Efforts to decipher those mechanisms and understand the evolution of cerebral injury are key for developing successful interventions. In an innovative approach, we use literature mining, natural language processing and systems biology tools to construct, annotate and curate a brain ischemia interactome. The curated interactome includes proteins that are deregulated after cerebral ischemia in human and experimental stroke. Network analysis of the interactome revealed a rich-club organization indicating the presence of a densely interconnected hub structure of prominent contributors to disease pathogenesis. Functional annotation of the interactome uncovered prominent pathways and highlighted the critical role of the complement and coagulation cascade in the initiation and amplification of injury, starting with activation of the rich-club. We performed an in-silico screen for putative interventions that have pleiotropic effects on rich-club components and we identified estrogen as a prominent candidate. Our findings show that complex network analysis of disease-related interactomes may lead to a better understanding of pathogenic mechanisms and provide cost-effective and mechanism-based discovery of candidate therapeutics. PMID:26310627
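Rich-club organisation is typically quantified by the rich-club coefficient phi(k), compared against a degree-preserving random rewiring. The sketch below does this with networkx on a synthetic scale-free graph; it illustrates the metric only and does not use the curated interactome.

```python
# Toy illustration of the rich-club metric, not the curated interactome: compute
# phi(k) on a synthetic scale-free graph and compare it with a degree-preserving
# random rewiring, the usual way a rich-club organisation is assessed.
import networkx as nx

G = nx.barabasi_albert_graph(300, 4, seed=3)
rc = nx.rich_club_coefficient(G, normalized=False)        # raw phi(k)

# Degree-preserving null model: rewire edges, then recompute phi(k)
R = G.copy()
nx.double_edge_swap(R, nswap=10 * G.number_of_edges(), max_tries=10**6, seed=3)
rc_rand = nx.rich_club_coefficient(R, normalized=False)

for k in sorted(rc)[-5:]:
    ratio = rc[k] / rc_rand[k] if rc_rand.get(k, 0) > 0 else float("nan")
    print(f"degree > {k:3d}: phi = {rc[k]:.3f}, phi/phi_rand = {ratio:.2f}")
```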
MGDB: a comprehensive database of genes involved in melanoma.
Zhang, Di; Zhu, Rongrong; Zhang, Hanqian; Zheng, Chun-Hou; Xia, Junfeng
2015-01-01
The Melanoma Gene Database (MGDB) is a manually curated catalog of molecular genetic data relating to genes involved in melanoma. The main purpose of this database is to establish a network of melanoma-related genes and to facilitate the mechanistic study of melanoma tumorigenesis. The entries describing the relationships between melanoma and genes in the current release were manually extracted from PubMed abstracts, and to date comprise 527 human melanoma genes (422 protein-coding and 105 non-coding genes). Each melanoma gene was annotated in seven different aspects (General Information, Expression, Methylation, Mutation, Interaction, Pathway and Drug). In addition, manually curated literature references have also been provided to support the inclusion of the gene in MGDB and establish its association with melanoma. MGDB has a user-friendly web interface with multiple browse and search functions. We hope MGDB will enrich our knowledge about melanoma genetics and serve as a useful complement to the existing public resources. Database URL: http://bioinfo.ahu.edu.cn:8080/Melanoma/index.jsp. © The Author(s) 2015. Published by Oxford University Press.
Using Data From Ontario's Episode-Based Funding Model to Assess Quality of Chemotherapy.
Kaizer, Leonard; Simanovski, Vicky; Lalonde, Carlin; Tariq, Huma; Blais, Irene; Evans, William K
2016-10-01
A new episode-based funding model for ambulatory systemic therapy was implemented in Ontario, Canada on April 1, 2014, after a comprehensive knowledge transfer and exchange strategy with providers and administrators. An analysis of the data from the first year of the new funding model provided an opportunity to assess the quality of chemotherapy, which was not possible under the old funding model. Options for chemotherapy regimens given with adjuvant/curative intent or palliative intent were informed by input from disease site groups. Bundles were developed and priced to enable evidence-informed best practice. Analysis of systemic therapy utilization after model implementation was performed to assess the concordance rate of the treatments chosen with recommended practice. The actual number of cycles of treatment delivered was also compared with expert recommendations. Significant improvement compared with baseline was seen in the proportion of adjuvant/curative regimens that aligned with disease site group-recommended options (98% v 90%). Similar improvement was seen for palliative regimens (94% v 89%). However, overall, the number of cycles of adjuvant/curative therapy delivered was lower than recommended best practice in 57.5% of patients. There was significant variation by disease site and between facilities. Linking funding to quality, supported by knowledge transfer and exchange, resulted in a rapid improvement in the quality of systemic treatment in Ontario. This analysis has also identified further opportunities for improvement and the need for model refinement.
Xtalk: a path-based approach for identifying crosstalk between signaling pathways
Tegge, Allison N.; Sharp, Nicholas; Murali, T. M.
2016-01-01
Motivation: Cells communicate with their environment via signal transduction pathways. On occasion, the activation of one pathway can produce an effect downstream of another pathway, a phenomenon known as crosstalk. Existing computational methods to discover such pathway pairs rely on simple overlap statistics. Results: We present Xtalk, a path-based approach for identifying pairs of pathways that may crosstalk. Xtalk computes the statistical significance of the average length of multiple short paths that connect receptors in one pathway to the transcription factors in another. By design, Xtalk reports the precise interactions and mechanisms that support the identified crosstalk. We applied Xtalk to signaling pathways in the KEGG and NCI-PID databases. We manually curated a gold standard set of 132 crosstalking pathway pairs and a set of 140 pairs that did not crosstalk, for which Xtalk achieved an area under the receiver operator characteristic curve of 0.65, a 12% improvement over the closest competing approach. The area under the receiver operator characteristic curve varied with the pathway, suggesting that crosstalk should be evaluated on a pathway-by-pathway level. We also analyzed an extended set of 658 pathway pairs in KEGG and a set of more than 7000 pathway pairs in NCI-PID. For the top-ranking pairs, we found substantial support in the literature (81% for KEGG and 78% for NCI-PID). We provide examples of networks computed by Xtalk that accurately recovered known mechanisms of crosstalk. Availability and implementation: The XTALK software is available at http://bioinformatics.cs.vt.edu/~murali/software. Crosstalk networks are available at http://graphspace.org/graphs?tags=2015-bioinformatics-xtalk. Contact: ategge@vt.edu, murali@cs.vt.edu Supplementary information: Supplementary data are available at Bioinformatics online. PMID:26400040
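As a rough, hedged sketch of the path-based idea (not the published Xtalk algorithm, which scores multiple short paths in a curated interactome), one can compare the mean receptor-to-transcription-factor path length against a permutation null; the network and node sets below are invented.

```python
# Simplified path-based crosstalk score with an empirical permutation p-value.
import random
import networkx as nx

G = nx.gnm_random_graph(200, 800, seed=0, directed=True)   # toy interaction network
receptors_A = [0, 1, 2]        # hypothetical receptors of pathway A
tfs_B = [150, 151, 152]        # hypothetical transcription factors of pathway B

def avg_path_length(sources, targets):
    lengths = []
    for s in sources:
        for t in targets:
            try:
                lengths.append(nx.shortest_path_length(G, s, t))
            except nx.NetworkXNoPath:
                pass
    return sum(lengths) / len(lengths) if lengths else float("inf")

observed = avg_path_length(receptors_A, tfs_B)

# Null: random source/target sets of the same sizes drawn from the network.
rng = random.Random(0)
null = [
    avg_path_length(rng.sample(list(G), len(receptors_A)),
                    rng.sample(list(G), len(tfs_B)))
    for _ in range(200)
]
p_value = sum(n <= observed for n in null) / len(null)
print(f"observed mean path length = {observed:.2f}, empirical p = {p_value:.3f}")
```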
Text mining and expert curation to develop a database on psychiatric diseases and their genes
Gutiérrez-Sacristán, Alba; Bravo, Àlex; Portero-Tresserra, Marta; Valverde, Olga; Armario, Antonio; Blanco-Gandía, M.C.; Farré, Adriana; Fernández-Ibarrondo, Lierni; Fonseca, Francina; Giraldo, Jesús; Leis, Angela; Mané, Anna; Mayer, M.A.; Montagud-Romero, Sandra; Nadal, Roser; Ortiz, Jordi; Pavon, Francisco Javier; Perez, Ezequiel Jesús; Rodríguez-Arias, Marta; Serrano, Antonia; Torrens, Marta; Warnault, Vincent; Sanz, Ferran
2017-01-01
Abstract Psychiatric disorders constitute one of the main causes of disability worldwide. During the past years, considerable research has been conducted on the genetic architecture of such diseases, although little understanding of their etiology has been achieved. The difficulty of accessing up-to-date, relevant genotype-phenotype information has hampered the application of this wealth of knowledge to translational research and clinical practice in order to improve diagnosis and treatment of psychiatric patients. PsyGeNET (http://www.psygenet.org/) has been developed with the aim of supporting research on the genetic architecture of psychiatric diseases, by providing integrated and structured accessibility to their genotype–phenotype association data, together with analysis and visualization tools. In this article, we describe the protocol developed for the sustainable update of this knowledge resource. It includes the recruitment of a team of domain experts in order to perform the curation of the data extracted by text mining. Annotation guidelines and a web-based annotation tool were developed to support the curators’ tasks. A curation workflow was designed, including a pilot phase and two rounds of curation and analysis phases. Negative evidence from the literature on gene–disease associations (GDAs) was taken into account in the curation process. We report the results of the application of this workflow to the curation of GDAs for PsyGeNET, including the analysis of the inter-annotator agreement, and suggest this model as a suitable approach for the sustainable development and update of knowledge resources. Database URL: http://www.psygenet.org PsyGeNET corpus: http://www.psygenet.org/ds/PsyGeNET/results/psygenetCorpus.tar PMID:29220439
Meunier, Katy; Ferron, Marianne; Calmel, Claire; Fléjou, Jean-François; Pocard, Marc; Praz, Françoise
2017-09-01
Colorectal cancers (CRCs) displaying microsatellite instability (MSI) most often result from MLH1 deficiency. The aim of this study was to assess the impact of MLH1 expression per se on tumor evolution after curative surgical resection using a xenograft tumor model. Transplantable tumors established with the human MLH1-deficient HCT116 cell line and its MLH1-complemented isogenic clone, mlh1-3, were implanted onto the caecum of NOD/SCID mice. Curative surgical resection was performed at day 10 in half of the animals. The HCT116-derived tumors were more voluminous compared to the mlh1-3 ones (P = .001). Lymph node metastases and peritoneal carcinomatosis occurred significantly more often in the group of mice grafted with HCT116 (P = .007 and P = .035, respectively). Mlh1-3-grafted mice did not develop peritoneal carcinomatosis or liver metastasis. After surgical resection, lymph node metastases only arose in the group of mice implanted with HCT116, and the rate of cure was significantly lower than in the mlh1-3 group (P = .047). The murine orthotopic xenograft model based on isogenic human CRC cell lines allowed us to reveal the impact of MLH1 expression on tumor evolution both in mice that underwent curative surgical resection and in mice whose tumor was left in situ. Our data indicate that the behavior of MLH1-deficient CRC is governed not only by mutations arising in genes harboring microsatellite repeat sequences but also by the MLH1 defect itself. © 2017 Wiley Periodicals, Inc.
Lucidi, Valerio; Hendlisz, Alain; Van Laethem, Jean-Luc; Donckier, Vincent
2016-04-21
In the oncosurgical approach to colorectal liver metastases, surgery is still considered the only potentially curative option, while chemotherapy alone represents a strictly palliative treatment. However, missing metastases, defined as metastases disappearing after chemotherapy, represent a unique model to evaluate the curative potential of chemotherapy and to challenge current therapeutic algorithms. We reviewed recent series on missing colorectal liver metastases to evaluate the incidence of this phenomenon, its predictive factors, and rates of cure, defined by complete pathologic response in resected missing metastases and sustained clinical response when they were left unresected. With the progress in the efficacy of chemotherapeutic regimens, the incidence of missing liver metastases has increased steadily in recent years. The main predictive factors are small tumor size, low marker level, duration of chemotherapy, and use of intra-arterial chemotherapy. Initial series showed low rates of complete pathologic response in resected missing metastases and high recurrence rates when unresected. However, recent reports describe complete pathologic responses and sustained clinical responses reaching 50%, suggesting that chemotherapy could be curative in some cases. Accordingly, in cases of missing colorectal liver metastases, the classical recommendation to resect initial tumor sites might have become partially obsolete. Furthermore, the curative effect of chemotherapy in selected cases could lead to a change of paradigm in patients with unresectable liver-only metastases, using intensive first-line chemotherapy to intentionally induce missing metastases, followed by adjuvant surgery on remnant chemoresistant tumors and close surveillance of initial sites that have been left unresected.
ERIC Educational Resources Information Center
Shorish, Yasmeen
2012-01-01
This article describes the fundamental challenges to data curation, how these challenges may be compounded for smaller institutions, and how data management is an essential and manageable component of data curation. Data curation is often discussed within the confines of large research universities. As a result, master's and baccalaureate…
2012-01-01
Background Huntington’s disease (HD) is a fatal progressive neurodegenerative disorder caused by the expansion of the polyglutamine repeat region in the huntingtin gene. Although the disease is triggered by the mutation of a single gene, intensive research has linked numerous other genes to its pathogenesis. To obtain a systematic overview of these genes, which may serve as therapeutic targets, CHDI Foundation has recently established the HD Research Crossroads database. With currently over 800 cataloged genes, this web-based resource constitutes the most extensive curation of genes relevant to HD. It provides us with an unprecedented opportunity to survey molecular mechanisms involved in HD in a holistic manner. Methods To gain a synoptic view of therapeutic targets for HD, we have carried out a variety of bioinformatic and statistical analyses to scrutinize the functional association of genes curated in the HD Research Crossroads database. In particular, enrichment analyses were performed with respect to Gene Ontology categories, KEGG signaling pathways, and Pfam protein families. For selected processes, we also analyzed differential expression, using published microarray data. Additionally, we generated a candidate set of novel genetic modifiers of HD by combining information from the HD Research Crossroads database with previous genome-wide linkage studies. Results Our analyses led to a comprehensive identification of molecular mechanisms associated with HD. Remarkably, we not only recovered processes and pathways that have frequently been linked to HD (such as cytotoxicity, apoptosis, and calcium signaling), but also found strong indications for other potentially disease-relevant mechanisms that have been less intensively studied in the context of HD (such as the cell cycle and RNA splicing, as well as Wnt and ErbB signaling). For follow-up studies, we provide a regularly updated compendium of molecular mechanisms associated with HD at http://hdtt.sysbiolab.eu. Additionally, we derived a candidate set of 24 novel genetic modifiers, including histone deacetylase 3 (HDAC3), metabotropic glutamate receptor 1 (GRM1), CDK5 regulatory subunit 2 (CDK5R2), and coactivator 1β of the peroxisome proliferator-activated receptor gamma (PPARGC1B). Conclusions The results of our study give us an intriguing picture of the molecular complexity of HD. Our analyses can be seen as a first step towards a comprehensive list of biological processes, molecular functions, and pathways involved in HD, and may provide a basis for the development of more holistic disease models and new therapeutics. PMID:22741533
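The enrichment analyses mentioned above rest on over-representation statistics; a minimal hypergeometric sketch with placeholder counts (not values from the study) is:

```python
# Hypergeometric over-representation test: is the overlap between a curated HD
# gene set and a pathway's gene set larger than expected by chance?
from scipy.stats import hypergeom

N = 20000   # genes in the background (e.g. the annotated genome)
K = 350     # genes annotated to the pathway of interest
n = 800     # genes curated as HD-relevant
k = 40      # overlap between the two sets

# P(X >= k): probability of seeing at least this much overlap by chance.
p_value = hypergeom.sf(k - 1, N, K, n)
print(f"enrichment p-value = {p_value:.2e}")
```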
Ramanan, Vijay K; Kim, Sungeun; Holohan, Kelly; Shen, Li; Nho, Kwangsik; Risacher, Shannon L; Foroud, Tatiana M; Mukherjee, Shubhabrata; Crane, Paul K; Aisen, Paul S; Petersen, Ronald C; Weiner, Michael W; Saykin, Andrew J
2012-12-01
Memory deficits are prominent features of mild cognitive impairment (MCI) and Alzheimer's disease (AD). The genetic architecture underlying these memory deficits likely involves the combined effects of multiple genetic variants operative within numerous biological pathways. In order to identify functional pathways associated with memory impairment, we performed a pathway enrichment analysis on genome-wide association data from 742 Alzheimer's Disease Neuroimaging Initiative (ADNI) participants. A composite measure of memory was generated as the phenotype for this analysis by applying modern psychometric theory to item-level data from the ADNI neuropsychological test battery. Using the GSA-SNP software tool, we identified 27 canonical, expertly-curated pathways with enrichment (FDR-corrected p-value < 0.05) against this composite memory score. Processes classically understood to be involved in memory consolidation, such as neurotransmitter receptor-mediated calcium signaling and long-term potentiation, were highly represented among the enriched pathways. In addition, pathways related to cell adhesion, neuronal differentiation and guided outgrowth, and glucose- and inflammation-related signaling were also enriched. Among genes that were highly-represented in these enriched pathways, we found indications of coordinated relationships, including one large gene set that is subject to regulation by the SP1 transcription factor, and another set that displays co-localized expression in normal brain tissue along with known AD risk genes. These results 1) demonstrate that psychometrically-derived composite memory scores are an effective phenotype for genetic investigations of memory impairment and 2) highlight the promise of pathway analysis in elucidating key mechanistic targets for future studies and for therapeutic interventions.
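The FDR-corrected threshold used above can be illustrated with a short Benjamini-Hochberg sketch; the per-pathway p-values below are invented.

```python
# Benjamini-Hochberg adjustment: pathways with an adjusted value below 0.05
# pass the FDR-corrected enrichment threshold.
import numpy as np

pvals = np.array([0.0004, 0.003, 0.012, 0.04, 0.21, 0.56])  # one per pathway
m = len(pvals)
order = np.argsort(pvals)
ranked = pvals[order]

# Adjusted p_(i) = min over j >= i of p_(j) * m / j (monotone from the top).
adjusted = np.minimum.accumulate((ranked * m / np.arange(1, m + 1))[::-1])[::-1]
fdr = np.empty(m)
fdr[order] = adjusted
print(np.round(fdr, 4))
```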
Recommendations for Locus-Specific Databases and Their Curation
Cotton, R.G.H.; Auerbach, A.D.; Beckmann, J.S.; Blumenfeld, O.O.; Brookes, A.J.; Brown, A.F.; Carrera, P.; Cox, D.W.; Gottlieb, B.; Greenblatt, M.S.; Hilbert, P.; Lehvaslaiho, H.; Liang, P.; Marsh, S.; Nebert, D.W.; Povey, S.; Rossetti, S.; Scriver, C.R.; Summar, M.; Tolan, D.R.; Verma, I.C.; Vihinen, M.; den Dunnen, J.T.
2009-01-01
Expert curation and complete collection of mutations in genes that affect human health is essential for proper genetic healthcare and research. Expert curation is given by the curators of gene-specific mutation databases or locus-specific databases (LSDBs). While there are over 700 such databases, they vary in their content, completeness, time available for curation, and the expertise of the curator. Curation and LSDBs have been discussed, written about, and protocols have been provided for over 10 years, but there have been no formal recommendations for the ideal form of these entities. This work initiates a discussion on this topic to assist future efforts in human genetics. Further discussion is welcome. PMID:18157828
Recommendations for locus-specific databases and their curation.
Cotton, R G H; Auerbach, A D; Beckmann, J S; Blumenfeld, O O; Brookes, A J; Brown, A F; Carrera, P; Cox, D W; Gottlieb, B; Greenblatt, M S; Hilbert, P; Lehvaslaiho, H; Liang, P; Marsh, S; Nebert, D W; Povey, S; Rossetti, S; Scriver, C R; Summar, M; Tolan, D R; Verma, I C; Vihinen, M; den Dunnen, J T
2008-01-01
Expert curation and complete collection of mutations in genes that affect human health is essential for proper genetic healthcare and research. Expert curation is given by the curators of gene-specific mutation databases or locus-specific databases (LSDBs). While there are over 700 such databases, they vary in their content, completeness, time available for curation, and the expertise of the curator. Curation and LSDBs have been discussed, written about, and protocols have been provided for over 10 years, but there have been no formal recommendations for the ideal form of these entities. This work initiates a discussion on this topic to assist future efforts in human genetics. Further discussion is welcome. (c) 2007 Wiley-Liss, Inc.
Fishing for causes and cures of motor neuron disorders.
Patten, Shunmoogum A; Armstrong, Gary A B; Lissouba, Alexandra; Kabashi, Edor; Parker, J Alex; Drapeau, Pierre
2014-07-01
Motor neuron disorders (MNDs) are a clinically heterogeneous group of neurological diseases characterized by progressive degeneration of motor neurons, and share some common pathological pathways. Despite remarkable advances in our understanding of these diseases, no curative treatment for MNDs exists. To better understand the pathogenesis of MNDs and to help develop new treatments, the establishment of animal models that can be studied efficiently and thoroughly is paramount. The zebrafish (Danio rerio) is increasingly becoming a valuable model for studying human diseases and in screening for potential therapeutics. In this Review, we highlight recent progress in using zebrafish to study the pathology of the most common MNDs: spinal muscular atrophy (SMA), amyotrophic lateral sclerosis (ALS) and hereditary spastic paraplegia (HSP). These studies indicate the power of zebrafish as a model to study the consequences of disease-related genes, because zebrafish homologues of human genes have conserved functions with respect to the aetiology of MNDs. Zebrafish also complement other animal models for the study of pathological mechanisms of MNDs and are particularly advantageous for the screening of compounds with therapeutic potential. We present an overview of their potential usefulness in MND drug discovery, which is just beginning and holds much promise for future therapeutic development. © 2014. Published by The Company of Biologists Ltd.
Reactome diagram viewer: data structures and strategies to boost performance.
Fabregat, Antonio; Sidiropoulos, Konstantinos; Viteri, Guilherme; Marin-Garcia, Pablo; Ping, Peipei; Stein, Lincoln; D'Eustachio, Peter; Hermjakob, Henning
2018-04-01
Reactome is a free, open-source, open-data, curated and peer-reviewed knowledgebase of biomolecular pathways. For web-based pathway visualization, Reactome uses a custom pathway diagram viewer that has been evolved over the past years. Here, we present comprehensive enhancements in usability and performance based on extensive usability testing sessions and technology developments, aiming to optimize the viewer towards the needs of the community. The pathway diagram viewer version 3 achieves consistently better performance, loading and rendering of 97% of the diagrams in Reactome in less than 1 s. Combining the multi-layer html5 canvas strategy with a space partitioning data structure minimizes CPU workload, enabling the introduction of new features that further enhance user experience. Through the use of highly optimized data structures and algorithms, Reactome has boosted the performance and usability of the new pathway diagram viewer, providing a robust, scalable and easy-to-integrate solution to pathway visualization. As graph-based visualization of complex data is a frequent challenge in bioinformatics, many of the individual strategies presented here are applicable to a wide range of web-based bioinformatics resources. Reactome is available online at: https://reactome.org. The diagram viewer is part of the Reactome pathway browser (https://reactome.org/PathwayBrowser/) and also available as a stand-alone widget at: https://reactome.org/dev/diagram/. The source code is freely available at: https://github.com/reactome-pwp/diagram. fabregat@ebi.ac.uk or hhe@ebi.ac.uk. Supplementary data are available at Bioinformatics online.
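As a rough illustration of the space-partitioning idea (the viewer itself is an HTML5 canvas web application, so this Python quadtree is only a toy model, not Reactome's implementation), hit-testing the glyph under the cursor can avoid scanning every diagram node:

```python
# Toy quadtree: points (diagram glyph centres) are bucketed into quadrants so
# that a cursor query only inspects one small leaf region.
class QuadTree:
    def __init__(self, x, y, w, h, capacity=4):
        self.bounds = (x, y, w, h)
        self.capacity = capacity
        self.items = []          # (x, y, payload)
        self.children = None     # four sub-quadrants once split

    def _contains(self, px, py):
        x, y, w, h = self.bounds
        return x <= px < x + w and y <= py < y + h

    def insert(self, px, py, payload):
        if not self._contains(px, py):
            return False
        if self.children is None and len(self.items) < self.capacity:
            self.items.append((px, py, payload))
            return True
        if self.children is None:
            self._split()
        return any(c.insert(px, py, payload) for c in self.children)

    def _split(self):
        x, y, w, h = self.bounds
        hw, hh = w / 2, h / 2
        self.children = [QuadTree(x, y, hw, hh), QuadTree(x + hw, y, hw, hh),
                         QuadTree(x, y + hh, hw, hh), QuadTree(x + hw, y + hh, hw, hh)]
        for px, py, payload in self.items:
            any(c.insert(px, py, payload) for c in self.children)
        self.items = []

    def query(self, px, py):
        """Return payloads stored in the leaf region covering (px, py)."""
        if not self._contains(px, py):
            return []
        if self.children is None:
            return [payload for (_, _, payload) in self.items]
        return sum((c.query(px, py) for c in self.children), [])

tree = QuadTree(0, 0, 1024, 1024)
tree.insert(120, 340, "PTEN")      # toy diagram glyphs
tree.insert(121, 341, "AKT1")
print(tree.query(120, 340))        # both glyphs sit in the same leaf region
```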
AtomPy: an open atomic-data curation environment
NASA Astrophysics Data System (ADS)
Bautista, Manuel; Mendoza, Claudio; Boswell, Josiah S; Ajoku, Chukwuemeka
2014-06-01
We present a cloud-computing environment for atomic data curation, networking among atomic data providers and users, teaching-and-learning, and interfacing with spectral modeling software. The system is based on Google-Drive Sheets, Pandas (Python Data Analysis Library) DataFrames, and IPython Notebooks for open community-driven curation of atomic data for scientific and technological applications. The atomic model for each ionic species is contained in a multi-sheet Google-Drive workbook, where the atomic parameters from all known public sources are progressively stored. Metadata (provenance, community discussion, etc.) accompanying every entry in the database are stored through Notebooks. Education tools on the physics of atomic processes as well as their relevance to plasma and spectral modeling are based on IPython Notebooks that integrate written material, images, videos, and active computer-tool workflows. Data processing workflows and collaborative software developments are encouraged and managed through the GitHub social network. Relevant issues this platform intends to address are: (i) data quality by allowing open access to both data producers and users in order to attain completeness, accuracy, consistency, provenance and currentness; (ii) comparisons of different datasets to facilitate accuracy assessment; (iii) downloading to local data structures (i.e. Pandas DataFrames) for further manipulation and analysis by prospective users; and (iv) data preservation by avoiding the discard of outdated sets.
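A hedged sketch of the Pandas side of such a workflow, with a tiny inline table standing in for a downloaded sheet (column names and values are invented, not AtomPy's actual workbooks):

```python
# Typical curation steps on a sheet pulled into a DataFrame: coerce types,
# flag suspect entries, keep provenance alongside the data.
import pandas as pd

levels = pd.DataFrame({
    "configuration": ["3d6.4s", "3d6.4s", "3d7"],
    "term":          ["a6D",    "a6D",    "a4F"],
    "energy_cm1":    ["0.0",    "384.79", "bad_value"],   # raw strings from a sheet
    "source":        ["NIST",   "NIST",   "calc-2021"],
})

levels["energy_cm1"] = pd.to_numeric(levels["energy_cm1"], errors="coerce")
levels["flagged"] = levels["energy_cm1"].isna()   # entries needing curator review
print(levels)
```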
Leung, Ada W. Y.; Hung, Stacy S.; Backstrom, Ian; Ricaurte, Daniel; Kwok, Brian; Poon, Steven; McKinney, Steven; Segovia, Romulo; Rawji, Jenna; Qadir, Mohammed A.; Aparicio, Samuel; Stirling, Peter C.; Steidl, Christian; Bally, Marcel B.
2016-01-01
Platinum-based combination chemotherapy is the standard treatment for advanced non-small cell lung cancer (NSCLC). While cisplatin is effective, its use is not curative and resistance often emerges. As a consequence of microenvironmental heterogeneity, many tumour cells are exposed to sub-lethal doses of cisplatin. Further, genomic heterogeneity and unique tumor cell sub-populations with reduced sensitivities to cisplatin play a role in its effectiveness within a site of tumor growth. Being exposed to sub-lethal doses will induce changes in gene expression that contribute to the tumour cell’s ability to survive and eventually contribute to the selective pressures leading to cisplatin resistance. Such changes in gene expression, therefore, may contribute to cytoprotective mechanisms. Here, we report on studies designed to uncover how tumour cells respond to sub-lethal doses of cisplatin. A microarray study revealed changes in gene expressions that occurred when A549 cells were exposed to a no-observed-effect level (NOEL) of cisplatin (e.g. the IC10). These data were integrated with results from a genome-wide siRNA screen looking for novel therapeutic targets that when inhibited transformed a NOEL of cisplatin into one that induced significant increases in lethality. Pathway analyses were performed to identify pathways that could be targeted to enhance cisplatin activity. We found that over 100 genes were differentially expressed when A549 cells were exposed to a NOEL of cisplatin. Pathways associated with apoptosis and DNA repair were activated. The siRNA screen revealed the importance of the hedgehog, cell cycle regulation, and insulin action pathways in A549 cell survival and response to cisplatin treatment. Results from both datasets suggest that RRM2B, CABYR, ALDH3A1, and FHL2 could be further explored as cisplatin-enhancing gene targets. Finally, pathways involved in repairing double-strand DNA breaks and INO80 chromatin remodeling were enriched in both datasets, warranting further research into combinations of cisplatin and therapeutics targeting these pathways. PMID:26938915
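A toy sketch of the dataset integration described above, intersecting genes differentially expressed at the NOEL with siRNA screen hits (the tables are placeholders apart from the four candidate genes named in the abstract):

```python
# Invented toy tables standing in for the microarray and siRNA screen results.
import pandas as pd

microarray = pd.DataFrame({
    "gene":   ["RRM2B", "CABYR", "ALDH3A1", "FHL2", "GENE_A"],
    "log2fc": [1.4,      0.9,     1.1,       1.3,    0.1],
    "padj":   [0.001,    0.02,    0.004,     0.01,   0.6],
})
screen_hits = {"RRM2B", "CABYR", "ALDH3A1", "FHL2", "GENE_B"}

mask = (microarray["log2fc"].abs() > 0.5) & (microarray["padj"] < 0.05)
degs = set(microarray.loc[mask, "gene"])
candidates = degs & screen_hits    # genes supported by both datasets
print(sorted(candidates))          # candidates for pathway analysis and follow-up
```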
Leung, Ada W Y; Hung, Stacy S; Backstrom, Ian; Ricaurte, Daniel; Kwok, Brian; Poon, Steven; McKinney, Steven; Segovia, Romulo; Rawji, Jenna; Qadir, Mohammed A; Aparicio, Samuel; Stirling, Peter C; Steidl, Christian; Bally, Marcel B
2016-01-01
Platinum-based combination chemotherapy is the standard treatment for advanced non-small cell lung cancer (NSCLC). While cisplatin is effective, its use is not curative and resistance often emerges. As a consequence of microenvironmental heterogeneity, many tumour cells are exposed to sub-lethal doses of cisplatin. Further, genomic heterogeneity and unique tumor cell sub-populations with reduced sensitivities to cisplatin play a role in its effectiveness within a site of tumor growth. Being exposed to sub-lethal doses will induce changes in gene expression that contribute to the tumour cell's ability to survive and eventually contribute to the selective pressures leading to cisplatin resistance. Such changes in gene expression, therefore, may contribute to cytoprotective mechanisms. Here, we report on studies designed to uncover how tumour cells respond to sub-lethal doses of cisplatin. A microarray study revealed changes in gene expressions that occurred when A549 cells were exposed to a no-observed-effect level (NOEL) of cisplatin (e.g. the IC10). These data were integrated with results from a genome-wide siRNA screen looking for novel therapeutic targets that when inhibited transformed a NOEL of cisplatin into one that induced significant increases in lethality. Pathway analyses were performed to identify pathways that could be targeted to enhance cisplatin activity. We found that over 100 genes were differentially expressed when A549 cells were exposed to a NOEL of cisplatin. Pathways associated with apoptosis and DNA repair were activated. The siRNA screen revealed the importance of the hedgehog, cell cycle regulation, and insulin action pathways in A549 cell survival and response to cisplatin treatment. Results from both datasets suggest that RRM2B, CABYR, ALDH3A1, and FHL2 could be further explored as cisplatin-enhancing gene targets. Finally, pathways involved in repairing double-strand DNA breaks and INO80 chromatin remodeling were enriched in both datasets, warranting further research into combinations of cisplatin and therapeutics targeting these pathways.
Data Curation Education in Research Centers (DCERC)
NASA Astrophysics Data System (ADS)
Marlino, M. R.; Mayernik, M. S.; Kelly, K.; Allard, S.; Tenopir, C.; Palmer, C.; Varvel, V. E., Jr.
2012-12-01
Digital data both enable and constrain scientific research. Scientists are enabled by digital data to develop new research methods, utilize new data sources, and investigate new topics, but they also face new data collection, management, and preservation burdens. The current data workforce consists primarily of scientists who receive little formal training in data management and data managers who are typically educated through on-the-job training. The Data Curation Education in Research Centers (DCERC) program is investigating a new model for educating data professionals to contribute to scientific research. DCERC is a collaboration between the University of Illinois at Urbana-Champaign Graduate School of Library and Information Science, the University of Tennessee School of Information Sciences, and the National Center for Atmospheric Research. The program is organized around a foundations course in data curation and provides field experiences in research and data centers for both master's and doctoral students. This presentation will outline the aims and the structure of the DCERC program and discuss results and lessons learned from the first set of summer internships in 2012. Four masters students participated and worked with both data mentors and science mentors, gaining first hand experiences in the issues, methods, and challenges of scientific data curation. They engaged in a diverse set of topics, including climate model metadata, observational data management workflows, and data cleaning, documentation, and ingest processes within a data archive. The students learned current data management practices and challenges while developing expertise and conducting research. They also made important contributions to NCAR data and science teams by evaluating data management workflows and processes, preparing data sets to be archived, and developing recommendations for particular data management activities. The master's student interns will return in summer of 2013, and two Ph.D. students will conduct data curation-related dissertation fieldwork during the 2013-2014 academic year.
Recognising discourse causality triggers in the biomedical domain.
Mihăilă, Claudiu; Ananiadou, Sophia
2013-12-01
Current domain-specific information extraction systems represent an important resource for biomedical researchers, who need to process vast amounts of knowledge in a short time. Automatic discourse causality recognition can further reduce their workload by suggesting possible causal connections and aiding in the curation of pathway models. We describe here an approach to the automatic identification of discourse causality triggers in the biomedical domain using machine learning. We create several baselines and experiment with and compare various parameter settings for three algorithms, i.e. Conditional Random Fields (CRF), Support Vector Machines (SVM) and Random Forests (RF). We also evaluate the impact of lexical, syntactic, and semantic features on each of the algorithms, showing that semantics improves the performance in all cases. We test our comprehensive feature set on two corpora containing gold standard annotations of causal relations, and demonstrate the need for more gold standard data. The best performance of 79.35% F-score is achieved by CRFs when using all three feature types.
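A minimal, hedged sketch of this style of feature-based trigger classification, using one of the three compared algorithms (Random Forests) via scikit-learn; the tokens, labels and features are invented and far simpler than the published lexical, syntactic and semantic feature set:

```python
# Token-level classification: is each token a causal trigger?
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction import DictVectorizer

tokens = ["Thus", "p53", "activation", "induces", "apoptosis", "."]
labels = ["TRIGGER", "O", "O", "O", "O", "O"]

def features(i):
    return {
        "token": tokens[i].lower(),
        "is_title": tokens[i].istitle(),
        "prev": tokens[i - 1].lower() if i > 0 else "<s>",
        "next": tokens[i + 1].lower() if i < len(tokens) - 1 else "</s>",
    }

X = DictVectorizer().fit_transform([features(i) for i in range(len(tokens))])
clf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, labels)
print(clf.predict(X[:1]))   # predicts on the training data; for illustration only
```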
ZikaBase: An integrated ZIKV- Human Interactome Map database.
Gurumayum, Sanathoi; Brahma, Rahul; Naorem, Leimarembi Devi; Muthaiyan, Mathavan; Gopal, Jeyakodi; Venkatesan, Amouda
2018-01-15
Re-emergence of ZIKV has caused infections in more than 1.5 million people. The molecular mechanisms and pathogenesis of ZIKV are not well explored, owing to the unavailability of adequate models and, until now, the lack of publicly accessible resources providing a ZIKV-Human protein interactome map. This study made an attempt to curate the ZIKV-Human interacting proteins from published literature and RNA-Seq data. Eleven directly interacting proteins and 12 associated genes were retrieved from the literature, and 3742 Differentially Expressed Genes (DEGs) were obtained from RNA-Seq analysis. These genes were analyzed to construct the ZIKV-Human Interactome Map. The importance of the study is illustrated by enrichment analysis, which showed that the directly interacting and associated genes are enriched in viral entry into the host cell. ZIKV infection also modulates 32% of signaling and 27% of immune system pathways. The integrated database, ZikaBase, has been developed to help the virology research community and is accessible at https://test5.bicpu.edu.in. Copyright © 2017 Elsevier Inc. All rights reserved.
Diplomatic Assistance: Can Helminth-Modulated Macrophages Act as Treatment for Inflammatory Disease?
Steinfelder, Svenja; O’Regan, Noëlle Louise; Hartmann, Susanne
2016-01-01
Helminths have evolved numerous pathways to prevent their expulsion or elimination from the host to ensure long-term survival. During infection, they target numerous host cells, including macrophages, to induce an alternatively activated phenotype, which aids elimination of infection, tissue repair, and wound healing. Multiple animal-based studies have demonstrated a significant reduction or complete reversal of disease by helminth infection, treatment with helminth products, or helminth-modulated macrophages in models of allergy, autoimmunity, and sepsis. Experimental studies of macrophage and helminth therapies are being translated into clinical benefits for patients undergoing transplantation and those with multiple sclerosis. Thus, helminths or helminth-modulated macrophages present great possibilities as therapeutic applications for inflammatory diseases in humans. Macrophage-based helminth therapies and the underlying mechanisms of their therapeutic or curative effects represent an under-researched area with the potential to open new avenues of treatment. This review explores the application of helminth-modulated macrophages as a new therapy for inflammatory diseases. PMID:27101372
The EPA Comptox Chemistry Dashboard . (BOSC)
A consolidated web platform is necessary for researchers to access chemical information look-up, models and model predictions and linkages to Agency and public resources. This will provide access to: curated chemical structures, computed and measured physchem properties, exposure...
ERIC Educational Resources Information Center
Mihailidis, Paul
2015-01-01
Despite the increased role of digital curation tools and platforms in the daily life of social network users, little research has focused on the competencies and dispositions that young people develop to effectively curate content online. This paper details the results of a mixed method study exploring the curation competencies of young people in…
Saccharomyces genome database informs human biology
Skrzypek, Marek S; Nash, Robert S; Wong, Edith D; MacPherson, Kevin A; Karra, Kalpana; Binkley, Gail; Simison, Matt; Miyasato, Stuart R
2018-01-01
Abstract The Saccharomyces Genome Database (SGD; http://www.yeastgenome.org) is an expertly curated database of literature-derived functional information for the model organism budding yeast, Saccharomyces cerevisiae. SGD constantly strives to synergize new types of experimental data and bioinformatics predictions with existing data, and to organize them into a comprehensive and up-to-date information resource. The primary mission of SGD is to facilitate research into the biology of yeast and to provide this wealth of information to advance, in many ways, research on other organisms, even those as evolutionarily distant as humans. To build such a bridge between biological kingdoms, SGD is curating data regarding yeast-human complementation, in which a human gene can successfully replace the function of a yeast gene, and/or vice versa. These data are manually curated from published literature, made available for download, and incorporated into a variety of analysis tools provided by SGD. PMID:29140510
NASA Technical Reports Server (NTRS)
McCubbin, Francis M.; Zeigler, Ryan A.
2017-01-01
The Astromaterials Acquisition and Curation Office (henceforth referred to herein as NASA Curation Office) at NASA Johnson Space Center (JSC) is responsible for curating all of NASA's extraterrestrial samples. Under the governing document, NASA Policy Directive (NPD) 7100.10F JSC is charged with curation of all extraterrestrial material under NASA control, including future NASA missions. The Directive goes on to define Curation as including documentation, preservation, preparation, and distribution of samples for research, education, and public outreach. Here we briefly describe NASA's astromaterials collections and our ongoing efforts related to enhancing the utility of our current collections as well as our efforts to prepare for future sample return missions. We collectively refer to these efforts as advanced curation.
NASA Technical Reports Server (NTRS)
McCubbin, F. M.; Evans, C. A.; Fries, M. D.; Harrington, A. D.; Regberg, A. B.; Snead, C. J.; Zeigler, R. A.
2017-01-01
The Astromaterials Acquisition and Curation Office (henceforth referred to herein as NASA Curation Office) at NASA Johnson Space Center (JSC) is responsible for curating all of NASA's extraterrestrial samples. Under the governing document, NASA Policy Directive (NPD) 7100.10F JSC is charged with curation of all extraterrestrial material under NASA control, including future NASA missions. The Directive goes on to define Curation as including documentation, preservation, preparation, and distribution of samples for research, education, and public outreach. Here we briefly describe NASA's astromaterials collections and our ongoing efforts related to enhancing the utility of our current collections as well as our efforts to prepare for future sample return missions. We collectively refer to these efforts as advanced curation.
NASA Technical Reports Server (NTRS)
McCubbin, F. M.; Allton, J. H.; Barnes, J. J.; Boyce, J. W.; Burton, A. S.; Draper, D. S.; Evans, C. A.; Fries, M. D.; Jones, J. H.; Keller, L. P.;
2017-01-01
The Astromaterials Acquisition and Curation Office (henceforth referred to herein as NASA Curation Office) at NASA Johnson Space Center (JSC) is responsible for curating all of NASA's extraterrestrial samples. JSC presently curates 9 different astromaterials collections: (1) Apollo samples, (2) LUNA samples, (3) Antarctic meteorites, (4) Cosmic dust particles, (5) Microparticle Impact Collection [formerly called Space Exposed Hardware], (6) Genesis solar wind, (7) Stardust comet Wild-2 particles, (8) Stardust interstellar particles, and (9) Hayabusa asteroid Itokawa particles. In addition, the next missions bringing carbonaceous asteroid samples to JSC are Hayabusa 2/asteroid Ryugu and OSIRIS-REx/asteroid Bennu, in 2021 and 2023, respectively. The Hayabusa 2 samples are provided as part of an international agreement with JAXA. The NASA Curation Office plans for the requirements of future collections in an "Advanced Curation" program. Advanced Curation is tasked with developing procedures, technology, and data sets necessary for curating new types of collections as envisioned by NASA exploration goals. Here we review the science value and sample curation needs of some potential targets for sample return missions over the next 35 years.
Comprehensive analysis of a Metabolic Model for lipid production in Rhodosporidium toruloides.
Castañeda, María Teresita; Nuñez, Sebastián; Garelli, Fabricio; Voget, Claudio; Battista, Hernán De
2018-05-19
The yeast Rhodosporidium toruloides has been extensively studied for its application in biolipid production. The knowledge of its metabolism capabilities and the application of constraint-based flux analysis methodology provide useful information for process prediction and optimization. The accuracy of the resulting predictions is highly dependent on metabolic models. A metabolic reconstruction for R. toruloides metabolism has been recently published. On the basis of this model, we developed a curated version that unblocks the central nitrogen metabolism and, in addition, completes charge and mass balances in some reactions neglected in the former model. Then, a comprehensive analysis of network capability was performed with the curated model and compared with the published metabolic reconstruction. The flux distribution obtained by lipid optimization with Flux Balance Analysis was able to replicate the internal biochemical changes that lead to lipogenesis in oleaginous microorganisms. These results motivate the development of a genome-scale model for complete elucidation of R. toruloides metabolism. Copyright © 2018 Elsevier B.V. All rights reserved.
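The constraint-based analysis referred to above can be sketched as a small linear program; the toy three-reaction network below is invented and only illustrates the S·v = 0 formulation, not the curated R. toruloides model:

```python
# Toy flux balance analysis: maximize a "lipid" objective flux subject to
# steady-state mass balance S·v = 0 and flux bounds.
import numpy as np
from scipy.optimize import linprog

# Reactions: v0 glucose uptake, v1 glucose -> acetyl-CoA, v2 acetyl-CoA -> lipid
# Metabolites (rows): glucose, acetyl-CoA
S = np.array([
    [1, -1,  0],   # glucose balance
    [0,  1, -1],   # acetyl-CoA balance
])
bounds = [(0, 10), (0, 1000), (0, 1000)]   # uptake capped at 10 units

# linprog minimizes, so maximize the lipid flux v2 by minimizing -v2.
res = linprog(c=[0, 0, -1], A_eq=S, b_eq=np.zeros(2), bounds=bounds, method="highs")
print("optimal fluxes:", np.round(res.x, 3))   # all available flux funnels into lipid
```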
The Importance of Contamination Knowledge in Curation - Insights into Mars Sample Return
NASA Technical Reports Server (NTRS)
Harrington, A. D.; Calaway, M. J.; Regberg, A. B.; Mitchell, J. L.; Fries, M. D.; Zeigler, R. A.; McCubbin, F. M.
2018-01-01
The Astromaterials Acquisition and Curation Office at NASA Johnson Space Center (JSC), in Houston, TX (henceforth Curation Office) manages the curation of extraterrestrial samples returned by NASA missions and shared collections from international partners, preserving their integrity for future scientific study while providing the samples to the international community in a fair and unbiased way. The Curation Office also curates flight and non-flight reference materials and other materials from spacecraft assembly (e.g., lubricants, paints and gases) of sample return missions that would have the potential to cross-contaminate a present or future NASA astromaterials collection.
NASA Technical Reports Server (NTRS)
Fletcher, L. A.; Allen, C. C.; Bastien, R.
2008-01-01
NASA's Johnson Space Center (JSC) and the Astromaterials Curator are charged by NPD 7100.10D with the curation of all of NASA's extraterrestrial samples, including those from future missions. This responsibility includes the development of new sample handling and preparation techniques; therefore, the Astromaterials Curator must begin developing procedures to preserve, prepare and ship samples at sub-freezing temperatures in order to enable future sample return missions. Such missions might include the return of future frozen samples from permanently-shadowed lunar craters, the nuclei of comets, the surface of Mars, etc. We are demonstrating the ability to curate samples under cold conditions by designing, installing and testing a cold curation glovebox. This glovebox will allow us to store, document, manipulate and subdivide frozen samples while quantifying and minimizing contamination throughout the curation process.
ITEP: an integrated toolkit for exploration of microbial pan-genomes.
Benedict, Matthew N; Henriksen, James R; Metcalf, William W; Whitaker, Rachel J; Price, Nathan D
2014-01-03
Comparative genomics is a powerful approach for studying variation in physiological traits as well as the evolution and ecology of microorganisms. Recent technological advances have enabled sequencing large numbers of related genomes in a single project, requiring computational tools for their integrated analysis. In particular, accurate annotations and identification of gene presence and absence are critical for understanding and modeling the cellular physiology of newly sequenced genomes. Although many tools are available to compare the gene contents of related genomes, new tools are necessary to enable close examination and curation of protein families from large numbers of closely related organisms, to integrate curation with the analysis of gain and loss, and to generate metabolic networks linking the annotations to observed phenotypes. We have developed ITEP, an Integrated Toolkit for Exploration of microbial Pan-genomes, to curate protein families, compute similarities to externally-defined domains, analyze gene gain and loss, and generate draft metabolic networks from one or more curated reference network reconstructions in groups of related microbial species among which the combination of core and variable genes constitutes their "pan-genomes". The ITEP toolkit consists of: (1) a series of modular command-line scripts for identification, comparison, curation, and analysis of protein families and their distribution across many genomes; (2) a set of Python libraries for programmatic access to the same data; and (3) pre-packaged scripts to perform common analysis workflows on a collection of genomes. ITEP's capabilities include de novo protein family prediction, ortholog detection, analysis of functional domains, identification of core and variable genes and gene regions, sequence alignments and tree generation, annotation curation, and the integration of cross-genome analysis and metabolic networks for study of metabolic network evolution. ITEP is a powerful, flexible toolkit for generation and curation of protein families. ITEP's modular design allows for straightforward extension as analysis methods and tools evolve. By integrating comparative genomics with the development of draft metabolic networks, ITEP harnesses the power of comparative genomics to build confidence in links between genotype and phenotype and helps disambiguate gene annotations when they are evaluated in both evolutionary and metabolic network contexts.
Overview of the gene ontology task at BioCreative IV.
Mao, Yuqing; Van Auken, Kimberly; Li, Donghui; Arighi, Cecilia N; McQuilton, Peter; Hayman, G Thomas; Tweedie, Susan; Schaeffer, Mary L; Laulederkind, Stanley J F; Wang, Shur-Jen; Gobeill, Julien; Ruch, Patrick; Luu, Anh Tuan; Kim, Jung-Jae; Chiang, Jung-Hsien; Chen, Yu-De; Yang, Chia-Jung; Liu, Hongfang; Zhu, Dongqing; Li, Yanpeng; Yu, Hong; Emadzadeh, Ehsan; Gonzalez, Graciela; Chen, Jian-Ming; Dai, Hong-Jie; Lu, Zhiyong
2014-01-01
Gene ontology (GO) annotation is a common task among model organism databases (MODs) for capturing gene function data from journal articles. It is a time-consuming and labor-intensive task, and is thus often considered as one of the bottlenecks in literature curation. There is a growing need for semiautomated or fully automated GO curation techniques that will help database curators to rapidly and accurately identify gene function information in full-length articles. Despite multiple attempts in the past, few studies have proven to be useful with regard to assisting real-world GO curation. The shortage of sentence-level training data and opportunities for interaction between text-mining developers and GO curators has limited the advances in algorithm development and corresponding use in practical circumstances. To this end, we organized a text-mining challenge task for literature-based GO annotation in BioCreative IV. More specifically, we developed two subtasks: (i) to automatically locate text passages that contain GO-relevant information (a text retrieval task) and (ii) to automatically identify relevant GO terms for the genes in a given article (a concept-recognition task). With the support from five MODs, we provided teams with >4000 unique text passages that served as the basis for each GO annotation in our task data. Such evidence text information has long been recognized as critical for text-mining algorithm development but was never made available because of the high cost of curation. In total, seven teams participated in the challenge task. From the team results, we conclude that the state of the art in automatically mining GO terms from literature has improved over the past decade while much progress is still needed for computer-assisted GO curation. Future work should focus on addressing remaining technical challenges for improved performance of automatic GO concept recognition and incorporating practical benefits of text-mining tools into real-world GO annotation. http://www.biocreative.org/tasks/biocreative-iv/track-4-GO/. Published by Oxford University Press 2014. This work is written by US Government employees and is in the public domain in the US.
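Subtask (i), retrieving GO-evidence text passages, can be caricatured as a plain TF-IDF ranking problem; the passages and GO description below are invented, and the BioCreative systems used far richer methods:

```python
# Rank candidate passages from an article against a GO term description.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

go_term = "regulation of transcription by RNA polymerase II"
passages = [
    "Gal4p binds the UAS and activates transcription of GAL genes.",
    "Cells were grown to mid-log phase in rich medium at 30 degrees.",
    "Deletion of the gene reduced polymerase II occupancy at target promoters.",
]

vec = TfidfVectorizer().fit(passages + [go_term])
scores = cosine_similarity(vec.transform([go_term]), vec.transform(passages))[0]
best = max(range(len(passages)), key=lambda i: scores[i])
print(f"best passage ({scores[best]:.2f}): {passages[best]}")
```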
Updated regulation curation model at the Saccharomyces Genome Database
Engel, Stacia R; Skrzypek, Marek S; Hellerstedt, Sage T; Wong, Edith D; Nash, Robert S; Weng, Shuai; Binkley, Gail; Sheppard, Travis K; Karra, Kalpana; Cherry, J Michael
2018-01-01
Abstract The Saccharomyces Genome Database (SGD) provides comprehensive, integrated biological information for the budding yeast Saccharomyces cerevisiae, along with search and analysis tools to explore these data, enabling the discovery of functional relationships between sequence and gene products in fungi and higher organisms. We have recently expanded our data model for regulation curation to address regulation at the protein level in addition to transcription, and are presenting the expanded data on the ‘Regulation’ pages at SGD. These pages include a summary describing the context under which the regulator acts, manually curated and high-throughput annotations showing the regulatory relationships for that gene and a graphical visualization of its regulatory network and connected networks. For genes whose products regulate other genes or proteins, the Regulation page includes Gene Ontology enrichment analysis of the biological processes in which those targets participate. For DNA-binding transcription factors, we also provide other information relevant to their regulatory function, such as DNA binding site motifs and protein domains. As with other data types at SGD, all regulatory relationships and accompanying data are available through YeastMine, SGD’s data warehouse based on InterMine. Database URL: http://www.yeastgenome.org PMID:29688362
Survival from colorectal cancer in Victoria: 10-year follow up of the 1987 management survey.
McLeish, John A; Thursfield, Vicky J; Giles, Graham G
2002-05-01
In 1987, the Victorian Cancer Registry identified a population-based sample of patients who underwent surgery for colorectal cancer for an audit of management following resection. Over 10 years have passed since this survey, and data on the survival of these patients (incorporating various prognostic indicators collected at the time of the survey) are now discussed in the present report. Relative survival analysis was conducted for each prognostic indicator separately and then combined in a multivariate model. Relative survival at 5 years for patients undergoing curative resections was 76% compared with 7% for those whose treatment was considered palliative. Survival at 10 years was little changed (73% and 7% respectively). Survival did not differ significantly by sex or age irrespective of treatment intention. In the curative group, only stage was a significant predictor of survival. Multivariate analysis was performed only for the curative group. Adjusting for all variables simultaneously, stage was the only significant predictor of survival. Patients with Dukes' stage C disease were at a significantly greater risk (OR 5.5 (1.7-17.6)) than those with Dukes' A. Neither tumour site, sex, age, surgeon activity level nor adjuvant therapies made a significant contribution to the model.
ERIC Educational Resources Information Center
McCoy, Floyd W.
1977-01-01
Reports on a recent meeting of marine curators in which data dissemination, standardization of marine curating techniques and methods, responsibilities of curators, funding problems, and sampling equipment were the main areas of discussion. A listing of the major deep sea sample collections in the United States is also provided. (CP)
Chen, Jianxiang; Rajasekaran, Muthukumar; Hui, Kam M
2017-06-01
Hepatocellular carcinoma is one of the most common causes of cancer-related death worldwide. Hepatocellular carcinoma development depends on the inhibition and activation of multiple vital pathways, including the Wnt signaling pathway. The Wnt/β-catenin pathway lies at the center of various signaling pathways that regulate embryonic development, tissue homeostasis and cancers. Activation of the Wnt/β-catenin pathway has been observed frequently in hepatocellular carcinoma. However, activating mutations in β-catenin, Axin and Adenomatous Polyposis Coli only contribute to a portion of the Wnt signaling hyper-activation observed in hepatocellular carcinoma. Therefore, besides mutations in the canonical Wnt components, there must be additional atypical regulation or regulators during Wnt signaling activation that promote liver carcinogenesis. In this mini-review, we have tried to summarize some of these well-established factors and to highlight some recently identified novel factors in the Wnt/β-catenin signaling pathway in hepatocellular carcinoma. Impact statement: Early recurrence of human hepatocellular carcinoma (HCC) is a frequent cause of poor survival after potentially curative liver resection. Among the deregulated signaling cascades in HCC, evidence indicates that alterations in the Wnt/β-catenin signaling pathway play key roles in hepatocarcinogenesis. In this review, we summarize the potential molecular mechanisms by which the microtubule-associated protein regulator of cytokinesis 1 (PRC1), a direct Wnt signaling target previously identified in our laboratory to be up-regulated in HCC, promotes cancer proliferation, stemness, metastasis and tumorigenesis through a complex regulatory circuitry of Wnt3a activities.
Lv, Yufeng; Wei, Wenhao; Huang, Zhong; Chen, Zhichao; Fang, Yuan; Pan, Lili; Han, Xueqiong; Xu, Zihai
2018-06-20
The aim of this study was to develop a novel long non-coding RNA (lncRNA) expression signature to accurately predict early recurrence in patients with hepatocellular carcinoma (HCC) after curative resection. Using expression profiles downloaded from The Cancer Genome Atlas database, we identified multiple lncRNAs with differential expression between the early recurrence (ER) group and the non-early recurrence (non-ER) group of HCC. Least absolute shrinkage and selection operator (LASSO) logistic regression models were used to develop a lncRNA-based classifier for predicting ER in the training set. An independent test set was used to validate the predictive value of this classifier. Furthermore, a co-expression network based on these lncRNAs and their highly related genes was constructed, and Gene Ontology and Kyoto Encyclopedia of Genes and Genomes pathway enrichment analyses of genes in the network were performed. We identified 10 differentially expressed lncRNAs, including 3 that were upregulated and 7 that were downregulated in the ER group. The lncRNA-based classifier was constructed based on 7 lncRNAs (AL035661.1, PART1, AC011632.1, AC109588.1, AL365361.1, LINC00861 and LINC02084), and its accuracy was 0.83 in the training set, 0.87 in the test set and 0.84 in the total set. ROC curve analysis showed that the AUROC was 0.741 in the training set, 0.824 in the test set and 0.765 in the total set. A functional enrichment analysis suggested that the genes highly related to 4 of these lncRNAs are involved in the immune system. This 7-lncRNA expression profile can effectively predict early recurrence after surgical resection for HCC. This article is protected by copyright. All rights reserved.
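A minimal sketch of this kind of L1-penalized (LASSO-style) logistic regression classifier evaluated by AUROC, on simulated expression data rather than the TCGA profiles used in the study:

```python
# L1-penalized logistic regression selects a sparse subset of lncRNA features
# and is evaluated on a held-out split by ROC AUC.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 10))              # 200 patients x 10 candidate lncRNAs
y = (X[:, 0] - 0.8 * X[:, 3] + rng.normal(scale=1.0, size=200) > 0).astype(int)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)
clf = LogisticRegression(penalty="l1", solver="liblinear", C=0.5).fit(X_tr, y_tr)

auc = roc_auc_score(y_te, clf.predict_proba(X_te)[:, 1])
print(f"selected features: {np.flatnonzero(clf.coef_[0])}, test AUROC = {auc:.2f}")
```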
Can we replace curation with information extraction software?
Karp, Peter D
2016-01-01
Can we use programs for automated or semi-automated information extraction from scientific texts as practical alternatives to professional curation? I show that error rates of current information extraction programs are too high to replace professional curation today. Furthermore, current IEP programs extract single narrow slivers of information, such as individual protein interactions; they cannot extract the large breadth of information extracted by professional curators for databases such as EcoCyc. They also cannot arbitrate among conflicting statements in the literature as curators can. Therefore, funding agencies should not hobble the curation efforts of existing databases on the assumption that a problem that has stymied Artificial Intelligence researchers for more than 60 years will be solved tomorrow. Semi-automated extraction techniques appear to have significantly more potential based on a review of recent tools that enhance curator productivity. But a full cost-benefit analysis for these tools is lacking. Without such analysis it is possible to expend significant effort developing information-extraction tools that automate small parts of the overall curation workflow without achieving a significant decrease in curation costs.Database URL. © The Author(s) 2016. Published by Oxford University Press.
The Role of the Curator in Modern Hospitals: A Transcontinental Perspective.
Moss, Hilary; O'Neill, Desmond
2016-12-13
This paper explores the role of the curator in hospitals. The arts play a significant role in every society; however, recent studies indicate a neglect of the aesthetic environment of healthcare. This international study explores the complex role of the curator in modern hospitals. Semi-structured interviews were conducted with ten arts specialists in hospitals across five countries and three continents for a qualitative, phenomenological study. Five themes arose from the data: (1) patient involvement and influence on the arts programme in hospital; (2) understanding the role of the curator in hospital; (3) influences on arts programming in hospital; (4) types of arts programmes; and (5) limitations to effective curation in hospital. Recommendations arising from the research included recognition of the specialised role of the curator in hospitals, building positive links with clinical staff to effect positive hospital arts programmes, and increasing formal involvement of patients in arts planning in hospital. Hospital curation can be a vibrant arena for arts development, and the role of the hospital curator is a ground-breaking specialist role that can bring benefits to hospital life. The role of curator in hospital deserves to be supported and developed by both the arts and health sectors.
Curating NASA's Past, Present, and Future Astromaterial Sample Collections
NASA Technical Reports Server (NTRS)
Zeigler, R. A.; Allton, J. H.; Evans, C. A.; Fries, M. D.; McCubbin, F. M.; Nakamura-Messenger, K.; Righter, K.; Zolensky, M.; Stansbery, E. K.
2016-01-01
The Astromaterials Acquisition and Curation Office at NASA Johnson Space Center (hereafter JSC curation) is responsible for curating all of NASA's extraterrestrial samples. JSC presently curates nine different astromaterials collections in seven different clean-room suites: (1) Apollo Samples (ISO (International Standards Organization) class 6 + 7); (2) Antarctic Meteorites (ISO 6 + 7); (3) Cosmic Dust Particles (ISO 5); (4) Microparticle Impact Collection (ISO 7; formerly called Space-Exposed Hardware); (5) Genesis Solar Wind Atoms (ISO 4); (6) Stardust Comet Particles (ISO 5); (7) Stardust Interstellar Particles (ISO 5); (8) Hayabusa Asteroid Particles (ISO 5); (9) OSIRIS-REx Spacecraft Coupons and Witness Plates (ISO 7). Additional cleanrooms are currently being planned to house samples from two new collections, Hayabusa 2 (2021) and OSIRIS-REx (2023). In addition to the labs that house the samples, we maintain a wide variety of infrastructure facilities required to support the clean rooms: HEPA-filtered air-handling systems, ultrapure dry gaseous nitrogen systems, an ultrapure water system, and cleaning facilities to provide clean tools and equipment for the labs. We also have sample preparation facilities for making thin sections, microtome sections, and even focused ion-beam sections. We routinely monitor the cleanliness of our clean rooms and infrastructure systems, including measurements of inorganic or organic contamination, weekly airborne particle counts, compositional and isotopic monitoring of liquid N2 deliveries, and daily UPW system monitoring. In addition to the physical maintenance of the samples, we track within our databases the current and ever-changing characteristics (weight, location, etc.) of more than 250,000 individually numbered samples across our various collections, as well as more than 100,000 images, and countless "analog" records that document the processing history of each individual sample. JSC Curation is co-located with JSC's Astromaterials Research Office, which houses a world-class suite of analytical instrumentation and scientists. We leverage these labs and personnel to better curate the samples. Part of the curation process is planning for the future, and we refer to these planning efforts as "advanced curation". Advanced Curation is tasked with developing procedures, technology, and data sets necessary for curating new types of collections as envisioned by NASA exploration goals. We are (and have been) planning for future curation, including cold curation, extended curation of ices and volatiles, curation of samples with special chemical considerations such as perchlorate-rich samples, and curation of organically- and biologically-sensitive samples.
Wang, Yanchao; Sunderraman, Rajshekhar
2006-01-01
In this paper, we propose two architectures for curating PDB data to improve its quality. The first, the PDB Data Curation System, is developed by adding two parts, a Checking Filter and a Curation Engine, between the User Interface and the Database. This architecture supports basic PDB data curation. The other, the PDB Data Curation System with XCML, is designed for further curation and adds four more parts, PDB-XML, PDB, OODB and Protein-OODB, to the previous one. This architecture uses the XCML language to automatically check errors in PDB data, which makes PDB data more consistent and accurate. These two tools can be used for cleaning existing PDB files and creating new PDB files. We also present some ideas on how to add constraints and assertions with XCML to obtain better data. In addition, we discuss data provenance issues that may affect data accuracy and consistency.
Molecular cues for development and regeneration of salivary glands
Liu, Fei; Wang, Songlin
2015-01-01
The hypofunction of salivary glands caused by Sjögren's Syndrome or radiotherapy for head and neck cancer significantly compromises the quality of life of millions of patients. Currently no curative treatment is available for the irreversible hyposalivation, whereas regenerative strategies targeting salivary stem/progenitor cells are promising. However, the success of these strategies is constrained by the lack of insight into the molecular cues of salivary gland regeneration. Recent advances in the molecular control of salivary gland morphogenesis have provided valuable clues for identifying potential regenerative cues. A complicated network of signaling molecules between epithelia, mesenchyme, endothelia, extracellular matrix and innervating nerves orchestrates salivary gland organogenesis. Here we discuss the roles of several cross-talking intercellular signaling pathways, i.e., the FGF, Wnt, Hedgehog, Eda, Notch, Chrm1/HB-EGF and Laminin/Integrin pathways, in the development of salivary glands and their potential to promote salivary regeneration. PMID:24189993
[AV-reentrant tachycardia and Wolff-Parkinson-White syndrome : Diagnosis and treatment].
Voss, Frederik; Eckardt, Lars; Busch, Sonia; Estner, Heidi L; Steven, Daniel; Sommer, Philipp; von Bary, Christian; Neuberger, Hans-Ruprecht
2016-12-01
AV-reentrant tachycardia (AVRT) is a supraventricular tachycardia with an incidence of 1-3/1000. The pathophysiological basis is an accessory atrioventricular pathway (AP). Patients with AVRT typically present with palpitations that start and stop abruptly ("on-off" characteristic), anxiety, dyspnea, and polyuria. This type of tachycardia may often be terminated by vagal maneuvers. Although the clinical presentation of AVRT is quite similar to that of AV-nodal reentrant tachycardia, the correct diagnosis is often facilitated by analyzing a standard 12-lead ECG at normal heart rate showing ventricular preexcitation. Curative catheter ablation of the AP represents the therapy of choice in symptomatic patients. This article is the fourth part of a series written to improve the professional education of young electrophysiologists. It explains the pathophysiology, symptoms, and electrophysiological findings of an invasive EP study, and focuses on mapping and ablation of accessory pathways.
Genome-Scale Reconstruction of the Human Astrocyte Metabolic Network
Martín-Jiménez, Cynthia A.; Salazar-Barreto, Diego; Barreto, George E.; González, Janneth
2017-01-01
Astrocytes are the most abundant cells of the central nervous system; they have a predominant role in maintaining brain metabolism. In this sense, abnormal metabolic states have been found in different neuropathological diseases. Determination of the metabolic states of astrocytes is difficult to model using current experimental approaches given the high number of reactions and metabolites present. Thus, genome-scale metabolic networks derived from transcriptomic data can be used as a framework to elucidate how astrocytes modulate human brain metabolic states during normal conditions and in neurodegenerative diseases. We performed a Genome-Scale Reconstruction of the Human Astrocyte Metabolic Network with the purpose of elucidating a significant portion of the metabolic map of the astrocyte. This is the first global high-quality, manually curated metabolic reconstruction network of a human astrocyte. It includes 5,007 metabolites and 5,659 reactions distributed among 8 cell compartments (extracellular space, cytoplasm, mitochondria, endoplasmic reticulum, Golgi apparatus, lysosome, peroxisome and nucleus). Using the reconstructed network, the metabolic capabilities of human astrocytes were calculated and compared in both normal and ischemic conditions. We identified reactions activated in these two states, which can be useful for understanding the astrocytic pathways that are affected during brain disease. Additionally, we also showed that the flux distributions obtained in the model are in accordance with literature-based findings. To date, this is the most complete representation of the human astrocyte in terms of inclusion of genes, proteins, reactions and metabolic pathways, making it a useful guide for in-silico analysis of several metabolic behaviors of the astrocyte during normal and pathologic states. PMID:28243200
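The flux calculations mentioned above rest on flux-balance analysis, i.e. a linear program over the stoichiometric matrix. The sketch below is a minimal, self-contained flux-balance example on a toy three-reaction network solved with SciPy; the stoichiometric matrix, bounds and the "ischemic" uptake limit are illustrative assumptions, not part of the astrocyte reconstruction or the authors' toolchain.

```python
# Toy flux-balance analysis: maximize a sink flux subject to steady state (S·v = 0)
# and uptake bounds. The network and numbers are placeholders for illustration only.
import numpy as np
from scipy.optimize import linprog

# Columns: uptake (A_ext -> A), conversion (A -> B), sink / objective (B ->)
S = np.array([
    [1, -1,  0],   # metabolite A: produced by uptake, consumed by conversion
    [0,  1, -1],   # metabolite B: produced by conversion, consumed by sink
])

def max_sink_flux(uptake_limit):
    """Maximize flux through the sink reaction under steady-state constraints."""
    bounds = [(0, uptake_limit), (0, 1000), (0, 1000)]
    res = linprog(c=[0, 0, -1], A_eq=S, b_eq=np.zeros(2), bounds=bounds, method="highs")
    return -res.fun

print("normal   :", max_sink_flux(10.0))   # e.g. normoxic uptake limit
print("ischemic :", max_sink_flux(1.0))    # tightened uptake mimicking ischemia
```

Comparing the optimal flux distributions under the two sets of bounds is, in miniature, the "normal versus ischemic" comparison described in the abstract.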
The U.S. Environmental Protection Agency (EPA) Computational Toxicology Program develops and utilizes QSAR modeling approaches across a broad range of applications. In terms of physical chemistry we have a particular interest in the prediction of basic physicochemical parameters ...
Preparing to Receive and Handle Martian Samples When They Arrive on Earth
NASA Technical Reports Server (NTRS)
McCubbin, Francis M.
2017-01-01
The Astromaterials Acquisition and Curation Office at NASA Johnson Space Center (JSC) is responsible for curating all of NASA's extraterrestrial samples. Under the governing documents, NASA Policy Directive (NPD) 7100.10F and its derivative NPR 'Curation of Extraterrestrial Materials', JSC is charged with 'the curation of all extraterrestrial material under NASA control, including future NASA missions.' The Directive goes on to define curation as including '...documentation, preservation, preparation, and distribution of samples for research, education, and public outreach.'
Li, Yan; Rashid, Azhar; Wang, Hongjie; Hu, Anyi; Lin, Lifeng; Yu, Chang-Ping; Chen, Meng; Sun, Qian
2018-08-15
Sulfamethoxazole (SMX) is a sulfonamide antibiotic widely used as a curative and preventive drug against human, animal, and aquaculture bacterial infections. Its residues have been ubiquitously detected in surface waters and sediments. In the present study, SMX dissipation kinetics were studied in natural water samples from the Jiulong River under simulated complex natural conditions, as well as under conditions designed to mimic individual biotic and abiotic environmental factors in isolation. Structural equation modeling (SEM) employing the partial least squares technique for path coefficient analysis was used to investigate the direct and indirect contributions of different environmental factors to the natural attenuation of SMX. The model explained 81% of the variability in natural attenuation as a dependent variable under the influence of the sole effects of direct photo-degradation, indirect photo-degradation, hydrolysis, microbial degradation and bacterial degradation. The results of SEM suggested that direct and indirect photo-degradation were the major pathways in SMX natural attenuation. However, other biotic and abiotic factors also play a mediatory role during natural attenuation and other processes. Furthermore, the potential transformation products of SMX were identified and their toxicity was evaluated. Copyright © 2018 Elsevier B.V. All rights reserved.
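Dissipation kinetics of the kind studied above are commonly summarised by a first-order decay model. The sketch below fits C(t) = C0·exp(-k·t) to placeholder concentration-time data with SciPy and derives a half-life; the data points are illustrative, not measurements from the study, and the SEM path analysis itself is not reproduced here.

```python
# First-order dissipation fit on placeholder data; the time points and
# concentrations are illustrative assumptions, not the study's measurements.
import numpy as np
from scipy.optimize import curve_fit

t = np.array([0, 1, 2, 4, 8, 16], dtype=float)    # incubation time, days (placeholder)
c = np.array([100, 78, 62, 40, 17, 3.5])           # SMX remaining, % of initial (placeholder)

def first_order(t, c0, k):
    return c0 * np.exp(-k * t)

(c0_fit, k_fit), _ = curve_fit(first_order, t, c, p0=(100.0, 0.1))
half_life = np.log(2) / k_fit                      # DT50 implied by the fitted rate constant
print(f"k = {k_fit:.3f} per day, DT50 = {half_life:.1f} days")
```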
Text mining for metabolic pathways, signaling cascades, and protein networks.
Hoffmann, Robert; Krallinger, Martin; Andres, Eduardo; Tamames, Javier; Blaschke, Christian; Valencia, Alfonso
2005-05-10
The complexity of the information stored in databases and publications on metabolic and signaling pathways, the high throughput of experimental data, and the growing number of publications make it imperative to provide systems to help the researcher navigate through these interrelated information resources. Text-mining methods have started to play a key role in the creation and maintenance of links between the information stored in biological databases and its original sources in the literature. These links will be extremely useful for database updating and curation, especially if a number of technical problems can be solved satisfactorily, including the identification of protein and gene names (entities in general) and the characterization of their types of interactions. The first generation of openly accessible text-mining systems, such as iHOP (Information Hyperlinked over Proteins), provides additional functions to facilitate the reconstruction of protein interaction networks, combine database and text information, and support the scientist in the formulation of novel hypotheses. The next challenge is the generation of comprehensive information regarding the general function of signaling pathways and protein interaction networks.
Curating Big Data Made Simple: Perspectives from Scientific Communities.
Sowe, Sulayman K; Zettsu, Koji
2014-03-01
The digital universe is exponentially producing an unprecedented volume of data that has brought benefits as well as fundamental challenges for enterprises and scientific communities alike. This trend is inherently exciting for the development and deployment of cloud platforms to support scientific communities curating big data. The excitement stems from the fact that scientists can now access and extract value from the big data corpus, establish relationships between bits and pieces of information from many types of data, and collaborate with a diverse community of researchers from various domains. However, despite these perceived benefits, to date, little attention has been focused on the people or communities who are both beneficiaries and, at the same time, producers of big data. The technical challenges posed by big data are matched by the challenge of understanding the dynamics of the communities working with big data, whether scientific or otherwise. Furthermore, the big data era also means that big data platforms for data-intensive research must be designed in such a way that research scientists can easily search and find data for their research, upload and download datasets for onsite/offsite use, perform computations and analysis, share their findings and research experience, and seamlessly collaborate with their colleagues. In this article, we present the architecture and design of a cloud platform that meets some of these requirements, and a big data curation model that describes how a community of earth and environmental scientists is using the platform to curate data. Motivation for developing the platform, lessons learnt in overcoming some challenges associated with supporting scientists to curate big data, and future research directions are also presented.
Boué, Stéphanie; Talikka, Marja; Westra, Jurjen Willem; Hayes, William; Di Fabio, Anselmo; Park, Jennifer; Schlage, Walter K.; Sewer, Alain; Fields, Brett; Ansari, Sam; Martin, Florian; Veljkovic, Emilija; Kenney, Renee; Peitsch, Manuel C.; Hoeng, Julia
2015-01-01
With the wealth of publications and data available, powerful and transparent computational approaches are required to represent measured data and scientific knowledge in a computable and searchable format. We developed a set of biological network models, scripted in the Biological Expression Language, that reflect causal signaling pathways across a wide range of biological processes, including cell fate, cell stress, cell proliferation, inflammation, tissue repair and angiogenesis in the pulmonary and cardiovascular context. This comprehensive collection of networks is now freely available to the scientific community in a centralized web-based repository, the Causal Biological Network database, which is composed of over 120 manually curated and well annotated biological network models and can be accessed at http://causalbionet.com. The website accesses a MongoDB, which stores all versions of the networks as JSON objects and allows users to search for genes, proteins, biological processes, small molecules and keywords in the network descriptions to retrieve biological networks of interest. The content of the networks can be visualized and browsed. Nodes and edges can be filtered and all supporting evidence for the edges can be browsed and is linked to the original articles in PubMed. Moreover, networks may be downloaded for further visualization and evaluation. Database URL: http://causalbionet.com PMID:25887162
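As a rough sketch of how JSON network documents stored in MongoDB, as described above, might be queried programmatically, the snippet below uses pymongo; it assumes a locally running MongoDB instance, and the database, collection and field names ("cbn", "networks", "description", "nodes.label") are placeholders, not the actual Causal Biological Network schema or its public API.

```python
# Hypothetical pymongo queries against a local store of network JSON documents;
# server location, database, collection and field names are all assumptions.
from pymongo import MongoClient

client = MongoClient("mongodb://localhost:27017")
networks = client["cbn"]["networks"]          # placeholder database/collection names

# Keyword search over network descriptions (case-insensitive regex match)
for doc in networks.find({"description": {"$regex": "angiogenesis", "$options": "i"}},
                         {"name": 1, "version": 1}):
    print(doc.get("name"), doc.get("version"))

# Retrieve a network containing a node labelled with a gene of interest
hit = networks.find_one({"nodes.label": "VEGFA"})
if hit:
    print("found in network:", hit.get("name"))
```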
Integrating text mining into the MGI biocuration workflow
Dowell, K.G.; McAndrews-Hill, M.S.; Hill, D.P.; Drabkin, H.J.; Blake, J.A.
2009-01-01
A major challenge for functional and comparative genomics resource development is the extraction of data from the biomedical literature. Although text mining for biological data is an active research field, few applications have been integrated into production literature curation systems such as those of the model organism databases (MODs). Not only are most available biological natural language (bioNLP) and information retrieval and extraction solutions difficult to adapt to existing MOD curation workflows, but many also have high error rates or are unable to process documents available in those formats preferred by scientific journals. In September 2008, Mouse Genome Informatics (MGI) at The Jackson Laboratory initiated a search for dictionary-based text mining tools that we could integrate into our biocuration workflow. MGI has rigorous document triage and annotation procedures designed to identify appropriate articles about mouse genetics and genome biology. We currently screen ∼1000 journal articles a month for Gene Ontology terms, gene mapping, gene expression, phenotype data and other key biological information. Although we do not foresee that curation tasks will ever be fully automated, we are eager to implement named entity recognition (NER) tools for gene tagging that can help streamline our curation workflow and simplify gene indexing tasks within the MGI system. Gene indexing is an MGI-specific curation function that involves identifying which mouse genes are being studied in an article, then associating the appropriate gene symbols with the article reference number in the MGI database. Here, we discuss our search process, performance metrics and success criteria, and how we identified a short list of potential text mining tools for further evaluation. We provide an overview of our pilot projects with NCBO's Open Biomedical Annotator and Fraunhofer SCAI's ProMiner. In doing so, we prove the potential for the further incorporation of semi-automated processes into the curation of the biomedical literature. PMID:20157492
Integrating text mining into the MGI biocuration workflow.
Dowell, K G; McAndrews-Hill, M S; Hill, D P; Drabkin, H J; Blake, J A
2009-01-01
A major challenge for functional and comparative genomics resource development is the extraction of data from the biomedical literature. Although text mining for biological data is an active research field, few applications have been integrated into production literature curation systems such as those of the model organism databases (MODs). Not only are most available biological natural language (bioNLP) and information retrieval and extraction solutions difficult to adapt to existing MOD curation workflows, but many also have high error rates or are unable to process documents available in those formats preferred by scientific journals.In September 2008, Mouse Genome Informatics (MGI) at The Jackson Laboratory initiated a search for dictionary-based text mining tools that we could integrate into our biocuration workflow. MGI has rigorous document triage and annotation procedures designed to identify appropriate articles about mouse genetics and genome biology. We currently screen approximately 1000 journal articles a month for Gene Ontology terms, gene mapping, gene expression, phenotype data and other key biological information. Although we do not foresee that curation tasks will ever be fully automated, we are eager to implement named entity recognition (NER) tools for gene tagging that can help streamline our curation workflow and simplify gene indexing tasks within the MGI system. Gene indexing is an MGI-specific curation function that involves identifying which mouse genes are being studied in an article, then associating the appropriate gene symbols with the article reference number in the MGI database.Here, we discuss our search process, performance metrics and success criteria, and how we identified a short list of potential text mining tools for further evaluation. We provide an overview of our pilot projects with NCBO's Open Biomedical Annotator and Fraunhofer SCAI's ProMiner. In doing so, we prove the potential for the further incorporation of semi-automated processes into the curation of the biomedical literature.
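A minimal sketch of the dictionary-based gene tagging idea discussed above is given below: it matches a small lexicon of gene symbols and synonyms against free text with regular expressions. The lexicon and example sentence are placeholders, and this is not the Open Biomedical Annotator or ProMiner, only the general named-entity-recognition pattern they build on.

```python
# Dictionary-based gene tagging sketch; the two-entry lexicon and the sentence
# are illustrative placeholders, not MGI data.
import re

lexicon = {                                   # symbol -> list of synonyms
    "Pax6": ["Pax6", "paired box 6"],
    "Shh": ["Shh", "sonic hedgehog"],
}

# One case-insensitive, word-bounded pattern per gene
patterns = {sym: re.compile(r"\b(?:" + "|".join(map(re.escape, syns)) + r")\b", re.I)
            for sym, syns in lexicon.items()}

def tag_genes(text):
    """Return the set of gene symbols whose synonyms occur in the text."""
    return {sym for sym, pat in patterns.items() if pat.search(text)}

print(tag_genes("Sonic hedgehog signalling regulates Pax6 expression in the mouse eye."))
# e.g. {'Pax6', 'Shh'}
```

A production tagger must additionally handle ambiguous symbols, species filtering and mapping of hits back to database identifiers, which is where curator review remains essential.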
Modeling tandem AAG8-MEK inhibition in melanoma cells
Sun, Bing; Kawahara, Masahiro; Nagamune, Teruyuki
2014-01-01
Drug resistance presents a challenge to the treatment of cancer patients, especially for melanomas, most of which are caused by hyperactivation of the MAPK signaling pathway. Innate or acquired drug-resistant relapse calls for the investigation of the resistance mechanisms and new anti-cancer drugs to provide implications for the ultimate goal of curative therapy. Aging-associated gene 8 (AAG8, encoded by the SIGMAR1 gene) is a chaperone protein extensively studied in neurology. However, the roles of AAG8 in carcinogenesis remain unclear. Herein, we discover AAG8 antagonists as new MEK inhibitors in melanoma cells and propose a novel drug combination strategy for melanoma therapy by presenting experimental evidence. We report that specific antagonism of AAG8 efficiently suppresses melanoma cell growth and migration through, at least in part, inactivation of the RAS-CRAF-MEK signaling pathway. We further demonstrate that melanoma cells that are resistant to the AAG8 antagonist harbor refractory CRAF-MEK activity. MEK acts as a central mediator for anti-cancer effects and also for the resistance mechanism, leading to our proposal of tandem AAG8-MEK inhibition in melanoma cells. Combination of the AAG8 antagonist and a very low concentration of a MEK inhibitor synergistically restricts the growth of drug-resistant cells. These data collectively pinpoint AAG8 as a potential target and delineate a promising drug combination strategy for melanoma therapy. PMID:24634165
Ruan, Yunfeng; Jiang, Jie; Guo, Liang; Li, Yan; Huang, Hailiang; Shen, Lu; Luan, Mengqi; Li, Mo; Du, Huihui; Ma, Cheng; He, Lin; Zhang, Xiaoqing; Qin, Shengying
2016-01-01
Epidermal growth factor receptor (EGFR) tyrosine kinase inhibitors (TKIs) are an effective targeted therapy for advanced non-small cell lung cancer (NSCLC) but also cause adverse drug reactions (ADRs), e.g., skin rash and diarrhea. SNPs in the EGFR signaling pathway, drug metabolism/transport pathways and miRNAs might contribute to interindividual differences in ADRs, but biomarkers for therapeutic responses and ADRs to TKIs in the Chinese population are yet to be fully investigated. We recruited 226 Chinese advanced NSCLC patients who received the TKIs erlotinib, gefitinib and icotinib hydrochloride and systematically studied the genetic factors associated with therapeutic responses and ADRs. Rs884225 (T > C) in the EGFR 3′ UTR was significantly associated with a lower risk of ADRs to erlotinib (p value = 0.0010, adjusted p value = 0.042). A multivariant interaction four-SNP model (rs884225 in the EGFR 3′UTR, rs7787082 in an ABCB1 intron, rs38845 in a MET intron and rs3803300 in the AKT1 5′UTR) was associated with ADRs in general and with the more specific drug-induced skin injury. The SNPs associated with both therapeutic responses and ADRs indicate that they might share a common genetic basis. Our study provides potential biomarkers and clues for further research on biomarkers of therapeutic responses and ADRs in Chinese NSCLC patients. PMID:26988277
LipidPedia: a comprehensive lipid knowledgebase.
Kuo, Tien-Chueh; Tseng, Yufeng Jane
2018-04-10
Lipids are divided into fatty acyls, glycerolipids, glycerophospholipids, sphingolipids, saccharolipids, sterols, prenol lipids and polyketides. Fatty acyls and glycerolipids are commonly used as energy storage, whereas glycerophospholipids, sphingolipids, sterols and saccharolipids are commonly used as components of cell membranes. Lipids in the fatty acyl, glycerophospholipid, sphingolipid and sterol classes play important roles in signaling. Although more than 36 million lipids can be identified or computationally generated, no single lipid database provides comprehensive information on lipids. Furthermore, the complex systematic or common names of lipids make the discovery of related information challenging. Here, we present LipidPedia, a comprehensive lipid knowledgebase. The content of this database is derived from integrating annotation data with full-text mining of 3,923 lipids and more than 400,000 annotations of associated diseases, pathways, functions, and locations that are essential for interpreting lipid functions and mechanisms from over 1,400,000 scientific publications. Each lipid in LipidPedia also has its own entry containing a text summary curated from the most frequently cited diseases, pathways, genes, locations, functions, lipids and experimental models in the biomedical literature. LipidPedia aims to provide an overall synopsis of lipids to summarize lipid annotations and provide a detailed listing of references for understanding complex lipid functions and mechanisms. LipidPedia is available at http://lipidpedia.cmdm.tw. Contact: yjtseng@csie.ntu.edu.tw. Supplementary data are available at Bioinformatics online.
Development of Gene Therapy for Thalassemia
Nienhuis, Arthur W.; Persons, Derek A.
2012-01-01
Retroviral vector–mediated gene transfer into hematopoietic stem cells provides a potentially curative therapy for severe β-thalassemia. Lentiviral vectors based on human immunodeficiency virus have been developed for this purpose and have been shown to be effective in curing thalassemia in mouse models. One participant in an ongoing clinical trial has achieved transfusion independence after gene transfer into bone marrow stem cells owing, in part, to a genetically modified, dominant clone. Ongoing efforts are focused on improving the efficiency of lentiviral vector–mediated gene transfer into stem cells so that the curative potential of gene transfer can be consistently achieved. PMID:23125203
Jayakrishnan, Thejus T; Nadeem, Hasan; Groeschl, Ryan T; George, Ben; Thomas, James P; Ritch, Paul S; Christians, Kathleen K; Tsai, Susan; Evans, Douglas B; Pappas, Sam G; Gamblin, T Clark; Turaga, Kiran K
2015-02-01
Laparoscopy is recommended to detect radiographically occult metastases in patients with pancreatic cancer before curative resection. This study was conducted to test the hypothesis that diagnostic laparoscopy (DL) is cost-effective in patients undergoing curative resection with or without neoadjuvant therapy (NAT). Decision tree modelling compared routine DL with exploratory laparotomy (ExLap) at the time of curative resection in resectable cancer treated with surgery first (SF) and in borderline resectable cancer treated with NAT. Costs (US$) from the payer's perspective, quality-adjusted life months (QALMs) and incremental cost-effectiveness ratios (ICERs) were calculated. Base case estimates and multi-way sensitivity analyses were performed. Willingness to pay (WtP) was US$4166/QALM (or US$50,000/quality-adjusted life year). Base case costs were US$34,921 for ExLap and US$33,442 for DL in SF patients, and US$39,633 for ExLap and US$39,713 for DL in NAT patients. Routine DL is the dominant (preferred) strategy in both treatment types: it allows for cost reductions of US$10,695/QALM in SF and US$4158/QALM in NAT patients. The present analysis supports the cost-effectiveness of routine DL before curative resection in pancreatic cancer patients treated with either SF or NAT. © 2014 International Hepato-Pancreato-Biliary Association.
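The incremental cost-effectiveness arithmetic used in analyses like the one above can be illustrated with a short sketch. The costs and willingness-to-pay threshold below are taken from the abstract (SF strategy); the QALM values are placeholders, since the abstract does not report the effectiveness estimates behind the decision tree.

```python
# ICER / dominance check; the QALM values are assumed for illustration only.
def icer(cost_new, cost_old, eff_new, eff_old):
    """Incremental cost-effectiveness ratio: extra cost per extra QALM gained."""
    d_cost, d_eff = cost_new - cost_old, eff_new - eff_old
    if d_cost <= 0 and d_eff >= 0:
        return "dominant (cheaper and at least as effective)"
    return d_cost / d_eff

WTP = 4166                                   # willingness to pay, US$ per QALM (from abstract)
cost_dl, cost_exlap = 33442, 34921           # SF patients, US$ (from abstract)
qalm_dl, qalm_exlap = 12.0, 11.9             # placeholder effectiveness values

result = icer(cost_dl, cost_exlap, qalm_dl, qalm_exlap)
print(result)
# A strategy is deemed cost-effective when it is dominant or its ICER falls below WTP.
```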
Ruusmann, Villu; Maran, Uko
2013-07-01
The scientific literature is an important source of experimental and chemical structure data. Very often these data have been harvested into smaller or larger data collections, leaving data quality and curation issues on the shoulders of users. The current research presents a systematic and reproducible workflow for collecting series of data points from the scientific literature and assembling a database that is suitable for the purposes of high-quality modelling and decision support. The quality assurance aspect of the workflow is concerned with the curation of both chemical structures and associated toxicity values at (1) the single-data-point level and (2) the level of collections of data points. The assembly of a database employs a novel "timeline" approach. The workflow is implemented as a software solution and its applicability is demonstrated on the example of the Tetrahymena pyriformis acute aquatic toxicity endpoint. A literature collection of 86 primary publications for T. pyriformis was found to contain 2,072 chemical compounds and 2,498 unique toxicity values, which divide into 2,440 numerical and 58 textual values. Every chemical compound was assigned a preferred toxicity value. Examples of the most common chemical and toxicological data curation scenarios are discussed.
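As a rough illustration of the "one preferred toxicity value per compound" step described above, the pandas sketch below keeps the most recent numeric value per compound; the records and the selection rule are illustrative assumptions, not the authors' actual "timeline" logic.

```python
# Toy preferred-value selection over placeholder toxicity records.
import pandas as pd

records = pd.DataFrame({
    "compound": ["phenol", "phenol", "aniline"],
    "value_mgL": [21.0, 19.5, 110.0],          # placeholder toxicity values
    "year": [1994, 2006, 1999],                # publication year of each data point
    "is_numeric": [True, True, True],
})

preferred = (records[records["is_numeric"]]    # keep only numeric values
             .sort_values("year")              # order data points chronologically
             .groupby("compound", as_index=False)
             .last())                          # keep the most recent value per compound
print(preferred)
```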
Curation accuracy of model organism databases
Keseler, Ingrid M.; Skrzypek, Marek; Weerasinghe, Deepika; Chen, Albert Y.; Fulcher, Carol; Li, Gene-Wei; Lemmer, Kimberly C.; Mladinich, Katherine M.; Chow, Edmond D.; Sherlock, Gavin; Karp, Peter D.
2014-01-01
Manual extraction of information from the biomedical literature, or biocuration, is the central methodology used to construct many biological databases. For example, the UniProt protein database, the EcoCyc Escherichia coli database and the Candida Genome Database (CGD) are all based on biocuration. Biological databases are used extensively by life science researchers, as online encyclopedias, as aids in the interpretation of new experimental data and as gold standards for the development of new bioinformatics algorithms. Although manual curation has been assumed to be highly accurate, we are aware of only one previous study of biocuration accuracy. We assessed the accuracy of EcoCyc and CGD by manually selecting curated assertions within randomly chosen EcoCyc and CGD gene pages and by then validating that the data found in the referenced publications supported those assertions. A database assertion is considered to be in error if that assertion could not be found in the publication cited for that assertion. We identified 10 errors in the 633 facts that we validated across the two databases, for an overall error rate of 1.58%, and individual error rates of 1.82% for CGD and 1.40% for EcoCyc. These data suggest that manual curation of the experimental literature by PhD-level scientists is highly accurate. Database URL: http://ecocyc.org/, http://www.candidagenome.org/ PMID:24923819
NASA Astrophysics Data System (ADS)
Stevens, T.
2016-12-01
NASA's Global Change Master Directory (GCMD) curates a hierarchical set of controlled vocabularies (keywords) covering Earth sciences and associated information (data centers, projects, platforms, and instruments). The purpose of the keywords is to describe Earth science data and services in a consistent and comprehensive manner, allowing for precise metadata search and subsequent retrieval of data and services. The keywords are accessible in a standardized SKOS/RDF/OWL representation and are used as an authoritative taxonomy, as a source for developing ontologies, and to search and access Earth Science data within online metadata catalogs. The keyword curation approach involves: (1) receiving community suggestions; (2) triaging community suggestions; (3) evaluating keywords against a set of criteria coordinated by the NASA Earth Science Data and Information System (ESDIS) Standards Office; (4) implementing the keywords; and (5) publication/notification of keyword changes. This approach emphasizes community input, which helps ensure a high quality, normalized, and relevant keyword structure that will evolve with users' changing needs. The Keyword Community Forum, which promotes a responsive, open, and transparent process, is an area where users can discuss keyword topics and make suggestions for new keywords. Others could potentially use this formalized approach as a model for keyword curation.
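The SKOS/RDF/OWL representation mentioned above can be consumed with standard RDF tooling. The sketch below builds a tiny in-memory SKOS graph with rdflib and walks concepts to their broader terms, the kind of traversal a metadata search tool might perform; the namespace and keyword URIs are placeholders, not the actual GCMD identifiers or hierarchy.

```python
# Minimal SKOS traversal with rdflib; the namespace and the two keywords are
# illustrative placeholders, not GCMD's real URI scheme or vocabulary.
from rdflib import Graph, Namespace, Literal
from rdflib.namespace import SKOS

kw = Namespace("https://example.org/gcmd/")   # placeholder namespace

g = Graph()
g.add((kw.SeaSurfaceTemperature, SKOS.prefLabel, Literal("SEA SURFACE TEMPERATURE")))
g.add((kw.SeaSurfaceTemperature, SKOS.broader, kw.OceanTemperature))
g.add((kw.OceanTemperature, SKOS.prefLabel, Literal("OCEAN TEMPERATURE")))

# Walk each labelled concept up to its broader term, as a faceted search might
for concept, label in g.subject_objects(SKOS.prefLabel):
    for broader in g.objects(concept, SKOS.broader):
        print(f"{label} -> broader: {g.value(broader, SKOS.prefLabel)}")
```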
Dimitrov, Dobromir T; Kiem, Hans-Peter; Jerome, Keith R; Johnston, Christine; Schiffer, Joshua T
2016-02-24
HIV curative strategies currently under development aim to eradicate latent provirus, or prevent viral replication, progression to AIDS, and transmission. The impact of implementing curative programs on HIV epidemics has not been considered. We developed a mathematical model of heterosexual HIV transmission to evaluate the independent and synergistic impact of ART, HIV prevention interventions and cure on HIV prevalence and incidence. The basic reproduction number was calculated to study the potential for the epidemic to be eliminated. We explored scenarios with and without the assumption that patients enrolled into HIV cure programs need to be on antiretroviral treatment (ART). In our simulations, curative regimes had limited impact on HIV incidence if only ART patients were eligible for cure. Cure implementation had a significant impact on HIV incidence if ART-untreated patients were enrolled directly into cure programs. Concurrent HIV prevention programs moderately decreased the percentage of ART-treated or cured patients needed to achieve elimination. We project that widespread implementation of HIV cure would decrease HIV prevalence under all scenarios but would only lower the rate of new infections if ART-untreated patients were targeted. Current efforts to identify untreated HIV patients will gain even further relevance upon availability of an HIV cure.
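As a toy illustration of the kind of compartmental modelling described above, the sketch below integrates a simple SI-type system with SciPy and reports a basic reproduction number; the model structure and parameter values are illustrative assumptions, not the authors' heterosexual transmission model or its calibrated parameters.

```python
# Toy SI model with a removal rate standing in for ART/cure; all parameters assumed.
import numpy as np
from scipy.integrate import odeint

beta, mu, gamma = 0.30, 0.02, 0.10   # transmission, background exit, and removal (e.g. cure) rates

def si_model(y, t):
    s, i = y
    new_inf = beta * s * i / (s + i)                 # standard-incidence transmission
    ds = mu * (s + i) - new_inf - mu * s             # births balance background exits
    di = new_inf - (mu + gamma) * i                  # infections leave via exit or removal
    return [ds, di]

t = np.linspace(0, 50, 501)
sol = odeint(si_model, [0.99, 0.01], t)              # start with 1% prevalence

R0 = beta / (mu + gamma)                             # basic reproduction number of this toy model
print("R0 =", round(R0, 2), "| prevalence at t = 50:", round(sol[-1, 1], 4))
```

Raising the removal rate gamma (more patients treated or cured per unit time) pushes R0 below 1, which is the elimination criterion the abstract's reproduction-number analysis examines.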
Is an aggressive surgical approach to the patient with gastric lymphoma warranted?
Rosen, C B; van Heerden, J A; Martin, J K; Wold, L E; Ilstrup, D M
1987-01-01
At the Mayo Clinic, from 1970 through 1979, 84 patients (52 males and 32 females) had abdominal exploration for primary gastric lymphoma. All patients were observed a minimum of 5 years or until death. The histologic findings for all 84 patients were reviewed. Forty-four patients had "curative resection," and 40 patients had either biopsy alone or a palliative procedure. The probability of surviving 5 years was 75% for patients after potentially curative resection and 32% for patients after biopsy and palliation (P < 0.001). The operative mortality rate was 5% overall and 2% after potentially curative resection. Increased tumor size (P < 0.02), increased tumor penetration (P < 0.01), and lymph node involvement (P < 0.02) decreased the probability of survival, whereas histologic classification did not affect survival. Radiation therapy after surgery did not significantly affect the survival rate for the entire group or the survival rate for patients who had potentially curative resection. Resectability was associated with increased patient survival, independent of other prognostic factors, when our experience was analyzed by the Cox proportional-hazards model (P < 0.005). It was concluded that an aggressive surgical attitude in the treatment of primary gastric lymphoma is warranted. The role of radiotherapy remains in question. PMID:3592805
Lovell, Peter V; Huizinga, Nicole A; Getachew, Abel; Mees, Brianna; Friedrich, Samantha R; Wirthlin, Morgan; Mello, Claudio V
2018-05-18
Zebra finches are a major model organism for investigating mechanisms of vocal learning, a trait that enables spoken language in humans. The development of cDNA collections with expressed sequence tags (ESTs) and microarrays has allowed for extensive molecular characterizations of circuitry underlying vocal learning and production. However, poor database curation can lead to errors in transcriptome and bioinformatics analyses, limiting the impact of these resources. Here we used genomic alignments and synteny analysis for orthology verification to curate and reannotate ~35% of the oligonucleotides and corresponding ESTs/cDNAs that make up Agilent microarrays for gene expression analysis in finches. We found that: (1) 5475 out of 43,084 oligos (a) failed to align to the zebra finch genome, (b) aligned to multiple loci, or (c) aligned to Chr_un only, and thus need to be flagged until a better genome assembly is available, or (d) reflect cloning artifacts; (2) out of 9635 valid oligos examined further, 3120 were incorrectly named, including 1533 with no known orthologs; and (3) 2635 oligos required a name update. The resulting curated dataset provides a reference for correcting gene identification errors in previous finch microarray studies, and for avoiding such errors in future studies.
Zix-Kieffer, I; Langer, B; Eyer, D; Acar, G; Racadot, E; Schlaeder, G; Oberlin, F; Lutz, P
1996-07-01
Congenital erythropoietic porphyria (Gunther's disease, GD) is a rare autosomal recessive disease. It results from the deficiency of uroporphyrinogen III synthase, the fourth enzyme in the metabolic pathway of heme synthesis. GD leads to severe scarring of the face and hands as a result of photosensitivity and fragility of the skin due to uroporphyrin I and coproporphyrin I accumulation. It also causes erythrocyte fragility leading to haemolytic anaemia. The other clinical features include hirsutism, red discolouration of teeth, finger-nails and urine, and stunted growth. The outcome is poor, and the disfiguring nature of GD may partly explain the legend of the werewolf. No curative treatment was known until 1991, when the first case of bone marrow transplantation (BMT) in GD was reported. The clinical and biological outcome after transplantation was encouraging, with a marked regression of the symptoms of the disease, but the child died of CMV infection 11 months after BMT. We report the second case of GD treated successfully by stem cell transplantation: a 4-year-old girl suffering from severe GD who received umbilical cord blood from an HLA-identical brother. Our patient is very well 10 months after transplantation. We confirm that stem cell transplantation is curative for GD.
Lu, Zhiyong
2012-01-01
Today’s biomedical research has become heavily dependent on access to the biological knowledge encoded in expert curated biological databases. As the volume of biological literature grows rapidly, it becomes increasingly difficult for biocurators to keep up with the literature because manual curation is an expensive and time-consuming endeavour. Past research has suggested that computer-assisted curation can improve efficiency, but few text-mining systems have been formally evaluated in this regard. Through participation in the interactive text-mining track of the BioCreative 2012 workshop, we developed PubTator, a PubMed-like system that assists with two specific human curation tasks: document triage and bioconcept annotation. On the basis of evaluation results from two external user groups, we find that the accuracy of PubTator-assisted curation is comparable with that of manual curation and that PubTator can significantly increase human curatorial speed. These encouraging findings warrant further investigation with a larger number of publications to be annotated. Database URL: http://www.ncbi.nlm.nih.gov/CBBresearch/Lu/Demo/PubTator/ PMID:23160414
The MIntAct project—IntAct as a common curation platform for 11 molecular interaction databases
Orchard, Sandra; Ammari, Mais; Aranda, Bruno; Breuza, Lionel; Briganti, Leonardo; Broackes-Carter, Fiona; Campbell, Nancy H.; Chavali, Gayatri; Chen, Carol; del-Toro, Noemi; Duesbury, Margaret; Dumousseau, Marine; Galeota, Eugenia; Hinz, Ursula; Iannuccelli, Marta; Jagannathan, Sruthi; Jimenez, Rafael; Khadake, Jyoti; Lagreid, Astrid; Licata, Luana; Lovering, Ruth C.; Meldal, Birgit; Melidoni, Anna N.; Milagros, Mila; Peluso, Daniele; Perfetto, Livia; Porras, Pablo; Raghunath, Arathi; Ricard-Blum, Sylvie; Roechert, Bernd; Stutz, Andre; Tognolli, Michael; van Roey, Kim; Cesareni, Gianni; Hermjakob, Henning
2014-01-01
IntAct (freely available at http://www.ebi.ac.uk/intact) is an open-source, open data molecular interaction database populated by data either curated from the literature or from direct data depositions. IntAct has developed a sophisticated web-based curation tool, capable of supporting both IMEx- and MIMIx-level curation. This tool is now utilized by multiple additional curation teams, all of whom annotate data directly into the IntAct database. Members of the IntAct team supply appropriate levels of training, perform quality control on entries and take responsibility for long-term data maintenance. Recently, the MINT and IntAct databases decided to merge their separate efforts to make optimal use of limited developer resources and maximize the curation output. All data manually curated by the MINT curators have been moved into the IntAct database at EMBL-EBI and are merged with the existing IntAct dataset. Both IntAct and MINT are active contributors to the IMEx consortium (http://www.imexconsortium.org). PMID:24234451
HypoxiaDB: a database of hypoxia-regulated proteins
Khurana, Pankaj; Sugadev, Ragumani; Jain, Jaspreet; Singh, Shashi Bala
2013-01-01
There has been intense interest in the cellular response to hypoxia, and a large number of differentially expressed proteins have been identified through various high-throughput experiments. These valuable data are scattered, and there have been no systematic attempts to document the various proteins regulated by hypoxia. Compilation, curation and annotation of these data are important in deciphering their role in hypoxia and hypoxia-related disorders. Therefore, we have compiled HypoxiaDB, a database of hypoxia-regulated proteins. It is a comprehensive, manually curated, non-redundant catalog of proteins whose expression is shown experimentally to be altered at different levels and durations of hypoxia. The database currently contains 72,000 manually curated entries covering 3500 proteins extracted from 73 peer-reviewed publications selected from PubMed. HypoxiaDB is distinctive from other generalized databases: (i) it compiles tissue-specific protein expression changes under different levels and durations of hypoxia. Also, it provides manually curated literature references to support the inclusion of the protein in the database and establish its association with hypoxia. (ii) For each protein, HypoxiaDB integrates data on gene ontology, KEGG (Kyoto Encyclopedia of Genes and Genomes) pathway, protein–protein interactions, protein family (Pfam), OMIM (Online Mendelian Inheritance in Man), PDB (Protein Data Bank) structures and homology to other sequenced genomes. (iii) It also provides pre-compiled information on hypoxia proteins, which otherwise requires tedious computational analysis. This includes information like chromosomal location, identifiers like Entrez, HGNC, Unigene, Uniprot, Ensembl, Vega, GI numbers and Genbank accession numbers associated with the protein. These are further cross-linked to the respective public databases, linking HypoxiaDB to the external repositories. (iv) In addition, HypoxiaDB provides an online sequence-similarity search tool for users to compare their protein sequences with the HypoxiaDB protein database. We hope that HypoxiaDB will enrich our knowledge about hypoxia-related biology and eventually will lead to the development of novel hypotheses and advancements in diagnostic and therapeutic activities. HypoxiaDB is freely accessible for academic and non-profit users via http://www.hypoxiadb.com. Database URL: http://www.hypoxiadb.com PMID:24178989
CulTO: An Ontology-Based Annotation Tool for Data Curation in Cultural Heritage
NASA Astrophysics Data System (ADS)
Garozzo, R.; Murabito, F.; Santagati, C.; Pino, C.; Spampinato, C.
2017-08-01
This paper proposes CulTO, a software tool relying on a computational ontology for Cultural Heritage domain modelling, with a specific focus on religious historical buildings, for supporting cultural heritage experts in their investigations. It is specifically designed to support annotation, automatic indexing, classification and curation of photographic data and text documents of historical buildings. CulTO also serves as a useful tool for Historical Building Information Modeling (H-BIM) by enabling semantic 3D data modeling and further enrichment with non-geometrical information of historical buildings through the inclusion of new concepts about historical documents, images, decay or deformation evidence as well as decorative elements into BIM platforms. CulTO is the result of a joint research effort between the Laboratory of Surveying and Architectural Photogrammetry "Luigi Andreozzi" and the PeRCeiVe Lab (Pattern Recognition and Computer Vision Lab) of the University of Catania.
On expert curation and scalability: UniProtKB/Swiss-Prot as a case study
Arighi, Cecilia N; Magrane, Michele; Bateman, Alex; Wei, Chih-Hsuan; Lu, Zhiyong; Boutet, Emmanuel; Bye-A-Jee, Hema; Famiglietti, Maria Livia; Roechert, Bernd; UniProt Consortium, The
2017-01-01
Motivation: Biological knowledgebases, such as UniProtKB/Swiss-Prot, constitute an essential component of daily scientific research by offering distilled, summarized and computable knowledge extracted from the literature by expert curators. While knowledgebases play an increasingly important role in the scientific community, their ability to keep up with the growth of biomedical literature is under scrutiny. Using UniProtKB/Swiss-Prot as a case study, we address this concern via multiple literature triage approaches. Results: With the assistance of the PubTator text-mining tool, we tagged more than 10 000 articles to assess the ratio of papers relevant for curation. We first show that curators read and evaluate many more papers than they curate, and that measuring the number of curated publications is insufficient to provide a complete picture as demonstrated by the fact that 8000–10 000 papers are curated in UniProt each year while curators evaluate 50 000–70 000 papers per year. We show that 90% of the papers in PubMed are out of the scope of UniProt, that a maximum of 2–3% of the papers indexed in PubMed each year are relevant for UniProt curation, and that, despite appearances, expert curation in UniProt is scalable. Availability and implementation: UniProt is freely available at http://www.uniprot.org/. Contact: sylvain.poux@sib.swiss Supplementary information: Supplementary data are available at Bioinformatics online. PMID:29036270
The curation of genetic variants: difficulties and possible solutions.
Pandey, Kapil Raj; Maden, Narendra; Poudel, Barsha; Pradhananga, Sailendra; Sharma, Amit Kumar
2012-12-01
The curation of genetic variants from biomedical articles is required for various clinical and research purposes. Nowadays, the establishment of variant databases that include overall information about variants is becoming quite popular. These databases have immense utility, serving as a user-friendly information storehouse of variants for information seekers. While manual curation is the gold standard method for the curation of variants, it can turn out to be time-consuming on a large scale, thus necessitating automation. Curation of variants described in the biomedical literature may not be straightforward, mainly due to various nomenclature and expression issues. Though the current trend in paper writing on variants is toward standard nomenclature, so that variants can easily be retrieved, the literature holds a massive store of variants reported under non-standard names, and the online search engines that are predominantly used may not be capable of finding them. For effective curation of variants, knowledge about the overall process of curation, the nature and types of difficulties in curation, and ways to tackle the difficulties during the task are crucial. Only by effective curation can variants be correctly interpreted. This paper presents the process and difficulties of curation of genetic variants, with possible solutions and suggestions from our work experience in the field, including literature support. The paper also highlights aspects of the interpretation of genetic variants and the importance of writing papers on variants following standard and retrievable methods. Copyright © 2012. Published by Elsevier Ltd.
The Curation of Genetic Variants: Difficulties and Possible Solutions
Pandey, Kapil Raj; Maden, Narendra; Poudel, Barsha; Pradhananga, Sailendra; Sharma, Amit Kumar
2012-01-01
The curation of genetic variants from biomedical articles is required for various clinical and research purposes. Nowadays, the establishment of variant databases that include overall information about variants is becoming quite popular. These databases have immense utility, serving as a user-friendly information storehouse of variants for information seekers. While manual curation is the gold standard method for the curation of variants, it can turn out to be time-consuming on a large scale, thus necessitating automation. Curation of variants described in the biomedical literature may not be straightforward, mainly due to various nomenclature and expression issues. Though the current trend in paper writing on variants is toward standard nomenclature, so that variants can easily be retrieved, the literature holds a massive store of variants reported under non-standard names, and the online search engines that are predominantly used may not be capable of finding them. For effective curation of variants, knowledge about the overall process of curation, the nature and types of difficulties in curation, and ways to tackle the difficulties during the task are crucial. Only by effective curation can variants be correctly interpreted. This paper presents the process and difficulties of curation of genetic variants, with possible solutions and suggestions from our work experience in the field, including literature support. The paper also highlights aspects of the interpretation of genetic variants and the importance of writing papers on variants following standard and retrievable methods. PMID:23317699
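One concrete facet of the nomenclature difficulty discussed above is recognising standard variant mentions in text. The sketch below matches simple HGVS-style coding and protein substitutions with a regular expression; it is illustrative only, covers just two mention patterns, and deliberately misses legacy, non-standard names, which is exactly the retrieval gap the paper describes.

```python
# Toy recogniser for simple HGVS-style variant mentions; not a full parser.
import re

HGVS_LIKE = re.compile(
    r"\b(?:c\.\d+[ACGT]>[ACGT]"                 # coding DNA substitution, e.g. c.76A>T
    r"|p\.[A-Z][a-z]{2}\d+[A-Z][a-z]{2})\b"     # protein substitution, e.g. p.Lys76Asn
)

text = "The patient carried c.76A>T (p.Lys26Asn), also reported as K26N in older papers."
print(HGVS_LIKE.findall(text))   # the legacy form "K26N" is not captured
```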
Pathway Distiller - multisource biological pathway consolidation
2012-01-01
Background: One method to understand and evaluate an experiment that produces a large set of genes, such as a gene expression microarray analysis, is to identify overrepresentation or enrichment for biological pathways. Because pathways are able to functionally describe the set of genes, much effort has been made to collect curated biological pathways into publicly accessible databases. When combining disparate databases, highly related or redundant pathways exist, making their consolidation into pathway concepts essential. This will facilitate unbiased, comprehensive yet streamlined analysis of experiments that result in large gene sets. Methods: After gene set enrichment finds representative pathways for large gene sets, pathways are consolidated into representative pathway concepts. Three complementary but different methods of pathway consolidation are explored. Enrichment Consolidation combines the set of pathways enriched for the signature gene list through iterative combining of enriched pathways with other pathways with similar signature gene sets; Weighted Consolidation utilizes a Protein-Protein Interaction network-based gene-weighting approach that finds clusters of both enriched and non-enriched pathways limited to the experiment's resultant gene list; and finally the de novo Consolidation method uses several measurements of pathway similarity to find static pathway clusters independent of any given experiment. Results: We demonstrate that the three consolidation methods provide unified yet different functional insights into a resultant gene set derived from a genome-wide profiling experiment. Results from the methods are presented, demonstrating their applications in biological studies and comparing them with a web-based pathway framework that also combines several pathway databases. Additionally, a web-based consolidation framework that encompasses all three methods discussed in this paper, Pathway Distiller (http://cbbiweb.uthscsa.edu/PathwayDistiller), is established to allow researchers access to the methods and example microarray data described in this manuscript, and the ability to analyze their own gene list using our unique consolidation methods. Conclusions: By combining several pathway systems, implementing different but complementary pathway consolidation methods, and providing a user-friendly web-accessible tool, we have given users the ability to extract functional explanations of their genome-wide experiments. PMID:23134636
Pathway Distiller - multisource biological pathway consolidation.
Doderer, Mark S; Anguiano, Zachry; Suresh, Uthra; Dashnamoorthy, Ravi; Bishop, Alexander J R; Chen, Yidong
2012-01-01
One method to understand and evaluate an experiment that produces a large set of genes, such as a gene expression microarray analysis, is to identify overrepresentation or enrichment for biological pathways. Because pathways are able to functionally describe the set of genes, much effort has been made to collect curated biological pathways into publicly accessible databases. When combining disparate databases, highly related or redundant pathways exist, making their consolidation into pathway concepts essential. This will facilitate unbiased, comprehensive yet streamlined analysis of experiments that result in large gene sets. After gene set enrichment finds representative pathways for large gene sets, pathways are consolidated into representative pathway concepts. Three complementary but different methods of pathway consolidation are explored. Enrichment Consolidation combines the set of pathways enriched for the signature gene list through iterative combining of enriched pathways with other pathways with similar signature gene sets; Weighted Consolidation utilizes a Protein-Protein Interaction network-based gene-weighting approach that finds clusters of both enriched and non-enriched pathways limited to the experiment's resultant gene list; and finally the de novo Consolidation method uses several measurements of pathway similarity to find static pathway clusters independent of any given experiment. We demonstrate that the three consolidation methods provide unified yet different functional insights into a resultant gene set derived from a genome-wide profiling experiment. Results from the methods are presented, demonstrating their applications in biological studies and comparing them with a web-based pathway framework that also combines several pathway databases. Additionally, a web-based consolidation framework that encompasses all three methods discussed in this paper, Pathway Distiller (http://cbbiweb.uthscsa.edu/PathwayDistiller), is established to allow researchers access to the methods and example microarray data described in this manuscript, and the ability to analyze their own gene list using our unique consolidation methods. By combining several pathway systems, implementing different but complementary pathway consolidation methods, and providing a user-friendly web-accessible tool, we have given users the ability to extract functional explanations of their genome-wide experiments.
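As a rough sketch of consolidating redundant pathways by gene-set overlap, in the spirit of the Enrichment Consolidation idea above, the snippet below merges pathways whose Jaccard similarity exceeds a threshold; the pathways, genes and the 0.5 threshold are illustrative assumptions, not the Pathway Distiller algorithm itself.

```python
# Greedy merge of overlapping gene sets into pathway "concepts"; all inputs are placeholders.
def jaccard(a, b):
    return len(a & b) / len(a | b)

pathways = {
    "KEGG: Apoptosis": {"TP53", "CASP3", "BAX", "BCL2"},
    "Reactome: Apoptotic execution": {"CASP3", "BAX", "BCL2", "PARP1"},
    "KEGG: Cell cycle": {"CDK1", "CCNB1", "TP53"},
}

merged = []                                      # each entry: [list of names, union of genes]
for name, genes in pathways.items():
    for concept in merged:
        if jaccard(concept[1], genes) >= 0.5:    # similar enough: fold into existing concept
            concept[0].append(name)
            concept[1] |= genes
            break
    else:                                        # no similar concept found: start a new one
        merged.append([[name], set(genes)])

for names, genes in merged:
    print(" + ".join(names), "->", sorted(genes))
```

With these placeholder inputs the two apoptosis pathways collapse into a single concept while the cell-cycle pathway stays separate, which is the kind of streamlining the consolidation methods aim for.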
Investigating Astromaterials Curation Applications for Dexterous Robotic Arms
NASA Technical Reports Server (NTRS)
Snead, C. J.; Jang, J. H.; Cowden, T. R.; McCubbin, F. M.
2018-01-01
The Astromaterials Acquisition and Curation office at NASA Johnson Space Center is currently investigating tools and methods that will enable the curation of future astromaterials collections. Size and temperature constraints for astromaterials to be collected by current and future proposed missions will require the development of new robotic sample and tool handling capabilities. NASA Curation has investigated the application of robot arms in the past, and robotic 3-axis micromanipulators are currently in use for small particle curation in the Stardust and Cosmic Dust laboratories. While 3-axis micromanipulators have been extremely successful for activities involving the transfer of isolated particles in the 5-20 micron range (e.g. from microscope slide to epoxy bullet tip, beryllium SEM disk), their limited ranges of motion and lack of yaw, pitch, and roll degrees of freedom restrict their utility in other applications. For instance, curators removing particles from cosmic dust collectors by hand often employ scooping and rotating motions to successfully free trapped particles from the silicone oil coatings. Similar scooping and rotating motions are also employed when isolating a specific particle of interest from an aliquot of crushed meteorite. While cosmic dust curators have been remarkably successful with these kinds of particle manipulations using handheld tools, operator fatigue limits the number of particles that can be removed during a given extraction session. The challenges for curation of small particles will be exacerbated by mission requirements that samples be processed in N2 sample cabinets (i.e. gloveboxes). We have been investigating the use of compact robot arms to facilitate sample handling within gloveboxes. Six-axis robot arms potentially have applications beyond small particle manipulation. For instance, future sample return missions may involve biologically sensitive astromaterials that can be easily compromised by physical interaction with a curator; other potential future returned samples may require cryogenic curation. Robot arms may be combined with high resolution cameras within a sample cabinet and controlled remotely by curator. Sophisticated robot arm and hand combination systems can be programmed to mimic the movements of a curator wearing a data glove; successful implementation of such a system may ultimately allow a curator to virtually operate in a nitrogen, cryogenic, or biologically sensitive environment with dexterity comparable to that of a curator physically handling samples in a glove box.
The Astromaterials X-Ray Computed Tomography Laboratory at Johnson Space Center
NASA Technical Reports Server (NTRS)
Zeigler, R. A.; Coleff, D. M.; McCubbin, F. M.
2017-01-01
The Astromaterials Acquisition and Curation Office at NASA's Johnson Space Center (hereafter JSC curation) is the past, present, and future home of all of NASA's astromaterials sample collections. JSC curation currently houses all or part of nine different sample collections: (1) Apollo samples (1969), (2) Luna samples (1972), (3) Antarctic meteorites (1976), (4) Cosmic Dust particles (1981), (5) Microparticle Impact Collection (1985), (6) Genesis solar wind atoms (2004), (7) Stardust comet Wild-2 particles (2006), (8) Stardust interstellar particles (2006), and (9) Hayabusa asteroid Itokawa particles (2010). Each sample collection is housed in a dedicated clean room, or suite of clean rooms, that is tailored to the requirements of that sample collection. Our primary goals are to maintain the long-term integrity of the samples and ensure that the samples are distributed for scientific study in a fair, timely, and responsible manner, thus maximizing the return on each sample. Part of the curation process is planning for the future, and we also perform fundamental research in advanced curation initiatives. Advanced Curation is tasked with developing procedures, technology, and data sets necessary for curating new types of sample collections, or getting new results from existing sample collections [2]. We are (and have been) planning for future curation, including cold curation, extended curation of ices and volatiles, curation of samples with special chemical considerations such as perchlorate-rich samples, and curation of organically- and biologically-sensitive samples. As part of these advanced curation efforts we are augmenting our analytical facilities as well. A micro X-ray computed tomography (micro-XCT) laboratory dedicated to the study of astromaterials will be coming online this spring within the JSC Curation office, and we plan to add additional facilities that will enable nondestructive (or minimally destructive) analyses of astromaterials in the near future (micro-XRF, confocal imaging Raman spectroscopy). These facilities will be used to: (1) develop sample handling and storage techniques for future sample return missions; (2) support preliminary examination teams (PET) for future sample return missions; (3) perform retroactive PET-style analyses of our existing collections; and (4) perform periodic assessments of the existing sample collections. Here we describe the new micro-XCT system, as well as some of the ongoing or anticipated applications of the instrument.
Two Analogues of Fenarimol Show Curative Activity in an Experimental Model of Chagas Disease
2013-01-01
Chagas disease, caused by the protozoan parasite Trypanosoma cruzi (T. cruzi), is an increasing threat to global health. Available medicines were introduced over 40 years ago, have undesirable side effects, and give equivocal results of cure in the chronic stage of the disease. We report the development of two compounds, 6 and (S)-7, with PCR-confirmed curative activity in a mouse model of established T. cruzi infection after once-daily oral dosing for 20 days at 20 mg/kg for 6 and 10 mg/kg for (S)-7. Compounds 6 and (S)-7 have potent in vitro activity, are noncytotoxic, show no adverse effects in vivo following repeat dosing, are prepared by a short synthetic route, and have druglike properties suitable for preclinical development. PMID:24304150
Ambure, Pravin; Bhat, Jyotsna; Puzyn, Tomasz; Roy, Kunal
2018-04-23
Alzheimer's disease (AD) is a multi-factorial disease, which can be broadly outlined as an irreversible and progressive neurodegenerative disorder with an unclear root cause. It is a major cause of dementia in elderly people. In the present study, utilizing the structural and biological activity information of ligands for five important and widely studied targets believed to be effective against AD (i.e. cyclin-dependent kinase 5, β-secretase, monoamine oxidase B, glycogen synthase kinase 3β and acetylcholinesterase), we have developed five classification models using the linear discriminant analysis (LDA) technique. Considering the importance of data curation, we have given particular attention to chemical and biological data curation, which is a difficult task, especially for large datasets. Thus, to ease the curation process we have designed Konstanz Information Miner (KNIME) workflows, which are made available at http://teqip.jdvu.ac.in/QSAR_Tools/ . The developed models were appropriately validated based on predictions for experiment-derived data from test sets, as well as true external set compounds, including known multi-target compounds. The domain of applicability for each classification model was checked based on a confidence-estimation approach. These validated models were then employed for screening of natural compounds collected from the InterBioScreen natural database ( https://www.ibscreen.com/natural-compounds ). The natural compounds that were categorized as 'actives' in at least two of the five developed classification models were considered multi-target leads, and these compounds were further screened using a drug-likeness filter and molecular docking, and then thoroughly analyzed using molecular dynamics studies. Finally, the most promising multi-target natural compounds against AD are suggested.
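As a hedged illustration of the modelling step described above (one LDA classifier per target), the sketch below trains a single scikit-learn LDA model on synthetic descriptors and adds a crude applicability-domain check. The descriptor matrix, the activity labels and the three-standard-deviation rule are assumptions made for the example; they are not the study's curated data or its confidence-estimation approach.

```python
# Minimal sketch of one LDA activity classifier per target, as described above.
# X (molecular descriptors) and y (active = 1 / inactive = 0) are synthetic
# placeholders for the curated ligand data used in the study.
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 10))                 # 200 ligands x 10 descriptors
y = (X[:, 0] + 0.5 * X[:, 1] > 0).astype(int)  # synthetic activity labels

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, random_state=0)
clf = LinearDiscriminantAnalysis().fit(X_tr, y_tr)
print("test accuracy:", accuracy_score(y_te, clf.predict(X_te)))

# A crude applicability-domain check: flag test compounds whose descriptors
# fall far outside the training distribution (a stand-in for a formal
# confidence-estimation approach).
center, spread = X_tr.mean(axis=0), X_tr.std(axis=0)
outside = (np.abs((X_te - center) / spread) > 3).any(axis=1)
print("compounds outside domain:", int(outside.sum()))
```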
Zhong, Fang; Liu, Xia; Zhou, Qiao; Hao, Xu; Lu, Ying; Guo, Shanmai; Wang, Weiming; Lin, Donghai; Chen, Nan
2012-02-01
The number of patients with chronic kidney disease (CKD) is continuously growing worldwide. Treatment with traditional Chinese medicine might slow the progression of CKD. In this study, we evaluated the renal protective effects of the Chinese herb Cordyceps sinensis in rats with 5/6 nephrectomy. Male Sprague-Dawley rats (weighing 150-200 g) were subjected to 5/6 nephrectomy. The rats were divided into three groups: (i) untreated nephrectomized group (OP group, n = 16), (ii) oral administration of C. sinensis-treated (4 mg/kg/day) nephrectomized group (CS group, n = 16) and (iii) sham-operated group (SO group, n = 16). The rats were sacrificed at 4 and 8 weeks after 5/6 nephrectomy, and the kidneys, serum and urine were collected for (1)H nuclear magnetic resonance spectral analysis. Multivariate statistical techniques and statistical metabolic correlation comparison analysis were performed to identify metabolic changes in aqueous kidney extracts between these groups. Significant differences between these groups were discovered in the metabolic profiles of the biofluids and kidney extracts. Pathways including the citrate cycle, branched-chain amino acid metabolism and the metabolites that regulate permeate pressure were disturbed in the OP group compared to the SO group; these disturbances were reversed by C. sinensis treatment. Biochemical assays and electron microscopy verified that C. sinensis has curative effects on chronic renal failure, consistent with the metabonomics results. Our study demonstrates that C. sinensis has potential curative effects on CKD, and our metabonomics results provide new insight into the mechanism of treatment of this traditional Chinese medicine.
Triage by ranking to support the curation of protein interactions
Pasche, Emilie; Gobeill, Julien; Rech de Laval, Valentine; Gleizes, Anne; Michel, Pierre-André; Bairoch, Amos
2017-01-01
Today, molecular biology databases are the cornerstone of knowledge sharing for life and health sciences. The curation and maintenance of these resources are labour intensive. Although text mining is gaining impetus among curators, its integration in curation workflows has not yet been widely adopted. The Swiss Institute of Bioinformatics Text Mining and CALIPHO groups joined forces to design a new curation support system named neXtA5. In this report, we explore the integration of novel triage services to support the curation of two types of biological data: protein–protein interactions (PPIs) and post-translational modifications (PTMs). The recognition of PPIs and PTMs poses a special challenge, as it not only requires the identification of biological entities (proteins or residues), but also that of particular relationships (e.g. binding or position). These relationships cannot be described with onto-terminological descriptors such as the Gene Ontology for molecular functions, which makes the triage task more challenging. Prioritizing papers for these tasks thus requires the development of different approaches. In this report, we propose a new method to prioritize articles containing information specific to PPIs and PTMs. The new resources (RESTful APIs, semantically annotated MEDLINE library) enrich the neXtA5 platform. We tuned the article prioritization model on a set of 100 proteins previously annotated by the CALIPHO group. The effectiveness of the triage service was tested with a dataset of 200 annotated proteins. We defined two sets of descriptors to support automatic triage: the first set to enrich for papers with PPI data, and the second for PTMs. All occurrences of these descriptors were marked up in MEDLINE and indexed, thus constituting a semantically annotated version of MEDLINE. These annotations were then used to estimate the relevance of a particular article with respect to the chosen annotation type. This relevance score was combined with a local vector-space search engine to generate a ranked list of PMIDs. We also evaluated a query refinement strategy, which adds specific keywords (such as ‘binds’ or ‘interacts’) to the original query. Compared to PubMed, the search effectiveness of the neXtA5 triage service is improved by 190% for the prioritization of papers with PPI information and by 260% for papers with PTM information. Combining advanced retrieval and query refinement strategies with automatically enriched MEDLINE contents is effective for improving triage in complex curation tasks such as the curation of protein PPIs and PTMs. Database URL: http://candy.hesge.ch/nextA5 PMID:29220432
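As a toy illustration of the triage idea (not the neXtA5 implementation), the sketch below combines a vector-space retrieval score with a simple descriptor-hit score to rank candidate articles. The descriptor list, the 0.6/0.4 weights and the example documents are invented for the demonstration.

```python
# Toy sketch of triage by ranking: combine a descriptor relevance score with a
# vector-space retrieval score to rank candidate articles (PMIDs). The
# descriptors, weights, and documents are illustrative assumptions.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

PPI_DESCRIPTORS = {"binds", "interacts", "complex", "coimmunoprecipitation"}

docs = {
    "PMID:1": "Protein A binds protein B and forms a stable complex in vivo.",
    "PMID:2": "Expression of gene C increases under heat stress conditions.",
    "PMID:3": "A interacts with D; coimmunoprecipitation confirmed the complex.",
}

query = "protein protein interaction binds complex"

vect = TfidfVectorizer(lowercase=True)
doc_matrix = vect.fit_transform(docs.values())
retrieval = cosine_similarity(vect.transform([query]), doc_matrix).ravel()

def descriptor_score(text):
    # Fraction of PPI descriptors present in the article text.
    words = set(text.lower().replace(";", " ").replace(".", " ").split())
    return len(words & PPI_DESCRIPTORS) / len(PPI_DESCRIPTORS)

ranked = sorted(
    ((pmid, 0.6 * retrieval[i] + 0.4 * descriptor_score(text))
     for i, (pmid, text) in enumerate(docs.items())),
    key=lambda x: x[1], reverse=True)

for pmid, score in ranked:
    print(f"{pmid}\t{score:.3f}")
```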
Reasons doctors provide futile treatment at the end of life: a qualitative study.
Willmott, Lindy; White, Benjamin; Gallois, Cindy; Parker, Malcolm; Graves, Nicholas; Winch, Sarah; Callaway, Leonie Kaye; Shepherd, Nicole; Close, Eliana
2016-08-01
Futile treatment, which by definition cannot benefit a patient, is undesirable. This research investigated why doctors believe that treatment that they consider to be futile is sometimes provided at the end of a patient's life. Semistructured in-depth interviews. Three large tertiary public hospitals in Brisbane, Australia. 96 doctors from emergency, intensive care, palliative care, oncology, renal medicine, internal medicine, respiratory medicine, surgery, cardiology, geriatric medicine and medical administration departments. Participants were recruited using purposive maximum variation sampling. Doctors attributed the provision of futile treatment to a wide range of inter-related factors. One was the characteristics of treating doctors, including their orientation towards curative treatment, discomfort or inexperience with death and dying, concerns about legal risk and poor communication skills. Second, the attributes of the patient and family, including their requests or demands for further treatment, prognostic uncertainty and lack of information about patient wishes. Third, there were hospital factors including a high degree of specialisation, the availability of routine tests and interventions, and organisational barriers to diverting a patient from a curative to a palliative pathway. Doctors nominated family or patient request and doctors being locked into a curative role as the main reasons for futile care. Doctors believe that a range of factors contribute to the provision of futile treatment. A combination of strategies is necessary to reduce futile treatment, including better training for doctors who treat patients at the end of life, educating the community about the limits of medicine and the need to plan for death and dying, and structural reform at the hospital level. Published by the BMJ Publishing Group Limited. For permission to use (where not already granted under a licence) please go to http://www.bmj.com/company/products-services/rights-and-licensing/
Expanding on Successful Concepts, Models, and Organization
DOE Office of Scientific and Technical Information (OSTI.GOV)
Teeguarden, Justin G.; Tan, Yu-Mei; Edwards, Stephen W.
In her letter to the editor [1] regarding our recent Feature Article “Completing the Link between Exposure Science and Toxicology for Improved Environmental Health Decision Making: The Aggregate Exposure Pathway Framework” [2], Dr. von Göetz expressed several concerns about terminology, and the perception that we propose the replacement of successful approaches and models for exposure assessment with a concept. We are glad to have the opportunity to address these issues here. If the goal of the AEP framework were to replace existing exposure models or databases for organizing exposure data with a concept, we would share Dr. von Göetz's concerns. Instead, the outcome we promote is broader use of an organizational framework for exposure science. The framework would support improved generation, organization, and interpretation of data as well as modeling and prediction, not replacement of models. The field of toxicology has seen the benefits of wide use of one or more organizational frameworks (e.g., mode and mechanism of action, adverse outcome pathway). These frameworks influence how experiments are designed, data are collected, curated, stored and interpreted and ultimately how data are used in risk assessment. Exposure science is poised to similarly benefit from broader use of a parallel organizational framework, which Dr. von Göetz correctly points out is currently used in the exposure modeling community. In our view, the concepts used so effectively in the exposure modeling community, expanded upon in the AEP framework, could see wider adoption by the field as a whole. The value of such a framework was recognized by the National Academy of Sciences [3]. Replacement of models, databases, or any application with the AEP framework was not proposed in our article. The positive role broader, more consistent use of such a framework might have in enabling and advancing “general activities such as data acquisition, organization…,” and exposure modeling was discussed in some detail. Like Dr. von Göetz, we recognized the challenges associated with acceptance of the terminology, definitions, and structure proposed in the paper. To address these challenges, an expert workshop was held in May 2016 to consider and revise the “basic elements” outlined in the paper. The attendees produced revisions to the terminology (e.g., key events) that align with terminology currently in use in the field. We were also careful in our paper to acknowledge a point raised by Dr. von Göetz, that the term AEP implies aggregation, providing these clarifications: “The simplest form of an AEP represents a single source and a single pathway and may more commonly be referred to as an exposure pathway”; and “An aggregate exposure pathway may represent multiple sources and transfer through single pathways to the target site exposure (TSE), single sources and transfer through multiple pathways to the TSE, or any combination of these.” These clarifications address the concern that the AEP term is not accurate or logical, and further expand upon the word “aggregate” in a broader context. Our use of AEP is consistent with the definition of “aggregate exposure”, which refers to the combined exposures to a single chemical across multiple routes and pathways [3]. The AEP framework embraces existing methods for collection, prediction, organization, and interpretation of human and ecological exposure data cited by Dr. von Göetz.
We remain hopeful that wider recognition and use of an organizing concept for exposure information across the exposure science, toxicology and epidemiology communities advances the development of the kind of infrastructure and models Dr. von Göetz discusses. This outcome would be a step forward, rather than a step backward.
Quality of Computationally Inferred Gene Ontology Annotations
Škunca, Nives; Altenhoff, Adrian; Dessimoz, Christophe
2012-01-01
Gene Ontology (GO) has established itself as the undisputed standard for protein function annotation. Most annotations are inferred electronically, i.e. without individual curator supervision, but they are widely considered unreliable. At the same time, we crucially depend on those automated annotations, as most newly sequenced genomes are non-model organisms. Here, we introduce a methodology to systematically and quantitatively evaluate electronic annotations. By exploiting changes in successive releases of the UniProt Gene Ontology Annotation database, we assessed the quality of electronic annotations in terms of specificity, reliability, and coverage. Overall, we not only found that electronic annotations have significantly improved in recent years, but also that their reliability now rivals that of annotations inferred by curators when they use evidence other than experiments from primary literature. This work provides the means to identify the subset of electronic annotations that can be relied upon—an important outcome given that >98% of all annotations are inferred without direct curation. PMID:22693439
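The evaluation described above rests on comparing annotation releases. The sketch below expresses one deliberately simplified version of that idea: treat an earlier release's electronic (protein, GO term) pairs as predictions and score how many reappear later with experimental support. It is an illustrative simplification, not the paper's exact specificity, reliability and coverage metrics, and the identifiers are placeholders.

```python
# Sketch of scoring electronic annotations against a later release:
# "reliability" here = fraction of earlier electronic (protein, GO term)
# pairs that reappear later with an experimental evidence code. This is an
# illustrative simplification, not the paper's exact metric.

def reliability(electronic_old, experimental_new):
    """Both arguments are sets of (protein_id, go_term) tuples."""
    if not electronic_old:
        return float("nan")
    confirmed = electronic_old & experimental_new
    return len(confirmed) / len(electronic_old)

old_iea = {("P12345", "GO:0005524"), ("P12345", "GO:0016301"),
           ("Q99999", "GO:0003677")}
new_exp = {("P12345", "GO:0005524"), ("Q99999", "GO:0003700")}

print(f"reliability: {reliability(old_iea, new_exp):.2f}")  # 1 of 3 confirmed
```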
Saccharomyces genome database informs human biology.
Skrzypek, Marek S; Nash, Robert S; Wong, Edith D; MacPherson, Kevin A; Hellerstedt, Sage T; Engel, Stacia R; Karra, Kalpana; Weng, Shuai; Sheppard, Travis K; Binkley, Gail; Simison, Matt; Miyasato, Stuart R; Cherry, J Michael
2018-01-04
The Saccharomyces Genome Database (SGD; http://www.yeastgenome.org) is an expertly curated database of literature-derived functional information for the model organism budding yeast, Saccharomyces cerevisiae. SGD constantly strives to synergize new types of experimental data and bioinformatics predictions with existing data, and to organize them into a comprehensive and up-to-date information resource. The primary mission of SGD is to facilitate research into the biology of yeast and to provide this wealth of information to advance, in many ways, research on other organisms, even those as evolutionarily distant as humans. To build such a bridge between biological kingdoms, SGD is curating data regarding yeast-human complementation, in which a human gene can successfully replace the function of a yeast gene, and/or vice versa. These data are manually curated from published literature, made available for download, and incorporated into a variety of analysis tools provided by SGD. © The Author(s) 2017. Published by Oxford University Press on behalf of Nucleic Acids Research.
CDD/SPARCLE: functional classification of proteins via subfamily domain architectures.
Marchler-Bauer, Aron; Bo, Yu; Han, Lianyi; He, Jane; Lanczycki, Christopher J; Lu, Shennan; Chitsaz, Farideh; Derbyshire, Myra K; Geer, Renata C; Gonzales, Noreen R; Gwadz, Marc; Hurwitz, David I; Lu, Fu; Marchler, Gabriele H; Song, James S; Thanki, Narmada; Wang, Zhouxi; Yamashita, Roxanne A; Zhang, Dachuan; Zheng, Chanjuan; Geer, Lewis Y; Bryant, Stephen H
2017-01-04
NCBI's Conserved Domain Database (CDD) aims at annotating biomolecular sequences with the location of evolutionarily conserved protein domain footprints, and functional sites inferred from such footprints. An archive of pre-computed domain annotation is maintained for proteins tracked by NCBI's Entrez database, and live search services are offered as well. CDD curation staff supplements a comprehensive collection of protein domain and protein family models, which have been imported from external providers, with representations of selected domain families that are curated in-house and organized into hierarchical classifications of functionally distinct families and sub-families. CDD also supports comparative analyses of protein families via conserved domain architectures, and a recent curation effort focuses on providing functional characterizations of distinct subfamily architectures using SPARCLE: Subfamily Protein Architecture Labeling Engine. CDD can be accessed at https://www.ncbi.nlm.nih.gov/Structure/cdd/cdd.shtml. Published by Oxford University Press on behalf of Nucleic Acids Research 2016. This work is written by (a) US Government employee(s) and is in the public domain in the US.
Heylen, Marthe; Ruyssers, Nathalie E.; De Man, Joris G.; Timmermans, Jean-Pierre; Pelckmans, Paul A.; Moreels, Tom G.; De Winter, Benedicte Y.
2014-01-01
Although helminthic therapy as a possible new option to treat inflammatory bowel disease is a well-established concept by now, the search for immunomodulatory helminth-derived compounds and their mechanisms of action is still ongoing. We investigated the therapeutic potential and the underlying immunological mechanisms of Schistosoma mansoni soluble worm proteins (SmSWP) in an adoptive T cell transfer mouse model of chronic colitis. Both a curative and a preventive treatment protocol were included in this study. The curative administration of SmSWP (started when colitis was established), resulted in a significant improvement of the clinical disease score, colonoscopy, macroscopic and microscopic inflammation score, colon length and myeloperoxidase activity. The therapeutic potential of the preventive SmSWP treatment (started before colitis was established), was less pronounced compared with the curative SmSWP treatment but still resulted in an improved clinical disease score, body weight loss, colon length and microscopic inflammation score. Both the curative and preventive SmSWP treatment downregulated the mRNA expression of the proinflammatory cytokines IFN-γ and IL-17A and upregulated the mRNA expression of the anti-inflammatory cytokine IL-4 in the colon at the end of the experiment. This colonic immunomodulatory effect of SmSWP could not be confirmed at the protein level. Moreover, the effect of SmSWP appeared to be a local colonic phenomenon, since the flow cytometric T cell characterization of the mesenteric lymph nodes and the cytokine measurements in the serum did not reveal any effect of SmSWP treatment. In conclusion, SmSWP treatment reduced the severity of colitis in the adoptive transfer mouse model via the suppression of proinflammatory cytokines and the induction of an anti-inflammatory response in the colon. PMID:25313594
The construction of QSAR models is critically dependent on the quality of available data. As part of our efforts to develop public platforms to provide access to predictive models, we have attempted to discriminate the influence of the quality versus quantity of data available ...
Wang, Xinyu; Su, Shaofei; Jiang, Hao; Wang, Jiaying; Li, Xi; Liu, Meina
2018-05-01
Objective: To examine the short- and long-term effects of a clinical pathway for non-small cell lung cancer surgery on length of stay, compliance with quality indicators and the risk-adjusted post-operative complication rate. Design: A retrospective quasi-experimental study from June 2011 to October 2015. Setting: A tertiary cancer hospital in China. Participants: Patients diagnosed with non-small cell lung cancer who underwent curative resection. Intervention: The clinical pathway was implemented in January 2013; the study period was therefore divided into three periods: pre-pathway (June 2011 to December 2012), short-term (January 2013 to December 2013) and long-term (January 2014 to October 2015). Main outcome measures: Three length-of-hospital-stay indicators, four process performance indicators and one outcome indicator. Results: Interrupted time series (ITS) analysis showed a significant decline of 2 days (P = 0.0421) in total length of stay and 2.23 days (P = 0.0199) in post-operative length of stay immediately after the implementation of the clinical pathway. Short-term level changes were found in the compliance rate for the required number of lymph nodes sampled (-8.08%, P = 0.0392) and the risk-adjusted complication rate (9.02%, P = 0.0001). There were no statistically significant changes in the other quality-of-care indicators. Conclusions: The clinical pathway had a positive impact on length of stay but showed a transient negative effect on the complication rate and the quality of lymph node sampling.
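The level changes reported above come from an interrupted time series analysis. As a hedged illustration of that design, the sketch below fits a standard segmented regression with statsmodels on synthetic monthly length-of-stay data; the model form, time points and coefficients are assumptions for the example, not the study's data or exact specification.

```python
# Minimal segmented-regression sketch for an interrupted time series:
#   y_t = b0 + b1*time + b2*post + b3*time_since_intervention + e_t
# where b2 estimates the immediate level change after the clinical pathway
# was introduced. Data, variable names and model form are illustrative.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(1)
months = np.arange(36)                 # e.g. monthly mean length of stay
post = (months >= 18).astype(int)      # pathway introduced at month 18
time_since = np.where(post == 1, months - 18, 0)
los = 12 - 0.02 * months - 2.0 * post - 0.05 * time_since + rng.normal(0, 0.5, 36)

df = pd.DataFrame({"los": los, "time": months, "post": post,
                   "time_since": time_since})
fit = smf.ols("los ~ time + post + time_since", data=df).fit()
print(fit.params)            # 'post' approximates the immediate level change
print(fit.pvalues["post"])
```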
Annotation of phenotypic diversity: decoupling data curation and ontology curation using Phenex.
Balhoff, James P; Dahdul, Wasila M; Dececchi, T Alexander; Lapp, Hilmar; Mabee, Paula M; Vision, Todd J
2014-01-01
Phenex (http://phenex.phenoscape.org/) is a desktop application for semantically annotating the phenotypic character matrix datasets common in evolutionary biology. Since its initial publication, we have added new features that address several major bottlenecks in the efficiency of the phenotype curation process: allowing curators during the data curation phase to provisionally request terms that are not yet available from a relevant ontology; supporting quality control against annotation guidelines to reduce later manual review and revision; and enabling the sharing of files for collaboration among curators. We decoupled data annotation from ontology development by creating an Ontology Request Broker (ORB) within Phenex. Curators can use the ORB to request a provisional term for use in data annotation; the provisional term can be automatically replaced with a permanent identifier once the term is added to an ontology. We added a set of annotation consistency checks to prevent common curation errors, reducing the need for later correction. We facilitated collaborative editing by improving the reliability of Phenex when used with online folder sharing services, via file change monitoring and continual autosave. With the addition of these new features, and in particular the Ontology Request Broker, Phenex users have been able to focus more effectively on data annotation. Phenoscape curators using Phenex have reported a smoother annotation workflow, with much reduced interruptions from ontology maintenance and file management issues.
A Window to the World: Lessons Learned from NASA's Collaborative Metadata Curation Effort
NASA Astrophysics Data System (ADS)
Bugbee, K.; Dixon, V.; Baynes, K.; Shum, D.; le Roux, J.; Ramachandran, R.
2017-12-01
Well written descriptive metadata adds value to data by making data easier to discover as well as increases the use of data by providing the context or appropriateness of use. While many data centers acknowledge the importance of correct, consistent and complete metadata, allocating resources to curate existing metadata is often difficult. To lower resource costs, many data centers seek guidance on best practices for curating metadata but struggle to identify those recommendations. In order to assist data centers in curating metadata and to also develop best practices for creating and maintaining metadata, NASA has formed a collaborative effort to improve the Earth Observing System Data and Information System (EOSDIS) metadata in the Common Metadata Repository (CMR). This effort has taken significant steps in building consensus around metadata curation best practices. However, this effort has also revealed gaps in EOSDIS enterprise policies and procedures within the core metadata curation task. This presentation will explore the mechanisms used for building consensus on metadata curation, the gaps identified in policies and procedures, the lessons learned from collaborating with both the data centers and metadata curation teams, and the proposed next steps for the future.
Thomas, Paul D; Kejariwal, Anish; Campbell, Michael J; Mi, Huaiyu; Diemer, Karen; Guo, Nan; Ladunga, Istvan; Ulitsky-Lazareva, Betty; Muruganujan, Anushya; Rabkin, Steven; Vandergriff, Jody A; Doremieux, Olivier
2003-01-01
The PANTHER database was designed for high-throughput analysis of protein sequences. One of the key features is a simplified ontology of protein function, which allows browsing of the database by biological functions. Biologist curators have associated the ontology terms with groups of protein sequences rather than individual sequences. Statistical models (Hidden Markov Models, or HMMs) are built from each of these groups. The advantage of this approach is that new sequences can be automatically classified as they become available. To ensure accurate functional classification, HMMs are constructed not only for families, but also for functionally distinct subfamilies. Multiple sequence alignments and phylogenetic trees, including curator-assigned information, are available for each family. The current version of the PANTHER database includes training sequences from all organisms in the GenBank non-redundant protein database, and the HMMs have been used to classify gene products across the entire genomes of human and Drosophila melanogaster. The ontology terms and protein families and subfamilies, as well as Drosophila gene classifications, can be browsed and searched for free. Due to outstanding contractual obligations, access to human gene classifications and to protein family trees and multiple sequence alignments will temporarily require a nominal registration fee. PANTHER is publicly available on the web at http://panther.celera.com.
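To illustrate the classification step described above, the sketch below scans a query protein against a library of family profile HMMs and reports the top-scoring family. It assumes HMMER3's hmmscan is installed and that a pressed profile database is available locally; the file names ("families.hmm", "new_protein.fasta") are placeholders, and this is not PANTHER's own scoring pipeline.

```python
# Sketch of classifying a new protein sequence against a library of family
# profile HMMs. Assumes HMMER3 (hmmscan) is installed and that "families.hmm"
# is an hmmpress'ed profile database; both file names are placeholders.
import subprocess

def best_family(query_fasta, hmm_db="families.hmm", tbl="hits.tbl"):
    # Run hmmscan and write per-target hits to a tabular file.
    subprocess.run(
        ["hmmscan", "--tblout", tbl, "--noali", hmm_db, query_fasta],
        check=True, stdout=subprocess.DEVNULL)
    best = None
    with open(tbl) as fh:
        for line in fh:
            if line.startswith("#"):
                continue
            fields = line.split()
            family, evalue, score = fields[0], float(fields[4]), float(fields[5])
            if best is None or score > best[2]:
                best = (family, evalue, score)
    return best

hit = best_family("new_protein.fasta")
print("assigned family:", hit)
```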
PathVisio 3: an extendable pathway analysis toolbox.
Kutmon, Martina; van Iersel, Martijn P; Bohler, Anwesha; Kelder, Thomas; Nunes, Nuno; Pico, Alexander R; Evelo, Chris T
2015-02-01
PathVisio is a commonly used pathway editor, visualization and analysis software package. Biological pathways have been used by biologists for many years to describe the detailed steps in biological processes. Those powerful, visual representations help researchers to better understand, share and discuss knowledge. Since the first publication of PathVisio in 2008, the original paper has been cited more than 170 times and PathVisio has been used in many different biological studies. As an online editor, PathVisio is also integrated in the community-curated pathway database WikiPathways. Here we present the third version of PathVisio with the newest additions and improvements to the application. The core features of PathVisio are pathway drawing, advanced data visualization and pathway statistics. Additionally, PathVisio 3 introduces a new, powerful extension system that allows other developers to contribute additional functionality in the form of plugins without changing the core application. PathVisio can be downloaded from http://www.pathvisio.org and in 2014 PathVisio 3 was downloaded over 5,500 times. There are already more than 15 plugins available in the central plugin repository. PathVisio is a freely available, open-source tool published under the Apache 2.0 license (http://www.apache.org/licenses/LICENSE-2.0). It is implemented in Java and thus runs on all major operating systems. The code repository is available at http://svn.bigcat.unimaas.nl/pathvisio. The support mailing list for users is available on https://groups.google.com/forum/#!forum/wikipathways-discuss and for developers on https://groups.google.com/forum/#!forum/wikipathways-devel.
Linking microarray reporters with protein functions.
Gaj, Stan; van Erk, Arie; van Haaften, Rachel I M; Evelo, Chris T A
2007-09-26
The analysis of microarray experiments requires accurate and up-to-date functional annotation of the microarray reporters to optimize the interpretation of the biological processes involved. Pathway visualization tools are used to connect gene expression data with existing biological pathways by using specific database identifiers that link reporters with elements in the pathways. This paper proposes a novel method that aims to improve microarray reporter annotation by BLASTing the original reporter sequences against a species-specific EMBL subset that was derived from, and crosslinked back to, the highly curated UniProt database. The resulting alignments were filtered using high-quality alignment criteria and further compared with the outcome of a more traditional approach, in which reporter sequences were BLASTed against EnsEMBL followed by locating the corresponding protein (UniProt) entry for the high-quality hits. Combining the results of both methods resulted in successful annotation of >58% of all reporter sequences with UniProt IDs on two commercial array platforms, increasing the proportion of Incyte reporters that could be coupled to Gene Ontology terms from 32.7% to 58.3% and to a local GenMAPP pathway from 9.6% to 16.7%. For Agilent, 35.3% of all reporters are now linked to GO nodes and 7.1% to local pathways. Our methods increased the annotation quality of microarray reporter sequences and allowed us to visualize more reporters using pathway visualization tools. Even in cases where the original reporter annotation showed the correct description, the new identifiers often allowed improved pathway and Gene Ontology linking. These methods are freely available at http://www.bigcat.unimaas.nl/public/publications/Gaj_Annotation/.
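The 'high quality alignment criteria' step can be pictured as a small filter over BLAST tabular output. The sketch below assumes BLAST was run with -outfmt 6 against the UniProt-crosslinked EMBL subset; the identity, length and E-value cutoffs are illustrative assumptions, not the thresholds used in the paper.

```python
# Sketch of a "high quality alignment" filter: parse BLAST tabular output
# (-outfmt 6) and keep reporter-to-UniProt hits above identity and
# alignment-length thresholds. The 98% / 50 bp / 1e-10 cutoffs and the input
# file name are illustrative assumptions.
import csv

FIELDS = ["qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
          "qstart", "qend", "sstart", "send", "evalue", "bitscore"]

def filter_hits(blast_tsv, min_identity=98.0, min_length=50, max_evalue=1e-10):
    kept = {}
    with open(blast_tsv) as fh:
        for row in csv.DictReader(fh, fieldnames=FIELDS, delimiter="\t"):
            if (float(row["pident"]) >= min_identity
                    and int(row["length"]) >= min_length
                    and float(row["evalue"]) <= max_evalue):
                # Keep only the best-scoring hit per reporter.
                prev = kept.get(row["qseqid"])
                if prev is None or float(row["bitscore"]) > float(prev["bitscore"]):
                    kept[row["qseqid"]] = row
    return kept  # reporter id -> best UniProt-linked hit

annotated = filter_hits("reporters_vs_embl.tsv")
print(f"{len(annotated)} reporters passed the alignment filters")
```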
Niemeyer, Charlotte M.
2014-01-01
RAS genes encode a family of 21 kDa proteins that are an essential hub for a number of survival, proliferation, differentiation and senescence pathways. Signaling of the RAS-GTPases through the RAF-MEK-ERK pathway, the first identified mitogen-associated protein kinase (MAPK) cascade is essential in development. A group of genetic syndromes, named “RASopathies”, had been identified which are caused by heterozygosity for germline mutations in genes that encode protein components of the RAS/MAPK pathway. Several of these clinically overlapping disorders, including Noonan syndrome, Noonan-like CBL syndrome, Costello syndrome, cardio-facio-cutaneous (CFC) syndrome, neurofibromatosis type I, and Legius syndrome, predispose to cancer and abnormal myelopoiesis in infancy. This review focuses on juvenile myelomonocytic leukemia (JMML), a malignancy of early childhood characterized by initiating germline and/or somatic mutations in five genes of the RAS/MAPK pathway: PTPN11, CBL, NF-1, KRAS and NRAS. Natural courses of these five subtypes differ, although hematopoietic stem cell transplantation remains the only curative therapy option for most children with JMML. With whole-exome sequencing studies revealing few secondary lesions it will be crucial to better understand the RAS/MAPK signaling network with its crosstalks and feed-back loops to carefully design early clinical trials with novel pharmacological agents in this still puzzling leukemia. PMID:25420281
ReactPRED: a tool to predict and analyze biochemical reactions.
Sivakumar, Tadi Venkata; Giri, Varun; Park, Jin Hwan; Kim, Tae Yong; Bhaduri, Anirban
2016-11-15
Biochemical pathway engineering is often used to synthesize or degrade target chemicals. In silico screening of the biochemical transformation space allows prediction of the feasible reactions that constitute these pathways. Current enabling tools are customized to predict reactions based on pre-defined biochemical transformations or reaction rule sets. Reaction rule sets are usually curated manually and tailored to specific applications. They are not exhaustive. In addition, current systems are incapable of regulating and refining data with an aim to tune specificity and sensitivity. A robust and flexible tool that allows automated reaction rule set creation along with regulated pathway prediction and analysis is therefore needed; ReactPRED aims to address this need. ReactPRED is an open-source, flexible and customizable tool enabling users to predict biochemical reactions and pathways. The tool allows automated reaction rule creation from a user-defined reaction set. Additionally, reaction rule degree and rule tolerance features allow refinement of predicted data. It is available as a flexible graphical user interface and a console application. ReactPRED is available at: https://sourceforge.net/projects/reactpred/ Contact: anirban.b@samsung.com or ty76.kim@samsung.com. Supplementary information: Supplementary data are available at Bioinformatics online. © The Author 2016. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com.
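As a rough analogy to rule-based reaction prediction (not ReactPRED's own code or rule format), the RDKit sketch below applies a hand-written reaction SMARTS to a substrate and enumerates the products, which is the basic operation such predictors perform. The oxidation rule and the ethanol substrate are chosen purely for illustration.

```python
# Not ReactPRED's code: a small RDKit illustration of applying a reaction
# rule. The SMARTS encodes a generic primary alcohol -> aldehyde oxidation
# and is chosen purely for illustration.
from rdkit import Chem
from rdkit.Chem import AllChem

rule = AllChem.ReactionFromSmarts("[CH2:1][OX2H:2]>>[CH:1]=[O:2]")
substrate = Chem.MolFromSmiles("CCO")  # ethanol

for prods in rule.RunReactants((substrate,)):
    for p in prods:
        Chem.SanitizeMol(p)
        print(Chem.MolToSmiles(p))  # expected: CC=O (acetaldehyde)
```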
McQuilton, Peter; Gonzalez-Beltran, Alejandra; Rocca-Serra, Philippe; Thurston, Milo; Lister, Allyson; Maguire, Eamonn; Sansone, Susanna-Assunta
2016-01-01
BioSharing (http://www.biosharing.org) is a manually curated, searchable portal of three linked registries. These resources cover standards (terminologies, formats and models, and reporting guidelines), databases, and data policies in the life sciences, broadly encompassing the biological, environmental and biomedical sciences. Launched in 2011 and built by the same core team as the successful MIBBI portal, BioSharing harnesses community curation to collate and cross-reference resources across the life sciences from around the world. BioSharing makes these resources findable and accessible (the core of the FAIR principle). Every record is designed to be interlinked, providing a detailed description not only on the resource itself, but also on its relations with other life science infrastructures. Serving a variety of stakeholders, BioSharing cultivates a growing community, to which it offers diverse benefits. It is a resource for funding bodies and journal publishers to navigate the metadata landscape of the biological sciences; an educational resource for librarians and information advisors; a publicising platform for standard and database developers/curators; and a research tool for bench and computer scientists to plan their work. BioSharing is working with an increasing number of journals and other registries, for example linking standards and databases to training material and tools. Driven by an international Advisory Board, the BioSharing user-base has grown by over 40% (by unique IP address), in the last year thanks to successful engagement with researchers, publishers, librarians, developers and other stakeholders via several routes, including a joint RDA/Force11 working group and a collaboration with the International Society for Biocuration. In this article, we describe BioSharing, with a particular focus on community-led curation.Database URL: https://www.biosharing.org. © The Author(s) 2016. Published by Oxford University Press.
Sharing behavioral data through a grid infrastructure using data standards
Min, Hua; Ohira, Riki; Collins, Michael A; Bondy, Jessica; Avis, Nancy E; Tchuvatkina, Olga; Courtney, Paul K; Moser, Richard P; Shaikh, Abdul R; Hesse, Bradford W; Cooper, Mary; Reeves, Dianne; Lanese, Bob; Helba, Cindy; Miller, Suzanne M; Ross, Eric A
2014-01-01
Objective: In an effort to standardize behavioral measures and their data representation, the present study develops a methodology for incorporating measures found in the National Cancer Institute's (NCI) grid-enabled measures (GEM) portal, a repository for behavioral and social measures, into the cancer data standards registry and repository (caDSR). Methods: The methodology consists of four parts for curating GEM measures into the caDSR: (1) develop unified modeling language (UML) models for behavioral measures; (2) create common data elements (CDE) for UML components; (3) bind CDE with concepts from the NCI thesaurus; and (4) register CDE in the caDSR. Results: UML models have been developed for four GEM measures, which have been registered in the caDSR as CDE. New behavioral concepts related to these measures have been created and incorporated into the NCI thesaurus. Best practices for representing measures using UML models have been applied in practice (e.g., in the caDSR). One dataset based on a GEM-curated measure is available for use by other systems and users connected to the grid. Conclusions: Behavioral and population science data can be standardized by using and extending current standards. A new branch of CDE for behavioral science was developed for the caDSR. It expands the caDSR domain coverage beyond the clinical and biological areas. In addition, missing terms and concepts specific to the behavioral measures addressed in this paper were added to the NCI thesaurus. A methodology was developed and refined for curation of behavioral and population science data. PMID:24076749
Automated workflows for data curation and standardization of chemical structures for QSAR modeling
Large collections of chemical structures and associated experimental data are publicly available, and can be used to build robust QSAR models for applications in different fields. One common concern is the quality of both the chemical structure information and associated experime...
Curating NASA's future extraterrestrial sample collections: How do we achieve maximum proficiency?
NASA Astrophysics Data System (ADS)
McCubbin, Francis; Evans, Cynthia; Allton, Judith; Fries, Marc; Righter, Kevin; Zolensky, Michael; Zeigler, Ryan
2016-07-01
Introduction: The Astromaterials Acquisition and Curation Office (henceforth referred to herein as NASA Curation Office) at NASA Johnson Space Center (JSC) is responsible for curating all of NASA's extraterrestrial samples. Under the governing document, NASA Policy Directive (NPD) 7100.10E "Curation of Extraterrestrial Materials", JSC is charged with "The curation of all extraterrestrial material under NASA control, including future NASA missions." The Directive goes on to define Curation as including "…documentation, preservation, preparation, and distribution of samples for research, education, and public outreach." Here we describe some of the ongoing efforts to ensure that the future activities of the NASA Curation Office are working towards a state of maximum proficiency. Founding Principle: Curatorial activities began at JSC (Manned Spacecraft Center before 1973) as soon as design and construction planning for the Lunar Receiving Laboratory (LRL) began in 1964 [1], not with the return of the Apollo samples in 1969, nor with the completion of the LRL in 1967. This practice has since proven that curation begins as soon as a sample return mission is conceived, and this founding principle continues to return dividends today [e.g., 2]. The Next Decade: Part of the curation process is planning for the future, and we refer to these planning efforts as "advanced curation" [3]. Advanced Curation is tasked with developing procedures, technology, and data sets necessary for curating new types of collections as envisioned by NASA exploration goals. We are (and have been) planning for future curation, including cold curation, extended curation of ices and volatiles, curation of samples with special chemical considerations such as perchlorate-rich samples, curation of organically- and biologically-sensitive samples, and the use of minimally invasive analytical techniques (e.g., micro-CT, [4]) to characterize samples. These efforts will be useful for Mars Sample Return, Lunar South Pole-Aitken Basin Sample Return, and Comet Surface Sample Return, all of which were named in the NRC Planetary Science Decadal Survey 2013-2022. We are fully committed to pushing the boundaries of curation protocol as humans continue to push the boundaries of space exploration and sample return. However, to improve our ability to curate astromaterials collections of the future and to provide maximum protection to any returned samples, it is imperative that curation involvement commences at the time of mission conception. When curation involvement is at the ground floor of mission planning, it provides a mechanism by which the samples can be protected against project-level decisions that could undermine the scientific value of the returned samples. A notable example of one of the benefits of early curation involvement in mission planning is in the acquisition of contamination knowledge (CK). CK capture strategies are designed during the initial planning stages of a sample return mission, and they are to be implemented during all phases of the mission from assembly, test, and launch operations (ATLO), through cruise and mission operations, to the point of preliminary examination after Earth return. CK is captured by witness materials and coupons exposed to the contamination environment in the assembly labs and on the spacecraft during launch, cruise, and operations.
These materials, along with any procedural blanks and returned flight hardware, represent our CK capture for the returned samples and serve as a baseline from which analytical results can be vetted. Collection of CK is a critical part of being able to conduct and interpret data from organic geochemistry and biochemistry investigations of returned samples. The CK samples from a given mission are treated as part of the sample collection of that mission, hence they are part of the permanent archive that is maintained by the NASA Curation Office. We are in the midst of collecting witness plates and coupons for the OSIRIS-REx mission, and we are in the planning stages for similar activities for the Mars 2020 rover mission, which is going to be the first step in a multi-stage campaign to return martian samples to Earth. Concluding Remarks: The return of every extraterrestrial sample is a scientific investment, and the CK samples and any procedural blanks represent an insurance policy against imperfections in the sample-collection and sample-return process. The curation facilities and personnel are the primary managers of that investment, and the scientific community at large is the beneficiary. The NASA Curation Office at JSC has the assigned task of maintaining the long-term integrity of all of NASA's astromaterials and ensuring that the samples are distributed for scientific study in a fair, timely, and responsible manner. It is only through this openness and global collaboration in the study of astromaterials that the return on our scientific investments can be maximized. For information on requesting samples and becoming part of the global study of astromaterials, please visit curator.jsc.nasa.gov. References: [1] Mangus, S. & Larsen, W. (2004) NASA/CR-2004-208938, NASA, Washington, DC. [2] Allen, C. et al., (2011) Chemie Der Erde-Geochemistry, 71, 1-20. [3] McCubbin, F.M. et al., (2016) 47th LPSC #2668. [4] Zeigler, R.A. et al., (2014) 45th LPSC #2665.
A genome-scale metabolic reconstruction of Pseudomonas putida KT2440: iJN746 as a cell factory.
Nogales, Juan; Palsson, Bernhard Ø; Thiele, Ines
2008-09-16
Pseudomonas putida is the best-studied pollutant-degrading bacterium and is harnessed by industrial biotechnology to synthesize fine chemicals. Since the publication of P. putida KT2440's genome, some in silico analyses of its metabolic and biotechnological capacities have been published. However, global understanding of the capabilities of P. putida KT2440 requires the construction of a metabolic model that enables the integration of classical experimental data along with genomic and high-throughput data. The constraint-based reconstruction and analysis (COBRA) approach has been successfully used to build and analyze in silico genome-scale metabolic reconstructions. We present a genome-scale reconstruction of P. putida KT2440's metabolism, iJN746, which was constructed based on genomic, biochemical, and physiological information. This manually curated reconstruction accounts for 746 genes, 950 reactions, and 911 metabolites. iJN746 captures biotechnologically relevant pathways, including polyhydroxyalkanoate synthesis and catabolic pathways of aromatic compounds (e.g., toluene, benzoate, phenylacetate, nicotinate), not described in other metabolic reconstructions or biochemical databases. The predictive potential of iJN746 was validated using experimental data including growth performance and gene deletion studies. Furthermore, in silico growth on toluene was found to be oxygen-limited, suggesting the existence of oxygen-efficient pathways not yet annotated in P. putida's genome. Moreover, we evaluated the production efficiency of polyhydroxyalkanoates from various carbon sources and found fatty acids to be the most prominent candidates, as expected. Here we present the first genome-scale reconstruction of P. putida, a biotechnologically interesting all-rounder. Taken together, this work illustrates the utility of iJN746 as i) a knowledge base, ii) a discovery tool, and iii) an engineering platform to explore P. putida's potential in bioremediation and bioplastic production.
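Reconstructions such as iJN746 are typically interrogated with constraint-based (COBRA) tooling. The sketch below shows a generic flux balance analysis and a single-gene knockout using cobrapy; the SBML file name and the gene identifier are placeholders, and the snippet is a generic COBRA illustration rather than the analyses performed in the paper.

```python
# Generic COBRA-style sketch with cobrapy: load a genome-scale model, run
# flux balance analysis, and simulate a gene knockout. "iJN746.xml" and the
# gene identifier are placeholders for whatever model file is actually used.
import cobra

model = cobra.io.read_sbml_model("iJN746.xml")

solution = model.optimize()                  # maximize the model's objective
print("wild-type growth rate:", solution.objective_value)

# Simulate a single-gene deletion inside a context manager so the change is
# reverted automatically afterwards.
with model:
    model.genes.get_by_id("PP_4116").knock_out()   # placeholder gene ID
    print("knockout growth rate:", model.optimize().objective_value)
```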
Leigh, Nicholas D; O'Neill, Rachel E; Du, Wei; Chen, Chuan; Qiu, Jingxin; Ashwell, Jonathan D; McCarthy, Philip L; Chen, George L; Cao, Xuefang
2017-07-01
Allogeneic hematopoietic cell transplantation (allo-HCT) is a potentially curative treatment for hematologic and immunologic diseases. However, graft-versus-host disease (GVHD) may develop when donor-derived T cells recognize and damage genetically distinct normal host tissues. In addition to TCR signaling, costimulatory pathways are involved in T cell activation. CD27 is a TNFR family member expressed on T cells, and its ligand, CD70, is expressed on APCs. The CD27/CD70 costimulatory pathway was shown to be critical for T cell function and survival in viral infection models. However, the role of this pathway in allo-HCT was previously unknown. In this study, we have examined its contribution in GVHD pathogenesis. Surprisingly, Ab blockade of CD70 after allo-HCT significantly increases GVHD. Interestingly, whereas donor T cell- or bone marrow-derived CD70 plays no role in GVHD, host-derived CD70 inhibits GVHD, as CD70-/- hosts show significantly increased GVHD. This is evidenced by reduced survival, more severe weight loss, and increased histopathologic damage compared with wild-type hosts. In addition, CD70-/- hosts have higher levels of the proinflammatory cytokines TNF-α, IFN-γ, IL-2, and IL-17. Moreover, accumulation of donor CD4+ and CD8+ effector T cells is increased in CD70-/- versus wild-type hosts. Mechanistic analyses suggest that CD70 expressed by host hematopoietic cells is involved in the control of alloreactive T cell apoptosis and expansion. Together, our findings demonstrate that host CD70 serves as a unique negative regulator of the allogeneic T cell response by contributing to donor T cell apoptosis and inhibiting expansion of donor effector T cells. Copyright © 2017 by The American Association of Immunologists, Inc.
NASA Astrophysics Data System (ADS)
Hou, C. Y.; Dattore, R.; Peng, G. S.
2014-12-01
The National Center for Atmospheric Research's Global Climate Four-Dimensional Data Assimilation (CFDDA) Hourly 40km Reanalysis dataset is a dynamically downscaled dataset with high temporal and spatial resolution. The dataset contains three-dimensional hourly analyses in netCDF format for the global atmospheric state from 1985 to 2005 on a 40km horizontal grid (0.4° grid increment) with 28 vertical levels, providing good representation of local forcing and diurnal variation of processes in the planetary boundary layer. This project aimed to make the dataset publicly available, accessible, and usable in order to provide a unique resource to allow and promote studies of new climate characteristics. When the curation project started, it had been five years since the data files were generated. Also, although the Principal Investigator (PI) had generated a user document at the end of the project in 2009, the document had not been maintained. Furthermore, the PI had moved to a new institution, and the remaining team members were reassigned to other projects. These factors made data curation especially challenging in the areas of verifying data quality, harvesting metadata descriptions, and documenting provenance information. As a result, the project's curation process found that: the data curator's skill and knowledge helped in making decisions, such as on file format, structure, and workflow documentation, that had a significant, positive impact on the ease of the dataset's management and long-term preservation; use of data curation tools, such as the Data Curation Profiles Toolkit's guidelines, revealed important information for promoting the data's usability and enhancing preservation planning; and involving data curators during each stage of the data curation life cycle, instead of only at the end, could improve the curation process's efficiency. Overall, the project showed that proper resources invested in the curation process would give datasets the best chance to fulfill their potential to help with new climate pattern discovery.
Yu, Sheng; Liao, Katherine P; Shaw, Stanley Y; Gainer, Vivian S; Churchill, Susanne E; Szolovits, Peter; Murphy, Shawn N; Kohane, Isaac S; Cai, Tianxi
2015-09-01
Analysis of narrative (text) data from electronic health records (EHRs) can improve population-scale phenotyping for clinical and genetic research. Currently, selection of text features for phenotyping algorithms is slow and laborious, requiring extensive and iterative involvement by domain experts. This paper introduces a method to develop phenotyping algorithms in an unbiased manner by automatically extracting and selecting informative features, which can be comparable to expert-curated ones in classification accuracy. Comprehensive medical concepts were collected from publicly available knowledge sources in an automated, unbiased fashion. Natural language processing (NLP) revealed the occurrence patterns of these concepts in EHR narrative notes, which enabled selection of informative features for phenotype classification. When combined with additional codified features, a penalized logistic regression model was trained to classify the target phenotype. We applied this method to develop algorithms to identify patients with rheumatoid arthritis (RA), and coronary artery disease (CAD) cases among those with RA, from a large multi-institutional EHR. The areas under the receiver operating characteristic curves (AUC) for classifying RA and CAD using models trained with automated features were 0.951 and 0.929, respectively, compared to AUCs of 0.938 and 0.929 for models trained with expert-curated features. Models trained with NLP text features selected through an unbiased, automated procedure achieved comparable or slightly higher accuracy than those trained with expert-curated features. The majority of the selected model features were interpretable. The proposed automated feature extraction method, generating highly accurate phenotyping algorithms with improved efficiency, is a significant step toward high-throughput phenotyping. © The Author 2015. Published by Oxford University Press on behalf of the American Medical Informatics Association. All rights reserved. For Permissions, please email: journals.permissions@oup.com.
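The final classification step described above can be sketched as a penalized logistic regression evaluated by AUC. The example below uses synthetic concept-count features in place of the NLP-derived and codified features from the EHR; the L1 penalty, solver and regularization strength are assumptions chosen for illustration.

```python
# Sketch of the final classification step: a penalized (L1) logistic
# regression over concept-count features, scored by AUC. The feature matrix
# here is synthetic; in the study, features came from NLP of EHR notes plus
# codified data.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(42)
X = rng.poisson(1.0, size=(1000, 50)).astype(float)   # concept/code counts
logit = X[:, 0] * 0.8 + X[:, 1] * 0.5 - 1.5
y = rng.binomial(1, 1 / (1 + np.exp(-logit)))          # phenotype labels

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)
clf = LogisticRegression(penalty="l1", solver="liblinear", C=0.5)
clf.fit(X_tr, y_tr)

auc = roc_auc_score(y_te, clf.predict_proba(X_te)[:, 1])
print(f"AUC: {auc:.3f}")
print("features retained:", int((clf.coef_ != 0).sum()))
```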
Changing the Curation Equation: A Data Lifecycle Approach to Lowering Costs and Increasing Value
NASA Astrophysics Data System (ADS)
Myers, J.; Hedstrom, M.; Plale, B. A.; Kumar, P.; McDonald, R.; Kooper, R.; Marini, L.; Kouper, I.; Chandrasekar, K.
2013-12-01
What if everything that researchers know about their data, and everything their applications know, were directly available to curators? What if all the information that data consumers discover and infer about data were also available? What if curation and preservation activities occurred incrementally, during research projects instead of after they end, and could be leveraged to make it easier to manage research data from the moment of its creation? These are questions that the Sustainable Environments - Actionable Data (SEAD) project, funded as part of the National Science Foundation's DataNet partnership, was designed to answer. Data curation is challenging, but it is made more difficult by the historical separation of data production, data use, and formal curation activities across organizations, locations, and applications, and across time. Modern computing and networking technologies allow a much different approach in which data and metadata can easily flow between these activities throughout the data lifecycle, and in which heterogeneous and evolving data and metadata can be managed. Sustainability research, SEAD's initial focus area, is a clear example of an area where the nature of the research (cross-disciplinary, integrating heterogeneous data from independent sources, small teams, rapid evolution of sensing and analysis techniques) and the barriers and costs inherent in traditional methods have limited adoption of existing curation tools and techniques, to the detriment of overall scientific progress. To explore these ideas and create a sustainable curation capability for communities such as sustainability research, the SEAD team has developed and is now deploying an interacting set of open source data services that demonstrate this approach. These services provide end-to-end support for management of data during research projects; publication of that data into long-term archives; and integration of it into community networks of publications, research center activities, and synthesis efforts. They build on a flexible 'semantic content management' architecture and incorporate notions of 'active' and 'social' curation - continuous, incremental curation activities performed by the data producers (active) and the community (social) that are motivated by a range of direct benefits. Examples include the use of metadata (tags) to allow generation of custom geospatial maps, automated metadata extraction to generate rich data pages for known formats, and the use of information about data authorship to allow automatic updates of personal and project research profiles when data is published. In this presentation, we describe the core capabilities of SEAD's services and their application in sustainability research. We also outline the key features of the SEAD architecture - the use of global semantic identifiers, extensible data and metadata models, web services to manage context shifts, scalable cloud storage - and highlight how this approach is particularly well suited to extension by independent third parties. We conclude with thoughts on how this approach can be applied to challenging issues such as exposing 'dark' data and reducing duplicate creation of derived data products, and can provide a new level of analytics for community analysis and coordination.
NASA Technical Reports Server (NTRS)
Blumenfeld, E. H.; Evans, C. A.; Oshel, E. R.; Liddle, D. A.; Beaulieu, K.; Zeigler, R. A.; Righter, K.; Hanna, R. D.; Ketcham, R. A.
2014-01-01
Providing web-based data of complex and sensitive astromaterials (including meteorites and lunar samples) in novel formats enhances existing preliminary examination data on these samples and supports targeted sample requests and analyses. We have developed and tested a rigorous protocol for collecting highly detailed imagery of meteorites and complex lunar samples in non-contaminating environments. These data are reduced to create interactive 3D models of the samples. We intend to provide these data as they are acquired on NASA's Astromaterials Acquisition and Curation website at http://curator.jsc.nasa.gov/.
Tissue Non-Specific Genes and Pathways Associated with Diabetes: An Expression Meta-Analysis.
Mei, Hao; Li, Lianna; Liu, Shijian; Jiang, Fan; Griswold, Michael; Mosley, Thomas
2017-01-21
We performed expression studies to identify tissue non-specific genes and pathways of diabetes by meta-analysis. We searched curated datasets of the Gene Expression Omnibus (GEO) database and identified 13 and five expression studies of diabetes and insulin responses in various tissues, respectively. We tested differential gene expression by an empirical Bayes-based linear method and investigated gene set expression association by knowledge-based enrichment analysis. Meta-analysis by different methods was applied to identify tissue non-specific genes and gene sets. We also proposed pathway mapping analysis to infer functions of the identified gene sets, and correlation and independence analysis to evaluate the expression association profiles of genes and gene sets between studies and tissues. Our analysis showed that PGRMC1 and HADH genes were significant over diabetes studies, while IRS1 and MPST genes were significant over insulin response studies, and joint analysis showed that HADH and MPST genes were significant over all combined data sets. The pathway analysis identified six significant gene sets over all studies. The KEGG pathway mapping indicated that the significant gene sets are related to diabetes pathogenesis. The results also showed that 12.8% and 59.0% of pairwise studies had significantly correlated expression association for genes and gene sets, respectively; moreover, 12.8% of pairwise studies had independent expression association for genes, but no pairwise studies differed significantly in expression association of gene sets. Our analysis indicated that there are both tissue specific and non-specific genes and pathways associated with diabetes pathogenesis. Compared to gene expression, pathway association tends to be tissue non-specific, and a common pathway influencing diabetes development is activated through different genes in different tissues.
Jayakrishnan, Thejus T; Nadeem, Hasan; Groeschl, Ryan T; George, Ben; Thomas, James P; Ritch, Paul S; Christians, Kathleen K; Tsai, Susan; Evans, Douglas B; Pappas, Sam G; Gamblin, T Clark; Turaga, Kiran K
2015-01-01
Objectives Laparoscopy is recommended to detect radiographically occult metastases in patients with pancreatic cancer before curative resection. This study was conducted to test the hypothesis that diagnostic laparoscopy (DL) is cost-effective in patients undergoing curative resection with or without neoadjuvant therapy (NAT). Methods Decision tree modelling compared routine DL with exploratory laparotomy (ExLap) at the time of curative resection in resectable cancer treated with surgery first (SF), and borderline resectable cancer treated with NAT. Costs (US$) from the payer's perspective, quality-adjusted life months (QALMs) and incremental cost-effectiveness ratios (ICERs) were calculated. Base case estimates and multi-way sensitivity analyses were performed. Willingness to pay (WtP) was US$4166/QALM (or US$50 000/quality-adjusted life year). Results Base case costs were US$34 921 for ExLap and US$33 442 for DL in SF patients, and US$39 633 for ExLap and US$39 713 for DL in NAT patients. Routine DL is the dominant (preferred) strategy in both treatment types: it allows for cost reductions of US$10 695/QALM in SF and US$4158/QALM in NAT patients. Conclusions The present analysis supports the cost-effectiveness of routine DL before curative resection in pancreatic cancer patients treated with either SF or NAT. PMID:25123702
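As a rough illustration of the dominance and ICER logic described in this abstract, the short Python sketch below compares two strategies against a willingness-to-pay threshold. The SF base-case costs are taken from the abstract; the QALM values are hypothetical placeholders, not study results, and this is not the authors' decision-tree model.

```python
# Minimal sketch of an ICER/dominance comparison: new strategy (DL) vs old (ExLap).
# Costs are the surgery-first base-case figures quoted above; QALMs are invented.

WTP_PER_QALM = 4166.0  # US$/quality-adjusted life month (about US$50,000/QALY)

def compare(cost_new, qalm_new, cost_old, qalm_old, wtp=WTP_PER_QALM):
    d_cost = cost_new - cost_old
    d_qalm = qalm_new - qalm_old
    if d_cost <= 0 and d_qalm >= 0:
        return "new strategy dominates (cheaper and at least as effective)"
    if d_cost >= 0 and d_qalm <= 0:
        return "new strategy is dominated"
    icer = d_cost / d_qalm  # incremental cost per QALM gained
    verdict = "cost-effective" if icer <= wtp else "not cost-effective"
    return f"ICER = {icer:,.0f} US$/QALM ({verdict})"

# Surgery-first base case: DL vs ExLap (QALM values are illustrative only).
print(compare(cost_new=33442, qalm_new=10.2, cost_old=34921, qalm_old=10.0))
```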
Large collections of chemical structures and associated experimental data are publicly available, and can be used to build robust QSAR models for applications in different fields. One common concern is the quality of both the chemical structure information and associated experime...
USDA-ARS?s Scientific Manuscript database
The use of swine in biomedical research has increased dramatically in the last decade. Diverse genomic- and proteomic databases have been developed to facilitate research using human and rodent models. Current porcine gene databases, however, lack the robust annotation to study pig models that are...
The importance of data curation on QSAR Modeling - PHYSPROP open data as a case study. (QSAR 2016)
During the last few decades many QSAR models and tools have been developed at the US EPA, including the widely used EPISuite. During this period the arsenal of computational capabilities supporting cheminformatics has broadened dramatically with multiple software packages. These ...
Increasing availability of large collections of chemical structures and associated experimental data provides an opportunity to build robust QSAR models for applications in different fields. One common concern is the quality of both the chemical structure information and associat...
How should the completeness and quality of curated nanomaterial data be evaluated?
NASA Astrophysics Data System (ADS)
Marchese Robinson, Richard L.; Lynch, Iseult; Peijnenburg, Willie; Rumble, John; Klaessig, Fred; Marquardt, Clarissa; Rauscher, Hubert; Puzyn, Tomasz; Purian, Ronit; Åberg, Christoffer; Karcher, Sandra; Vriens, Hanne; Hoet, Peter; Hoover, Mark D.; Hendren, Christine Ogilvie; Harper, Stacey L.
2016-05-01
Nanotechnology is of increasing significance. Curation of nanomaterial data into electronic databases offers opportunities to better understand and predict nanomaterials' behaviour. This supports innovation in, and regulation of, nanotechnology. It is commonly understood that curated data need to be sufficiently complete and of sufficient quality to serve their intended purpose. However, assessing data completeness and quality is non-trivial in general and is arguably especially difficult in the nanoscience area, given its highly multidisciplinary nature. The current article, part of the Nanomaterial Data Curation Initiative series, addresses how to assess the completeness and quality of (curated) nanomaterial data. In order to address this key challenge, a variety of related issues are discussed: the meaning and importance of data completeness and quality, existing approaches to their assessment and the key challenges associated with evaluating the completeness and quality of curated nanomaterial data. Considerations which are specific to the nanoscience area and lessons which can be learned from other relevant scientific disciplines are considered. Hence, the scope of this discussion ranges from physicochemical characterisation requirements for nanomaterials and interference of nanomaterials with nanotoxicology assays to broader issues such as minimum information checklists, toxicology data quality schemes and computational approaches that facilitate evaluation of the completeness and quality of (curated) data. This discussion is informed by a literature review and a survey of key nanomaterial data curation stakeholders. Finally, drawing upon this discussion, recommendations are presented concerning the central question: how should the completeness and quality of curated nanomaterial data be evaluated? Electronic supplementary information (ESI) available: (1) Detailed information regarding issues raised in the main text; (2) original survey responses. See DOI: 10.1039/c5nr08944a
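To make the idea of a completeness check concrete, the following Python sketch scores a curated record against a minimum-information checklist, in the spirit of the checklists discussed above. The field names and record content are invented for illustration and do not reflect any actual curation schema.

```python
# Illustrative completeness scoring of a curated nanomaterial record against a
# hypothetical minimum-information checklist; field names are made-up examples.

CHECKLIST = ["core_composition", "size_distribution", "surface_chemistry",
             "zeta_potential", "assay_interference_controls"]

def completeness(record: dict, checklist=CHECKLIST) -> float:
    """Fraction of checklist fields that are present and non-empty."""
    filled = sum(1 for field in checklist if record.get(field) not in (None, "", []))
    return filled / len(checklist)

record = {"core_composition": "TiO2", "size_distribution": "21 +/- 5 nm",
          "surface_chemistry": "", "zeta_potential": None}
print(f"completeness score: {completeness(record):.0%}")  # 40%
```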
This presentation will examine the impact of data quality on the construction of QSAR models being developed within the EPA‘s National Center for Computational Toxicology. We have developed a public-facing platform to provide access to predictive models. As part of the work we ha...
Steffensen, Jon Lund; Dufault-Thompson, Keith; Zhang, Ying
2018-01-01
The metabolism of individual organisms and biological communities can be viewed as a network of metabolites connected to each other through chemical reactions. In metabolic networks, chemical reactions transform reactants into products, thereby transferring elements between these metabolites. Knowledge of how elements are transferred through reactant/product pairs allows for the identification of primary compound connections through a metabolic network. However, such information is not readily available and is often challenging to obtain for large reaction databases or genome-scale metabolic models. In this study, a new algorithm was developed for automatically predicting the element-transferring reactant/product pairs using the limited information available in the standard representation of metabolic networks. The algorithm demonstrated high efficiency in analyzing large datasets and provided accurate predictions when benchmarked with manually curated data. Applying the algorithm to the visualization of metabolic networks highlighted pathways of primary reactant/product connections and provided an organized view of element-transferring biochemical transformations. The algorithm was implemented as a new function in the open source software package PSAMM in the release v0.30 (https://zhanglab.github.io/psamm/).
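The sketch below is not the PSAMM algorithm; it is a greatly simplified illustration of the underlying idea, pairing reactants with products by how many atoms their formulas share, assuming only chemical formulas are available.

```python
# Greedy, formula-based ranking of reactant/product pairs by shared atom counts,
# as a toy stand-in for element-transfer pair prediction (not the PSAMM method).
import re
from collections import Counter
from itertools import product

def parse_formula(f):
    """Tiny parser for formulas like 'C6H12O6' (element symbol plus optional count)."""
    return Counter({el: int(n or 1) for el, n in re.findall(r"([A-Z][a-z]?)(\d*)", f)})

def shared_elements(f1, f2):
    a, b = parse_formula(f1), parse_formula(f2)
    return sum(min(a[e], b[e]) for e in a.keys() & b.keys())

def predict_pairs(reactants, products):
    """Return reactant/product pairs ranked by element overlap (illustrative only)."""
    scored = [((r, p), shared_elements(fr, fp))
              for (r, fr), (p, fp) in product(reactants.items(), products.items())]
    return sorted(scored, key=lambda x: -x[1])

reactants = {"glucose": "C6H12O6", "ATP": "C10H16N5O13P3"}
products = {"glucose-6-phosphate": "C6H13O9P", "ADP": "C10H15N5O10P2"}
for (r, p), score in predict_pairs(reactants, products):
    print(f"{r} -> {p}: shared atoms = {score}")
```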
Astromaterials Curation Online Resources for Principal Investigators
NASA Technical Reports Server (NTRS)
Todd, Nancy S.; Zeigler, Ryan A.; Mueller, Lina
2017-01-01
The Astromaterials Acquisition and Curation office at NASA Johnson Space Center curates all of NASA's extraterrestrial samples, the most extensive set of astromaterials samples available to the research community worldwide. The office allocates 1500 individual samples to researchers and students each year and has served the planetary research community for 45+ years. The Astromaterials Curation office provides access to its sample data repository and digital resources to support the research needs of sample investigators and to aid in the selection and request of samples for scientific study. These resources can be found on the Astromaterials Acquisition and Curation website at https://curator.jsc.nasa.gov. To better serve our users, we have engaged in several activities to enhance the data available for astromaterials samples, to improve the accessibility and performance of the website, and to address user feedback. We have also put plans in place for continuing improvements to our existing data products.
Accurate atom-mapping computation for biochemical reactions.
Latendresse, Mario; Malerich, Jeremiah P; Travers, Mike; Karp, Peter D
2012-11-26
The complete atom mapping of a chemical reaction is a bijection of the reactant atoms to the product atoms that specifies the terminus of each reactant atom. Atom mapping of biochemical reactions is useful for many applications of systems biology, in particular for metabolic engineering, where synthesizing new biochemical pathways has to account for the number of carbon atoms from a source compound that are conserved in the synthesis of a target compound. Rapid, accurate computation of the atom mapping(s) of a biochemical reaction remains elusive despite significant work on this topic. In particular, past researchers did not validate the accuracy of mapping algorithms. We introduce a new method for computing atom mappings called the minimum weighted edit-distance (MWED) metric. The metric is based on bond propensity to react and computes biochemically valid atom mappings for a large percentage of biochemical reactions. MWED models can be formulated efficiently as Mixed-Integer Linear Programs (MILPs). We have demonstrated this approach on 7501 reactions of the MetaCyc database, for which 87% of the models could be solved in less than 10 s. For 2.1% of the reactions, we found multiple optimal atom mappings. We show that the error rate is 0.9% (22 reactions) by comparing these atom mappings to 2446 atom mappings of the manually curated Kyoto Encyclopedia of Genes and Genomes (KEGG) RPAIR database. To our knowledge, our computational atom-mapping approach is the most accurate and among the fastest published to date. The atom-mapping data will be available in the MetaCyc database later in 2012; the atom-mapping software will be available within the Pathway Tools software later in 2012.
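The following Python sketch is a much-simplified flavor of atom mapping as an optimization problem, not the MWED MILP itself: it finds a minimum-cost bijection between toy reactant and product atoms, penalising only element mismatches. The real method additionally weights bond edits by propensity to react and is solved as a mixed-integer linear program.

```python
# Toy atom mapping as a weighted assignment problem (illustration only).
import numpy as np
from scipy.optimize import linear_sum_assignment

reactant_atoms = ["C", "C", "O", "H", "H"]   # invented example
product_atoms  = ["C", "O", "C", "H", "H"]

# Cost 0 for mapping an atom to the same element, a large penalty otherwise.
cost = np.array([[0 if r == p else 100 for p in product_atoms] for r in reactant_atoms])

rows, cols = linear_sum_assignment(cost)  # minimum-cost bijection
for i, j in zip(rows, cols):
    print(f"reactant atom {i} ({reactant_atoms[i]}) -> product atom {j} ({product_atoms[j]})")
print("total mapping cost:", cost[rows, cols].sum())
```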
Galuppo, Maria; Rossi, Antonietta; Giacoppo, Sabrina; Pace, Simona; Bramanti, Placido; Sautebin, Lidia; Mazzon, Emanuela
2015-09-01
Traumatic spinal cord injury (SCI) represents one of the most disabling injuries of the human body, causing temporary or permanent sensory and/or motor system deficit, particularly hind limb locomotor function impairment. At present, steroidal anti-inflammatory drugs, in particular methylprednisolone sodium succinate (MPSS), are the first-line treatment of acute SCI. Despite progress in pharmacological, surgical and rehabilitative treatment approaches, SCI still remains a very complex medical and psychological challenge, with no curative therapy available. The aim of the present study was to compare the efficacy of MPSS with that of other GCs such as dexamethasone (Dex) and mometasone furoate (MF) in a suitable in vitro model of LPS-induced inflammation in J774 cells as well as in an in vivo experimental mouse SCI (compression model). In both the in vitro and in vivo experiments, MF proved surprisingly more potent than Dex and MPSS. In detail, mice sacrificed seven days after induction of SCI trauma showed not only tissue damage, cellular infiltration, fibrosis, astrocyte activation, iNOS expression, extracellular signal-regulated kinase 1/2 phosphorylation in injured tissue and poly (ADP-ribose) polymerase 1 (PARP-1) activation, but also apoptosis (Bax and Bcl-2 expression). All three GCs demonstrated the ability to modulate inflammatory, oxidative as well as apoptotic pathways, but MF demonstrated the best efficacy, while Dex and MPSS showed different potencies and degrees of protection. Therefore, we can conclude that MF is the best candidate for post-traumatic chronic treatment, since it ameliorates different molecular pathways involved in the propagation of damage to the areas surrounding the injured spinal cord. Copyright © 2015 Elsevier Ltd. All rights reserved.
Dahdul, Wasila M; Balhoff, James P; Engeman, Jeffrey; Grande, Terry; Hilton, Eric J; Kothari, Cartik; Lapp, Hilmar; Lundberg, John G; Midford, Peter E; Vision, Todd J; Westerfield, Monte; Mabee, Paula M
2010-05-20
The wealth of phenotypic descriptions documented in the published articles, monographs, and dissertations of phylogenetic systematics is traditionally reported in a free-text format, and it is therefore largely inaccessible for linkage to biological databases for genetics, development, and phenotypes, and difficult to manage for large-scale integrative work. The Phenoscape project aims to represent these complex and detailed descriptions with rich and formal semantics that are amenable to computation and integration with phenotype data from other fields of biology. This entails reconceptualizing the traditional free-text characters into the computable Entity-Quality (EQ) formalism using ontologies. We used ontologies and the EQ formalism to curate a collection of 47 phylogenetic studies on ostariophysan fishes (including catfishes, characins, minnows, knifefishes) and their relatives with the goal of integrating these complex phenotype descriptions with information from an existing model organism database (zebrafish, http://zfin.org). We developed a curation workflow for the collection of character, taxonomic and specimen data from these publications. A total of 4,617 phenotypic characters (10,512 states) for 3,449 taxa, primarily species, were curated into EQ formalism (for a total of 12,861 EQ statements) using anatomical and taxonomic terms from teleost-specific ontologies (Teleost Anatomy Ontology and Teleost Taxonomy Ontology) in combination with terms from a quality ontology (Phenotype and Trait Ontology). Standards and guidelines for consistently and accurately representing phenotypes were developed in response to the challenges that were evident from two annotation experiments and from feedback from curators. The challenges we encountered and many of the curation standards and methods for improving consistency that we developed are generally applicable to any effort to represent phenotypes using ontologies. This is because an ontological representation of the detailed variations in phenotype, whether between mutant or wildtype, among individual humans, or across the diversity of species, requires a process by which a precise combination of terms from domain ontologies are selected and organized according to logical relations. The efficiencies that we have developed in this process will be useful for any attempt to annotate complex phenotypic descriptions using ontologies. We also discuss some ramifications of EQ representation for the domain of systematics.
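A hypothetical sketch of how an Entity-Quality (EQ) phenotype annotation could be represented as a small data structure follows; the ontology term identifiers and publication reference are invented placeholders, not curated Phenoscape content.

```python
# Minimal data structure for an EQ-style phenotype annotation (illustrative only).
from dataclasses import dataclass

@dataclass
class EQStatement:
    taxon: str          # Teleost Taxonomy Ontology term (placeholder)
    entity: str         # anatomical entity, Teleost Anatomy Ontology term (placeholder)
    quality: str        # quality, Phenotype and Trait Ontology term (placeholder)
    publication: str    # source study, recorded for provenance

annotation = EQStatement(
    taxon="TTO:example_taxon",
    entity="TAO:example_dorsal_fin_spine",
    quality="PATO:example_absent",
    publication="doi:10.xxxx/example",
)
print(annotation)
```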
How should the completeness and quality of curated nanomaterial data be evaluated?†
Marchese Robinson, Richard L.; Lynch, Iseult; Peijnenburg, Willie; Rumble, John; Klaessig, Fred; Marquardt, Clarissa; Rauscher, Hubert; Puzyn, Tomasz; Purian, Ronit; Åberg, Christoffer; Karcher, Sandra; Vriens, Hanne; Hoet, Peter; Hoover, Mark D.; Hendren, Christine Ogilvie; Harper, Stacey L.
2016-01-01
Nanotechnology is of increasing significance. Curation of nanomaterial data into electronic databases offers opportunities to better understand and predict nanomaterials’ behaviour. This supports innovation in, and regulation of, nanotechnology. It is commonly understood that curated data need to be sufficiently complete and of sufficient quality to serve their intended purpose. However, assessing data completeness and quality is non-trivial in general and is arguably especially difficult in the nanoscience area, given its highly multidisciplinary nature. The current article, part of the Nanomaterial Data Curation Initiative series, addresses how to assess the completeness and quality of (curated) nanomaterial data. In order to address this key challenge, a variety of related issues are discussed: the meaning and importance of data completeness and quality, existing approaches to their assessment and the key challenges associated with evaluating the completeness and quality of curated nanomaterial data. Considerations which are specific to the nanoscience area and lessons which can be learned from other relevant scientific disciplines are considered. Hence, the scope of this discussion ranges from physicochemical characterisation requirements for nanomaterials and interference of nanomaterials with nanotoxicology assays to broader issues such as minimum information checklists, toxicology data quality schemes and computational approaches that facilitate evaluation of the completeness and quality of (curated) data. This discussion is informed by a literature review and a survey of key nanomaterial data curation stakeholders. Finally, drawing upon this discussion, recommendations are presented concerning the central question: how should the completeness and quality of curated nanomaterial data be evaluated? PMID:27143028
Human Prostate Cancer Hallmarks Map
Datta, Dipamoy; Aftabuddin, Md.; Gupta, Dinesh Kumar; Raha, Sanghamitra; Sen, Prosenjit
2016-01-01
Human prostate cancer is a complex heterogeneous disease that mainly affects the elderly male population of the western world with a high rate of mortality. Acquisition of diverse sets of hallmark capabilities along with aberrant functioning of androgen receptor signaling are the central driving forces behind prostatic tumorigenesis and its transition into metastatic castration-resistant disease. These hallmark capabilities arise due to an intense orchestration of several crucial factors, including deregulation of vital cell physiological processes, inactivation of tumor suppressive activity and disruption of prostate gland-specific cellular homeostasis. The molecular complexity and redundancy of oncoprotein signaling in prostate cancer demand concurrent inhibition of multiple hallmark-associated pathways. By an extensive manual curation of the published biomedical literature, we have developed the Human Prostate Cancer Hallmarks Map (HPCHM), an onco-functional atlas of human prostate cancer-associated signaling and events. It explores the molecular architecture of prostate cancer signaling at various levels, namely key protein components, molecular connectivity map, oncogenic signaling pathway map, pathway-based functional connectivity map etc. Here, we briefly present the systems-level understanding of the molecular mechanisms associated with prostate tumorigenesis by considering each individual molecular and cell biological event of this disease process. PMID:27476486
Dal Pra, Alan; Locke, Jennifer A.; Borst, Gerben; Supiot, Stephane; Bristow, Robert G.
2016-01-01
Radiation therapy (RT) is one of the mainstay treatments for prostate cancer (PCa). The potentially curative approaches can provide satisfactory results for many patients with non-metastatic PCa; however, a considerable number of individuals may experience disease recurrence and die from the disease. Exploiting the rich molecular biology of PCa will provide insights into how the most resistant tumor cells can be eradicated to improve treatment outcomes. Important for this biology-driven individualized treatment is a robust selection procedure. The development of predictive biomarkers for RT efficacy is therefore of utmost importance for a clinically exploitable strategy to achieve tumor-specific radiosensitization. This review highlights the current status and possible opportunities in the modulation of four key processes to enhance radiation response in PCa by targeting: (1) the androgen signaling pathway; (2) hypoxic tumor cells and regions; (3) the DNA damage response (DDR) pathway; and (4) abnormal extra-/intracellular signaling pathways. In addition, we discuss how and which patients should be selected for biomarker-based clinical trials exploiting and validating these targeted treatment strategies with precision RT to improve cure rates in non-indolent, localized PCa. PMID:26909338
Freytag, Saskia; Burgess, Rosemary; Oliver, Karen L; Bahlo, Melanie
2017-06-08
The pathogenesis of neurological and mental health disorders often involves multiple genes, complex interactions, as well as brain- and development-specific biological mechanisms. These characteristics make identification of disease genes for such disorders challenging, as conventional prioritisation tools are not specifically tailored to deal with the complexity of the human brain. Thus, we developed a novel web application, brain-coX, that offers gene prioritisation with accompanying visualisations based on seven gene expression datasets in the post-mortem human brain, the largest such resource ever assembled. We tested whether our tool can correctly prioritise known genes from 37 brain-specific KEGG pathways and 17 psychiatric conditions. We achieved average sensitivity of nearly 50%, at the same time reaching a specificity of approximately 75%. We also compared brain-coX's performance to that of its main competitors, Endeavour and ToppGene, focusing on the ability to discover novel associations. Using a subset of the curated SFARI autism gene collection we show that brain-coX's prioritisations are most similar to SFARI's own curated gene classifications. brain-coX is the first prioritisation and visualisation web-tool targeted to the human brain and can be freely accessed via http://shiny.bioinf.wehi.edu.au/freytag.s/ .
Integrative Functional Genomics for Systems Genetics in GeneWeaver.org.
Bubier, Jason A; Langston, Michael A; Baker, Erich J; Chesler, Elissa J
2017-01-01
The abundance of existing functional genomics studies permits an integrative approach to interpreting and resolving the results of diverse systems genetics studies. However, a major challenge lies in assembling and harmonizing heterogeneous data sets across species for facile comparison to the positional candidate genes and coexpression networks that come from systems genetic studies. GeneWeaver is an online database and suite of tools at www.geneweaver.org that allows for fast aggregation and analysis of gene set-centric data. GeneWeaver contains curated experimental data together with resource-level data such as GO annotations, MP annotations, and KEGG pathways, along with persistent stores of user-entered data sets. These can be entered directly into GeneWeaver or transferred from widely used resources such as GeneNetwork.org. Data are analyzed using statistical tools and advanced graph algorithms to discover new relations, prioritize candidate genes, and generate function hypotheses. Here we use GeneWeaver to find genes common to multiple gene sets, prioritize candidate genes from a quantitative trait locus, and characterize a set of differentially expressed genes. Coupling a large multispecies repository of curated and empirical functional genomics data to fast computational tools allows for the rapid integrative analysis of heterogeneous data for interpreting and extrapolating systems genetics results.
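A small Python sketch of the gene-set-centric operation mentioned above (finding genes common to multiple sets and ranking genes by how many sets contain them) follows. The gene symbols and set names are made-up examples and do not use the GeneWeaver API.

```python
# Illustrative set-centric aggregation: intersection plus a simple membership count.
from collections import Counter

gene_sets = {
    "QTL_candidates":     {"Gabra2", "Chrna7", "Drd2", "Comt"},
    "differential_expr":  {"Drd2", "Comt", "Bdnf"},
    "literature_curated": {"Comt", "Drd2", "Oprm1"},
}

# Genes present in every set.
common = set.intersection(*gene_sets.values())
print("genes present in every set:", sorted(common))

# Rank genes by how many sets contain them (a crude cross-set prioritisation).
counts = Counter(g for genes in gene_sets.values() for g in genes)
print(counts.most_common(3))
```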
The Astromaterials X-Ray Computed Tomography Laboratory at Johnson Space Center
NASA Astrophysics Data System (ADS)
Zeigler, R. A.; Blumenfeld, E. H.; Srinivasan, P.; McCubbin, F. M.; Evans, C. A.
2018-04-01
The Astromaterials Curation Office has recently begun incorporating X-ray CT data into the curation processes for lunar and meteorite samples, and long-term curation of that data and serving it to the public represent significant technical challenges.
Lewis, Cara C; Klasnja, Predrag; Powell, Byron J; Lyon, Aaron R; Tuzzio, Leah; Jones, Salene; Walsh-Bailey, Callie; Weiner, Bryan
2018-01-01
The science of implementation has offered little toward understanding how different implementation strategies work. To improve outcomes of implementation efforts, the field needs precise, testable theories that describe the causal pathways through which implementation strategies function. In this perspective piece, we describe a four-step approach to developing causal pathway models for implementation strategies. First, it is important to ensure that implementation strategies are appropriately specified. Some strategies in published compilations are well defined but may not be specified in terms of their core components that can have a reliable and measurable impact. Second, linkages between strategies and mechanisms need to be generated. Existing compilations do not offer mechanisms by which strategies act, or the processes or events through which an implementation strategy operates to affect desired implementation outcomes. Third, it is critical to identify the proximal and distal outcomes the strategy is theorized to impact, with the former being direct, measurable products of the strategy and the latter being one of eight implementation outcomes (1). Finally, articulating effect modifiers, like preconditions and moderators, allows for an understanding of where, when, and why strategies have an effect on outcomes of interest. We argue for greater precision in the use of terms for factors implicated in implementation processes; development of guidelines for selecting research designs and study plans that account for practical constructs and allow for the study of mechanisms; psychometrically strong and pragmatic measures of mechanisms; and more robust curation of evidence for knowledge transfer and use.
Modeling tandem AAG8-MEK inhibition in melanoma cells.
Sun, Bing; Kawahara, Masahiro; Nagamune, Teruyuki
2014-06-01
Drug resistance presents a challenge to the treatment of cancer patients, especially for melanomas, most of which are caused by the hyperactivation of MAPK signaling pathway. Innate or acquired drug-resistant relapse calls for the investigation of the resistant mechanisms and new anti-cancer drugs to provide implications for the ultimate goal of curative therapy. Aging-associated gene 8 (AAG8, encoded by the SIGMAR1 gene) is a chaperone protein profoundly elaborated in neurology. However, roles of AAG8 in carcinogenesis remain unclear. Herein, we discover AAG8 antagonists as new MEK inhibitors in melanoma cells and propose a novel drug combination strategy for melanoma therapy by presenting the experimental evidences. We report that specific antagonism of AAG8, efficiently suppresses melanoma cell growth and migration through, at least in part, the inactivation of the RAS-CRAF-MEK signaling pathway. We further demonstrate that melanoma cells that are resistant to AAG8 antagonist harbor refractory CRAF-MEK activity. MEK acts as a central mediator for anti-cancer effects and also for the resistance mechanism, leading to our proposal of tandem AAG8-MEK inhibition in melanoma cells. Combination of AAG8 antagonist and very low concentration of a MEK inhibitor synergistically restricts the growth of drug-resistant cells. These data collectively pinpoint AAG8 as a potential target and delineate a promising drug combination strategy for melanoma therapy. © 2014 The Authors. Cancer Medicine published by John Wiley & Sons Ltd.
NASA Astrophysics Data System (ADS)
Benedict, K. K.; Lenhardt, W. C.; Young, J. W.; Gordon, L. C.; Hughes, S.; Santhana Vannan, S. K.
2017-12-01
The planning for and development of efficient workflows for the creation, reuse, sharing, documentation, publication and preservation of research data is a general challenge that research teams of all sizes face. Alternative strategies to traditional data life cycle approaches must be developed and shared in response to (1) requirements from funding agencies for full-lifecycle data management plans that will result in well documented, preserved, and shared research data products; (2) increasing requirements from publishers for shared data in conjunction with submitted papers; (3) interdisciplinary research teams' needs for efficient data sharing within projects; and (4) increasing reuse of research data for replication and new, unanticipated research, policy development, and public use. These strategies must enable research teams to meet such requirements while meeting the core science objectives of their projects within the available resources. In support of achieving these goals, the concept of Agile Data Curation has been developed, in which there have been parallel activities in support of 1) identifying a set of shared values and principles that underlie the objectives of agile data curation, 2) soliciting case studies from the Earth science and other research communities that illustrate aspects of what the contributors consider agile data curation methods and practices, and 3) identifying or developing design patterns that are high-level abstractions from successful data curation practice that are related to common data curation problems for which common solution strategies may be employed. This paper provides a collection of case studies that have been contributed by the Earth science community, and an initial analysis of those case studies to map them to emerging shared data curation problems and their potential solutions. Following the initial analysis of these problems and potential solutions, existing design patterns from software engineering and related disciplines are identified as a starting point for the development of a catalog of data curation design patterns that may be reused in the design and execution of new data curation processes.
CARFMAP: A Curated Pathway Map of Cardiac Fibroblasts.
Nim, Hieu T; Furtado, Milena B; Costa, Mauro W; Kitano, Hiroaki; Rosenthal, Nadia A; Boyd, Sarah E
2015-01-01
The adult mammalian heart contains multiple cell types that work in unison under tightly regulated conditions to maintain homeostasis. Cardiac fibroblasts are a significant and unique population of non-muscle cells in the heart that have recently gained substantial interest in the cardiac biology community. To better understand this renaissance cell, it is essential to systematically survey what has been known in the literature about the cellular and molecular processes involved. We have built CARFMAP (http://visionet.erc.monash.edu.au/CARFMAP), an interactive cardiac fibroblast pathway map derived from the biomedical literature using a software-assisted manual data collection approach. CARFMAP is an information-rich interactive tool that enables cardiac biologists to explore the large body of literature in various creative ways. There is surprisingly little overlap between the cardiac fibroblast pathway map, a foreskin fibroblast pathway map, and a whole mouse organism signalling pathway map from the REACTOME database. Among the use cases of CARFMAP is a common task in our cardiac biology laboratory of identifying new genes that are (1) relevant to cardiac literature, and (2) differentially regulated in high-throughput assays. From the expression profiles of mouse cardiac and tail fibroblasts, we employed CARFMAP to characterise cardiac fibroblast pathways. Using CARFMAP in conjunction with transcriptomic data, we generated a stringent list of six genes that would not have been singled out using bioinformatics analyses alone. Experimental validation showed that five genes (Mmp3, Il6, Edn1, Pdgfc and Fgf10) are differentially regulated in the cardiac fibroblast. CARFMAP is a powerful tool for systems analyses of cardiac fibroblasts, facilitating systems-level cardiovascular research.
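The overlap comparison mentioned in this abstract (between the cardiac fibroblast map, a foreskin fibroblast map and a REACTOME-derived map) can be illustrated with a simple Jaccard similarity over gene sets. The gene lists below are placeholders, not CARFMAP or REACTOME content.

```python
# Jaccard similarity between two pathway-map gene sets (illustrative data only).
def jaccard(a: set, b: set) -> float:
    return len(a & b) / len(a | b) if (a or b) else 0.0

cardiac_fibroblast_map = {"Mmp3", "Il6", "Edn1", "Pdgfc", "Fgf10", "Tgfb1"}
foreskin_fibroblast_map = {"Tgfb1", "Col1a1", "Fn1", "Il6"}
print(f"map overlap (Jaccard): {jaccard(cardiac_fibroblast_map, foreskin_fibroblast_map):.2f}")
```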
Paull, Evan O; Carlin, Daniel E; Niepel, Mario; Sorger, Peter K; Haussler, David; Stuart, Joshua M
2013-11-01
Identifying the cellular wiring that connects genomic perturbations to transcriptional changes in cancer is essential to gain a mechanistic understanding of disease initiation, progression and ultimately to predict drug response. We have developed a method called Tied Diffusion Through Interacting Events (TieDIE) that uses a network diffusion approach to connect genomic perturbations to gene expression changes characteristic of cancer subtypes. The method computes a subnetwork of protein-protein interactions, predicted transcription factor-to-target connections and curated interactions from literature that connects genomic and transcriptomic perturbations. Application of TieDIE to The Cancer Genome Atlas and a breast cancer cell line dataset identified key signaling pathways, with examples impinging on MYC activity. Interlinking genes are predicted to correspond to essential components of cancer signaling and may provide a mechanistic explanation of tumor character and suggest subtype-specific drug targets. Software is available from the Stuart lab's wiki: https://sysbiowiki.soe.ucsc.edu/tiedie. jstuart@ucsc.edu. Supplementary data are available at Bioinformatics online.
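The sketch below illustrates network diffusion in the spirit of TieDIE, not the published implementation: scores are propagated from a perturbed gene over a toy interaction network using a random-walk-with-restart style iteration. The genes and edges are arbitrary examples.

```python
# Toy network diffusion from a "perturbed" seed gene (illustration only).
import numpy as np

genes = ["TP53", "MYC", "EGFR", "KRAS", "CDKN2A"]
edges = [("TP53", "MYC"), ("MYC", "EGFR"), ("EGFR", "KRAS"), ("TP53", "CDKN2A")]

idx = {g: i for i, g in enumerate(genes)}
A = np.zeros((len(genes), len(genes)))
for a, b in edges:
    A[idx[a], idx[b]] = A[idx[b], idx[a]] = 1.0
W = A / A.sum(axis=0, keepdims=True)   # column-normalised transition matrix

seed = np.zeros(len(genes))
seed[idx["TP53"]] = 1.0                # genomic perturbation used as the heat source

restart, scores = 0.5, seed.copy()
for _ in range(100):                   # iterate toward the stationary diffusion scores
    scores = restart * seed + (1 - restart) * W @ scores

for g, s in sorted(zip(genes, scores), key=lambda x: -x[1]):
    print(f"{g}: {s:.3f}")
```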
The effect of economic development on population health: a review of the empirical evidence.
Lange, Simon; Vollmer, Sebastian
2017-01-01
Economic growth is considered an important determinant of population health. Relevant studies investigating the effect of economic growth on health outcomes were identified from Google Scholar and PubMed searches in economics and medical journals. Additional resources generated through economic growth are potentially useful for improving population health. The empirical evidence on the aggregate effect of economic growth on population health is rather mixed and inconclusive. The causal pathways from economic growth to population health are crucial, and failure or success in completing these pathways explains differences in empirical findings. Future research should investigate how additional resources can more effectively reach those in need and how additional resources can be used more efficiently. It is particularly relevant to understand why preventive health care in developing countries is very price elastic whereas curative health care is very price inelastic, and how this understanding can inform public health policy. © The Author 2017. Published by Oxford University Press. All rights reserved. For permissions, please e-mail: journals.permissions@oup.com
McDonald, Paige Green; O’Connell, Mary; Lutgendorf, Susan K.
2013-01-01
This article introduces the supplemental issue of “Cancer, Brain, Behavior, and Immunity” and outlines important discoveries, paradigm shifts, and methodological innovations that have emerged in the past decade to advance mechanistic and translational understanding of biobehavioral influences on tumor biology, cancer treatment-related sequelae, and cancer outcomes. We offer a heuristic framework for research on biobehavioral pathways in cancer. The shifting survivorship landscape is highlighted and we propose that the changing demographics suggest prudent adoption of a life course perspective of cancer and cancer survivorship. We note opportunities for psychoneuroimmunology (PNI) research to ameliorate the long-term, unintended consequences of aggressive curative intent and call attention to the critical role of reciprocal translational pathways between animal and human studies. Lastly, we briefly summarize the articles included in this compilation and offer our perspectives on future research directions. Highlights: This article introduces the National Cancer Institute sponsored special issue Cancer, Brain, Behavior, and Immunity and highlights the last decade of PNI-cancer research. PMID:23333846
Functional wiring of the yeast kinome revealed by global analysis of genetic network motifs
Sharifpoor, Sara; van Dyk, Dewald; Costanzo, Michael; Baryshnikova, Anastasia; Friesen, Helena; Douglas, Alison C.; Youn, Ji-Young; VanderSluis, Benjamin; Myers, Chad L.; Papp, Balázs; Boone, Charles; Andrews, Brenda J.
2012-01-01
A combinatorial genetic perturbation strategy was applied to interrogate the yeast kinome on a genome-wide scale. We assessed the global effects of gene overexpression or gene deletion to map an integrated genetic interaction network of synthetic dosage lethal (SDL) and loss-of-function genetic interactions (GIs) for 92 kinases, producing a meta-network of 8700 GIs enriched for pathways known to be regulated by cognate kinases. Kinases most sensitive to dosage perturbations had constitutive cell cycle or cell polarity functions under standard growth conditions. Condition-specific screens confirmed that the spectrum of kinase dosage interactions can be expanded substantially in activating conditions. An integrated network composed of systematic SDL, negative and positive loss-of-function GIs, and literature-curated kinase–substrate interactions revealed kinase-dependent regulatory motifs predictive of novel gene-specific phenotypes. Our study provides a valuable resource to unravel novel functional relationships and pathways regulated by kinases and outlines a general strategy for deciphering mutant phenotypes from large-scale GI networks. PMID:22282571
Unresolved questions associated with the management of ventricular pre-excitation syndrome.
Brembilla-Perrot, Béatrice; Girerd, Nicolas; Sellal, Jean-Marc
2018-05-13
Many recent recommendations concern the management of pre-excitation syndrome. In clinical practice, they are sometimes difficult to use. The purpose of the authors was to discuss the main problems associated with this management. Three problems are encountered: 1) the reality of the absence of symptoms or the interpretation of atypical symptoms, 2) the electrocardiographic diagnosis of pre-excitation syndrome, which can be missed, and 3) the exact electrophysiological protocol and its interpretation used for the evaluation of the prognosis. Because of significant progress largely related to the development of curative treatment, it seems easy to propose ablation in many patients despite the related risks of invasive studies and to minimize the invasive risk by only performing ablation for patients with at-risk pathways. However, there is a low risk of spontaneous events in truly asymptomatic patients and the indication of accessory pathway ablation should be discussed case by case. This article is protected by copyright. All rights reserved.
Murugesan, G S; Sathishkumar, M; Jayabalan, R; Binupriya, A R; Swaminathan, K; Yun, S E
2009-04-01
Kombucha tea (KT) is sugared black tea fermented with a symbiotic culture of acetic acid bacteria and yeasts, known as tea fungus. KT is claimed to have various beneficial effects on human health, but there is very little scientific evidence available in the literature. In the present study, KT along with black tea (BT) and black tea manufactured with tea fungus enzymes (enzyme-processed tea, ET) was evaluated for hepatoprotective and curative properties against CCl4-induced toxicity, using male albino rats as an experimental model, by analyzing aspartate transaminase, alanine transaminase, and alkaline phosphatase in plasma and malondialdehyde content in plasma and liver tissues. Histopathological analysis of liver tissue was also included. Results showed that BT, ET, and KT have the potential to revert CCl4-induced hepatotoxicity. Among the three types of teas tried, KT was found to be more efficient than BT and ET. Antioxidant molecules produced during the fermentation period could be the reason for the efficient hepatoprotective and curative properties of KT against CCl4-induced hepatotoxicity.
Site-based data curation based on hot spring geobiology
Palmer, Carole L.; Thomer, Andrea K.; Baker, Karen S.; Wickett, Karen M.; Hendrix, Christie L.; Rodman, Ann; Sigler, Stacey; Fouke, Bruce W.
2017-01-01
Site-Based Data Curation (SBDC) is an approach to managing research data that prioritizes sharing and reuse of data collected at scientifically significant sites. The SBDC framework is based on geobiology research at natural hot spring sites in Yellowstone National Park as an exemplar case of high value field data in contemporary, cross-disciplinary earth systems science. Through stakeholder analysis and investigation of data artifacts, we determined that meaningful and valid reuse of digital hot spring data requires systematic documentation of sampling processes and particular contextual information about the site of data collection. We propose a Minimum Information Framework for recording the necessary metadata on sampling locations, with anchor measurements and description of the hot spring vent distinct from the outflow system, and multi-scale field photography to capture vital information about hot spring structures. The SBDC framework can serve as a global model for the collection and description of hot spring systems field data that can be readily adapted for application to the curation of data from other kinds of scientifically significant sites. PMID:28253269
Natural Language Processing in aid of FlyBase curators
Karamanis, Nikiforos; Seal, Ruth; Lewin, Ian; McQuilton, Peter; Vlachos, Andreas; Gasperin, Caroline; Drysdale, Rachel; Briscoe, Ted
2008-01-01
Background Despite increasing interest in applying Natural Language Processing (NLP) to biomedical text, whether this technology can facilitate tasks such as database curation remains unclear. Results PaperBrowser is the first NLP-powered interface that was developed under a user-centered approach to improve the way in which FlyBase curators navigate an article. In this paper, we first discuss how observing curators at work informed the design and evaluation of PaperBrowser. Then, we present how we appraise PaperBrowser's navigational functionalities in a user-based study using a text highlighting task and evaluation criteria of Human-Computer Interaction. Our results show that PaperBrowser reduces the amount of interactions between two highlighting events and therefore improves navigational efficiency by about 58% compared to the navigational mechanism that was previously available to the curators. Moreover, PaperBrowser is shown to provide curators with enhanced navigational utility by over 74% irrespective of the different ways in which they highlight text in the article. Conclusion We show that state-of-the-art performance in certain NLP tasks such as Named Entity Recognition and Anaphora Resolution can be combined with the navigational functionalities of PaperBrowser to support curation quite successfully. PMID:18410678
Text Mining to Support Gene Ontology Curation and Vice Versa.
Ruch, Patrick
2017-01-01
In this chapter, we explain how text mining can support the curation of molecular biology databases dealing with protein functions. We also show how curated data can play a disruptive role in the developments of text mining methods. We review a decade of efforts to improve the automatic assignment of Gene Ontology (GO) descriptors, the reference ontology for the characterization of genes and gene products. To illustrate the high potential of this approach, we compare the performances of an automatic text categorizer and show a large improvement of +225 % in both precision and recall on benchmarked data. We argue that automatic text categorization functions can ultimately be embedded into a Question-Answering (QA) system to answer questions related to protein functions. Because GO descriptors can be relatively long and specific, traditional QA systems cannot answer such questions. A new type of QA system, so-called Deep QA which uses machine learning methods trained with curated contents, is thus emerging. Finally, future advances of text mining instruments are directly dependent on the availability of high-quality annotated contents at every curation step. Databases workflows must start recording explicitly all the data they curate and ideally also some of the data they do not curate.
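As a toy illustration of automatic text categorization in support of curation, the sketch below trains a tiny classifier that assigns a functional descriptor to a sentence. The training snippets and labels are invented, and this is far simpler than the benchmarked categorizer discussed above (no curated training corpus, no Deep QA component).

```python
# Toy text categorizer assigning GO-like functional descriptors to sentences.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

texts = [
    "the kinase phosphorylates serine residues of the substrate",
    "the protein is transported into the mitochondrial matrix",
    "the enzyme catalyses hydrolysis of ATP to ADP",
    "the transporter moves ions across the plasma membrane",
]
labels = ["kinase activity", "protein transport", "ATPase activity", "transmembrane transport"]

model = make_pipeline(TfidfVectorizer(), LogisticRegression(max_iter=1000))
model.fit(texts, labels)
print(model.predict(["this protein hydrolyses ATP in the cytosol"]))
```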
Stvilia, Besiki
2017-01-01
The importance of managing research data has been emphasized by the government, funding agencies, and scholarly communities. Increased access to research data increases the impact and efficiency of scientific activities and funding. Thus, many research institutions have established or plan to establish research data curation services as part of their Institutional Repositories (IRs). However, in order to design effective research data curation services in IRs, and to build active research data providers and user communities around those IRs, it is essential to study current data curation practices and provide rich descriptions of the sociotechnical factors and relationships shaping those practices. Based on 13 interviews with 15 IR staff members from 13 large research universities in the United States, this paper provides a rich, qualitative description of research data curation and use practices in IRs. In particular, the paper identifies data curation and use activities in IRs, as well as their structures, roles played, skills needed, contradictions and problems present, solutions sought, and workarounds applied. The paper can inform the development of best practice guides, infrastructure and service templates, as well as education in research data curation in Library and Information Science (LIS) schools. PMID:28301533
Lee, Dong Joon; Stvilia, Besiki
2017-01-01
The importance of managing research data has been emphasized by the government, funding agencies, and scholarly communities. Increased access to research data increases the impact and efficiency of scientific activities and funding. Thus, many research institutions have established or plan to establish research data curation services as part of their Institutional Repositories (IRs). However, in order to design effective research data curation services in IRs, and to build active research data providers and user communities around those IRs, it is essential to study current data curation practices and provide rich descriptions of the sociotechnical factors and relationships shaping those practices. Based on 13 interviews with 15 IR staff members from 13 large research universities in the United States, this paper provides a rich, qualitative description of research data curation and use practices in IRs. In particular, the paper identifies data curation and use activities in IRs, as well as their structures, roles played, skills needed, contradictions and problems present, solutions sought, and workarounds applied. The paper can inform the development of best practice guides, infrastructure and service templates, as well as education in research data curation in Library and Information Science (LIS) schools.
Boué, Stéphanie; Talikka, Marja; Westra, Jurjen Willem; Hayes, William; Di Fabio, Anselmo; Park, Jennifer; Schlage, Walter K; Sewer, Alain; Fields, Brett; Ansari, Sam; Martin, Florian; Veljkovic, Emilija; Kenney, Renee; Peitsch, Manuel C; Hoeng, Julia
2015-01-01
With the wealth of publications and data available, powerful and transparent computational approaches are required to represent measured data and scientific knowledge in a computable and searchable format. We developed a set of biological network models, scripted in the Biological Expression Language, that reflect causal signaling pathways across a wide range of biological processes, including cell fate, cell stress, cell proliferation, inflammation, tissue repair and angiogenesis in the pulmonary and cardiovascular context. This comprehensive collection of networks is now freely available to the scientific community in a centralized web-based repository, the Causal Biological Network database, which is composed of over 120 manually curated and well annotated biological network models and can be accessed at http://causalbionet.com. The website accesses a MongoDB, which stores all versions of the networks as JSON objects and allows users to search for genes, proteins, biological processes, small molecules and keywords in the network descriptions to retrieve biological networks of interest. The content of the networks can be visualized and browsed. Nodes and edges can be filtered and all supporting evidence for the edges can be browsed and is linked to the original articles in PubMed. Moreover, networks may be downloaded for further visualization and evaluation. Database URL: http://causalbionet.com © The Author(s) 2015. Published by Oxford University Press.
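A hypothetical sketch of querying a MongoDB of network documents like the one described above follows; the connection string, database, collection and field names ("description", "nodes.label") are assumptions for illustration and do not reflect the actual Causal Biological Network schema.

```python
# Keyword/node search over JSON network documents stored in MongoDB (assumed schema).
from pymongo import MongoClient

client = MongoClient("mongodb://localhost:27017")
networks = client["causalbionet_demo"]["networks"]

# Networks whose description mentions a keyword, or that contain a node of interest.
query = {"$or": [
    {"description": {"$regex": "angiogenesis", "$options": "i"}},
    {"nodes.label": "VEGFA"},
]}
for doc in networks.find(query, {"name": 1, "description": 1}):
    print(doc.get("name"), "-", doc.get("description", "")[:80])
```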
Sharing behavioral data through a grid infrastructure using data standards.
Min, Hua; Ohira, Riki; Collins, Michael A; Bondy, Jessica; Avis, Nancy E; Tchuvatkina, Olga; Courtney, Paul K; Moser, Richard P; Shaikh, Abdul R; Hesse, Bradford W; Cooper, Mary; Reeves, Dianne; Lanese, Bob; Helba, Cindy; Miller, Suzanne M; Ross, Eric A
2014-01-01
In an effort to standardize behavioral measures and their data representation, the present study develops a methodology for incorporating measures found in the National Cancer Institute's (NCI) grid-enabled measures (GEM) portal, a repository for behavioral and social measures, into the cancer data standards registry and repository (caDSR). The methodology consists of four parts for curating GEM measures into the caDSR: (1) develop unified modeling language (UML) models for behavioral measures; (2) create common data elements (CDE) for UML components; (3) bind CDE with concepts from the NCI thesaurus; and (4) register CDE in the caDSR. UML models have been developed for four GEM measures, which have been registered in the caDSR as CDE. New behavioral concepts related to these measures have been created and incorporated into the NCI thesaurus. Best practices for representing measures using UML models have been utilized in the practice (eg, caDSR). One dataset based on a GEM-curated measure is available for use by other systems and users connected to the grid. Behavioral and population science data can be standardized by using and extending current standards. A new branch of CDE for behavioral science was developed for the caDSR. It expands the caDSR domain coverage beyond the clinical and biological areas. In addition, missing terms and concepts specific to the behavioral measures addressed in this paper were added to the NCI thesaurus. A methodology was developed and refined for curation of behavioral and population science data. Published by the BMJ Publishing Group Limited. For permission to use (where not already granted under a licence) please go to http://group.bmj.com/group/rights-licensing/permissions.
Law, MeiYee; Shaw, David R
2018-01-01
Mouse Genome Informatics (MGI, http://www.informatics.jax.org/ ) web resources provide free access to meticulously curated information about the laboratory mouse. MGI's primary goal is to help researchers investigate the genetic foundations of human diseases by translating information from mouse phenotypes and disease models studies to human systems. MGI provides comprehensive phenotypes for over 50,000 mutant alleles in mice and provides experimental model descriptions for over 1500 human diseases. Curated data from scientific publications are integrated with those from high-throughput phenotyping and gene expression centers. Data are standardized using defined, hierarchical vocabularies such as the Mammalian Phenotype (MP) Ontology, Mouse Developmental Anatomy and the Gene Ontologies (GO). This chapter introduces you to Gene and Allele Detail pages and provides step-by-step instructions for simple searches and those that take advantage of the breadth of MGI data integration.
Curating Blood: How Students' and Researchers' Drawings Bring Potential Phenomena to Light
ERIC Educational Resources Information Center
Hay, D. B.; Pitchford, S.
2016-01-01
This paper explores students' and researchers' drawings of white blood cell recruitment. The data combine interviews with an exhibit of review-type academic images and analyses of student model-drawings. The analysis focuses on the material aspects of bioscientific data-making and we use the literature of concrete bioscience modelling to differentiate…
Ghosh, Sujoy; Vivar, Juan; Nelson, Christopher P; Willenborg, Christina; Segrè, Ayellet V; Mäkinen, Ville-Petteri; Nikpay, Majid; Erdmann, Jeannette; Blankenberg, Stefan; O'Donnell, Christopher; März, Winfried; Laaksonen, Reijo; Stewart, Alexandre FR; Epstein, Stephen E; Shah, Svati H; Granger, Christopher B; Hazen, Stanley L; Kathiresan, Sekar; Reilly, Muredach P; Yang, Xia; Quertermous, Thomas; Samani, Nilesh J; Schunkert, Heribert; Assimes, Themistocles L; McPherson, Ruth
2016-01-01
Objective Genome-wide association (GWA) studies have identified multiple genetic variants affecting the risk of coronary artery disease (CAD). However, individually these explain only a small fraction of the heritability of CAD and for most, the causal biological mechanisms remain unclear. We sought to obtain further insights into potential causal processes of CAD by integrating large-scale GWA data with expertly curated databases of core human pathways and functional networks. Approaches and Results Employing pathways (gene sets) from Reactome, we carried out a two-stage gene set enrichment analysis strategy. From a meta-analyzed discovery cohort of 7 CAD GWAS data sets (9,889 cases/11,089 controls), nominally significant gene sets were tested for replication in a meta-analysis of 9 additional studies (15,502 cases/55,730 controls) from the CARDIoGRAM Consortium. A total of 32 of 639 Reactome pathways tested showed convincing association with CAD (replication p<0.05). These pathways resided in 9 of 21 core biological processes represented in Reactome, and included pathways relevant to extracellular matrix integrity, innate immunity, axon guidance, and signaling by PDGF, NOTCH, and the TGF-β/SMAD receptor complex. Many of these pathways had strengths of association comparable to those observed in lipid transport pathways. Network analysis of unique genes within the replicated pathways further revealed several interconnected functional and topologically interacting modules representing novel associations (e.g. the semaphorin-regulated axonal guidance pathway) besides confirming known processes (lipid metabolism). The connectivity in the observed networks was statistically significant compared to random networks (p<0.001). Network centrality analysis (‘degree’ and ‘betweenness’) further identified genes (e.g. NCAM1, FYN, FURIN etc.) likely to play critical roles in the maintenance and functioning of several of the replicated pathways. Conclusions These findings provide novel insights into how genetic variation, interpreted in the context of biological processes and functional interactions among genes, may help define the genetic architecture of CAD. PMID:25977570
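A minimal sketch of a single gene-set enrichment test using the hypergeometric distribution follows; the counts are arbitrary illustrations, and the published analysis used a more elaborate two-stage gene set enrichment strategy with replication, not this simple overlap test.

```python
# Hypergeometric test for over-representation of a pathway among associated genes.
from scipy.stats import hypergeom

M = 20000   # genes in the background (genome)
K = 150     # genes in the pathway (e.g. one Reactome gene set)
n = 500     # genes associated with the trait in the GWAS analysis
k = 12      # overlap between the pathway and the associated genes

# P(X >= k): probability of seeing at least k pathway genes among the n hits by chance.
p_value = hypergeom.sf(k - 1, M, K, n)
print(f"enrichment p-value: {p_value:.3g}")
```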
A Genome-Scale Metabolic Reconstruction of Mycoplasma genitalium, iPS189
Suthers, Patrick F.; Dasika, Madhukar S.; Kumar, Vinay Satish; Denisov, Gennady; Glass, John I.; Maranas, Costas D.
2009-01-01
With a genome size of ∼580 kb and approximately 480 protein coding regions, Mycoplasma genitalium is one of the smallest known self-replicating organisms and, additionally, has extremely fastidious nutrient requirements. The reduced genomic content of M. genitalium has led researchers to suggest that the molecular assembly contained in this organism may be a close approximation to the minimal set of genes required for bacterial growth. Here, we introduce a systematic approach for the construction and curation of a genome-scale in silico metabolic model for M. genitalium. Key challenges included estimation of biomass composition, handling of enzymes with broad specificities, and the lack of a defined medium. Computational tools were subsequently employed to identify and resolve connectivity gaps in the model as well as growth prediction inconsistencies with gene essentiality experimental data. The curated model, M. genitalium iPS189 (262 reactions, 274 metabolites), is 87% accurate in recapitulating in vivo gene essentiality results for M. genitalium. Approaches and tools described herein provide a roadmap for the automated construction of in silico metabolic models of other organisms. PMID:19214212
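The reported 87% accuracy is a straightforward comparison of in silico and in vivo essentiality calls; the sketch below shows one way such a score could be computed, with entirely hypothetical gene identifiers and predictions.

```python
# Minimal sketch (assumed workflow, not the authors' pipeline): scoring how well
# in silico gene-knockout growth predictions recapitulate in vivo essentiality.
# Both dictionaries below are hypothetical placeholders.
in_vivo_essential = {"MG_001": True, "MG_023": False, "MG_111": True, "MG_204": False}
in_silico_no_growth = {"MG_001": True, "MG_023": False, "MG_111": False, "MG_204": False}

# A prediction agrees when a knockout with no predicted growth matches an
# experimentally essential gene (and vice versa for non-essential genes).
agree = sum(in_vivo_essential[g] == in_silico_no_growth[g] for g in in_vivo_essential)
accuracy = agree / len(in_vivo_essential)
print(f"essentiality prediction accuracy: {accuracy:.0%}")  # the paper reports 87% on real data
```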
Improving the Acquisition and Management of Sample Curation Data
NASA Technical Reports Server (NTRS)
Todd, Nancy S.; Evans, Cindy A.; Labasse, Dan
2011-01-01
This paper discusses the current sample documentation processes used during and after a mission, examines the challenges and special considerations needed for designing effective sample curation data systems, and looks at the results of a simulated sample return mission and the lessons learned from this simulation. In addition, it introduces a new data architecture for an integrated sample curation data system being implemented at the NASA Astromaterials Acquisition and Curation department and discusses how it improves on existing data management systems.
Effects of lactulose and silymarin on liver enzymes in cirrhotic rats.
Ghobadi Pour, Mozhgan; Mirazi, Naser; Alaei, Hojjatollah; Moradkhani, Shirin; Rajaei, Ziba; Monsef Esfahani, Alireza
2017-05-01
Silymarin, a mixture of antihepatotoxic flavonolignans used in the treatment of liver diseases, and lactulose, a nonabsorbable synthetic disaccharide, were investigated to analyze their probable synergic and healing effects in a hepatic cirrhotic rat model. Liver damage was induced by the administration and subsequent withdrawal of thioacetamide. The significant decrease in liver enzymes and malondialdehyde levels confirmed the curative effects of silymarin and lactulose. In the silymarin + lactulose group, liver enzyme and malondialdehyde levels were significantly reduced compared with those in the thioacetamide group. All treatments led to liver regeneration and triggered enhanced regeneration. Silymarin and lactulose alone or in combination have potent curative effects and reduce thioacetamide-induced liver damage.
Chatterjee, Ankita; Kundu, Sudip
2015-01-01
Chlorophyll is one of the most important pigments present in green plants, and rice is one of the major food crops consumed worldwide. We curated the existing genome-scale metabolic model (GSM) of the rice leaf by incorporating a new compartment, reactions and transporters. We used this modified GSM to elucidate how chlorophyll is synthesized in a leaf through a series of biochemical reactions spanning different organelles, using inorganic macronutrients and light energy. We predicted the essential reactions and associated genes of chlorophyll synthesis and validated them against existing experimental evidence. Further, ammonia is known to be the preferred source of nitrogen in rice paddy fields. The ammonia entering the plant is assimilated in the root and leaf. The present work is centered on rice leaf metabolism. We studied the relative importance of ammonia transporters in the chloroplast and the cytosol and their interplay with other intracellular transporters. Ammonia assimilation in the leaves is carried out by glutamine synthetase (GS), which is present in the cytosol (GS1) and chloroplast (GS2). Our results provide a possible explanation for why GS2 mutants show normal growth under minimal photorespiration but appear chlorotic when exposed to air. PMID:26443104
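A minimal sketch of the in silico essentiality analysis described above, assuming a toy three-reaction network rather than the curated rice GSM: a chlorophyll-like demand flux is maximized by linear programming, and each reaction is knocked out in turn to see whether the demand flux collapses.

```python
# Toy sketch (assumptions: a 3-reaction network, not the curated rice GSM) of how
# essential reactions for a demand flux can be predicted: maximize the demand flux
# by flux balance analysis, then re-solve with each reaction constrained to zero.
import numpy as np
from scipy.optimize import linprog

# Stoichiometric matrix S (rows: metabolites A, B; columns: reactions R1..R3)
# R1: -> A, R2: A -> B, R3 (demand): B ->
S = np.array([[1, -1, 0],
              [0, 1, -1]], dtype=float)
n_rxn = S.shape[1]
bounds = [(0, 10)] * n_rxn           # irreversible reactions with an upper bound
c = np.zeros(n_rxn); c[2] = -1.0     # maximize the demand reaction R3

def max_demand(knockout=None):
    b = list(bounds)
    if knockout is not None:
        b[knockout] = (0, 0)         # knock out a reaction by forcing zero flux
    res = linprog(c, A_eq=S, b_eq=np.zeros(S.shape[0]), bounds=b, method="highs")
    return -res.fun if res.success else 0.0

wild_type = max_demand()
for r in range(n_rxn):
    flux = max_demand(knockout=r)
    essential = flux < 1e-6
    print(f"R{r+1}: demand flux {flux:.2f} -> {'essential' if essential else 'dispensable'}")
```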
Linking microarray reporters with protein functions
Gaj, Stan; van Erk, Arie; van Haaften, Rachel IM; Evelo, Chris TA
2007-01-01
Background The analysis of microarray experiments requires accurate and up-to-date functional annotation of the microarray reporters to optimize the interpretation of the biological processes involved. Pathway visualization tools are used to connect gene expression data with existing biological pathways by using specific database identifiers that link reporters with elements in the pathways. Results This paper proposes a novel method that aims to improve microarray reporter annotation by BLASTing the original reporter sequences against a species-specific EMBL subset that was derived from and crosslinked back to the highly curated UniProt database. The resulting alignments were filtered using high quality alignment criteria and further compared with the outcome of a more traditional approach, where reporter sequences were BLASTed against EnsEMBL followed by locating the corresponding protein (UniProt) entry for the high quality hits. Combining the results of both methods resulted in successful annotation of > 58% of all reporter sequences with UniProt IDs on two commercial array platforms, increasing the proportion of Incyte reporters that could be coupled to Gene Ontology terms from 32.7% to 58.3% and to a local GenMAPP pathway from 9.6% to 16.7%. For Agilent, 35.3% of the total reporters are now linked to GO nodes and 7.1% to local pathways. Conclusion Our methods increased the annotation quality of microarray reporter sequences and allowed us to visualize more reporters using pathway visualization tools. Even in cases where the original reporter annotation showed the correct description, the new identifiers often allowed improved pathway and Gene Ontology linking. These methods are freely available at http://www.bigcat.unimaas.nl/public/publications/Gaj_Annotation/. PMID:17897448
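A hedged sketch of the annotation step described above: filter tabular BLAST hits by assumed high-quality alignment criteria and keep the best surviving UniProt cross-reference per reporter. The file name, column layout and thresholds are illustrative assumptions, not the paper's exact settings.

```python
# Hedged sketch of the reporter re-annotation step: filter tabular BLAST output
# (reporter vs. species-specific EMBL subset) by alignment quality and keep the
# UniProt cross-reference of the best surviving hit per reporter.
import pandas as pd

cols = ["reporter", "uniprot_id", "pct_identity", "align_len", "evalue", "bitscore"]
hits = pd.read_csv("reporter_vs_embl_blast.tsv", sep="\t", names=cols)  # hypothetical file

# "High quality alignment criteria" (assumed here: >=98% identity, >=50 bp, e-value <= 1e-10)
good = hits[(hits.pct_identity >= 98) & (hits.align_len >= 50) & (hits.evalue <= 1e-10)]

# Best hit per reporter by bit score
best = good.sort_values("bitscore", ascending=False).drop_duplicates("reporter")

total_reporters = hits.reporter.nunique()
annotated = best.reporter.nunique()
print(f"reporters with a UniProt ID: {annotated}/{total_reporters} "
      f"({100 * annotated / total_reporters:.1f}%)")
```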
Johnson, Robin J.; Lay, Jean M.; Lennon-Hopkins, Kelley; Saraceni-Richards, Cynthia; Sciaky, Daniela; Murphy, Cynthia Grondin; Mattingly, Carolyn J.
2013-01-01
The Comparative Toxicogenomics Database (CTD; http://ctdbase.org/) is a public resource that curates interactions between environmental chemicals and gene products, and their relationships to diseases, as a means of understanding the effects of environmental chemicals on human health. CTD provides a triad of core information in the form of chemical-gene, chemical-disease, and gene-disease interactions that are manually curated from scientific articles. To increase the efficiency, productivity, and data coverage of manual curation, we have leveraged text mining to help rank and prioritize the triaged literature. Here, we describe our text-mining process that computes and assigns each article a document relevancy score (DRS), wherein a high DRS suggests that an article is more likely to be relevant for curation at CTD. We evaluated our process by first text mining a corpus of 14,904 articles triaged for seven heavy metals (cadmium, cobalt, copper, lead, manganese, mercury, and nickel). Based upon initial analysis, a representative subset corpus of 3,583 articles was then selected from the 14,904 articles and sent to five CTD biocurators for review. The resulting curation of these 3,583 articles was analyzed for a variety of parameters, including article relevancy, novel data content, interaction yield rate, mean average precision, and biological and toxicological interpretability. We show that for all measured parameters, the DRS is an effective indicator for scoring and improving the ranking of literature for the curation of chemical-gene-disease information at CTD. Here, we demonstrate how fully incorporating text mining-based DRS scoring into our curation pipeline enhances manual curation by prioritizing more relevant articles, thereby increasing data content, productivity, and efficiency. PMID:23613709
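The sketch below illustrates the general idea of a document relevancy score, ranking hypothetical triaged abstracts by weighted keyword hits for a chemical of interest; CTD's actual DRS computation is more sophisticated than this toy scoring.

```python
# Illustrative sketch only: CTD's actual DRS is more elaborate. This shows the
# general idea of scoring and ranking triaged abstracts by weighted keyword hits
# for a chemical of interest (here cadmium), then prioritizing high-scoring articles.
import re

term_weights = {          # hypothetical weights
    "cadmium": 3.0, "cdcl2": 2.0, "gene expression": 1.5,
    "toxicity": 1.0, "exposure": 0.5,
}

def relevancy_score(abstract: str) -> float:
    text = abstract.lower()
    return sum(w * len(re.findall(re.escape(term), text)) for term, w in term_weights.items())

articles = {                                    # hypothetical triaged corpus
    "PMID_A": "Cadmium exposure alters gene expression in renal cells...",
    "PMID_B": "A review of dietary zinc supplementation...",
    "PMID_C": "CdCl2 toxicity and cadmium-responsive genes in liver...",
}

ranked = sorted(articles, key=lambda pmid: relevancy_score(articles[pmid]), reverse=True)
for pmid in ranked:
    print(pmid, round(relevancy_score(articles[pmid]), 1))
```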
The Nanomaterial Data Curation Initiative (NDCI) explores the critical aspect of data curation within the development of informatics approaches to understanding nanomaterial behavior. Data repositories and tools for integrating and interrogating complex nanomaterial datasets are...
NASA Technical Reports Server (NTRS)
Todd, N. S.; Evans, C.
2015-01-01
The Astromaterials Acquisition and Curation Office at NASA's Johnson Space Center (JSC) is the designated facility for curating all of NASA's extraterrestrial samples. The suite of collections includes the lunar samples from the Apollo missions, cosmic dust particles falling into the Earth's atmosphere, meteorites collected in Antarctica, comet and interstellar dust particles from the Stardust mission, asteroid particles from the Japanese Hayabusa mission, and solar wind atoms collected during the Genesis mission. To support planetary science research on these samples, NASA's Astromaterials Curation Office hosts the Astromaterials Curation Digital Repository, which provides descriptions of the missions and collections, and critical information about each individual sample. Our office is implementing several informatics initiatives with the goal of better serving the planetary research community. One of these initiatives aims to increase the availability and discoverability of sample data and images through the use of a newly designed common architecture for Astromaterials Curation databases.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Faria, Jose P.; Overbeek, Ross; Taylor, Ronald C.
Here, we introduce a manually constructed and curated regulatory network model that describes the current state of knowledge of transcriptional regulation of B. subtilis. The model corresponds to an updated and enlarged version of the regulatory model of central metabolism originally proposed in 2008. We extended the original network to the whole genome by integration of information from DBTBS, a compendium of regulatory data that includes promoters, transcription factors (TFs), binding sites, motifs and regulated operons. Additionally, we consolidated our network with all the information on regulation included in the SporeWeb and Subtiwiki community-curated resources on B. subtilis. Finally, we reconciled our network with data from RegPrecise, which recently released their own less comprehensive reconstruction of the regulatory network for B. subtilis. Our model describes 275 regulators and their target genes, representing 30 different mechanisms of regulation such as TFs, RNA switches, Riboswitches and small regulatory RNAs. Overall, regulatory information is included in the model for approximately 2500 of the ~4200 genes in B. subtilis 168. In an effort to further expand our knowledge of B. subtilis regulation, we reconciled our model with expression data. For this process, we reconstructed the Atomic Regulons (ARs) for B. subtilis, which are the sets of genes that share the same “ON” and “OFF” gene expression profiles across multiple samples of experimental data. We show how atomic regulons for B. subtilis are able to capture many sets of genes corresponding to regulated operons in our manually curated network. Additionally, we demonstrate how atomic regulons can be used to help expand or validate the knowledge of the regulatory networks by looking at highly correlated genes in the ARs for which regulatory information is lacking. During this process, we were also able to infer novel stimuli for hypothetical genes by exploring the genome expression metadata relating to experimental conditions, gaining insights into novel biology.
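The Atomic Regulon idea lends itself to a compact illustration: binarize expression into ON/OFF calls per sample and group genes with identical profiles. The sketch below assumes a made-up expression matrix and threshold and is not the authors' AR reconstruction pipeline.

```python
# Sketch of the Atomic Regulon (AR) idea (assumed simplification): binarize
# expression into ON/OFF calls per sample, then group genes whose ON/OFF profile
# is identical across every sample. Threshold and matrix are hypothetical.
from collections import defaultdict

expression = {                      # gene -> expression values across samples
    "sigA": [8.1, 7.9, 8.3, 8.0],
    "spo0A": [2.1, 9.5, 9.8, 2.0],
    "spoIIE": [1.9, 9.1, 9.6, 2.2],
    "abrB": [9.0, 1.5, 1.2, 8.8],
}
THRESHOLD = 5.0                     # assumed ON/OFF cutoff

def on_off_profile(values):
    return tuple(v >= THRESHOLD for v in values)

atomic_regulons = defaultdict(list)
for gene, values in expression.items():
    atomic_regulons[on_off_profile(values)].append(gene)

# Genes sharing the same ON/OFF profile fall into the same atomic regulon
for profile, genes in atomic_regulons.items():
    print(["ON" if s else "OFF" for s in profile], "->", genes)
```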
Managing biological networks by using text mining and computer-aided curation
NASA Astrophysics Data System (ADS)
Yu, Seok Jong; Cho, Yongseong; Lee, Min-Ho; Lim, Jongtae; Yoo, Jaesoo
2015-11-01
In order to understand a biological mechanism in a cell, a researcher must collect a huge number of protein interactions, drawing on experimental data and the literature. Text mining systems that extract biological interactions from papers have been used to construct biological networks for a few decades. Even though text mining of the literature is necessary to construct a biological network, few systems with a text mining tool are available for biologists who want to construct their own biological networks. We have developed a biological network construction system called BioKnowledge Viewer that can generate a biological interaction network by using a text mining tool and biological taggers. It also includes Boolean simulation software to provide a biological modeling system for simulating the model that is built with the text mining tool. A user can download PubMed articles and construct a biological network by using the Multi-level Knowledge Emergence Model (KMEM), MetaMap, and A Biomedical Named Entity Recognizer (ABNER) as text mining tools. To evaluate the system, we constructed an aging-related biological network that consists of 9,415 nodes (genes), refined by manual curation. With network analysis, we found that several genes, including JNK, AP-1, and BCL-2, were highly connected in the aging biological network. We provide a semi-automatic curation environment so that users can obtain a graph database for managing text mining results generated in the server system and can navigate the network with BioKnowledge Viewer, which is freely available at http://bioknowledgeviewer.kisti.re.kr.
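A minimal sketch of the kind of network assembly and connectivity analysis described above, assuming a handful of hypothetical text-mined interaction triples; it does not reproduce the KMEM/MetaMap/ABNER pipeline itself.

```python
# Hedged sketch (not BioKnowledge Viewer itself): assemble a network from
# text-mined interaction triples and list the most connected genes, mirroring the
# kind of analysis described above. The triples are hypothetical examples.
import networkx as nx

mined_triples = [                 # (source gene, relation, target gene)
    ("JNK", "activates", "AP-1"),
    ("AP-1", "regulates", "BCL-2"),
    ("BCL-2", "inhibits", "CASP3"),
    ("JNK", "phosphorylates", "BCL-2"),
]

g = nx.DiGraph()
for src, relation, dst in mined_triples:
    g.add_edge(src, dst, relation=relation)

# Highly connected genes (by total degree), as highlighted for JNK, AP-1 and BCL-2
for gene, deg in sorted(g.degree(), key=lambda kv: kv[1], reverse=True):
    print(gene, deg)
```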
Discerning the clinical relevance of biomarkers in early stage breast cancer.
Ballinger, Tarah J; Kassem, Nawal; Shen, Fei; Jiang, Guanglong; Smith, Mary Lou; Railey, Elda; Howell, John; White, Carol B; Schneider, Bryan P
2017-07-01
Prior data suggest that breast cancer patients accept significant toxicity for small benefit. It is unclear whether personalized estimations of risk or benefit likelihood that could be provided by biomarkers alter treatment decisions in the curative setting. A choice-based conjoint (CBC) survey was conducted in 417 HER2-negative breast cancer patients who received chemotherapy in the curative setting. The survey presented pairs of treatment choices derived from common taxane- and anthracycline-based regimens, varying in degree of benefit by risk of recurrence and in toxicity profile, including peripheral neuropathy (PN) and congestive heart failure (CHF). Hypothetical biomarkers shifting benefit and toxicity risk were modeled to determine whether this knowledge alters choice. Previously identified biomarkers were evaluated using this model. Based on CBC analysis, a non-anthracycline regimen was the most preferred. Patients with prior PN had a similar preference for a taxane regimen as those who were PN naïve, but more dramatically shifted preference away from taxanes when PN was described as severe/irreversible. When modeled after hypothetical biomarkers, as the likelihood of PN increased, the preference for taxane-containing regimens decreased; similarly, as the likelihood of CHF increased, the preference for anthracycline regimens decreased. When evaluating validated biomarkers for PN and CHF, this knowledge did alter regimen preference. Patients faced with multi-faceted decisions consider personal experience and perceived risk of recurrent disease. Biomarkers providing information on likelihood of toxicity risk do influence treatment choices, and patients may accept reduced benefit when faced with higher risk of toxicity in the curative setting.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Karp, Peter D.
Pathway Tools is a systems-biology software package written by SRI International (SRI) that produces Pathway/Genome Databases (PGDBs) for organisms with a sequenced genome. Pathway Tools also provides a wide range of capabilities for analyzing predicted metabolic networks and user-generated omics data. More than 5,000 academic, industrial, and government groups have licensed Pathway Tools. This user community includes researchers at all three DOE bioenergy centers, as well as academic and industrial metabolic engineering (ME) groups. An integral part of the Pathway Tools software is MetaCyc, a large, multiorganism database of metabolic pathways and enzymes that SRI and its academic collaborators manually curate. This project included two main goals: I. Enhance the MetaCyc content of bioenergy-related enzymes and pathways. II. Develop computational tools for engineering metabolic pathways that satisfy specified design goals, in particular for bioenergy-related pathways. In part I, SRI proposed to significantly expand the coverage of bioenergy-related metabolic information in MetaCyc, followed by the generation of organism-specific PGDBs for all energy-relevant organisms sequenced at the DOE Joint Genome Institute (JGI). Part I objectives included: 1: Expand the content of MetaCyc to include bioenergy-related enzymes and pathways. 2: Enhance the Pathway Tools software to enable display of complex polymer degradation processes. 3: Create new PGDBs for the energy-related organisms sequenced by JGI, update existing PGDBs with new MetaCyc content, and make these data available to JBEI via the BioCyc website. In part II, SRI proposed to develop an efficient computational tool for the engineering of metabolic pathways. Part II objectives included: 4: Develop computational tools for generating metabolic pathways that satisfy specified design goals, enabling users to specify parameters such as starting and ending compounds, and preferred or disallowed intermediate compounds. The pathways were to be generated using metabolic reactions from a reference database (DB). 5: Develop computational tools for ranking the pathways generated in objective (4) according to their optimality. The ranking criteria include stoichiometric yield, the number and cost of additional inputs and the cofactor compounds required by the pathway, pathway length, and pathway energetics. 6: Develop tools for visualizing generated pathways to facilitate the evaluation of a large space of generated pathways.
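Objective 4 amounts to a constrained path search over a reaction database. The sketch below shows the idea with a toy reaction set and a simple breadth-first search honoring disallowed intermediates; ranking by yield, cofactor cost and energetics (objective 5) is omitted.

```python
# Toy sketch of the pathway-generation idea in objective 4 (assumptions throughout:
# a tiny reaction set, simple breadth-first search, no stoichiometric ranking).
from collections import deque

reactions = {                         # reaction id -> (substrate, product)
    "R1": ("glucose", "pyruvate"),
    "R2": ("pyruvate", "acetyl-CoA"),
    "R3": ("pyruvate", "lactate"),
    "R4": ("acetyl-CoA", "butanol"),
    "R5": ("lactate", "butanol"),
}

def find_pathways(start, end, disallowed=frozenset(), max_len=6):
    """Enumerate reaction sequences from start to end, skipping disallowed compounds."""
    paths, queue = [], deque([(start, [])])
    while queue:
        compound, path = queue.popleft()
        if compound == end:
            paths.append(path)
            continue
        if len(path) >= max_len:
            continue
        for rid, (sub, prod) in reactions.items():
            if sub == compound and prod not in disallowed and rid not in path:
                queue.append((prod, path + [rid]))
    return paths

# Candidate pathways from glucose to butanol, excluding lactate as an intermediate
for p in find_pathways("glucose", "butanol", disallowed={"lactate"}):
    print(" -> ".join(p))
```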
Alasmary, Fatmah A S; Awaad, Amani S; Alafeefy, Ahmed M; El-Meligy, Reham M; Alqasoumi, Saleh I
2018-01-01
Two novel quinazoline derivatives, 3-[(4-hydroxy-3-methoxy-benzylidene)-amino]-2-p-tolyl-3H-quinazolin-4-one (5) and 2-p-Tolyl-3-[3,4,5-trimethoxy-benzylidene-amino]-3H-quinazolin-4-one (6), together with one acetamide derivative, 2-(2-Hydroxycarbonylphenylamino)-N-(4-aminosulphonylphenyl) (11), were synthesized and evaluated for their anti-ulcerogenic and anti-ulcerative colitis activities. All three compounds showed curative activity in the acetic acid-induced ulcer model at a dose of 50 mg/kg, producing curative ratios of 65%, 85% and 57.74% for compounds 5, 6 and 11, respectively. At 50 mg/kg, the tested compounds 5, 6 and 11 were significantly (P < 0.01) more effective than dexamethasone (0.1 mg/kg) in reducing all parameters. The compounds also showed curative activity against peptic ulcer induced by absolute alcohol: at a dose of 50 mg/kg they produced curative ratios relative to control ulcers of 56.00%, 61.70% and 87.1% for compounds 5, 6 and 11, respectively, while the standard drug (Omeprazole, 20 mg/kg) produced 33.3%. In both tests, the activity of the target compounds was higher than that of the standard drugs used for the treatment of peptic ulcer and ulcerative colitis. No side effects on liver or kidney function were observed upon prolonged oral administration of these compounds.
Abdou, Rania H.; Saleh, Sherif Y.; Khalil, Waleed F.
2015-01-01
Background: Recently, many efforts have been made to discover new products of natural origin which can limit xenobiotic-induced hepatic injury. Carbon tetrachloride (CCl4) is a highly toxic chemical that is widely used to study hepatotoxicity in animal models. Objective: The present study was conducted to investigate the curative and protective effects of Schinus terbenthifolius ethanolic extract against CCl4-induced acute hepatotoxicity in rats. Materials and Methods: S. terbenthifolius extract was orally administered in a dose of 350 mg dried extract/kg b.wt. before and after intoxication with CCl4 for the protective and curative experiments, respectively. A group of hepatotoxicity-indicative enzymes, oxidant-antioxidant capacity, DNA oxidation, and apoptosis markers were measured. Results: CCl4 increased liver enzyme leakage, oxidative stress, hepatic apoptosis, DNA oxidation, and inflammatory markers. Administration of S. terebinthifolius, either before or after CCl4 intoxication, significantly decreased elevated serum liver enzymes and reinstated the antioxidant capacity. Interestingly, S. terebinthifolius extract inhibited hepatocyte apoptosis, as revealed by approximately 20-fold down-regulation of caspase-3 expression when compared to the untreated CCl4 group. On the other hand, there was neither a protective nor a curative effect of S. terebinthifolius against the DNA damage caused by CCl4. Conclusion: The present study suggests that S. terebinthifolius extract could be a substantially promising hepatoprotective agent against CCl4 toxic effects, and possibly against other hepatotoxic chemicals or drugs. PMID:26109780
Using chemical organization theory for model checking
Kaleta, Christoph; Richter, Stephan; Dittrich, Peter
2009-01-01
Motivation: The increasing number and complexity of biomodels makes automatic procedures for checking the models' properties and quality necessary. Approaches like elementary mode analysis, flux balance analysis, deficiency analysis and chemical organization theory (OT) require only the stoichiometric structure of the reaction network for derivation of valuable information. In formalisms like Systems Biology Markup Language (SBML), however, information about the stoichiometric coefficients required for an analysis of chemical organizations can be hidden in kinetic laws. Results: First, we introduce an algorithm that uncovers stoichiometric information that might be hidden in the kinetic laws of a reaction network. This allows us to apply OT to SBML models using modifiers. Second, using the new algorithm, we performed a large-scale analysis of the 185 models contained in the manually curated BioModels Database. We found that for 41 models (22%) the set of organizations changes when modifiers are considered correctly. We discuss one of these models in detail (BIOMD149, a combined model of the ERK- and Wnt-signaling pathways), whose set of organizations drastically changes when modifiers are considered. Third, we found inconsistencies in 5 models (3%) and identified their characteristics. Compared with flux-based methods, OT is able to identify those species and reactions more accurately [in 26 cases (14%)] that can be present in a long-term simulation of the model. We conclude that our approach is a valuable tool that helps to improve the consistency of biomodels and their repositories. Availability: All data and a JAVA applet to check SBML-models is available from http://www.minet.uni-jena.de/csb/prj/ot/tools Contact: dittrich@minet.uni-jena.de Supplementary information: Supplementary data are available at Bioinformatics online. PMID:19468053
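One half of an organization test, the closure condition, is easy to sketch; the example below treats modifiers as required inputs, which is precisely why uncovering stoichiometry hidden in kinetic laws changes the set of organizations. The reactions are hypothetical, and the self-maintenance check (a linear program over the stoichiometric matrix) is omitted.

```python
# Sketch of one half of a chemical-organization test (assumed simplification):
# compute the closure of a species set, treating modifiers as required inputs as
# the algorithm above argues they should be. Self-maintenance (the second
# condition, an LP over the stoichiometric matrix) is omitted here.
reactions = [
    # (reactants + modifiers required, products) -- hypothetical network
    ({"A"}, {"B"}),
    ({"B", "E"}, {"C"}),     # E acts as a modifier/catalyst but is still required
    ({"C"}, {"A", "D"}),
]

def closure(species):
    closed = set(species)
    changed = True
    while changed:
        changed = False
        for required, products in reactions:
            if required <= closed and not products <= closed:
                closed |= products
                changed = True
    return closed

print(closure({"A"}))          # {'A', 'B'}: C is unreachable without the modifier E
print(closure({"A", "E"}))     # {'A', 'B', 'C', 'D', 'E'}
```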
DSSTox and Chemical Information Technologies in Support of PredictiveToxicology
The EPA NCCT Distributed Structure-Searchable Toxicity (DSSTox) Database project initially focused on the curation and publication of high-quality, standardized, chemical structure-annotated toxicity databases for use in structure-activity relationship (SAR) modeling. In recent y...
Zhang, Cheng; Wang, Ning; Tan, Hor-Yue; Guo, Wei; Li, Sha; Feng, Yibin
2018-05-01
Bearing in mind the doctrine of tumor angiogenesis hypothesized by Folkman several decades ago, the fundamental strategy for alleviating numerous cancer indications may be the strengthening application of notable antiangiogenic therapies to inhibit metastasis-related tumor growth. Under physiological conditions, vascular sprouting is a relatively infrequent event unless specifically stimulated by pathogenic factors that contribute to the accumulation of angiogenic activators such as the vascular endothelial growth factor (VEGF) family and basic fibroblast growth factor (bFGF). Since VEGFs have been identified as the principal cytokines initiating angiogenesis in tumor growth, synthetic VEGF-targeting medicines including bevacizumab and sorafenib have been extensively used, but prominent side effects have concomitantly emerged. Traditional Chinese medicine (TCM)-derived agents with distinctive safety profiles have shown their multitarget curative potential by impairing angiogenic stimulatory signaling pathways directly or eliciting synergistic therapeutic effects with anti-angiogenic drugs mainly targeting VEGF-dependent pathways. This review aims to summarize (a) the up-to-date understanding of the role of VEGF/VEGFR in correlation with proangiogenic mechanisms in various tissues and cells; (b) the elaboration of antitumor angiogenesis mechanisms of 4 representative TCMs, including Salvia miltiorrhiza, Curcuma longa, ginsenosides, and Scutellaria baicalensis; and (c) circumstantial clarification of TCM-driven therapeutic actions of suppressing tumor angiogenesis by targeting the VEGF/VEGFR pathway in recent years, based on network pharmacology.
Mason, Clifford W; Swaan, Peter W; Weiner, Carl P
2006-06-01
The transition from myometrial quiescence to activation is poorly understood, and the analysis of array data is limited by the available data mining tools. We applied functional analysis and logical operations along regulatory gene networks to identify molecular processes and pathways underlying quiescence and activation. We analyzed some 18,400 transcripts and variants in guinea pig myometrium at stages corresponding to quiescence and activation, and compared them to the nonpregnant (control) counterpart using a functional mapping tool, MetaCore (GeneGo, St Joseph, MI) to identify novel gene networks composed of biological pathways during mid (MP) and late (LP) pregnancy. Genes altered during quiescence and or activation were identified following gene specific comparisons with myometrium from nonpregnant animals, and then linked to curated pathways and formulated networks. The MP and LP networks were subtracted from each other to identify unique genomic events during those periods. For example, changes 2-fold or greater in genes mediating protein biosynthesis, programmed cell death, microtubule polymerization, and microtubule based movement were noted during the transition to LP. We describe a novel approach combining microarrays and genetic data to identify networks associated with normal myometrial events. The resulting insights help identify potential biomarkers and permit future targeted investigations of these pathways or networks to confirm or refute their importance.
The PDS4 Information Model and its Role in Agile Science Data Curation
NASA Astrophysics Data System (ADS)
Hughes, J. S.; Crichton, D.
2017-12-01
PDS4 is an information model-driven service architecture supporting the capture, management, distribution and integration of massive planetary science data captured in distributed data archives world-wide. The PDS4 Information Model (IM), the core element of the architecture, was developed using lessons learned from 20 years of archiving Planetary Science Data and best practices for information model development. The foundational principles were adopted from the Open Archival Information System (OAIS) Reference Model (ISO 14721), the Metadata Registry Specification (ISO/IEC 11179), and W3C XML (Extensible Markup Language) specifications. These provided, respectively, an object-oriented model for archive information systems, a comprehensive schema for data dictionaries and hierarchical governance, and rules for encoding documents electronically. The PDS4 Information Model is unique in that it drives the PDS4 infrastructure by providing the representation of concepts and their relationships, constraints, rules, and operations; a sharable, stable, and organized set of information requirements; and machine-parsable definitions that are suitable for configuring and generating code. This presentation will provide an overview of the PDS4 Information Model and how it is being leveraged to develop and evolve the PDS4 infrastructure and enable agile curation of over 30 years of science data collected by the international Planetary Science community.
NASA Technical Reports Server (NTRS)
Fries, M. D.; Allen, C. C.; Calaway, M. J.; Evans, C. A.; Stansbery, E. K.
2015-01-01
Curation of NASA's astromaterials sample collections is a demanding and evolving activity that supports valuable science from NASA missions for generations, long after the samples are returned to Earth. For example, NASA continues to loan hundreds of Apollo program samples to investigators every year and those samples are often analyzed using instruments that did not exist at the time of the Apollo missions themselves. The samples are curated in a manner that minimizes overall contamination, enabling clean, new high-sensitivity measurements and new science results over 40 years after their return to Earth. As our exploration of the Solar System progresses, upcoming and future NASA sample return missions will return new samples with stringent contamination control, sample environmental control, and Planetary Protection requirements. Therefore, an essential element of a healthy astromaterials curation program is a research and development (R&D) effort that characterizes and employs new technologies to maintain current collections and enable new missions - an Advanced Curation effort. JSC's Astromaterials Acquisition & Curation Office is continually performing Advanced Curation research, identifying and defining knowledge gaps about research, development, and validation/verification topics that are critical to support current and future NASA astromaterials sample collections. The following are highlighted knowledge gaps and research opportunities.
Zebrafish Models of Prader-Willi Syndrome: Fast Track to Pharmacotherapeutics
Spikol, Emma D.; Laverriere, Caroline E.; Robnett, Maya; Carter, Gabriela; Wolfe, Erin; Glasgow, Eric
2016-01-01
Prader-Willi syndrome (PWS) is a rare genetic neurodevelopmental disorder characterized by an insatiable appetite, leading to chronic overeating and obesity. Additional features include short stature, intellectual disability, behavioral problems and incomplete sexual development. Although significant progress has been made in understanding the genetic basis of PWS, the mechanisms underlying the pathogenesis of the disorder remain poorly understood. Treatment for PWS consists mainly of palliative therapies; curative therapies are sorely needed. Zebrafish, Danio rerio, represent a promising way forward for elucidating physiological problems such as obesity and identifying new pharmacotherapeutic options for PWS. Over the last decade, an increased appreciation for the highly conserved biology among vertebrates and the ability to perform high-throughput drug screening has seen an explosion in the use of zebrafish for disease modeling and drug discovery. Here, we review recent advances in developing zebrafish models of human disease. Aspects of zebrafish genetics and physiology that are relevant to PWS will be discussed, and the advantages and disadvantages of zebrafish models will be contrasted with current animal models for this syndrome. Finally, we will present a paradigm for drug screening in zebrafish that is potentially the fastest route for identifying and delivering curative pharmacotherapies to PWS patients. PMID:27857842
NeuroTransDB: highly curated and structured transcriptomic metadata for neurodegenerative diseases.
Bagewadi, Shweta; Adhikari, Subash; Dhrangadhariya, Anjani; Irin, Afroza Khanam; Ebeling, Christian; Namasivayam, Aishwarya Alex; Page, Matthew; Hofmann-Apitius, Martin; Senger, Philipp
2015-01-01
Neurodegenerative diseases are chronic debilitating conditions, characterized by progressive loss of neurons that represent a significant health care burden as the global elderly population continues to grow. Over the past decade, high-throughput technologies such as the Affymetrix GeneChip microarrays have provided new perspectives into the pathomechanisms underlying neurodegeneration. Public transcriptomic data repositories, namely Gene Expression Omnibus and curated ArrayExpress, enable researchers to conduct integrative meta-analysis; increasing the power to detect differentially regulated genes in disease and explore patterns of gene dysregulation across biologically related studies. The reliability of retrospective, large-scale integrative analyses depends on an appropriate combination of related datasets, in turn requiring detailed meta-annotations capturing the experimental setup. In most cases, we observe huge variation in compliance to defined standards for submitted metadata in public databases. Much of the information to complete, or refine meta-annotations are distributed in the associated publications. For example, tissue preparation or comorbidity information is frequently described in an article's supplementary tables. Several value-added databases have employed additional manual efforts to overcome this limitation. However, none of these databases explicate annotations that distinguish human and animal models in neurodegeneration context. Therefore, adopting a more specific disease focus, in combination with dedicated disease ontologies, will better empower the selection of comparable studies with refined annotations to address the research question at hand. In this article, we describe the detailed development of NeuroTransDB, a manually curated database containing metadata annotations for neurodegenerative studies. The database contains more than 20 dimensions of metadata annotations within 31 mouse, 5 rat and 45 human studies, defined in collaboration with domain disease experts. We elucidate the step-by-step guidelines used to critically prioritize studies from public archives and their metadata curation and discuss the key challenges encountered. Curated metadata for Alzheimer's disease gene expression studies are available for download. Database URL: www.scai.fraunhofer.de/NeuroTransDB.html. © The Author(s) 2015. Published by Oxford University Press.
NeuroTransDB: highly curated and structured transcriptomic metadata for neurodegenerative diseases
Bagewadi, Shweta; Adhikari, Subash; Dhrangadhariya, Anjani; Irin, Afroza Khanam; Ebeling, Christian; Namasivayam, Aishwarya Alex; Page, Matthew; Hofmann-Apitius, Martin
2015-01-01
Neurodegenerative diseases are chronic debilitating conditions, characterized by progressive loss of neurons that represent a significant health care burden as the global elderly population continues to grow. Over the past decade, high-throughput technologies such as the Affymetrix GeneChip microarrays have provided new perspectives into the pathomechanisms underlying neurodegeneration. Public transcriptomic data repositories, namely Gene Expression Omnibus and curated ArrayExpress, enable researchers to conduct integrative meta-analysis; increasing the power to detect differentially regulated genes in disease and explore patterns of gene dysregulation across biologically related studies. The reliability of retrospective, large-scale integrative analyses depends on an appropriate combination of related datasets, in turn requiring detailed meta-annotations capturing the experimental setup. In most cases, we observe huge variation in compliance to defined standards for submitted metadata in public databases. Much of the information to complete, or refine meta-annotations are distributed in the associated publications. For example, tissue preparation or comorbidity information is frequently described in an article’s supplementary tables. Several value-added databases have employed additional manual efforts to overcome this limitation. However, none of these databases explicate annotations that distinguish human and animal models in neurodegeneration context. Therefore, adopting a more specific disease focus, in combination with dedicated disease ontologies, will better empower the selection of comparable studies with refined annotations to address the research question at hand. In this article, we describe the detailed development of NeuroTransDB, a manually curated database containing metadata annotations for neurodegenerative studies. The database contains more than 20 dimensions of metadata annotations within 31 mouse, 5 rat and 45 human studies, defined in collaboration with domain disease experts. We elucidate the step-by-step guidelines used to critically prioritize studies from public archives and their metadata curation and discuss the key challenges encountered. Curated metadata for Alzheimer’s disease gene expression studies are available for download. Database URL: www.scai.fraunhofer.de/NeuroTransDB.html PMID:26475471
Morbidity of curative cancer surgery and suicide risk.
Jayakrishnan, Thejus T; Sekigami, Yurie; Rajeev, Rahul; Gamblin, T Clark; Turaga, Kiran K
2017-11-01
Curative cancer operations lead to debility and loss of autonomy in a population vulnerable to suicide death. The extent to which operative intervention impacts suicide risk is not well studied. To examine the effects of morbidity of curative cancer surgeries and prognosis of disease on the risk of suicide in patients with solid tumors. Retrospective cohort study using Surveillance, Epidemiology, and End Results data from 2004 to 2011; multilevel systematic review. General US population. Participants were 482 781 patients diagnosed with malignant neoplasm between 2004 and 2011 who underwent curative cancer surgeries. Death by suicide or self-inflicted injury. Among 482 781 patients that underwent curative cancer surgery, 231 committed suicide (16.58/100 000 person-years [95% confidence interval, CI, 14.54-18.82]). Factors significantly associated with suicide risk included male sex (incidence rate [IR], 27.62; 95% CI, 23.82-31.86) and age >65 years (IR, 22.54; 95% CI, 18.84-26.76). When stratified by 30-day overall postoperative morbidity, a significantly higher incidence of suicide was found for high-morbidity surgeries (IR, 33.30; 95% CI, 26.50-41.33) vs moderate morbidity (IR, 24.27; 95% CI, 18.92-30.69) and low morbidity (IR, 9.81; 95% CI, 7.90-12.04). Unit increase in morbidity was significantly associated with death by suicide (odds ratio, 1.01; 95% CI, 1.00-1.03; P = .02) and decreased suicide-specific survival (hazards ratio, 1.02; 95% CI, 1.00-1.03, P = .01) in prognosis-adjusted models. In this sample of cancer patients in the Surveillance, Epidemiology, and End Results database, patients that undergo high-morbidity surgeries appear most vulnerable to death by suicide. The identification of this high-risk cohort should motivate health care providers and particularly surgeons to adopt screening measures during the postoperative follow-up period for these patients. Copyright © 2016 John Wiley & Sons, Ltd.
NASA Technical Reports Server (NTRS)
Blumenfeld, E. H.; Evans, C. A.; Oshel, E. R.; Liddle, D. A.; Beaulieu, K.; Zeigler, R. A.; Hanna, R. D.; Ketcham, R. A.
2015-01-01
Established contemporary conservation methods within the fields of Natural and Cultural Heritage encourage an interdisciplinary approach to preservation of heritage material (both tangible and intangible) that holds "Outstanding Universal Value" for our global community. NASA's lunar samples were acquired from the moon for the primary purpose of intensive scientific investigation. These samples, however, also invoke cultural significance, as evidenced by the millions of people per year that visit lunar displays in museums and heritage centers around the world. Being both scientifically and culturally significant, the lunar samples require a unique conservation approach. Government mandate dictates that NASA's Astromaterials Acquisition and Curation Office develop and maintain protocols for "documentation, preservation, preparation and distribution of samples for research, education and public outreach" for both current and future collections of astromaterials. Documentation, considered the first stage within the conservation methodology, has evolved many new techniques since curation protocols for the lunar samples were first implemented, and the development of new documentation strategies for current and future astromaterials is beneficial to keeping curation protocols up to date. We have developed and tested a comprehensive non-destructive documentation technique using high-resolution image-based 3D reconstruction and X-ray CT (XCT) data in order to create interactive 3D models of lunar samples that would ultimately be served to both researchers and the public. These data enhance preliminary scientific investigations including targeted sample requests, and also provide a new visual platform for the public to experience and interact with the lunar samples. We intend to serve these data as they are acquired on NASA's Astromaterials Acquisition and Curation website at http://curator.jsc.nasa.gov/. Providing 3D interior and exterior documentation of astromaterial samples addresses the increasing demands for accessibility to data and contemporary techniques for documentation, which can be realized for both current collections as well as future sample return missions.
MaizeGDB update: New tools, data, and interface for the maize model organism database
USDA-ARS?s Scientific Manuscript database
MaizeGDB is a highly curated, community-oriented database and informatics service to researchers focused on the crop plant and model organism Zea mays ssp. mays. Although some form of the maize community database has existed over the last 25 years, there have only been two major releases. In 1991, ...
Samal, Babru B; Waites, Cameron K; Almeida-Suhett, Camila; Li, Zheng; Marini, Ann M; Samal, Nihar R; Elkahloun, Abdel; Braga, Maria F M; Eiden, Lee E
2015-10-01
We have previously demonstrated that mild controlled cortical impact (mCCI) injury to rat cortex causes indirect, concussive injury to underlying hippocampus and other brain regions, providing a reproducible model for mild traumatic brain injury (mTBI) and its neurochemical, synaptic, and behavioral sequelae. Here, we extend a preliminary gene expression study of the hippocampus-specific events occurring after mCCI and identify 193 transcripts significantly upregulated, and 21 transcripts significantly downregulated, 24 h after mCCI. Fifty-three percent of genes altered by mCCI within 24 h of injury are predicted to be expressed only in the non-neuronal/glial cellular compartment, with only 13% predicted to be expressed only in neurons. The set of upregulated genes following mCCI was interrogated using Ingenuity Pathway Analysis (IPA) augmented with manual curation of the literature (190 transcripts accepted for analysis), revealing a core group of 15 first messengers, mostly inflammatory cytokines, predicted to account for >99% of the transcript upregulation occurring 24 h after mCCI. Convergent analysis of predicted transcription factors (TFs) regulating the mCCI target genes, carried out in IPA relative to the entire Affymetrix-curated transcriptome, revealed a high concordance with TFs regulated by the cohort of 15 cytokines/cytokine-like messengers independently accounting for upregulation of the mCCI transcript cohort. TFs predicted to regulate transcription of the 193-gene mCCI cohort also displayed a high degree of overlap with TFs predicted to regulate glia-, rather than neuron-specific genes in cortical tissue. We conclude that mCCI predominantly affects transcription of non-neuronal genes within the first 24 h after insult. This finding suggests that early non-neuronal events trigger later permanent neuronal changes after mTBI, and that early intervention after mTBI could potentially affect the neurochemical cascade leading to later reported synaptic and behavioral dysfunction.
The Library as Partner in University Data Curation: A Case Study in Collaboration
ERIC Educational Resources Information Center
Latham, Bethany; Poe, Jodi Welch
2012-01-01
Data curation is a concept with many facets. Curation goes beyond research-generated data, and its principles can support the preservation of institutions' historical data. Libraries are well-positioned to bring relevant expertise to such problems, especially those requiring collaboration, because of their experience as neutral caretakers and…
ERIC Educational Resources Information Center
Schiano, Deborah
2013-01-01
Curation: to gather, organize, and present resources in a way that meets information needs and interests, makes sense for virtual as well as physical resources. A Northern New Jersey middle school library made the decision to curate its physical resources according to the needs of its users, and, in so doing, created a shelving system that is,…
ERIC Educational Resources Information Center
Mallon, Melissa, Ed.
2012-01-01
In their Top Trends of 2012, the Association of College and Research Libraries (ACRL) named data curation as one of the issues to watch in academic libraries in the near future (ACRL, 2012, p. 312). Data curation can be summarized as "the active and ongoing management of data through its life cycle of interest and usefulness to scholarship,…
Health Care Reform and Concurrent Curative Care for Terminally Ill Children: A Policy Analysis
Lindley, Lisa C.
2012-01-01
Within the Patient Protection and Affordable Care Act of 2010, or health care reform, is a relatively small provision about concurrent curative care that significantly affects terminally ill children. Effective on March 23, 2010, terminally ill children who are enrolled in a Medicaid or state Children’s Health Insurance Plan (CHIP) hospice benefit may concurrently receive curative care related to their terminal health condition. The purpose of this article was to conduct a policy analysis of the concurrent curative care legislation by examining the intended goals of the policy to improve access to care and enhance quality of end-of-life care for terminally ill children. In addition, the policy analysis explored the political feasibility of implementing concurrent curative care at the state level. Based on this policy analysis, the federal policy of concurrent curative care for children would generally achieve its intended goals. However, important policy omissions focus attention on the need for further federal end-of-life care legislation for children. These findings have implications for nurses. PMID:22822304
NASA Technical Reports Server (NTRS)
Allton, J. H.; Burkett, P. J.
2011-01-01
NASA Johnson Space Center operates clean curation facilities for Apollo lunar, Antarctic meteorite, stratospheric cosmic dust, Stardust comet and Genesis solar wind samples. Each of these collections is curated separately due to unique requirements. The purpose of this abstract is to highlight the technical tensions between providing particulate cleanliness and molecular cleanliness, illustrated using data from curation laboratories. Strict control of three components is required for curating samples cleanly: a clean environment; clean containers and tools that touch samples; and use of non-shedding materials of cleanable chemistry and smooth surface finish. This abstract focuses on environmental cleanliness and the technical tension between achieving particulate and molecular cleanliness. An environment in which a sample is manipulated or stored can be a room, an enclosed glovebox (or robotic isolation chamber) or an individual sample container.
Earley, Kirsty; Livingstone, Daniel; Rea, Paul M
2017-01-01
Collection preservation is essential for the cultural status of any city. However, presenting a collection publicly risks damage. Recently this drawback has been overcome by digital curation. Described here is a method of digitisation using photogrammetry and virtual reality software. Items were selected from the Royal College of Physicians and Surgeons of Glasgow archives, and implemented into an online learning module for the Open University. Images were processed via Agisoft Photoscan, Autodesk Memento, and Garden Gnome Object 2VR. Although problems arose due to specularity, 2VR digital models were developed for online viewing. Future research must minimise the difficulty of digitising specular objects.
Pitt, Catherine; Roberts, Bayard; Checchi, Francesco
2012-01-10
Where hard-to-access populations (such as those living in insecure areas) lack access to basic health services, relief agencies, donors, and ministries of health face a dilemma in selecting the most effective intervention strategy. This paper uses a decision mathematical model to estimate the relative effectiveness of two alternative strategies, mobile clinics and fixed community-based health services, for antibiotic treatment of childhood pneumonia, the world's leading cause of child mortality. A "Markov cycle tree" cohort model was developed in Excel with Visual Basic to compare the number of deaths from pneumonia in children aged 1 to 59 months expected under three scenarios: 1) No curative services available, 2) Curative services provided by a highly-skilled but intermittent mobile clinic, and 3) Curative services provided by a low-skilled community health post. Parameter values were informed by literature and expert interviews. Probabilistic sensitivity analyses were conducted for several plausible scenarios. We estimated median pneumonia-specific under-5 mortality rates of 0.51 (95% credible interval: 0.49 to 0.541) deaths per 10,000 child-days without treatment, 0.45 (95% CI: 0.43 to 0.48) with weekly mobile clinics, and 0.31 (95% CI: 0.29 to 0.32) with CHWs in fixed health posts. Sensitivity analyses found the fixed strategy superior, except when mobile clinics visited communities daily, where rates of care-seeking were substantially higher at mobile clinics than fixed posts, or where several variables simultaneously differed substantially from our baseline assumptions. Current evidence does not support the hypothesis that mobile clinics are more effective than CHWs. A CHW strategy therefore warrants consideration in high-mortality, hard-to-access areas. Uncertainty remains, and parameter values may vary across contexts, but the model allows preliminary findings to be updated as new or context-specific evidence becomes available. Decision analytic modelling can guide needed field-based research efforts in hard-to-access areas and offer evidence-based insights for decision-makers.
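A hedged sketch of a Markov cohort cycle comparing pneumonia deaths under the three scenarios; the states, cycle length and all transition probabilities below are illustrative placeholders, not the parameter values used in the paper.

```python
# Illustrative sketch only (all transition probabilities are made-up placeholders,
# not the paper's parameters): a daily-cycle Markov cohort model tracking children
# through Well -> Pneumonia -> (Recovered | Dead), with a scenario-specific
# probability of receiving effective antibiotic treatment.
def run_cohort(p_treated, cohort=10_000, days=365):
    well, sick, dead = float(cohort), 0.0, 0.0
    p_onset = 0.0005                               # daily risk of developing pneumonia
    p_die_untreated, p_die_treated = 0.02, 0.002   # daily case-fatality risks
    p_recover = 0.15                               # daily recovery probability
    for _ in range(days):
        new_sick = well * p_onset
        p_die = p_treated * p_die_treated + (1 - p_treated) * p_die_untreated
        died = sick * p_die
        recovered = sick * p_recover
        well += recovered - new_sick
        sick += new_sick - died - recovered
        dead += died
    return dead

scenarios = {"no curative services": 0.0,
             "weekly mobile clinic": 0.3,
             "fixed community health post": 0.6}
for name, p_treated in scenarios.items():
    print(f"{name}: ~{run_cohort(p_treated):.0f} pneumonia deaths per 10,000 children/year")
```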
Powers, Christina M; Hoover, Mark D; Harper, Stacey L
2015-01-01
Summary The Nanomaterial Data Curation Initiative (NDCI), a project of the National Cancer Informatics Program Nanotechnology Working Group (NCIP NanoWG), explores the critical aspect of data curation within the development of informatics approaches to understanding nanomaterial behavior. Data repositories and tools for integrating and interrogating complex nanomaterial datasets are gaining widespread interest, with multiple projects now appearing in the US and the EU. Even in these early stages of development, a single common aspect shared across all nanoinformatics resources is that data must be curated into them. Through exploration of sub-topics related to all activities necessary to enable, execute, and improve the curation process, the NDCI will provide a substantive analysis of nanomaterial data curation itself, as well as a platform for multiple other important discussions to advance the field of nanoinformatics. This article outlines the NDCI project and lays the foundation for a series of papers on nanomaterial data curation. The NDCI purpose is to: 1) present and evaluate the current state of nanomaterial data curation across the field on multiple specific data curation topics, 2) propose ways to leverage and advance progress for both individual efforts and the nanomaterial data community as a whole, and 3) provide opportunities for similar publication series on the details of the interactive needs and workflows of data customers, data creators, and data analysts. Initial responses from stakeholder liaisons throughout the nanoinformatics community reveal a shared view that it will be critical to focus on integration of datasets with specific orientation toward the purposes for which the individual resources were created, as well as the purpose for integrating multiple resources. Early acknowledgement and undertaking of complex topics such as uncertainty, reproducibility, and interoperability is proposed as an important path to addressing key challenges within the nanomaterial community, such as reducing collateral negative impacts and decreasing the time from development to market for this new class of technologies. PMID:26425427
NASA Technical Reports Server (NTRS)
Calaway, Michael J.
2013-01-01
In preparation for OSIRIS-REx and other future sample return missions concerned with analyzing organics, we conducted an Organic Contamination Baseline Study for JSC Curation Laboratories in FY12. For FY12 testing, the organic baseline study focused only on molecular organic contamination in JSC curation gloveboxes: presumably future collections (i.e. Lunar, Mars, asteroid missions) would use isolation containment systems rather than cleanrooms alone for primary sample storage. This decision was made because of limited historical data on curation gloveboxes and limited IR&D funds, and because Genesis routinely monitors organics in its ISO class 4 cleanrooms.
A semi-automated methodology for finding lipid-related GO terms.
Fan, Mengyuan; Low, Hong Sang; Wenk, Markus R; Wong, Limsoon
2014-01-01
Although semantic similarity in Gene Ontology (GO) and other approaches may be used to find similar GO terms, there is as yet no method to systematically find a class of GO terms sharing a common property with high accuracy (e.g., involving human curation). We have developed a methodology to address this issue and applied it to identify lipid-related GO terms, owing to the important and varied roles of lipids in many biological processes. Our methodology finds lipid-related GO terms in a semi-automated manner, requiring only moderate manual curation. We first obtain a list of lipid-related gold-standard GO terms by keyword search and manual curation. Then, based on the hypothesis that co-annotated GO terms share similar properties, we develop a machine learning method that expands the list of lipid-related terms from the gold standard. Those terms predicted most likely to be lipid related are examined by a human curator following specific curation rules to confirm the class labels. The structure of GO is also exploited to help reduce the curation effort. The prediction and curation cycle is repeated until no further lipid-related term is found. Our approach has covered a high proportion, if not all, of lipid-related terms with relatively high efficiency. http://compbio.ddns.comp.nus.edu.sg/∼lipidgo. © The Author(s) 2014. Published by Oxford University Press.
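A rough sketch of the expand-and-curate cycle described above, with a toy co-annotation score standing in for the machine learning step and a stubbed curation check standing in for the human curator; the annotation data and threshold are invented for illustration.

```python
# Hedged sketch of the iterative expand-and-curate loop. The co-annotation table,
# scoring rule and curation stub are stand-ins: the real method trains a
# classifier and applies explicit curation rules.
gene_annotations = {                         # gene -> annotated GO terms (hypothetical)
    "g1": {"GO:lipid_metabolism", "GO:membrane_assembly"},
    "g2": {"GO:lipid_metabolism", "GO:fatty_acid_oxidation"},
    "g3": {"GO:fatty_acid_oxidation", "GO:membrane_assembly"},
    "g4": {"GO:dna_repair"},
}

def coannotation_score(term, lipid_terms):
    """Fraction of genes carrying `term` that also carry a known lipid-related term."""
    carriers = [g for g, terms in gene_annotations.items() if term in terms]
    if not carriers:
        return 0.0
    return sum(bool(gene_annotations[g] & lipid_terms) for g in carriers) / len(carriers)

def curator_confirms(term):
    # Stub for the manual curation step; a human applies the curation rules here.
    return term != "GO:dna_repair"

lipid_terms = {"GO:lipid_metabolism"}        # gold standard from keyword search + curation
all_terms = set().union(*gene_annotations.values())

added = True
while added:                                 # repeat until no further term is confirmed
    added = False
    candidates = sorted(all_terms - lipid_terms,
                        key=lambda t: coannotation_score(t, lipid_terms), reverse=True)
    for term in candidates:
        if coannotation_score(term, lipid_terms) >= 0.5 and curator_confirms(term):
            lipid_terms.add(term)
            added = True

print(sorted(lipid_terms))
```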
Davis, Allan Peter; Wiegers, Thomas C.; Murphy, Cynthia G.; Mattingly, Carolyn J.
2011-01-01
The Comparative Toxicogenomics Database (CTD) is a public resource that promotes understanding about the effects of environmental chemicals on human health. CTD biocurators read the scientific literature and convert free-text information into a structured format using official nomenclature, integrating third-party controlled vocabularies for chemicals, genes, diseases and organisms, and a novel controlled vocabulary for molecular interactions. Manual curation produces a robust, richly annotated dataset of highly accurate and detailed information. Currently, CTD describes over 349,000 molecular interactions between 6800 chemicals, 20,900 genes (for 330 organisms) and 4300 diseases that have been manually curated from over 25,400 peer-reviewed articles. These manually curated data are further integrated with other third-party data (e.g. Gene Ontology, KEGG and Reactome annotations) to generate a wealth of toxicogenomic relationships. Here, we describe our approach to manual curation that uses a powerful and efficient paradigm involving mnemonic codes. This strategy allows biocurators to quickly capture detailed information from articles by generating simple statements using codes to represent the relationships between data types. The paradigm is versatile, expandable, and able to accommodate new data challenges that arise. We have incorporated this strategy into a web-based curation tool to further increase efficiency and productivity, implement quality control in real-time and accommodate biocurators working remotely. Database URL: http://ctd.mdibl.org PMID:21933848
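To illustrate what a mnemonic-coded curation statement might look like when expanded into a structured record, here is a small sketch. The code tables and statement syntax below are invented for the example and are not CTD's actual controlled vocabularies.

```python
# Hypothetical example of expanding compact curation codes into structured records.
# The code tables below are invented for illustration; they are NOT CTD's real codes.

ACTION_CODES = {"inc": "increases", "dec": "decreases", "aff": "affects"}
PROCESS_CODES = {"exp": "expression", "act": "activity", "met": "methylation"}

def expand(statement):
    """Turn a coded statement like 'chem:chemA dec exp gene:GENE1' into a dict."""
    subject, action, process, obj = statement.split()
    return {
        "subject": subject.split(":", 1)[1],
        "relation": f"{ACTION_CODES[action]} {PROCESS_CODES[process]}",
        "object": obj.split(":", 1)[1],
    }

print(expand("chem:chemA dec exp gene:GENE1"))
# {'subject': 'chemA', 'relation': 'decreases expression', 'object': 'GENE1'}
```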
Yu, Guohua; Zhang, Yanqiong; Ren, Weiqiong; Dong, Ling; Li, Junfang; Geng, Ya; Zhang, Yi; Li, Defeng; Xu, Haiyu; Yang, Hongjun
2017-01-01
For decades in China, the Yin-Huang-Qing-Fei capsule (YHQFC) has been widely used in the treatment of chronic bronchitis, with good curative effects. Owing to the complexity of traditional Chinese herbal formulas, the pharmacological mechanism of YHQFC remains unclear. To address this problem, a network pharmacology-based strategy was proposed in this study. First, the putative target profile of YHQFC was predicted using MedChem Studio, based on structural and functional similarities of all available YHQFC components to the known drugs obtained from the DrugBank database. Then, an interaction network was constructed using links between putative YHQFC targets and known therapeutic targets of chronic bronchitis. Following the calculation of four topological features (degree, betweenness, closeness, and coreness) of each node in the network, 475 major putative targets of YHQFC and their topological importance were identified. In addition, a pathway enrichment analysis based on the Kyoto Encyclopedia of Genes and Genomes pathway database indicated that the major putative targets of YHQFC are significantly associated with various pathways involved in anti-inflammation processes, immune responses, and pathological changes caused by asthma. More interestingly, eight major putative targets of YHQFC (interleukin [IL]-3, IL-4, IL-5, IL-10, IL-13, FCER1G, CCL11, and EPX) were demonstrated to be associated with the inflammatory process that occurs during the progression of asthma. Finally, a molecular docking simulation was performed, and the results showed that 17 pairs of chemical components and candidate YHQFC targets involved in the asthma pathway had strong binding efficiencies. In conclusion, this network pharmacology-based investigation revealed that YHQFC may attenuate the inflammatory reaction of chronic bronchitis by regulating its candidate targets, which may be implicated in the major pathological processes of the asthma pathway. PMID:28053519
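The four topological features named in the abstract above (degree, betweenness, closeness and coreness) are standard graph measures; a minimal sketch of computing them with networkx on a tiny placeholder network (not the study's actual target network) follows.

```python
# Sketch: the four topological features used to rank putative targets,
# computed here on a tiny placeholder network with networkx.
import networkx as nx

G = nx.Graph()
G.add_edges_from([("T1", "T2"), ("T1", "T3"), ("T2", "T3"), ("T3", "T4"), ("T4", "T5")])

degree = nx.degree_centrality(G)            # share of nodes a node is directly linked to
betweenness = nx.betweenness_centrality(G)  # how often a node lies on shortest paths
closeness = nx.closeness_centrality(G)      # inverse of average distance to all other nodes
coreness = nx.core_number(G)                # k-core index of each node

for node in G:
    print(node, round(degree[node], 2), round(betweenness[node], 2),
          round(closeness[node], 2), coreness[node])
```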
ERIC Educational Resources Information Center
Lage, Kathryn; Losoff, Barbara; Maness, Jack
2011-01-01
Increasingly libraries are expected to play a role in scientific data curation initiatives, i.e., "the management and preservation of digital data over the long-term." This case study offers a novel approach for identifying researchers who are receptive toward library involvement in data curation. The authors interviewed researchers at…
ERIC Educational Resources Information Center
Ungerer, Leona M.
2016-01-01
Digital curation may be regarded as a core competency in higher education since it contributes to establishing a sense of metaliteracy (an essential requirement for optimally functioning in a modern media environment) among students. Digital curation is gradually finding its way into higher education curricula aimed at fostering social media…
The i5k Workspace@NAL—enabling genomic data access, visualization and curation of arthropod genomes
Poelchau, Monica; Childers, Christopher; Moore, Gary; Tsavatapalli, Vijaya; Evans, Jay; Lee, Chien-Yueh; Lin, Han; Lin, Jun-Wei; Hackett, Kevin
2015-01-01
The 5000 arthropod genomes initiative (i5k) has tasked itself with coordinating the sequencing of 5000 insect or related arthropod genomes. The resulting influx of data, mostly from small research groups or communities with little bioinformatics experience, will require visualization, dissemination and curation, preferably from a centralized platform. The National Agricultural Library (NAL) has implemented the i5k Workspace@NAL (http://i5k.nal.usda.gov/) to help meet the i5k initiative's genome hosting needs. Any i5k member is encouraged to contact the i5k Workspace with their genome project details. Once submitted, new content will be accessible via organism pages, genome browsers and BLAST search engines, which are implemented via the open-source Tripal framework, a web interface for the underlying Chado database schema. We also implement the Web Apollo software for groups that choose to curate gene models. New content will add to the existing body of 35 arthropod species, which include species relevant for many aspects of arthropod genomic research, including agriculture, invasion biology, systematics, ecology and evolution, and developmental research. PMID:25332403
Huybregts, Lieven; Becquey, Elodie; Zongrone, Amanda; Le Port, Agnes; Khassanova, Regina; Coulibaly, Lazare; Leroy, Jef L; Rawat, Rahul; Ruel, Marie T
2017-03-09
Evidence suggests that both preventive and curative nutrition interventions are needed to tackle child acute malnutrition (AM) in developing countries. In addition to reducing the incidence of AM, providing preventive interventions may also help increase attendance (and coverage) of AM screening, a major constraint in the community-based management of child acute malnutrition (CMAM) model. There is a paucity of evidence-based strategies to deliver integrated preventive and curative interventions effectively and affordably at scale. The aim of the Innovative Approaches for the Prevention of Childhood Malnutrition (PROMIS) study is to assess the feasibility, quality of implementation, effectiveness and cost-effectiveness of an integrated child malnutrition prevention and treatment intervention package implemented through a community-based platform in Mali and a facility-based platform in Burkina Faso. The PROMIS intervention entails a comprehensive preventive package offered on a monthly basis to caregivers of children, while children are screened for acute malnutrition (AM). The package consists of behavior change communication on essential nutrition and hygiene actions, and monthly preventive doses of small quantity lipid-based nutrient supplements (SQ-LNS) for children aged 6 to 23.9 months. Positive AM cases are referred to treatment services offered by first-line health services according to the CMAM model. The PROMIS intervention will be evaluated using a mixed methods approach. The impact study encompasses two types of study design: i) repeated cross-sectional surveys conducted at baseline and at endline after 24 months of program implementation and ii) a longitudinal study with a monthly follow-up for 18 months. Primary study impact measures include the incidence and endpoint prevalence of AM, AM screening coverage and treatment compliance. A process evaluation will assess the feasibility and quality of implementation of the intervention guided by country specific program impact pathways (PIPs). Cost-effectiveness analysis will assess the economic feasibility of the intervention. The PROMIS study assesses the effectiveness of an innovative model to integrate prevention and treatment interventions for greater and more sustainable impacts on the incidence and prevalence of AM using a rigorous, theory-based randomized control trial approach. This type of programmatic research is urgently needed to help program implementers, policy makers, and investors prioritize, select and scale-up the best program models to prevent and treat AM and achieve the World Health Assembly goal of reducing childhood wasting to less than 5% globally by the year 2025. Clinicaltrials.gov NCT02323815 (registered on December 18, 2014) and NCT02245152 (registered on September 16, 2014).
Altermann, Eric; Lu, Jingli; McCulloch, Alan
2017-01-01
Expert-curated annotation remains one of the critical steps in achieving a reliable, biologically relevant annotation. Here we announce the release of GAMOLA2, a user-friendly and comprehensive software package to process, annotate and curate draft and complete bacterial, archaeal, and viral genomes. GAMOLA2 is a wrapper tool that combines gene model determination and functional Blast, COG, Pfam, and TIGRfam analyses with structural predictions, including detection of tRNAs, rRNA genes, non-coding RNAs, signal peptide cleavage sites, transmembrane helices, CRISPR repeats and vector sequence contamination. GAMOLA2 has already been validated on a wide range of bacterial and archaeal genomes, and its modular concept allows easy addition of further functionality in future releases. A modified and adapted version of the Artemis Genome Viewer (Sanger Institute) has been developed to leverage the additional features and underlying information provided by the GAMOLA2 analysis, and is part of the software distribution. In addition to genome annotations, GAMOLA2 features, among others, supplemental modules that assist in the creation of custom Blast databases, annotation transfers between genome versions, and the preparation of GenBank files for submission via the NCBI Sequin tool. GAMOLA2 is intended to be run under a Linux environment, whereas the subsequent visualization and manual curation in Artemis is mobile and platform independent. The development of GAMOLA2 is ongoing and community driven. New functionality can easily be added upon user request, ensuring that GAMOLA2 provides information relevant to microbiologists. The software is available free of charge for academic use. PMID:28386247
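As a rough illustration of the kind of wrapping GAMOLA2 describes, the sketch below chains two external annotation tools (blastp and hmmscan) over a set of predicted proteins. It is not GAMOLA2's implementation; the file names are placeholders and the command-line options shown are typical examples that may need adjusting for a particular installation.

```python
# Generic sketch of a wrapper that chains external annotation tools over one
# protein FASTA file. Options are typical examples only; not GAMOLA2's code.
import subprocess

PROTEINS = "proteins.faa"   # placeholder input: predicted protein sequences

steps = [
    ["blastp", "-query", PROTEINS, "-db", "swissprot",
     "-outfmt", "6", "-out", "blast_hits.tsv"],          # functional BLAST search
    ["hmmscan", "--domtblout", "pfam_hits.tbl",
     "Pfam-A.hmm", PROTEINS],                             # Pfam domain search
]

for cmd in steps:
    print("running:", " ".join(cmd))
    subprocess.run(cmd, check=True)   # stop the pipeline if any step fails
```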
Jing, Chu-Yu; Fu, Yi-Peng; Zheng, Su-Su; Yi, Yong; Shen, Hu-Jia; Huang, Jin-Long; Xu, Xin; Lin, Jia-Jia; Zhou, Jian; Fan, Jia; Ren, Zheng-Gang; Qiu, Shuang-Jian; Zhang, Bo-Heng
2017-01-01
Adjuvant transarterial chemoembolization (TACE) is a major option for postoperative hepatocellular carcinoma (HCC) patients with recurrence risk factors. However, individualized predictive models for this subgroup of patients are limited. This study aimed to develop a prognostic nomogram for patients with HCC who underwent adjuvant TACE following curative resection. A cohort comprising 144 HCC patients who received adjuvant TACE following curative resection at the Zhongshan Hospital was analyzed. The nomogram was formulated based on independent prognostic indicators for overall survival (OS). The performance of the nomogram was evaluated by the concordance index (C-index), calibration curve, and decision curve analysis (DCA) and compared with the conventional staging systems. The results were validated in an independent cohort of 86 patients with the same inclusion criteria. Serum alpha-fetoprotein (AFP), hyper-sensitive C-reactive protein (hs-CRP), incomplete tumor encapsulation, and double-positive staining of Cytokeratin 7 and Cytokeratin 19 on tumor cells were identified as independent predictors for OS. The C-indices of the nomogram for OS prediction in the training cohort and validation cohort were 0.787 (95%CI 0.775–0.799) and 0.714 (95%CI 0.695–0.733), respectively. In both the training and validation cohorts, the calibration plot showed good consistency between the nomogram-predicted and the observed survival. Furthermore, the established nomogram was superior to the conventional staging systems in terms of C-index and clinical net benefit on DCA. The proposed nomogram provided accurate risk stratification for HCC patients who underwent adjuvant TACE following curative resection. PMID:28296727
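The C-index reported above can be read as the probability that, for a random comparable pair of patients, the one with the higher predicted risk is the one who experiences the event earlier. A minimal pairwise computation on synthetic numbers (not the study's data) is sketched below.

```python
# Minimal illustration of Harrell's concordance index on synthetic data
# (not the study's cohort): a higher risk score should mean shorter survival.

def concordance_index(times, events, risks):
    concordant, comparable = 0.0, 0
    n = len(times)
    for i in range(n):
        for j in range(n):
            # pair (i, j) is comparable if patient i had the event before time j
            if events[i] and times[i] < times[j]:
                comparable += 1
                if risks[i] > risks[j]:
                    concordant += 1
                elif risks[i] == risks[j]:
                    concordant += 0.5
    return concordant / comparable

times  = [5, 8, 12, 20, 24]         # months of follow-up (synthetic)
events = [1, 1, 0, 1, 0]            # 1 = death observed, 0 = censored
risks  = [0.9, 0.4, 0.7, 0.5, 0.1]  # nomogram-style risk scores (synthetic)
print(round(concordance_index(times, events, risks), 3))  # prints 0.75
```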
Lee, Jay S; Parashar, Vartika; Miller, Jacquelyn B; Bremmer, Samantha M; Vu, Joceline V; Waljee, Jennifer F; Dossett, Lesly A
2018-07-01
Excessive opioid prescribing is common after curative-intent surgery, but little is known about what factors influence prescribing behaviors among surgeons. To identify targets for intervention, we performed a qualitative study of opioid prescribing after curative-intent surgery using the Theoretical Domains Framework, a well-established implementation science method for identifying factors influencing healthcare provider behavior. Prior to data collection, we constructed a semi-structured interview guide to explore decision making for opioid prescribing. We then conducted interviews with surgical oncology providers at a single comprehensive cancer center. Interviews were recorded, transcribed verbatim, then independently coded by two investigators using the Theoretical Domains Framework to identify theoretical domains relevant to opioid prescribing. Relevant domains were then linked to behavior models to select targeted interventions likely to improve opioid prescribing. Twenty-one subjects were interviewed from November 2016 to May 2017, including attending surgeons, resident surgeons, physician assistants, and nurses. Five theoretical domains emerged as relevant to opioid prescribing: environmental context and resources; social influences; beliefs about consequences; social/professional role and identity; and goals. Using these domains, three interventions were identified as likely to change opioid prescribing behavior: (1) enablement (deploy nurses during preoperative visits to counsel patients on opioid use); (2) environmental restructuring (provide on-screen prompts with normative data on the quantity of opioid prescribed); and (3) education (provide prescribing guidelines). Key determinants of opioid prescribing behavior after curative-intent surgery include environmental and social factors. Interventions targeting these factors are likely to improve opioid prescribing in surgical oncology.
The preventive-curative conflict in primary health care.
De Sa, C
1993-04-01
Approximately 80% of the rural population in developing countries do not have access to appropriate curative care. The primary health care (PHC) approach emphasizes promotive and preventive services. Yet most people in developing countries consider curative care to be more important. Thus, PHC should include curative and rehabilitative care along with preventive and promotive care. The conflict between preventive and curative care is apparent at the community level, among health workers from all levels of the health system, and among policy makers. Community members are sometimes willing to pay for curative services but not preventive services. Further, they believe that they already know enough to prevent illness. Community health workers (CHWs), the mainstays of most PHC projects, are trained in preventive efforts, but this hinders their effectiveness, since the community expects curative care. Moreover, 66% of villagers' health problems require curative care. Further, CHWs are isolated from health professionals, adding to their inability to effect positive change. Health professionals are often unable to establish a relationship of trust with the community, largely due to their urban-based medical education. They tend not to explain treatment to patients, or they simplify explanations in a condescending manner. They also mystify diseases, preventing people from understanding their own bodies and managing their illnesses. National governments often misinterpret national health policies promoting PHC and implement them in a top-down manner rather than with the bottom-up approach that PHC advocates. Nongovernmental organizations (NGOs) and international agencies also interpret PHC in different ways. Still, strong partnerships between government, NGOs, the private sector, and international agencies are needed for effective implementation of PHC. Yet many countries continue to have complex hierarchical social structures, inequitable distribution, and inadequate resources, making it difficult to implement effective PHC.
Lunar and Meteorite Thin Sections for Undergraduate and Graduate Studies
NASA Astrophysics Data System (ADS)
Allen, J.; Allen, C.
2012-12-01
The Johnson Space Center (JSC) has the unique responsibility to curate NASA's extraterrestrial samples from past and future missions. Curation includes documentation, preservation, preparation, and distribution of samples for research, education, and public outreach. Studies of rock and soil samples from the Moon and meteorites continue to yield useful information about the early history of the Moon, the Earth, and the inner solar system. Petrographic Thin Section Packages containing polished thin sections of samples from either the Lunar or Meteorite collections have been prepared. Each set of twelve sections of Apollo lunar samples or twelve sections of meteorites is available for loan from JSC. The thin section sets are designed for use in domestic college and university courses in petrology. The loan period is strict and limited to two weeks. Contact Ms. Mary Luckey, Education Sample Curator (email: mary.k.luckey@nasa.gov). Each set of slides is accompanied by teaching materials and a sample disk of representative lunar or meteorite samples. It is important to note that the samples in these sets are not exactly the same as the ones listed here; this list represents one set of samples. A key education resource available on the Curation website is the Antarctic Meteorite Teaching Collection: Educational Meteorite Thin Sections, originally compiled by Bevan French, Glenn McPherson, and Roy Clarke and revised by Kevin Righter in 2010. Curation websites: College and university staff and students are encouraged to access the Lunar Petrographic Thin Section Set Publication and the Meteorite Petrographic Thin Section Package Resource, which feature many thin section images and detailed descriptions of the samples and research results (http://curator.jsc.nasa.gov/Education/index.cfm). Research samples can be requested at http://curator.jsc.nasa.gov/ or via JSC-CURATION-EDUCATION-DISKS@mail.nasa.gov. Lunar Thin Sections; Meteorite Thin Sections.
ChemiRs: a web application for microRNAs and chemicals.
Su, Emily Chia-Yu; Chen, Yu-Sing; Tien, Yun-Cheng; Liu, Jeff; Ho, Bing-Ching; Yu, Sung-Liang; Singh, Sher
2016-04-18
MicroRNAs (miRNAs) are ~22-nucleotide non-coding RNAs that affect various cellular functions and play a regulatory role in different organisms, including humans. Until now, more than 2500 mature human miRNAs have been discovered and registered, but information and algorithms to reveal the relations among miRNAs, environmental chemicals and human health are still lacking. Chemicals in the environment affect our health and daily life, and some of them can lead to diseases by interfering with biological pathways. We developed an online web server, ChemiRs, for predicting interactions and relations among miRNAs, chemicals and pathways. The database not only compares gene lists affected by chemicals and miRNAs, but also incorporates curated pathways to identify possible interactions. Here, we manually retrieved associations of miRNAs and chemicals from the biomedical literature. We developed an online system, ChemiRs, which contains miRNAs, diseases, Medical Subject Headings (MeSH) terms, chemicals, genes, pathways and PubMed IDs. We connected each miRNA to miRBase, and every current gene symbol to the HUGO Gene Nomenclature Committee (HGNC) for genome annotation. Human pathway information is also provided from the KEGG and REACTOME databases. Information about Gene Ontology (GO) is queried from the GO Online SQL Environment (GOOSE). With a user-friendly interface, the web application is easy to use. Multiple query results can be easily integrated and exported as report documents in PDF format. Association analysis of miRNAs and chemicals can help us understand the pathogenesis of chemical components. ChemiRs is freely available for public use at http://omics.biol.ntnu.edu.tw/ChemiRs .
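The association analysis of gene lists mentioned above is commonly summarised with an overlap test; the sketch below applies a hypergeometric test to invented list sizes purely as a generic illustration, not as ChemiRs' actual algorithm.

```python
# Generic illustration of testing whether a chemical-affected gene list and an
# miRNA-affected gene list overlap more than expected by chance.
# Numbers are invented; this is not ChemiRs' actual method.
from scipy.stats import hypergeom

M = 20000   # genes in the background (e.g., annotated human genes)
n = 300     # genes affected by the chemical
N = 250     # predicted targets of the miRNA
k = 18      # genes shared by the two lists

# P(overlap >= k) when N genes are drawn at random from M containing n "successes"
p_value = hypergeom.sf(k - 1, M, n, N)
print(f"overlap of {k} genes, p = {p_value:.2e}")
```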
[Cost variation in care groups?]
Mohnen, S M; Molema, C C M; Steenbeek, W; van den Berg, M J; de Bruin, S R; Baan, C A; Struijs, J N
2017-01-01
Is the simple mean of the costs per diabetes patient a suitable tool with which to compare care groups? Do the total costs of care per diabetes patient really give the best insight into care group performance? Cross-sectional, multi-level study. The 2009 insurance claims of 104,544 diabetes patients managed by care groups in the Netherlands were analysed. The data were obtained from the Vektis care information centre. For each care group we determined the mean costs per patient of all curative care and of diabetes-specific hospital care using the simple mean method, and then repeated the analysis using a generalized linear mixed model. We also calculated what proportion of the differences found could be attributed to the care groups themselves. The mean costs of total curative care per patient were €3,092 - €6,546; there were no significant differences between care groups. The mixed model method resulted in less variation (€2,884 - €3,511), and there were a few significant differences. We found a similar result for diabetes-specific hospital care, and the ranking position of the care groups proved to depend on the method used. The care group effect was limited, although it was greater for the diabetes-specific hospital costs than for the total costs of curative care (6.7% vs. 0.4%). The method used to benchmark care groups carries considerable weight. Simply stated, determining the mean costs of care (still often done) leads to an overestimation of the differences between care groups. The generalized linear mixed model is more accurate and yields better comparisons. However, the fact remains that 'total costs of care' is a faulty indicator, since care groups have little impact on them. A more informative indicator is 'costs of diabetes-specific hospital care', as these costs are more strongly influenced by care groups.
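The contrast between a simple per-group mean and a generalized linear mixed model can be illustrated on synthetic claims data: the mixed model treats care groups as random effects and shrinks extreme group means toward the overall mean. The sketch below uses statsmodels with a Gaussian random-intercept model, which is a simplification of the analysis described, and entirely invented data.

```python
# Simple group means vs. a random-intercept mixed model on synthetic cost data.
# Data and model are illustrative simplifications of the analysis described.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(0)
groups = np.repeat([f"group{i}" for i in range(10)], 50)        # 10 care groups
costs = rng.gamma(shape=2.0, scale=1500.0, size=groups.size)    # skewed cost data
df = pd.DataFrame({"care_group": groups, "cost": costs})

# Naive benchmark: simple mean cost per care group.
print(df.groupby("care_group")["cost"].mean().round(0))

# Random-intercept model: group effects are shrunk toward the overall mean.
model = smf.mixedlm("cost ~ 1", df, groups=df["care_group"]).fit()
print(model.summary())
```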
Feng, Weiwei; Mao, Guanghua; Li, Qian; Wang, Wei; Chen, Yao; Zhao, Ting; Li, Fang; Zou, Ye; Wu, Huiyu; Yang, Liuqing; Wu, Xiangyang
2015-01-01
Aims/Introduction: The present study was designed to evaluate the effect of chromium malate on glycometabolism, glycometabolism-related enzyme levels and lipid metabolism in type 2 diabetic rats, as well as its dose-response and curative effects. Materials and Methods: A rat model of type 2 diabetes was developed, and daily treatment with chromium malate was given for 4 weeks. A rat enzyme-linked immunosorbent assay kit was used to assay changes in glycometabolism, glycometabolism-related enzyme levels and lipid metabolism. Results: The results showed that the antihyperglycemic activity increased with administration of chromium malate in a dose-dependent manner. The serum insulin level, insulin resistance index and C-peptide level of the chromium malate groups at doses of 17.5, 20.0 and 20.8 μg chromium/kg bodyweight were significantly lower than those of the model, chromium trichloride and chromium picolinate groups. The hepatic glycogen, glucose-6-phosphate dehydrogenase and glucokinase levels of the chromium malate groups at doses of 17.5, 20.0 and 20.8 μg chromium/kg bodyweight were significantly higher than those of the model, chromium trichloride and chromium picolinate groups. Chromium malate at doses of 20.0 and 20.8 μg chromium/kg bodyweight significantly changed the total cholesterol, low-density lipoprotein cholesterol, high-density lipoprotein cholesterol, and triglyceride levels compared with the chromium trichloride and chromium picolinate groups. Conclusions: The results showed that chromium malate exhibits greater benefits in treating type 2 diabetes, and that its curative effect is superior to that of chromium trichloride and chromium picolinate. PMID:26221518
Brown, Ramsay A; Swanson, Larry W
2013-09-01
Systematic description and the unambiguous communication of findings and models remain among the unresolved fundamental challenges in systems neuroscience. No common descriptive frameworks exist to describe systematically the connective architecture of the nervous system, even at the grossest level of observation. Furthermore, the accelerating volume of novel data generated on neural connectivity outpaces the rate at which these data are curated into neuroinformatics databases to digitally synthesize systems-level insights from disjointed reports and observations. To help address these challenges, we propose the Neural Systems Language (NSyL). NSyL is a modeling language to be used by investigators to encode and communicate systematically reports of neural connectivity from neuroanatomy and brain imaging. NSyL engenders systematic description and communication of connectivity irrespective of the animal taxon described, experimental or observational technique implemented, or nomenclature referenced. As a language, NSyL is internally consistent, concise, and comprehensible to both humans and computers. NSyL is a promising development for systematizing the representation of neural architecture, effectively managing the increasing volume of data on neural connectivity and streamlining systems neuroscience research. Here we present similar precedent systems, describe how NSyL extends existing frameworks, and explain the reasoning behind NSyL's development. We explore NSyL's potential for balancing robustness and consistency in representation by encoding previously reported assertions of connectivity from the literature as examples. Finally, we propose and discuss the implications of a framework for how NSyL will be digitally implemented in the future to streamline curation of experimental results and bridge the gaps among anatomists, imagers, and neuroinformatics databases. Copyright © 2013 Wiley Periodicals, Inc.
Levin, A; Rahman, M A; Quayyum, Z; Routh, S; Barkat-e-Khuda
2001-01-01
This paper seeks to investigate the determinants of child health care seeking behaviours in rural Bangladesh. In particular, the effects of income, women's access to income, and the prices of obtaining child health care are examined. Data on the use of child curative care were collected in two rural areas of Bangladesh--Abhoynagar Thana of Jessore District and Mirsarai Thana of Chittagong District--in March 1997. In estimating the use of child curative care, the nested multinomial logit specification was used. The results of the analysis indicate that a woman's involvement in a credit union or income generation affected the likelihood that curative child care was used. Household wealth decreased the likelihood that the child had an illness episode and affected the likelihood that curative child care was sought. Among facility characteristics, travel time was statistically significant and was negatively associated with the use of a provider.
Reflections on curative health care in Nicaragua.
Slater, R G
1989-01-01
Improved health care in Nicaragua is a major priority of the Sandinista revolution; it has been pursued by major reforms of the national health care system, something few developing countries have attempted. In addition to its internationally recognized advances in public health, considerable progress has been made in health care delivery by expanding curative medical services through training more personnel and building more facilities to fulfill a commitment to free universal health coverage. The very uneven quality of medical care is the leading problem facing curative medicine now. Underlying factors include the difficulty of adequately training the greatly increased number of new physicians. Misdiagnosis and mismanagement continue to be major problems. The curative medical system is not well coordinated with the preventive sector. Recent innovations include initiation of a "medicina integral" residency, similar to family practice. Despite its inadequacies and the handicaps of war and poverty, the Nicaraguan curative medical system has made important progress. PMID:2705603
Curating blood: how students' and researchers' drawings bring potential phenomena to light
NASA Astrophysics Data System (ADS)
Hay, D. B.; Pitchford, S.
2016-11-01
This paper explores students' and researchers' drawings of white blood cell recruitment. The data combine interviews, an exhibit of review-type academic images, and analyses of student model-drawings. The analysis focuses on the material aspects of bio-scientific data-making, and we use the literature on concrete bioscience modelling to differentiate the qualities of students' model-making choices: novelty versus reproduction; completeness versus simplicity; and the achievement of similarity towards selected model targets. We show that, while drawing on already published images, some third-year undergraduates are able to curate novel and yet plausible causal channels in their graphic representations, implicating new phenomenal potentials as lead researchers do in their review-type academic publications. Our work links the virtues of drawing to learn to the disclosure of potential epistemic things, involving close attention to the contours of non-linguistic stuff and corresponding sensory perception of substance; space; time; shape and size; position; and force. The paper documents the authority and power students may achieve through making knowledge rather than repeating it. We show the ways in which drawing on the images elicited by others helps to develop physical, sensory, and sometimes affective relations towards the real and concrete world of scientific practice.
Interleukins and their signaling pathways in the Reactome biological pathway database.
Jupe, Steve; Ray, Keith; Roca, Corina Duenas; Varusai, Thawfeek; Shamovsky, Veronica; Stein, Lincoln; D'Eustachio, Peter; Hermjakob, Henning
2018-04-01
There is a wealth of biological pathway information available in the scientific literature, but it is spread across many thousands of publications. Alongside publications that contain definitive experimental discoveries are many others that have been dismissed as spurious, found to be irreproducible, or are contradicted by later results and consequently now considered controversial. Many descriptions and images of pathways are incomplete stylized representations that assume the reader is an expert and familiar with the established details of the process, which are consequently not fully explained. Pathway representations in publications frequently do not represent a complete, detailed, and unambiguous description of the molecules involved; their precise posttranslational state; or a full account of the molecular events they undergo while participating in a process. Although this might be sufficient to be interpreted by an expert reader, the lack of detail makes such pathways less useful and difficult to understand for anyone unfamiliar with the area and of limited use as the basis for computational models. Reactome was established as a freely accessible knowledge base of human biological pathways. It is manually populated with interconnected molecular events that fully detail the molecular participants linked to published experimental data and background material by using a formal and open data structure that facilitates computational reuse. These data are accessible on a Web site in the form of pathway diagrams that have descriptive summaries and annotations and as downloadable data sets in several formats that can be reused with other computational tools. The entire database and all supporting software can be downloaded and reused under a Creative Commons license. Pathways are authored by expert biologists who work with Reactome curators and editorial staff to represent the consensus in the field. Pathways are represented as interactive diagrams that include as much molecular detail as possible and are linked to literature citations that contain supporting experimental details. All newly created events undergo a peer-review process before they are added to the database and made available on the associated Web site. New content is added quarterly. The 63rd release of Reactome in December 2017 contains 10,996 human proteins participating in 11,426 events in 2,179 pathways. In addition, analytic tools allow data set submission for the identification and visualization of pathway enrichment and representation of expression profiles as an overlay on Reactome pathways. Protein-protein and compound-protein interactions from several sources, including custom user data sets, can be added to extend pathways. Pathway diagrams and analytic result displays can be downloaded as editable images, human-readable reports, and files in several standard formats that are suitable for computational reuse. Reactome content is available programmatically through a REpresentational State Transfer (REST)-based content service and as a Neo4J graph database. Signaling pathways for IL-1 to IL-38 are hierarchically classified within the pathway "signaling by interleukins." The classification used is largely derived from Akdis et al. The addition to Reactome of a complete set of the known human interleukins, their receptors, and established signaling pathways linked to annotations of relevant aspects of immune function provides a significant computationally accessible resource of information about this important family. 
This information can be extended easily as new discoveries become accepted as the consensus in the field. A key aim for the future is to increase coverage of gene expression changes induced by interleukin signaling. Copyright © 2018 The Authors. Published by Elsevier Inc. All rights reserved.
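As noted above, Reactome content is also available programmatically; a minimal REST request against the content service might look like the sketch below. The endpoint path and the stable identifier (assumed here to correspond to "Signaling by Interleukins") are assumptions based on the public documentation and should be verified against the current API before use.

```python
# Minimal sketch of querying the Reactome content service over REST.
# The endpoint path and stable ID below are assumptions drawn from the public
# documentation (R-HSA-449147 is taken to be "Signaling by Interleukins");
# check both against the current API before relying on them.
import requests

BASE = "https://reactome.org/ContentService"
pathway_id = "R-HSA-449147"

resp = requests.get(f"{BASE}/data/query/{pathway_id}", timeout=30)
resp.raise_for_status()
entry = resp.json()

print(entry.get("displayName"), "-", entry.get("stId"))
```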
Deng, Yiqi; Zhu, Lingjuan; Cai, Haoyang; Wang, Guan; Liu, Bo
2018-06-01
Autophagy, a highly conserved lysosomal degradation process in eukaryotic cells, can digest long-lived proteins and damaged organelles through vesicular trafficking pathways. The mechanisms of autophagy have gradually been elucidated, and the discovery of small-molecule drugs targeting autophagy has therefore been drawing increasing attention. Several autophagy-related web servers are already available online to help scientists conveniently obtain information relevant to autophagy, such as HADb, CTLPScanner, the iLIR server and ncRDeathDB. However, to the best of our knowledge, there is no web server dedicated to autophagy-modulating compounds. Based on published articles, all the compounds and their relations with autophagy were systematically analyzed. Subsequently, an online Autophagic Compound Database (ACDB) (http://www.acdbliulab.com/) was constructed, which contains information on 357 compounds with 164 corresponding signalling pathways and potential targets in different diseases. We collected a great deal of information on autophagy-modulating compounds, including the compounds themselves, their targets/pathways and the associated diseases. ACDB is a valuable resource giving users access to more than 300 curated small-molecule compounds correlated with autophagy, and it will facilitate the discovery of novel therapeutic drugs in the near future. © 2017 John Wiley & Sons Ltd.
Accelerating Adverse Outcome Pathway (AOP) development ...
The Adverse Outcome Pathway (AOP) framework is increasingly being adopted as a tool for organizing and summarizing the mechanistic information connecting molecular perturbations by environmental stressors with adverse outcomes relevant for ecological and human health outcomes. However, the conventional process for assembly of these AOPs is time and resource intensive, and has been a rate limiting step for AOP use and development. Therefore computational approaches to accelerate the process need to be developed. We previously developed a method for generating computationally predicted AOPs (cpAOPs) by association mining and integration of data from publicly available databases. In this work, a cpAOP network of ~21,000 associations was established between 105 phenotypes from TG-GATEs rat liver data from different time points (including microarray, pathological effects and clinical chemistry data), 994 REACTOME pathways, 688 High-throughput assays from ToxCast and 194 chemicals. A second network of 128,536 associations was generated by connecting 255 biological target genes from ToxCast to 4,980 diseases from CTD using either HT screening activity from ToxCast for 286 chemicals or CTD gene expression changes in response to 2,330 chemicals. Both networks were separately evaluated through manual extraction of disease-specific cpAOPs and comparison with expert curation of the relevant literature. By employing data integration strategies that involve the weighting of n
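The association networks described can be thought of as a multipartite graph in which chemicals, genes or assays, pathways and phenotypes are nodes and mined associations are edges; candidate cpAOPs are then chains (paths) through that graph. The toy sketch below, with invented node names, only illustrates that idea and is not the published mining procedure.

```python
# Toy illustration of assembling candidate cpAOP chains as paths through a
# multipartite association network (invented nodes; not the published method).
import networkx as nx

G = nx.Graph()
G.add_edges_from([
    ("chemicalA", "geneX"),        # chemical-gene association (e.g., assay hit)
    ("geneX", "pathwayP"),         # gene-pathway membership
    ("pathwayP", "phenotypeQ"),    # pathway-phenotype association
    ("chemicalA", "geneY"),
])

# Candidate chains from a chemical to an apical phenotype of interest.
for path in nx.all_simple_paths(G, "chemicalA", "phenotypeQ", cutoff=4):
    print(" -> ".join(path))
```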
Lu, Chi; Xie, Conghua
2016-06-01
Radiotherapy is an important treatment modality for esophageal cancer; however, the clinical efficacy of radiotherapy is limited by tumor radioresistance. In the present study, we explored the hypothesis that radiation induces tumor cell autophagy as a cytoprotective adaptive response, which depends on liver kinase B1 (LKB1) also known as serine/threonine kinase 11 (STK11). Radiation-induced Eca-109 cell autophagy was found to be dependent on signaling through the LKB1 pathway, and autophagy inhibitors that disrupted radiation-induced Eca-109 cell autophagy increased cell cycle arrest and cell death in vitro. Inhibition of autophagy also reduced the clonogenic survival of the Eca-109 cells. When treated with radiation alone, human esophageal carcinoma xenografts showed increased LC3B and p-LKB1 expression, which was decreased by the autophagy inhibitor chloroquine. In vivo inhibition of autophagy disrupted tumor growth and increased tumor apoptosis when combined with 6 Gy of ionizing radiation. In summary, our findings elucidate a novel mechanism of resistance to radiotherapy in which radiation-induced autophagy, via the LKB1 pathway, promotes tumor cell survival. This indicates that inhibition of autophagy can serve as an adjuvant treatment to improve the curative effect of radiotherapy.
SwissPalm: Protein Palmitoylation database.
Blanc, Mathieu; David, Fabrice; Abrami, Laurence; Migliozzi, Daniel; Armand, Florence; Bürgi, Jérôme; van der Goot, Françoise Gisou
2015-01-01
Protein S-palmitoylation is a reversible post-translational modification that regulates many key biological processes, although the full extent and functions of protein S-palmitoylation remain largely unexplored. Recent developments of new chemical methods have allowed the establishment of palmitoyl-proteomes of a variety of cell lines and tissues from different species. As the amount of information generated by these high-throughput studies increases, the field requires centralization and comparison of this information. Here we present SwissPalm (http://swisspalm.epfl.ch), our open, comprehensive, manually curated resource to study protein S-palmitoylation. It currently encompasses more than 5000 S-palmitoylated protein hits from seven species, and contains more than 500 specific sites of S-palmitoylation. SwissPalm also provides curated information and filters that increase the confidence in true positive hits, and integrates predictions of S-palmitoylated cysteine scores, orthologs and isoform multiple alignments. Systems analysis of the palmitoyl-proteome screens indicates that 10% or more of the human proteome is susceptible to S-palmitoylation. Moreover, ontology and pathway analyses of the human palmitoyl-proteome reveal that key biological functions involve this reversible lipid modification. Comparative analysis finally shows a strong crosstalk between S-palmitoylation and other post-translational modifications. Through the compilation of data and continuous updates, SwissPalm will provide a powerful tool to unravel the global importance of protein S-palmitoylation. PMID:26339475
Chen, I-Min A; Markowitz, Victor M; Palaniappan, Krishna; Szeto, Ernest; Chu, Ken; Huang, Jinghua; Ratner, Anna; Pillay, Manoj; Hadjithomas, Michalis; Huntemann, Marcel; Mikhailova, Natalia; Ovchinnikova, Galina; Ivanova, Natalia N; Kyrpides, Nikos C
2016-04-26
The exponential growth of genomic data from next-generation technologies renders the traditional manual expert curation effort unsustainable. Many genomic systems have included community annotation tools to address the problem. Most of these systems adopted a "Wiki-based" approach to take advantage of existing wiki technologies, but encountered obstacles in issues such as usability, authorship recognition, information reliability and incentives for community participation. Here, we present a different approach, relying on a tightly integrated method rather than a "Wiki-based" one, to support community annotation and user collaboration in the Integrated Microbial Genomes (IMG) system. The IMG approach allows users to use the existing IMG data warehouse and analysis tools to add gene, pathway and biosynthetic cluster annotations, to analyze and reorganize contigs, genes and functions using workspace datasets, and to share private user annotations and workspace datasets with collaborators. We show that the annotation effort using IMG can be part of the research process, overcoming the user incentive and authorship recognition problems and thus fostering collaboration among domain experts. The usability and reliability issues are addressed by the integration of curated information and analysis tools in IMG, together with DOE Joint Genome Institute (JGI) expert review. By incorporating annotation operations into IMG, we provide an integrated environment for users to perform deeper and extended data analysis and annotation in a single system, which can lead to publications and community knowledge sharing, as shown in the case studies.
Data and the Shift in Systems, Services, and Literacy
ERIC Educational Resources Information Center
Mitchell, Erik T.
2012-01-01
This month, the "Journal of Web Librarianship" is exploring the idea of data curation and its uses in libraries. The word "data" is as universal now as the word "cloud" was last year, and it is no accident that libraries are exploring how best to support data curation services. Data curation involves library activities in just about every way,…
ERIC Educational Resources Information Center
Hodge, Zach
2017-01-01
Tullahoma City Schools, a rural district in Middle Tennessee, recently switched from traditional static textbooks to an online, open educational resource platform. As a result of this change the role of curator, a teacher who creates the Flexbook by compiling and organizing content, was created. This research project sought to add to the limited…
Jointly creating digital abstracts: dealing with synonymy and polysemy
2012-01-01
Background: Ideally each Life Science article should get a 'structured digital abstract'. This is a structured summary of the paper's findings that is both human-verified and machine-readable. But articles can contain a large variety of information types and contextual details that all need to be reconciled with appropriate names, terms and identifiers, which poses a challenge to any curator. Current approaches mostly use tagging or limited entry-forms for semantic encoding. Findings: We implemented a 'controlled language' as a more expressive representation method. We studied how usable this format was for wet-lab biologists who volunteered as curators. We assessed some issues that arise with the usability of ontologies and other controlled vocabularies for the encoding of structured information by 'untrained' curators. We take a user-oriented viewpoint and make recommendations that may prove useful for creating a better curation environment: one that can engage a large community of volunteer curators. Conclusions: Entering information in a biocuration environment could improve in expressiveness and user-friendliness if curators were enabled to use synonymous and polysemous terms literally, whereby each term stays linked to an identifier. PMID:23110757
Gramene 2013: Comparative plant genomics resources
USDA-ARS?s Scientific Manuscript database
Gramene (http://www.gramene.org) is a curated online resource for comparative functional genomics in crops and model plant species, currently hosting 27 fully and 10 partially sequenced reference genomes in its build number 38. Its strength derives from the application of a phylogenetic framework fo...
Advancing Models and Data for Characterizing Exposures to Chemicals in Consumer Products
EPA’s Office of Research and Development (ORD) is leading several efforts to develop data and methods for estimating population chemical exposures related to the use of consumer products. New curated chemical, ingredient, and product use information are being collected fro...
Mouse Genome Database: From sequence to phenotypes and disease models
Richardson, Joel E.; Kadin, James A.; Smith, Cynthia L.; Blake, Judith A.; Bult, Carol J.
2015-01-01
Summary The Mouse Genome Database (MGD, www.informatics.jax.org) is the international scientific database for genetic, genomic, and biological data on the laboratory mouse to support the research requirements of the biomedical community. To accomplish this goal, MGD provides broad data coverage, serves as the authoritative standard for mouse nomenclature for genes, mutants, and strains, and curates and integrates many types of data from literature and electronic sources. Among the key data sets MGD supports are: the complete catalog of mouse genes and genome features, comparative homology data for mouse and vertebrate genes, the authoritative set of Gene Ontology (GO) annotations for mouse gene functions, a comprehensive catalog of mouse mutations and their phenotypes, and a curated compendium of mouse models of human diseases. Here, we describe the data acquisition process, specifics about MGD's key data areas, methods to access and query MGD data, and outreach and user help facilities. genesis 53:458–473, 2015. © 2015 The Authors. Genesis Published by Wiley Periodicals, Inc. PMID:26150326
GMODWeb: a web framework for the generic model organism database
O'Connor, Brian D; Day, Allen; Cain, Scott; Arnaiz, Olivier; Sperling, Linda; Stein, Lincoln D
2008-01-01
The Generic Model Organism Database (GMOD) initiative provides species-agnostic data models and software tools for representing curated model organism data. Here we describe GMODWeb, a GMOD project designed to speed the development of model organism database (MOD) websites. Sites created with GMODWeb provide integration with other GMOD tools and allow users to browse and search through a variety of data types. GMODWeb was built using the open source Turnkey web framework and is available from . PMID:18570664
The lung in liver disease: old problem, new concepts.
Fallon, Michael B; Zhang, Junlan
2013-01-01
Liver dysfunction has been recognized to influence the lung in many different clinical situations, although the mechanisms for these effects are not well understood. One increasingly recognized interaction, the hepatopulmonary syndrome (HPS) occurs in the context of cirrhosis and results when alveolar microvascular dilation causes arterial gas exchange abnormalities and hypoxemia. HPS occurs in up to 30% of patients with cirrhosis and significantly increases mortality in affected patients. Currently, liver transplantation is the only curative therapy. Experimental biliary cirrhosis induced by common bile duct ligation (CBDL) in the rat reproduces the pulmonary vascular and gas exchange abnormalities of human HPS and has been contrasted with other experimental models of cirrhosis in which HPS does not develop. Microvascular dilation, intravascular monocyte infiltration, and angiogenesis in the lung have been identified as pathologic features that drive gas exchange abnormalities in experimental HPS. Our recent studies have identified biliary epithelium and activation and interaction between the endothelin-1 (ET-1)/endothelial endothelin B (ETB) receptor and CX3CL1/CX3CR1 pathways as important mechanisms for the observed pathologic events. These studies define novel interactions between the lung and liver in cirrhosis and may lead to effective medical therapies.
NASA Technical Reports Server (NTRS)
Snead, C. J.; McCubbin, F. M.; Nakamura-Messenger, K.; Righter, K.
2018-01-01
The Astromaterials Acquisition and Curation office at NASA Johnson Space Center has established an Advanced Curation program that is tasked with developing procedures, technologies, and data sets necessary for the curation of future astromaterials collections as envisioned by NASA exploration goals. One particular objective of the Advanced Curation program is the development of new methods for the collection, storage, handling and characterization of small (less than 100 micrometer) particles. Astromaterials Curation currently maintains four small particle collections: Cosmic Dust that has been collected in Earth's stratosphere by ER-2 and WB-57 aircraft, Comet 81P/Wild 2 dust returned by NASA's Stardust spacecraft, interstellar dust that was returned by Stardust, and asteroid Itokawa particles that were returned by JAXA's Hayabusa spacecraft. NASA Curation is currently preparing for the anticipated return of two new astromaterials collections: asteroid Ryugu regolith to be collected by the Hayabusa2 spacecraft in 2021 (samples will be provided by JAXA as part of an international agreement), and asteroid Bennu regolith to be collected by the OSIRIS-REx spacecraft and returned in 2023. A substantial portion of these returned samples is expected to consist of small particle components, and mission requirements necessitate the development of new processing tools and methods in order to maximize the scientific yield from these valuable acquisitions. Here we describe initial progress towards the development of applicable sample handling methods for the successful curation of future small particle collections.
Dom, Martin; Offner, Fritz; Vanden Berghe, Wim; Van Ostade, Xaveer
2018-05-15
Withaferin A (WA), a natural steroid lactone from the plant Withania somnifera, is often studied because of its antitumor properties. Although many in vitro and in vivo studies have been performed, the identification of Withaferin A protein targets and its mechanism of antitumor action remain incomplete. We used quantitative chemoproteomics and differential protein expression analysis to characterize the antitumor effects of WA on a multiple myeloma cell model. Identified relevant targets were further validated by Ingenuity Pathway Analysis and Western blot, and indicate that WA targets protein networks that are specific for monoclonal gammopathy of undetermined significance (MGUS) and other closely related disorders, such as multiple myeloma (MM) and Waldenström macroglobulinemia (WM). By blocking the PSMB10 proteasome subunit, downregulating ANXA4, potentially associating with HDAC6 and upregulating HMOX1, WA puts a massive blockage on both the proteotoxic and oxidative stress response pathways, leaving cancer cells defenseless against WA-induced stresses. These results indicate that WA-mediated apoptosis is preceded by simultaneous targeting of cellular stress response pathways such as proteasome degradation, autophagy and the unfolded protein stress response, and thus suggest that WA can be used as an effective treatment for MGUS and other closely related disorders. Multifunctional antitumor compounds are of great potential, since they reduce the risk of multidrug resistance in chemotherapy. Unfortunately, characterization of all protein targets of a multifunctional compound is lacking. Therefore, we optimized a SILAC quantitative chemoproteomics workflow to identify the potential protein targets of Withaferin A (WA), a natural multifunctional compound with promising antitumor properties. To further understand the antitumor mechanisms of WA, we performed a differential protein expression analysis and combined the altered expression data with the chemoproteome WA target data in the highly curated Ingenuity Pathway database. We provide a first global overview of how WA kills multiple myeloma cancer cells and a starting point for further in-depth experiments. Furthermore, the combined approach can be used for other types of cancer and/or other promising multifunctional compounds, thereby increasing the potential development of new antitumor therapies. Copyright © 2018 Elsevier B.V. All rights reserved.
NASA Astrophysics Data System (ADS)
Elger, Kirsten; Ulbricht, Damian; Bertelmann, Roland
2017-04-01
Open access to research data is an increasing international demand and includes not only data underlying scholarly publications, but also raw and curated data. Especially with the observed shift in many scientific fields towards data science and data mining, data repositories are becoming important players as data archives and access points to curated research data. While general and institutional data repositories are available across all scientific disciplines, domain-specific data repositories are specialised for particular disciplines, e.g. the bio- or geosciences, with the possibility to use more discipline-specific and richer metadata models than general repositories. Data publication is increasingly regarded as an important scientific achievement, and datasets with a digital object identifier (DOI) are now fully citable in journal articles. Moreover, following their signature of the "Statement of Commitment of the Coalition on Publishing Data in the Earth and Space Sciences" (COPDESS), many publishers have adapted their data policies and recommend, and even request, that data underlying scholarly publications be stored and published in (domain-specific) data repositories rather than as classical supplementary material directly attached to the respective article. The curation of large dynamic data from global networks in, e.g., seismology, magnetics or geodesy has always required a high degree of professional, IT-supported data management, simply to be able to store and access the huge number of files and manage dynamic datasets. In contrast, the vast amount of research data acquired by individual investigators or small teams, known as 'long-tail data', has often not been the focus of data curation infrastructures. Nevertheless, even though these datasets are small in size and highly variable, in total they represent a significant portion of the overall scientific output. The curation of long-tail data requires more individual approaches and personal involvement of the data curator, especially regarding the data description. Here we introduce best practices for the publication of long-tail data that help to reduce the individual effort and improve the quality of the data description. The data repository of GFZ Data Services, hosted at the GFZ German Research Centre for Geosciences in Potsdam, is a domain-specific data repository for the geosciences. In addition to large dynamic datasets from different disciplines, it has a strong focus on the DOI-referenced publication of long-tail data, with the aim of reaching a high degree of reusability through a comprehensive data description and at the same time providing and distributing standardised, machine-actionable metadata for data discovery (FAIR data). The development of templates for data reports, metadata provision by scientists via an XML Metadata Editor, and discipline-specific DOI landing pages help both the data curators to handle all kinds of datasets and the scientists, i.e. the users, to quickly decide whether a published dataset fulfils their needs. In addition, GFZ Data Services has developed DOI-registration services for several international networks (e.g. ICGEM, World Stress Map, IGETS), as well as project- or network-specific designs of the DOI landing pages carrying the logo or design of the respective networks or projects.
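To illustrate the idea of standardised, machine-actionable metadata mentioned above, here is a minimal sketch that assembles a DataCite-flavoured XML record for a hypothetical long-tail dataset. All field values are invented, and this is not the schema or metadata editor actually used by GFZ Data Services.

```python
import xml.etree.ElementTree as ET

# A minimal, DataCite-style metadata record for a hypothetical dataset.
# Element names loosely follow the DataCite kernel; values are placeholders.
resource = ET.Element("resource")
ET.SubElement(resource, "identifier", identifierType="DOI").text = "10.0000/example-doi"
creators = ET.SubElement(resource, "creators")
creator = ET.SubElement(creators, "creator")
ET.SubElement(creator, "creatorName").text = "Doe, Jane"
ET.SubElement(resource, "title").text = "Hypothetical long-tail geoscience dataset"
ET.SubElement(resource, "publisher").text = "Example Data Repository"
ET.SubElement(resource, "publicationYear").text = "2017"
ET.SubElement(resource, "resourceType", resourceTypeGeneral="Dataset").text = "Dataset"

print(ET.tostring(resource, encoding="unicode"))
```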
SPIKE – a database, visualization and analysis tool of cellular signaling pathways
Elkon, Ran; Vesterman, Rita; Amit, Nira; Ulitsky, Igor; Zohar, Idan; Weisz, Mali; Mass, Gilad; Orlev, Nir; Sternberg, Giora; Blekhman, Ran; Assa, Jackie; Shiloh, Yosef; Shamir, Ron
2008-01-01
Background Biological signaling pathways that govern cellular physiology form an intricate web of tightly regulated interlocking processes. Data on these regulatory networks are accumulating at an unprecedented pace. The assimilation, visualization and interpretation of these data have become a major challenge in biological research, and once met, will greatly boost our ability to understand cell functioning on a systems level. Results To cope with this challenge, we are developing the SPIKE knowledge-base of signaling pathways. SPIKE contains three main software components: 1) A database (DB) of biological signaling pathways. Carefully curated information from the literature and data from large public sources constitute distinct tiers of the DB. 2) A visualization package that allows interactive graphic representations of regulatory interactions stored in the DB and superposition of functional genomic and proteomic data on the maps. 3) An algorithmic inference engine that analyzes the networks for novel functional interplays between network components. SPIKE is designed and implemented as a community tool and therefore provides a user-friendly interface that allows registered users to upload data to SPIKE DB. Our vision is that the DB will be populated by a distributed and highly collaborative effort undertaken by multiple groups in the research community, where each group contributes data in its field of expertise. Conclusion The integrated capabilities of SPIKE make it a powerful platform for the analysis of signaling networks and the integration of knowledge on such networks with omics data. PMID:18289391
Consensus and conflict cards for metabolic pathway databases
2013-01-01
Background The metabolic network of H. sapiens and many other organisms is described in multiple pathway databases. The level of agreement between these descriptions, however, has proven to be low. We can use these different descriptions to our advantage by identifying conflicting information and combining their knowledge into a single, more accurate, and more complete description. This task is, however, far from trivial. Results We introduce the concept of Consensus and Conflict Cards (C2Cards) to provide concise overviews of what the databases do or do not agree on. Each card is centered at a single gene, EC number or reaction. These three complementary perspectives make it possible to distinguish disagreements on the underlying biology of a metabolic process from differences that can be explained by different decisions on how and in what detail to represent knowledge. As a proof-of-concept, we implemented C2CardsHuman, as a web application http://www.molgenis.org/c2cards, covering five human pathway databases. Conclusions C2Cards can contribute to ongoing reconciliation efforts by simplifying the identification of consensus and conflicts between pathway databases and lowering the threshold for experts to contribute. Several case studies illustrate the potential of the C2Cards in identifying disagreements on the underlying biology of a metabolic process. The overviews may also point out controversial biological knowledge that should be subject of further research. Finally, the examples provided emphasize the importance of manual curation and the need for a broad community involvement. PMID:23803311
Consensus and conflict cards for metabolic pathway databases.
Stobbe, Miranda D; Swertz, Morris A; Thiele, Ines; Rengaw, Trebor; van Kampen, Antoine H C; Moerland, Perry D
2013-06-26
The metabolic network of H. sapiens and many other organisms is described in multiple pathway databases. The level of agreement between these descriptions, however, has proven to be low. We can use these different descriptions to our advantage by identifying conflicting information and combining their knowledge into a single, more accurate, and more complete description. This task is, however, far from trivial. We introduce the concept of Consensus and Conflict Cards (C₂Cards) to provide concise overviews of what the databases do or do not agree on. Each card is centered at a single gene, EC number or reaction. These three complementary perspectives make it possible to distinguish disagreements on the underlying biology of a metabolic process from differences that can be explained by different decisions on how and in what detail to represent knowledge. As a proof-of-concept, we implemented C₂Cards(Human), as a web application http://www.molgenis.org/c2cards, covering five human pathway databases. C₂Cards can contribute to ongoing reconciliation efforts by simplifying the identification of consensus and conflicts between pathway databases and lowering the threshold for experts to contribute. Several case studies illustrate the potential of the C₂Cards in identifying disagreements on the underlying biology of a metabolic process. The overviews may also point out controversial biological knowledge that should be subject of further research. Finally, the examples provided emphasize the importance of manual curation and the need for a broad community involvement.
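As a toy illustration of the consensus/conflict idea behind C2Cards, the sketch below tabulates, for a single EC number, which reactions different pathway databases assign to it. The database names refer to real resources, but the reaction assignments are invented and the code does not reproduce the C2Cards implementation.

```python
# Toy example: which reactions do different pathway databases assign to one EC number?
# The assignments below are invented purely to illustrate the consensus/conflict idea.
assignments = {
    "KEGG":     {"R00200", "R00703"},
    "Reactome": {"R00200"},
    "HumanCyc": {"R00200", "R00703", "R01015"},
}

all_reactions = set.union(*assignments.values())
for rxn in sorted(all_reactions):
    supporting = [db for db, rxns in assignments.items() if rxn in rxns]
    status = "consensus" if len(supporting) == len(assignments) else "conflict"
    print(f"{rxn}: {status} (supported by {', '.join(supporting)})")
```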
A crystallographic perspective on sharing data and knowledge
NASA Astrophysics Data System (ADS)
Bruno, Ian J.; Groom, Colin R.
2014-10-01
The crystallographic community is in many ways an exemplar of the benefits and practices of sharing data. Since the inception of the technique, virtually every published crystal structure has been made available to others. This has been achieved through the establishment of several specialist data centres, including the Cambridge Crystallographic Data Centre, which produces the Cambridge Structural Database. Containing curated structures of small organic molecules, some containing a metal, the database has been produced for almost 50 years. This has required the development of complex informatics tools and an environment allowing expert human curation. As importantly, a financial model has evolved which has, to date, ensured the sustainability of the resource. However, the opportunities afforded by technological changes and changing attitudes to sharing data make it an opportune moment to review current practices.
Lyu, Ming; Cui, Ying; Zhao, Tiechan; Ning, Zhaochen; Ren, Jie; Jin, Xingpiao; Fan, Guanwei; Zhu, Yan
2018-01-01
Shuxuening injection (SXNI) is a widely prescribed herbal medicine of Ginkgo biloba extract (EGB) for cerebral and cardiovascular diseases in China. However, its curative effects on ischemic stroke and heart diseases and the underlying mechanisms remain unknown. Taking an integrated approach of RNA-seq and network pharmacology analysis, we compared transcriptome profiles of brain and heart ischemia reperfusion injury in C57BL/6J mice to identify common and differential target genes of SXNI. Models for myocardial ischemia reperfusion injury (MIRI) by ligating the left anterior descending coronary artery (LAD) for 30 min ischemia and 24 h reperfusion and cerebral ischemia reperfusion injury (CIRI) by middle cerebral artery occlusion (MCAO) for 90 min ischemia and 24 h reperfusion were employed to identify the common mechanisms of SXNI on both cerebral and myocardial ischemia reperfusion. In the CIRI model, ischemic infarct volume was markedly decreased after pre-treatment with SXNI at 0.5, 2.5, and 12.5 mL/kg. In the MIRI model, pre-treatment with SXNI at 2.5 and 12.5 mL/kg improved cardiac function and coronary blood flow and decreased myocardial infarction area. In addition, SXNI at 2.5 mL/kg also markedly reduced the levels of LDH, AST, CK-MB, and CK in serum. RNA-seq analysis identified 329 differentially expressed genes (DEGs) in brain and 94 DEGs in heart after SXNI treatment in the CIRI and MIRI models, respectively. Core analysis by Ingenuity Pathway Analysis (IPA) revealed that atherosclerosis signaling and inflammatory response were top-ranked in the target profiles for both CIRI and MIRI after pre-treatment with SXNI. Specifically, Tnfrsf12a was recognized as an important common target, and was regulated by SXNI in CIRI and MIRI. In conclusion, our study showed that SXNI effectively protects brain and heart from I/R injuries via a common Tnfrsf12a-mediated pathway involving atherosclerosis signaling and inflammatory response. It provides novel knowledge of the active ingredients of Ginkgo biloba for future clinical application in cardio-cerebral vascular diseases. PMID:29681850
In vivo and in silico determination of essential genes of Campylobacter jejuni.
Metris, Aline; Reuter, Mark; Gaskin, Duncan J H; Baranyi, Jozsef; van Vliet, Arnoud H M
2011-11-01
In the United Kingdom, the thermophilic Campylobacter species C. jejuni and C. coli are the most frequent causes of food-borne gastroenteritis in humans. While campylobacteriosis is usually a relatively mild infection, it has a significant public health and economic impact, and possible complications include reactive arthritis and the autoimmune disease Guillain-Barré syndrome. The rapid developments in "omics" technologies have resulted in the availability of diverse datasets allowing predictions of metabolism and physiology of pathogenic micro-organisms. When combined, these datasets may allow for the identification of potential weaknesses that can be used for development of new antimicrobials to reduce or eliminate C. jejuni and C. coli from the food chain. A metabolic model of C. jejuni was constructed using the annotation of the NCTC 11168 genome sequence, a published model of the related bacterium Helicobacter pylori, and extensive literature mining. Using this model, we applied in silico Flux Balance Analysis (FBA) to determine key metabolic routes that are essential for generating energy and biomass, thus creating a list of genes potentially essential for growth under laboratory conditions. To complement this in silico approach, candidate essential genes have been determined using a whole genome transposon mutagenesis method. FBA and transposon mutagenesis (both this study and a published study) predict a similar number of essential genes (around 200). The analysis of the intersection between the three approaches highlights the shikimate pathway, where genes are predicted to be essential by one or more methods and tend to be network hubs (based on a previously published Campylobacter protein-protein interaction network), and could therefore be targets for novel antimicrobial therapy. We have constructed the first curated metabolic model for the food-borne pathogen Campylobacter jejuni and have presented the resulting metabolic insights. We have shown that the combination of in silico and in vivo approaches could point to non-redundant, indispensable genes associated with the well characterised shikimate pathway, and also genes of unknown function specific to C. jejuni, which are all potential novel Campylobacter intervention targets.
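To make the FBA-based essentiality screen concrete, here is a minimal sketch using the COBRApy library: each gene is knocked out in turn and the model re-optimized. The SBML file name and the 5% growth cutoff are placeholders, and this is not the authors' exact protocol.

```python
import math

import cobra

# Load a genome-scale metabolic model (the SBML file name is a placeholder).
model = cobra.io.read_sbml_model("c_jejuni_model.xml")
wild_type_growth = model.slim_optimize()

essential = []
for gene in model.genes:
    with model:                       # knockouts are reverted when the block exits
        gene.knock_out()
        growth = model.slim_optimize(error_value=float("nan"))
    # Call a gene "essential" if its knockout drops growth below 5% of wild type
    # (an arbitrary, illustrative cutoff).
    if math.isnan(growth) or growth < 0.05 * wild_type_growth:
        essential.append(gene.id)

print(f"{len(essential)} genes predicted essential for this medium and objective")
```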
Astromaterials Acquisition and Curation Office (KT) Overview
NASA Technical Reports Server (NTRS)
Allen, Carlton
2014-01-01
The Astromaterials Acquisition and Curation Office has the unique responsibility to curate NASA's extraterrestrial samples - from past and forthcoming missions - into the indefinite future. Currently, curation includes documentation, preservation, physical security, preparation, and distribution of samples from the Moon, asteroids, comets, the solar wind, and the planet Mars. Each of these sample sets has a unique history and comes from a unique environment. The curation laboratories and procedures developed over 40 years have proven both necessary and sufficient to serve the evolving needs of a worldwide research community. A new generation of sample return missions to destinations across the solar system is being planned and proposed. The curators are developing the tools and techniques to meet the challenges of these new samples. Extraterrestrial samples pose unique curation requirements. These samples were formed and exist under conditions strikingly different from those on the Earth's surface. Terrestrial contamination would destroy much of the scientific significance of extraterrestrial materials. To preserve the research value of these precious samples, contamination must be minimized, understood, and documented. In addition, the samples must be preserved - as far as possible - from physical and chemical alteration. The elaborate curation facilities at JSC were designed and constructed, and have been operated for many years, to keep sample contamination and alteration to a minimum. Currently, JSC curates seven collections of extraterrestrial samples: (a) Lunar rocks and soils collected by the Apollo astronauts, (b) Meteorites collected on dedicated expeditions to Antarctica, (c) Cosmic dust collected by high-altitude NASA aircraft, (d) Solar wind atoms collected by the Genesis spacecraft, (e) Comet particles collected by the Stardust spacecraft, (f) Interstellar dust particles collected by the Stardust spacecraft, and (g) Asteroid soil particles collected by the Japan Aerospace Exploration Agency (JAXA) Hayabusa spacecraft. Each of these sample sets has a unique history and comes from a unique environment. We have developed specialized laboratories and practices over many years to preserve and protect the samples, not only for current research but for studies that may be carried out in the indefinite future.
Mapping transcription factor interactome networks using HaloTag protein arrays.
Yazaki, Junshi; Galli, Mary; Kim, Alice Y; Nito, Kazumasa; Aleman, Fernando; Chang, Katherine N; Carvunis, Anne-Ruxandra; Quan, Rosa; Nguyen, Hien; Song, Liang; Alvarez, José M; Huang, Shao-Shan Carol; Chen, Huaming; Ramachandran, Niroshan; Altmann, Stefan; Gutiérrez, Rodrigo A; Hill, David E; Schroeder, Julian I; Chory, Joanne; LaBaer, Joshua; Vidal, Marc; Braun, Pascal; Ecker, Joseph R
2016-07-19
Protein microarrays enable investigation of diverse biochemical properties for thousands of proteins in a single experiment, an unparalleled capacity. Using a high-density system called HaloTag nucleic acid programmable protein array (HaloTag-NAPPA), we created high-density protein arrays comprising 12,000 Arabidopsis ORFs. We used these arrays to query protein-protein interactions for a set of 38 transcription factors and transcriptional regulators (TFs) that function in diverse plant hormone regulatory pathways. The resulting transcription factor interactome network, TF-NAPPA, contains thousands of novel interactions. Validation in a benchmarked in vitro pull-down assay revealed that a random subset of TF-NAPPA validated at the same rate of 64% as a positive reference set of literature-curated interactions. Moreover, using a bimolecular fluorescence complementation (BiFC) assay, we confirmed in planta several interactions of biological interest and determined the interaction localizations for seven pairs. The application of HaloTag-NAPPA technology to plant hormone signaling pathways allowed the identification of many novel transcription factor-protein interactions and led to the development of a proteome-wide plant hormone TF interactome network.
The Comparative Toxicogenomics Database: update 2017.
Davis, Allan Peter; Grondin, Cynthia J; Johnson, Robin J; Sciaky, Daniela; King, Benjamin L; McMorran, Roy; Wiegers, Jolene; Wiegers, Thomas C; Mattingly, Carolyn J
2017-01-04
The Comparative Toxicogenomics Database (CTD; http://ctdbase.org/) provides information about interactions between chemicals and gene products, and their relationships to diseases. Core CTD interactions (chemical-gene, chemical-disease and gene-disease relationships manually curated from the literature) are integrated with each other as well as with select external datasets to generate expanded networks and predict novel associations. Today, core CTD includes more than 30.5 million toxicogenomic connections relating chemicals/drugs, genes/proteins, diseases, taxa, Gene Ontology (GO) annotations, pathways, and gene interaction modules. In this update, we report a 33% increase in our core data content since 2015, describe our new exposure module (that harmonizes exposure science information with core toxicogenomic data) and introduce a novel dataset of GO-disease inferences (that identify common molecular underpinnings for seemingly unrelated pathologies). These advancements centralize and contextualize real-world chemical exposures with molecular pathways to help scientists generate testable hypotheses in an effort to understand the etiology and mechanisms underlying environmentally influenced diseases. © The Author(s) 2016. Published by Oxford University Press on behalf of Nucleic Acids Research.
Lakoski, Susan; Mackey, John R.; Douglas, Pamela S.; Haykowsky, Mark J.; Jones, Lee W.
2013-01-01
Molecularly targeted therapeutics (MTT) are the future of cancer systemic therapy. They have already moved from palliative therapy for advanced solid malignancies into the setting of curative-intent treatment for early-stage disease. Cardiotoxicity is a frequent and potentially serious adverse complication of some targeted therapies, leading to a broad range of potentially life-threatening complications, therapy discontinuation, and poor quality of life. Low-cost pleiotropic interventions are therefore urgently required to effectively prevent and/or treat MTT-induced cardiotoxicity. Aerobic exercise therapy has the unique capacity to modulate, without toxicity, multiple gene expression pathways in several organ systems, including a plethora of cardiac-specific molecular and cell-signaling pathways implicated in MTT-induced cardiac toxicity. In this review, we examine the molecular signaling of antiangiogenic and HER2-directed therapies that may underpin cardiac toxicity and the hypothesized molecular mechanisms underlying the cardioprotective properties of aerobic exercise. It is hoped that this knowledge can be used to maximize the benefits of small molecule inhibitors, while minimizing cardiac damage in patients with solid malignancies. PMID:23335619
Functional Interaction Network Construction and Analysis for Disease Discovery.
Wu, Guanming; Haw, Robin
2017-01-01
Network-based approaches project seemingly unrelated genes or proteins onto a large-scale network context, thereby providing a holistic visualization and analysis platform for genomic data generated from high-throughput experiments, reducing the dimensionality of the data through the use of network modules and increasing statistical analysis power. Based on the Reactome database, the most popular and comprehensive open-source biological pathway knowledgebase, we have developed a highly reliable protein functional interaction network covering around 60% of all human genes and an app called ReactomeFIViz for Cytoscape, the most popular biological network visualization and analysis platform. In this chapter, we describe the detailed procedure for how this functional interaction network is constructed by integrating multiple external data sources, extracting functional interactions from human curated pathway databases, building a machine learning classifier called a Naïve Bayesian Classifier, predicting interactions based on the trained Naïve Bayesian Classifier, and finally constructing the functional interaction database. We also provide an example of how to use ReactomeFIViz for performing network-based data analysis for a list of genes.
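As a sketch of the classifier-training step described above, the snippet below fits a Naive Bayes model on binary evidence features (e.g., co-expression, shared GO annotation, domain-domain interaction) for protein pairs and scores new candidate pairs. The data are synthetic and the real ReactomeFI training set and feature definitions differ.

```python
import numpy as np
from sklearn.naive_bayes import BernoulliNB

# Synthetic training data: each row is a protein pair, each column a binary
# evidence source (e.g., co-expression, shared GO term, domain-domain interaction).
rng = np.random.default_rng(0)
X_train = rng.integers(0, 2, size=(500, 4))
# Positive pairs (taken from curated pathways) tend to have more supporting evidence.
y_train = (X_train.sum(axis=1) + rng.normal(0, 0.8, 500) > 2).astype(int)

clf = BernoulliNB()
clf.fit(X_train, y_train)

# Score new candidate pairs: probability of a functional interaction.
X_new = np.array([[1, 1, 0, 1], [0, 0, 1, 0]])
print(clf.predict_proba(X_new)[:, 1])
```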
BUGLIONI, ALESSIA; BURNETT, JOHN C.
2014-01-01
Heart failure (HF) is a syndrome characterized by a complex pathophysiology which involves multiple organ systems, with the kidney playing a major role. HF can present with reduced ejection fraction (EF), HFrEF, or with preserved EF (HFpEF). The interplay between diverse organ systems contributing to HF is mediated by the activation of counteracting neurohormonal pathways focused to re-establishing hemodynamic homeostasis. During early stages of HF, these biochemical signals, consisting mostly of hormones and neurotransmitters secreted by a variety of cell types, are compensatory and the patient is asymptomatic. However, with disease progression, the attempt to reverse or delay cardiac dysfunction is deleterious, leading to multi-organ congestion, fibrosis and decompensation and finally symptomatic HF. In conclusion, these neurohormonal pathways mediate the evolution of HF and have become a way to monitor HF. Specifically, these mediators have become important in the diagnosis and prognosis of this highly fatal cardiovascular disease. Finally, while these multiple neurohumoral factors serve as important HF biomarkers, they can also be targeted for more effective and curative HF treatments. PMID:25445413
Ghosh, Sujoy; Vivar, Juan; Nelson, Christopher P; Willenborg, Christina; Segrè, Ayellet V; Mäkinen, Ville-Petteri; Nikpay, Majid; Erdmann, Jeannette; Blankenberg, Stefan; O'Donnell, Christopher; März, Winfried; Laaksonen, Reijo; Stewart, Alexandre F R; Epstein, Stephen E; Shah, Svati H; Granger, Christopher B; Hazen, Stanley L; Kathiresan, Sekar; Reilly, Muredach P; Yang, Xia; Quertermous, Thomas; Samani, Nilesh J; Schunkert, Heribert; Assimes, Themistocles L; McPherson, Ruth
2015-07-01
Genome-wide association studies have identified multiple genetic variants affecting the risk of coronary artery disease (CAD). However, individually these explain only a small fraction of the heritability of CAD and for most, the causal biological mechanisms remain unclear. We sought to obtain further insights into potential causal processes of CAD by integrating large-scale GWA data with expertly curated databases of core human pathways and functional networks. Using pathways (gene sets) from Reactome, we carried out a 2-stage gene set enrichment analysis strategy. From a meta-analyzed discovery cohort of 7 CAD genome-wide association study data sets (9889 cases/11,089 controls), nominally significant gene sets were tested for replication in a meta-analysis of 9 additional studies (15,502 cases/55,730 controls) from the Coronary ARtery DIsease Genome wide Replication and Meta-analysis (CARDIoGRAM) Consortium. A total of 32 of 639 Reactome pathways tested showed convincing association with CAD (replication P<0.05). These pathways resided in 9 of 21 core biological processes represented in Reactome, and included pathways relevant to extracellular matrix (ECM) integrity, innate immunity, axon guidance, and signaling by PDGF (platelet-derived growth factor), NOTCH, and the transforming growth factor-β/SMAD receptor complex. Many of these pathways had strengths of association comparable to those observed in lipid transport pathways. Network analysis of unique genes within the replicated pathways further revealed several interconnected functional and topologically interacting modules representing novel associations (eg, the semaphorin-regulated axon guidance pathway) besides confirming known processes (lipid metabolism). The connectivity in the observed networks was statistically significant compared with random networks (P<0.001). Network centrality analysis (degree and betweenness) further identified genes (eg, NCAM1, FYN, FURIN, etc) likely to play critical roles in the maintenance and functioning of several of the replicated pathways. These findings provide novel insights into how genetic variation, interpreted in the context of biological processes and functional interactions among genes, may help define the genetic architecture of CAD. © 2015 American Heart Association, Inc.
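The two-stage discovery/replication idea above can be sketched with a simple hypergeometric enrichment test on gene lists: nominally significant pathways in a discovery list are carried forward to a replication list. The gene sets and gene lists below are invented, and the actual CARDIoGRAM analysis used gene set enrichment on GWAS summary statistics, which is considerably more involved.

```python
from scipy.stats import hypergeom

# Invented gene universe, pathways and association hit lists, for illustration only.
universe = {f"G{i}" for i in range(1000)}
pathways = {
    "ECM_organization": {f"G{i}" for i in range(0, 40)},
    "Axon_guidance":    {f"G{i}" for i in range(40, 90)},
    "Lipid_transport":  {f"G{i}" for i in range(90, 120)},
}
discovery_hits   = {f"G{i}" for i in range(0, 25)} | {f"G{i}" for i in range(95, 105)}
replication_hits = {f"G{i}" for i in range(5, 30)} | {f"G{i}" for i in range(90, 100)}

def enrichment_p(hits, gene_set, universe):
    k = len(hits & gene_set)                       # hits that fall inside the pathway
    return hypergeom.sf(k - 1, len(universe), len(gene_set), len(hits))

for name, genes in pathways.items():
    p_disc = enrichment_p(discovery_hits, genes, universe)
    if p_disc < 0.05:                              # nominally significant in discovery
        p_rep = enrichment_p(replication_hits, genes, universe)
        print(f"{name}: discovery p={p_disc:.3g}, replication p={p_rep:.3g}")
```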
1-CMDb: A Curated Database of Genomic Variations of the One-Carbon Metabolism Pathway.
Bhat, Manoj K; Gadekar, Veerendra P; Jain, Aditya; Paul, Bobby; Rai, Padmalatha S; Satyamoorthy, Kapaettu
2017-01-01
The one-carbon metabolism pathway is vital in maintaining tissue homeostasis by driving the critical reactions of the folate and methionine cycles. A myriad of genetic and epigenetic events mark the rate of reactions in a tissue-specific manner. Integration of these to predict and provide personalized health management requires robust computational tools that can process multiomics data. The DNA sequences that may determine the chain of biological events and the endpoint reactions within one-carbon metabolism genes remain to be comprehensively recorded. Hence, we designed the one-carbon metabolism database (1-CMDb) as a platform to interrogate its association with a host of human disorders. DNA sequence and network information for a total of 48 genes involved in the one-carbon, folate-mediated pathway was extracted from a literature survey and the KEGG pathway database. The information generated, collected, and compiled for all these genes from the UCSC genome browser included the single nucleotide polymorphisms (SNPs), CpGs, copy number variations (CNVs), and miRNAs, and a comprehensive database was created. Furthermore, a significant correlation analysis was performed for SNPs in the pathway genes. Detailed data of SNPs, CNVs, CpG islands, and miRNAs for 48 folate pathway genes were compiled. The SNPs in CNVs (9670), CpGs (984), and miRNAs (14) were also compiled for all pathway genes. The SIFT score and prediction, as well as the PolyPhen score and prediction, were tabulated and represented for each SNP in the folate pathway genes. Also included in the database for folate pathway genes were the links to 124 various phenotypes and disease associations as reported in the literature and from publicly available information. A comprehensive database was generated consisting of genomic elements within and among SNPs, CNVs, CpGs, and miRNAs of one-carbon metabolism pathways to facilitate (a) a single source of information and (b) integration into large genome-scale network analyses to be developed in the future by the scientific community. The database can be accessed at http://slsdb.manipal.edu/ocm/. © 2017 S. Karger AG, Basel.
McKim, James M.; Hartung, Thomas; Kleensang, Andre; Sá-Rocha, Vanessa
2016-01-01
Supervised learning methods promise to improve integrated testing strategies (ITS), but must be adjusted to handle high dimensionality and dose–response data. ITS approaches are currently fueled by the increasing mechanistic understanding of adverse outcome pathways (AOP) and the development of tests reflecting these mechanisms. Simple approaches to combine skin sensitization data sets, such as weight of evidence, fail due to problems in information redundancy and high dimensionality. The problem is further amplified when potency information (dose/response) of hazards would be estimated. Skin sensitization currently serves as the foster child for AOP and ITS development, as legislative pressures combined with a very good mechanistic understanding of contact dermatitis have led to test development and relatively large high-quality data sets. We curated such a data set and combined a recursive variable selection algorithm to evaluate the information available through in silico, in chemico and in vitro assays. Chemical similarity alone could not cluster chemicals’ potency, and in vitro models consistently ranked high in recursive feature elimination. This allows reducing the number of tests included in an ITS. Next, we analyzed with a hidden Markov model that takes advantage of an intrinsic inter-relationship among the local lymph node assay classes, i.e. the monotonous connection between local lymph node assay and dose. The dose-informed random forest/hidden Markov model was superior to the dose-naive random forest model on all data sets. Although balanced accuracy improvement may seem small, this obscures the actual improvement in misclassifications as the dose-informed hidden Markov model strongly reduced "false-negatives" (i.e. extreme sensitizers as non-sensitizer) on all data sets. PMID:26046447
Luechtefeld, Thomas; Maertens, Alexandra; McKim, James M; Hartung, Thomas; Kleensang, Andre; Sá-Rocha, Vanessa
2015-11-01
Supervised learning methods promise to improve integrated testing strategies (ITS), but must be adjusted to handle high dimensionality and dose-response data. ITS approaches are currently fueled by the increasing mechanistic understanding of adverse outcome pathways (AOP) and the development of tests reflecting these mechanisms. Simple approaches to combine skin sensitization data sets, such as weight of evidence, fail due to problems in information redundancy and high dimensionality. The problem is further amplified when potency information (dose/response) of hazards would be estimated. Skin sensitization currently serves as the foster child for AOP and ITS development, as legislative pressures combined with a very good mechanistic understanding of contact dermatitis have led to test development and relatively large high-quality data sets. We curated such a data set and combined a recursive variable selection algorithm to evaluate the information available through in silico, in chemico and in vitro assays. Chemical similarity alone could not cluster chemicals' potency, and in vitro models consistently ranked high in recursive feature elimination. This allows reducing the number of tests included in an ITS. Next, we analyzed with a hidden Markov model that takes advantage of an intrinsic inter-relationship among the local lymph node assay classes, i.e. the monotonous connection between local lymph node assay and dose. The dose-informed random forest/hidden Markov model was superior to the dose-naive random forest model on all data sets. Although balanced accuracy improvement may seem small, this obscures the actual improvement in misclassifications as the dose-informed hidden Markov model strongly reduced "false-negatives" (i.e. extreme sensitizers as non-sensitizer) on all data sets. Copyright © 2015 John Wiley & Sons, Ltd.
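The recursive feature elimination step described in the two records above can be sketched as follows, using a random forest on a synthetic assay matrix standing in for in silico/in chemico/in vitro descriptors. The dose-informed hidden Markov step is beyond this sketch, and the data here are entirely simulated.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import RFE

# Synthetic stand-in for a skin-sensitization data set: 200 chemicals, 12 assay
# descriptors, and a binary sensitizer/non-sensitizer label.
X, y = make_classification(n_samples=200, n_features=12, n_informative=5,
                           random_state=0)

# Rank descriptors by recursively eliminating the least important ones.
rf = RandomForestClassifier(n_estimators=200, random_state=0)
selector = RFE(rf, n_features_to_select=5).fit(X, y)

print("selected feature indices:", np.where(selector.support_)[0])
print("feature ranking:", selector.ranking_)
```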
Exploring human disease using the Rat Genome Database.
Shimoyama, Mary; Laulederkind, Stanley J F; De Pons, Jeff; Nigam, Rajni; Smith, Jennifer R; Tutaj, Marek; Petri, Victoria; Hayman, G Thomas; Wang, Shur-Jen; Ghiasvand, Omid; Thota, Jyothi; Dwinell, Melinda R
2016-10-01
Rattus norvegicus, the laboratory rat, has been a crucial model for studies of the environmental and genetic factors associated with human diseases for over 150 years. It is the primary model organism for toxicology and pharmacology studies, and has features that make it the model of choice in many complex-disease studies. Since 1999, the Rat Genome Database (RGD; http://rgd.mcw.edu) has been the premier resource for genomic, genetic, phenotype and strain data for the laboratory rat. The primary role of RGD is to curate rat data and validate orthologous relationships with human and mouse genes, and make these data available for incorporation into other major databases such as NCBI, Ensembl and UniProt. RGD also provides official nomenclature for rat genes, quantitative trait loci, strains and genetic markers, as well as unique identifiers. The RGD team adds enormous value to these basic data elements through functional and disease annotations, the analysis and visual presentation of pathways, and the integration of phenotype measurement data for strains used as disease models. Because much of the rat research community focuses on understanding human diseases, RGD provides a number of datasets and software tools that allow users to easily explore and make disease-related connections among these datasets. RGD also provides comprehensive human and mouse data for comparative purposes, illustrating the value of the rat in translational research. This article introduces RGD and its suite of tools and datasets to researchers - within and beyond the rat community - who are particularly interested in leveraging rat-based insights to understand human diseases. © 2016. Published by The Company of Biologists Ltd.
Lobach, Iryna; Fan, Ruzong; Manga, Prashiela
A central problem in genetic epidemiology is to identify and rank genetic markers involved in a disease. Complex diseases, such as cancer, hypertension and diabetes, are thought to be caused by the interaction of a panel of genetic factors, which can be identified by markers, with modulating environmental factors. Moreover, the effect of each genetic marker may be small. Hence, the association signal may be missed unless a large sample is considered, or a priori biomedical data are used. Recent advances have generated a vast variety of a priori information, including linkage maps and information about gene regulatory dependence assembled into curated pathway databases. We propose a genotype-based approach that takes into account linkage disequilibrium (LD) information between genetic markers that are in moderate LD while modeling gene-gene and gene-environment interactions. A major advantage of our method is that the observed genetic information enters the model directly, thus eliminating the need to estimate haplotype phase. Our approach results in an algorithm that is computationally inexpensive and does not suffer from bias induced by haplotype-phase ambiguity. We investigated our model in a series of simulation experiments and demonstrated that the proposed approach results in estimates that are nearly unbiased and have small variability. We applied our method to the analysis of data from a melanoma case-control study and investigated the interaction between a set of pigmentation genes and environmental factors defined by age and gender. Furthermore, an application of our method is demonstrated using a study of alcohol dependence.
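As a minimal illustration of the gene-environment modelling idea in this record, the sketch below fits a logistic regression with a genotype-by-environment interaction term on simulated case-control data. It shows only the basic interaction model, not the authors' estimator or their handling of linkage disequilibrium; all data and effect sizes are invented.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Simulated case-control data: genotype coded 0/1/2, binary environmental factor.
rng = rng = np.random.default_rng(1)
n = 2000
snp = rng.integers(0, 3, n)
env = rng.integers(0, 2, n)
true_logit = -1.0 + 0.3 * snp + 0.4 * env + 0.5 * snp * env   # includes an interaction effect
case = rng.binomial(1, 1 / (1 + np.exp(-true_logit)))
df = pd.DataFrame({"case": case, "snp": snp, "env": env})

# Logistic regression with main effects and a gene-environment interaction term.
model = smf.logit("case ~ snp * env", data=df).fit(disp=0)
print(model.summary().tables[1])
```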
Molecular Signature for Lymphatic Invasion Associated with Survival of Epithelial Ovarian Cancer.
Paik, E Sun; Choi, Hyun Jin; Kim, Tae-Joong; Lee, Jeong-Won; Kim, Byoung-Gie; Bae, Duk-Soo; Choi, Chel Hun
2018-04-01
We aimed to develop a molecular classifier that can predict lymphatic invasion and to assess its clinical significance in epithelial ovarian cancer (EOC) patients. We analyzed gene expression (mRNA, methylated DNA) in data from The Cancer Genome Atlas. To identify molecular signatures for lymphatic invasion, we found differentially expressed genes. The performance of the classifier was validated by receiver operating characteristic analysis, logistic regression, linear discriminant analysis (LDA), and support vector machine (SVM). We assessed the prognostic role of the classifier using a random survival forest (RSF) model and the pathway deregulation score (PDS). For external validation, we analyzed microarray data from 26 EOC samples from Samsung Medical Center and the curatedOvarianData database. We identified 21 mRNAs and seven methylated DNAs from primary EOC tissues that predicted lymphatic invasion and created prognostic models. The classifier predicted lymphatic invasion well, which was validated by the logistic regression, LDA, and SVM algorithms (C-index of 0.90, 0.71, and 0.74 for mRNA and C-index of 0.64, 0.68, and 0.69 for DNA methylation). Using the RSF model, incorporating molecular data with clinical variables improved prediction of progression-free survival compared with using only clinical variables (p < 0.001 and p = 0.008). Similarly, PDS enabled us to classify patients into high-risk and low-risk groups, which resulted in a survival difference for the mRNA profiles (log-rank p-value = 0.011). In external validation, the gene signature correlated well with the prediction of lymphatic invasion and patients' survival. The molecular signature model predicting lymphatic invasion performed well and was also associated with the survival of EOC patients.
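To illustrate the classifier comparison reported above, here is a sketch that evaluates logistic regression, LDA and an SVM by cross-validated AUC (a binary-outcome analogue of the C-index) on synthetic expression-like data; it is not the authors' TCGA pipeline and all data are simulated.

```python
from sklearn.datasets import make_classification
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

# Synthetic stand-in for an mRNA signature: 200 samples, 21 features,
# binary label for lymphatic invasion.
X, y = make_classification(n_samples=200, n_features=21, n_informative=8,
                           random_state=0)

models = {
    "logistic regression": LogisticRegression(max_iter=1000),
    "LDA": LinearDiscriminantAnalysis(),
    "SVM": SVC(probability=True),
}
for name, clf in models.items():
    auc = cross_val_score(clf, X, y, cv=5, scoring="roc_auc").mean()
    print(f"{name}: mean cross-validated AUC = {auc:.2f}")
```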
NASA Technical Reports Server (NTRS)
Calaway, Michael J.; Allen, Carlton C.; Allton, Judith H.
2014-01-01
Future robotic and human spaceflight missions to the Moon, Mars, asteroids, and comets will require curating astromaterial samples with minimal inorganic and organic contamination to preserve the scientific integrity of each sample. 21st century sample return missions will focus on strict protocols for reducing organic contamination that have not been seen since the Apollo manned lunar landing program. To properly curate these materials, the Astromaterials Acquisition and Curation Office under the Astromaterial Research and Exploration Science Directorate at NASA Johnson Space Center houses and protects all extraterrestrial materials brought back to Earth that are controlled by the United States government. During fiscal year 2012, we conducted a year-long project to compile historical documentation and laboratory tests involving organic investigations at these facilities. In addition, we developed a plan to determine the current state of organic cleanliness in curation laboratories housing astromaterials. This was accomplished by focusing on current procedures and protocols for cleaning, sample handling, and storage. While the intention of this report is to give a comprehensive overview of the current state of organic cleanliness in JSC curation laboratories, it also provides a baseline for determining whether our cleaning procedures and sample handling protocols need to be adapted and/or augmented to meet the new requirements for future human spaceflight and robotic sample return missions.
Sambourg, Laure; Thierry-Mieg, Nicolas
2010-12-21
As protein interactions mediate most cellular mechanisms, protein-protein interaction networks are essential in the study of cellular processes. Consequently, several large-scale interactome mapping projects have been undertaken, and protein-protein interactions are being distilled into databases through literature curation; yet protein-protein interaction data are still far from comprehensive, even in the model organism Saccharomyces cerevisiae. Estimating the interactome size is important for evaluating the completeness of current datasets, in order to measure the remaining efforts that are required. We examined the yeast interactome from a new perspective, by taking into account how thoroughly proteins have been studied. We discovered that the set of literature-curated protein-protein interactions is qualitatively different when restricted to proteins that have received extensive attention from the scientific community. In particular, these interactions are less often supported by yeast two-hybrid, and more often by more complex experiments such as biochemical activity assays. Our analysis showed that high-throughput and literature-curated interactome datasets are more correlated than commonly assumed, but that this bias can be corrected for by focusing on well-studied proteins. We thus propose a simple and reliable method to estimate the size of an interactome, combining literature-curated data involving well-studied proteins with high-throughput data. It yields an estimate of at least 37,600 direct physical protein-protein interactions in S. cerevisiae. Our method leads to higher and more accurate estimates of the interactome size, as it accounts for interactions that are genuine yet difficult to detect with commonly-used experimental assays. This shows that we are even further from completing the yeast interactome map than previously expected.
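A simplified capture-recapture (Lincoln-Petersen) calculation conveys the flavour of estimating an interactome's size from two partially overlapping interaction sets. The published method additionally restricts to well-studied proteins and corrects for assay-specific detection biases, so the sketch below, with invented interaction pairs, should be read purely as a toy.

```python
# Toy capture-recapture estimate of interactome size.
# The interaction sets are invented; the published estimator is more careful
# about restricting to well-studied proteins and correcting for assay biases.
literature_curated = {("A", "B"), ("A", "C"), ("B", "D"), ("C", "E"), ("D", "E")}
high_throughput    = {("A", "B"), ("C", "E"), ("E", "F"), ("B", "F")}

overlap = len(literature_curated & high_throughput)
if overlap:
    # Lincoln-Petersen: N_hat = n1 * n2 / m, where m is the size of the overlap.
    n_hat = len(literature_curated) * len(high_throughput) / overlap
    print(f"estimated total interactions: {n_hat:.0f}")
```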
A hybrid human and machine resource curation pipeline for the Neuroscience Information Framework.
Bandrowski, A E; Cachat, J; Li, Y; Müller, H M; Sternberg, P W; Ciccarese, P; Clark, T; Marenco, L; Wang, R; Astakhov, V; Grethe, J S; Martone, M E
2012-01-01
The breadth of information resources available to researchers on the Internet continues to expand, particularly in light of recently implemented data-sharing policies required by funding agencies. However, the nature of dense, multifaceted neuroscience data and the design of contemporary search engine systems make efficient, reliable and relevant discovery of such information a significant challenge. This challenge is specifically pertinent for online databases, whose dynamic content is 'hidden' from search engines. The Neuroscience Information Framework (NIF; http://www.neuinfo.org) was funded by the NIH Blueprint for Neuroscience Research to address the problem of finding and utilizing neuroscience-relevant resources such as software tools, data sets, experimental animals and antibodies across the Internet. From the outset, NIF sought to provide an accounting of available resources, while developing technical solutions for finding, accessing and utilizing them. The curators, therefore, are tasked with identifying and registering resources, examining data, writing configuration files to index and display data and keeping the contents current. In the initial phases of the project, all aspects of the registration and curation processes were manual. However, as the number of resources grew, manual curation became impractical. This report describes our experiences and successes with developing automated resource discovery and semiautomated type characterization with text-mining scripts that facilitate curation team efforts to discover, integrate and display new content. We also describe the DISCO framework, a suite of automated web services that significantly reduce manual curation efforts to periodically check for resource updates. Lastly, we discuss DOMEO, a semi-automated annotation tool that improves the discovery and curation of resources that are not necessarily website-based (i.e. reagents, software tools). Although the ultimate goal of automation was to reduce the workload of the curators, it has resulted in valuable analytic by-products that address accessibility, use and citation of resources that can now be shared with resource owners and the larger scientific community. DATABASE URL: http://neuinfo.org.
A hybrid human and machine resource curation pipeline for the Neuroscience Information Framework
Bandrowski, A. E.; Cachat, J.; Li, Y.; Müller, H. M.; Sternberg, P. W.; Ciccarese, P.; Clark, T.; Marenco, L.; Wang, R.; Astakhov, V.; Grethe, J. S.; Martone, M. E.
2012-01-01
The breadth of information resources available to researchers on the Internet continues to expand, particularly in light of recently implemented data-sharing policies required by funding agencies. However, the nature of dense, multifaceted neuroscience data and the design of contemporary search engine systems make efficient, reliable and relevant discovery of such information a significant challenge. This challenge is specifically pertinent for online databases, whose dynamic content is ‘hidden’ from search engines. The Neuroscience Information Framework (NIF; http://www.neuinfo.org) was funded by the NIH Blueprint for Neuroscience Research to address the problem of finding and utilizing neuroscience-relevant resources such as software tools, data sets, experimental animals and antibodies across the Internet. From the outset, NIF sought to provide an accounting of available resources, while developing technical solutions for finding, accessing and utilizing them. The curators, therefore, are tasked with identifying and registering resources, examining data, writing configuration files to index and display data and keeping the contents current. In the initial phases of the project, all aspects of the registration and curation processes were manual. However, as the number of resources grew, manual curation became impractical. This report describes our experiences and successes with developing automated resource discovery and semiautomated type characterization with text-mining scripts that facilitate curation team efforts to discover, integrate and display new content. We also describe the DISCO framework, a suite of automated web services that significantly reduce manual curation efforts to periodically check for resource updates. Lastly, we discuss DOMEO, a semi-automated annotation tool that improves the discovery and curation of resources that are not necessarily website-based (i.e. reagents, software tools). Although the ultimate goal of automation was to reduce the workload of the curators, it has resulted in valuable analytic by-products that address accessibility, use and citation of resources that can now be shared with resource owners and the larger scientific community. Database URL: http://neuinfo.org PMID:22434839
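The general idea behind automated update checks for registered resources, as described in the two records above, can be sketched as fetching each resource page, hashing its content and comparing against the last stored hash. This is an illustration of the concept only, not the DISCO framework itself, and the registry file and URL are placeholders.

```python
import hashlib
import json
import urllib.request

# Placeholder registry of resource URLs and previously recorded content hashes.
registry_file = "resource_hashes.json"
resources = {
    "example_resource": "https://example.org/",
}

try:
    with open(registry_file) as fh:
        known_hashes = json.load(fh)
except FileNotFoundError:
    known_hashes = {}

for name, url in resources.items():
    content = urllib.request.urlopen(url, timeout=30).read()
    digest = hashlib.sha256(content).hexdigest()
    if known_hashes.get(name) != digest:
        print(f"{name}: content changed, flag for curator review")
        known_hashes[name] = digest

with open(registry_file, "w") as fh:
    json.dump(known_hashes, fh, indent=2)
```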
Körner, Philipp; Ehrmann, Katja; Hartmannsgruber, Johann; Metz, Michaela; Steigerwald, Sabrina; Flentje, Michael; van Oorschot, Birgitt
2017-07-01
The benefits of patient-reported symptom assessment combined with integrated palliative care are well documented. This study assessed the symptom burden of palliative and curative-intent radiation oncology patients. Prior to the first consultation and at the end of radiotherapy (RT), all adult cancer patients scheduled to receive fractionated percutaneous RT were asked to complete the Edmonton Symptom Assessment Scale (ESAS; nine symptoms from 0 = no symptoms to 10 = worst possible symptoms). Mean values were used for curative vs. palliative and pre-post comparisons, and the clinical relevance was evaluated (symptom values ≥ 4). Of 163 participating patients, 151 patients (90.9%) completed both surveys (116 curative and 35 palliative patients). Before beginning RT, 88.6% of palliative and 72.3% of curative patients showed at least one clinically relevant symptom. Curative patients most frequently named decreased general wellbeing (38.6%), followed by tiredness (35.0%), anxiety (32.4%), depression (30.0%), pain (26.3%), lack of appetite (23.5%), dyspnea (17.8%), drowsiness (8.0%) and nausea (6.1%). Palliative patients most frequently named decreased general wellbeing (62.8%), followed by pain (62.8%), tiredness (60.0%), lack of appetite (40.0%), anxiety (38.0%), depression (33.3%), dyspnea (28.5%), drowsiness (25.7%) and nausea (14.2%). At the end of RT, the proportion of curative and palliative patients with a clinically relevant symptom had increased significantly to 79.8 and 91.4%, respectively, whereas the proportion of patients reporting clinically relevant pain had decreased significantly (42.8 vs. 62.8%). Palliative patients had significantly increased tiredness. Curative patients reported significant increases in pain, tiredness, nausea, drowsiness, lack of appetite and restrictions in general wellbeing. Assessment of patient-reported symptoms was successfully implemented in routine radiation oncology practice. Overall, both groups showed a high symptom burden. The results demonstrate the need for systematic symptom assessment and programs for early integrated supportive and palliative care in radiation oncology.
Martín-Navarro, Antonio; Gaudioso-Simón, Andrés; Álvarez-Jarreta, Jorge; Montoya, Julio; Mayordomo, Elvira; Ruiz-Pesini, Eduardo
2017-03-07
Several methods have been developed to predict the pathogenicity of missense mutations, but none has been specifically designed for classification of variants in mtDNA-encoded polypeptides. Moreover, no curated dataset of neutral and damaging mtDNA missense variants is available to test the accuracy of predictors. Because mtDNA sequencing of patients suffering from mitochondrial diseases is revealing many missense mutations, candidate substitutions need to be prioritized for further confirmation. Predictors can be useful as screening tools, but their performance must be improved. We have developed an SVM classifier (Mitoclass.1) specific for mtDNA missense variants. Training and validation of the model were performed with 2,835 mtDNA damaging and neutral amino acid substitutions, previously curated by a set of rigorous pathogenicity criteria with high specificity. Each instance is described by a set of three attributes based on evolutionary conservation in Eukaryota of wildtype and mutant amino acids, as well as coevolution and a novel evolutionary analysis of specific substitutions belonging to the same domain of mitochondrial polypeptides. Our classifier performs better than the other web-available predictors tested. We checked the performance of three broadly used predictors on all mutations in our curated dataset. PolyPhen-2 showed the best results for screening purposes, with good sensitivity. Nevertheless, the number of false positive predictions was too high. Our method has improved sensitivity and better specificity relative to PolyPhen-2. We also publish predictions for the complete set of 24,201 possible missense variants in the 13 human mtDNA-encoded polypeptides. Mitoclass.1 allows a better selection of candidate damaging missense variants from mtDNA. A careful search for discriminatory attributes and a training step based on a curated dataset of amino acid substitutions belonging exclusively to human mtDNA genes allow improved performance. Mitoclass.1 accuracy could be improved in the future when more mtDNA missense substitutions become available for updating the attributes and retraining the model.
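To make the classifier construction above concrete, here is a sketch of training an SVM on three conservation-style attributes per substitution, mirroring the shape of the Mitoclass.1 feature set. The feature values and labels are synthetic; the actual Mitoclass.1 features and training data differ.

```python
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic substitutions described by three attributes, loosely analogous to the
# conservation-based features used by Mitoclass.1 (values are made up).
rng = np.random.default_rng(0)
n = 600
X = rng.random((n, 3))   # e.g., wild-type conservation, mutant tolerance, domain-level score
y = (X[:, 0] + 0.5 * X[:, 1] - X[:, 2] + rng.normal(0, 0.3, n) > 0.6).astype(int)

clf = make_pipeline(StandardScaler(), SVC(kernel="rbf"))
scores = cross_val_score(clf, X, y, cv=5)
print(f"mean cross-validated accuracy: {scores.mean():.2f}")
```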
Argo: enabling the development of bespoke workflows and services for disease annotation.
Batista-Navarro, Riza; Carter, Jacob; Ananiadou, Sophia
2016-01-01
Argo (http://argo.nactem.ac.uk) is a generic text mining workbench that can cater to a variety of use cases, including the semi-automatic annotation of literature. It enables its technical users to build their own customised text mining solutions by providing a wide array of interoperable and configurable elementary components that can be seamlessly integrated into processing workflows. With Argo's graphical annotation interface, domain experts can then make use of the workflows' automatically generated output to curate information of interest. With the continuously rising need to understand the aetiology of diseases as well as the demand for their informed diagnosis and personalised treatment, the curation of disease-relevant information from medical and clinical documents has become an indispensable scientific activity. In the Fifth BioCreative Challenge Evaluation Workshop (BioCreative V), there was substantial interest in the mining of literature for disease-relevant information. Apart from a panel discussion focussed on disease annotations, the chemical-disease relations (CDR) track was also organised to foster the sharing and advancement of disease annotation tools and resources. This article presents the application of Argo's capabilities to the literature-based annotation of diseases. As part of our participation in BioCreative V's User Interactive Track (IAT), we demonstrated and evaluated Argo's suitability to the semi-automatic curation of chronic obstructive pulmonary disease (COPD) phenotypes. Furthermore, the workbench facilitated the development of some of the CDR track's top-performing web services for normalising disease mentions against the Medical Subject Headings (MeSH) database. In this work, we highlight Argo's support for developing various types of bespoke workflows ranging from ones which enabled us to easily incorporate information from various databases, to those which train and apply machine learning-based concept recognition models, through to user-interactive ones which allow human curators to manually provide their corrections to automatically generated annotations. Our participation in the BioCreative V challenges shows Argo's potential as an enabling technology for curating disease and phenotypic information from literature. Database URL: http://argo.nactem.ac.uk. © The Author(s) 2016. Published by Oxford University Press.
Argo: enabling the development of bespoke workflows and services for disease annotation
Batista-Navarro, Riza; Carter, Jacob; Ananiadou, Sophia
2016-01-01
Argo (http://argo.nactem.ac.uk) is a generic text mining workbench that can cater to a variety of use cases, including the semi-automatic annotation of literature. It enables its technical users to build their own customised text mining solutions by providing a wide array of interoperable and configurable elementary components that can be seamlessly integrated into processing workflows. With Argo's graphical annotation interface, domain experts can then make use of the workflows' automatically generated output to curate information of interest. With the continuously rising need to understand the aetiology of diseases as well as the demand for their informed diagnosis and personalised treatment, the curation of disease-relevant information from medical and clinical documents has become an indispensable scientific activity. In the Fifth BioCreative Challenge Evaluation Workshop (BioCreative V), there was substantial interest in the mining of literature for disease-relevant information. Apart from a panel discussion focussed on disease annotations, the chemical-disease relations (CDR) track was also organised to foster the sharing and advancement of disease annotation tools and resources. This article presents the application of Argo’s capabilities to the literature-based annotation of diseases. As part of our participation in BioCreative V’s User Interactive Track (IAT), we demonstrated and evaluated Argo’s suitability to the semi-automatic curation of chronic obstructive pulmonary disease (COPD) phenotypes. Furthermore, the workbench facilitated the development of some of the CDR track’s top-performing web services for normalising disease mentions against the Medical Subject Headings (MeSH) database. In this work, we highlight Argo’s support for developing various types of bespoke workflows ranging from ones which enabled us to easily incorporate information from various databases, to those which train and apply machine learning-based concept recognition models, through to user-interactive ones which allow human curators to manually provide their corrections to automatically generated annotations. Our participation in the BioCreative V challenges shows Argo’s potential as an enabling technology for curating disease and phenotypic information from literature. Database URL: http://argo.nactem.ac.uk PMID:27189607
Writing Commons: A Model for the Creation, Usability, and Evaluation of OERs
ERIC Educational Resources Information Center
Herron, Josh
2016-01-01
As Open Educational Resources (OER) increasingly receive attention from academics, educational foundations, and government agencies, exemplars will emerge that lower student textbook costs by moving away from commercial publishers through self-publishing or curating web-based resources. Joe Moxley's "Writing Commons" serves as a scaled…
Comparing Health Education Approaches in Textbooks of Sixteen Countries
ERIC Educational Resources Information Center
Carvalho, Graca S.; Dantas, Catarina; Rauma, Anna-Liisa; Luzi, Daniela; Ruggieri, Roberta; Bogner, Franz; Geier, Christine; Caussidier, Claude; Berger, Dominique; Clement, Pierre
2008-01-01
Classically, health education has provided mainly factual knowledge about diseases and their prevention. This educational approach is within the so-called Biomedical Model (BM). It is based on pathologic (Pa), curative (Cu) and preventive (Pr) conceptions of health. In contrast, the Health Promotion (HP) approach of health education intends to…
USDA-ARS's Scientific Manuscript database
Toxoplasma gondii, the most common parasitic infection of the human brain and eye, persists across lifetimes, can progressively damage sight, and is currently incurable. New, curative medicines are needed urgently. Herein, we developed novel models to facilitate drug development: EGS strain T. gondi...
Data Provenance Architecture for the Geosciences
NASA Astrophysics Data System (ADS)
Murphy, F.; Irving, D. H.
2012-12-01
The pace at which geoscientific insights inform societal development quickens with time, and these insights drive decisions and actions of ever-increasing human and economic significance. Until recently, academic, commercial and government bodies have maintained distinct bodies of knowledge to support scientific enquiry as well as societal development. However, it has become clear that the curation of this body of data is an activity of equal or higher social and commercial value. We address the community challenges in the curation of, access to, and analysis of scientific data, including the tensions between creators, providers and users; incentives and barriers to sharing; and ownership and crediting. We also discuss the technical and financial challenges in maximising the return on the effort made in generating geoscientific data. To illustrate how these challenges might be addressed in the broader geoscientific domain, we describe the high-level data governance and analytical architecture of the upstream Oil Industry. This domain is heavily dependent on costly and highly diverse geodatasets collected and assimilated over timeframes varying from seconds to decades. These data must support both operational decisions on the minute-to-hour timeframe and strategic and economic decisions of enterprise or national scale, and yet be sufficiently robust to last the life of a producing field. We develop three themes around data provenance, data ownership and business models for data curation. 1/ The overarching aspiration is to ensure that data provenance and quality are maintained along the analytical workflow. Hence, if the data on which a publication or report is based change, the report and its publishers can be notified; we describe a mechanism by which dependent knowledge products can be flagged. 2/ From a cost and management point of view, we look at who "owns" data, especially in cases where the cost of curation and stewardship is significant compared with the cost of acquiring the data in the first place. Analytical value can be placed on data, and this can govern the mode of custodianship. 3/ The broader scientific domain requires the development of new business models: how scientific credit and reputation are built and assessed needs to be re-examined, the current management of the scientific canon still refers back to the paper system, and an appraisal of how we curate layered, rich and dynamic scientific content is timely. Using our review of the upstream Oil and Gas industry, we expand our ideas to the interplay between government, academic, private and public geo- and other datasets. Whilst there is no simple answer, much work has already been achieved around standardised data exchange formats, metadata and searchability via semantic frameworks, and cost/charge/licensing models. The key observation from the Oil industry is that it is the practicalities of data management that are driving change rather than any commercial or philosophical agenda; change is driven in particular by the custodians, managers and architects of the data rather than by the users or the owners.
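The flagging mechanism of theme 1/ can be illustrated with a toy provenance graph: when a dataset changes, every knowledge product downstream of it is flagged. The node names and the use of networkx are assumptions for illustration, not the architecture described in the abstract.

```python
# Toy sketch of provenance-based flagging: datasets and reports form a directed
# graph, and a change to a dataset flags all transitive dependents.
# Node names are hypothetical examples.
import networkx as nx

prov = nx.DiGraph()
prov.add_edges_from([
    ("seismic_survey_2009", "velocity_model_v3"),
    ("velocity_model_v3", "reservoir_report_2012"),
    ("well_logs_A12", "reservoir_report_2012"),
])

def flag_dependents(graph: nx.DiGraph, changed: str) -> set:
    """Return all knowledge products that depend, directly or transitively, on a changed dataset."""
    return nx.descendants(graph, changed)

print(flag_dependents(prov, "seismic_survey_2009"))
# e.g. {'velocity_model_v3', 'reservoir_report_2012'}
```

A production system would persist such a graph alongside the data catalogue so that notifications to publishers can be generated automatically when upstream data are revised.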
Sen Sarma, Moushumi; Arcoleo, David; Khetani, Radhika S; Chee, Brant; Ling, Xu; He, Xin; Jiang, Jing; Mei, Qiaozhu; Zhai, ChengXiang; Schatz, Bruce
2011-07-01
With the rapid decrease in cost of genome sequencing, the classification of gene function is becoming a primary problem. Such classification has been performed by human curators who read biological literature to extract evidence. BeeSpace Navigator is a prototype software for exploratory analysis of gene function using biological literature. The software supports an automatic analogue of the curator process to extract functions, with a simple interface intended for all biologists. Since extraction is done on selected collections that are semantically indexed into conceptual spaces, the curation can be task specific. Biological literature containing references to gene lists from expression experiments can be analyzed to extract concepts that are computational equivalents of a classification such as Gene Ontology, yielding discriminating concepts that differentiate gene mentions from other mentions. The functions of individual genes can be summarized from sentences in biological literature, to produce results resembling a model organism database entry that is automatically computed. Statistical frequency analysis based on literature phrase extraction generates offline semantic indexes to support these gene function services. The website with BeeSpace Navigator is free and open to all; there is no login requirement at www.beespace.illinois.edu for version 4. Materials from the 2010 BeeSpace Software Training Workshop are available at www.beespace.illinois.edu/bstwmaterials.php.
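As a rough illustration of the statistical frequency analysis mentioned above, the following toy sketch scores words by how much more often they appear in a gene-related document set than in a background collection. The corpora, smoothing and scoring are invented simplifications, not BeeSpace's actual indexing pipeline.

```python
# Toy sketch: rank "discriminating" terms by a smoothed log frequency ratio
# between a target document set and a background collection.
import math
from collections import Counter

def discriminating_scores(target_docs, background_docs):
    t = Counter(w for d in target_docs for w in d.lower().split())
    b = Counter(w for d in background_docs for w in d.lower().split())
    t_total, b_total = sum(t.values()), sum(b.values())
    scores = {}
    for w in t:
        p_t = t[w] / t_total
        p_b = (b[w] + 1) / (b_total + len(t))   # add-one smoothing for unseen words
        scores[w] = math.log(p_t / p_b)          # higher = more specific to the target set
    return sorted(scores.items(), key=lambda kv: -kv[1])

target = ["foraging behavior gene expression", "gene expression in forager bees"]
background = ["hive construction and maintenance", "general colony behavior"]
print(discriminating_scores(target, background)[:5])
```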
Morshed, Nader; Echols, Nathaniel; Adams, Paul D.
2015-04-25
In the process of macromolecular model building, crystallographers must examine electron density for isolated atoms and differentiate sites containing structured solvent molecules from those containing elemental ions. This task requires specific knowledge of metal-binding chemistry and scattering properties and is prone to error. A method has previously been described to identify ions based on manually chosen criteria for a number of elements. Here, the use of support vector machines (SVMs) to automatically classify isolated atoms as either solvent or one of various ions is described. Two data sets of protein crystal structures, one containing manually curated structures deposited with anomalous diffraction data and another with automatically filtered, high-resolution structures, were constructed. On the manually curated data set, an SVM classifier was able to distinguish calcium from manganese, zinc, iron and nickel, as well as all five of these ions from water molecules, with a high degree of accuracy. Additionally, SVMs trained on the automatically curated set of high-resolution structures were able to successfully classify most common elemental ions in an independent validation test set. This method is readily extensible to other elemental ions and can also be used in conjunction with previous methods based on a priori expectations of the chemical environment and X-ray scattering.
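A hedged sketch of the core idea follows: classify candidate sites as water or a particular ion with an SVM over simple numeric features. The feature set and training values below are invented placeholders rather than the curated data sets or features used in the study.

```python
# Illustrative sketch: an SVM that labels isolated density peaks as water or a
# specific ion from a few numeric features. Feature definitions and values are
# invented placeholders for demonstration only.
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# hypothetical features: [peak height (sigma), anomalous peak height,
#                         nearest-neighbour distance (A), coordination number]
X_train = [
    [3.5, 0.2, 2.8, 2],   # water-like site
    [8.0, 5.1, 2.1, 6],   # Zn-like site
    [7.2, 4.0, 2.4, 6],   # Ca-like site
    [3.8, 0.1, 2.9, 3],   # water-like site
]
y_train = ["HOH", "ZN", "CA", "HOH"]

clf = make_pipeline(StandardScaler(), SVC(kernel="rbf", probability=True))
clf.fit(X_train, y_train)
print(clf.predict([[7.5, 4.8, 2.2, 6]]))  # a site that resembles an ion
```

In the real setting, features would be derived from the refined model and anomalous maps, and the classifier would be trained on the curated structure sets described in the abstract.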
Recovery, Transportation and Acceptance to the Curation Facility of the Hayabusa Re-Entry Capsule
NASA Technical Reports Server (NTRS)
Abe, M.; Fujimura, A.; Yano, H.; Okamoto, C.; Okada, T.; Yada, T.; Ishibashi, Y.; Shirai, K.; Nakamura, T.; Noguchi, T.;
2011-01-01
The "Hayabusa" re-entry capsule was safely carried into the clean room of Sagamihara Planetary Sample Curation Facility in JAXA on June 18, 2010. After executing computed tomographic (CT) scanning, removal of heat shield, and surface cleaning of sample container, the sample container was enclosed into the clean chamber. After opening the sample container and residual gas sampling in the clean chamber, optical observation, sample recovery, sample separation for initial analysis will be performed. This curation work is continuing for several manths with some selected member of Hayabusa Asteroidal Sample Preliminary Examination Team (HASPET). We report here on the 'Hayabusa' capsule recovery operation, and transportation and acceptance at the curation facility of the Hayabusa re-entry capsule.
Open semantic annotation of scientific publications using DOMEO.
Ciccarese, Paolo; Ocana, Marco; Clark, Tim
2012-04-24
Our group has developed a useful shared software framework for performing, versioning, sharing and viewing Web annotations of a number of kinds, using an open representation model. The Domeo Annotation Tool was developed in tandem with this open model, the Annotation Ontology (AO). Development of both the Annotation Framework and the open model was driven by requirements of several different types of alpha users, including bench scientists and biomedical curators from university research labs, online scientific communities, publishing and pharmaceutical companies. Several use cases were incrementally implemented by the toolkit. These use cases in biomedical communications include personal note-taking, group document annotation, semantic tagging, claim-evidence-context extraction, reagent tagging, and curation of textmining results from entity extraction algorithms. We report on the Domeo user interface here. Domeo has been deployed in beta release as part of the NIH Neuroscience Information Framework (NIF, http://www.neuinfo.org) and is scheduled for production deployment in the NIF's next full release. Future papers will describe other aspects of this work in detail, including Annotation Framework Services and components for integrating with external textmining services, such as the NCBO Annotator web service, and with other textmining applications using the Apache UIMA framework.
Open semantic annotation of scientific publications using DOMEO
2012-01-01
Background Our group has developed a useful shared software framework for performing, versioning, sharing and viewing Web annotations of a number of kinds, using an open representation model. Methods The Domeo Annotation Tool was developed in tandem with this open model, the Annotation Ontology (AO). Development of both the Annotation Framework and the open model was driven by requirements of several different types of alpha users, including bench scientists and biomedical curators from university research labs, online scientific communities, publishing and pharmaceutical companies. Several use cases were incrementally implemented by the toolkit. These use cases in biomedical communications include personal note-taking, group document annotation, semantic tagging, claim-evidence-context extraction, reagent tagging, and curation of textmining results from entity extraction algorithms. Results We report on the Domeo user interface here. Domeo has been deployed in beta release as part of the NIH Neuroscience Information Framework (NIF, http://www.neuinfo.org) and is scheduled for production deployment in the NIF’s next full release. Future papers will describe other aspects of this work in detail, including Annotation Framework Services and components for integrating with external textmining services, such as the NCBO Annotator web service, and with other textmining applications using the Apache UIMA framework. PMID:22541592
Zapata-Peñasco, Icoquih; Poot-Hernandez, Augusto Cesar; Eguiarte, Luis E
2017-01-01
The increasing number of metagenomic and genomic sequences has dramatically improved our understanding of microbial diversity, yet our ability to infer metabolic capabilities in such datasets remains challenging. We describe the Multigenomic Entropy Based Score pipeline (MEBS), a software platform designed to evaluate, compare, and infer complex metabolic pathways in large “omic” datasets, including entire biogeochemical cycles. MEBS is open source and available through https://github.com/eead-csic-compbio/metagenome_Pfam_score. To demonstrate its use, we modeled the sulfur cycle by exhaustively curating the molecular and ecological elements involved (compounds, genes, metabolic pathways, and microbial taxa). This information was reduced to a collection of 112 characteristic Pfam protein domains and a list of complete-sequenced sulfur genomes. Using the mathematical framework of relative entropy (H΄), we quantitatively measured the enrichment of these domains among sulfur genomes. The entropy of each domain was used both to build up a final score that indicates whether a (meta)genomic sample contains the metabolic machinery of interest and to propose marker domains in metagenomic sequences such as DsrC (PF04358). MEBS was benchmarked with a dataset of 2107 non-redundant microbial genomes from RefSeq and 935 metagenomes from MG-RAST. Its performance, reproducibility, and robustness were evaluated using several approaches, including random sampling, linear regression models, receiver operator characteristic plots, and the area under the curve metric (AUC). Our results support the broad applicability of this algorithm to accurately classify (AUC = 0.985) hard-to-culture genomes (e.g., Candidatus Desulforudis audaxviator), previously characterized ones, and metagenomic environments such as hydrothermal vents, or deep-sea sediment. Our benchmark indicates that an entropy-based score can capture the metabolic machinery of interest and can be used to efficiently classify large genomic and metagenomic datasets, including uncultivated/unexplored taxa. PMID:29069412
Santos, André S; Ramos, Rommel T; Silva, Artur; Hirata, Raphael; Mattos-Guaraldi, Ana L; Meyer, Roberto; Azevedo, Vasco; Felicori, Liza; Pacheco, Luis G C
2018-05-11
Biochemical tests are traditionally used for bacterial identification at the species level in clinical microbiology laboratories. While biochemical profiles are generally efficient for the identification of the most important corynebacterial pathogen Corynebacterium diphtheriae, their ability to differentiate between biovars of this bacterium is still controversial. In addition, the unambiguous identification of emerging human pathogenic species of the genus Corynebacterium may be hampered by highly variable biochemical profiles commonly reported for these species, including Corynebacterium striatum, Corynebacterium amycolatum, Corynebacterium minutissimum, and Corynebacterium xerosis. In order to identify the genomic basis contributing to the biochemical variabilities observed in phenotypic identification methods of these bacteria, we combined a comprehensive literature review with a bioinformatics approach based on reconstruction of six specific biochemical reactions/pathways in 33 recently released whole genome sequences. We used data retrieved from curated databases (MetaCyc, PathoSystems Resource Integration Center (PATRIC), The SEED, TransportDB, UniProtKB) associated with homology searches by BLAST and profile Hidden Markov Models (HMMs) to detect enzymes participating in the various pathways and performed ab initio protein structure modeling and molecular docking to confirm specific results. We found a differential distribution among the various strains of genes that code for some important enzymes, such as beta-phosphoglucomutase and fructokinase, and also for individual components of carbohydrate transport systems, including the fructose-specific phosphoenolpyruvate-dependent sugar phosphotransferase (PTS) and the ribose-specific ATP-binding cassette (ABC) transporter. Horizontal gene transfer plays a role in the biochemical variability of the isolates, as some genes needed for sucrose fermentation were seen to be present in genomic islands. Notably, using profile HMMs, we identified an enzyme with putative alpha-1,6-glycosidase activity only in some specific strains of C. diphtheriae, and this may aid understanding of the differential abilities to utilize glycogen and starch between the biovars.
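One step implied by this workflow, detecting pathway enzymes from profile HMM searches, might look roughly like the following sketch, which parses HMMER `hmmsearch --tblout` output. The file name, profile names and E-value cut-off are illustrative assumptions, not the study's actual configuration.

```python
# Sketch: record which pathway enzymes have significant profile-HMM hits in a
# strain's predicted proteome, based on hmmsearch tabular (--tblout) output.
E_CUTOFF = 1e-5  # illustrative significance threshold

def enzymes_detected(tblout_path: str) -> set:
    """Parse an hmmsearch --tblout file and return profile names with significant hits."""
    hits = set()
    with open(tblout_path) as fh:
        for line in fh:
            if line.startswith("#"):
                continue
            fields = line.split()
            query_name, evalue = fields[2], float(fields[4])  # profile name, full-sequence E-value
            if evalue <= E_CUTOFF:
                hits.add(query_name)
    return hits

required = {"Fructokinase", "Beta_PGM", "PTS_EIIA"}            # hypothetical pathway profiles
present = enzymes_detected("C_diphtheriae_strain.tblout")      # hypothetical file name
missing = required - present
print("pathway complete" if not missing else f"missing: {missing}")
```

Repeating this over all 33 genomes would give the presence/absence matrix from which differential distributions between strains and biovars can be read off.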
De Anda, Valerie; Zapata-Peñasco, Icoquih; Poot-Hernandez, Augusto Cesar; Eguiarte, Luis E; Contreras-Moreira, Bruno; Souza, Valeria
2017-11-01
The increasing number of metagenomic and genomic sequences has dramatically improved our understanding of microbial diversity, yet our ability to infer metabolic capabilities in such datasets remains challenging. We describe the Multigenomic Entropy Based Score pipeline (MEBS), a software platform designed to evaluate, compare, and infer complex metabolic pathways in large "omic" datasets, including entire biogeochemical cycles. MEBS is open source and available through https://github.com/eead-csic-compbio/metagenome_Pfam_score. To demonstrate its use, we modeled the sulfur cycle by exhaustively curating the molecular and ecological elements involved (compounds, genes, metabolic pathways, and microbial taxa). This information was reduced to a collection of 112 characteristic Pfam protein domains and a list of complete-sequenced sulfur genomes. Using the mathematical framework of relative entropy (H΄), we quantitatively measured the enrichment of these domains among sulfur genomes. The entropy of each domain was used both to build up a final score that indicates whether a (meta)genomic sample contains the metabolic machinery of interest and to propose marker domains in metagenomic sequences such as DsrC (PF04358). MEBS was benchmarked with a dataset of 2107 non-redundant microbial genomes from RefSeq and 935 metagenomes from MG-RAST. Its performance, reproducibility, and robustness were evaluated using several approaches, including random sampling, linear regression models, receiver operator characteristic plots, and the area under the curve metric (AUC). Our results support the broad applicability of this algorithm to accurately classify (AUC = 0.985) hard-to-culture genomes (e.g., Candidatus Desulforudis audaxviator), previously characterized ones, and metagenomic environments such as hydrothermal vents, or deep-sea sediment. Our benchmark indicates that an entropy-based score can capture the metabolic machinery of interest and can be used to efficiently classify large genomic and metagenomic datasets, including uncultivated/unexplored taxa. © The Author 2017. Published by Oxford University Press.
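The entropy-based enrichment idea can be illustrated with a toy calculation comparing a domain's frequency in sulfur genomes against a background set. The counts are invented and the expression below is a simplified relative-entropy-style term, not the exact MEBS score, whose details are given in the pipeline's documentation.

```python
# Toy sketch: relative-entropy-style enrichment of a Pfam domain in "sulfur"
# genomes versus a background genome set. Counts are invented examples.
import math

def domain_enrichment(count_sulfur, total_sulfur, count_background, total_background):
    """Return a p * log2(p/q) style term; higher values = more sulfur-specific."""
    p = (count_sulfur + 1) / (total_sulfur + 2)          # smoothed frequency in sulfur genomes
    q = (count_background + 1) / (total_background + 2)  # smoothed background frequency
    return p * math.log2(p / q)

# a DsrC-like domain: common in sulfur genomes, rare elsewhere
print(round(domain_enrichment(150, 161, 300, 2107), 3))
# a housekeeping domain present nearly everywhere scores close to zero
print(round(domain_enrichment(160, 161, 2100, 2107), 3))
```

Summing such per-domain terms over the curated domain list is what, in spirit, yields a single score indicating whether a (meta)genome carries the metabolic machinery of interest.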
Spetz, Johan; Rudqvist, Nils; Langen, Britta; Parris, Toshima Z; Dalmo, Johanna; Schüler, Emil; Wängberg, Bo; Nilsson, Ola; Helou, Khalil; Forssell-Aronsson, Eva
2018-05-01
Patients with neuroendocrine tumors expressing somatostatin receptors are often treated with [177Lu]Lu-octreotate. Despite being highly effective in animal models, [177Lu]Lu-octreotate-based therapies in the clinical setting can be optimized further. The aims of the study were to identify and elucidate possible optimization avenues for [177Lu]Lu-octreotate tumor therapy by characterizing transcriptional responses in the GOT1 small intestine neuroendocrine tumor model in nude mice. GOT1-bearing female BALB/c nude mice were intravenously injected with 15 MBq [177Lu]Lu-octreotate (non-curative amount) or mock-treated with saline solution. Animals were killed 1, 3, 7 or 41 d after injection. Total RNA was extracted from the tumor samples and profiled using Illumina microarray expression analysis. Differentially expressed genes were identified (treated vs. control) and pathway analysis was performed. Distribution of differentially expressed transcripts indicated a time-dependent treatment response in GOT1 tumors after [177Lu]Lu-octreotate administration. Regulation of CDKN1A, BCAT1 and PAM at 1 d after injection was compatible with growth arrest as the initial response to treatment. Upregulation of APOE and BAX at 3 d, and ADORA2A, BNIP3, BNIP3L and HSPB1 at 41 d after injection suggests first activation and then inhibition of the intrinsic apoptotic pathway during tumor regression and regrowth, respectively. Transcriptional analysis showed radiation-induced apoptosis as an early response after [177Lu]Lu-octreotate administration, followed by pro-survival transcriptional changes in the tumor during the regrowth phase. Time-dependent changes in cell cycle and apoptosis-related processes suggest different time points after radionuclide therapy when tumor cells may be more susceptible to additional treatment, highlighting the importance of timing when administering multiple therapeutic agents. Copyright © 2018 The Authors. Published by Elsevier Inc. All rights reserved.
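The "treated vs. control" differential-expression step can be illustrated with a per-gene t-test followed by Benjamini-Hochberg correction. The expression values below are invented, and the study itself used Illumina microarray analysis pipelines rather than this code.

```python
# Toy sketch of differential-expression testing: per-gene two-sample t-test with
# Benjamini-Hochberg FDR correction. Expression values are invented examples.
import numpy as np
from scipy.stats import ttest_ind
from statsmodels.stats.multitest import multipletests

genes = ["CDKN1A", "BCAT1", "APOE", "BAX"]
treated = np.array([[8.1, 8.3, 8.0], [7.9, 8.2, 8.4], [6.1, 6.0, 6.3], [7.5, 7.7, 7.6]])
control = np.array([[6.9, 7.0, 7.1], [7.0, 7.1, 6.8], [6.0, 6.2, 6.1], [6.8, 6.9, 7.0]])

pvals = [ttest_ind(t, c).pvalue for t, c in zip(treated, control)]
reject, qvals, _, _ = multipletests(pvals, alpha=0.05, method="fdr_bh")
for g, p, q, r in zip(genes, pvals, qvals, reject):
    print(f"{g}: p={p:.3g} q={q:.3g} {'DE' if r else 'not DE'}")
```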
Worldwide Report, Epidemiology
1985-10-09
would be. In the prevailing climate at least one could imagine a rush for the test by individuals who for reasons that are completely unreasonable...standard model for a control program. Each program must be adapted to the local epidemiological characteristics and the needs and possibilities of the...'best' in medical care. The result is an urban-biased, hospital-oriented, curative-care model. The Government of Pakistan has also enhanced his urban
Levering, Jennifer; Fiedler, Tomas; Sieg, Antje; van Grinsven, Koen W A; Hering, Silvio; Veith, Nadine; Olivier, Brett G; Klett, Lara; Hugenholtz, Jeroen; Teusink, Bas; Kreikemeyer, Bernd; Kummer, Ursula
2016-08-20
Genome-scale metabolic models comprise stoichiometric relations between metabolites, as well as associations between genes and metabolic reactions and facilitate the analysis of metabolism. We computationally reconstructed the metabolic network of the lactic acid bacterium Streptococcus pyogenes M49. Initially, we based the reconstruction on genome annotations and already existing and curated metabolic networks of Bacillus subtilis, Escherichia coli, Lactobacillus plantarum and Lactococcus lactis. This initial draft was manually curated with the final reconstruction accounting for 480 genes associated with 576 reactions and 558 metabolites. In order to constrain the model further, we performed growth experiments of wild type and arcA deletion strains of S. pyogenes M49 in a chemically defined medium and calculated nutrient uptake and production fluxes. We additionally performed amino acid auxotrophy experiments to test the consistency of the model. The established genome-scale model can be used to understand the growth requirements of the human pathogen S. pyogenes and define optimal and suboptimal conditions, but also to describe differences and similarities between S. pyogenes and related lactic acid bacteria such as L. lactis in order to find strategies to reduce the growth of the pathogen and propose drug targets. Copyright © 2016 Elsevier B.V. All rights reserved.
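Once exported to SBML, a reconstruction of this kind is typically interrogated with flux balance analysis. The sketch below uses COBRApy with placeholder file, reaction and gene identifiers, not the published S. pyogenes M49 model's actual IDs.

```python
# Hedged sketch: load a genome-scale model from SBML, apply a measured uptake
# bound and run flux balance analysis with COBRApy. Identifiers are placeholders.
import cobra

model = cobra.io.read_sbml_model("S_pyogenes_M49_draft.xml")  # hypothetical file name

# constrain glucose uptake to an experimentally measured rate (mmol/gDW/h)
model.reactions.get_by_id("EX_glc__D_e").lower_bound = -15.0   # BiGG-style ID, assumed

solution = model.optimize()                                    # maximise the biomass objective
print("growth rate:", round(solution.objective_value, 3))

# mimic a deletion strain by knocking out a gene (ID hypothetical); the context
# manager reverts the change afterwards
with model:
    model.genes.get_by_id("arcA").knock_out()
    print("knockout growth:", round(model.optimize().objective_value, 3))
```

Comparing such predictions with the measured uptake, production and auxotrophy data is how the reconstruction is constrained and validated.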
Ezra Tsur, Elishai
2017-01-01
Databases are imperative for research in bioinformatics and computational biology. Current challenges in database design include data heterogeneity and context-dependent interconnections between data entities. These challenges drove the development of unified data interfaces and specialized databases. The curation of specialized databases is an ever-growing challenge due to the introduction of new data sources and the emergence of new relational connections between established datasets. Here, an open-source framework for the curation of specialized databases is proposed. The framework supports user-designed models of data encapsulation, object persistency and structured interfaces to local and external data sources such as MalaCards, Biomodels and the National Center for Biotechnology Information (NCBI) databases. The proposed framework was implemented using Java as the development environment, EclipseLink as the data persistency agent and Apache Derby as the database manager. Syntactic analysis was based on J3D, jsoup, Apache Commons and w3c.dom open libraries. Finally, the construction of a specialized database for aneurysm-associated vascular diseases is demonstrated. This database contains 3-dimensional geometries of aneurysms, patients' clinical information, articles, biological models, related diseases and our recently published model of aneurysms' risk of rupture. The framework is available at http://nbel-lab.com.
New imidazolidinedione derivatives as antimalarial agents
DOE Office of Scientific and Technical Information (OSTI.GOV)
Zhang, Liang; Sathunuru, Ramadas; Luong, ThuLan
2012-04-30
A series of new N-alkyl- and N-alkoxy-imidazolidinediones was prepared and assessed for prophylactic and radical curative activities in mouse and Rhesus monkey models. The new compounds are generally metabolically stable, weakly active in vitro against Plasmodium falciparum clones (D6 and W2) and in mice infected with Plasmodium berghei sporozoites. Representative compounds 8e and 9c showed good causal prophylactic activity in Rhesus monkeys dosed at 30 mg/kg/day for 3 consecutive days by the IM route, delaying patency for 19-21 days and 54-86 days, respectively, as compared with the untreated control. By the oral route, 9c showed only marginal activity in causal prophylactic and radical curative tests at 50 mg/kg/day x 3 and 30 mg/kg/day x 7 plus chloroquine 10 mg/kg for 7 days, respectively.
An overview of the BioCreative 2012 Workshop Track III: interactive text mining task
Arighi, Cecilia N.; Carterette, Ben; Cohen, K. Bretonnel; Krallinger, Martin; Wilbur, W. John; Fey, Petra; Dodson, Robert; Cooper, Laurel; Van Slyke, Ceri E.; Dahdul, Wasila; Mabee, Paula; Li, Donghui; Harris, Bethany; Gillespie, Marc; Jimenez, Silvia; Roberts, Phoebe; Matthews, Lisa; Becker, Kevin; Drabkin, Harold; Bello, Susan; Licata, Luana; Chatr-aryamontri, Andrew; Schaeffer, Mary L.; Park, Julie; Haendel, Melissa; Van Auken, Kimberly; Li, Yuling; Chan, Juancarlos; Muller, Hans-Michael; Cui, Hong; Balhoff, James P.; Chi-Yang Wu, Johnny; Lu, Zhiyong; Wei, Chih-Hsuan; Tudor, Catalina O.; Raja, Kalpana; Subramani, Suresh; Natarajan, Jeyakumar; Cejuela, Juan Miguel; Dubey, Pratibha; Wu, Cathy
2013-01-01
In many databases, biocuration primarily involves literature curation, which usually involves retrieving relevant articles, extracting information that will translate into annotations and identifying new incoming literature. As the volume of biological literature increases, the use of text mining to assist in biocuration becomes increasingly relevant. A number of groups have developed tools for text mining from a computer science/linguistics perspective, and there are many initiatives to curate some aspect of biology from the literature. Some biocuration efforts already make use of a text mining tool, but there have not been many broad-based systematic efforts to study which aspects of a text mining tool contribute to its usefulness for a curation task. Here, we report on an effort to bring together text mining tool developers and database biocurators to test the utility and usability of tools. Six text mining systems presenting diverse biocuration tasks participated in a formal evaluation, and appropriate biocurators were recruited for testing. The performance results from this evaluation indicate that some of the systems were able to improve efficiency of curation by speeding up the curation task significantly (∼1.7- to 2.5-fold) over manual curation. In addition, some of the systems were able to improve annotation accuracy when compared with the performance on the manually curated set. In terms of inter-annotator agreement, the factors that contributed to significant differences for some of the systems included the expertise of the biocurator on the given curation task, the inherent difficulty of the curation and attention to annotation guidelines. After the task, annotators were asked to complete a survey to help identify strengths and weaknesses of the various systems. The analysis of this survey highlights how important task completion is to the biocurators’ overall experience of a system, regardless of the system’s high score on design, learnability and usability. In addition, strategies to refine the annotation guidelines and systems documentation, to adapt the tools to the needs and query types the end user might have and to evaluate performance in terms of efficiency, user interface, result export and traditional evaluation metrics have been analyzed during this task. This analysis will help to plan for a more intense study in BioCreative IV. PMID:23327936
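The two headline measurements reported here, curation speed-up and inter-annotator agreement, can be computed as in the short sketch below; the timings and label sequences are invented examples, not the workshop's actual data.

```python
# Sketch: fold speed-up of assisted over manual curation, and inter-annotator
# agreement via Cohen's kappa. All numbers and labels are invented examples.
from sklearn.metrics import cohen_kappa_score

manual_minutes_per_doc = 12.5
assisted_minutes_per_doc = 5.0
print("speed-up: %.1f-fold" % (manual_minutes_per_doc / assisted_minutes_per_doc))

# agreement between two curators on the same set of candidate annotations
curator_a = ["keep", "keep", "reject", "keep", "reject", "keep"]
curator_b = ["keep", "reject", "reject", "keep", "reject", "keep"]
print("Cohen's kappa: %.2f" % cohen_kappa_score(curator_a, curator_b))
```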