Science.gov

Sample records for manually curated database

  1. NSDNA: a manually curated database of experimentally supported ncRNAs associated with nervous system diseases

    PubMed Central

    Wang, Jianjian; Cao, Yuze; Zhang, Huixue; Wang, Tianfeng; Tian, Qinghua; Lu, Xiaoyu; Lu, Xiaoyan; Kong, Xiaotong; Liu, Zhaojun; Wang, Ning; Zhang, Shuai; Ma, Heping; Ning, Shangwei; Wang, Lihua

    2017-01-01

The Nervous System Disease NcRNAome Atlas (NSDNA) (http://www.bio-bigdata.net/nsdna/) is a manually curated database that provides comprehensive experimentally supported associations about nervous system diseases (NSDs) and noncoding RNAs (ncRNAs). NSDs represent a common group of disorders, some of which are characterized by high morbidity and disability. The pathogenesis of NSDs at the molecular level remains poorly understood. ncRNAs are a large family of functionally important RNA molecules. Increasing evidence shows that diverse ncRNAs play a critical role in various NSDs. Mining and summarizing NSD–ncRNA association data can help researchers discover useful information. Hence, we developed the NSDNA database, which documents 24 713 associations between 142 NSDs and 8593 ncRNAs in 11 species, curated from more than 1300 articles. This database provides a user-friendly interface for browsing and searching and allows for data downloading flexibility. In addition, NSDNA offers a submission page for researchers to submit novel NSD–ncRNA associations. It represents an extremely useful and valuable resource for researchers who seek to understand the functions and molecular mechanisms of ncRNA involved in NSDs. PMID:27899613

  3. CyanoBase and RhizoBase: databases of manually curated annotations for cyanobacterial and rhizobial genomes.

    PubMed

    Fujisawa, Takatomo; Okamoto, Shinobu; Katayama, Toshiaki; Nakao, Mitsuteru; Yoshimura, Hidehisa; Kajiya-Kanegae, Hiromi; Yamamoto, Sumiko; Yano, Chiyoko; Yanaka, Yuka; Maita, Hiroko; Kaneko, Takakazu; Tabata, Satoshi; Nakamura, Yasukazu

    2014-01-01

To understand newly sequenced genomes of closely related species, comprehensively curated reference genome databases are becoming increasingly important. We have extended CyanoBase (http://genome.microbedb.jp/cyanobase), a genome database for cyanobacteria, and newly developed RhizoBase (http://genome.microbedb.jp/rhizobase), a genome database for rhizobia, nitrogen-fixing bacteria associated with leguminous plants. Both databases focus on the representation and reusability of reference genome annotations, which are continuously updated by manual curation. Domain experts have extracted names, products and functions of each gene reported in the literature. To ensure the effectiveness of this procedure, we developed the TogoAnnotation system, offering a web-based user interface and uniform storage of annotations for the curators of the CyanoBase and RhizoBase databases. The number of references investigated for CyanoBase increased from 2260 in our previous report to 5285, and for RhizoBase, we perused 1216 references. The results of these intensive annotations are displayed on the GeneView pages of each database. Advanced users can also retrieve this information through the representational state transfer (REST)-based web application programming interface in an automated manner.
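
    A REST-based API like the one described is typically queried by building a resource URL per gene. A minimal sketch, assuming a hypothetical `/api/genes/<locus_tag>` endpoint (the base URL is from the abstract; the path and `format` parameter are illustrative, not the documented CyanoBase API):

    ```python
    from urllib.parse import urlencode

    # Base URL taken from the abstract; the /api/genes path below is an
    # assumption for illustration, not the actual documented endpoint.
    BASE = "http://genome.microbedb.jp/cyanobase"

    def gene_annotation_url(locus_tag, fmt="json"):
        """Return a REST-style URL requesting one gene's curated annotation."""
        query = urlencode({"format": fmt})
        return f"{BASE}/api/genes/{locus_tag}?{query}"

    # e.g. http://genome.microbedb.jp/cyanobase/api/genes/slr1311?format=json
    url = gene_annotation_url("slr1311")
    ```

    A client script would then fetch each URL and parse the returned record, which is what "in an automated manner" amounts to in practice.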

  4. miRSponge: a manually curated database for experimentally supported miRNA sponges and ceRNAs

    PubMed Central

    Wang, Peng; Zhi, Hui; Zhang, Yunpeng; Liu, Yue; Zhang, Jizhou; Gao, Yue; Guo, Maoni; Ning, Shangwei; Li, Xia

    2015-01-01

    In this study, we describe miRSponge, a manually curated database, which aims at providing an experimentally supported resource for microRNA (miRNA) sponges. Recent evidence suggests that miRNAs are themselves regulated by competing endogenous RNAs (ceRNAs) or ‘miRNA sponges’ that contain miRNA binding sites. These competitive molecules can sequester miRNAs, preventing them from interacting with their natural targets, and thereby play critical roles in various biological and pathological processes. It has become increasingly important to develop a high-quality database to record and store ceRNA data to support future studies. To this end, we have established the experimentally supported miRSponge database, which contains data on 599 miRNA-sponge interactions and 463 ceRNA relationships from 11 species, following manual curation of nearly 1200 published articles. Database classes include endogenously generated molecules including coding genes, pseudogenes, long non-coding RNAs and circular RNAs, along with exogenously introduced molecules including viral RNAs and artificially engineered sponges. Approximately 70% of the interactions were identified experimentally in disease states. miRSponge provides a user-friendly interface for convenient browsing, retrieval and downloading of datasets. A submission page is also included to allow researchers to submit newly validated miRNA sponge data. Database URL: http://www.bio-bigdata.net/miRSponge. PMID:26424084

  6. Lnc2Cancer: a manually curated database of experimentally supported lncRNAs associated with various human cancers

    PubMed Central

    Ning, Shangwei; Zhang, Jizhou; Wang, Peng; Zhi, Hui; Wang, Jianjian; Liu, Yue; Gao, Yue; Guo, Maoni; Yue, Ming; Wang, Lihua; Li, Xia

    2016-01-01

    Lnc2Cancer (http://www.bio-bigdata.net/lnc2cancer) is a manually curated database of cancer-associated long non-coding RNAs (lncRNAs) with experimental support that aims to provide a high-quality and integrated resource for exploring lncRNA deregulation in various human cancers. LncRNAs represent a large category of functional RNA molecules that play a significant role in human cancers. A curated collection and summary of deregulated lncRNAs in cancer is essential to thoroughly understand the mechanisms and functions of lncRNAs. Here, we developed the Lnc2Cancer database, which contains 1057 manually curated associations between 531 lncRNAs and 86 human cancers. Each association includes lncRNA and cancer name, the lncRNA expression pattern, experimental techniques, a brief functional description, the original reference and additional annotation information. Lnc2Cancer provides a user-friendly interface to conveniently browse, retrieve and download data. Lnc2Cancer also offers a submission page for researchers to submit newly validated lncRNA-cancer associations. With the rapidly increasing interest in lncRNAs, Lnc2Cancer will significantly improve our understanding of lncRNA deregulation in cancer and has the potential to be a timely and valuable resource. PMID:26481356

  8. EpiDBase: a manually curated database for small molecule modulators of epigenetic landscape

    PubMed Central

    Loharch, Saurabh; Bhutani, Isha; Jain, Kamal; Gupta, Pawan; Sahoo, Debendra K.; Parkesh, Raman

    2015-01-01

    We have developed EpiDBase (www.epidbase.org), an interactive database of small molecule ligands of epigenetic protein families, bringing together experimental, structural and chemoinformatic data in one place. Currently, EpiDBase encompasses 5784 unique ligands (11 422 entries) of various epigenetic markers such as writers, erasers and readers. EpiDBase includes experimental IC50 values, ligand molecular weight, hydrogen bond donor and acceptor counts, XlogP, number of rotatable bonds, number of aromatic rings, InChIKey, and two-dimensional and three-dimensional (3D) chemical structures. A catalog of all EpiDBase ligands organized by molecular weight is also provided. A structure editor is provided for 3D visualization of ligands. EpiDBase is integrated with tools for text search, disease-specific search, advanced search, substructure search, and similarity analysis. Advanced analysis can be performed using substructure and OpenBabel-based chemical similarity fingerprints. The EpiDBase is curated to identify unique molecular scaffolds. Initially, molecules were selected by removing peptides, macrocycles and other complex structures and then processed for conformational sampling by generating 3D conformers. Subsequent filtering through ZINC (Zinc Is Not Commercial: a free database of commercially available compounds for virtual screening) and Lilly MedChem regular rules retained many distinctive drug-like molecules. These molecules were then analyzed for physicochemical properties using OpenBabel descriptors and clustered using various methods such as hierarchical clustering, binning partition and multidimensional scaling. EpiDBase provides comprehensive resources for further design, development and refinement of small molecule modulators of epigenetic markers. Database URL: www.epidbase.org PMID:25776023
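
    Fingerprint-based similarity analysis of the kind mentioned above usually reduces to the Tanimoto coefficient over "on" bits. A minimal sketch (the toy fingerprints are invented; real OpenBabel fingerprints are fixed-length bit vectors, here represented simply as sets of set-bit positions):

    ```python
    def tanimoto(fp_a, fp_b):
        """Tanimoto coefficient between two fingerprints given as sets of
        'on' bit positions: |A ∩ B| / |A ∪ B|."""
        a, b = set(fp_a), set(fp_b)
        if not a and not b:
            return 1.0  # two empty fingerprints are conventionally identical
        return len(a & b) / len(a | b)

    # Two toy fingerprints sharing 2 of 4 distinct bits: 2/4 = 0.5
    score = tanimoto({1, 4, 9}, {1, 9, 16})
    ```

    Similarity search then ranks database ligands by this score against a query structure's fingerprint.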

  9. 3CDB: a manually curated database of chromosome conformation capture data.

    PubMed

    Yun, Xiaoxiao; Xia, Lili; Tang, Bixia; Zhang, Hui; Li, Feifei; Zhang, Zhihua

    2016-01-01

    Chromosome conformation capture (3C) is a biochemical technology to analyse contact frequencies between selected genomic sites in a cell population. Its recent genomic variants, e.g. Hi-C and chromatin interaction analysis by paired-end tag (ChIA-PET), have enabled the study of nuclear organization at an unprecedented level. However, due to the inherent low resolution and ultrahigh cost of Hi-C/ChIA-PET, 3C is still the gold standard for determining interactions between given regulatory DNA elements, such as enhancers and promoters. Therefore, we developed a database of 3C-determined functional chromatin interactions (3CDB; http://3cdb.big.ac.cn). To construct 3CDB, we searched PubMed and Google Scholar with carefully designed keyword combinations and retrieved more than 5000 articles, from which we manually extracted 3319 interactions in 17 species. Moreover, we proposed a systematic evaluation scheme for data reliability and classified the interactions into four categories. Contact frequencies are not directly comparable as a result of the various modified 3C protocols employed among laboratories. Our evaluation scheme provides a plausible solution to this long-standing problem in the field. A user-friendly web interface was designed to assist quick searches in 3CDB. We believe that 3CDB will provide fundamental information for experimental design and phylogenetic analysis, as well as bridge the gap between molecular and systems biologists who must now contend with noisy high-throughput data. Database URL: http://3cdb.big.ac.cn.

  10. Exposome-Explorer: a manually-curated database on biomarkers of exposure to dietary and environmental factors

    PubMed Central

    Neveu, Vanessa; Moussy, Alice; Rouaix, Héloïse; Wedekind, Roland; Pon, Allison; Knox, Craig; Wishart, David S.; Scalbert, Augustin

    2017-01-01

    Exposome-Explorer (http://exposome-explorer.iarc.fr) is the first database dedicated to biomarkers of exposure to environmental risk factors. It contains detailed information on the nature of biomarkers, their concentrations in various human biospecimens, the study populations in which they were measured and the analytical techniques used for measurement. It also contains correlations with external exposure measurements and data on biological reproducibility over time. The data in Exposome-Explorer were manually collected from peer-reviewed publications and organized to make them easily accessible through a web interface for in-depth analyses. The database and the web interface were developed using the Ruby on Rails framework. A total of 480 publications were analyzed, and 10 510 concentration values in blood, urine and other biospecimens for 692 dietary and pollutant biomarkers were collected. Over 8000 correlation values between dietary biomarker levels and food intake, as well as 536 values of biological reproducibility over time, were also compiled. Exposome-Explorer makes it easy to compare the performance between biomarkers and their fields of application. It should be particularly useful for epidemiologists and clinicians wishing to select panels of biomarkers that can be used in biomonitoring studies or in exposome-wide association studies, thereby allowing them to better understand the etiology of chronic diseases. PMID:27924041
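
    The biomarker–intake correlations compiled above are typically Pearson (or Spearman) coefficients computed over paired measurements. A minimal sketch with invented toy data (units and values are illustrative only):

    ```python
    from math import sqrt

    def pearson(xs, ys):
        """Pearson correlation coefficient between paired measurements."""
        n = len(xs)
        mx, my = sum(xs) / n, sum(ys) / n
        cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
        sx = sqrt(sum((x - mx) ** 2 for x in xs))
        sy = sqrt(sum((y - my) ** 2 for y in ys))
        return cov / (sx * sy)

    # Toy data: reported food intake vs urinary biomarker level
    # (arbitrary units; not taken from the database)
    intake = [10, 20, 30, 40]
    biomarker = [1.1, 2.0, 2.9, 4.2]
    r = pearson(intake, biomarker)  # close to 1: biomarker tracks intake
    ```

    A correlation value like `r` is what lets a biomarker be ranked as a proxy for a given dietary exposure.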

  11. A Curated Database of Rodent Uterotrophic Bioactivity

    PubMed Central

    Kleinstreuer, Nicole C.; Ceger, Patricia C.; Allen, David G.; Strickland, Judy; Chang, Xiaoqing; Hamm, Jonathan T.; Casey, Warren M.

    2015-01-01

    Background: Novel in vitro methods are being developed to identify chemicals that may interfere with estrogen receptor (ER) signaling, but the results are difficult to put into biological context because of reliance on reference chemicals established using results from other in vitro assays and because of the lack of high-quality in vivo reference data. The Organisation for Economic Co-operation and Development (OECD)-validated rodent uterotrophic bioassay is considered the “gold standard” for identifying potential ER agonists. Objectives: We performed a comprehensive literature review to identify and evaluate data from uterotrophic studies and to analyze study variability. Methods: We reviewed 670 articles with results from 2,615 uterotrophic bioassays using 235 unique chemicals. Study descriptors, such as species/strain, route of administration, dosing regimen, lowest effect level, and test outcome, were captured in a database of uterotrophic results. Studies were assessed for adherence to six criteria that were based on uterotrophic regulatory test guidelines. Studies meeting all six criteria (458 bioassays on 118 unique chemicals) were considered guideline-like (GL) and were subsequently analyzed. Results: The immature rat model was used for 76% of the GL studies. Active outcomes were more prevalent across rat models (74% active) than across mouse models (36% active). Of the 70 chemicals with at least two GL studies, 18 (26%) had discordant outcomes and were classified as both active and inactive. Many discordant results were attributable to differences in study design (e.g., injection vs. oral dosing). Conclusions: This uterotrophic database provides a valuable resource for understanding in vivo outcome variability and for evaluating the performance of in vitro assays that measure estrogenic activity. Citation: Kleinstreuer NC, Ceger PC, Allen DG, Strickland J, Chang X, Hamm JT, Casey WM. 2016. A curated database of rodent uterotrophic bioactivity. Environ
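
    The screening step described above (a study is guideline-like only if it meets all six criteria, and a chemical is discordant if its GL studies disagree) can be sketched as follows. The criterion names are placeholders, not the actual OECD guideline wording:

    ```python
    # Placeholder names for the six guideline-based criteria from the text.
    CRITERIA = ["species", "route", "dosing", "duration", "endpoint", "controls"]

    def is_guideline_like(study):
        """A study record (dict) is GL only if every criterion is met."""
        return all(study.get(c, False) for c in CRITERIA)

    def discordant_chemicals(studies):
        """Chemicals whose GL studies report both active and inactive outcomes."""
        outcomes = {}
        for s in studies:
            if is_guideline_like(s):
                outcomes.setdefault(s["chemical"], set()).add(s["active"])
        return {chem for chem, seen in outcomes.items() if len(seen) > 1}
    ```

    Applied to the curated records, this kind of filter is what reduced 2,615 bioassays to the 458 GL studies analyzed, and flagged the 18 chemicals with conflicting outcomes.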

  12. CARD 2017: expansion and model-centric curation of the Comprehensive Antibiotic Resistance Database

    Technology Transfer Automated Retrieval System (TEKTRAN)

    The Comprehensive Antibiotic Resistance Database (CARD; http://arpcard.mcmaster.ca) is a manually curated resource containing high quality reference data on the molecular basis of antimicrobial resistance (AMR), with an emphasis on the genes, proteins, and mutations involved in AMR. CARD is ontologi...

  13. MIPS: curated databases and comprehensive secondary data resources in 2010.

    PubMed

    Mewes, H Werner; Ruepp, Andreas; Theis, Fabian; Rattei, Thomas; Walter, Mathias; Frishman, Dmitrij; Suhre, Karsten; Spannagl, Manuel; Mayer, Klaus F X; Stümpflen, Volker; Antonov, Alexey

    2011-01-01

    The Munich Information Center for Protein Sequences (MIPS at the Helmholtz Center for Environmental Health, Neuherberg, Germany) has many years of experience in providing annotated collections of biological data. Selected data sets of high relevance, such as model genomes, are subjected to careful manual curation, while the bulk of high-throughput data is annotated by automatic means. High-quality reference resources developed in the past and still actively maintained include Saccharomyces cerevisiae, Neurospora crassa and Arabidopsis thaliana genome databases as well as several protein interaction data sets (MPACT, MPPI and CORUM). More recent projects are PhenomiR, the database on microRNA-related phenotypes, and MIPS PlantsDB for integrative and comparative plant genome research. The interlinked resources SIMAP and PEDANT provide homology relationships as well as up-to-date and consistent annotation for 38,000,000 protein sequences. PPLIPS and CCancer are versatile tools for proteomics and functional genomics interfacing to a database of compilations from gene lists extracted from literature. A novel literature-mining tool, EXCERBT, gives access to structured information on classified relations between genes, proteins, phenotypes and diseases extracted from Medline abstracts by semantic analysis. All databases described here, as well as the detailed descriptions of our projects can be accessed through the MIPS WWW server (http://mips.helmholtz-muenchen.de).

  14. From manual curation to visualization of gene families and networks across Solanaceae plant species.

    PubMed

    Pujar, Anuradha; Menda, Naama; Bombarely, Aureliano; Edwards, Jeremy D; Strickler, Susan R; Mueller, Lukas A

    2013-01-01

    High-quality manual annotation methods and practices need to be scaled to the increased rate of genomic data production. Curation based on gene families and gene networks is one approach that can significantly increase both curation efficiency and quality. The Sol Genomics Network (SGN; http://solgenomics.net) is a comparative genomics platform, with genetic, genomic and phenotypic information of the Solanaceae family and its closely related species that incorporates a community-based gene and phenotype curation system. In this article, we describe a manual curation system for gene families aimed at facilitating curation, querying and visualization of gene interaction patterns underlying complex biological processes, including an interface for efficiently capturing information from experiments with large data sets reported in the literature. Well-annotated multigene families are useful for further exploration of genome organization and gene evolution across species. As an example, we illustrate the system with the multigene transcription factor families, WRKY and Small Auxin Up-regulated RNA (SAUR), which both play important roles in responding to abiotic stresses in plants. Database URL: http://solgenomics.net/

  15. A computational platform to maintain and migrate manual functional annotations for BioCyc databases

    Technology Transfer Automated Retrieval System (TEKTRAN)

    Model organism databases are an important resource for information on biological pathways and genomic data. Such databases represent the accumulation of biological data, some of which has been manually curated from literature. An essential feature of these databases is the continuing data integratio...

  16. Targeted journal curation as a method to improve data currency at the Comparative Toxicogenomics Database.

    PubMed

    Davis, Allan Peter; Johnson, Robin J; Lennon-Hopkins, Kelley; Sciaky, Daniela; Rosenstein, Michael C; Wiegers, Thomas C; Mattingly, Carolyn J

    2012-01-01

    The Comparative Toxicogenomics Database (CTD) is a public resource that promotes understanding about the effects of environmental chemicals on human health. CTD biocurators read the scientific literature and manually curate a triad of chemical-gene, chemical-disease and gene-disease interactions. Typically, articles for CTD are selected using a chemical-centric approach by querying PubMed to retrieve a corpus containing the chemical of interest. Although this technique ensures adequate coverage of knowledge about the chemical (i.e. data completeness), it does not necessarily reflect the most current state of all toxicological research in the community at large (i.e. data currency). Keeping databases current with the most recent scientific results, as well as providing a rich historical background from legacy articles, is a challenging process. To address this issue of data currency, CTD designed and tested a journal-centric approach of curation to complement our chemical-centric method. We first identified priority journals based on defined criteria. Next, over 7 weeks, three biocurators reviewed 2425 articles from three consecutive years (2009-2011) of three targeted journals. From this corpus, 1252 articles contained relevant data for CTD and 52 752 interactions were manually curated. Here, we describe our journal selection process, two methods of document delivery for the biocurators and the analysis of the resulting curation metrics, including data currency, and both intra-journal and inter-journal comparisons of research topics. Based on our results, we expect that curation by select journals can (i) be easily incorporated into the curation pipeline to complement our chemical-centric approach; (ii) build content more evenly for chemicals, genes and diseases in CTD (rather than biasing data by chemicals-of-interest); (iii) reflect developing areas in environmental health and (iv) improve overall data currency for chemicals, genes and diseases. Database URL
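
    The pilot numbers reported above can be recomputed as simple throughput metrics (the figures are from the abstract; the metric names are ours, for illustration):

    ```python
    # Figures reported in the abstract for the 7-week, 3-curator pilot.
    articles_reviewed = 2425
    articles_relevant = 1252
    interactions_curated = 52752
    weeks, curators = 7, 3

    relevance_rate = articles_relevant / articles_reviewed            # ~52% of articles held CTD data
    interactions_per_article = interactions_curated / articles_relevant   # ~42 interactions each
    articles_per_curator_week = articles_reviewed / (weeks * curators)    # ~115 articles triaged
    ```

    Metrics like these are how a journal-centric corpus can be compared against a chemical-centric one when deciding where curation effort pays off.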

  18. OntoMate: a text-mining tool aiding curation at the Rat Genome Database

    PubMed Central

    Liu, Weisong; Laulederkind, Stanley J. F.; Hayman, G. Thomas; Wang, Shur-Jen; Nigam, Rajni; Smith, Jennifer R.; De Pons, Jeff; Dwinell, Melinda R.; Shimoyama, Mary

    2015-01-01

    The Rat Genome Database (RGD) is the premier repository of rat genomic, genetic and physiologic data. Converting data from free text in the scientific literature to a structured format is one of the main tasks of all model organism databases. RGD spends considerable effort manually curating gene, Quantitative Trait Locus (QTL) and strain information. The rapidly growing volume of biomedical literature and the active research in the biological natural language processing (bioNLP) community have given RGD the impetus to adopt text-mining tools to improve curation efficiency. Recently, RGD has initiated a project to use OntoMate, an ontology-driven, concept-based literature search engine developed at RGD, as a replacement for the PubMed (http://www.ncbi.nlm.nih.gov/pubmed) search engine in the gene curation workflow. OntoMate tags abstracts with gene names, gene mutations, organism name and most of the 16 ontologies/vocabularies used at RGD. All terms/entities tagged to an abstract are listed with the abstract in the search results. All listed terms are linked both to data entry boxes and a term browser in the curation tool. OntoMate also provides user-activated filters for species, date and other parameters relevant to the literature search. Using the system for literature search and import has streamlined the process compared to using PubMed. The system was built with a scalable and open architecture, including features specifically designed to accelerate the RGD gene curation process. With the use of bioNLP tools, RGD has added more automation to its curation workflow. Database URL: http://rgd.mcw.edu PMID:25619558
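
    At its simplest, concept tagging of the kind described matches a controlled vocabulary against abstract text. A minimal dictionary-matching sketch (the vocabulary entries are invented; OntoMate's real tagger draws on RGD's 16 ontologies and is far more sophisticated):

    ```python
    import re

    # Invented toy vocabulary mapping a term to the ontology it comes from.
    VOCAB = {
        "hypertension": "Disease Ontology",
        "Rattus norvegicus": "Organism",
        "QTL": "RGD vocabulary",
    }

    def tag_abstract(text):
        """Return (term, ontology) pairs found in the text, case-insensitively."""
        hits = []
        for term, ontology in VOCAB.items():
            if re.search(re.escape(term), text, re.IGNORECASE):
                hits.append((term, ontology))
        return hits
    ```

    Listing the tagged terms next to each abstract, and linking them to data-entry fields, is what turns a plain literature search into a curation aid.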

  20. CARD 2017: expansion and model-centric curation of the comprehensive antibiotic resistance database

    PubMed Central

    Jia, Baofeng; Raphenya, Amogelang R.; Alcock, Brian; Waglechner, Nicholas; Guo, Peiyao; Tsang, Kara K.; Lago, Briony A.; Dave, Biren M.; Pereira, Sheldon; Sharma, Arjun N.; Doshi, Sachin; Courtot, Mélanie; Lo, Raymond; Williams, Laura E.; Frye, Jonathan G.; Elsayegh, Tariq; Sardar, Daim; Westman, Erin L.; Pawlowski, Andrew C.; Johnson, Timothy A.; Brinkman, Fiona S.L.; Wright, Gerard D.; McArthur, Andrew G.

    2017-01-01

    The Comprehensive Antibiotic Resistance Database (CARD; http://arpcard.mcmaster.ca) is a manually curated resource containing high quality reference data on the molecular basis of antimicrobial resistance (AMR), with an emphasis on the genes, proteins and mutations involved in AMR. CARD is ontologically structured, model centric, and spans the breadth of AMR drug classes and resistance mechanisms, including intrinsic, mutation-driven and acquired resistance. It is built upon the Antibiotic Resistance Ontology (ARO), a custom built, interconnected and hierarchical controlled vocabulary allowing advanced data sharing and organization. Its design allows the development of novel genome analysis tools, such as the Resistance Gene Identifier (RGI) for resistome prediction from raw genome sequence. Recent improvements include extensive curation of additional reference sequences and mutations, development of a unique Model Ontology and accompanying AMR detection models to power sequence analysis, new visualization tools, and expansion of the RGI for detection of emergent AMR threats. CARD curation is updated monthly based on an interplay of manual literature curation, computational text mining, and genome analysis. PMID:27789705

  1. CARD 2017: expansion and model-centric curation of the comprehensive antibiotic resistance database.

    PubMed

    Jia, Baofeng; Raphenya, Amogelang R; Alcock, Brian; Waglechner, Nicholas; Guo, Peiyao; Tsang, Kara K; Lago, Briony A; Dave, Biren M; Pereira, Sheldon; Sharma, Arjun N; Doshi, Sachin; Courtot, Mélanie; Lo, Raymond; Williams, Laura E; Frye, Jonathan G; Elsayegh, Tariq; Sardar, Daim; Westman, Erin L; Pawlowski, Andrew C; Johnson, Timothy A; Brinkman, Fiona S L; Wright, Gerard D; McArthur, Andrew G

    2017-01-04

    The Comprehensive Antibiotic Resistance Database (CARD; http://arpcard.mcmaster.ca) is a manually curated resource containing high quality reference data on the molecular basis of antimicrobial resistance (AMR), with an emphasis on the genes, proteins and mutations involved in AMR. CARD is ontologically structured, model centric, and spans the breadth of AMR drug classes and resistance mechanisms, including intrinsic, mutation-driven and acquired resistance. It is built upon the Antibiotic Resistance Ontology (ARO), a custom built, interconnected and hierarchical controlled vocabulary allowing advanced data sharing and organization. Its design allows the development of novel genome analysis tools, such as the Resistance Gene Identifier (RGI) for resistome prediction from raw genome sequence. Recent improvements include extensive curation of additional reference sequences and mutations, development of a unique Model Ontology and accompanying AMR detection models to power sequence analysis, new visualization tools, and expansion of the RGI for detection of emergent AMR threats. CARD curation is updated monthly based on an interplay of manual literature curation, computational text mining, and genome analysis.

  2. Text mining effectively scores and ranks the literature for improving chemical-gene-disease curation at the comparative toxicogenomics database.

    PubMed

    Davis, Allan Peter; Wiegers, Thomas C; Johnson, Robin J; Lay, Jean M; Lennon-Hopkins, Kelley; Saraceni-Richards, Cynthia; Sciaky, Daniela; Murphy, Cynthia Grondin; Mattingly, Carolyn J

    2013-01-01

    The Comparative Toxicogenomics Database (CTD; http://ctdbase.org/) is a public resource that curates interactions between environmental chemicals and gene products, and their relationships to diseases, as a means of understanding the effects of environmental chemicals on human health. CTD provides a triad of core information in the form of chemical-gene, chemical-disease, and gene-disease interactions that are manually curated from scientific articles. To increase the efficiency, productivity, and data coverage of manual curation, we have leveraged text mining to help rank and prioritize the triaged literature. Here, we describe our text-mining process that computes and assigns each article a document relevancy score (DRS), wherein a high DRS suggests that an article is more likely to be relevant for curation at CTD. We evaluated our process by first text mining a corpus of 14,904 articles triaged for seven heavy metals (cadmium, cobalt, copper, lead, manganese, mercury, and nickel). Based upon initial analysis, a representative subset corpus of 3,583 articles was then selected from the 14,904 articles and sent to five CTD biocurators for review. The resulting curation of these 3,583 articles was analyzed for a variety of parameters, including article relevancy, novel data content, interaction yield rate, mean average precision, and biological and toxicological interpretability. We show that for all measured parameters, the DRS is an effective indicator for scoring and improving the ranking of literature for the curation of chemical-gene-disease information at CTD. Here, we demonstrate how fully incorporating text mining-based DRS scoring into our curation pipeline enhances manual curation by prioritizing more relevant articles, thereby increasing data content, productivity, and efficiency.
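The ranking idea behind a document relevancy score can be sketched as a weighted keyword scorer. This is a hypothetical illustration, not CTD's actual algorithm: the keywords and weights below are invented, and the real pipeline computes the DRS from trained text-mining models rather than a hand-made list.

```python
# Invented keyword weights for illustration only; not CTD's DRS model.
KEYWORD_WEIGHTS = {
    "cadmium": 3.0,
    "exposure": 2.0,
    "gene expression": 2.5,
    "review": -1.0,  # reviews tend to yield fewer curatable interactions
}

def drs(abstract):
    """Sum the weights of matched keywords; higher scores suggest an
    article is more likely to be relevant for curation."""
    text = abstract.lower()
    return sum(w for kw, w in KEYWORD_WEIGHTS.items() if kw in text)

def rank(articles):
    """Return (pmid, score) pairs sorted by descending DRS."""
    return sorted(((pmid, drs(txt)) for pmid, txt in articles.items()),
                  key=lambda pair: pair[1], reverse=True)

articles = {
    "111": "Cadmium exposure alters gene expression in rat kidney.",
    "222": "A review of water quality standards.",
}
print(rank(articles))  # highest-DRS article first
```

Biocurators would then work down the ranked list, so high-yield articles are curated before marginal ones.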

  3. A manual curation strategy to improve genome annotation: application to a set of haloarchaeal genomes.

    PubMed

    Pfeiffer, Friedhelm; Oesterhelt, Dieter

    2015-06-02

    Genome annotation errors are a persistent problem that impedes research in the biosciences. A manual curation effort is described that attempts to produce high-quality genome annotations for a set of haloarchaeal genomes (Halobacterium salinarum and Hbt. hubeiense, Haloferax volcanii and Hfx. mediterranei, Natronomonas pharaonis and Nmn. moolapensis, Haloquadratum walsbyi strains HBSQ001 and C23, Natrialba magadii, Haloarcula marismortui and Har. hispanica, and Halohasta litchfieldiae). Genomes are checked for missing genes, start codon misassignments, and disrupted genes. Assignments of a specific function are preferably based on experimentally characterized homologs (Gold Standard Proteins). To avoid overannotation, which is a major source of database errors, we restrict annotation to only general function assignments when support for a specific substrate assignment is insufficient. This strategy results in annotations that are resistant to the plethora of errors that compromise public databases. Annotation consistency is rigorously validated for ortholog pairs from the genomes surveyed. The annotation is regularly crosschecked against the UniProt database to further improve annotations and increase the level of standardization. Enhanced genome annotations are submitted to public databases (EMBL/GenBank, UniProt), to the benefit of the scientific community. The enhanced annotations are also publicly available via HaloLex.

  4. Curating and Preserving the Big Canopy Database System: an Active Curation Approach using SEAD

    NASA Astrophysics Data System (ADS)

    Myers, J.; Cushing, J. B.; Lynn, P.; Weiner, N.; Ovchinnikova, A.; Nadkarni, N.; McIntosh, A.

    2015-12-01

    Modern research is increasingly dependent upon highly heterogeneous data and on the associated cyberinfrastructure developed to organize, analyze, and visualize that data. However, due to the complexity and custom nature of such combined data-software systems, it can be very challenging to curate and preserve them for the long term at reasonable cost and in a way that retains their scientific value. In this presentation, we describe how this challenge was met in preserving the Big Canopy Database (CanopyDB) system using an agile approach and leveraging the Sustainable Environment - Actionable Data (SEAD) DataNet project's hosted data services. The CanopyDB system was developed over more than a decade at Evergreen State College to address the needs of forest canopy researchers. It is an early yet sophisticated exemplar of the type of system that has become common in biological research and science in general, including multiple relational databases for different experiments, a custom database generation tool used to create them, an image repository, and desktop and web tools to access, analyze, and visualize this data. SEAD provides secure project spaces with a semantic content abstraction (typed content with arbitrary RDF metadata statements and relationships to other content), combined with a standards-based curation and publication pipeline resulting in packaged research objects with Digital Object Identifiers. Using SEAD, our cross-project team was able to incrementally ingest CanopyDB components (images, datasets, software source code, documentation, executables, and virtualized services) and to iteratively define and extend the metadata and relationships needed to document them. We believe that both the process, and the richness of the resultant standards-based (OAI-ORE) preservation object, hold lessons for the development of best-practice solutions for preserving scientific data in association with the tools and services needed to derive value from it.
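The "semantic content abstraction" described above amounts to typed items carrying arbitrary subject-predicate-object statements and relationships to other content. A minimal stdlib-only sketch follows; the identifiers and predicates are hypothetical, not SEAD's actual schema.

```python
# Toy RDF-style statements about CanopyDB components. All names below are
# invented for illustration; SEAD's real metadata model is richer.
triples = [
    ("canopydb/images", "rdf:type", "Collection"),
    ("canopydb/images", "dc:title", "CanopyDB image repository"),
    ("canopydb/db-tool", "rdf:type", "Software"),
    ("canopydb/db-tool", "schema:generates", "canopydb/experiment-42"),
    ("canopydb/experiment-42", "rdf:type", "Dataset"),
]

def query(triples, subject=None, predicate=None, obj=None):
    """Return the triples matching a pattern; None acts as a wildcard."""
    return [t for t in triples
            if (subject is None or t[0] == subject)
            and (predicate is None or t[1] == predicate)
            and (obj is None or t[2] == obj)]

# Example: find every item typed as a Dataset in the curation space.
print(query(triples, predicate="rdf:type", obj="Dataset"))
```

A curation pipeline can walk such statements to package related items (data, software, documentation) into one standards-based preservation object.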

  5. HPIDB 2.0: a curated database for host-pathogen interactions.

    PubMed

    Ammari, Mais G; Gresham, Cathy R; McCarthy, Fiona M; Nanduri, Bindu

    2016-01-01

    Identification and analysis of host-pathogen interactions (HPI) is essential to study infectious diseases. However, HPI data are sparse in existing molecular interaction databases, especially for agricultural host-pathogen systems. Therefore, resources that annotate, predict and display the HPI that underpin infectious diseases are critical for developing novel intervention strategies. HPIDB 2.0 (http://www.agbase.msstate.edu/hpi/main.html) is a resource for HPI data, and contains 45,238 manually curated entries in the current release. Since the first description of the database in 2010, multiple enhancements to HPIDB data and interface services were made that are described here. Notably, HPIDB 2.0 now provides targeted biocuration of molecular interaction data. As a member of the International Molecular Exchange consortium, annotations provided by HPIDB 2.0 curators meet community standards to provide detailed contextual experimental information and facilitate data sharing. Moreover, HPIDB 2.0 provides access to rapidly available community annotations that capture minimum molecular interaction information to address immediate researcher needs for HPI network analysis. In addition to curation, HPIDB 2.0 integrates HPI from existing external sources and contains tools to infer additional HPI where annotated data are scarce. Compared to other interaction databases, our data collection approach ensures HPIDB 2.0 users access the most comprehensive HPI data from a wide range of pathogens and their hosts (594 pathogen and 70 host species, as of February 2016). Improvements also include enhanced search capacity, addition of Gene Ontology functional information, and implementation of network visualization. The changes made to HPIDB 2.0 content and interface ensure that users, especially agricultural researchers, are able to easily access and analyse high quality, comprehensive HPI data. All HPIDB 2.0 data are updated regularly, are publicly available for direct

  6. Advancing Exposure Science through Chemical Data Curation and Integration in the Comparative Toxicogenomics Database

    PubMed Central

    Grondin, Cynthia J.; Davis, Allan Peter; Wiegers, Thomas C.; King, Benjamin L.; Wiegers, Jolene A.; Reif, David M.; Hoppin, Jane A.; Mattingly, Carolyn J.

    2016-01-01

    Background: Exposure science studies the interactions and outcomes between environmental stressors and human or ecological receptors. To augment its role in understanding human health and the exposome, we aimed to centralize and integrate exposure science data into the broader biological framework of the Comparative Toxicogenomics Database (CTD), a public resource that promotes understanding of environmental chemicals and their effects on human health. Objectives: We integrated exposure data within the CTD to provide a centralized, freely available resource that facilitates identification of connections between real-world exposures, chemicals, genes/proteins, diseases, biological processes, and molecular pathways. Methods: We developed a manual curation paradigm that captures exposure data from the scientific literature using controlled vocabularies and free text within the context of four primary exposure concepts: stressor, receptor, exposure event, and exposure outcome. Using data from the Agricultural Health Study, we have illustrated the benefits of both centralization and integration of exposure information with CTD core data. Results: We have described our curation process, demonstrated how exposure data can be accessed and analyzed in the CTD, and shown how this integration provides a broad biological context for exposure data to promote mechanistic understanding of environmental influences on human health. Conclusions: Curation and integration of exposure data within the CTD provides researchers with new opportunities to correlate exposures with human health outcomes, to identify underlying potential molecular mechanisms, and to improve understanding about the exposome. Citation: Grondin CJ, Davis AP, Wiegers TC, King BL, Wiegers JA, Reif DM, Hoppin JA, Mattingly CJ. 2016. Advancing exposure science through chemical data curation and integration in the Comparative Toxicogenomics Database. Environ Health Perspect 124:1592–1599; http://dx.doi.org/10

  7. Instruction manual for the Wahoo computerized database

    SciTech Connect

    Lasota, D.; Watts, K.

    1995-05-01

    As part of our research on the Lisburne Group, we have developed a powerful relational computerized database to accommodate the huge amounts of data generated by our multi-disciplinary research project. The Wahoo database has data files on petrographic data, conodont analyses, locality and sample data, well logs and diagenetic (cement) studies. Chapter 5 is essentially an instruction manual that summarizes some of the unique attributes and operating procedures of the Wahoo database. The main purpose of a database is to allow users to manipulate their data and produce reports and graphs for presentation. We present a variety of data tables in appendices at the end of this report, each encapsulating a small part of the data contained in the Wahoo database. All the data are sorted and listed by map index number and stratigraphic position (depth). The Locality data table (Appendix A) lists the stratigraphic sections examined in our study. It gives names of study areas, stratigraphic units studied, locality information, and researchers. Most localities are keyed to a geologic map that shows the distribution of the Lisburne Group and location of our sections in ANWR. Petrographic reports (Appendix B) are detailed summaries of data on the composition and texture of the Lisburne Group carbonates. The relative abundance of different carbonate grains (allochems) and carbonate texture are listed using symbols that portray data in a format similar to stratigraphic columns. This enables researchers to recognize trends in the evolution of the Lisburne carbonate platform and to check their paleoenvironmental interpretations in a stratigraphic context. Some of the figures in Chapter 1 were made using the Wahoo database.

  8. Text Mining Genotype-Phenotype Relationships from Biomedical Literature for Database Curation and Precision Medicine

    PubMed Central

    Simmons, Michael; Lu, Zhiyong

    2016-01-01

    The practice of precision medicine will ultimately require databases of genes and mutations for healthcare providers to reference in order to understand the clinical implications of each patient’s genetic makeup. Although the highest quality databases require manual curation, text mining tools can facilitate the curation process, increasing accuracy, coverage, and productivity. However, to date there are no available text mining tools that offer high-accuracy performance for extracting such triplets from biomedical literature. In this paper we propose a high-performance machine learning approach to automate the extraction of disease-gene-variant triplets from biomedical literature. Our approach is unique because we identify the genes and protein products associated with each mutation from not just the local text content, but from a global context as well (from the Internet and from all literature in PubMed). Our approach also incorporates protein sequence validation and disease association using a novel text-mining-based machine learning approach. We extract disease-gene-variant triplets from all abstracts in PubMed related to a set of ten important diseases (breast cancer, prostate cancer, pancreatic cancer, lung cancer, acute myeloid leukemia, Alzheimer’s disease, hemochromatosis, age-related macular degeneration (AMD), diabetes mellitus, and cystic fibrosis). We then evaluate our approach in two ways: (1) a direct comparison with the state of the art using benchmark datasets; (2) a validation study comparing the results of our approach with entries in a popular human-curated database (UniProt) for each of the previously mentioned diseases. In the benchmark comparison, our full approach achieves a 28% improvement in F1-measure (from 0.62 to 0.79) over the state-of-the-art results. For the validation study with UniProt Knowledgebase (KB), we present a thorough analysis of the results and errors. Across all diseases, our approach returned 272 triplets (disease

  9. Text Mining Genotype-Phenotype Relationships from Biomedical Literature for Database Curation and Precision Medicine.

    PubMed

    Singhal, Ayush; Simmons, Michael; Lu, Zhiyong

    2016-11-01

    The practice of precision medicine will ultimately require databases of genes and mutations for healthcare providers to reference in order to understand the clinical implications of each patient's genetic makeup. Although the highest quality databases require manual curation, text mining tools can facilitate the curation process, increasing accuracy, coverage, and productivity. However, to date there are no available text mining tools that offer high-accuracy performance for extracting such triplets from biomedical literature. In this paper we propose a high-performance machine learning approach to automate the extraction of disease-gene-variant triplets from biomedical literature. Our approach is unique because we identify the genes and protein products associated with each mutation from not just the local text content, but from a global context as well (from the Internet and from all literature in PubMed). Our approach also incorporates protein sequence validation and disease association using a novel text-mining-based machine learning approach. We extract disease-gene-variant triplets from all abstracts in PubMed related to a set of ten important diseases (breast cancer, prostate cancer, pancreatic cancer, lung cancer, acute myeloid leukemia, Alzheimer's disease, hemochromatosis, age-related macular degeneration (AMD), diabetes mellitus, and cystic fibrosis). We then evaluate our approach in two ways: (1) a direct comparison with the state of the art using benchmark datasets; (2) a validation study comparing the results of our approach with entries in a popular human-curated database (UniProt) for each of the previously mentioned diseases. In the benchmark comparison, our full approach achieves a 28% improvement in F1-measure (from 0.62 to 0.79) over the state-of-the-art results. For the validation study with UniProt Knowledgebase (KB), we present a thorough analysis of the results and errors. Across all diseases, our approach returned 272 triplets (disease

  10. PMRD: a curated database for genes and mutants involved in plant male reproduction

    PubMed Central

    2012-01-01

    Background Male reproduction is an essential biological event in the plant life cycle separating the diploid sporophyte and haploid gametophyte generations, which involves expression of approximately 20,000 genes. The control of male reproduction is also of economic importance for plant breeding and hybrid seed production. With the advent of forward and reverse genetics and genomic technologies, a large number of male reproduction-related genes have been identified. Thus it is extremely challenging for individual researchers to systematically collect, and continually update, all the available information on genes and mutants related to plant male reproduction. The aim of this study is to manually curate such gene and mutant information and provide a web-accessible resource to facilitate the effective study of plant male reproduction. Description Plant Male Reproduction Database (PMRD) is a comprehensive resource for browsing and retrieving knowledge on genes and mutants related to plant male reproduction. It is based upon literature and biological databases and includes 506 male sterile genes and 484 mutants with defects of male reproduction from a variety of plant species. Based on Gene Ontology (GO) annotations and literature, information relating to a further 3697 male reproduction-related genes was systematically collected and included, and using in-text curation, gene expression and phenotypic information were captured from the literature. PMRD provides a web interface which allows users to easily access the curated annotations and genomic information, including full names, symbols, locations, sequences, expression patterns, functions of genes, mutant phenotypes, male sterile categories, and corresponding publications. PMRD also provides mini tools to search and browse expression patterns of genes in microarray datasets, run BLAST searches, convert gene ID and generate gene networks. In addition, a Mediawiki engine and a forum have been integrated within the

  11. A survey of locus-specific database curation. Human Genome Variation Society.

    PubMed

    Cotton, Richard G H; Phillips, Kate; Horaitis, Ourania

    2007-04-01

    It is widely accepted that curation of variation in genes is best performed by experts in those genes and their variation. However, obtaining funding for such curation is difficult even though up-to-date lists of variations in genes are essential for optimum delivery of genetic healthcare and for medical research. This study was undertaken to gather information on gene-specific databases (locus-specific databases) in an effort to understand their functioning, funding and needs. A questionnaire was sent to 125 curators and we received 47 responses. Individuals performed curation of up to 69 genes. The time curators spent curating was extremely variable. This ranged from 0 h per week up to 5 curators spending over 4 h per week. The funding required ranged from US$600 to US$45,000 per year. Most databases were stimulated by the Human Genome Organization-Mutation Database Initiative and used their guidelines. Many databases reported unpublished mutations, with all but one respondent reporting errors in the literature. Of the 13 who reported hit rates, 9 reported over 52,000 hits per year. On the basis of this, five recommendations were made to improve the curation of variation information, particularly that of mutations causing single-gene disorders: 1. A curator for each gene, who is an expert in it, should be identified or nominated. 2. Curation at a minimum of 2 h per week at US$2000 per gene per year should be encouraged. 3. Guidelines and custom software use should be encouraged to facilitate easy setup and curation. 4. Hits per week on the website should be recorded to allow the importance of the site to be illustrated for grant-giving purposes. 5. Published protocols should be followed in the establishment of locus-specific databases.

  12. BioSharing: curated and crowd-sourced metadata standards, databases and data policies in the life sciences.

    PubMed

    McQuilton, Peter; Gonzalez-Beltran, Alejandra; Rocca-Serra, Philippe; Thurston, Milo; Lister, Allyson; Maguire, Eamonn; Sansone, Susanna-Assunta

    2016-01-01

    BioSharing (http://www.biosharing.org) is a manually curated, searchable portal of three linked registries. These resources cover standards (terminologies, formats and models, and reporting guidelines), databases, and data policies in the life sciences, broadly encompassing the biological, environmental and biomedical sciences. Launched in 2011 and built by the same core team as the successful MIBBI portal, BioSharing harnesses community curation to collate and cross-reference resources across the life sciences from around the world. BioSharing makes these resources findable and accessible (the core of the FAIR principle). Every record is designed to be interlinked, providing a detailed description not only on the resource itself, but also on its relations with other life science infrastructures. Serving a variety of stakeholders, BioSharing cultivates a growing community, to which it offers diverse benefits. It is a resource for funding bodies and journal publishers to navigate the metadata landscape of the biological sciences; an educational resource for librarians and information advisors; a publicising platform for standard and database developers/curators; and a research tool for bench and computer scientists to plan their work. BioSharing is working with an increasing number of journals and other registries, for example linking standards and databases to training material and tools. Driven by an international Advisory Board, the BioSharing user-base has grown by over 40% (by unique IP address) in the last year, thanks to successful engagement with researchers, publishers, librarians, developers and other stakeholders via several routes, including a joint RDA/Force11 working group and a collaboration with the International Society for Biocuration. In this article, we describe BioSharing, with a particular focus on community-led curation. Database URL: https://www.biosharing.org.

  13. BioSharing: curated and crowd-sourced metadata standards, databases and data policies in the life sciences

    PubMed Central

    McQuilton, Peter; Gonzalez-Beltran, Alejandra; Rocca-Serra, Philippe; Thurston, Milo; Lister, Allyson; Maguire, Eamonn; Sansone, Susanna-Assunta

    2016-01-01

    BioSharing (http://www.biosharing.org) is a manually curated, searchable portal of three linked registries. These resources cover standards (terminologies, formats and models, and reporting guidelines), databases, and data policies in the life sciences, broadly encompassing the biological, environmental and biomedical sciences. Launched in 2011 and built by the same core team as the successful MIBBI portal, BioSharing harnesses community curation to collate and cross-reference resources across the life sciences from around the world. BioSharing makes these resources findable and accessible (the core of the FAIR principle). Every record is designed to be interlinked, providing a detailed description not only on the resource itself, but also on its relations with other life science infrastructures. Serving a variety of stakeholders, BioSharing cultivates a growing community, to which it offers diverse benefits. It is a resource for funding bodies and journal publishers to navigate the metadata landscape of the biological sciences; an educational resource for librarians and information advisors; a publicising platform for standard and database developers/curators; and a research tool for bench and computer scientists to plan their work. BioSharing is working with an increasing number of journals and other registries, for example linking standards and databases to training material and tools. Driven by an international Advisory Board, the BioSharing user-base has grown by over 40% (by unique IP address) in the last year, thanks to successful engagement with researchers, publishers, librarians, developers and other stakeholders via several routes, including a joint RDA/Force11 working group and a collaboration with the International Society for Biocuration. In this article, we describe BioSharing, with a particular focus on community-led curation. Database URL: https://www.biosharing.org PMID:27189610

  14. National Solar Radiation Database 1991-2005 Update: User's Manual

    SciTech Connect

    Wilcox, S.

    2007-04-01

    This manual describes how to obtain and interpret the data products from the updated 1991-2005 National Solar Radiation Database (NSRDB). This is an update of the original 1961-1990 NSRDB released in 1992.

  15. National Solar Radiation Database 1991-2010 Update: User's Manual

    SciTech Connect

    Wilcox, S. M.

    2012-08-01

    This user's manual provides information on the updated 1991-2010 National Solar Radiation Database. Included are data format descriptions, data sources, production processes, and information about data uncertainty.

  16. Hydrologic database user's manual

    SciTech Connect

    Champman, J.B.; Gray, K.J.; Thompson, C.B.

    1993-09-01

    The Hydrologic Database is an electronic filing cabinet containing water-related data for the Nevada Test Site (NTS). The purpose of the database is to enhance research on hydrologic issues at the NTS by providing efficient access to information gathered by a variety of scientists. Data are often generated for specific projects and are reported to DOE in the context of specific project goals. The originators of the database recognized that much of this information has a general value that transcends project-specific requirements. Allowing researchers access to information generated by a wide variety of projects can prevent needless duplication of data-gathering efforts and can augment new data collection and interpretation. In addition, collecting this information in the database ensures that the results are not lost at the end of discrete projects as long as the database is actively maintained. This document is a guide to using the database.

  17. AtlasT4SS: A curated database for type IV secretion systems

    PubMed Central

    2012-01-01

    Background The type IV secretion system (T4SS) can be classified as a large family of macromolecule transporter systems, divided into three recognized sub-families according to their known functions. The major sub-family is the conjugation system, which allows transfer of genetic material, such as a nucleoprotein, via cell contact among bacteria. Also, the conjugation system can transfer genetic material from bacteria to eukaryotic cells; such is the case with the T-DNA transfer of Agrobacterium tumefaciens to host plant cells. The system of effector protein transport constitutes the second sub-family, and the third one corresponds to the DNA uptake/release system. Genome analyses have revealed numerous T4SS in Bacteria and Archaea. The purpose of this work was to organize, classify, and integrate the T4SS data into a single database, called AtlasT4SS - the first public database devoted exclusively to this prokaryotic secretion system. Description The AtlasT4SS is a manually curated database that describes a large number of proteins related to the type IV secretion system reported so far in Gram-negative and Gram-positive bacteria, as well as in Archaea. The database was created using the RDBMS MySQL and the Catalyst Framework based in the Perl programming language and using the Model-View-Controller (MVC) design pattern for Web. The current version holds a comprehensive collection of 1,617 T4SS proteins from 58 Bacteria (49 Gram-negative and 9 Gram-positive), one Archaea and 11 plasmids. By applying the bi-directional best hit (BBH) relationship in pairwise genome comparison, it was possible to obtain a core set of 134 clusters of orthologous genes encoding T4SS proteins. Conclusions In our database we present one way of classifying orthologous groups of T4SSs in a hierarchical classification scheme with three levels. The first level comprises four classes that are based on the organization of genetic determinants, shared homologies, and evolutionary
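The bi-directional best hit (BBH) criterion used above is straightforward to state in code: gene a in genome A and gene b in genome B are called orthologs when each is the other's top-scoring hit. A minimal sketch, with invented gene names and plain numbers standing in for BLAST bit scores:

```python
def best_hits(scores):
    """Map each query gene to its single highest-scoring subject gene.
    `scores` maps (query, subject) pairs to similarity scores."""
    best = {}
    for (query, subject), score in scores.items():
        if query not in best or score > best[query][1]:
            best[query] = (subject, score)
    return {q: s for q, (s, _) in best.items()}

def bbh_pairs(a_vs_b, b_vs_a):
    """Ortholog pairs: a's best hit in B whose best hit in A is a again."""
    best_ab, best_ba = best_hits(a_vs_b), best_hits(b_vs_a)
    return sorted((a, b) for a, b in best_ab.items() if best_ba.get(b) == a)

# Hypothetical pairwise comparison of two genomes, A and B.
a_vs_b = {("virB4_A", "virB4_B"): 950, ("virB4_A", "virD4_B"): 120,
          ("virD4_A", "virD4_B"): 800}
b_vs_a = {("virB4_B", "virB4_A"): 940, ("virD4_B", "virD4_A"): 790}
print(bbh_pairs(a_vs_b, b_vs_a))  # [('virB4_A', 'virB4_B'), ('virD4_A', 'virD4_B')]
```

Applying this reciprocal test across all genome pairs, and then grouping the resulting pairs, yields clusters of orthologous genes like the 134 reported above.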

  18. DEPOT database: Reference manual and user's guide

    SciTech Connect

    Clancey, P.; Logg, C.

    1991-03-01

    DEPOT has been developed to provide tracking for the Stanford Linear Collider (SLC) control system equipment. For each piece of equipment entered into the database, complete location, service, maintenance, modification, certification, and radiation exposure histories can be maintained. To facilitate data entry accuracy, efficiency, and consistency, barcoding technology has been used extensively. DEPOT has been an important tool in improving the reliability of the microsystems controlling SLC. This document describes the components of the DEPOT database, the elements in the database records, and the use of the supporting programs for entering data, searching the database, and producing reports from the information.

  19. Manual Gene Ontology annotation workflow at the Mouse Genome Informatics Database.

    PubMed

    Drabkin, Harold J; Blake, Judith A

    2012-01-01

    The Mouse Genome Database, the Gene Expression Database and the Mouse Tumor Biology database are integrated components of the Mouse Genome Informatics (MGI) resource (http://www.informatics.jax.org). The MGI system presents both a consensus view and an experimental view of the knowledge concerning the genetics and genomics of the laboratory mouse. From genotype to phenotype, this information resource integrates information about genes, sequences, maps, expression analyses, alleles, strains and mutant phenotypes. Comparative mammalian data are also presented, particularly in regard to the use of the mouse as a model for the investigation of molecular and genetic components of human diseases. These data are collected from literature curation as well as downloads of large datasets (SwissProt, LocusLink, etc.). MGI is one of the founding members of the Gene Ontology (GO) and uses the GO for functional annotation of genes. Here, we discuss the workflow associated with manual GO annotation at MGI, from literature collection to display of the annotations. Peer-reviewed literature is collected mostly from a set of journals available electronically. Selected articles are entered into a master bibliography and indexed to one of eight areas of interest such as 'GO' or 'homology' or 'phenotype'. Each article is then either indexed to a gene already contained in the database or funneled through a separate nomenclature database to add genes. The master bibliography and associated indexing provide information for various curator reports such as 'papers selected for GO that refer to genes with NO GO annotation'. Once indexed, curators who have expertise in appropriate disciplines enter pertinent information. MGI makes use of several controlled vocabularies that ensure uniform data encoding, enable robust analysis and support the construction of complex queries. These vocabularies range from pick-lists to structured vocabularies such as the GO. All data associations are supported

  20. Geroprotectors.org: a new, structured and curated database of current therapeutic interventions in aging and age-related disease.

    PubMed

    Moskalev, Alexey; Chernyagina, Elizaveta; de Magalhães, João Pedro; Barardo, Diogo; Thoppil, Harikrishnan; Shaposhnikov, Mikhail; Budovsky, Arie; Fraifeld, Vadim E; Garazha, Andrew; Tsvetkov, Vasily; Bronovitsky, Evgeny; Bogomolov, Vladislav; Scerbacov, Alexei; Kuryan, Oleg; Gurinovich, Roman; Jellen, Leslie C; Kennedy, Brian; Mamoshina, Polina; Dobrovolskaya, Evgeniya; Aliper, Alex; Kaminsky, Dmitry; Zhavoronkov, Alex

    2015-09-01

    As the level of interest in aging research increases, there is a growing number of geroprotectors, or therapeutic interventions that aim to extend the healthy lifespan and repair or reduce aging-related damage in model organisms and, eventually, in humans. There is a clear need for a manually-curated database of geroprotectors to compile and index their effects on aging and age-related diseases and link these effects to relevant studies and multiple biochemical and drug databases. Here, we introduce the first such resource, Geroprotectors (http://geroprotectors.org). Geroprotectors is a public, rapidly explorable database that catalogs over 250 experiments involving over 200 known or candidate geroprotectors that extend lifespan in model organisms. Each compound has a comprehensive profile complete with biochemistry, mechanisms, and lifespan effects in various model organisms, along with information ranging from chemical structure, side effects, and toxicity to FDA drug status. These are presented in a visually intuitive, efficient framework fit for casual browsing or in-depth research alike. Data are linked to the source studies or databases, providing quick and convenient access to original data. The Geroprotectors database facilitates cross-study, cross-organism, and cross-discipline analysis and saves countless hours of inefficient literature and web searching. Geroprotectors is a one-stop, knowledge-sharing, time-saving resource for researchers seeking healthy aging solutions.

  1. A CTD-Pfizer collaboration: manual curation of 88,000 scientific articles text mined for drug-disease and drug-phenotype interactions.

    PubMed

    Davis, Allan Peter; Wiegers, Thomas C; Roberts, Phoebe M; King, Benjamin L; Lay, Jean M; Lennon-Hopkins, Kelley; Sciaky, Daniela; Johnson, Robin; Keating, Heather; Greene, Nigel; Hernandez, Robert; McConnell, Kevin J; Enayetallah, Ahmed E; Mattingly, Carolyn J

    2013-01-01

    Improving the prediction of chemical toxicity is a goal common to both environmental health research and pharmaceutical drug development. To improve safety detection assays, it is critical to have a reference set of molecules with well-defined toxicity annotations for training and validation purposes. Here, we describe a collaboration between safety researchers at Pfizer and the research team at the Comparative Toxicogenomics Database (CTD) to text mine and manually review a collection of 88,629 articles relating over 1,200 pharmaceutical drugs to their potential involvement in cardiovascular, neurological, renal and hepatic toxicity. In 1 year, CTD biocurators curated 254,173 toxicogenomic interactions (152,173 chemical-disease, 58,572 chemical-gene, 5,345 gene-disease and 38,083 phenotype interactions). All chemical-gene-disease interactions are fully integrated with public CTD, and phenotype interactions can be downloaded. We describe Pfizer's text-mining process to collate the articles, and CTD's curation strategy, performance metrics, enhanced data content and new module to curate phenotype information. As well, we show how data integration can connect phenotypes to diseases. This curation can be leveraged for information about toxic endpoints important to drug safety and help develop testable hypotheses for drug-disease events. The availability of these detailed, contextualized, high-quality annotations curated from seven decades' worth of the scientific literature should help facilitate new mechanistic screening assays for pharmaceutical compound survival. This unique partnership demonstrates the importance of resource sharing and collaboration between public and private entities and underscores the complementary needs of the environmental health science and pharmaceutical communities. Database URL: http://ctdbase.org/

  2. SpermatogenesisOnline 1.0: a resource for spermatogenesis based on manual literature curation and genome-wide data mining.

    PubMed

    Zhang, Yuanwei; Zhong, Liangwen; Xu, Bo; Yang, Yifan; Ban, Rongjun; Zhu, Jun; Cooke, Howard J; Hao, Qiaomei; Shi, Qinghua

    2013-01-01

    Human infertility affects 10-15% of couples, with half of cases attributed to the male partner. Abnormal spermatogenesis is a major cause of male infertility. Characterizing the genes involved in spermatogenesis is fundamental to understanding the mechanisms underlying this biological process and to developing treatments for male infertility. Although many genes have been implicated in spermatogenesis, no dedicated bioinformatic resource for spermatogenesis is available. We have developed such a database, SpermatogenesisOnline 1.0 (http://mcg.ustc.edu.cn/sdap1/spermgenes/), using manual curation from 30 233 articles published before 1 May 2012. It provides detailed information for 1666 genes reported to participate in spermatogenesis in 37 organisms. Based on the analysis of these genes, we developed an algorithm, the Greed AUC Stepwise (GAS) model, which predicted 762 genes to participate in spermatogenesis (GAS probability >0.5) based on genome-wide transcriptional data in Mus musculus testis from the ArrayExpress database. These predicted and experimentally verified genes were annotated, with several identical spermatogenesis-related GO terms being enriched for both classes. Furthermore, protein-protein interaction analysis indicates direct interactions of predicted genes with the experimentally verified ones, which supports the reliability of GAS. The strategy (manual curation and data mining) used to develop SpermatogenesisOnline 1.0 can be easily extended to other biological processes.
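
    The abstract does not spell out the GAS algorithm; as a rough illustration of the general idea only, a greedy forward selection that keeps adding transcriptional features while the AUC of a naive summed score improves could be sketched as follows (all feature names, values and labels are invented):

```python
def auc(scores, labels):
    """Area under the ROC curve via the Mann-Whitney U statistic."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def greedy_auc_stepwise(features, labels):
    """Greedily add features while the AUC of a summed score improves."""
    chosen, best = [], 0.5
    while True:
        candidates = []
        for name in features:
            if name in chosen:
                continue
            trial = chosen + [name]
            scores = [sum(features[f][i] for f in trial)
                      for i in range(len(labels))]
            candidates.append((auc(scores, labels), name))
        if not candidates:
            break
        top_auc, top_name = max(candidates)
        if top_auc <= best:
            break
        best, chosen = top_auc, chosen + [top_name]
    return chosen, best

# Invented feature values over four training examples;
# label 1 = known spermatogenesis gene, 0 = negative example
genes = {"testis_expr": [0.9, 0.8, 0.2, 0.1],
         "noise":       [0.5, 0.1, 0.6, 0.4]}
labels = [1, 1, 0, 0]
chosen, best = greedy_auc_stepwise(genes, labels)  # (['testis_expr'], 1.0)
```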

  3. Estimating the annotation error rate of curated GO database sequence annotations

    PubMed Central

    Jones, Craig E; Brown, Alfred L; Baumann, Ute

    2007-01-01

    Background Annotations that describe the function of sequences are enormously important to researchers during laboratory investigations and when making computational inferences. However, there has been little investigation into the data quality of sequence function annotations. Here we have developed a new method of estimating the error rate of curated sequence annotations, and applied this to the Gene Ontology (GO) sequence database (GOSeqLite). This method involved artificially adding errors to sequence annotations at known rates, and used regression to model the impact on the precision of annotations based on BLAST matched sequences. Results We estimated the error rate of curated GO sequence annotations in the GOSeqLite database (March 2006) at between 28% and 30%. Annotations made without use of sequence similarity based methods (non-ISS) had an estimated error rate of between 13% and 18%. Annotations made with the use of sequence similarity methodology (ISS) had an estimated error rate of 49%. Conclusion While the overall error rate is reasonably low, it would be prudent to treat all ISS annotations with caution. Electronic annotators that use ISS annotations as the basis of predictions are likely to have higher false prediction rates, and for this reason designers of these systems should consider avoiding ISS annotations where possible. Electronic annotators that use ISS annotations to make predictions should be viewed sceptically. We recommend that curators thoroughly review ISS annotations before accepting them as valid. Overall, users of curated sequence annotations from the GO database should feel assured that they are using a comparatively high quality source of information. PMID:17519041
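
    The estimation strategy described — inject errors at known rates, model the effect on annotation precision, and extrapolate — can be illustrated with a simplified linear version. The measured precisions below are invented, and the paper's actual regression model may differ:

```python
def linear_fit(xs, ys):
    """Ordinary least squares for y = a + b*x; returns (a, b)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    b = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
         / sum((x - mx) ** 2 for x in xs))
    return my - b * mx, b

def inherent_error_estimate(injected_rates, precisions):
    """Extrapolate measured precision back to zero injected error;
    the shortfall from perfect precision is attributed to errors
    already present in the curated annotations."""
    a, _b = linear_fit(injected_rates, precisions)
    return 1.0 - a

rates = [0.05, 0.10, 0.20, 0.40]        # known artificial error rates
prec  = [0.66, 0.62, 0.55, 0.40]        # hypothetical measured precisions
est = inherent_error_estimate(rates, prec)   # ~0.30
```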

  4. A curated gluten protein sequence database to support development of proteomics methods for determination of gluten in gluten-free foods.

    PubMed

    Bromilow, Sophie; Gethings, Lee A; Buckley, Mike; Bromley, Mike; Shewry, Peter R; Langridge, James I; Clare Mills, E N

    2017-04-03

    The unique physicochemical properties of wheat gluten enable a diverse range of food products to be manufactured. However, gluten triggers coeliac disease, a condition which is treated using a gluten-free diet. Analytical methods are required to confirm that foods are gluten-free, but current immunoassay-based methods can be unreliable, and proteomic methods offer an alternative. However, proteomic methods require comprehensive, well-annotated sequence databases, which are lacking for gluten. A manually curated database (GluPro V1.0) of gluten proteins, comprising 630 discrete unique full-length protein sequences, has been compiled. It is representative of the different types of gliadin and glutenin components found in gluten. An in silico comparison of their coeliac toxicity was undertaken by analysing the distribution of coeliac toxic motifs. This demonstrated that whilst the α-gliadin proteins contained more toxic motifs, these were distributed across all gluten protein sub-types. Comparison of annotations observed using a discovery proteomics dataset acquired using ion mobility MS/MS showed that more reliable identifications were obtained using the GluPro V1.0 database than with the complete reviewed Viridiplantae database. This highlights the value of a curated sequence database specifically designed to support proteomic workflows and the development of methods to detect and quantify gluten.

  5. The SIB Swiss Institute of Bioinformatics’ resources: focus on curated databases

    PubMed Central

    2016-01-01

    The SIB Swiss Institute of Bioinformatics (www.isb-sib.ch) provides world-class bioinformatics databases, software tools, services and training to the international life science community in academia and industry. These solutions allow life scientists to turn the exponentially growing amount of data into knowledge. Here, we provide an overview of SIB's resources and competence areas, with a strong focus on curated databases and SIB's most popular and widely used resources. In particular, SIB's Bioinformatics resource portal ExPASy features over 150 resources, including UniProtKB/Swiss-Prot, ENZYME, PROSITE, neXtProt, STRING, UniCarbKB, SugarBindDB, SwissRegulon, EPD, arrayMap, Bgee, SWISS-MODEL Repository, OMA, OrthoDB and other databases, which are briefly described in this article. PMID:26615188

  6. The SIB Swiss Institute of Bioinformatics' resources: focus on curated databases.

    PubMed

    2016-01-04

    The SIB Swiss Institute of Bioinformatics (www.isb-sib.ch) provides world-class bioinformatics databases, software tools, services and training to the international life science community in academia and industry. These solutions allow life scientists to turn the exponentially growing amount of data into knowledge. Here, we provide an overview of SIB's resources and competence areas, with a strong focus on curated databases and SIB's most popular and widely used resources. In particular, SIB's Bioinformatics resource portal ExPASy features over 150 resources, including UniProtKB/Swiss-Prot, ENZYME, PROSITE, neXtProt, STRING, UniCarbKB, SugarBindDB, SwissRegulon, EPD, arrayMap, Bgee, SWISS-MODEL Repository, OMA, OrthoDB and other databases, which are briefly described in this article.

  7. CeCaFDB: a curated database for the documentation, visualization and comparative analysis of central carbon metabolic flux distributions explored by 13C-fluxomics.

    PubMed

    Zhang, Zhengdong; Shen, Tie; Rui, Bin; Zhou, Wenwei; Zhou, Xiangfei; Shang, Chuanyu; Xin, Chenwei; Liu, Xiaoguang; Li, Gang; Jiang, Jiansi; Li, Chao; Li, Ruiyuan; Han, Mengshu; You, Shanping; Yu, Guojun; Yi, Yin; Wen, Han; Liu, Zhijie; Xie, Xiaoyao

    2015-01-01

    The Central Carbon Metabolic Flux Database (CeCaFDB, available at http://www.cecafdb.org) is a manually curated, multipurpose and open-access database for the documentation, visualization and comparative analysis of the quantitative flux results of central carbon metabolism among microbes and animal cells. It encompasses records for more than 500 flux distributions among 36 organisms and includes information regarding the genotype, culture medium, growth conditions and other specific information gathered from hundreds of journal articles. In addition to its comprehensive literature-derived data, the CeCaFDB supports a common text search function among the data and interactive visualization of the curated flux distributions with compartmentation information based on the Cytoscape Web API, which facilitates data interpretation. The CeCaFDB offers four modules to calculate a similarity score or to perform an alignment between the flux distributions. One of the modules was built using an integer programming algorithm for flux distribution alignment that was specifically designed for this study. Based on these modules, the CeCaFDB also supports an extensive flux distribution comparison function among the curated data. The CeCaFDB is designed to address the broad demands of biochemists, metabolic engineers, systems biologists and members of the -omics community.
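
    The database's own alignment modules are more elaborate, but the basic notion of a similarity score between two flux distributions can be illustrated with a cosine similarity over shared reactions (reaction names and flux values below are invented, and this is not CeCaFDB's actual scoring function):

```python
import math

def flux_similarity(fa, fb):
    """Cosine similarity between two flux maps over shared reactions."""
    shared = set(fa) & set(fb)
    if not shared:
        return 0.0
    dot = sum(fa[r] * fb[r] for r in shared)
    na = math.sqrt(sum(fa[r] ** 2 for r in shared))
    nb = math.sqrt(sum(fb[r] ** 2 for r in shared))
    return dot / (na * nb)

# Invented glycolytic flux values (reaction -> flux, arbitrary units)
flux_a = {"PGI": 70.0, "PFK": 85.0, "PYK": 60.0}
flux_b = {"PGI": 65.0, "PFK": 80.0, "ZWF": 20.0}
s = flux_similarity(flux_a, flux_b)   # close to 1.0 for these values
```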

  8. Laminin database: a tool to retrieve high-throughput and curated data for studies on laminins.

    PubMed

    Golbert, Daiane C F; Linhares-Lacerda, Leandra; Almeida, Luiz G; Correa-de-Santana, Eliane; de Oliveira, Alice R; Mundstein, Alex S; Savino, Wilson; de Vasconcelos, Ana T R

    2011-01-01

    The Laminin (LM)-database, hosted at http://www.lm.lncc.br, is the first database focusing on a non-collagenous extracellular matrix protein family, the LMs. Part of the knowledge available on this website is automatically retrieved, whereas a significant amount of information is curated and annotated, thus placing the LM-database beyond a simple repository of data. The home page presents an overview of the rationale for the database, and readers can access a tutorial to facilitate navigation of the website, which is organized into tabs for LMs, receptors, extracellular binding and other related proteins. Each tab opens into a given LM or LM-related molecule, where the reader finds a series of further tabs for 'protein', 'gene structure', 'gene expression', 'tissue distribution' and 'therapy'. Data are separated as a function of species, comprising Homo sapiens, Mus musculus and Rattus norvegicus. Furthermore, a specific tab displays the LM nomenclatures. Another tab provides a direct link to PubMed, which can then be consulted specifically in terms of the biological functions of each molecule, knockout animals and genetic diseases, immune response and lymphomas/leukemias. The LM-database will hopefully be a relevant tool for retrieving information concerning LMs in health and disease, particularly regarding the hemopoietic system.

  9. Strategies for annotation and curation of translational databases: the eTUMOUR project.

    PubMed

    Julià-Sapé, Margarida; Lurgi, Miguel; Mier, Mariola; Estanyol, Francesc; Rafael, Xavier; Candiota, Ana Paula; Barceló, Anna; García, Alina; Martínez-Bisbal, M Carmen; Ferrer-Luna, Rubén; Moreno-Torres, Ángel; Celda, Bernardo; Arús, Carles

    2012-01-01

    The eTUMOUR (eT) multi-centre project gathered in vivo and ex vivo magnetic resonance (MR) data, as well as transcriptomic and clinical information from brain tumour patients, with the purpose of improving the diagnostic and prognostic evaluation of future patients. In order to carry this out, among other work, a database--the eTDB--was developed. In addition to complex permission rules and software and management quality control (QC), it was necessary to develop anonymization, processing and data visualization tools for the data uploaded. It was also necessary to develop sophisticated curation strategies that involved on one hand, dedicated fields for QC-generated meta-data and specialized queries and global permissions for senior curators and on the other, to establish a set of metrics to quantify its contents. The indispensable dataset (ID), completeness and pairedness indices were set. The database contains 1317 cases created as a result of the eT project and 304 from a previous project, INTERPRET. The number of cases fulfilling the ID was 656. Completeness and pairedness were heterogeneous, depending on the data type involved.

  10. Manual curation and reannotation of the genomes of Clostridium difficile 630Δerm and Clostridium difficile 630.

    PubMed

    Dannheim, Henning; Riedel, Thomas; Neumann-Schaal, Meina; Bunk, Boyke; Schober, Isabel; Spröer, Cathrin; Chibani, Cynthia Maria; Gronow, Sabine; Liesegang, Heiko; Overmann, Jörg; Schomburg, Dietmar

    2017-01-09

    We resequenced the genome of Clostridium difficile 630Δerm (DSM 28645), a model strain commonly used for the generation of insertion mutants. The genome sequence was obtained by a combination of single-molecule real-time (SMRT) and Illumina sequencing technology. Detailed manual curation and comparison to the previously published genomic sequence revealed sequence differences including inverted regions and the presence of plasmid pCD630. Manual curation of our previously deposited genome sequence of the parental strain 630 (DSM 27543) led to an improved genome sequence. In addition, the sequence of the transposon Tn5397 was completely identified. We manually revised the current manual annotation of the initial sequence of strain 630 and modified either gene names, gene product names or assigned EC numbers of 57 % of genes. The number of hypothetical and conserved hypothetical proteins was reduced by 152. This annotation was used as a template to annotate the most recent genome sequences of the strains 630Δerm and 630. Based on the genomic analysis, several new metabolic features of C. difficile are proposed and could be supported by literature and subsequent experiments.

  11. ProPepper: a curated database for identification and analysis of peptide and immune-responsive epitope composition of cereal grain protein families.

    PubMed

    Juhász, Angéla; Haraszi, Réka; Maulis, Csaba

    2015-01-01

    ProPepper is a database that contains prolamin proteins identified from true grasses (Poaceae), their peptides obtained with single- and multi-enzyme in silico digestions as well as linear T- and B-cell-specific epitopes that are responsible for wheat-related food disorders. The integrated database and analysis platform contains datasets that are collected from multiple public databases (UniprotKB, IEDB, NCBI GenBank), manually curated and annotated, and interpreted in three main data tables: Protein-, Peptide- and Epitope list views that are cross-connected by unique identifications. Altogether 21 genera and 80 different species are represented. Currently, the database contains 2146 unique and complete protein sequences related to 2618 GenBank entries and 35 657 unique peptide sequences that are a result of 575 110 unique digestion events obtained by in silico digestion methods involving six proteolytic enzymes and their combinations. The interface allows advanced global and parametric search functions along with a download option, with direct connections to the relevant public databases. Database URL: https://propepper.net.
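
    A single-enzyme in silico digestion of the kind ProPepper precomputes can be sketched with the standard trypsin rule (cleave after K or R, but not before P). The sequence below is an arbitrary example, and ProPepper's actual digestion engine is not described in the abstract:

```python
import re

def tryptic_digest(seq, missed=0):
    """In silico trypsin digestion: cleave after K or R, not before P."""
    pieces = [p for p in re.split(r"(?<=[KR])(?!P)", seq) if p]
    peptides = set(pieces)
    # optionally allow up to `missed` missed cleavages
    for m in range(1, missed + 1):
        for i in range(len(pieces) - m):
            peptides.add("".join(pieces[i:i + m + 1]))
    return peptides

peps = tryptic_digest("MKWVTFISLLFLFSSAYSRGVFRR", missed=1)
```

    Multi-enzyme digestions, as used in ProPepper, would apply several such cleavage rules and combine the resulting peptide sets.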

  12. Strategies for annotation and curation of translational databases: the eTUMOUR project

    PubMed Central

    Julià-Sapé, Margarida; Lurgi, Miguel; Mier, Mariola; Estanyol, Francesc; Rafael, Xavier; Candiota, Ana Paula; Barceló, Anna; García, Alina; Martínez-Bisbal, M. Carmen; Ferrer-Luna, Rubén; Moreno-Torres, Àngel; Celda, Bernardo; Arús, Carles

    2012-01-01

    The eTUMOUR (eT) multi-centre project gathered in vivo and ex vivo magnetic resonance (MR) data, as well as transcriptomic and clinical information from brain tumour patients, with the purpose of improving the diagnostic and prognostic evaluation of future patients. In order to carry this out, among other work, a database—the eTDB—was developed. In addition to complex permission rules and software and management quality control (QC), it was necessary to develop anonymization, processing and data visualization tools for the data uploaded. It was also necessary to develop sophisticated curation strategies that involved on one hand, dedicated fields for QC-generated meta-data and specialized queries and global permissions for senior curators and on the other, to establish a set of metrics to quantify its contents. The indispensable dataset (ID), completeness and pairedness indices were set. The database contains 1317 cases created as a result of the eT project and 304 from a previous project, INTERPRET. The number of cases fulfilling the ID was 656. Completeness and pairedness were heterogeneous, depending on the data type involved. PMID:23180768

  13. The Developmental Brain Disorders Database (DBDB): a curated neurogenetics knowledge base with clinical and research applications.

    PubMed

    Mirzaa, Ghayda M; Millen, Kathleen J; Barkovich, A James; Dobyns, William B; Paciorkowski, Alex R

    2014-06-01

    The number of single genes associated with neurodevelopmental disorders has increased dramatically over the past decade. The identification of causative genes for these disorders is important to clinical outcome as it allows for accurate assessment of prognosis, genetic counseling, delineation of natural history, inclusion in clinical trials, and in some cases determines therapy. Clinicians face the challenge of correctly identifying neurodevelopmental phenotypes, recognizing syndromes, and prioritizing the best candidate genes for testing. However, there is no central repository of definitions for many phenotypes, leading to errors of diagnosis. Additionally, there is no system of levels of evidence linking genes to phenotypes, making it difficult for clinicians to know which genes are most strongly associated with a given condition. We have developed the Developmental Brain Disorders Database (DBDB: https://www.dbdb.urmc.rochester.edu/home), a publicly available, online-curated repository of genes, phenotypes, and syndromes associated with neurodevelopmental disorders. DBDB contains the first referenced ontology of developmental brain phenotypes, and uses a novel system of levels of evidence for gene-phenotype associations. It is intended to assist clinicians in arriving at the correct diagnosis, selecting the most appropriate genetic test for that phenotype, and improving the care of patients with developmental brain disorders. For researchers interested in the discovery of novel genes for developmental brain disorders, DBDB provides a well-curated source of important genes against which research sequencing results can be compared. Finally, DBDB allows novel observations about the landscape of the neurogenetics knowledge base.

  14. SkinSensDB: a curated database for skin sensitization assays.

    PubMed

    Wang, Chia-Chi; Lin, Ying-Chi; Wang, Shan-Shan; Shih, Chieh; Lin, Yi-Hui; Tung, Chun-Wei

    2017-01-01

    Skin sensitization is an important toxicological endpoint for chemical hazard determination and safety assessment. Prediction of chemical skin sensitizers has traditionally relied on data from rodent models. The development of the adverse outcome pathway (AOP) and associated alternative in vitro assays has reshaped the assessment of skin sensitizers. The integration of multiple assays as key events in the AOP has been shown to improve prediction performance. Current computational models to predict skin sensitization are mainly based on in vivo assays, without incorporating alternative in vitro assays. However, there are few freely available databases integrating both the in vivo and the in vitro skin sensitization assays for the development of AOP-based skin sensitization prediction models. To facilitate the development of AOP-based prediction models, a skin sensitization database named SkinSensDB has been constructed by curating data from published AOP-related assays. In addition to providing datasets for developing computational models, SkinSensDB is equipped with browsing and search tools which enable the assessment of new compounds for their skin sensitization potential based on data from structurally similar compounds. SkinSensDB is publicly available at http://cwtung.kmu.edu.tw/skinsensdb.
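
    Similarity-based assessment of the sort described — judging a new compound by structurally similar database entries — is commonly implemented with a Tanimoto coefficient over structural fingerprints. A minimal sketch (the fingerprints and compound names are invented, and the abstract does not specify SkinSensDB's actual similarity method):

```python
def tanimoto(fp_a, fp_b):
    """Tanimoto coefficient between two fingerprint bit sets."""
    union = len(fp_a | fp_b)
    return len(fp_a & fp_b) / union if union else 0.0

def most_similar(query_fp, db):
    """Database compound names ranked by similarity to the query."""
    return sorted(db, key=lambda name: tanimoto(query_fp, db[name]),
                  reverse=True)

# Invented fingerprints (sets of 'on' bit positions)
db = {"cinnamaldehyde": {1, 4, 9, 12},
      "glycerol":       {2, 3, 7}}
hits = most_similar({1, 4, 9, 13}, db)   # ['cinnamaldehyde', 'glycerol']
```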

  15. Single nucleotide polymorphisms in Mycobacterium tuberculosis and the need for a curated database.

    PubMed

    Stucki, David; Gagneux, Sebastien

    2013-01-01

    Recent advances in DNA sequencing have led to the discovery of thousands of single nucleotide polymorphisms (SNPs) in clinical isolates of Mycobacterium tuberculosis complex (MTBC). This genetic variation has changed our understanding of the differences and phylogenetic relationships between strains. Many of these mutations can serve as phylogenetic markers for strain classification, while others cause drug resistance. Moreover, SNPs can affect the bacterial phenotype in various ways, which may have an impact on the outcome of tuberculosis (TB) infection and disease. Despite the importance of SNPs for our understanding of the diversity of MTBC populations, the research community currently lacks a comprehensive, well-curated and user-friendly database dedicated to SNP data. First attempts to catalogue and annotate SNPs in MTBC have been made, but more work is needed. In this review, we discuss the biological and epidemiological relevance of SNPs in MTBC. We then review some of the analytical challenges involved in processing SNP data, and end with a list of features, which should be included in a new SNP database for MTBC.

  16. Single nucleotide polymorphisms in Mycobacterium tuberculosis and the need for a curated database

    PubMed Central

    Stucki, David; Gagneux, Sebastien

    2013-01-01

    Summary Recent advances in DNA sequencing have led to the discovery of thousands of single nucleotide polymorphisms (SNPs) in clinical isolates of Mycobacterium tuberculosis complex (MTBC). This genetic variation has changed our understanding of the differences and phylogenetic relationships between strains. Many of these mutations can serve as phylogenetic markers for strain classification, while others cause drug resistance. Moreover, SNPs can affect the bacterial phenotype in various ways, which may have an impact on the outcome of tuberculosis (TB) infection and disease. Despite the importance of SNPs for our understanding of the diversity of MTBC populations, the research community currently lacks a comprehensive, well-curated and user-friendly database dedicated to SNP data. First attempts to catalogue and annotate SNPs in MTBC have been made, but more work is needed. In this review, we discuss the biological and epidemiological relevance of SNPs in MTBC. We then review some of the analytical challenges involved in processing SNP data, and end with a list of features, which should be included in a new SNP database for MTBC. PMID:23266261

  17. Using binary classification to prioritize and curate articles for the Comparative Toxicogenomics Database.

    PubMed

    Vishnyakova, Dina; Pasche, Emilie; Ruch, Patrick

    2012-01-01

    We report on the original integration of an automatic text categorization pipeline, so-called ToxiCat (Toxicogenomic Categorizer), that we developed to perform biomedical documents classification and prioritization in order to speed up the curation of the Comparative Toxicogenomics Database (CTD). The task can be basically described as a binary classification task, where a scoring function is used to rank a selected set of articles. Then components of a question-answering system are used to extract CTD-specific annotations from the ranked list of articles. The ranking function is generated using a Support Vector Machine, which combines three main modules: an information retrieval engine for MEDLINE (EAGLi), a gene normalization service (NormaGene) developed for a previous BioCreative campaign and finally, a set of answering components and entity recognizer for diseases and chemicals. The main components of the pipeline are publicly available both as web application and web services. The specific integration performed for the BioCreative competition is available via a web user interface at http://pingu.unige.ch:8080/Toxicat.
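
    The ToxiCat pipeline uses a Support Vector Machine to generate its ranking function; as a stand-in to illustrate the idea of ranking articles by a learned linear scoring function, a tiny perceptron over hypothetical document features might look like this (the features, labels and PMIDs below are invented):

```python
def train_perceptron(rows, labels, epochs=20):
    """Tiny linear classifier standing in for the SVM scoring function."""
    w = [0.0] * len(rows[0])
    for _ in range(epochs):
        for x, y in zip(rows, labels):                  # y is +1 or -1
            if y * sum(wi * xi for wi, xi in zip(w, x)) <= 0:
                w = [wi + y * xi for wi, xi in zip(w, x)]
    return w

def rank_articles(docs, w):
    """Sort (article_id, features) pairs by decreasing score."""
    score = lambda x: sum(wi * xi for wi, xi in zip(w, x))
    return sorted(docs, key=lambda d: score(d[1]), reverse=True)

# Invented features: [mentions_chemical, mentions_gene, mentions_disease]
train  = [[1, 1, 1], [1, 0, 1], [0, 0, 1], [0, 1, 0]]
labels = [1, 1, -1, -1]                 # +1 = worth curating, -1 = not
w = train_perceptron(train, labels)
docs = [("pmid:1", [1, 1, 1]), ("pmid:2", [0, 0, 1]), ("pmid:3", [1, 0, 1])]
ordered = [doc_id for doc_id, _ in rank_articles(docs, w)]
```

    Curators would then work down the ranked list, reviewing the highest-scoring articles first.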

  18. NCBI Reference Sequence (RefSeq): a curated non-redundant sequence database of genomes, transcripts and proteins.

    PubMed

    Pruitt, Kim D; Tatusova, Tatiana; Maglott, Donna R

    2005-01-01

    The National Center for Biotechnology Information (NCBI) Reference Sequence (RefSeq) database (http://www.ncbi.nlm.nih.gov/RefSeq/) provides a non-redundant collection of sequences representing genomic data, transcripts and proteins. Although the goal is to provide a comprehensive dataset representing the complete sequence information for any given species, the database pragmatically includes sequence data that are currently publicly available in the archival databases. The database incorporates data from over 2400 organisms and includes over one million proteins representing significant taxonomic diversity spanning prokaryotes, eukaryotes and viruses. Nucleotide and protein sequences are explicitly linked, and the sequences are linked to other resources including the NCBI Map Viewer and Gene. Sequences are annotated to include coding regions, conserved domains, variation, references, names, database cross-references, and other features using a combined approach of collaboration and other input from the scientific community, automated annotation, propagation from GenBank and curation by NCBI staff.

  19. UNSODA UNSATURATED SOIL HYDRAULIC DATABASE USER'S MANUAL VERSION 1.0

    EPA Science Inventory

    This report contains general documentation and serves as a user manual of the UNSODA program. UNSODA is a database of unsaturated soil hydraulic properties (water retention, hydraulic conductivity, and soil water diffusivity), basic soil properties (particle-size distribution, b...

  20. Business Students' Reading Skills Related to Those Required for Use of Database Software Manuals.

    ERIC Educational Resources Information Center

    Schmidt, B. June

    1986-01-01

    A test was given that (1) assessed business students' ability to read materials from microcomputer database software manuals and (2) assessed their knowledge of technical vocabulary terms from the manuals. Test results revealed that the students' technical reading and vocabulary skills were considerably below those they would need to use…

  1. Selective databases distributed on the basis of Frascati manual

    PubMed Central

    Kujundzic, Enes; Masic, Izet

    2013-01-01

    Introduction The answer to the question of what a database is, and of what its relevance to scientific research is, is not an easy one. We would not be wrong to say that a database is, basically, a kind of information resource, often incomparably richer than, for example, a single book or journal. Discussion and conclusion As a form of storing and retrieving knowledge, databases appeared in the information age, in which we now participate as witnesses. Thanks to the technical possibilities of information networks, a database can be searched for a wealth of more or less relevant information, including scientific content. Databases are divided into bibliographic databases, citation databases, and full-text databases. The paper briefly presents the most important online databases together with links to their websites. Thanks to these online databases, scientific knowledge spreads much more easily and usefully. PMID:23572867

  2. TWRS information locator database system administrator's manual

    SciTech Connect

    Knutson, B.J., Westinghouse Hanford

    1996-09-13

    This document is a guide for use by the Tank Waste Remediation System (TWRS) Information Locator Database (ILD) System Administrator. The TWRS ILD System is an inventory of information used in the TWRS Systems Engineering process to represent the TWRS Technical Baseline. The inventory is maintained in the form of a relational database developed in Paradox 4.5.

  3. Nuclear Energy Infrastructure Database Description and User’s Manual

    SciTech Connect

    Heidrich, Brenden

    2015-11-01

    In 2014, the Deputy Assistant Secretary for Science and Technology Innovation initiated the Nuclear Energy (NE)–Infrastructure Management Project by tasking the Nuclear Science User Facilities, formerly the Advanced Test Reactor National Scientific User Facility, to create a searchable and interactive database of all pertinent NE-supported and -related infrastructure. This database, known as the Nuclear Energy Infrastructure Database (NEID), is used for analyses to establish needs, redundancies, efficiencies, distributions, etc., to best understand the utility of NE’s infrastructure and inform the content of infrastructure calls. The Nuclear Science User Facilities developed the database by utilizing data and policy direction from a variety of reports from the U.S. Department of Energy, the National Research Council, the International Atomic Energy Agency, and various other federal and civilian resources. The NEID currently contains data on 802 research and development instruments housed in 377 facilities at 84 institutions in the United States and abroad. The effort to maintain and expand the database is ongoing. Detailed information on many facilities must be gathered from associated institutions and added to complete the database. The data must be validated and kept current to capture facility and instrumentation status as well as to cover new acquisitions and retirements. This document provides a short tutorial on the navigation of the NEID web portal at NSUF-Infrastructure.INL.gov.

  4. NCBI reference sequences (RefSeq): a curated non-redundant sequence database of genomes, transcripts and proteins.

    PubMed

    Pruitt, Kim D; Tatusova, Tatiana; Maglott, Donna R

    2007-01-01

    NCBI's reference sequence (RefSeq) database (http://www.ncbi.nlm.nih.gov/RefSeq/) is a curated non-redundant collection of sequences representing genomes, transcripts and proteins. The database includes 3774 organisms spanning prokaryotes, eukaryotes and viruses, and has records for 2,879,860 proteins (RefSeq release 19). RefSeq records integrate information from multiple sources, when additional data are available from those sources and therefore represent a current description of the sequence and its features. Annotations include coding regions, conserved domains, tRNAs, sequence tagged sites (STS), variation, references, gene and protein product names, and database cross-references. Sequence is reviewed and features are added using a combined approach of collaboration and other input from the scientific community, prediction, propagation from GenBank and curation by NCBI staff. The format of all RefSeq records is validated, and an increasing number of tests are being applied to evaluate the quality of sequence and annotation, especially in the context of complete genomic sequence.

  5. TRENDS: A flight test relational database user's guide and reference manual

    NASA Technical Reports Server (NTRS)

    Bondi, M. J.; Bjorkman, W. S.; Cross, J. L.

    1994-01-01

    This report is designed to be a user's guide and reference manual for users intending to access rotorcraft test data via TRENDS, the relational database system developed as a tool for aeronautical engineers with no programming background. This report has been written to assist novice and experienced TRENDS users. TRENDS is a complete system for retrieving, searching, and analyzing both numerical and narrative data, and for displaying time-history and statistical data in graphical and numerical formats. This manual provides a 'guided tour' and a 'user's guide' for new and intermediate-skilled users. Examples of the use of each menu item within TRENDS are provided in the Menu Reference section of the manual, including full coverage of TIMEHIST, one of the key tools. This manual is written around the XV-15 Tilt Rotor database, but does include an appendix on the UH-60 Blackhawk database. This user's guide and reference manual establishes a referable source for the research community and augments NASA TM-101025, TRENDS: The Aeronautical Post-Test Database Management System, Jan. 1990, written by the same authors.

  6. Improving the Discoverability and Availability of Sample Data and Imagery in NASA's Astromaterials Curation Digital Repository Using a New Common Architecture for Sample Databases

    NASA Technical Reports Server (NTRS)

    Todd, N. S.; Evans, C.

    2015-01-01

    The Astromaterials Acquisition and Curation Office at NASA's Johnson Space Center (JSC) is the designated facility for curating all of NASA's extraterrestrial samples. The suite of collections includes the lunar samples from the Apollo missions, cosmic dust particles falling into the Earth's atmosphere, meteorites collected in Antarctica, comet and interstellar dust particles from the Stardust mission, asteroid particles from the Japanese Hayabusa mission, and solar wind atoms collected during the Genesis mission. To support planetary science research on these samples, NASA's Astromaterials Curation Office hosts the Astromaterials Curation Digital Repository, which provides descriptions of the missions and collections, and critical information about each individual sample. Our office is implementing several informatics initiatives with the goal of better serving the planetary research community. One of these initiatives aims to increase the availability and discoverability of sample data and images through the use of a newly designed common architecture for Astromaterials Curation databases.

  7. The Intelligent Monitoring System: Generic Database Interface (GDI). User Manual. Revision

    DTIC Science & Technology

    1994-01-03

    [Garbled OCR of the report's front matter and man-page excerpts. Recoverable details: funding number MDA972-92-C-0026; author Jean T. Anderson, SAIC Geophysical Systems Operation, Open Systems Division; Sun Release 4.1.]

  8. CIPRO 2.5: Ciona intestinalis protein database, a unique integrated repository of large-scale omics data, bioinformatic analyses and curated annotation, with user rating and reviewing functionality.

    PubMed

    Endo, Toshinori; Ueno, Keisuke; Yonezawa, Kouki; Mineta, Katsuhiko; Hotta, Kohji; Satou, Yutaka; Yamada, Lixy; Ogasawara, Michio; Takahashi, Hiroki; Nakajima, Ayako; Nakachi, Mia; Nomura, Mamoru; Yaguchi, Junko; Sasakura, Yasunori; Yamasaki, Chisato; Sera, Miho; Yoshizawa, Akiyasu C; Imanishi, Tadashi; Taniguchi, Hisaaki; Inaba, Kazuo

    2011-01-01

    The Ciona intestinalis protein database (CIPRO) is an integrated protein database for the tunicate species C. intestinalis. The database is unique in two respects: first, because of its phylogenetic position, Ciona is a suitable model for understanding vertebrate evolution; and second, the database includes original large-scale transcriptomic and proteomic data. Ciona intestinalis has also long been a favorite of developmental biologists, so large amounts of data exist on its development and morphology, along with a recent genome sequence and gene expression data. The CIPRO database is aimed at collecting those published data as well as providing unique information from unpublished experimental data, such as 3D expression profiling, 2D-PAGE and mass spectrometry-based large-scale analyses at various developmental stages, curated annotation data and various bioinformatic data, to facilitate research in diverse areas, including developmental, comparative and evolutionary biology. For medical and evolutionary research, homologs in humans and major model organisms are intentionally included. The current database is based on a recently developed KH model containing 36,034 unique sequences, but for higher usability it covers all 89,683 known and predicted proteins from all gene models for this species. Of these sequences, more than 10,000 proteins have been manually annotated. Furthermore, to establish a community-supported protein database, these annotations are open to evaluation by users through the CIPRO website. CIPRO 2.5 is freely accessible at http://cipro.ibio.jp/2.5.

  9. Negatome 2.0: a database of non-interacting proteins derived by literature mining, manual annotation and protein structure analysis.

    PubMed

    Blohm, Philipp; Frishman, Goar; Smialowski, Pawel; Goebels, Florian; Wachinger, Benedikt; Ruepp, Andreas; Frishman, Dmitrij

    2014-01-01

    Knowledge about non-interacting proteins (NIPs) is important for training algorithms to predict protein-protein interactions (PPIs) and for assessing the false positive rates of PPI detection efforts. We present the second version of Negatome, a database of proteins and protein domains that are unlikely to engage in physical interactions (available online at http://mips.helmholtz-muenchen.de/proj/ppi/negatome). Negatome is derived by manual curation of the literature and by analyzing three-dimensional structures of protein complexes. The main methodological innovation in Negatome 2.0 is the utilization of an advanced text mining procedure to guide the manual annotation process. Potential non-interactions were identified by a modified version of Excerbt, a text mining tool based on semantic sentence analysis. Manual verification shows that nearly half of the text mining results with the highest confidence values correspond to NIP pairs. Compared to the first version, the contents of the database have grown by over 300%.

  10. Practical guidelines addressing ethical issues pertaining to the curation of human locus-specific variation databases (LSDBs)

    PubMed Central

    Povey, Sue; Al Aqeel, Aida I; Cambon-Thomsen, Anne; Dalgleish, Raymond; den Dunnen, Johan T; Firth, Helen V; Greenblatt, Marc S; Barash, Carol Isaacson; Parker, Michael; Patrinos, George P; Savige, Judith; Sobrido, Maria-Jesus; Winship, Ingrid; Cotton, Richard GH

    2010-01-01

    More than 1,000 Web-based locus-specific variation databases (LSDBs) are listed on the Website of the Human Genetic Variation Society (HGVS). These individual efforts, which often relate phenotype to genotype, are a valuable source of information for clinicians, patients, and their families, as well as for basic research. The initiators of the Human Variome Project recently recognized that having access to some of the immense resources of unpublished information already present in diagnostic laboratories would provide critical data to help manage genetic disorders. However, there are significant ethical issues involved in sharing these data worldwide. An international working group presents second-generation guidelines addressing ethical issues relating to the curation of human LSDBs that provide information via a Web-based interface. It is intended that these should help current and future curators and may also inform the future decisions of ethics committees and legislators. These guidelines have been reviewed by the Ethics Committee of the Human Genome Organization (HUGO). Hum Mutat 31:–6, 2010. © 2010 Wiley-Liss, Inc. PMID:20683926

  11. R-Syst::diatom: an open-access and curated barcode database for diatoms and freshwater monitoring

    PubMed Central

    Rimet, Frédéric; Chaumeil, Philippe; Keck, François; Kermarrec, Lenaïg; Vasselon, Valentin; Kahlert, Maria; Franc, Alain; Bouchez, Agnès

    2016-01-01

    Diatoms are micro-algal indicators of freshwater pollution. Current standardized methodologies are based on microscopic determinations, which are time-consuming and prone to identification uncertainties. The use of DNA-barcoding has been proposed as a way to avoid these flaws. Combining barcoding with next-generation sequencing enables collection of a large quantity of barcodes from natural samples. These barcodes are identified as belonging to particular diatom taxa by comparing the sequences to a reference barcoding library using algorithms. Proof of concept was recently demonstrated for synthetic and natural communities and underlined the importance of the quality of this reference library. We present an open-access and curated reference barcoding database for diatoms, called R-Syst::diatom, developed in the framework of R-Syst, the network of systematics supported by INRA (French National Institute for Agricultural Research), see http://www.rsyst.inra.fr/en. R-Syst::diatom links DNA barcodes to their taxonomic identifications and is dedicated to identifying barcodes from natural samples. The data come from two sources: a culture collection of freshwater algae maintained at INRA, in which new strains are regularly deposited and barcoded, and the NCBI (National Center for Biotechnology Information) nucleotide database. Two kinds of barcodes were chosen to support the database, 18S (18S ribosomal RNA) and rbcL (ribulose-1,5-bisphosphate carboxylase/oxygenase), because of their efficiency. Data are curated using innovative (Declic) and classical bioinformatic tools (BLAST, classical phylogenies) and up-to-date taxonomy (catalogues and peer-reviewed papers). Every 6 months R-Syst::diatom is updated. The database is available through the R-Syst microalgae website (http://www.rsyst.inra.fr/) and a platform dedicated to next-generation sequencing data analysis, virtual_BiodiversityL@b (https://galaxy-pgtp.pierroton.inra.fr/).
We present here the content of the library regarding the
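
    The identification step described above, matching an environmental barcode read against a reference library, can be sketched as a nearest-reference search. The k-mer similarity measure, taxa, and sequences below are toy assumptions for illustration, not R-Syst::diatom data or its actual algorithm.

```python
# Hedged sketch: assign a barcode read to the reference taxon with the highest
# k-mer similarity, a crude stand-in for BLAST-based identification.
def kmers(seq, k=4):
    """Set of all overlapping k-length substrings of a sequence."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}

def similarity(a, b, k=4):
    """Jaccard similarity of the two sequences' k-mer sets."""
    ka, kb = kmers(a, k), kmers(b, k)
    return len(ka & kb) / len(ka | kb)

# Toy reference library: taxon -> barcode fragment (invented sequences).
REFERENCE = {
    "Navicula sp.":   "ATGGCGTTAGACCTGAAC",
    "Fragilaria sp.": "ATGGCGCCAGTTCTGGAC",
}

def classify(read):
    """Return the reference taxon most similar to the read."""
    return max(REFERENCE, key=lambda taxon: similarity(read, REFERENCE[taxon]))

read = "ATGGCGTTAGACCTGATC"  # one substitution away from the Navicula entry
```

    A production classifier would also report the similarity score and withhold an assignment when it falls below an identity threshold.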

  12. R-Syst::diatom: an open-access and curated barcode database for diatoms and freshwater monitoring.

    PubMed

    Rimet, Frédéric; Chaumeil, Philippe; Keck, François; Kermarrec, Lenaïg; Vasselon, Valentin; Kahlert, Maria; Franc, Alain; Bouchez, Agnès

    2016-01-01

    Diatoms are micro-algal indicators of freshwater pollution. Current standardized methodologies are based on microscopic determinations, which are time-consuming and prone to identification uncertainties. The use of DNA-barcoding has been proposed as a way to avoid these flaws. Combining barcoding with next-generation sequencing enables collection of a large quantity of barcodes from natural samples. These barcodes are identified as belonging to particular diatom taxa by comparing the sequences to a reference barcoding library using algorithms. Proof of concept was recently demonstrated for synthetic and natural communities and underlined the importance of the quality of this reference library. We present an open-access and curated reference barcoding database for diatoms, called R-Syst::diatom, developed in the framework of R-Syst, the network of systematics supported by INRA (French National Institute for Agricultural Research), see http://www.rsyst.inra.fr/en. R-Syst::diatom links DNA barcodes to their taxonomic identifications and is dedicated to identifying barcodes from natural samples. The data come from two sources: a culture collection of freshwater algae maintained at INRA, in which new strains are regularly deposited and barcoded, and the NCBI (National Center for Biotechnology Information) nucleotide database. Two kinds of barcodes were chosen to support the database, 18S (18S ribosomal RNA) and rbcL (ribulose-1,5-bisphosphate carboxylase/oxygenase), because of their efficiency. Data are curated using innovative (Declic) and classical bioinformatic tools (BLAST, classical phylogenies) and up-to-date taxonomy (catalogues and peer-reviewed papers). Every 6 months R-Syst::diatom is updated. The database is available through the R-Syst microalgae website (http://www.rsyst.inra.fr/) and a platform dedicated to next-generation sequencing data analysis, virtual_BiodiversityL@b (https://galaxy-pgtp.pierroton.inra.fr/).
We present here the content of the library regarding the

  13. NEMiD: A Web-Based Curated Microbial Diversity Database with Geo-Based Plotting

    PubMed Central

    Bhattacharjee, Kaushik; Joshi, Santa Ram

    2014-01-01

    The majority of the Earth's microbes remain unknown, and their potential utility cannot be exploited until they are discovered and characterized. They provide wide scope for the development of new strains as well as for biotechnological uses. The documentation and bioprospection of microorganisms carry enormous significance considering their relevance to human welfare. This calls for an urgent need to develop a database with emphasis on the microbial diversity of the largest untapped reservoirs in the biosphere. The data annotated in the North-East India Microbial database (NEMiD) were obtained by the isolation and characterization of microbes from different parts of the Eastern Himalayan region. The database was constructed as a relational database management system (RDBMS), with data stored in MySQL in the back-end on a Linux server and implemented in an Apache/PHP environment. This database provides a base for understanding the soil microbial diversity pattern in this megabiodiversity hotspot and indicates the distribution patterns of various organisms along with their identification. The NEMiD database is freely available at www.mblabnehu.info/nemid/. PMID:24714636
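
    A minimal sketch of the kind of relational storage the abstract describes (the original uses MySQL behind PHP); the table and column names below are illustrative guesses, and SQLite is used in place of MySQL so the example is self-contained.

```python
# Hedged sketch of a relational schema for a curated microbial-diversity
# database with geo-coordinates for map-based plotting. Names are hypothetical.
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("""
    CREATE TABLE isolate (
        id        INTEGER PRIMARY KEY,
        name      TEXT NOT NULL,
        taxon     TEXT,
        site      TEXT,      -- sampling locality
        latitude  REAL,      -- coordinates support geo-based plotting
        longitude REAL
    )
""")
con.execute(
    "INSERT INTO isolate (name, taxon, site, latitude, longitude) "
    "VALUES (?, ?, ?, ?, ?)",
    ("NEMiD-0001", "Bacillus sp.", "Cherrapunji", 25.28, 91.72),
)
# Parameterized query: fetch all isolates from one sampling site.
rows = con.execute(
    "SELECT name, taxon FROM isolate WHERE site = ?", ("Cherrapunji",)
).fetchall()
```

    The same schema shape extends naturally with tables for characterization tests and literature references joined on the isolate id.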

  14. Classifying the bacterial gut microbiota of termites and cockroaches: A curated phylogenetic reference database (DictDb).

    PubMed

    Mikaelyan, Aram; Köhler, Tim; Lampert, Niclas; Rohland, Jeffrey; Boga, Hamadi; Meuser, Katja; Brune, Andreas

    2015-10-01

    Recent developments in sequencing technology have given rise to a large number of studies that assess bacterial diversity and community structure in termite and cockroach guts based on large amplicon libraries of 16S rRNA genes. Although these studies have revealed important ecological and evolutionary patterns in the gut microbiota, classification of the short sequence reads is limited by the taxonomic depth and resolution of the reference databases used in the respective studies. Here, we present a curated reference database for accurate taxonomic analysis of the bacterial gut microbiota of dictyopteran insects. The Dictyopteran gut microbiota reference Database (DictDb) is based on the Silva database but was significantly expanded by the addition of clones from 11 mostly unexplored termite and cockroach groups, which increased the inventory of bacterial sequences from dictyopteran guts by 26%. The taxonomic depth and resolution of DictDb was significantly improved by a general revision of the taxonomic guide tree for all important lineages, including a detailed phylogenetic analysis of the Treponema and Alistipes complexes, the Fibrobacteres, and the TG3 phylum. The performance of this first documented version of DictDb (v. 3.0) using the revised taxonomic guide tree in the classification of short-read libraries obtained from termites and cockroaches was highly superior to that of the current Silva and RDP databases. DictDb uses an informative nomenclature that is consistent with the literature also for clades of uncultured bacteria and provides an invaluable tool for anyone exploring the gut community structure of termites and cockroaches.

  15. The development of an Ada programming support environment database: SEAD (Software Engineering and Ada Database), user's manual

    NASA Technical Reports Server (NTRS)

    Liaw, Morris; Evesson, Donna

    1988-01-01

    This is a manual for users of the Software Engineering and Ada Database (SEAD). SEAD was developed to provide an information resource to NASA and NASA contractors with respect to Ada-based resources and activities that are available or underway either in NASA or elsewhere in the worldwide Ada community. The sharing of such information will reduce the duplication of effort while improving quality in the development of future software systems. The manual describes the organization of the data in SEAD, the user interface from logging in to logging out, and concludes with a ten chapter tutorial on how to use the information in SEAD. Two appendices provide quick reference for logging into SEAD and using the keyboard of an IBM 3270 or VT100 computer terminal.

  16. The Coral Trait Database, a curated database of trait information for coral species from the global oceans

    NASA Astrophysics Data System (ADS)

    Madin, Joshua S.; Anderson, Kristen D.; Andreasen, Magnus Heide; Bridge, Tom C. L.; Cairns, Stephen D.; Connolly, Sean R.; Darling, Emily S.; Diaz, Marcela; Falster, Daniel S.; Franklin, Erik C.; Gates, Ruth D.; Hoogenboom, Mia O.; Huang, Danwei; Keith, Sally A.; Kosnik, Matthew A.; Kuo, Chao-Yang; Lough, Janice M.; Lovelock, Catherine E.; Luiz, Osmar; Martinelli, Julieta; Mizerek, Toni; Pandolfi, John M.; Pochon, Xavier; Pratchett, Morgan S.; Putnam, Hollie M.; Roberts, T. Edward; Stat, Michael; Wallace, Carden C.; Widman, Elizabeth; Baird, Andrew H.

    2016-03-01

    Trait-based approaches advance ecological and evolutionary research because traits provide a strong link to an organism’s function and fitness. Trait-based research might lead to a deeper understanding of the functions of, and services provided by, ecosystems, thereby improving management, which is vital in the current era of rapid environmental change. Coral reef scientists have long collected trait data for corals; however, these are difficult to access and often under-utilized in addressing large-scale questions. We present the Coral Trait Database initiative that aims to bring together physiological, morphological, ecological, phylogenetic and biogeographic trait information into a single repository. The database houses species- and individual-level data from published field and experimental studies alongside contextual data that provide important framing for analyses. In this data descriptor, we release data for 56 traits for 1547 species, and present a collaborative platform on which other trait data are being actively federated. Our overall goal is for the Coral Trait Database to become an open-source, community-led data clearinghouse that accelerates coral reef research.

  17. The Coral Trait Database, a curated database of trait information for coral species from the global oceans.

    PubMed

    Madin, Joshua S; Anderson, Kristen D; Andreasen, Magnus Heide; Bridge, Tom C L; Cairns, Stephen D; Connolly, Sean R; Darling, Emily S; Diaz, Marcela; Falster, Daniel S; Franklin, Erik C; Gates, Ruth D; Hoogenboom, Mia O; Huang, Danwei; Keith, Sally A; Kosnik, Matthew A; Kuo, Chao-Yang; Lough, Janice M; Lovelock, Catherine E; Luiz, Osmar; Martinelli, Julieta; Mizerek, Toni; Pandolfi, John M; Pochon, Xavier; Pratchett, Morgan S; Putnam, Hollie M; Roberts, T Edward; Stat, Michael; Wallace, Carden C; Widman, Elizabeth; Baird, Andrew H

    2016-03-29

    Trait-based approaches advance ecological and evolutionary research because traits provide a strong link to an organism's function and fitness. Trait-based research might lead to a deeper understanding of the functions of, and services provided by, ecosystems, thereby improving management, which is vital in the current era of rapid environmental change. Coral reef scientists have long collected trait data for corals; however, these are difficult to access and often under-utilized in addressing large-scale questions. We present the Coral Trait Database initiative that aims to bring together physiological, morphological, ecological, phylogenetic and biogeographic trait information into a single repository. The database houses species- and individual-level data from published field and experimental studies alongside contextual data that provide important framing for analyses. In this data descriptor, we release data for 56 traits for 1547 species, and present a collaborative platform on which other trait data are being actively federated. Our overall goal is for the Coral Trait Database to become an open-source, community-led data clearinghouse that accelerates coral reef research.

  18. The Coral Trait Database, a curated database of trait information for coral species from the global oceans

    PubMed Central

    Madin, Joshua S.; Anderson, Kristen D.; Andreasen, Magnus Heide; Bridge, Tom C.L.; Cairns, Stephen D.; Connolly, Sean R.; Darling, Emily S.; Diaz, Marcela; Falster, Daniel S.; Franklin, Erik C.; Gates, Ruth D.; Hoogenboom, Mia O.; Huang, Danwei; Keith, Sally A.; Kosnik, Matthew A.; Kuo, Chao-Yang; Lough, Janice M.; Lovelock, Catherine E.; Luiz, Osmar; Martinelli, Julieta; Mizerek, Toni; Pandolfi, John M.; Pochon, Xavier; Pratchett, Morgan S.; Putnam, Hollie M.; Roberts, T. Edward; Stat, Michael; Wallace, Carden C.; Widman, Elizabeth; Baird, Andrew H.

    2016-01-01

    Trait-based approaches advance ecological and evolutionary research because traits provide a strong link to an organism’s function and fitness. Trait-based research might lead to a deeper understanding of the functions of, and services provided by, ecosystems, thereby improving management, which is vital in the current era of rapid environmental change. Coral reef scientists have long collected trait data for corals; however, these are difficult to access and often under-utilized in addressing large-scale questions. We present the Coral Trait Database initiative that aims to bring together physiological, morphological, ecological, phylogenetic and biogeographic trait information into a single repository. The database houses species- and individual-level data from published field and experimental studies alongside contextual data that provide important framing for analyses. In this data descriptor, we release data for 56 traits for 1547 species, and present a collaborative platform on which other trait data are being actively federated. Our overall goal is for the Coral Trait Database to become an open-source, community-led data clearinghouse that accelerates coral reef research. PMID:27023900

  19. dbEM: A database of epigenetic modifiers curated from cancerous and normal genomes.

    PubMed

    Singh Nanda, Jagpreet; Kumar, Rahul; Raghava, Gajendra P S

    2016-01-18

    We have developed a database called dbEM (database of Epigenetic Modifiers) to maintain the genomic information of about 167 epigenetic modifiers/proteins, which are considered potential cancer targets. In dbEM, modifiers are classified on a functional basis and comprise 48 histone methyltransferases, 33 chromatin remodelers and 31 histone demethylases. dbEM maintains genomic information such as mutations, copy number variation and gene expression in thousands of tumor samples, cancer cell lines and healthy samples. This information is obtained from public resources, viz. COSMIC, CCLE and the 1000 Genomes Project. Gene essentiality data retrieved from the COLT database further highlight the importance of various epigenetic proteins for cancer survival. We have also reported the sequence profiles, tertiary structures and post-translational modifications of these epigenetic proteins in cancer. The database also contains information on 54 drug molecules against different epigenetic proteins. A wide range of tools have been integrated in dbEM, e.g. search, BLAST, alignment and profile-based prediction. In our analysis, we found that the epigenetic proteins DNMT3A, HDAC2, KDM6A and TET2 are highly mutated in a variety of cancers. We are confident that dbEM will be very useful in cancer research, particularly in the field of epigenetic-protein-based cancer therapeutics. This database is publicly available at http://crdd.osdd.net/raghava/dbem.

  20. dbEM: A database of epigenetic modifiers curated from cancerous and normal genomes

    NASA Astrophysics Data System (ADS)

    Singh Nanda, Jagpreet; Kumar, Rahul; Raghava, Gajendra P. S.

    2016-01-01

    We have developed a database called dbEM (database of Epigenetic Modifiers) to maintain the genomic information of about 167 epigenetic modifiers/proteins, which are considered potential cancer targets. In dbEM, modifiers are classified on a functional basis and comprise 48 histone methyltransferases, 33 chromatin remodelers and 31 histone demethylases. dbEM maintains genomic information such as mutations, copy number variation and gene expression in thousands of tumor samples, cancer cell lines and healthy samples. This information is obtained from public resources, viz. COSMIC, CCLE and the 1000 Genomes Project. Gene essentiality data retrieved from the COLT database further highlight the importance of various epigenetic proteins for cancer survival. We have also reported the sequence profiles, tertiary structures and post-translational modifications of these epigenetic proteins in cancer. The database also contains information on 54 drug molecules against different epigenetic proteins. A wide range of tools have been integrated in dbEM, e.g. search, BLAST, alignment and profile-based prediction. In our analysis, we found that the epigenetic proteins DNMT3A, HDAC2, KDM6A and TET2 are highly mutated in a variety of cancers. We are confident that dbEM will be very useful in cancer research, particularly in the field of epigenetic-protein-based cancer therapeutics. This database is publicly available at http://crdd.osdd.net/raghava/dbem.

  1. DREMECELS: A Curated Database for Base Excision and Mismatch Repair Mechanisms Associated Human Malignancies

    PubMed Central

    Shukla, Ankita; Singh, Tiratha Raj

    2016-01-01

    DNA repair mechanisms combat the various damaging processes that can lead to critical malignancies. DREMECELS was designed around the malignancies with frequent alterations in DNA repair pathways, that is, colorectal and endometrial cancers associated with Lynch syndrome (also known as HNPCC). Since Lynch syndrome carries a high risk (~40–60%) for both cancers, we decided to cover all three conditions in this portal. Although a large population is presently affected by these malignancies, and many resources are available for various cancer types, no database archives information on the genes specific to only these cancers and disorders. The database contains 156 genes and two repair mechanisms, base excision repair (BER) and mismatch repair (MMR). Other parameters include some of the regulatory processes that contribute to the progression of these diseases when repair mechanisms, specifically BER and MMR, are incompetent. Our database mainly provides qualitative and quantitative information on these cancer types along with methylation, drug sensitivity, miRNA, copy number variation (CNV) and somatic mutation data. This database serves the scientific community by providing integrated information on these disease types, thus supporting diagnostic and therapeutic processes. This repository is an excellent companion for researchers and biomedical professionals and facilitates the understanding of such critical diseases. DREMECELS is publicly available at http://www.bioinfoindia.org/dremecels. PMID:27276067

  2. The Aspergillus Genome Database: multispecies curation and incorporation of RNA-Seq data to improve structural gene annotations

    PubMed Central

    Cerqueira, Gustavo C.; Arnaud, Martha B.; Inglis, Diane O.; Skrzypek, Marek S.; Binkley, Gail; Simison, Matt; Miyasato, Stuart R.; Binkley, Jonathan; Orvis, Joshua; Shah, Prachi; Wymore, Farrell; Sherlock, Gavin; Wortman, Jennifer R.

    2014-01-01

    The Aspergillus Genome Database (AspGD; http://www.aspgd.org) is a freely available web-based resource that was designed for Aspergillus researchers and is also a valuable source of information for the entire fungal research community. In addition to being a repository and central point of access to genome, transcriptome and polymorphism data, AspGD hosts a comprehensive comparative genomics toolbox that facilitates the exploration of precomputed orthologs among the 20 currently available Aspergillus genomes. AspGD curators perform gene product annotation based on review of the literature for four key Aspergillus species: Aspergillus nidulans, Aspergillus oryzae, Aspergillus fumigatus and Aspergillus niger. We have iteratively improved the structural annotation of Aspergillus genomes through the analysis of publicly available transcription data, mostly expressed sequence tags, as described in a previous NAR Database article (Arnaud et al. 2012). In this update, we report substantive structural annotation improvements for the A. nidulans, A. oryzae and A. fumigatus genomes based on recently available RNA-Seq data. Over 26 000 loci were updated across these species; although those primarily comprise the addition and extension of untranslated regions (UTRs), the new analysis also enabled over 1000 modifications affecting the coding sequence of genes in each target genome. PMID:24194595

  3. aglgenes, A curated and searchable database of archaeal N-glycosylation pathway components.

    PubMed

    Godin, Noa; Eichler, Jerry

    2014-01-01

    Whereas N-glycosylation is a posttranslational modification performed across evolution, the archaeal version of this protein-processing event presents a degree of diversity not seen in either bacteria or eukarya. Accordingly, archaeal N-glycosylation relies on a large number of enzymes that are often species-specific or restricted to a select group of species. As such, there is a need for an organized platform on which the growing body of information about archaeal glycosylation (agl) genes can rest. To this end, the aglgenes database provides detailed descriptions of experimentally characterized archaeal N-glycosylation pathway components. For each agl gene, genomic information, supporting literature and relevant external links are provided through a functional, intuitive web interface designed for data browsing. Routine updates ensure that novel experimental information on genes and proteins contributing to archaeal N-glycosylation is incorporated into aglgenes in a timely manner. As such, aglgenes represents a specialized resource for sharing validated experimental information online, supporting workers in the field of archaeal protein glycosylation. Database URL: www.bgu.ac.il/aglgenes.

  4. Development of an Ada programming support environment database SEAD (Software Engineering and Ada Database) administration manual

    NASA Technical Reports Server (NTRS)

    Liaw, Morris; Evesson, Donna

    1988-01-01

    Software Engineering and Ada Database (SEAD) was developed to provide an information resource to NASA and NASA contractors with respect to Ada-based resources and activities which are available or underway either in NASA or elsewhere in the worldwide Ada community. The sharing of such information will reduce duplication of effort while improving quality in the development of future software systems. SEAD data is organized into five major areas: information regarding education and training resources which are relevant to the life cycle of Ada-based software engineering projects such as those in the Space Station program; research publications relevant to NASA projects such as the Space Station Program and conferences relating to Ada technology; the latest progress reports on Ada projects completed or in progress both within NASA and throughout the free world; Ada compilers and other commercial products that support Ada software development; and reusable Ada components generated both within NASA and from elsewhere in the free world. This classified listing of reusable components shall include descriptions of tools, libraries, and other components of interest to NASA. Sources for the data include technical newsletters and periodicals, conference proceedings, the Ada Information Clearinghouse, product vendors, and project sponsors and contractors.

  5. Building an efficient curation workflow for the Arabidopsis literature corpus.

    PubMed

    Li, Donghui; Berardini, Tanya Z; Muller, Robert J; Huala, Eva

    2012-01-01

    TAIR (The Arabidopsis Information Resource) is the model organism database (MOD) for Arabidopsis thaliana, a model plant with a literature corpus of about 39 000 articles in PubMed, with over 4300 new articles added in 2011. We have developed a literature curation workflow incorporating both automated and manual elements to cope with this flood of new research articles. The current workflow can be divided into two phases: article selection and curation. Structured controlled vocabularies, such as the Gene Ontology and Plant Ontology, are used to capture free text information in the literature as succinct ontology-based annotations suitable for the application of computational analysis methods. We also describe our curation platform and the use of text mining tools in our workflow. Database URL: www.arabidopsis.org

  6. Curation of food-relevant chemicals in ToxCast.

    PubMed

    Karmaus, Agnes L; Trautman, Thomas D; Krishan, Mansi; Filer, Dayne L; Fix, Laurel A

    2017-03-07

    High-throughput in vitro assays and exposure prediction efforts are paving the way for modeling chemical risk; however, the utility of such extensive datasets can be limited or misleading when annotation fails to capture current chemical usage. To address this data gap and provide context for food-use in the United States (US), manual curation of food-relevant chemicals in ToxCast was conducted. Chemicals were categorized into three food-use categories: (1) direct food additives, (2) indirect food additives, or (3) pesticide residues. Manual curation resulted in 30% of chemicals having new annotation as well as the removal of 319 chemicals, most due to cancellation or only foreign usage. These results highlight that manual curation of chemical use information provided significant insight affecting the overall inventory and chemical categorization. In total, 1211 chemicals were confirmed as current day food-use in the US by manual curation; 1154 of these chemicals were also identified as food-related in the globally sourced chemical use information from Chemical/Product Categories database (CPCat). The refined list of food-use chemicals and the sources highlighted for compiling annotated information required to confirm food-use are valuable resources for providing needed context when evaluating large-scale inventories such as ToxCast.
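    The inventory bookkeeping described above reduces to simple set operations. The sketch below uses invented chemical identifiers (not actual ToxCast or CPCat entries) to illustrate how the overlap between a manually curated food-use list and a second annotation source, such as CPCat, can be computed:

    ```python
    # Toy inventories; identifiers are illustrative, not real ToxCast/CPCat entries.
    toxcast_food_use = {"chemA", "chemB", "chemC", "chemD"}
    cpcat_food_related = {"chemB", "chemC", "chemD", "chemE"}

    # Chemicals confirmed as food-use in both sources
    # (analogous to the 1154-of-1211 overlap reported above).
    confirmed_in_both = toxcast_food_use & cpcat_food_related
    only_toxcast = toxcast_food_use - cpcat_food_related

    print(sorted(confirmed_in_both))  # ['chemB', 'chemC', 'chemD']
    print(sorted(only_toxcast))       # ['chemA']
    ```

    The same intersection/difference pattern scales directly from this toy example to the full chemical inventories.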

  7. The NCBI Taxonomy database.

    PubMed

    Federhen, Scott

    2012-01-01

    The NCBI Taxonomy database (http://www.ncbi.nlm.nih.gov/taxonomy) is the standard nomenclature and classification repository for the International Nucleotide Sequence Database Collaboration (INSDC), comprising the GenBank, ENA (EMBL) and DDBJ databases. It includes organism names and taxonomic lineages for each of the sequences represented in the INSDC's nucleotide and protein sequence databases. The taxonomy database is manually curated by a small group of scientists at the NCBI who use the current taxonomic literature to maintain a phylogenetic taxonomy for the source organisms represented in the sequence databases. The taxonomy database is a central organizing hub for many of the resources at the NCBI, and provides a means for clustering elements within other domains of the NCBI web site, for internal linking between domains of the Entrez system and for linking out to taxon-specific external resources on the web. Our primary purpose is to index the domain of sequences as conveniently as possible for our user community.

  8. The needs for chemistry standards, database tools and data curation at the chemical-biology interface (SLAS meeting)

    EPA Science Inventory

    This presentation will highlight known challenges with the production of high quality chemical databases and outline recent efforts made to address these challenges. Specific examples will be provided illustrating these challenges within the U.S. Environmental Protection Agency ...

  9. PhytoREF: a reference database of the plastidial 16S rRNA gene of photosynthetic eukaryotes with curated taxonomy.

    PubMed

    Decelle, Johan; Romac, Sarah; Stern, Rowena F; Bendif, El Mahdi; Zingone, Adriana; Audic, Stéphane; Guiry, Michael D; Guillou, Laure; Tessier, Désiré; Le Gall, Florence; Gourvil, Priscillia; Dos Santos, Adriana L; Probert, Ian; Vaulot, Daniel; de Vargas, Colomban; Christen, Richard

    2015-11-01

    Photosynthetic eukaryotes have a critical role as the main producers in most ecosystems of the biosphere. The ongoing environmental metabarcoding revolution opens the way to holistic, ecosystem-level studies of these organisms, in particular the unicellular microalgae that often lack distinctive morphological characters and have complex life cycles. To interpret environmental sequences, metabarcoding necessarily relies on taxonomically curated databases containing reference sequences of the targeted gene (or barcode) from identified organisms. To date, no such reference framework exists for photosynthetic eukaryotes. In this study, we built the PhytoREF database that contains 6490 plastidial 16S rDNA reference sequences that originate from a large diversity of eukaryotes representing all known major photosynthetic lineages. We compiled 3333 amplicon sequences available from public databases and 879 sequences extracted from plastidial genomes, and generated 411 novel sequences from cultured marine microalgal strains belonging to different eukaryotic lineages. A total of 1867 environmental Sanger 16S rDNA sequences were also included in the database. Stringent quality filtering and a phylogeny-based taxonomic classification were applied for each 16S rDNA sequence. The database mainly focuses on marine microalgae, but sequences from land plants (representing half of the PhytoREF sequences) and freshwater taxa were also included to broaden the applicability of PhytoREF to different aquatic and terrestrial habitats. PhytoREF, accessible via a web interface (http://phytoref.fr), is a new resource in molecular ecology to foster the discovery, assessment and monitoring of the diversity of photosynthetic eukaryotes using high-throughput sequencing.
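    The sequence counts quoted above are internally consistent; a quick tally (all figures taken directly from the abstract) confirms that the four sources sum to the stated database size:

    ```python
    # Sequence sources compiled into PhytoREF, as enumerated in the abstract.
    sources = {
        "public amplicon sequences": 3333,
        "plastid genome extractions": 879,
        "novel cultured-strain sequences": 411,
        "environmental Sanger sequences": 1867,
    }

    total = sum(sources.values())
    print(total)  # 6490, matching the stated number of reference sequences
    ```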

  10. Egas: a collaborative and interactive document curation platform.

    PubMed

    Campos, David; Lourenço, Jóni; Matos, Sérgio; Oliveira, José Luís

    2014-01-01

    With the overwhelming amount of biomedical textual information being produced, several manual curation efforts have been set up to extract and store concepts and their relationships into structured resources. As manual annotation is a demanding and expensive task, computerized solutions were developed to perform such tasks automatically. However, high-end information extraction techniques are still not widely used by biomedical research communities, mainly because of the lack of standards and limitations in usability. Interactive annotation tools intend to fill this gap, taking advantage of automatic techniques and existing knowledge bases to assist expert curators in their daily tasks. This article presents Egas, a web-based platform for biomedical text mining and assisted curation with highly usable interfaces for manual and automatic in-line annotation of concepts and relations. A comprehensive set of de facto standard knowledge bases are integrated and indexed to provide straightforward concept normalization features. Real-time collaboration and conversation functionalities allow discussing details of the annotation task as well as providing instant feedback of curator's interactions. Egas also provides interfaces for on-demand management of the annotation task settings and guidelines, and supports standard formats and literature services to import and export documents. By taking advantage of Egas, we participated in the BioCreative IV interactive annotation task, targeting the assisted identification of protein-protein interactions described in PubMed abstracts related to neuropathological disorders. When evaluated by expert curators, it obtained positive scores in terms of usability, reliability and performance. These results, together with the provided innovative features, place Egas as a state-of-the-art solution for fast and accurate curation of information, facilitating the task of creating and updating knowledge bases and annotated resources. Database

  11. miRGate: a curated database of human, mouse and rat miRNA–mRNA targets

    PubMed Central

    Andrés-León, Eduardo; González Peña, Daniel; Gómez-López, Gonzalo; Pisano, David G.

    2015-01-01

    MicroRNAs (miRNAs) are small non-coding elements involved in the post-transcriptional down-regulation of gene expression through base pairing with messenger RNAs (mRNAs). Through this mechanism, several miRNA–mRNA pairs have been described as critical in the regulation of multiple cellular processes, including early embryonic development and pathological conditions. Many of these pairs (such as miR-15b/BCL2 in apoptosis or BART-6/BCL6 in diffuse large B-cell lymphomas) were experimentally discovered and/or computationally predicted. Available tools for target prediction are usually based on sequence matching, thermodynamics and conservation, among other approaches. Nevertheless, the main issue in miRNA–mRNA pair prediction is the limited overlap among the results of different prediction methods, or even with lists of experimentally validated pairs, despite the fact that all rely on similar principles. To circumvent this problem, we have developed miRGate, a database containing novel computationally predicted miRNA–mRNA pairs that are calculated using well-established algorithms. In addition, it includes an updated and complete dataset of sequences for both miRNAs and mRNA 3′-untranslated regions from human (including human viruses), mouse and rat, as well as experimentally validated data from four well-known databases. The underlying methodology of miRGate has been successfully applied to independent datasets, providing predictions that were convincingly validated by functional assays. miRGate is an open resource available at http://mirgate.bioinfo.cnio.es. For programmatic access, we have provided a representational state transfer web service application programming interface that allows accessing the database at http://mirgate.bioinfo.cnio.es/API/. Database URL: http://mirgate.bioinfo.cnio.es PMID:25858286

  12. AT_CHLORO, a Comprehensive Chloroplast Proteome Database with Subplastidial Localization and Curated Information on Envelope Proteins*

    PubMed Central

    Ferro, Myriam; Brugière, Sabine; Salvi, Daniel; Seigneurin-Berny, Daphné; Court, Magali; Moyet, Lucas; Ramus, Claire; Miras, Stéphane; Mellal, Mourad; Le Gall, Sophie; Kieffer-Jaquinod, Sylvie; Bruley, Christophe; Garin, Jérôme; Joyard, Jacques; Masselon, Christophe; Rolland, Norbert

    2010-01-01

    Recent advances in the proteomics field have allowed a series of high throughput experiments to be conducted on chloroplast samples, and the data are available in several public databases. However, the accurate localization of many chloroplast proteins often remains hypothetical. This is especially true for envelope proteins. We went a step further into the knowledge of the chloroplast proteome by focusing, in the same set of experiments, on the localization of proteins in the stroma, the thylakoids, and envelope membranes. LC-MS/MS-based analyses first allowed building the AT_CHLORO database (http://www.grenoble.prabi.fr/protehome/grenoble-plant-proteomics/), a comprehensive repertoire of the 1323 proteins, identified by 10,654 unique peptide sequences, present in highly purified chloroplasts and their subfractions prepared from Arabidopsis thaliana leaves. This database also provides extensive proteomics information (peptide sequences and molecular weight, chromatographic retention times, MS/MS spectra, and spectral count) for a unique chloroplast protein accurate mass and time tag database gathering identified peptides with their respective and precise analytical coordinates, molecular weight, and retention time. We assessed the partitioning of each protein in the three chloroplast compartments by using a semiquantitative proteomics approach (spectral count). These data together with an in-depth investigation of the literature were compiled to provide accurate subplastidial localization of previously known and newly identified proteins. A unique knowledge base containing extensive information on the proteins identified in envelope fractions was thus obtained, allowing new insights into this membrane system to be revealed. Altogether, the data we obtained provide unexpected information about plastidial or subplastidial localization of some proteins that were not suspected to be associated to this membrane system. 
The spectral counting-based strategy was further

  13. Solid Waste Projection Model: Database (Version 1.4). Technical reference manual

    SciTech Connect

    Blackburn, C.; Cillan, T.

    1993-09-01

    The Solid Waste Projection Model (SWPM) system is an analytical tool developed by Pacific Northwest Laboratory (PNL) for Westinghouse Hanford Company (WHC). The SWPM system provides a modeling and analysis environment that supports decisions in the process of evaluating various solid waste management alternatives. This document, one of a series describing the SWPM system, contains detailed information regarding the software and data structures utilized in developing the SWPM Version 1.4 Database. This document is intended for use by experienced database specialists and supports database maintenance, utility development, and database enhancement. Those interested in using the SWPM database should refer to the SWPM Database User's Guide. This document is available from the PNL Task M Project Manager (D. L. Stiles, 509-372-4358), the PNL Task L Project Manager (L. L. Armacost, 509-372-4304), the WHC Restoration Projects Section Manager (509-372-1443), or the WHC Waste Characterization Manager (509-372-1193).

  14. Solid Waste Projection Model: Database (Version 1.3). Technical reference manual

    SciTech Connect

    Blackburn, C.L.

    1991-11-01

    The Solid Waste Projection Model (SWPM) system is an analytical tool developed by Pacific Northwest Laboratory (PNL) for Westinghouse Hanford Company (WHC). The SWPM system provides a modeling and analysis environment that supports decisions in the process of evaluating various solid waste management alternatives. This document, one of a series describing the SWPM system, contains detailed information regarding the software and data structures utilized in developing the SWPM Version 1.3 Database. This document is intended for use by experienced database specialists and supports database maintenance, utility development, and database enhancement.

  15. A hybrid human and machine resource curation pipeline for the Neuroscience Information Framework.

    PubMed

    Bandrowski, A E; Cachat, J; Li, Y; Müller, H M; Sternberg, P W; Ciccarese, P; Clark, T; Marenco, L; Wang, R; Astakhov, V; Grethe, J S; Martone, M E

    2012-01-01

    The breadth of information resources available to researchers on the Internet continues to expand, particularly in light of recently implemented data-sharing policies required by funding agencies. However, the nature of dense, multifaceted neuroscience data and the design of contemporary search engine systems make efficient, reliable and relevant discovery of such information a significant challenge. This challenge is specifically pertinent for online databases, whose dynamic content is 'hidden' from search engines. The Neuroscience Information Framework (NIF; http://www.neuinfo.org) was funded by the NIH Blueprint for Neuroscience Research to address the problem of finding and utilizing neuroscience-relevant resources such as software tools, data sets, experimental animals and antibodies across the Internet. From the outset, NIF sought to provide an accounting of available resources while developing technical solutions to finding, accessing and utilizing them. The curators, therefore, are tasked with identifying and registering resources, examining data, writing configuration files to index and display data and keeping the contents current. In the initial phases of the project, all aspects of the registration and curation processes were manual. However, as the number of resources grew, manual curation became impractical. This report describes our experiences and successes with developing automated resource discovery and semiautomated type characterization with text-mining scripts that facilitate curation team efforts to discover, integrate and display new content. We also describe the DISCO framework, a suite of automated web services that significantly reduce manual curation efforts to periodically check for resource updates. Lastly, we discuss DOMEO, a semi-automated annotation tool that improves the discovery and curation of resources that are not necessarily website-based (i.e. reagents, software tools). Although the ultimate goal of automation was to

  16. A hybrid human and machine resource curation pipeline for the Neuroscience Information Framework

    PubMed Central

    Bandrowski, A. E.; Cachat, J.; Li, Y.; Müller, H. M.; Sternberg, P. W.; Ciccarese, P.; Clark, T.; Marenco, L.; Wang, R.; Astakhov, V.; Grethe, J. S.; Martone, M. E.

    2012-01-01

    The breadth of information resources available to researchers on the Internet continues to expand, particularly in light of recently implemented data-sharing policies required by funding agencies. However, the nature of dense, multifaceted neuroscience data and the design of contemporary search engine systems make efficient, reliable and relevant discovery of such information a significant challenge. This challenge is specifically pertinent for online databases, whose dynamic content is ‘hidden’ from search engines. The Neuroscience Information Framework (NIF; http://www.neuinfo.org) was funded by the NIH Blueprint for Neuroscience Research to address the problem of finding and utilizing neuroscience-relevant resources such as software tools, data sets, experimental animals and antibodies across the Internet. From the outset, NIF sought to provide an accounting of available resources while developing technical solutions to finding, accessing and utilizing them. The curators, therefore, are tasked with identifying and registering resources, examining data, writing configuration files to index and display data and keeping the contents current. In the initial phases of the project, all aspects of the registration and curation processes were manual. However, as the number of resources grew, manual curation became impractical. This report describes our experiences and successes with developing automated resource discovery and semiautomated type characterization with text-mining scripts that facilitate curation team efforts to discover, integrate and display new content. We also describe the DISCO framework, a suite of automated web services that significantly reduce manual curation efforts to periodically check for resource updates. Lastly, we discuss DOMEO, a semi-automated annotation tool that improves the discovery and curation of resources that are not necessarily website-based (i.e. reagents, software tools). Although the ultimate goal of automation was to

  17. Quality Control of Biomedicinal Allergen Products - Highly Complex Isoallergen Composition Challenges Standard MS Database Search and Requires Manual Data Analyses.

    PubMed

    Spiric, Jelena; Engin, Anna M; Karas, Michael; Reuter, Andreas

    2015-01-01

    Allergy against birch pollen is among the most common causes of spring pollinosis in Europe and is diagnosed and treated using extracts from natural sources. Quality control is crucial for safe and effective diagnosis and treatment. However, current methods are very difficult to standardize and do not address individual allergen or isoallergen composition. MS provides information regarding selected proteins or the entire proteome and could overcome the aforementioned limitations. We studied the proteome of birch pollen, focusing on allergens and isoallergens, to clarify which of the 93 published sequence variants of the major allergen, Bet v 1, are expressed as proteins within one source material in parallel. The unexpectedly complex Bet v 1 isoallergen composition required manual data interpretation and a specific design of databases, as current database search engines fail to unambiguously assign spectra to highly homologous, partially identical proteins. We identified 47 non-allergenic proteins and all 5 known birch pollen allergens, and unambiguously proved the existence of 18 Bet v 1 isoallergens and variants by manual data analysis. This highly complex isoallergen composition raises questions whether isoallergens can be ignored or must be included for the quality control of allergen products, and which data analysis strategies are to be applied.

  18. Data Curation

    ERIC Educational Resources Information Center

    Mallon, Melissa, Ed.

    2012-01-01

    In their Top Trends of 2012, the Association of College and Research Libraries (ACRL) named data curation as one of the issues to watch in academic libraries in the near future (ACRL, 2012, p. 312). Data curation can be summarized as "the active and ongoing management of data through its life cycle of interest and usefulness to scholarship,…

  19. How much does curation cost?

    PubMed

    Karp, Peter D

    2016-01-01

    NIH administrators have recently expressed concerns about the cost of curation for biological databases. However, they did not articulate the exact costs of curation. Here we calculate the cost of biocuration of articles for the EcoCyc database as $219 per article over a 5-year period. That cost is 6-15% of the cost of open-access publication fees for publishing biomedical articles, and we estimate that cost is 0.088% of the cost of the overall research project that generated the experimental results. Thus, curation costs are small in an absolute sense, and represent a minuscule fraction of the cost of the research.
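    The quoted percentages imply concrete dollar figures. The back-of-the-envelope check below re-derives them; note that the implied open-access fee range and per-article research cost are inferences from the stated ratios, not numbers given in the abstract:

    ```python
    cost_per_article = 219.0             # USD per curated article (quoted)
    share_low, share_high = 0.06, 0.15   # curation cost as a fraction of an OA fee (quoted 6-15%)

    # Implied open-access publication fee range (inferred, not quoted).
    fee_low = cost_per_article / share_high   # ~ $1460
    fee_high = cost_per_article / share_low   # ~ $3650
    print(f"implied OA fee range: ${fee_low:.0f}-${fee_high:.0f}")

    # The quoted 0.088% of overall research project cost implies, per article:
    project_cost = cost_per_article / 0.00088  # ~ $249,000 of research per curated article
    print(f"implied research cost per article: ${project_cost:,.0f}")
    ```

    The implied fee range sits comfortably within typical open-access charges, which is a useful consistency check on the quoted percentages.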

  20. How much does curation cost?

    PubMed Central

    2016-01-01

    NIH administrators have recently expressed concerns about the cost of curation for biological databases. However, they did not articulate the exact costs of curation. Here we calculate the cost of biocuration of articles for the EcoCyc database as $219 per article over a 5-year period. That cost is 6–15% of the cost of open-access publication fees for publishing biomedical articles, and we estimate that cost is 0.088% of the cost of the overall research project that generated the experimental results. Thus, curation costs are small in an absolute sense, and represent a minuscule fraction of the cost of the research. PMID:27504008

  1. Egas: a collaborative and interactive document curation platform

    PubMed Central

    Campos, David; Lourenço, Jóni; Matos, Sérgio; Oliveira, José Luís

    2014-01-01

    With the overwhelming amount of biomedical textual information being produced, several manual curation efforts have been set up to extract and store concepts and their relationships into structured resources. As manual annotation is a demanding and expensive task, computerized solutions were developed to perform such tasks automatically. However, high-end information extraction techniques are still not widely used by biomedical research communities, mainly because of the lack of standards and limitations in usability. Interactive annotation tools intend to fill this gap, taking advantage of automatic techniques and existing knowledge bases to assist expert curators in their daily tasks. This article presents Egas, a web-based platform for biomedical text mining and assisted curation with highly usable interfaces for manual and automatic in-line annotation of concepts and relations. A comprehensive set of de facto standard knowledge bases are integrated and indexed to provide straightforward concept normalization features. Real-time collaboration and conversation functionalities allow discussing details of the annotation task as well as providing instant feedback of curator’s interactions. Egas also provides interfaces for on-demand management of the annotation task settings and guidelines, and supports standard formats and literature services to import and export documents. By taking advantage of Egas, we participated in the BioCreative IV interactive annotation task, targeting the assisted identification of protein–protein interactions described in PubMed abstracts related to neuropathological disorders. When evaluated by expert curators, it obtained positive scores in terms of usability, reliability and performance. These results, together with the provided innovative features, place Egas as a state-of-the-art solution for fast and accurate curation of information, facilitating the task of creating and updating knowledge bases and annotated resources

  2. GSOSTATS Database: USAF Synchronous Satellite Catalog Data Conversion Software. User's Guide and Software Maintenance Manual, Version 2.1

    NASA Technical Reports Server (NTRS)

    Mallasch, Paul G.; Babic, Slavoljub

    1994-01-01

    The United States Air Force (USAF) provides NASA Lewis Research Center with monthly reports containing the Synchronous Satellite Catalog and the associated Two Line Mean Element Sets. The USAF Synchronous Satellite Catalog supplies satellite orbital parameters collected by an automated monitoring system and provided to Lewis Research Center as text files on magnetic tape. Software was developed to facilitate automated formatting, data normalization, cross-referencing, and error correction of Synchronous Satellite Catalog files before loading into the NASA Geosynchronous Satellite Orbital Statistics Database System (GSOSTATS). This document contains the User's Guide and Software Maintenance Manual with information necessary for installation, initialization, start-up, operation, error recovery, and termination of the software application. It also contains implementation details, modification aids, and software source code adaptations for use in future revisions.

  3. Argo: an integrative, interactive, text mining-based workbench supporting curation.

    PubMed

    Rak, Rafal; Rowley, Andrew; Black, William; Ananiadou, Sophia

    2012-01-01

    Curation of biomedical literature is often supported by the automatic analysis of textual content that generally involves a sequence of individual processing components. Text mining (TM) has been used to enhance the process of manual biocuration, but has been focused on specific databases and tasks rather than an environment integrating TM tools into the curation pipeline, catering for a variety of tasks, types of information and applications. Processing components usually come from different sources and often lack interoperability. The well established Unstructured Information Management Architecture is a framework that addresses interoperability by defining common data structures and interfaces. However, most of the efforts are targeted towards software developers and are not suitable for curators, or are otherwise inconvenient to use on a higher level of abstraction. To overcome these issues we introduce Argo, an interoperable, integrative, interactive and collaborative system for text analysis with a convenient graphic user interface to ease the development of processing workflows and boost productivity in labour-intensive manual curation. Robust, scalable text analytics follow a modular approach, adopting component modules for distinct levels of text analysis. The user interface is available entirely through a web browser that saves the user from going through often complicated and platform-dependent installation procedures. Argo comes with a predefined set of processing components commonly used in text analysis, while giving the users the ability to deposit their own components. The system accommodates various areas and levels of user expertise, from TM and computational linguistics to ontology-based curation. One of the key functionalities of Argo is its ability to seamlessly incorporate user-interactive components, such as manual annotation editors, into otherwise completely automatic pipelines. As a use case, we demonstrate the functionality of an in

  4. User manual for CSP_VANA: A check standards measurement and database program for microwave network analyzers


    SciTech Connect

    Duda, L.E.

    1997-10-01

    Vector network analyzers are a convenient way to measure scattering parameters of a variety of microwave devices. However, these instruments, unlike oscilloscopes for example, require a relatively high degree of user knowledge and expertise. Due to the complexity of the instrument and of the calibration process, there are many ways in which an incorrect measurement may be produced. The Microwave Project, which is part of SNL's Primary Standards laboratory, routinely uses check standards to verify that the network analyzer is operating properly. In the past, these measurements were recorded manually and, sometimes, interpretation of the results was problematic. To aid the measurement assurance process, a software program was developed to automatically measure a check standard and compare the new measurements with a historical database of measurements of the same device. The program acquires new measurement data from selected check standards, plots the new data against the mean and standard deviation of prior data for the same check standard, and updates the database files for the check standard. The program is entirely menu-driven, requiring little additional work by the user. This report describes the function of the software, including a discussion of its capabilities, and the way in which the software is used in the lab.
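The comparison described above, plotting a new check-standard measurement against the mean and standard deviation of its own measurement history, amounts to a simple control-chart test. A minimal sketch, using hypothetical S-parameter values rather than the program's actual data handling:

```python
import statistics

def check_standard(history, new_value, k=3.0):
    """Flag a new check-standard measurement that falls more than
    k standard deviations from the mean of prior measurements."""
    mean = statistics.mean(history)
    sdev = statistics.stdev(history)
    in_control = abs(new_value - mean) <= k * sdev
    return mean, sdev, in_control

# Hypothetical |S11| (dB) of one check standard from earlier sessions
prior = [-20.11, -20.08, -20.14, -20.10, -20.09, -20.12]
mean, sdev, ok = check_standard(prior, -20.13)
print(f"mean={mean:.3f} dB, sdev={sdev:.3f} dB, in control: {ok}")
```

After the check, the new point would be appended to the history file, mirroring the database update step the report describes.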

  5. Curation of characterized glycoside hydrolases of fungal origin

    PubMed Central

    Murphy, Caitlin; Powlowski, Justin; Wu, Min; Butler, Greg; Tsang, Adrian

    2011-01-01

    Fungi produce a wide range of extracellular enzymes to break down plant cell walls, which are composed mainly of cellulose, lignin and hemicellulose. Among them are the glycoside hydrolases (GH), the largest and most diverse family of enzymes active on these substrates. To facilitate research and development of enzymes for the conversion of cell-wall polysaccharides into fermentable sugars, we have manually curated a comprehensive set of characterized fungal glycoside hydrolases. Characterized glycoside hydrolases were retrieved from protein and enzyme databases, as well as literature repositories. A total of 453 characterized glycoside hydrolases have been cataloged. They come from 131 different fungal species, most of which belong to the phylum Ascomycota. These enzymes represent 46 different GH activities and cover 44 of the 115 CAZy GH families. In addition to enzyme source and enzyme family, available biochemical properties such as temperature and pH optima, specific activity, kinetic parameters and substrate specificities were recorded. To simplify comparative studies, enzyme and species abbreviations have been standardized, Gene Ontology terms assigned and reference to supporting evidence provided. The annotated genes have been organized in a searchable, online database called mycoCLAP (Characterized Lignocellulose-Active Proteins of fungal origin). It is anticipated that this manually curated collection of biochemically characterized fungal proteins will be used to enhance functional annotation of novel GH genes. Database URL: http://mycoCLAP.fungalgenomics.ca/ PMID:21622642

  6. Human events reference for ATHEANA (HERA) database description and preliminary user's manual

    SciTech Connect

    Auflick, J.L.; Hahn, H.A.; Pond, D.J.

    1998-05-27

    The Technique for Human Error Analysis (ATHEANA) is a newly developed human reliability analysis (HRA) methodology that aims to facilitate better representation and integration of human performance into probabilistic risk assessment (PRA) modeling and quantification by analyzing risk-significant operating experience in the context of existing behavioral science models. The fundamental premise of ATHEANA is that error-forcing contexts (EFCs), which refer to combinations of equipment/material conditions and performance shaping factors (PSFs), set up or create the conditions under which unsafe actions (UAs) can occur. Because ATHEANA relies heavily on the analysis of operational events that have already occurred as a mechanism for generating creative thinking about possible EFCs, a database, called the Human Events Reference for ATHEANA (HERA), has been developed to support the methodology. This report documents the initial development efforts for HERA.

  7. Human Events Reference for ATHEANA (HERA) Database Description and Preliminary User's Manual

    SciTech Connect

    Auflick, J.L.

    1999-08-12

    The Technique for Human Error Analysis (ATHEANA) is a newly developed human reliability analysis (HRA) methodology that aims to facilitate better representation and integration of human performance into probabilistic risk assessment (PRA) modeling and quantification by analyzing risk-significant operating experience in the context of existing behavioral science models. The fundamental premise of ATHEANA is that error forcing contexts (EFCs), which refer to combinations of equipment/material conditions and performance shaping factors (PSFs), set up or create the conditions under which unsafe actions (UAs) can occur. Because ATHEANA relies heavily on the analysis of operational events that have already occurred as a mechanism for generating creative thinking about possible EFCs, a database (db) of analytical operational events, called the Human Events Reference for ATHEANA (HERA), has been developed to support the methodology. This report documents the initial development efforts for HERA.

  8. NeuroTransDB: highly curated and structured transcriptomic metadata for neurodegenerative diseases.

    PubMed

    Bagewadi, Shweta; Adhikari, Subash; Dhrangadhariya, Anjani; Irin, Afroza Khanam; Ebeling, Christian; Namasivayam, Aishwarya Alex; Page, Matthew; Hofmann-Apitius, Martin; Senger, Philipp

    2015-01-01

    Neurodegenerative diseases are chronic debilitating conditions, characterized by progressive loss of neurons that represent a significant health care burden as the global elderly population continues to grow. Over the past decade, high-throughput technologies such as the Affymetrix GeneChip microarrays have provided new perspectives into the pathomechanisms underlying neurodegeneration. Public transcriptomic data repositories, namely Gene Expression Omnibus and curated ArrayExpress, enable researchers to conduct integrative meta-analysis; increasing the power to detect differentially regulated genes in disease and explore patterns of gene dysregulation across biologically related studies. The reliability of retrospective, large-scale integrative analyses depends on an appropriate combination of related datasets, in turn requiring detailed meta-annotations capturing the experimental setup. In most cases, we observe huge variation in compliance to defined standards for submitted metadata in public databases. Much of the information to complete, or refine meta-annotations are distributed in the associated publications. For example, tissue preparation or comorbidity information is frequently described in an article's supplementary tables. Several value-added databases have employed additional manual efforts to overcome this limitation. However, none of these databases explicate annotations that distinguish human and animal models in neurodegeneration context. Therefore, adopting a more specific disease focus, in combination with dedicated disease ontologies, will better empower the selection of comparable studies with refined annotations to address the research question at hand. In this article, we describe the detailed development of NeuroTransDB, a manually curated database containing metadata annotations for neurodegenerative studies. The database contains more than 20 dimensions of metadata annotations within 31 mouse, 5 rat and 45 human studies, defined in

  9. Preliminary evaluation of the CellFinder literature curation pipeline for gene expression in kidney cells and anatomical parts.

    PubMed

    Neves, Mariana; Damaschun, Alexander; Mah, Nancy; Lekschas, Fritz; Seltmann, Stefanie; Stachelscheid, Harald; Fontaine, Jean-Fred; Kurtz, Andreas; Leser, Ulf

    2013-01-01

    Biomedical literature curation is the process of automatically and/or manually deriving knowledge from scientific publications and recording it into specialized databases for structured delivery to users. It is a slow, error-prone, complex, costly and, yet, highly important task. Previous experiences have proven that text mining can assist in its many phases, especially, in triage of relevant documents and extraction of named entities and biological events. Here, we present the curation pipeline of the CellFinder database, a repository of cell research, which includes data derived from literature curation and microarrays to identify cell types, cell lines, organs and so forth, and especially patterns in gene expression. The curation pipeline is based on freely available tools in all text mining steps, as well as the manual validation of extracted data. Preliminary results are presented for a data set of 2376 full texts from which >4500 gene expression events in cell or anatomical part have been extracted. Validation of half of this data resulted in a precision of ~50% of the extracted data, which indicates that we are on the right track with our pipeline for the proposed task. However, evaluation of the methods shows that there is still room for improvement in the named-entity recognition and that a larger and more robust corpus is needed to achieve a better performance for event extraction. Database URL: http://www.cellfinder.org/

  10. Preliminary evaluation of the CellFinder literature curation pipeline for gene expression in kidney cells and anatomical parts

    PubMed Central

    Neves, Mariana; Damaschun, Alexander; Mah, Nancy; Lekschas, Fritz; Seltmann, Stefanie; Stachelscheid, Harald; Fontaine, Jean-Fred; Kurtz, Andreas; Leser, Ulf

    2013-01-01

    Biomedical literature curation is the process of automatically and/or manually deriving knowledge from scientific publications and recording it into specialized databases for structured delivery to users. It is a slow, error-prone, complex, costly and, yet, highly important task. Previous experiences have proven that text mining can assist in its many phases, especially, in triage of relevant documents and extraction of named entities and biological events. Here, we present the curation pipeline of the CellFinder database, a repository of cell research, which includes data derived from literature curation and microarrays to identify cell types, cell lines, organs and so forth, and especially patterns in gene expression. The curation pipeline is based on freely available tools in all text mining steps, as well as the manual validation of extracted data. Preliminary results are presented for a data set of 2376 full texts from which >4500 gene expression events in cell or anatomical part have been extracted. Validation of half of this data resulted in a precision of ∼50% of the extracted data, which indicates that we are on the right track with our pipeline for the proposed task. However, evaluation of the methods shows that there is still room for improvement in the named-entity recognition and that a larger and more robust corpus is needed to achieve a better performance for event extraction. Database URL: http://www.cellfinder.org/ PMID:23599415

  11. System administrator's manual (SAM) for the enhanced logistics intratheater support tool (ELIST) database instance segment version 8.1.0.0 for Solaris 7.

    SciTech Connect

    Dritz, K.

    2002-03-06

    This document is the System Administrator's Manual (SAM) for the Enhanced Logistics Intratheater Support Tool (ELIST) Database Instance Segment. It covers errors that can arise during the segment's installation and deinstallation, and it outlines appropriate recovery actions. It also tells how to change the password for the SYSTEM account of the database instance after the instance is created, and it discusses the creation of a suitable database instance for ELIST by means other than the installation of the segment. The latter subject is covered in more depth than its introductory discussion in the Installation Procedures (IP) for the Enhanced Logistics Intratheater Support Tool (ELIST) Global Data Segment, Database Instance Segment, Database Fill Segment, Database Segment, Database Utility Segment, Software Segment, and Reference Data Segment (referred to in portions of this document as the ELIST IP). The information in this document is expected to be of use only rarely. Other than errors arising from the failure to follow instructions, difficulties are not expected to be encountered during the installation or deinstallation of the segment. By the same token, the need to create a database instance for ELIST by means other than the installation of the segment is expected to be the exception, rather than the rule. Most administrators will only need to be aware of the help that is provided in this document and will probably not actually need to read and make use of it.

  12. The Transporter Classification Database

    PubMed Central

    Saier, Milton H.; Reddy, Vamsee S.; Tamang, Dorjee G.; Västermark, Åke

    2014-01-01

    The Transporter Classification Database (TCDB; http://www.tcdb.org) serves as a common reference point for transport protein research. The database contains more than 10 000 non-redundant proteins that represent all currently recognized families of transmembrane molecular transport systems. Proteins in TCDB are organized in a five-level hierarchical system, where the first two levels are the class and subclass, the second two are the family and subfamily, and the last one is the transport system. Superfamilies that contain multiple families are included as hyperlinks to the five-tier TC hierarchy. TCDB includes proteins from all types of living organisms and is the only transporter classification system that is both universal and recognized by the International Union of Biochemistry and Molecular Biology. It has been expanded by manual curation, contains extensive text descriptions providing structural, functional, mechanistic and evolutionary information, is supported by unique software and is interconnected to many other relevant databases. TCDB is of increasing usefulness to the international scientific community and can serve as a model for the expansion of database technologies. This manuscript describes an update of the database descriptions previously featured in NAR database issues. PMID:24225317
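The five-tier hierarchy described above is encoded in TC numbers such as 2.A.1.1.1, whose dot-separated components are class, subclass, family, subfamily and transport system. A minimal parser sketch (an illustration of the numbering scheme, not TCDB's own software):

```python
from typing import NamedTuple

class TCNumber(NamedTuple):
    tc_class: str   # e.g. '2' (electrochemical potential-driven transporters)
    subclass: str   # e.g. 'A' (porters)
    family: str     # e.g. '1' (major facilitator superfamily)
    subfamily: str
    system: str

def parse_tc(tc: str) -> TCNumber:
    """Split a TC number like '2.A.1.1.1' into its five hierarchy levels."""
    parts = tc.split(".")
    if len(parts) != 5:
        raise ValueError(f"expected five dot-separated levels, got {tc!r}")
    return TCNumber(*parts)

def same_family(a: str, b: str) -> bool:
    """Two transport systems belong to the same family iff their
    first three levels (class.subclass.family) agree."""
    return parse_tc(a)[:3] == parse_tc(b)[:3]

print(parse_tc("2.A.1.1.1"))
```

Grouping systems by a prefix of the TC number is how comparisons at any level of the hierarchy can be expressed.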

  13. MetaBase—the wiki-database of biological databases

    PubMed Central

    Bolser, Dan M.; Chibon, Pierre-Yves; Palopoli, Nicolas; Gong, Sungsam; Jacob, Daniel; Angel, Victoria Dominguez Del; Swan, Dan; Bassi, Sebastian; González, Virginia; Suravajhala, Prashanth; Hwang, Seungwoo; Romano, Paolo; Edwards, Rob; Bishop, Bryan; Eargle, John; Shtatland, Timur; Provart, Nicholas J.; Clements, Dave; Renfro, Daniel P.; Bhak, Daeui; Bhak, Jong

    2012-01-01

    Biology is generating more data than ever. As a result, there is an ever increasing number of publicly available databases that analyse, integrate and summarize the available data, providing an invaluable resource for the biological community. As this trend continues, there is a pressing need to organize, catalogue and rate these resources, so that the information they contain can be most effectively exploited. MetaBase (MB) (http://MetaDatabase.Org) is a community-curated database containing more than 2000 commonly used biological databases. Each entry is structured using templates and can carry various user comments and annotations. Entries can be searched, listed, browsed or queried. The database was created using the same MediaWiki technology that powers Wikipedia, allowing users to contribute on many different levels. The initial release of MB was derived from the content of the 2007 Nucleic Acids Research (NAR) Database Issue. Since then, approximately 100 databases have been manually collected from the literature, and users have added information for over 240 databases. MB is synchronized annually with the static Molecular Biology Database Collection provided by NAR. To date, there have been 19 significant contributors to the project; each one is listed as an author here to highlight the community aspect of the project. PMID:22139927

  14. Curation, integration and visualization of bacterial virulence factors in PATRIC

    PubMed Central

    Mao, Chunhong; Abraham, David; Wattam, Alice R.; Wilson, Meredith J.C.; Shukla, Maulik; Yoo, Hyun Seung; Sobral, Bruno W.

    2015-01-01

    Motivation: We’ve developed a highly curated bacterial virulence factor (VF) library in PATRIC (Pathosystems Resource Integration Center, www.patricbrc.org) to support infectious disease research. Although several VF databases are available, there is still a need to incorporate new knowledge found in published experimental evidence and integrate these data with other information known for these specific VF genes, including genomic and other omics data. This integration supports the identification of VFs, comparative studies and hypothesis generation, which facilitates the understanding of virulence and pathogenicity. Results: We have manually curated VFs from six prioritized NIAID (National Institute of Allergy and Infectious Diseases) category A–C bacterial pathogen genera, Mycobacterium, Salmonella, Escherichia, Shigella, Listeria and Bartonella, using published literature. This curated information on virulence has been integrated with data from genomic functional annotations, transcriptomic experiments, protein–protein interactions and disease information already present in PATRIC. Such integration gives researchers access to a broad array of information about these individual genes, and also to a suite of tools to perform comparative genomic and transcriptomic analysis that are available at PATRIC. Availability and implementation: All tools and data are freely available at PATRIC (http://patricbrc.org). Contact: cmao@vbi.vt.edu. Supplementary information: Supplementary data are available at Bioinformatics online. PMID:25273106

  15. Large scale explorative oligonucleotide probe selection for thousands of genetic groups on a computing grid: application to phylogenetic probe design using a curated small subunit ribosomal RNA gene database.

    PubMed

    Jaziri, Faouzi; Peyretaillade, Eric; Missaoui, Mohieddine; Parisot, Nicolas; Cipière, Sébastien; Denonfoux, Jérémie; Mahul, Antoine; Peyret, Pierre; Hill, David R C

    2014-01-01

    Phylogenetic Oligonucleotide Arrays (POAs) were recently adapted for studying the huge microbial communities in a flexible and easy-to-use way. POA coupled with the use of explorative probes to detect the unknown part is now one of the most powerful approaches for a better understanding of microbial community functioning. However, the selection of probes remains a very difficult task. The rapid growth of environmental databases has led to an exponential increase of data to be managed for an efficient design. Consequently, the use of high performance computing facilities is mandatory. In this paper, we present an efficient parallelization method to select known and explorative oligonucleotide probes at large scale using computing grids. We implemented a software that generates and monitors thousands of jobs over the European Computing Grid Infrastructure (EGI). We also developed a new algorithm for the construction of a high-quality curated phylogenetic database to avoid erroneous design due to bad sequence affiliation. We present here the performance and statistics of our method on real biological datasets based on a phylogenetic prokaryotic database at the genus level and a complete design of about 20,000 probes for 2,069 genera of prokaryotes.

  16. Creation of a genome-wide metabolic pathway database for Populus trichocarpa using a new approach for reconstruction and curation of metabolic pathways for plants.

    PubMed

    Zhang, Peifen; Dreher, Kate; Karthikeyan, A; Chi, Anjo; Pujar, Anuradha; Caspi, Ron; Karp, Peter; Kirkup, Vanessa; Latendresse, Mario; Lee, Cynthia; Mueller, Lukas A; Muller, Robert; Rhee, Seung Yon

    2010-08-01

    Metabolic networks reconstructed from sequenced genomes or transcriptomes can help visualize and analyze large-scale experimental data, predict metabolic phenotypes, discover enzymes, engineer metabolic pathways, and study metabolic pathway evolution. We developed a general approach for reconstructing metabolic pathway complements of plant genomes. Two new reference databases were created and added to the core of the infrastructure: a comprehensive, all-plant reference pathway database, PlantCyc, and a reference enzyme sequence database, RESD, for annotating metabolic functions of protein sequences. PlantCyc (version 3.0) includes 714 metabolic pathways and 2,619 reactions from over 300 species. RESD (version 1.0) contains 14,187 literature-supported enzyme sequences from across all kingdoms. We used RESD, PlantCyc, and MetaCyc (an all-species reference metabolic pathway database), in conjunction with the pathway prediction software Pathway Tools, to reconstruct a metabolic pathway database, PoplarCyc, from the recently sequenced genome of Populus trichocarpa. PoplarCyc (version 1.0) contains 321 pathways with 1,807 assigned enzymes. Comparing PoplarCyc (version 1.0) with AraCyc (version 6.0, Arabidopsis [Arabidopsis thaliana]) showed comparable numbers of pathways distributed across all domains of metabolism in both databases, except for a higher number of AraCyc pathways in secondary metabolism and a 1.5-fold increase in carbohydrate metabolic enzymes in PoplarCyc. Here, we introduce these new resources and demonstrate the feasibility of using them to identify candidate enzymes for specific pathways and to analyze metabolite profiling data through concrete examples. These resources can be searched by text or BLAST, browsed, and downloaded from our project Web site (http://plantcyc.org).

  17. Quality Control of Biomedicinal Allergen Products – Highly Complex Isoallergen Composition Challenges Standard MS Database Search and Requires Manual Data Analyses

    PubMed Central

    Spiric, Jelena; Engin, Anna M.; Karas, Michael; Reuter, Andreas

    2015-01-01

    Allergy against birch pollen is among the most common causes of spring pollinosis in Europe and is diagnosed and treated using extracts from natural sources. Quality control is crucial for safe and effective diagnosis and treatment. However, current methods are very difficult to standardize and do not address individual allergen or isoallergen composition. MS provides information regarding selected proteins or the entire proteome and could overcome the aforementioned limitations. We studied the proteome of birch pollen, focusing on allergens and isoallergens, to clarify which of the 93 published sequence variants of the major allergen, Bet v 1, are expressed as proteins within one source material in parallel. The unexpectedly complex Bet v 1 isoallergen composition required manual data interpretation and a specific design of databases, as current database search engines fail to unambiguously assign spectra to highly homologous, partially identical proteins. We identified 47 non-allergenic proteins and all 5 known birch pollen allergens, and unambiguously proved the existence of 18 Bet v 1 isoallergens and variants by manual data analysis. This highly complex isoallergen composition raises questions whether isoallergens can be ignored or must be included for the quality control of allergen products, and which data analysis strategies are to be applied. PMID:26561299

  18. DIP: The Database of Interacting Proteins

    DOE Data Explorer

    The DIP Database catalogs experimentally determined interactions between proteins. It combines information from a variety of sources to create a single, consistent set of protein-protein interactions. By interaction, the DIP Database creators mean that two amino acid chains were experimentally identified to bind to each other. The database lists such pairs to aid those studying a particular protein-protein interaction but also those investigating entire regulatory and signaling pathways as well as those studying the organisation and complexity of the protein interaction network at the cellular level. The data stored within the DIP database were curated both manually, by expert curators, and automatically, using computational approaches that utilize the knowledge about the protein-protein interaction networks extracted from the most reliable, core subset of the DIP data. It is a relational database that can be searched by protein, sequence, motif, article information, and pathBLAST. The website also serves as an access point to a number of projects related to DIP, such as LiveDIP, The Database of Ligand-Receptor Partners (DLRP) and JDIP. Users have free and open access to DIP after login. [Taken from the DIP Guide and the DIP website] (Specialized Interface) (Registration Required)

  19. Crowd-sourcing and author submission as alternatives to professional curation.

    PubMed

    Karp, Peter D

    2016-01-01

    Can we decrease the costs of database curation by crowd-sourcing curation work or by offloading curation to publication authors? This perspective considers the significant experience accumulated by the bioinformatics community with these two alternatives to professional curation in the last 20 years; that experience should be carefully considered when formulating new strategies for biological databases. The vast weight of empirical evidence to date suggests that crowd-sourced curation is not a successful model for biological databases. Multiple approaches to crowd-sourced curation have been attempted by multiple groups, and extremely low participation rates by 'the crowd' are the overwhelming outcome. The author-curation model shows more promise for boosting curator efficiency. However, its limitations include that the quality of author-submitted annotations is uncertain, the response rate is low (but significant), and to date author curation has involved relatively simple forms of annotation involving one or a few types of data. Furthermore, shifting curation to authors may simply redistribute costs rather than decreasing costs; author curation may in fact increase costs because of the overhead involved in having every curating author learn what professional curators know: curation conventions, curation software and curation procedures.

  20. Crowd-sourcing and author submission as alternatives to professional curation

    PubMed Central

    Karp, Peter D.

    2016-01-01

    Can we decrease the costs of database curation by crowd-sourcing curation work or by offloading curation to publication authors? This perspective considers the significant experience accumulated by the bioinformatics community with these two alternatives to professional curation in the last 20 years; that experience should be carefully considered when formulating new strategies for biological databases. The vast weight of empirical evidence to date suggests that crowd-sourced curation is not a successful model for biological databases. Multiple approaches to crowd-sourced curation have been attempted by multiple groups, and extremely low participation rates by ‘the crowd’ are the overwhelming outcome. The author-curation model shows more promise for boosting curator efficiency. However, its limitations include that the quality of author-submitted annotations is uncertain, the response rate is low (but significant), and to date author curation has involved relatively simple forms of annotation involving one or a few types of data. Furthermore, shifting curation to authors may simply redistribute costs rather than decreasing costs; author curation may in fact increase costs because of the overhead involved in having every curating author learn what professional curators know: curation conventions, curation software and curation procedures. PMID:28025340

  1. Collaborative biocuration--text-mining development task for document prioritization for curation.

    PubMed

    Wiegers, Thomas C; Davis, Allan Peter; Mattingly, Carolyn J

    2012-01-01

    The Critical Assessment of Information Extraction systems in Biology (BioCreAtIvE) challenge evaluation is a community-wide effort for evaluating text mining and information extraction systems for the biological domain. The 'BioCreative Workshop 2012' subcommittee identified three areas, or tracks, that comprised independent, but complementary aspects of data curation in which they sought community input: literature triage (Track I); curation workflow (Track II) and text mining/natural language processing (NLP) systems (Track III). Track I participants were invited to develop tools or systems that would effectively triage and prioritize articles for curation and present results in a prototype web interface. Training and test datasets were derived from the Comparative Toxicogenomics Database (CTD; http://ctdbase.org) and consisted of manuscripts from which chemical-gene-disease data were manually curated. A total of seven groups participated in Track I. For the triage component, the effectiveness of participant systems was measured by aggregate gene, disease and chemical 'named-entity recognition' (NER) across articles; the effectiveness of 'information retrieval' (IR) was also measured based on 'mean average precision' (MAP). Top recall scores for gene, disease and chemical NER were 49, 65 and 82%, respectively; the top MAP score was 80%. Each participating group also developed a prototype web interface; these interfaces were evaluated based on functionality and ease-of-use by CTD's biocuration project manager. In this article, we present a detailed description of the challenge and a summary of the results.
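The 'mean average precision' used to score the ranked article lists is the standard information-retrieval metric: precision is taken at each rank holding a relevant (curatable) article, averaged within a list, then averaged across lists. A generic sketch of the metric (not the official BioCreative evaluation code):

```python
def average_precision(ranking, relevant):
    """Precision at each rank that holds a relevant document,
    averaged over the total number of relevant documents."""
    hits, precision_sum = 0, 0.0
    for rank, doc in enumerate(ranking, start=1):
        if doc in relevant:
            hits += 1
            precision_sum += hits / rank
    return precision_sum / len(relevant) if relevant else 0.0

def mean_average_precision(runs):
    """MAP across several (ranking, relevant_set) pairs."""
    return sum(average_precision(r, rel) for r, rel in runs) / len(runs)

runs = [
    (["d1", "d2", "d3"], {"d1", "d3"}),  # AP = (1/1 + 2/3) / 2
    (["d2", "d1"], {"d1"}),              # AP = (1/2) / 1
]
print(f"MAP = {mean_average_precision(runs):.3f}")  # → MAP = 0.667
```

Because MAP rewards placing relevant articles near the top of the ranking, it directly measures how much manual triage effort a prioritization system saves curators.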

  2. Marine Curators Gather

    ERIC Educational Resources Information Center

    McCoy, Floyd W.

    1977-01-01

    Reports on a recent meeting of marine curators in which data dissemination, standardization of marine curating techniques and methods, responsibilities of curators, funding problems, and sampling equipment were the main areas of discussion. A listing of the major deep sea sample collections in the United States is also provided. (CP)

  3. NeuroTransDB: highly curated and structured transcriptomic metadata for neurodegenerative diseases

    PubMed Central

    Bagewadi, Shweta; Adhikari, Subash; Dhrangadhariya, Anjani; Irin, Afroza Khanam; Ebeling, Christian; Namasivayam, Aishwarya Alex; Page, Matthew; Hofmann-Apitius, Martin

    2015-01-01

    Neurodegenerative diseases are chronic debilitating conditions, characterized by progressive loss of neurons, that represent a significant health care burden as the global elderly population continues to grow. Over the past decade, high-throughput technologies such as the Affymetrix GeneChip microarrays have provided new perspectives into the pathomechanisms underlying neurodegeneration. Public transcriptomic data repositories, namely Gene Expression Omnibus and ArrayExpress, enable researchers to conduct integrative meta-analyses, increasing the power to detect differentially regulated genes in disease and to explore patterns of gene dysregulation across biologically related studies. The reliability of retrospective, large-scale integrative analyses depends on an appropriate combination of related datasets, which in turn requires detailed meta-annotations capturing the experimental setup. In most cases, we observe huge variation in compliance with defined standards for submitted metadata in public databases. Much of the information needed to complete or refine meta-annotations is distributed across the associated publications. For example, tissue preparation or comorbidity information is frequently described in an article’s supplementary tables. Several value-added databases have employed additional manual efforts to overcome this limitation. However, none of these databases explicates annotations that distinguish human and animal models in the context of neurodegeneration. Therefore, adopting a more specific disease focus, in combination with dedicated disease ontologies, will better empower the selection of comparable studies with refined annotations to address the research question at hand. In this article, we describe the detailed development of NeuroTransDB, a manually curated database containing metadata annotations for neurodegenerative studies. The database contains more than 20 dimensions of metadata annotations within 31 mouse, 5 rat and 45 human studies, defined in

  4. The Arabidopsis Information Resource (TAIR): a model organism database providing a centralized, curated gateway to Arabidopsis biology, research materials and community.

    PubMed

    Rhee, Seung Yon; Beavis, William; Berardini, Tanya Z; Chen, Guanghong; Dixon, David; Doyle, Aisling; Garcia-Hernandez, Margarita; Huala, Eva; Lander, Gabriel; Montoya, Mary; Miller, Neil; Mueller, Lukas A; Mundodi, Suparna; Reiser, Leonore; Tacklind, Julie; Weems, Dan C; Wu, Yihe; Xu, Iris; Yoo, Daniel; Yoon, Jungwon; Zhang, Peifen

    2003-01-01

    Arabidopsis thaliana is the most widely studied plant today. The concerted efforts of over 11 000 researchers and 4000 organizations around the world are generating a rich diversity and quantity of information and materials. This information is made available through a comprehensive on-line resource called the Arabidopsis Information Resource (TAIR) (http://arabidopsis.org), which is accessible via commonly used web browsers and can be searched and downloaded in a number of ways. In the last two years, efforts have been focused on increasing data content and diversity, functionally annotating genes and gene products with controlled vocabularies, and improving data retrieval, analysis and visualization tools. New information includes sequence polymorphisms including alleles, germplasms and phenotypes, Gene Ontology annotations, gene families, protein information, metabolic pathways, gene expression data from microarray experiments, and seed and DNA stocks. New data visualization and analysis tools include SeqViewer, which interactively displays the genome from the whole chromosome down to 10 kb of nucleotide sequence, and AraCyc, a metabolic pathway database and map tool that allows overlaying expression data onto the pathway diagrams. Finally, we have recently incorporated seed and DNA stock information from the Arabidopsis Biological Resource Center (ABRC) and implemented a shopping-cart style on-line ordering system.

  5. Reference sequence (RefSeq) database at NCBI: current status, taxonomic expansion, and functional annotation.

    PubMed

    O'Leary, Nuala A; Wright, Mathew W; Brister, J Rodney; Ciufo, Stacy; Haddad, Diana; McVeigh, Rich; Rajput, Bhanu; Robbertse, Barbara; Smith-White, Brian; Ako-Adjei, Danso; Astashyn, Alexander; Badretdin, Azat; Bao, Yiming; Blinkova, Olga; Brover, Vyacheslav; Chetvernin, Vyacheslav; Choi, Jinna; Cox, Eric; Ermolaeva, Olga; Farrell, Catherine M; Goldfarb, Tamara; Gupta, Tripti; Haft, Daniel; Hatcher, Eneida; Hlavina, Wratko; Joardar, Vinita S; Kodali, Vamsi K; Li, Wenjun; Maglott, Donna; Masterson, Patrick; McGarvey, Kelly M; Murphy, Michael R; O'Neill, Kathleen; Pujar, Shashikant; Rangwala, Sanjida H; Rausch, Daniel; Riddick, Lillian D; Schoch, Conrad; Shkeda, Andrei; Storz, Susan S; Sun, Hanzhen; Thibaud-Nissen, Francoise; Tolstoy, Igor; Tully, Raymond E; Vatsan, Anjana R; Wallin, Craig; Webb, David; Wu, Wendy; Landrum, Melissa J; Kimchi, Avi; Tatusova, Tatiana; DiCuccio, Michael; Kitts, Paul; Murphy, Terence D; Pruitt, Kim D

    2016-01-04

    The RefSeq project at the National Center for Biotechnology Information (NCBI) maintains and curates a publicly available database of annotated genomic, transcript, and protein sequence records (http://www.ncbi.nlm.nih.gov/refseq/). The RefSeq project leverages the data submitted to the International Nucleotide Sequence Database Collaboration (INSDC) against a combination of computation, manual curation, and collaboration to produce a standard set of stable, non-redundant reference sequences. The RefSeq project augments these reference sequences with current knowledge including publications, functional features and informative nomenclature. The database currently represents sequences from more than 55,000 organisms (>4800 viruses, >40,000 prokaryotes and >10,000 eukaryotes; RefSeq release 71), ranging from a single record to complete genomes. This paper summarizes the current status of the viral, prokaryotic, and eukaryotic branches of the RefSeq project, reports on improvements to data access and details efforts to further expand the taxonomic representation of the collection. We also highlight diverse functional curation initiatives that support multiple uses of RefSeq data including taxonomic validation, genome annotation, comparative genomics, and clinical testing. We summarize our approach to utilizing available RNA-Seq and other data types in our manual curation process for vertebrate, plant, and other species, and describe a new direction for prokaryotic genomes and protein name management.

  6. Reference sequence (RefSeq) database at NCBI: current status, taxonomic expansion, and functional annotation

    PubMed Central

    O'Leary, Nuala A.; Wright, Mathew W.; Brister, J. Rodney; Ciufo, Stacy; Haddad, Diana; McVeigh, Rich; Rajput, Bhanu; Robbertse, Barbara; Smith-White, Brian; Ako-Adjei, Danso; Astashyn, Alexander; Badretdin, Azat; Bao, Yiming; Blinkova, Olga; Brover, Vyacheslav; Chetvernin, Vyacheslav; Choi, Jinna; Cox, Eric; Ermolaeva, Olga; Farrell, Catherine M.; Goldfarb, Tamara; Gupta, Tripti; Haft, Daniel; Hatcher, Eneida; Hlavina, Wratko; Joardar, Vinita S.; Kodali, Vamsi K.; Li, Wenjun; Maglott, Donna; Masterson, Patrick; McGarvey, Kelly M.; Murphy, Michael R.; O'Neill, Kathleen; Pujar, Shashikant; Rangwala, Sanjida H.; Rausch, Daniel; Riddick, Lillian D.; Schoch, Conrad; Shkeda, Andrei; Storz, Susan S.; Sun, Hanzhen; Thibaud-Nissen, Francoise; Tolstoy, Igor; Tully, Raymond E.; Vatsan, Anjana R.; Wallin, Craig; Webb, David; Wu, Wendy; Landrum, Melissa J.; Kimchi, Avi; Tatusova, Tatiana; DiCuccio, Michael; Kitts, Paul; Murphy, Terence D.; Pruitt, Kim D.

    2016-01-01

    The RefSeq project at the National Center for Biotechnology Information (NCBI) maintains and curates a publicly available database of annotated genomic, transcript, and protein sequence records (http://www.ncbi.nlm.nih.gov/refseq/). The RefSeq project leverages the data submitted to the International Nucleotide Sequence Database Collaboration (INSDC) against a combination of computation, manual curation, and collaboration to produce a standard set of stable, non-redundant reference sequences. The RefSeq project augments these reference sequences with current knowledge including publications, functional features and informative nomenclature. The database currently represents sequences from more than 55 000 organisms (>4800 viruses, >40 000 prokaryotes and >10 000 eukaryotes; RefSeq release 71), ranging from a single record to complete genomes. This paper summarizes the current status of the viral, prokaryotic, and eukaryotic branches of the RefSeq project, reports on improvements to data access and details efforts to further expand the taxonomic representation of the collection. We also highlight diverse functional curation initiatives that support multiple uses of RefSeq data including taxonomic validation, genome annotation, comparative genomics, and clinical testing. We summarize our approach to utilizing available RNA-Seq and other data types in our manual curation process for vertebrate, plant, and other species, and describe a new direction for prokaryotic genomes and protein name management. PMID:26553804

  7. Curation of Frozen Samples

    NASA Technical Reports Server (NTRS)

    Fletcher, L. A.; Allen, C. C.; Bastien, R.

    2008-01-01

    NASA's Johnson Space Center (JSC) and the Astromaterials Curator are charged by NPD 7100.10D with the curation of all of NASA's extraterrestrial samples, including those from future missions. This responsibility includes the development of new sample handling and preparation techniques; therefore, the Astromaterials Curator must begin developing procedures to preserve, prepare and ship samples at sub-freezing temperatures in order to enable future sample return missions. Such missions might include the return of future frozen samples from permanently-shadowed lunar craters, the nuclei of comets, the surface of Mars, etc. We are demonstrating the ability to curate samples under cold conditions by designing, installing and testing a cold curation glovebox. This glovebox will allow us to store, document, manipulate and subdivide frozen samples while quantifying and minimizing contamination throughout the curation process.

  8. The Immune Epitope Database 2.0

    PubMed Central

    Vita, Randi; Zarebski, Laura; Greenbaum, Jason A.; Emami, Hussein; Hoof, Ilka; Salimi, Nima; Damle, Rohini; Sette, Alessandro; Peters, Bjoern

    2010-01-01

    The Immune Epitope Database (IEDB, www.iedb.org) provides a catalog of experimentally characterized B and T cell epitopes, as well as data on Major Histocompatibility Complex (MHC) binding and MHC ligand elution experiments. The database represents the molecular structures recognized by adaptive immune receptors and the experimental contexts in which these molecules were determined to be immune epitopes. Epitopes recognized in humans, nonhuman primates, rodents, pigs, cats and all other tested species are included. Both positive and negative experimental results are captured. Over the course of 4 years, the data from 180 978 experiments were curated manually from the literature, which covers ∼99% of all publicly available information on peptide epitopes mapped in infectious agents (excluding HIV) and 93% of those mapped in allergens. In addition, data that would otherwise be unavailable to the public from 129 186 experiments were submitted directly by investigators. The curation of epitopes related to autoimmunity is expected to be completed by the end of 2010. The database can be queried by epitope structure, source organism, MHC restriction, assay type or host organism, among other criteria. The database structure, as well as its querying, browsing and reporting interfaces, was completely redesigned for the IEDB 2.0 release, which became publicly available in early 2009. PMID:19906713

  9. Toward an interactive article: integrating journals and biological databases

    PubMed Central

    2011-01-01

    Background Journal articles and databases are two major modes of communication in the biological sciences, and thus integrating these critical resources is of urgent importance to increase the pace of discovery. Projects focused on bridging the gap between journals and databases have been on the rise over the last five years and have resulted in the development of automated tools that can recognize entities within a document and link those entities to a relevant database. Unfortunately, automated tools cannot resolve ambiguities that arise from one term being used to signify entities that are quite distinct from one another. Instead, resolving these ambiguities requires some manual oversight. Finding the right balance between the speed and portability of automation and the accuracy and flexibility of manual effort is a crucial goal to making text markup a successful venture. Results We have established a journal article mark-up pipeline that links GENETICS journal articles and the model organism database (MOD) WormBase. This pipeline uses a lexicon built with entities from the database as a first step. The entity markup pipeline results in links from over nine classes of objects including genes, proteins, alleles, phenotypes and anatomical terms. New entities and ambiguities are discovered and resolved by a database curator through a manual quality control (QC) step, along with help from authors via a web form that is provided to them by the journal. New entities discovered through this pipeline are immediately sent to an appropriate curator at the database. Ambiguous entities that do not automatically resolve to one link are resolved by hand ensuring an accurate link. This pipeline has been extended to other databases, namely Saccharomyces Genome Database (SGD) and FlyBase, and has been implemented in marking up a paper with links to multiple databases. Conclusions Our semi-automated pipeline hyperlinks articles published in GENETICS to model organism databases
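    The lexicon-first markup step described above, with unambiguous terms auto-linked and ambiguous terms routed to a curator, might look roughly like this. The gene names and database identifiers below are hypothetical placeholders, not actual WormBase records, and a production pipeline would cover all nine entity classes.

    ```python
    import re

    # Toy lexicon built from database entities; IDs are invented for illustration.
    LEXICON = {
        "unc-22": ["WBGene-A"],
        "dpy-10": ["WBGene-B"],
        "abc-1": ["WBGene-C", "FBgn-D"],  # same name in two databases: ambiguous
    }

    def mark_up(text):
        """Scan an article for lexicon terms; return auto-linkable matches and
        ambiguous matches that need manual QC by a curator."""
        links, ambiguous = [], []
        for term, ids in LEXICON.items():
            for m in re.finditer(re.escape(term), text):
                (links if len(ids) == 1 else ambiguous).append((term, m.start(), ids))
        return links, ambiguous
    ```

    In the real pipeline the ambiguous list is what reaches the curator's QC step, while unambiguous matches become hyperlinks directly.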

  10. HypoxiaDB: a database of hypoxia-regulated proteins

    PubMed Central

    Khurana, Pankaj; Sugadev, Ragumani; Jain, Jaspreet; Singh, Shashi Bala

    2013-01-01

    There has been intense interest in the cellular response to hypoxia, and a large number of differentially expressed proteins have been identified through various high-throughput experiments. These valuable data are scattered, and there have been no systematic attempts to document the various proteins regulated by hypoxia. Compilation, curation and annotation of these data are important in deciphering their role in hypoxia and hypoxia-related disorders. Therefore, we have compiled HypoxiaDB, a database of hypoxia-regulated proteins. It is a comprehensive, manually curated, non-redundant catalog of proteins whose expressions are shown experimentally to be altered at different levels and durations of hypoxia. The database currently contains 72 000 manually curated entries on 3500 proteins extracted from 73 peer-reviewed publications selected from PubMed. HypoxiaDB is distinctive from other generalized databases: (i) it compiles tissue-specific protein expression changes under different levels and durations of hypoxia. Also, it provides manually curated literature references to support the inclusion of the protein in the database and establish its association with hypoxia. (ii) For each protein, HypoxiaDB integrates data on gene ontology, KEGG (Kyoto Encyclopedia of Genes and Genomes) pathway, protein–protein interactions, protein family (Pfam), OMIM (Online Mendelian Inheritance in Man), PDB (Protein Data Bank) structures and homology to other sequenced genomes. (iii) It also provides pre-compiled information on hypoxia-proteins, which otherwise requires tedious computational analysis. This includes information like chromosomal location, identifiers like Entrez, HGNC, Unigene, Uniprot, Ensembl, Vega, GI numbers and Genbank accession numbers associated with the protein. These are further cross-linked to the respective public databases, connecting HypoxiaDB to external repositories. (iv) In addition, HypoxiaDB provides an online sequence-similarity search tool for

  11. DBAASP: database of antimicrobial activity and structure of peptides.

    PubMed

    Gogoladze, Giorgi; Grigolava, Maia; Vishnepolsky, Boris; Chubinidze, Mindia; Duroux, Patrice; Lefranc, Marie-Paule; Pirtskhalava, Malak

    2014-08-01

    The Database of Antimicrobial Activity and Structure of Peptides (DBAASP) is a manually curated database for those peptides for which antimicrobial activity against particular targets has been evaluated experimentally. The database is a depository of complete information on: the chemical structure of peptides; target species; target object of cell; peptide antimicrobial/haemolytic/cytotoxic activities; and experimental conditions at which activities were estimated. The DBAASP search page allows the user to search peptides according to their structural characteristics, complexity type (monomer, dimer and two-peptide), source, synthesis type (ribosomal, nonribosomal and synthetic) and target species. The database prediction algorithm provides a tool for rational design of new antimicrobial peptides. DBAASP is accessible at http://www.biomedicine.org.ge/dbaasp/.

  12. Can we replace curation with information extraction software?

    PubMed

    Karp, Peter D

    2016-01-01

    Can we use programs for automated or semi-automated information extraction from scientific texts as practical alternatives to professional curation? I show that error rates of current information extraction programs are too high to replace professional curation today. Furthermore, current IEP programs extract single narrow slivers of information, such as individual protein interactions; they cannot extract the large breadth of information extracted by professional curators for databases such as EcoCyc. They also cannot arbitrate among conflicting statements in the literature as curators can. Therefore, funding agencies should not hobble the curation efforts of existing databases on the assumption that a problem that has stymied Artificial Intelligence researchers for more than 60 years will be solved tomorrow. Semi-automated extraction techniques appear to have significantly more potential based on a review of recent tools that enhance curator productivity. But a full cost-benefit analysis for these tools is lacking. Without such analysis it is possible to expend significant effort developing information-extraction tools that automate small parts of the overall curation workflow without achieving a significant decrease in curation costs.

  13. Can we replace curation with information extraction software?

    PubMed Central

    Karp, Peter D.

    2016-01-01

    Can we use programs for automated or semi-automated information extraction from scientific texts as practical alternatives to professional curation? I show that error rates of current information extraction programs are too high to replace professional curation today. Furthermore, current IEP programs extract single narrow slivers of information, such as individual protein interactions; they cannot extract the large breadth of information extracted by professional curators for databases such as EcoCyc. They also cannot arbitrate among conflicting statements in the literature as curators can. Therefore, funding agencies should not hobble the curation efforts of existing databases on the assumption that a problem that has stymied Artificial Intelligence researchers for more than 60 years will be solved tomorrow. Semi-automated extraction techniques appear to have significantly more potential based on a review of recent tools that enhance curator productivity. But a full cost-benefit analysis for these tools is lacking. Without such analysis it is possible to expend significant effort developing information-extraction tools that automate small parts of the overall curation workflow without achieving a significant decrease in curation costs. PMID:28025341

  14. Mining clinical attributes of genomic variants through assisted literature curation in Egas.

    PubMed

    Matos, Sérgio; Campos, David; Pinho, Renato; Silva, Raquel M; Mort, Matthew; Cooper, David N; Oliveira, José Luís

    2016-01-01

    The veritable deluge of biological data over recent years has led to the establishment of a considerable number of knowledge resources that compile curated information extracted from the literature and store it in structured form, facilitating its use and exploitation. In this article, we focus on the curation of inherited genetic variants and associated clinical attributes, such as zygosity, penetrance or inheritance mode, and describe the use of Egas for this task. Egas is a web-based platform for text-mining assisted literature curation that focuses on usability through modern design solutions and simple user interactions. Egas offers a flexible and customizable tool that allows defining the concept types and relations of interest for a given annotation task, as well as the ontologies used for normalizing each concept type. Further, annotations may be performed on raw documents or on the results of automated concept identification and relation extraction tools. Users can inspect, correct or remove automatic text-mining results, manually add new annotations, and export the results to standard formats. Egas is compatible with the most recent versions of Google Chrome, Mozilla Firefox, Internet Explorer and Safari and is available for use at https://demo.bmd-software.com/egas/. Database URL: https://demo.bmd-software.com/egas/

  15. Mining clinical attributes of genomic variants through assisted literature curation in Egas

    PubMed Central

    Matos, Sérgio; Campos, David; Pinho, Renato; Silva, Raquel M.; Mort, Matthew; Cooper, David N.; Oliveira, José Luís

    2016-01-01

    The veritable deluge of biological data over recent years has led to the establishment of a considerable number of knowledge resources that compile curated information extracted from the literature and store it in structured form, facilitating its use and exploitation. In this article, we focus on the curation of inherited genetic variants and associated clinical attributes, such as zygosity, penetrance or inheritance mode, and describe the use of Egas for this task. Egas is a web-based platform for text-mining assisted literature curation that focuses on usability through modern design solutions and simple user interactions. Egas offers a flexible and customizable tool that allows defining the concept types and relations of interest for a given annotation task, as well as the ontologies used for normalizing each concept type. Further, annotations may be performed on raw documents or on the results of automated concept identification and relation extraction tools. Users can inspect, correct or remove automatic text-mining results, manually add new annotations, and export the results to standard formats. Egas is compatible with the most recent versions of Google Chrome, Mozilla Firefox, Internet Explorer and Safari and is available for use at https://demo.bmd-software.com/egas/. Database URL: https://demo.bmd-software.com/egas/ PMID:27278817

  16. HEDD: the human epigenetic drug database

    PubMed Central

    Qi, Yunfeng; Wang, Dadong; Wang, Daying; Jin, Taicheng; Yang, Liping; Wu, Hui; Li, Yaoyao; Zhao, Jing; Du, Fengping; Song, Mingxia; Wang, Renjun

    2016-01-01

    Epigenetic drugs are chemical compounds that target disordered post-translational modification of histone proteins and DNA through enzymes, and the recognition of these changes by adaptor proteins. Epigenetic drug-related experimental data, such as gene expression probed by high-throughput sequencing, co-crystal structures probed by X-ray diffraction and binding constants probed by bioassay, have become widely available. The mining and integration of multiple kinds of data can benefit drug discovery and drug repurposing. HEMD and other epigenetic databases comprehensively store epigenetic data from which users can acquire partial information about epigenetic drugs. However, some data types, such as high-throughput datasets, are not provided by these databases, and they do not support flexible queries for epigenetic drug-related experimental data. Therefore, with reference to HEMD and other epigenetic databases, we developed a relatively comprehensive database for human epigenetic drugs. The human epigenetic drug database (HEDD) focuses on the storage and integration of epigenetic drug datasets obtained from laboratory experiments and manually curated information. The latest release of HEDD incorporates five kinds of datasets: (i) drug, (ii) target, (iii) disease, (iv) high-throughput and (v) complex. In order to facilitate data extraction, flexible search options were built into HEDD, allowing unrestricted conditional queries over specific kinds of datasets using drug names, diseases and experiment types. Database URL: http://hedds.org/ PMID:28025347
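    A flexible multi-condition search of this kind, combining whichever of drug name, disease and experiment type the user supplies, can be sketched with an in-memory SQLite table. The schema, column names and example rows below are assumptions for illustration, not HEDD's actual data model.

    ```python
    import sqlite3

    # Minimal in-memory table loosely modeled on HEDD's dataset kinds.
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE experiment (drug TEXT, disease TEXT, exp_type TEXT)")
    con.executemany("INSERT INTO experiment VALUES (?, ?, ?)", [
        ("vorinostat", "lymphoma", "high-throughput"),
        ("azacitidine", "leukemia", "binding-assay"),
    ])

    def query(drug=None, disease=None, exp_type=None):
        """Combine only the conditions the user supplies into one WHERE clause,
        mirroring an 'unrestricted conditional query' interface."""
        clauses, params = [], []
        for col, val in (("drug", drug), ("disease", disease), ("exp_type", exp_type)):
            if val is not None:
                clauses.append(f"{col} = ?")
                params.append(val)
        where = " WHERE " + " AND ".join(clauses) if clauses else ""
        return con.execute("SELECT * FROM experiment" + where, params).fetchall()
    ```

    Parameterized placeholders (`?`) keep user-supplied drug and disease names from being interpreted as SQL, which matters for any web-facing search form.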

  17. The Protein-DNA Interface database

    PubMed Central

    2010-01-01

    The Protein-DNA Interface database (PDIdb) is a repository containing relevant structural information of Protein-DNA complexes solved by X-ray crystallography and available at the Protein Data Bank. The database includes a simple functional classification of the protein-DNA complexes that consists of three hierarchical levels: Class, Type and Subtype. This classification has been defined and manually curated by humans based on the information gathered from several sources that include PDB, PubMed, CATH, SCOP and COPS. The current version of the database contains only structures with resolution of 2.5 Å or higher, accounting for a total of 922 entries. The major aim of this database is to contribute to the understanding of the main rules that underlie the molecular recognition process between DNA and proteins. To this end, the database is focused on each specific atomic interface rather than on the separated binding partners. Therefore, each entry in this database consists of a single and independent protein-DNA interface. We hope that PDIdb will be useful to many researchers working in fields such as the prediction of transcription factor binding sites in DNA, the study of specificity determinants that mediate enzyme recognition events, engineering and design of new DNA binding proteins with distinct binding specificity and affinity, among others. Finally, due to its friendly and easy-to-use web interface, we hope that PDIdb will also serve educational and teaching purposes. PMID:20482798

  18. CMR Metadata Curation

    NASA Technical Reports Server (NTRS)

    Shum, Dana; Bugbee, Kaylin

    2017-01-01

    This talk explains the ongoing metadata curation activities in the Common Metadata Repository. It explores tools that exist today which are useful for building quality metadata and also opens up the floor for discussions on other potentially useful tools.

  19. Supervised Learning for Detection of Duplicates in Genomic Sequence Databases

    PubMed Central

    Zobel, Justin; Zhang, Xiuzhen; Verspoor, Karin

    2016-01-01

    Motivation: First identified as an issue in 1996, duplication in biological databases introduces redundancy and even leads to inconsistency when contradictory information appears. The amount of data makes purely manual de-duplication impractical, and existing automatic systems cannot detect duplicates as precisely as can experts. Supervised learning has the potential to address such problems by building automatic systems that learn from expert curation to detect duplicates precisely and efficiently. While machine learning is a mature approach in other duplicate detection contexts, it has seen only preliminary application in genomic sequence databases. Results: We developed and evaluated a supervised duplicate detection method based on an expert curated dataset of duplicates, containing over one million pairs across five organisms derived from genomic sequence databases. We selected 22 features to represent distinct attributes of the database records, and developed a binary model and a multi-class model. Both models achieve promising performance; under cross-validation, the binary model had over 90% accuracy in each of the five organisms, while the multi-class model maintains high accuracy and is more robust in generalisation. We performed an ablation study to quantify the impact of different sequence record features, finding that features derived from meta-data, sequence identity, and alignment quality impact performance most strongly. The study demonstrates machine learning can be an effective additional tool for de-duplication of genomic sequence databases. All data are available as described in the supplementary material. PMID:27489953
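    A minimal sketch of the record-pair classification idea, assuming a toy record schema ('description', 'sequence') and a hand-set linear scorer in place of the trained binary model; the paper's 22 features and learned weights are replaced here by three illustrative features.

    ```python
    from difflib import SequenceMatcher

    def pair_features(rec_a, rec_b):
        """Features for a candidate duplicate pair of sequence records:
        sequence similarity, description similarity and length ratio."""
        seq_sim = SequenceMatcher(None, rec_a["sequence"], rec_b["sequence"]).ratio()
        desc_sim = SequenceMatcher(None, rec_a["description"].lower(),
                                   rec_b["description"].lower()).ratio()
        length_ratio = (min(len(rec_a["sequence"]), len(rec_b["sequence"]))
                        / max(len(rec_a["sequence"]), len(rec_b["sequence"])))
        return [seq_sim, desc_sim, length_ratio]

    def is_duplicate(features, weights=(0.6, 0.2, 0.2), threshold=0.9):
        """Stand-in linear scorer; a real system learns weights (or a full
        classifier) from the expert-curated duplicate pairs."""
        return sum(w * f for w, f in zip(weights, features)) >= threshold
    ```

    The point of the supervised setup is precisely that these weights are fitted to expert-labeled pairs rather than hand-tuned, which is what lets the method approach curator-level precision.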

  20. The Structure-Function Linkage Database.

    PubMed

    Akiva, Eyal; Brown, Shoshana; Almonacid, Daniel E; Barber, Alan E; Custer, Ashley F; Hicks, Michael A; Huang, Conrad C; Lauck, Florian; Mashiyama, Susan T; Meng, Elaine C; Mischel, David; Morris, John H; Ojha, Sunil; Schnoes, Alexandra M; Stryke, Doug; Yunes, Jeffrey M; Ferrin, Thomas E; Holliday, Gemma L; Babbitt, Patricia C

    2014-01-01

    The Structure-Function Linkage Database (SFLD, http://sfld.rbvi.ucsf.edu/) is a manually curated classification resource describing structure-function relationships for functionally diverse enzyme superfamilies. Members of such superfamilies are diverse in their overall reactions yet share a common ancestor and some conserved active site features associated with conserved functional attributes such as a partial reaction. Thus, despite their different functions, members of these superfamilies 'look alike', making them easy to misannotate. To address this complexity and enable rational transfer of functional features to unknowns only for those members for which we have sufficient functional information, we subdivide superfamily members into subgroups using sequence information, and lastly into families, sets of enzymes known to catalyze the same reaction using the same mechanistic strategy. Browsing and searching options in the SFLD provide access to all of these levels. The SFLD offers manually curated as well as automatically classified superfamily sets, both accompanied by search and download options for all hierarchical levels. Additional information includes multiple sequence alignments, tab-separated files of functional and other attributes, and sequence similarity networks. The latter provide a new and intuitively powerful way to visualize functional trends mapped to the context of sequence similarity.

  1. DDRprot: a database of DNA damage response-related proteins

    PubMed Central

    Andrés-León, Eduardo; Cases, Ildefonso; Arcas, Aida; Rojas, Ana M.

    2016-01-01

    The DNA Damage Response (DDR) signalling network is an essential system that protects the genome’s integrity. The DDRprot database presented here is a resource that integrates manually curated information on the human DDR network and its sub-pathways. For each DDR protein, we present detailed information about its function. For proteins that participate in post-translational modifications (PTMs) of each other, we depict the position of the modified residue(s) in the three-dimensional structure, when resolved structures are available. All this information is linked to the original publication from which it was obtained. Phylogenetic information is also shown, including time of emergence and conservation across 47 selected species, family trees and sequence alignments of homologues. The DDRprot database can be queried by different criteria: pathway, species, evolutionary age or involvement in PTMs. Sequence searches using hidden Markov models are also supported. Database URL: http://ddr.cbbio.es. PMID:27577567

  2. dictyBase, the model organism database for Dictyostelium discoideum.

    PubMed

    Chisholm, Rex L; Gaudet, Pascale; Just, Eric M; Pilcher, Karen E; Fey, Petra; Merchant, Sohel N; Kibbe, Warren A

    2006-01-01

    dictyBase (http://dictybase.org) is the model organism database (MOD) for the social amoeba Dictyostelium discoideum. The unique biology and phylogenetic position of Dictyostelium offer a great opportunity to gain knowledge of processes not characterized in other organisms. The recent completion of the 34 Mb genome sequence, together with the sizable scientific literature using Dictyostelium as a research organism, provided the necessary tools to create a well-annotated genome. dictyBase has leveraged software developed by the Saccharomyces Genome Database and the Generic Model Organism Database project. This has reduced the time required to develop a full-featured MOD and greatly facilitated our ability to focus on annotation and providing new functionality. We hope that manual curation of the Dictyostelium genome will facilitate the annotation of other genomes.

  3. The National NeuroAIDS Tissue Consortium (NNTC) Database: an integrated database for HIV-related studies.

    PubMed

    Cserhati, Matyas F; Pandey, Sanjit; Beaudoin, James J; Baccaglini, Lorena; Guda, Chittibabu; Fox, Howard S

    2015-01-01

    We herein present the National NeuroAIDS Tissue Consortium-Data Coordinating Center (NNTC-DCC) database, which is the only available database for neuroAIDS studies that contains data in an integrated, standardized form. This database has been created in conjunction with the NNTC, which provides human tissue and biofluid samples to individual researchers to conduct studies focused on neuroAIDS. The database contains experimental datasets from 1206 subjects for the following categories (which are further broken down into subcategories): gene expression, genotype, proteins, endo-exo-chemicals, morphometrics and other (miscellaneous) data. The database also contains a wide variety of downloadable data and metadata for 95 HIV-related studies covering 170 assays from 61 principal investigators. The data represent 76 tissue types, 25 measurement types and 38 technology types, reaching a total of 33,017,407 data points. We used the ISA platform to create the database and to develop a searchable web interface for querying the data. A gene search tool is also available, which searches for NCBI GEO datasets associated with selected genes. The database is manually curated with many user-friendly features, and is cross-linked to the NCBI, HUGO and PubMed databases. Free registration is required for qualified users to access the database.

  4. The Princeton Protein Orthology Database (P-POD): A Comparative Genomics Analysis Tool for Biologists

    PubMed Central

    Kang, Fan; Angiuoli, Samuel V.; White, Owen; Botstein, David; Dolinski, Kara

    2007-01-01

    Many biological databases that provide comparative genomics information and tools are now available on the internet. While certainly quite useful, to our knowledge none of the existing databases combine results from multiple comparative genomics methods with manually curated information from the literature. Here we describe the Princeton Protein Orthology Database (P-POD, http://ortholog.princeton.edu), a user-friendly database system that allows users to find and visualize the phylogenetic relationships among predicted orthologs (based on the OrthoMCL method) to a query gene from any of eight eukaryotic organisms, and to see the orthologs in a wider evolutionary context (based on the Jaccard clustering method). In addition to the phylogenetic information, the database contains experimental results manually collected from the literature that can be compared to the computational analyses, as well as links to relevant human disease and gene information via the OMIM, model organism, and sequence databases. Our aim is for the P-POD resource to be extremely useful to typical experimental biologists wanting to learn more about the evolutionary context of their favorite genes. P-POD is based on the commonly used Generic Model Organism Database (GMOD) schema and can be downloaded in its entirety for installation on one's own system. Thus, bioinformaticians and software developers may also find P-POD useful because they can use the P-POD database infrastructure when developing their own comparative genomics resources and database tools. PMID:17712414

  5. Genome databases

    SciTech Connect

    Courteau, J.

    1991-10-11

    Since the Genome Project began several years ago, a plethora of databases have been developed or are in the works. They range from the massive Genome Data Base at Johns Hopkins University, the central repository of all gene mapping information, to small databases focusing on single chromosomes or organisms. Some are publicly available, others are essentially private electronic lab notebooks. Still others limit access to a consortium of researchers working on, say, a single human chromosome. An increasing number incorporate sophisticated search and analytical software, while others operate as little more than data lists. In consultation with numerous experts in the field, a list has been compiled of some key genome-related databases. The list was not limited to map and sequence databases but also included the tools investigators use to interpret and elucidate genetic data, such as protein sequence and protein structure databases. Because a major goal of the Genome Project is to map and sequence the genomes of several experimental animals, including E. coli, yeast, fruit fly, nematode, and mouse, the available databases for those organisms are listed as well. The author also includes several databases that are still under development - including some ambitious efforts that go beyond data compilation to create what are being called electronic research communities, enabling many users, rather than just one or a few curators, to add or edit the data and tag it as raw or confirmed.

  6. Advanced Curation Preparation for Mars Sample Return and Cold Curation

    NASA Technical Reports Server (NTRS)

    Fries, M. D.; Harrington, A. D.; McCubbin, F. M.; Mitchell, J.; Regberg, A. B.; Snead, C.

    2017-01-01

    NASA Curation is tasked with the care and distribution of NASA's sample collections, such as the Apollo lunar samples and cometary material collected by the Stardust spacecraft. Curation is also mandated to perform Advanced Curation research and development, which includes improving the curation of existing collections as well as preparing for future sample return missions. Advanced Curation has identified a suite of technologies and techniques that will require attention ahead of Mars sample return (MSR) and missions with cold curation (CCur) requirements, perhaps including comet sample return missions.

  7. Public chemical compound databases.

    PubMed

    Williams, Anthony J

    2008-05-01

    The internet has rapidly become the first port of call for all information searches. The increasing array of chemistry-related resources that are now available provides chemists with a direct path to the information that was previously accessed via library services and was limited by commercial and costly resources. The diversity of the information that can be accessed online is expanding at a dramatic rate, and the support for publicly available resources offers significant opportunities in terms of the benefits to science and society. While the data online do not generally meet the quality standards of manually curated sources, there are efforts underway to gather scientists together and 'crowdsource' an improvement in the quality of the available data. This review discusses the types of public compound databases that are available online and provides a series of examples. Focus is also given to the benefits and disruptions associated with the increased availability of such data and the integration of technologies to data mine this information.

  8. MicroScope--an integrated microbial resource for the curation and comparative analysis of genomic and metabolic data.

    PubMed

    Vallenet, David; Belda, Eugeni; Calteau, Alexandra; Cruveiller, Stéphane; Engelen, Stefan; Lajus, Aurélie; Le Fèvre, François; Longin, Cyrille; Mornico, Damien; Roche, David; Rouy, Zoé; Salvignol, Gregory; Scarpelli, Claude; Thil Smith, Adam Alexander; Weiman, Marion; Médigue, Claudine

    2013-01-01

    MicroScope is an integrated platform dedicated both to the methodical updating of microbial genome annotation and to comparative analysis. The resource provides data from completed and ongoing genome projects (automatic and expert annotations), together with data sources from post-genomic experiments (i.e. transcriptomics, mutant collections), allowing users to refine and improve the understanding of gene functions. MicroScope (http://www.genoscope.cns.fr/agc/microscope) combines tools and graphical interfaces to analyse genomes and to perform the manual curation of gene annotations in a comparative context. Since its first publication in January 2006, the system (previously named MaGe for Magnifying Genomes) has been continuously extended both in terms of data content and analysis tools. The last update of MicroScope was published in 2009 in the Database journal. Today, the resource contains data for >1600 microbial genomes, of which ∼300 are manually curated and maintained by biologists (1200 personal accounts today). Expert annotations are continuously gathered in the MicroScope database (∼50 000 a year), contributing to the improvement of the quality of microbial genome annotations. Improved data browsing and searching tools have been added, original tools useful in the context of expert annotation have been developed and integrated, and the website has been significantly redesigned to be more user-friendly. Furthermore, in the context of the European project Microme (Framework Program 7 Collaborative Project), MicroScope is becoming a resource providing for the curation and analysis of both genomic and metabolic data. An increasing number of projects are related to the study of environmental bacterial (meta)genomes that are able to metabolize a large variety of chemical compounds that may be of high industrial interest.

  9. The Comparative Toxicogenomics Database's 10th year anniversary: update 2015

    PubMed Central

    Davis, Allan Peter; Grondin, Cynthia J.; Lennon-Hopkins, Kelley; Saraceni-Richards, Cynthia; Sciaky, Daniela; King, Benjamin L.; Wiegers, Thomas C.; Mattingly, Carolyn J.

    2015-01-01

    Ten years ago, the Comparative Toxicogenomics Database (CTD; http://ctdbase.org/) was developed out of a need to formalize, harmonize and centralize the information on numerous genes and proteins responding to environmental toxic agents across diverse species. CTD's initial approach was to facilitate comparisons of nucleotide and protein sequences of toxicologically significant genes by curating these sequences and electronically annotating them with chemical terms from their associated references. Since then, however, CTD has vastly expanded its scope to robustly represent a triad of chemical–gene, chemical–disease and gene–disease interactions that are manually curated from the scientific literature by professional biocurators using controlled vocabularies, ontologies and structured notation. Today, CTD includes 24 million toxicogenomic connections relating chemicals/drugs, genes/proteins, diseases, taxa, phenotypes, Gene Ontology annotations, pathways and interaction modules. In this 10th year anniversary update, we outline the evolution of CTD, including our increased data content, new ‘Pathway View’ visualization tool, enhanced curation practices, pilot chemical–phenotype results and impending exposure data set. The prototype database originally described in our first report has transformed into a sophisticated resource used actively today to help scientists develop and test hypotheses about the etiologies of environmentally influenced diseases. PMID:25326323

  10. VHLdb: A database of von Hippel-Lindau protein interactors and mutations

    PubMed Central

    Tabaro, Francesco; Minervini, Giovanni; Sundus, Faiza; Quaglia, Federica; Leonardi, Emanuela; Piovesan, Damiano; Tosatto, Silvio C. E.

    2016-01-01

    Mutations in the von Hippel-Lindau tumor suppressor protein (pVHL) predispose carriers to tumors affecting specific target organs, such as the retina, epididymis, adrenal glands, pancreas and kidneys. Currently, more than 400 pVHL-interacting proteins are either described in the literature or predicted in public databases. These data are scattered among several different sources, slowing comprehension of pVHL’s biological role. Here we present VHLdb, a novel database collecting available interaction and mutation data on pVHL to provide novel integrated annotations. In VHLdb, pVHL interactors are organized according to two annotation levels, manual and automatic. Mutation data are easily accessible and a novel visualization tool has been implemented. A user-friendly feedback function to improve database content through community-driven curation is also provided. VHLdb presently contains 478 interactors, of which 117 have been manually curated, and 1,074 mutations. This makes it the largest available database for pVHL-related information. VHLdb is available at http://vhldb.bio.unipd.it/. PMID:27511743

  11. Curating NASA's Past, Present, and Future Astromaterial Sample Collections

    NASA Technical Reports Server (NTRS)

    Zeigler, R. A.; Allton, J. H.; Evans, C. A.; Fries, M. D.; McCubbin, F. M.; Nakamura-Messenger, K.; Righter, K.; Zolensky, M.; Stansbery, E. K.

    2016-01-01

    The Astromaterials Acquisition and Curation Office at NASA Johnson Space Center (hereafter JSC curation) is responsible for curating all of NASA's extraterrestrial samples. JSC presently curates 9 different astromaterials collections in seven different clean-room suites: (1) Apollo Samples (ISO (International Standards Organization) class 6 + 7); (2) Antarctic Meteorites (ISO 6 + 7); (3) Cosmic Dust Particles (ISO 5); (4) Microparticle Impact Collection (ISO 7; formerly called Space-Exposed Hardware); (5) Genesis Solar Wind Atoms (ISO 4); (6) Stardust Comet Particles (ISO 5); (7) Stardust Interstellar Particles (ISO 5); (8) Hayabusa Asteroid Particles (ISO 5); (9) OSIRIS-REx Spacecraft Coupons and Witness Plates (ISO 7). Additional cleanrooms are currently being planned to house samples from two new collections, Hayabusa 2 (2021) and OSIRIS-REx (2023). In addition to the labs that house the samples, we maintain a wide variety of infrastructure facilities required to support the clean rooms: HEPA-filtered air-handling systems, ultrapure dry gaseous nitrogen systems, an ultrapure water system, and cleaning facilities to provide clean tools and equipment for the labs. We also have sample preparation facilities for making thin sections, microtome sections, and even focused ion-beam sections. We routinely monitor the cleanliness of our clean rooms and infrastructure systems, including measurements of inorganic or organic contamination, weekly airborne particle counts, compositional and isotopic monitoring of liquid N2 deliveries, and daily UPW system monitoring. In addition to the physical maintenance of the samples, we track within our databases the current and ever-changing characteristics (weight, location, etc.) of more than 250,000 individually numbered samples across our various collections, as well as more than 100,000 images, and countless "analog" records that document the processing history of each individual sample.
JSC Curation is co-located with JSC

  12. FLOR-ID: an interactive database of flowering-time gene networks in Arabidopsis thaliana.

    PubMed

    Bouché, Frédéric; Lobet, Guillaume; Tocquin, Pierre; Périlleux, Claire

    2016-01-04

    Flowering is a hot topic in Plant Biology, and important progress has been made in Arabidopsis thaliana toward unraveling the genetic networks involved. The increasing complexity of these networks and the explosion of literature, however, require new tools for information management and updating. We therefore created an evolving, interactive database of flowering-time genes, named FLOR-ID (Flowering-Interactive Database), which is freely accessible at http://www.flor-id.org. The hand-curated database contains information on 306 genes and links to 1595 publications gathering the work of >4500 authors. Gene/protein functions and interactions within the flowering pathways were inferred from the analysis of related publications, included in the database and translated into interactive, manually drawn snapshots.

  13. Putting The Plant Metabolic Network pathway databases to work: going offline to gain new capabilities.

    PubMed

    Dreher, Kate

    2014-01-01

    Metabolic databases such as The Plant Metabolic Network/MetaCyc and KEGG PATHWAY are publicly accessible resources providing organism-specific information on reactions and metabolites. KEGG PATHWAY depicts metabolic networks as wired, electronic circuit-like maps, whereas the MetaCyc family of databases uses a canonical textbook-like representation. The first MetaCyc-based database for a plant species was AraCyc, which describes metabolism in the model plant Arabidopsis. This database was created over 10 years ago and has since then undergone extensive manual curation to reflect updated information on enzymes and pathways in Arabidopsis. This chapter describes accessing and using AraCyc and its underlying Pathway Tools software. Specifically, methods for (1) navigating Pathway Tools, (2) visualizing omics data and superimposing the data on a metabolic pathway map, and (3) creating pathways and pathway components are discussed.

  14. BGDB: a database of bivalent genes.

    PubMed

    Li, Qingyan; Lian, Shuabin; Dai, Zhiming; Xiang, Qian; Dai, Xianhua

    2013-01-01

    A bivalent gene is a gene marked with both H3K4me3 and H3K27me3 epigenetic modifications in the same region, and is proposed to play a pivotal role in pluripotency in embryonic stem (ES) cells. Identification of these bivalent genes and understanding their functions are important for further research on lineage specification and embryo development. So far, large amounts of genome-wide histone modification data have been generated in mouse and human ES cells. These valuable data make it possible to identify bivalent genes, but no comprehensive data repository or analysis tool for bivalent genes is currently available. In this work, we developed BGDB, a database of bivalent genes. The database contains 6897 bivalent genes in human and mouse ES cells, which were manually collected from the scientific literature. Each entry contains curated information, including genomic context, sequences, gene ontology and other relevant information. The web services of the BGDB database were implemented with PHP + MySQL + JavaScript and provide diverse query functions. Database URL: http://dailab.sysu.edu.cn/bgdb/
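    The defining criterion above, both marks over the same region, reduces to an interval-overlap test. The following minimal sketch shows one way such a call could be made from peak lists; the gene names, coordinates and the promoter-based definition are illustrative assumptions, not BGDB's actual pipeline.

```python
def overlaps(region, peaks):
    """True if the (start, end) region overlaps any (start, end) peak."""
    start, end = region
    return any(p_start < end and start < p_end for p_start, p_end in peaks)

def bivalent_genes(promoters, k4me3_peaks, k27me3_peaks):
    """Call a gene bivalent when both H3K4me3 and H3K27me3 peaks overlap
    its promoter. All coordinates are assumed to lie on one chromosome."""
    return sorted(gene for gene, region in promoters.items()
                  if overlaps(region, k4me3_peaks)
                  and overlaps(region, k27me3_peaks))
```

    Real ChIP-seq analyses would work per chromosome with sorted interval structures, but the overlap logic is the same.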

  15. Current Challenges in Development of a Database of Three-Dimensional Chemical Structures

    PubMed Central

    Maeda, Miki H.

    2015-01-01

    We are developing a database named 3DMET, a three-dimensional structure database of natural metabolites. There are two major impediments to the creation of 3D chemical structures from a set of planar structure drawings: the limited accuracy of computer programs and insufficient human resources for manual curation. We have tested several 2D-3D converters to convert 2D structure files from external databases. These automatic conversion processes yielded an excessive number of improper conversions. To ascertain the quality of the conversions, we compared IUPAC International Chemical Identifier (InChI) and canonical SMILES notations before and after conversion. Structures whose notations correspond to each other were regarded as correct conversions in the present work. We found that chiral inversion is the most serious source of improper conversions. In the current stage of our database construction, published books and articles have been the resources for additions to our database. Chemicals are usually drawn as pictures on paper, so to save human resources, an optical structure reader was introduced. The program was quite useful, but some particular errors were observed during its operation. We hope our efforts to produce correct 3D structures will help other developers of chemical programs and curators of chemical databases. PMID:26075200
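    The quality check described above, comparing canonical notations before and after conversion, can be sketched as follows. This is a simplified illustration, not the 3DMET code: the `flag_bad_conversions` helper and its string-based test for chiral inversion (a mismatch confined to the '@'/'@@' stereo markers of SMILES) are assumptions for this sketch; real pipelines would recompute canonical SMILES or InChI with a cheminformatics toolkit such as RDKit.

```python
def flag_bad_conversions(records):
    """records: (compound_id, smiles_before, smiles_after) triples, where
    the notations are canonical SMILES computed before and after 2D->3D
    conversion. Returns (id, reason) pairs for mismatching conversions."""
    problems = []
    for cid, before, after in records:
        if before == after:
            continue                  # notations agree: conversion accepted
        # A mismatch that disappears when stereo markers are stripped
        # suggests the 3D step inverted a stereocenter.
        if before.replace("@", "") == after.replace("@", ""):
            problems.append((cid, "chiral inversion"))
        else:
            problems.append((cid, "other mismatch"))
    return problems
```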

  16. GEAR: A database of Genomic Elements Associated with drug Resistance

    PubMed Central

    Wang, Yin-Ying; Chen, Wei-Hua; Xiao, Pei-Pei; Xie, Wen-Bin; Luo, Qibin; Bork, Peer; Zhao, Xing-Ming

    2017-01-01

    Drug resistance, which generally develops through genetic mutations of certain molecules, is becoming a serious problem that leads to the failure of standard treatments. Here, we present GEAR (a database of Genomic Elements Associated with drug Resistance), which aims to provide comprehensive information about genomic elements (including genes, single-nucleotide polymorphisms and microRNAs) that are responsible for drug resistance. Currently, GEAR contains 1631 associations between 201 human drugs and 758 genes, 106 associations between 29 human drugs and 66 miRNAs, and 44 associations between 17 human drugs and 22 SNPs. These relationships were first extracted from the primary literature with text mining and then manually curated. The drug resistome deposited in GEAR provides insights into the genetic factors underlying drug resistance. In addition, new indications and potential drug combinations can be identified based on the resistome. The GEAR database can be freely accessed through http://gear.comp-sysbio.org. PMID:28294141

  17. GEAR: A database of Genomic Elements Associated with drug Resistance.

    PubMed

    Wang, Yin-Ying; Chen, Wei-Hua; Xiao, Pei-Pei; Xie, Wen-Bin; Luo, Qibin; Bork, Peer; Zhao, Xing-Ming

    2017-03-15

    Drug resistance, which generally develops through genetic mutations of certain molecules, is becoming a serious problem that leads to the failure of standard treatments. Here, we present GEAR (a database of Genomic Elements Associated with drug Resistance), which aims to provide comprehensive information about genomic elements (including genes, single-nucleotide polymorphisms and microRNAs) that are responsible for drug resistance. Currently, GEAR contains 1631 associations between 201 human drugs and 758 genes, 106 associations between 29 human drugs and 66 miRNAs, and 44 associations between 17 human drugs and 22 SNPs. These relationships were first extracted from the primary literature with text mining and then manually curated. The drug resistome deposited in GEAR provides insights into the genetic factors underlying drug resistance. In addition, new indications and potential drug combinations can be identified based on the resistome. The GEAR database can be freely accessed through http://gear.comp-sysbio.org.

  18. Text Mining to Support Gene Ontology Curation and Vice Versa.

    PubMed

    Ruch, Patrick

    2017-01-01

    In this chapter, we explain how text mining can support the curation of molecular biology databases dealing with protein functions. We also show how curated data can play a disruptive role in the development of text mining methods. We review a decade of efforts to improve the automatic assignment of Gene Ontology (GO) descriptors, the reference ontology for the characterization of genes and gene products. To illustrate the high potential of this approach, we report the performance of an automatic text categorizer, showing a large improvement of +225% in both precision and recall on benchmarked data. We argue that automatic text categorization can ultimately be embedded into a Question-Answering (QA) system to answer questions related to protein functions. Because GO descriptors can be relatively long and specific, traditional QA systems cannot answer such questions; a new type of QA system, so-called Deep QA, which uses machine learning methods trained on curated contents, is thus emerging. Finally, future advances in text mining tools are directly dependent on the availability of high-quality annotated content at every curation step. Database workflows must start recording explicitly all the data they curate, and ideally also some of the data they do not curate.
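    Text categorization of the kind discussed above can be sketched, in heavily simplified form, as a bag-of-words classifier trained on curated (text, GO term) pairs. The example data, GO identifiers and the `train`/`categorize` helpers below are invented for illustration; production systems use far richer features and models.

```python
from collections import Counter
import math

def train(examples):
    """examples: (text, go_term) pairs; builds per-term token counts."""
    model = {}
    for text, term in examples:
        model.setdefault(term, Counter()).update(text.lower().split())
    return model

def categorize(model, text):
    """Assign the GO term whose training vocabulary best matches the text
    (add-one smoothed log-likelihood, Naive-Bayes style)."""
    tokens = text.lower().split()
    def score(counts):
        total = sum(counts.values())
        return sum(math.log((counts[t] + 1) / (total + 1)) for t in tokens)
    return max(model, key=lambda term: score(model[term]))
```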

  19. PHI-base: a new interface and further additions for the multi-species pathogen-host interactions database.

    PubMed

    Urban, Martin; Cuzick, Alayne; Rutherford, Kim; Irvine, Alistair; Pedro, Helder; Pant, Rashmi; Sadanadan, Vidyendra; Khamari, Lokanath; Billal, Santoshkumar; Mohanty, Sagar; Hammond-Kosack, Kim E

    2017-01-04

    The pathogen-host interactions database (PHI-base) is available at www.phi-base.org. PHI-base contains expertly curated molecular and biological information on genes proven to affect the outcome of pathogen-host interactions reported in peer-reviewed research articles. In addition, literature that indicates specific gene alterations that did not affect the disease interaction phenotype is curated to provide complete datasets for comparative purposes. Viruses are not included. Here we describe a revised PHI-base Version 4 data platform with improved search, filtering and extended data display functions. A PHIB-BLAST search function is provided, as well as a link to PHI-Canto, a tool for authors to directly curate their own published data into PHI-base. The new release of PHI-base Version 4.2 (October 2016) has increased data content, with information from 2219 manually curated references. The data provide information on 4460 genes from 264 pathogens tested on 176 hosts in 8046 interactions. Prokaryotic and eukaryotic pathogens are represented in almost equal numbers. Approximately 70% of host species are plants; the remaining 30% are other species of medical and/or environmental importance. Additional data types included in PHI-base 4 are the direct targets of pathogen effector proteins in experimental and natural host organisms. The curation problems encountered and the future directions of the PHI-base project are briefly discussed.

  20. PHI-base: a new interface and further additions for the multi-species pathogen–host interactions database

    PubMed Central

    Urban, Martin; Cuzick, Alayne; Rutherford, Kim; Irvine, Alistair; Pedro, Helder; Pant, Rashmi; Sadanadan, Vidyendra; Khamari, Lokanath; Billal, Santoshkumar; Mohanty, Sagar; Hammond-Kosack, Kim E.

    2017-01-01

    The pathogen–host interactions database (PHI-base) is available at www.phi-base.org. PHI-base contains expertly curated molecular and biological information on genes proven to affect the outcome of pathogen–host interactions reported in peer-reviewed research articles. In addition, literature that indicates specific gene alterations that did not affect the disease interaction phenotype is curated to provide complete datasets for comparative purposes. Viruses are not included. Here we describe a revised PHI-base Version 4 data platform with improved search, filtering and extended data display functions. A PHIB-BLAST search function is provided, as well as a link to PHI-Canto, a tool for authors to directly curate their own published data into PHI-base. The new release of PHI-base Version 4.2 (October 2016) has increased data content, with information from 2219 manually curated references. The data provide information on 4460 genes from 264 pathogens tested on 176 hosts in 8046 interactions. Prokaryotic and eukaryotic pathogens are represented in almost equal numbers. Approximately 70% of host species are plants; the remaining 30% are other species of medical and/or environmental importance. Additional data types included in PHI-base 4 are the direct targets of pathogen effector proteins in experimental and natural host organisms. The curation problems encountered and the future directions of the PHI-base project are briefly discussed. PMID:27915230

  1. AAgAtlas 1.0: a human autoantigen database

    PubMed Central

    Wang, Dan; Yang, Liuhui; Zhang, Ping; LaBaer, Joshua; Hermjakob, Henning; Li, Dong; Yu, Xiaobo

    2017-01-01

    Autoantibodies are antibodies that target self-antigens; they can play pivotal roles in maintaining homeostasis, distinguishing normal from tumor tissue, and triggering autoimmune diseases. In the last three decades, tremendous efforts have been devoted to elucidating the generation, evolution and functions of autoantibodies, as well as their target autoantigens. However, reports of these countless previously identified autoantigens are scattered throughout the literature. Here, we constructed the AAgAtlas database 1.0 using text mining and manual curation. We extracted 45 830 autoantigen-related abstracts and 94 313 sentences from PubMed using the keywords 'autoantigen' or 'autoantibody' or their lexical variants, which were further refined to 25 520 abstracts, 43 253 sentences and 3984 candidates by our bio-entity recognizer based on the Protein Ontology. Finally, we identified 1126 genes as human autoantigens and 1071 related human diseases, with which we constructed a human autoantigen database (AAgAtlas database 1.0). The database provides a user-friendly interface to conveniently browse, retrieve and download human autoantigens as well as their associated diseases. The database is freely accessible at http://biokb.ncpsb.org/aagatlas/. We believe this database will be a valuable resource for tracking and understanding human autoantigens as well as for investigating their functions in basic and translational research. PMID:27924021
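    The first step described above, a keyword screen over PubMed abstracts for 'autoantigen'/'autoantibody' and lexical variants, could look roughly like the following. The regular expression and the `screen_abstracts` helper are an illustrative guess at such a filter, not the pattern actually used to build AAgAtlas.

```python
import re

# Match 'autoantigen(s)', 'autoantibody', 'autoantibodies' and hyphenated
# variants such as 'auto-antigen', case-insensitively.
PATTERN = re.compile(r"\bauto-?anti(gen|bod(y|ies))s?\b", re.IGNORECASE)

def screen_abstracts(abstracts):
    """Keep only abstracts mentioning one of the keyword variants."""
    return [a for a in abstracts if PATTERN.search(a)]
```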

  2. Establishment of an integrative multi-omics expression database CKDdb in the context of chronic kidney disease (CKD)

    PubMed Central

    Fernandes, Marco; Husi, Holger

    2017-01-01

    Complex human traits such as chronic kidney disease (CKD) are a major health and financial burden in modern societies. Currently, CKD onset and progression are still not fully understood at the molecular level. Meanwhile, the prolific use of high-throughput omic technologies in disease biomarker discovery studies has yielded a vast amount of disjointed data that cannot be easily collated. Therefore, we aimed to develop a molecule-centric database featuring CKD-related experiments from the available literature. We established the Chronic Kidney Disease database CKDdb, an integrated and clustered information resource that covers multi-omic studies (microRNAs, genomics, peptidomics, proteomics and metabolomics) of CKD and related disorders, built by literature data mining and manual curation. The CKDdb database contains differential expression data for 49 395 molecule entries (redundant), of which 16 885 are unique molecules (non-redundant), from 377 manually curated studies in 230 publications. The database was intentionally built to allow disease pathway analysis through a systems approach, integrating all existing information to yield biological meaning, and therefore has the potential to provide an in-depth understanding of the key molecular events that modulate CKD pathogenesis. PMID:28079125
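    The redundant-versus-unique distinction above (49 395 entries collapsing to 16 885 molecules) is essentially a grouping step: the same molecule reported by several studies is gathered under one record. A minimal sketch, with a hypothetical record schema invented for illustration:

```python
def collapse(entries):
    """entries: (molecule_id, study_id, log2_fold_change) rows, possibly
    reporting the same molecule in several studies (redundant). Groups
    them into one record per unique molecule."""
    unique = {}
    for molecule, study, fold_change in entries:
        unique.setdefault(molecule, []).append((study, fold_change))
    return unique
```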

  3. HistoneDB 2.0: a histone database with variants--an integrated resource to explore histones and their variants.

    PubMed

    Draizen, Eli J; Shaytan, Alexey K; Mariño-Ramírez, Leonardo; Talbert, Paul B; Landsman, David; Panchenko, Anna R

    2016-01-01

    Compaction of DNA into chromatin is a characteristic feature of eukaryotic organisms. The core (H2A, H2B, H3, H4) and linker (H1) histone proteins are responsible for this compaction through the formation of nucleosomes and higher order chromatin aggregates. Moreover, histones are intricately involved in chromatin functioning and provide a means for genome dynamic regulation through specific histone variants and histone post-translational modifications. 'HistoneDB 2.0--with variants' is a comprehensive database of histone protein sequences, classified by histone types and variants. All entries in the database are supplemented by rich sequence and structural annotations with many interactive tools to explore and compare sequences of different variants from various organisms. The core of the database is a manually curated set of histone sequences grouped into 30 different variant subsets with variant-specific annotations. The curated set is supplemented by an automatically extracted set of histone sequences from the non-redundant protein database using algorithms trained on the curated set. The interactive web site supports various searching strategies in both datasets: browsing of phylogenetic trees; on-demand generation of multiple sequence alignments with feature annotations; classification of histone-like sequences and browsing of the taxonomic diversity for every histone variant. HistoneDB 2.0 is a resource for the interactive comparative analysis of histone protein sequences and their implications for chromatin function. Database URL: http://www.ncbi.nlm.nih.gov/projects/HistoneDB2.0.

  4. Safety manual

    SciTech Connect

    Not Available

    1983-05-01

    This manual, dedicated to that effort, has been prepared to help TVA employees perform their work safely. The information in this manual is the result of the experience and observations of many people in the Division of Power System Operations (PSO). As such, the manual offers PSO employees the benefit of that accumulated knowledge of safe work practices in the Division.

  5. HHMD: the human histone modification database.

    PubMed

    Zhang, Yan; Lv, Jie; Liu, Hongbo; Zhu, Jiang; Su, Jianzhong; Wu, Qiong; Qi, Yunfeng; Wang, Fang; Li, Xia

    2010-01-01

    Histone modifications play important roles in chromatin remodeling, gene transcriptional regulation, stem cell maintenance and differentiation. Alterations in histone modifications may be linked to human diseases, especially cancer. Histone modification data probed by ChIP-seq, ChIP-chip and qChIP, covering methylation, acetylation and ubiquitylation, have become widely available, and mining and integration of these data can be beneficial to novel biological discoveries. However, there has been no comprehensive data repository exclusively for human histone modifications. Therefore, we developed a relatively comprehensive database for human histone modifications. The Human Histone Modification Database (HHMD, http://bioinfo.hrbmu.edu.cn/hhmd) focuses on the storage and integration of histone modification datasets obtained from laboratory experiments. The latest release of HHMD incorporates 43 location-specific histone modifications in human. To facilitate data extraction, flexible search options are built into HHMD: it can be searched by histone modification, gene ID, functional category, chromosome location and cancer name. HHMD also includes a user-friendly visualization tool named HisModView, with which genome-wide histone modification maps can be displayed. HisModView facilitates the acquisition and visualization of histone modifications. The database also has manually curated information on histone modification dysregulation in nine human cancers.

  6. Prioritizing PubMed articles for the Comparative Toxicogenomic Database utilizing semantic information.

    PubMed

    Kim, Sun; Kim, Won; Wei, Chih-Hsuan; Lu, Zhiyong; Wilbur, W John

    2012-01-01

    The Comparative Toxicogenomics Database (CTD) contains manually curated literature that describes chemical-gene interactions, chemical-disease relationships and gene-disease relationships. Finding articles that contain this information is an important first step toward efficient manual curation. However, the complex nature of named entities and their relationships makes it challenging to choose relevant articles. In this article, we introduce a machine learning framework for prioritizing CTD-relevant articles based on our prior system for the protein-protein interaction article classification task in BioCreative III. To address new challenges in the CTD task, we explore a new entity identification method for genes, chemicals and diseases. In addition, latent topics are analyzed and used as a feature type to overcome the small size of the training set. Applied to the BioCreative 2012 Triage dataset, our method achieved 0.8030 mean average precision (MAP) in the official runs, the top MAP among participating systems. Integrated with PubTator, a Web interface for annotating biomedical literature, the proposed system also received a positive review from the CTD curation team.
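
    The mean average precision figure quoted above averages, over queries, the precision observed at the rank of each relevant article in a ranked list. A minimal sketch of the metric (not the official BioCreative evaluation script) might look like:

```python
def average_precision(ranked, relevant):
    """Mean of precision values taken at each relevant document's rank."""
    hits = 0
    precisions = []
    for rank, doc in enumerate(ranked, start=1):
        if doc in relevant:
            hits += 1
            precisions.append(hits / rank)
    return sum(precisions) / len(relevant) if relevant else 0.0

def mean_average_precision(runs):
    """MAP over (ranked_list, relevant_set) pairs, one per query."""
    return sum(average_precision(r, rel) for r, rel in runs) / len(runs)

# Relevant articles d1 (rank 1) and d3 (rank 3): AP = (1/1 + 2/3) / 2.
runs = [(["d1", "d2", "d3", "d4"], {"d1", "d3"})]
print(mean_average_precision(runs))
```

    Ranking a relevant article higher raises the precision recorded at its rank, so MAP rewards triage systems that push curatable articles to the top of the queue.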

  7. Benchmarking of the 2010 BioCreative Challenge III text-mining competition by the BioGRID and MINT interaction databases

    PubMed Central

    2011-01-01

    Background The vast amount of data published in the primary biomedical literature represents a challenge for the automated extraction and codification of individual data elements. Biological databases that rely solely on manual extraction by expert curators are unable to comprehensively annotate the information dispersed across the entire biomedical literature. The development of efficient tools based on natural language processing (NLP) systems is essential for the selection of relevant publications, identification of data attributes and partially automated annotation. One of the tasks of the BioCreative 2010 Challenge III was devoted to the evaluation of NLP systems developed to identify articles for curation and extraction of protein-protein interaction (PPI) data. Results The BioCreative 2010 competition addressed three tasks: gene normalization, article classification and interaction method identification. The BioGRID and MINT protein interaction databases both participated in the generation of the test publication set for gene normalization, annotated the development and test sets for article classification, and curated the test set for interaction method classification. These test datasets served as a gold standard for the evaluation of data extraction algorithms. Conclusion The development of efficient tools for extraction of PPI data is a necessary step to achieve full curation of the biomedical literature. NLP systems can in the first instance facilitate expert curation by refining the list of candidate publications that contain PPI data; more ambitiously, NLP approaches may be able to directly extract relevant information from full-text articles for rapid inspection by expert curators. Close collaboration between biological databases and NLP systems developers will continue to facilitate the long-term objectives of both disciplines. PMID:22151178

  8. Curation and Computational Design of Bioenergy-Related Metabolic Pathways

    SciTech Connect

    Karp, Peter D.

    2014-09-12

    Pathway Tools is a systems-biology software package written by SRI International (SRI) that produces Pathway/Genome Databases (PGDBs) for organisms with a sequenced genome. Pathway Tools also provides a wide range of capabilities for analyzing predicted metabolic networks and user-generated omics data. More than 5,000 academic, industrial, and government groups have licensed Pathway Tools. This user community includes researchers at all three DOE bioenergy centers, as well as academic and industrial metabolic engineering (ME) groups. An integral part of the Pathway Tools software is MetaCyc, a large, multiorganism database of metabolic pathways and enzymes that SRI and its academic collaborators manually curate. This project included two main goals: I. Enhance the MetaCyc content of bioenergy-related enzymes and pathways. II. Develop computational tools for engineering metabolic pathways that satisfy specified design goals, in particular for bioenergy-related pathways. In part I, SRI proposed to significantly expand the coverage of bioenergy-related metabolic information in MetaCyc, followed by the generation of organism-specific PGDBs for all energy-relevant organisms sequenced at the DOE Joint Genome Institute (JGI). Part I objectives included: 1: Expand the content of MetaCyc to include bioenergy-related enzymes and pathways. 2: Enhance the Pathway Tools software to enable display of complex polymer degradation processes. 3: Create new PGDBs for the energy-related organisms sequenced by JGI, update existing PGDBs with new MetaCyc content, and make these data available to JBEI via the BioCyc website. In part II, SRI proposed to develop an efficient computational tool for the engineering of metabolic pathways. Part II objectives included: 4: Develop computational tools for generating metabolic pathways that satisfy specified design goals, enabling users to specify parameters such as starting and ending compounds, and preferred or disallowed intermediate compounds.

  9. Instrumentation Manuals On-Line

    NASA Astrophysics Data System (ADS)

    Bryson, E.

    This database will serve as an international clearing house for observatory manuals and information on how to access them. At present, CFHT, STScI, and IRTF are participating in this project. It is the author's intention that each observatory will submit electronically a pre-formatted template which is now available on NCSA Mosaic (URL http://www.cfht.hawaii.edu/html/obs-manuals.html). The template describes the instrumentation manual in the following manner: location, version, revision, institution, description, wavelength region, field, keywords, contact person, size of document, figures, tables and availability. In addition, the template provides the user with a direct link to the manual if it is on-line. The project author will contact the individual or institution in charge of the template at six-month intervals in order to ensure the manual's accuracy. It is hoped the availability of this service will encourage all observatories to make information about their manuals available electronically.

  10. CEBS: a comprehensive annotated database of toxicological data

    PubMed Central

    Lea, Isabel A.; Gong, Hui; Paleja, Anand; Rashid, Asif; Fostel, Jennifer

    2017-01-01

    The Chemical Effects in Biological Systems database (CEBS) is a comprehensive and unique toxicology resource that compiles individual and summary animal data from the National Toxicology Program (NTP) testing program and other depositors into a single electronic repository. CEBS has undergone significant updates in recent years and currently contains over 11 000 test articles (exposure agents) and over 8000 studies, including all available NTP carcinogenicity, short-term toxicity and genetic toxicity studies. Study data provided to CEBS are manually curated, accessioned and subject to quality assurance review prior to release to ensure high quality. The CEBS database has two main components: data collection and data delivery. To accommodate the breadth of data produced by NTP, the CEBS data collection component is an integrated relational design that allows the flexibility to capture any type of electronic data received to date. The data delivery component of the database comprises a series of dedicated user interface tables containing pre-processed data that support each component of the user interface. The user interface has been updated to include a series of nine Guided Search tools that allow access to NTP summary and conclusion data and larger non-NTP datasets. The CEBS database can be accessed online at http://www.niehs.nih.gov/research/resources/databases/cebs/. PMID:27899660

  11. Kalium: a database of potassium channel toxins from scorpion venom

    PubMed Central

    Kuzmenkov, Alexey I.; Krylov, Nikolay A.; Chugunov, Anton O.; Grishin, Eugene V.; Vassilevski, Alexander A.

    2016-01-01

    Kalium (http://kaliumdb.org/) is a manually curated database that accumulates data on potassium channel toxins purified from scorpion venom (KTx). This database is an open-access resource and provides easy access to pages of other databases of interest, such as UniProt, PDB, NCBI Taxonomy Browser, and PubMed. Key achievements of Kalium include strict yet straightforward KTx classification based on the unified nomenclature supported by researchers in the field; removal of partial-sequence peptides and of entries supported only by transcriptomic information; classification of the β-family toxins; and the addition of a novel λ-family. Molecules presented in the database can be processed by the Clustal Omega server using a one-click option. Molecular masses of mature peptides are calculated, and available activity data are compiled for all KTx. We believe that Kalium is not only of high interest to professional toxinologists, but also of general utility to the scientific community. Database URL: http://kaliumdb.org/ PMID:27087309
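
    The molecular masses that Kalium reports for mature peptides can be approximated by summing standard residue masses plus one water; the sketch below uses standard monoisotopic masses (an assumption, not Kalium's actual implementation) and deliberately ignores the disulfide bridges and other modifications typical of KTx:

```python
# Standard monoisotopic amino acid residue masses (Da); assumed here,
# not taken from Kalium itself.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056  # one H2O per linear peptide chain

def peptide_mass(sequence):
    """Monoisotopic mass of a linear, unmodified peptide.

    Each disulfide bridge (ubiquitous in KTx) would subtract a further
    ~2.01565 Da; post-translational modifications are ignored here.
    """
    return sum(RESIDUE_MASS[aa] for aa in sequence) + WATER

print(round(peptide_mass("AG"), 4))  # 146.0691
```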

  12. JSC Stardust Curation Team

    NASA Technical Reports Server (NTRS)

    Zolensky, Michael E.

    2000-01-01

    STARDUST, a NASA Discovery-class mission, is the first to return samples from a comet. Grains from comet Wild 2's coma, the gas and dust envelope that surrounds the nucleus, will be collected, as well as interstellar dust. The mission, which launched on February 7, 1999, will encounter the comet on January 10, 2004. As the spacecraft passes through the coma, a tray of silica aerogel will be exposed, and coma grains will impact there and become captured. Following the collection, the aerogel tray is closed for return to Earth in 2006. A dust impact mass spectrometer on board the STARDUST spacecraft will be used to gather spectra of dust during the entire mission, including the coma passage. This instrument will be the best chance to obtain data on volatile grains, which will not be well collected in the aerogel. The dust impact mass spectrometer will also be used to study the composition of interstellar grains. In the past 5 years, analysis of data from dust detectors aboard the Ulysses and Galileo spacecraft has revealed that there is a stream of interstellar dust flowing through our solar system. These grains will be captured during the cruise phase of the STARDUST mission, as the spacecraft travels toward the comet. The sample return capsule will parachute to Earth in February 2006 and will land in western Utah. Once on the ground, the sample return capsule will be placed into a dry nitrogen environment and flown to the curation lab at JSC.

  13. Curation of the genome annotation of Pichia pastoris (Komagataella phaffii) CBS7435 from gene level to protein function.

    PubMed

    Valli, Minoska; Tatto, Nadine E; Peymann, Armin; Gruber, Clemens; Landes, Nils; Ekker, Heinz; Thallinger, Gerhard G; Mattanovich, Diethard; Gasser, Brigitte; Graf, Alexandra B

    2016-09-01

    As manually curated, non-automated BLAST analysis of the published Pichia pastoris genome sequences revealed many differences between the gene annotations of the strains GS115 and CBS7435, RNA-Seq analysis, supported by proteomics, was performed to improve the genome annotation. Detailed analyses of sequence alignments and protein domain predictions were performed to extend the functional genome annotation to all P. pastoris sequences. This allowed the identification of 492 new ORFs and 4916 hypothetical UTRs, and the correction of 341 incorrect ORF predictions, which were mainly due to the presence of upstream ATGs or erroneous intron predictions. Moreover, 175 previously erroneously annotated ORFs needed to be removed from the annotation. In total, we have annotated 5325 ORFs. Regarding the functionality of those genes, we improved all gene and protein descriptions, thereby increasing the percentage of ORFs with functional annotation from 48% to 73%. Furthermore, we defined functional groups covering 25 biological cellular processes of interest by grouping all genes that are part of each defined process. All data are presented in the newly launched genome browser and database available at www.pichiagenome.org. In summary, we present a wide spectrum of curation of the P. pastoris genome annotation, from gene level to protein function.

  14. Rfam 12.0: updates to the RNA families database

    PubMed Central

    Nawrocki, Eric P.; Burge, Sarah W.; Bateman, Alex; Daub, Jennifer; Eberhardt, Ruth Y.; Eddy, Sean R.; Floden, Evan W.; Gardner, Paul P.; Jones, Thomas A.; Tate, John; Finn, Robert D.

    2015-01-01

    The Rfam database (available at http://rfam.xfam.org) is a collection of non-coding RNA families represented by manually curated sequence alignments, consensus secondary structures and annotation gathered from corresponding Wikipedia, taxonomy and ontology resources. In this article, we detail updates and improvements to the Rfam data and website for the Rfam 12.0 release. We describe the upgrade of our search pipeline to use Infernal 1.1 and demonstrate its improved homology detection ability by comparison with the previous version. The new pipeline is easier for users to apply to their own data sets, and we illustrate its ability to annotate RNAs in genomic and metagenomic data sets of various sizes. Rfam has been expanded to include 260 new families, including the well-studied large subunit ribosomal RNA family, and for the first time includes information on short sequence- and structure-based RNA motifs present within families. PMID:25392425

  15. Management Manual.

    ERIC Educational Resources Information Center

    San Joaquin Delta Community Coll. District, CA.

    This manual articulates the rights, responsibilities, entitlements, and conditions of employment of management personnel at San Joaquin Delta College (SJDC). The manual first presents SJDC's mission statement and then discusses the college's management goals and priorities. An examination of SJDC's administrative organization and a list of…

  16. Quality Manual

    NASA Astrophysics Data System (ADS)

    Koch, Michael

    The quality manual is the “heart” of every management system related to quality. Quality assurance in analytical laboratories is most frequently linked with ISO/IEC 17025, which lists the standard requirements for a quality manual. In this chapter examples are used to demonstrate, how these requirements can be met. But, certainly, there are many other ways to do this.

  17. Resource Manual

    ERIC Educational Resources Information Center

    Human Development Institute, 2008

    2008-01-01

    This manual was designed primarily for use by individuals with developmental disabilities and related conditions. The main focus of this manual is to provide easy-to-read information concerning available resources, and to provide immediate contact information for the purpose of applying for resources and/or locating additional information. The…

  18. VIEWCACHE: An incremental pointer-base access method for distributed databases. Part 1: The universal index system design document. Part 2: The universal index system low-level design document. Part 3: User's guide. Part 4: Reference manual. Part 5: UIMS test suite

    NASA Technical Reports Server (NTRS)

    Kelley, Steve; Roussopoulos, Nick; Sellis, Timos

    1992-01-01

    The goal of the Universal Index System (UIS) is to provide an easy-to-use and reliable interface to many different kinds of database systems. The impetus for this system was to simplify database index management for users, thus encouraging the use of indexes. As the idea grew into an actual system design, the concept of increasing database performance by facilitating the use of time-saving techniques at the user level became a theme for the project. This final report describes the design and implementation of UIS and its language interfaces. It also includes the User's Guide and the Reference Manual.

  19. PPDB, the Plant Proteomics Database at Cornell.

    PubMed

    Sun, Qi; Zybailov, Boris; Majeran, Wojciech; Friso, Giulia; Olinares, Paul Dominic B; van Wijk, Klaas J

    2009-01-01

    The Plant Proteomics Database (PPDB; http://ppdb.tc.cornell.edu), launched in 2004, provides an integrated resource for experimentally identified proteins in Arabidopsis and maize (Zea mays). Internal BLAST alignments link maize and Arabidopsis information. Experimental identification is based on in-house mass spectrometry (MS) of cell type-specific proteomes (maize), of specific subcellular proteomes (e.g. chloroplasts, thylakoids, nucleoids) and of total leaf proteome samples (maize and Arabidopsis). So far, more than 5000 accessions have been identified in both maize and Arabidopsis. In addition, more than 80 published Arabidopsis proteome datasets from subcellular compartments or organs are stored in PPDB and linked to each locus. Using MS-derived information and the literature, more than 1500 Arabidopsis proteins have been manually assigned a subcellular location, with a strong emphasis on plastid proteins. Additional new features of PPDB include searchable posttranslational modifications, as well as searchable experimental proteotypic peptides and spectral count information for each identified accession based on in-house experiments. Various search methods are provided to extract more than 40 data types for each accession and to extract accessions for different functional categories or curated subcellular localizations. Protein report pages for each accession provide comprehensive overviews, including predicted protein properties, with hyperlinks to the most relevant databases.

  20. GAMOLA2, a Comprehensive Software Package for the Annotation and Curation of Draft and Complete Microbial Genomes.

    PubMed

    Altermann, Eric; Lu, Jingli; McCulloch, Alan

    2017-01-01

    Expert-curated annotation remains one of the critical steps in achieving a reliable, biologically relevant annotation. Here we announce the release of GAMOLA2, a user-friendly and comprehensive software package to process, annotate and curate draft and complete bacterial, archaeal, and viral genomes. GAMOLA2 represents a wrapping tool to combine gene model determination and functional Blast, COG, Pfam, and TIGRfam analyses with structural predictions, including detection of tRNAs, rRNA genes, non-coding RNAs, signal peptide cleavage sites, transmembrane helices, CRISPR repeats and vector sequence contaminations. GAMOLA2 has already been validated in a wide range of bacterial and archaeal genomes, and its modular concept allows easy addition of further functionality in future releases. A modified and adapted version of the Artemis Genome Viewer (Sanger Institute) has been developed to leverage the additional features and underlying information provided by the GAMOLA2 analysis, and is part of the software distribution. In addition to genome annotations, GAMOLA2 features, among others, supplemental modules that assist in the creation of custom Blast databases, annotation transfers between genome versions, and the preparation of Genbank files for submission via the NCBI Sequin tool. GAMOLA2 is intended to be run under a Linux environment, whereas the subsequent visualization and manual curation in Artemis is mobile and platform independent. The development of GAMOLA2 is ongoing and community driven. New functionality can easily be added upon user requests, ensuring that GAMOLA2 provides information relevant to microbiologists. The software is available free of charge for academic use.

  1. GAMOLA2, a Comprehensive Software Package for the Annotation and Curation of Draft and Complete Microbial Genomes

    PubMed Central

    Altermann, Eric; Lu, Jingli; McCulloch, Alan

    2017-01-01

    Expert-curated annotation remains one of the critical steps in achieving a reliable, biologically relevant annotation. Here we announce the release of GAMOLA2, a user-friendly and comprehensive software package to process, annotate and curate draft and complete bacterial, archaeal, and viral genomes. GAMOLA2 represents a wrapping tool to combine gene model determination and functional Blast, COG, Pfam, and TIGRfam analyses with structural predictions, including detection of tRNAs, rRNA genes, non-coding RNAs, signal peptide cleavage sites, transmembrane helices, CRISPR repeats and vector sequence contaminations. GAMOLA2 has already been validated in a wide range of bacterial and archaeal genomes, and its modular concept allows easy addition of further functionality in future releases. A modified and adapted version of the Artemis Genome Viewer (Sanger Institute) has been developed to leverage the additional features and underlying information provided by the GAMOLA2 analysis, and is part of the software distribution. In addition to genome annotations, GAMOLA2 features, among others, supplemental modules that assist in the creation of custom Blast databases, annotation transfers between genome versions, and the preparation of Genbank files for submission via the NCBI Sequin tool. GAMOLA2 is intended to be run under a Linux environment, whereas the subsequent visualization and manual curation in Artemis is mobile and platform independent. The development of GAMOLA2 is ongoing and community driven. New functionality can easily be added upon user requests, ensuring that GAMOLA2 provides information relevant to microbiologists. The software is available free of charge for academic use. PMID:28386247

  2. Manual de Carpinteria (Carpentry Manual).

    ERIC Educational Resources Information Center

    TomSing, Luisa B.

    This manual is part of a Mexican series of instructional materials designed for Spanish speaking adults who are in the process of becoming literate or have recently become literate in their native language. The manual describes a carpentry course that is structured to appeal to the student as a self-directing adult. The following units are…

  3. DBatVir: the database of bat-associated viruses.

    PubMed

    Chen, Lihong; Liu, Bo; Yang, Jian; Jin, Qi

    2014-01-01

    Emerging infectious diseases remain a significant threat to public health. Most emerging infectious disease agents in humans are of zoonotic origin. Bats are important reservoir hosts of many highly lethal zoonotic viruses and have been implicated in numerous emerging infectious disease events in recent years. It is essential to enhance our knowledge and understanding of the genetic diversity of bat-associated viruses to prevent future outbreaks. To facilitate further research, we constructed the database of bat-associated viruses (DBatVir). Known viral sequences detected in bat samples were manually collected and curated, along with the related metadata, such as the sampling time, location, bat species and specimen type. Additional information concerning the bats, including common names, diet type, geographic distribution and phylogeny, was integrated into the database to bridge the gap between virologists and zoologists. The database currently covers >4100 bat-associated animal viruses of 23 viral families detected from 196 bat species in 69 countries worldwide. It provides an overview and snapshot of current research on bat-associated viruses, which is essential now that the field is rapidly expanding. With a user-friendly interface and integrated online bioinformatics tools, DBatVir provides a convenient and powerful platform for virologists and zoologists to analyze the virome diversity of bats, as well as for epidemiologists and public health researchers to monitor and track current and future bat-related infectious diseases. Database URL: http://www.mgc.ac.cn/DBatVir/.

  4. LocSigDB: a database of protein localization signals.

    PubMed

    Negi, Simarjeet; Pandey, Sanjit; Srinivasan, Satish M; Mohammed, Akram; Guda, Chittibabu

    2015-01-01

    LocSigDB (http://genome.unmc.edu/LocSigDB/) is a manually curated database of experimental protein localization signals for eight distinct subcellular locations, primarily in eukaryotic cells, with brief coverage of bacterial proteins. Proteins must be localized to their appropriate subcellular compartments to perform their desired functions, and mislocalization of proteins to unintended locations is a causative factor in many human diseases; a collection of known sorting signals will therefore help support many important areas of biomedical research. By performing an extensive literature study, we compiled a collection of 533 experimentally determined localization signals, along with the proteins that harbor such signals. Each signal in LocSigDB is annotated with its localization, source and PubMed references, and is linked to the proteins in the UniProt database that contain the same amino acid pattern as the given signal, along with the organism information. From the LocSigDB webserver, users can download the whole database or browse/search for data using an intuitive query interface. To date, LocSigDB is the most comprehensive compendium of protein localization signals for eight distinct subcellular locations. Database URL: http://genome.unmc.edu/LocSigDB/
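
    Matching the kind of amino acid pattern LocSigDB catalogues can be sketched with a regular expression over a protein sequence; the C-terminal KDEL-type ER-retention signal is used below as a well-known example, and LocSigDB's own pattern syntax may differ:

```python
import re

# Classical C-terminal ER-retention signal (KDEL and close variants);
# the `$` anchor restricts the match to the protein's C-terminus.
ER_RETENTION = re.compile(r"[KRH]DEL$")

def has_signal(sequence, pattern=ER_RETENTION):
    """True if the sequence carries the given localization signal."""
    return pattern.search(sequence) is not None

print(has_signal("MKTAYIAKQRQISFVKSHFSRQKDEL"))  # True
print(has_signal("MKTAYIAKQRQISFVKSHFSRQ"))      # False
```

    Position-dependent signals like this one are why anchoring matters: the same four residues in the middle of a protein carry no retention information.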

  5. HIM-herbal ingredients in-vivo metabolism database

    PubMed Central

    2013-01-01

    Background Herbal medicine has long been viewed as a valuable asset for potential new drug discovery, and the metabolites of herbal ingredients, especially in vivo metabolites, often show better pharmacological, pharmacokinetic and even safety profiles than their parent compounds. However, this herbal metabolite information is still scattered and waiting to be collected. Description The HIM database manually collects the most comprehensive available in vivo metabolism information for herbal active ingredients, as well as their corresponding bioactivity, organ and/or tissue distribution, toxicity, ADME and clinical research profiles. Currently HIM contains 361 ingredients and 1104 corresponding in vivo metabolites from 673 reputable herbs. Tools for structural similarity, substructure search and Lipinski's Rule of Five are also provided. Various links are made to PubChem, PubMed, TCM-ID (Traditional Chinese Medicine Information database) and HIT (Herbal Ingredients' Targets database). Conclusions The curated database HIM gathers in vivo metabolite information for the active ingredients of Chinese herbs, together with their corresponding bioactivity, toxicity and ADME profiles. HIM is freely accessible to academic researchers at http://www.bioinformatics.org.cn/. PMID:23721660

  6. The BioGRID interaction database: 2017 update

    PubMed Central

    Chatr-aryamontri, Andrew; Oughtred, Rose; Boucher, Lorrie; Rust, Jennifer; Chang, Christie; Kolas, Nadine K.; O'Donnell, Lara; Oster, Sara; Theesfeld, Chandra; Sellam, Adnane; Stark, Chris; Breitkreutz, Bobby-Joe; Dolinski, Kara; Tyers, Mike

    2017-01-01

    The Biological General Repository for Interaction Datasets (BioGRID: https://thebiogrid.org) is an open access database dedicated to the annotation and archival of protein, genetic and chemical interactions for all major model organism species and humans. As of September 2016 (build 3.4.140), the BioGRID contains 1 072 173 genetic and protein interactions, and 38 559 post-translational modifications, as manually annotated from 48 114 publications. This dataset represents interaction records for 66 model organisms and represents a 30% increase compared to the previous 2015 BioGRID update. BioGRID curates the biomedical literature for major model organism species, including humans, with a recent emphasis on central biological processes and specific human diseases. To facilitate network-based approaches to drug discovery, BioGRID now incorporates 27 501 chemical–protein interactions for human drug targets, as drawn from the DrugBank database. A new dynamic interaction network viewer allows the easy navigation and filtering of all genetic and protein interaction data, as well as for bioactive compounds and their established targets. BioGRID data are directly downloadable without restriction in a variety of standardized formats and are freely distributed through partner model organism databases and meta-databases. PMID:27980099

  7. Managing biological networks by using text mining and computer-aided curation

    NASA Astrophysics Data System (ADS)

    Yu, Seok Jong; Cho, Yongseong; Lee, Min-Ho; Lim, Jongtae; Yoo, Jaesoo

    2015-11-01

    In order to understand a biological mechanism in a cell, a researcher must collect a huge number of protein interactions, with experimental evidence, from experiments and the literature. Text mining systems that extract biological interactions from papers have been used to construct biological networks for a few decades. Even though text mining of the literature is necessary to construct a biological network, few systems with a text mining tool are available for biologists who want to construct their own biological networks. We have developed a biological network construction system called BioKnowledge Viewer that can generate a biological interaction network by using a text mining tool and biological taggers. It also includes Boolean simulation software that provides a biological modeling system for simulating the models built with the text mining tool. A user can download PubMed articles and construct a biological network by using the Multi-level Knowledge Emergence Model (KMEM), MetaMap, and A Biomedical Named Entity Recognizer (ABNER) as text mining tools. To evaluate the system, we constructed an aging-related biological network consisting of 9,415 nodes (genes) by using manual curation. With network analysis, we found that several genes, including JNK, AP-1, and BCL-2, were highly connected in the aging network. We provide a semi-automatic curation environment so that users can obtain a graph database for managing text mining results generated on the server system and can navigate the network with BioKnowledge Viewer, which is freely available at http://bioknowledgeviewer.kisti.re.kr.
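
    The Boolean simulation facility described above can be sketched as a synchronous update over per-node rules; the wiring below is illustrative only, not the curated aging network:

```python
# Synchronous Boolean network update. The rules are an illustrative toy
# wiring, not the actual aging-related network from BioKnowledge Viewer.
rules = {
    "JNK":    lambda s: s["stress"],   # JNK activated by stress
    "AP1":    lambda s: s["JNK"],      # AP-1 downstream of JNK
    "BCL2":   lambda s: not s["AP1"],  # BCL-2 repressed by AP-1
    "stress": lambda s: s["stress"],   # external input held constant
}

def step(state):
    """Apply every rule to the previous state simultaneously."""
    return {node: bool(rule(state)) for node, rule in rules.items()}

state = {"stress": True, "JNK": False, "AP1": False, "BCL2": True}
for _ in range(3):
    state = step(state)
print(state)  # reaches a fixed point: JNK and AP1 on, BCL2 off
```

    Synchronous update is the simplest scheme; asynchronous variants, where one node updates at a time, can reveal additional attractors in the same wiring.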

  8. Rice Annotation Project Database (RAP-DB): an integrative and interactive database for rice genomics.

    PubMed

    Sakai, Hiroaki; Lee, Sung Shin; Tanaka, Tsuyoshi; Numa, Hisataka; Kim, Jungsok; Kawahara, Yoshihiro; Wakimoto, Hironobu; Yang, Ching-chia; Iwamoto, Masao; Abe, Takashi; Yamada, Yuko; Muto, Akira; Inokuchi, Hachiro; Ikemura, Toshimichi; Matsumoto, Takashi; Sasaki, Takuji; Itoh, Takeshi

    2013-02-01

    The Rice Annotation Project Database (RAP-DB, http://rapdb.dna.affrc.go.jp/) has been providing a comprehensive set of gene annotations for the genome sequence of rice, Oryza sativa (japonica group) cv. Nipponbare. Since the first release in 2005, RAP-DB has been updated several times along with the genome assembly updates. Here, we present our newest RAP-DB based on the latest genome assembly, Os-Nipponbare-Reference-IRGSP-1.0 (IRGSP-1.0), which was released in 2011. We detected 37,869 loci by mapping transcript and protein sequences of 150 monocot species. To provide plant researchers with highly reliable and up-to-date rice gene annotations, we have been incorporating literature-based manually curated data, and 1,626 loci currently incorporate literature-based annotation data, including commonly used gene names or gene symbols. Transcriptional activities are shown at the nucleotide level by mapping RNA-Seq reads derived from 27 samples. We also mapped the Illumina reads of a Japanese leading japonica cultivar, Koshihikari, and a Chinese indica cultivar, Guangluai-4, to the genome and show alignments together with the single nucleotide polymorphisms (SNPs) and gene functional annotations through a newly developed browser, Short-Read Assembly Browser (S-RAB). We have developed two satellite databases, Plant Gene Family Database (PGFD) and Integrative Database of Cereal Gene Phylogeny (IDCGP), which display gene family and homologous gene relationships among diverse plant species. RAP-DB and the satellite databases offer simple and user-friendly web interfaces, enabling plant and genome researchers to access the data easily and facilitating a broad range of plant research topics.

  9. Enhanced annotations and features for comparing thousands of Pseudomonas genomes in the Pseudomonas genome database.

    PubMed

    Winsor, Geoffrey L; Griffiths, Emma J; Lo, Raymond; Dhillon, Bhavjinder K; Shay, Julie A; Brinkman, Fiona S L

    2016-01-04

    The Pseudomonas Genome Database (http://www.pseudomonas.com) is well known for the application of community-based annotation approaches for producing a high-quality Pseudomonas aeruginosa PAO1 genome annotation, and facilitating whole-genome comparative analyses with other Pseudomonas strains. To aid analysis of potentially thousands of complete and draft genome assemblies, this database and analysis platform was upgraded to integrate curated genome annotations and isolate metadata with enhanced tools for larger scale comparative analysis and visualization. Manually curated gene annotations are supplemented with improved computational analyses that help identify putative drug targets and vaccine candidates or assist with evolutionary studies by identifying orthologs, pathogen-associated genes and genomic islands. The database schema has been updated to integrate isolate metadata that will facilitate more powerful analysis of genomes across datasets in the future. We continue to place an emphasis on providing high-quality updates to gene annotations through regular review of the scientific literature and using community-based approaches including a major new Pseudomonas community initiative for the assignment of high-quality gene ontology terms to genes. As we further expand from thousands of genomes, we plan to provide enhancements that will aid data visualization and analysis arising from whole-genome comparative studies including more pan-genome and population-based approaches.

  10. Biological Databases for Behavioral Neurobiology

    PubMed Central

    Baker, Erich J.

    2014-01-01

    Databases are, at their core, abstractions of data and their intentionally derived relationships. They serve as a central organizing metaphor and repository, supporting or augmenting nearly all bioinformatics. Behavioral domains provide a unique stage for contemporary databases, as research in this area spans diverse data types, locations, and data relationships. This chapter provides foundational information on the diversity and prevalence of databases and on how data structures support the various needs of behavioral neuroscience analysis and interpretation. The focus is on the classes of databases, data curation, and advanced applications in bioinformatics, using examples largely drawn from research efforts in behavioral neuroscience. PMID:23195119

  11. The Zebrafish Model Organism Database: new support for human disease models, mutation details, gene expression phenotypes and searching.

    PubMed

    Howe, Douglas G; Bradford, Yvonne M; Eagle, Anne; Fashena, David; Frazer, Ken; Kalita, Patrick; Mani, Prita; Martin, Ryan; Moxon, Sierra Taylor; Paddock, Holly; Pich, Christian; Ramachandran, Sridhar; Ruzicka, Leyla; Schaper, Kevin; Shao, Xiang; Singer, Amy; Toro, Sabrina; Van Slyke, Ceri; Westerfield, Monte

    2017-01-04

    The Zebrafish Model Organism Database (ZFIN; http://zfin.org) is the central resource for zebrafish (Danio rerio) genetic, genomic, phenotypic and developmental data. ZFIN curators provide expert manual curation and integration of comprehensive data involving zebrafish genes, mutants, transgenic constructs and lines, phenotypes, genotypes, gene expressions, morpholinos, TALENs, CRISPRs, antibodies, anatomical structures, models of human disease and publications. We integrate curated, directly submitted, and collaboratively generated data, making these available to the zebrafish research community. Among the vertebrate model organisms, zebrafish are superbly suited for rapid generation of sequence-targeted mutant lines, characterization of phenotypes including gene expression patterns, and generation of human disease models. The recent rapid adoption of zebrafish as human disease models is making management of these data particularly important to both the research and clinical communities. Here, we describe recent enhancements to ZFIN including use of the zebrafish experimental conditions ontology, 'Fish' records in the ZFIN database, support for gene expression phenotypes, models of human disease, mutation details at the DNA, RNA and protein levels, and updates to the ZFIN single box search.

  12. The Zebrafish Model Organism Database: new support for human disease models, mutation details, gene expression phenotypes and searching

    PubMed Central

    Howe, Douglas G.; Bradford, Yvonne M.; Eagle, Anne; Fashena, David; Frazer, Ken; Kalita, Patrick; Mani, Prita; Martin, Ryan; Moxon, Sierra Taylor; Paddock, Holly; Pich, Christian; Ramachandran, Sridhar; Ruzicka, Leyla; Schaper, Kevin; Shao, Xiang; Singer, Amy; Toro, Sabrina; Van Slyke, Ceri; Westerfield, Monte

    2017-01-01

    The Zebrafish Model Organism Database (ZFIN; http://zfin.org) is the central resource for zebrafish (Danio rerio) genetic, genomic, phenotypic and developmental data. ZFIN curators provide expert manual curation and integration of comprehensive data involving zebrafish genes, mutants, transgenic constructs and lines, phenotypes, genotypes, gene expressions, morpholinos, TALENs, CRISPRs, antibodies, anatomical structures, models of human disease and publications. We integrate curated, directly submitted, and collaboratively generated data, making these available to the zebrafish research community. Among the vertebrate model organisms, zebrafish are superbly suited for rapid generation of sequence-targeted mutant lines, characterization of phenotypes including gene expression patterns, and generation of human disease models. The recent rapid adoption of zebrafish as human disease models is making management of these data particularly important to both the research and clinical communities. Here, we describe recent enhancements to ZFIN including use of the zebrafish experimental conditions ontology, ‘Fish’ records in the ZFIN database, support for gene expression phenotypes, models of human disease, mutation details at the DNA, RNA and protein levels, and updates to the ZFIN single box search. PMID:27899582

  13. Maize Genetics and Genomics Database

    Technology Transfer Automated Retrieval System (TEKTRAN)

    The 2007 report for MaizeGDB lists the new hires who will focus on curation/outreach and the genome sequence, respectively. Currently all sequence in the database comes from a PlantGDB pipeline and is presented with deep links to external resources such as PlantGDB, Dana Farber, GenBank, the Arizona...

  14. Refining literature curated protein interactions using expert opinions.

    PubMed

    Tastan, Oznur; Qi, Yanjun; Carbonell, Jaime G; Klein-Seetharaman, Judith

    2015-01-01

    The availability of high-quality physical interaction datasets is a prerequisite for system-level analysis of interactomes and supervised models to predict protein-protein interactions (PPIs). One source is literature-curated PPI databases in which pairwise associations of proteins published in the scientific literature are deposited. However, PPIs may not be clearly labelled as physical interactions, which affects the quality of the entire dataset. In order to obtain a high-quality gold standard dataset for PPIs between human immunodeficiency virus (HIV-1) and its human host, we adopted a crowd-sourcing approach. We collected expert opinions and utilized an expectation-maximization based approach to estimate expert labeling quality. These estimates are used to infer the probability of a reported PPI actually being a direct physical interaction given the set of expert opinions. The effectiveness of our approach is demonstrated through synthetic data experiments, and a high-quality physical interaction network between HIV and human proteins is obtained. Since many literature-curated databases suffer from similar challenges, the framework described herein could be utilized in refining other databases. The curated data is available at http://www.cs.bilkent.edu.tr/~oznur.tastan/supp/psb2015/.
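    The expectation-maximization step described above can be sketched as a one-coin Dawid-Skene-style model in which each expert has a single unknown accuracy. This is a simplified illustration under that assumption, not the authors' implementation:

```python
# One-coin Dawid-Skene-style EM sketch (an illustration, not the authors'
# code): expert j has an unknown accuracy acc[j]; we alternate between the
# posterior that each reported PPI is truly physical and re-estimating acc.
def em_label_quality(votes, n_iter=50):
    """votes[i][j] is expert j's 0/1 opinion on candidate interaction i."""
    n_items, n_experts = len(votes), len(votes[0])
    acc = [0.8] * n_experts                    # initial expert accuracies
    prob = [0.0] * n_items
    for _ in range(n_iter):
        # E-step: posterior that each item is a true physical interaction
        for i, row in enumerate(votes):
            p1 = p0 = 1.0
            for j, v in enumerate(row):
                p1 *= acc[j] if v == 1 else 1.0 - acc[j]
                p0 *= acc[j] if v == 0 else 1.0 - acc[j]
            prob[i] = p1 / (p1 + p0)
        # M-step: accuracy = expected fraction of items labelled correctly
        for j in range(n_experts):
            acc[j] = sum(prob[i] if votes[i][j] == 1 else 1.0 - prob[i]
                         for i in range(n_items)) / n_items
    return prob, acc

# Three experts vote on four candidate PPIs; expert 2 often disagrees.
votes = [[1, 1, 0], [1, 1, 0], [0, 0, 1], [1, 1, 1]]
prob, acc = em_label_quality(votes)
```

    On this toy input, the EM loop drives the estimated accuracy of the two agreeing experts up and of the dissenting expert down, which in turn sharpens the per-interaction posteriors.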

  15. ChimerDB 3.0: an enhanced database for fusion genes from cancer transcriptome and literature data mining.

    PubMed

    Lee, Myunggyo; Lee, Kyubum; Yu, Namhee; Jang, Insu; Choi, Ikjung; Kim, Pora; Jang, Ye Eun; Kim, Byounggun; Kim, Sunkyu; Lee, Byungwook; Kang, Jaewoo; Lee, Sanghyuk

    2017-01-04

    Fusion genes are an important class of therapeutic targets and prognostic markers in cancer. ChimerDB is a comprehensive database of fusion genes encompassing analysis of deep sequencing data and manual curations. In this update, the database coverage was enhanced considerably by adding two new modules of The Cancer Genome Atlas (TCGA) RNA-Seq analysis and PubMed abstract mining. ChimerDB 3.0 is composed of three modules: ChimerKB, ChimerPub and ChimerSeq. ChimerKB represents a knowledgebase including 1066 manually curated fusion genes that were compiled from public resources of fusion genes with experimental evidence. ChimerPub includes 2767 fusion genes obtained from text mining of PubMed abstracts. The ChimerSeq module is designed to archive the fusion candidates from deep sequencing data. Importantly, we have analyzed RNA-Seq data of the TCGA project covering 4569 patients in 23 cancer types using two reliable programs, FusionScan and TopHat-Fusion. The new user interface supports diverse search options and graphic representation of fusion gene structure. ChimerDB 3.0 is available at http://ercsb.ewha.ac.kr/fusiongene/.

  16. ChimerDB 3.0: an enhanced database for fusion genes from cancer transcriptome and literature data mining

    PubMed Central

    Lee, Myunggyo; Lee, Kyubum; Yu, Namhee; Jang, Insu; Choi, Ikjung; Kim, Pora; Jang, Ye Eun; Kim, Byounggun; Kim, Sunkyu; Lee, Byungwook; Kang, Jaewoo; Lee, Sanghyuk

    2017-01-01

    Fusion genes are an important class of therapeutic targets and prognostic markers in cancer. ChimerDB is a comprehensive database of fusion genes encompassing analysis of deep sequencing data and manual curations. In this update, the database coverage was enhanced considerably by adding two new modules of The Cancer Genome Atlas (TCGA) RNA-Seq analysis and PubMed abstract mining. ChimerDB 3.0 is composed of three modules: ChimerKB, ChimerPub and ChimerSeq. ChimerKB represents a knowledgebase including 1066 manually curated fusion genes that were compiled from public resources of fusion genes with experimental evidence. ChimerPub includes 2767 fusion genes obtained from text mining of PubMed abstracts. The ChimerSeq module is designed to archive the fusion candidates from deep sequencing data. Importantly, we have analyzed RNA-Seq data of the TCGA project covering 4569 patients in 23 cancer types using two reliable programs, FusionScan and TopHat-Fusion. The new user interface supports diverse search options and graphic representation of fusion gene structure. ChimerDB 3.0 is available at http://ercsb.ewha.ac.kr/fusiongene/. PMID:27899563

  17. Data curation + process curation=data integration + science.

    PubMed

    Goble, Carole; Stevens, Robert; Hull, Duncan; Wolstencroft, Katy; Lopez, Rodrigo

    2008-11-01

    In bioinformatics, we are familiar with the idea of curated data as a prerequisite for data integration. We neglect, often to our cost, the curation and cataloguing of the processes that we use to integrate and analyse our data. Programmatic access to services, for data and processes, means that compositions of services can be made that represent the in silico experiments or processes that bioinformaticians perform. Data integration through workflows depends on being able to know what services exist and where to find those services. The large number of services and the operations they perform, their arbitrary naming and lack of documentation, however, mean that they can be difficult to use. The workflows themselves are composite processes that could be pooled and reused but only if they too can be found and understood. Thus appropriate curation, including semantic mark-up, would enable processes to be found, maintained and consequently used more easily. This broader view on semantic annotation is vital for full data integration that is necessary for the modern scientific analyses in biology. This article will brief the community on the current state of the art and the current challenges for process curation, both within and without the Life Sciences.

  18. TOWARDS PATHWAY CURATION THROUGH LITERATURE MINING – A CASE STUDY USING PHARMGKB

    PubMed Central

    RAVIKUMAR, K.E.; WAGHOLIKAR, KAVISHWAR B.; LIU, HONGFANG

    2014-01-01

    The creation of biological pathway knowledge bases is largely driven by manual curation based on evidence from the scientific literature. It is highly challenging for the curators to keep up with the literature. Text mining applications have been developed in the last decade to assist human curators and speed up the curation pace; the majority of them aim to identify the most relevant papers for curation, with little attempt to directly extract the pathway information from text. In this paper, we describe a rule-based literature mining system to extract pathway information from text. We evaluated the system using curated pharmacokinetic (PK) and pharmacodynamic (PD) pathways in PharmGKB. The system achieved an F-measure of 63.11% and 34.99% for entity extraction and event extraction respectively against all PubMed abstracts cited in PharmGKB. It may be possible to improve the system performance by incorporating statistical machine learning approaches. This study also helped us gain insights into the barriers towards automated event extraction from text for pathway curation. PMID:24297561

  19. dbPPT: a comprehensive database of protein phosphorylation in plants

    PubMed Central

    Cheng, Han; Deng, Wankun; Wang, Yongbo; Ren, Jian; Liu, Zexian; Xue, Yu

    2014-01-01

    As one of the most important protein post-translational modifications, reversible phosphorylation is critical for plants in regulating a variety of biological processes such as cellular metabolism, signal transduction and responses to environmental stress. Numerous efforts, especially large-scale phosphoproteome profiling studies, have been made to dissect the phosphorylation signaling in various plants, and a large number of phosphorylation events have been identified. To provide an integrated data resource for further investigations, here we present a comprehensive database, dbPPT (database of Phosphorylation site in PlanTs, at http://dbppt.biocuckoo.org), which contains experimentally identified phosphorylation sites in proteins from plants. The phosphorylation sites in dbPPT were manually curated from the literature, and datasets in other public databases were also integrated. In total, there were 82 175 phosphorylation sites in 31 012 proteins from 20 plant organisms in dbPPT, presenting a larger quantity of phosphorylation sites and a higher coverage of plant species in comparison with other databases. The proportions of residue types including serine, threonine and tyrosine were 77.99, 17.81 and 4.20%, respectively. All the phosphoproteins and phosphorylation sites in the database were critically annotated. Since the phosphorylation signaling in plants has attracted great attention recently, such a comprehensive resource of plant protein phosphorylation can be useful for the research community. Database URL: http://dbppt.biocuckoo.org PMID:25534750
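    The serine/threonine/tyrosine proportions quoted above are simple frequency counts over the curated site list. A minimal sketch of that tally, using made-up records rather than real dbPPT entries:

```python
from collections import Counter

# Hypothetical phosphosite records (protein, position, residue); these are
# illustrative stand-ins, not actual dbPPT entries.
sites = [
    ("AT1G01140", 24, "S"), ("AT1G01140", 31, "S"),
    ("AT3G08730", 296, "T"), ("OS01G0100100", 102, "Y"),
]

counts = Counter(residue for _, _, residue in sites)
total = sum(counts.values())
for residue in "STY":  # serine, threonine, tyrosine
    print(f"{residue}: {100 * counts[residue] / total:.2f}%")
```

    Run over the full 82 175-site table, the same tally would reproduce the 77.99/17.81/4.20% split reported in the abstract.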

  20. How should the completeness and quality of curated nanomaterial data be evaluated?

    PubMed

    Marchese Robinson, Richard L; Lynch, Iseult; Peijnenburg, Willie; Rumble, John; Klaessig, Fred; Marquardt, Clarissa; Rauscher, Hubert; Puzyn, Tomasz; Purian, Ronit; Åberg, Christoffer; Karcher, Sandra; Vriens, Hanne; Hoet, Peter; Hoover, Mark D; Hendren, Christine Ogilvie; Harper, Stacey L

    2016-05-21

    Nanotechnology is of increasing significance. Curation of nanomaterial data into electronic databases offers opportunities to better understand and predict nanomaterials' behaviour. This supports innovation in, and regulation of, nanotechnology. It is commonly understood that curated data need to be sufficiently complete and of sufficient quality to serve their intended purpose. However, assessing data completeness and quality is non-trivial in general and is arguably especially difficult in the nanoscience area, given its highly multidisciplinary nature. The current article, part of the Nanomaterial Data Curation Initiative series, addresses how to assess the completeness and quality of (curated) nanomaterial data. In order to address this key challenge, a variety of related issues are discussed: the meaning and importance of data completeness and quality, existing approaches to their assessment and the key challenges associated with evaluating the completeness and quality of curated nanomaterial data. Considerations which are specific to the nanoscience area and lessons which can be learned from other relevant scientific disciplines are considered. Hence, the scope of this discussion ranges from physicochemical characterisation requirements for nanomaterials and interference of nanomaterials with nanotoxicology assays to broader issues such as minimum information checklists, toxicology data quality schemes and computational approaches that facilitate evaluation of the completeness and quality of (curated) data. This discussion is informed by a literature review and a survey of key nanomaterial data curation stakeholders. Finally, drawing upon this discussion, recommendations are presented concerning the central question: how should the completeness and quality of curated nanomaterial data be evaluated?

  1. The DrugAge database of aging-related drugs.

    PubMed

    Barardo, Diogo; Thornton, Daniel; Thoppil, Harikrishnan; Walsh, Michael; Sharifi, Samim; Ferreira, Susana; Anžič, Andreja; Fernandes, Maria; Monteiro, Patrick; Grum, Tjaša; Cordeiro, Rui; De-Souza, Evandro Araújo; Budovsky, Arie; Araujo, Natali; Gruber, Jan; Petrascheck, Michael; Fraifeld, Vadim E; Zhavoronkov, Alexander; Moskalev, Alexey; de Magalhães, João Pedro

    2017-03-16

    Aging is a major worldwide medical challenge. Not surprisingly, identifying drugs and compounds that extend lifespan in model organisms is a growing research area. Here, we present DrugAge (http://genomics.senescence.info/drugs/), a curated database of lifespan-extending drugs and compounds. At the time of writing, DrugAge contains 1316 entries featuring 418 different compounds from studies across 27 model organisms, including worms, flies, yeast and mice. Data were manually curated from 324 publications. Using drug-gene interaction data, we also performed a functional enrichment analysis of targets of lifespan-extending drugs. Enriched terms include various functional categories related to glutathione and antioxidant activity, ion transport and metabolic processes. In addition, we found a modest but significant overlap between targets of lifespan-extending drugs and known aging-related genes, suggesting that some but not most aging-related pathways have been targeted pharmacologically in longevity studies. DrugAge is freely available online for the scientific community and will be an important resource for biogerontologists.

  2. Mouse Tumor Biology (MTB): a database of mouse models for human cancer.

    PubMed

    Bult, Carol J; Krupke, Debra M; Begley, Dale A; Richardson, Joel E; Neuhauser, Steven B; Sundberg, John P; Eppig, Janan T

    2015-01-01

    The Mouse Tumor Biology (MTB; http://tumor.informatics.jax.org) database is a unique online compendium of mouse models for human cancer. MTB provides online access to expertly curated information on diverse mouse models for human cancer and interfaces for searching and visualizing data associated with these models. The information in MTB is designed to facilitate the selection of strains for cancer research and is a platform for mining data on tumor development and patterns of metastases. MTB curators acquire data through manual curation of peer-reviewed scientific literature and from direct submissions by researchers. Data in MTB are also obtained from other bioinformatics resources including PathBase, the Gene Expression Omnibus and ArrayExpress. Recent enhancements to MTB improve the association between mouse models and human genes commonly mutated in a variety of cancers as identified in large-scale cancer genomics studies, provide new interfaces for exploring regions of the mouse genome associated with cancer phenotypes and incorporate data and information related to Patient-Derived Xenograft models of human cancers.

  3. Expert curation in UniProtKB: a case study on dealing with conflicting and erroneous data.

    PubMed

    Poux, Sylvain; Magrane, Michele; Arighi, Cecilia N; Bridge, Alan; O'Donovan, Claire; Laiho, Kati

    2014-01-01

    UniProtKB/Swiss-Prot provides expert curation with information extracted from literature and curator-evaluated computational analysis. As knowledgebases continue to play an increasingly important role in scientific research, a number of studies have evaluated their accuracy and revealed various errors. While some are curation errors, others are the result of incorrect information published in the scientific literature. By taking the example of sirtuin-5, a complex annotation case, we will describe the curation procedure of UniProtKB/Swiss-Prot and detail how we report conflicting information in the database. We will demonstrate the importance of collaboration between resources to ensure curation consistency and the value of contributions from the user community in helping maintain error-free resources. Database URL: www.uniprot.org.

  4. IMAT graphics manual

    NASA Technical Reports Server (NTRS)

    Stockwell, Alan E.; Cooper, Paul A.

    1991-01-01

    The Integrated Multidisciplinary Analysis Tool (IMAT) consists of a menu driven executive system coupled with a relational database which links commercial structures, structural dynamics and control codes. The IMAT graphics system, a key element of the software, provides a common interface for storing, retrieving, and displaying graphical information. The IMAT Graphics Manual shows users of commercial analysis codes (MATRIXx, MSC/NASTRAN and I-DEAS) how to use the IMAT graphics system to obtain high quality graphical output using familiar plotting procedures. The manual explains the key features of the IMAT graphics system, illustrates their use with simple step-by-step examples, and provides a reference for users who wish to take advantage of the flexibility of the software to customize their own applications.

  5. Biosafety Manual

    SciTech Connect

    King, Bruce W.

    2010-05-18

    Work with or potential exposure to biological materials in the course of performing research or other work activities at Lawrence Berkeley National Laboratory (LBNL) must be conducted in a safe, ethical, environmentally sound, and compliant manner. Work must be conducted in accordance with established biosafety standards, the principles and functions of Integrated Safety Management (ISM), this Biosafety Manual, Chapter 26 (Biosafety) of the Health and Safety Manual (PUB-3000), and applicable standards and LBNL policies. The purpose of the Biosafety Program is to protect workers, the public, agriculture, and the environment from exposure to biological agents or materials that may cause disease or other detrimental effects in humans, animals, or plants. This manual provides workers; line management; Environment, Health, and Safety (EH&S) Division staff; Institutional Biosafety Committee (IBC) members; and others with a comprehensive overview of biosafety principles, requirements from biosafety standards, and measures needed to control biological risks in work activities and facilities at LBNL.

  6. IPD: the Immuno Polymorphism Database.

    PubMed

    Robinson, James; Marsh, Steven G E

    2007-01-01

    The Immuno Polymorphism Database (IPD) (http://www.ebi.ac.uk/ipd/) is a set of specialist databases related to the study of polymorphic genes in the immune system. IPD currently consists of four databases: IPD-KIR, contains the allelic sequences of killer cell immunoglobulin-like receptors (KIRs); IPD-MHC, a database of sequences of the major histocompatibility complex (MHC) of different species; IPD-HPA, alloantigens expressed only on platelets; and IPD-ESTAB, which provides access to the European Searchable Tumour Cell Line Database, a cell bank of immunologically characterized melanoma cell lines. The IPD project works with specialist groups or nomenclature committees who provide and curate individual sections before they are submitted to IPD for online publication. The IPD project stores all the data in a set of related databases. Those sections with similar data, such as IPD-KIR and IPD-MHC, share the same database structure.

  7. Teacher Training in Curative Education.

    ERIC Educational Resources Information Center

    Juul, Kristen D.; Maier, Manfred

    1992-01-01

    This article considers the application of the philosophical and educational principles of Rudolf Steiner, called "anthroposophy," to the training of teachers and curative educators in the Waldorf schools. Special emphasis is on the Camphill movement which focuses on therapeutic schools and communities for children with special needs. (DB)

  8. Cognitive Curations of Collaborative Curricula

    ERIC Educational Resources Information Center

    Ackerman, Amy S.

    2015-01-01

    Assuming the role of learning curators, 22 graduate students (in-service teachers) addressed authentic problems (challenges) within their respective classrooms by selecting digital tools as part of implementation of interdisciplinary lesson plans. Students focused on formative assessment tools as a means to gather evidence to make improvements in…

  9. ECOTONE Manual

    DTIC Science & Technology

    2005-11-01

    ERDC/CERL CR-05-2, ECOTONE Manual, Tamara Hochstrasser and Debra Peters, November 2005, Construction Engineering Research Laboratory. Approved for public release; distribution is unlimited. … tools, and procedures to periodically assess and evaluate land condition. One tool, the ECOTONE model, was set up to simulate vegetation recovery from…

  10. Foundry Manual

    DTIC Science & Technology

    1958-01-01

    NAVSHIPS 250-0334, Foundry Manual, Bureau of Ships, Navy Department, Washington 25, D.C. (DTIC AD-A956 386; Best Available Copy). … Shipbuilding and Fleet Maintenance … PREFACE: This Manual is…

  11. DaTo: an atlas of biological databases and tools.

    PubMed

    Li, Qilin; Zhou, Yincong; Jiao, Yingmin; Zhang, Zhao; Bai, Lin; Tong, Li; Yang, Xiong; Sommer, Björn; Hofestädt, Ralf; Chen, Ming

    2016-12-18

    This work presents DaTo, a semi-automatically generated world atlas of biological databases and tools. It extracts raw information from all PubMed articles which contain exact URLs in their abstract section, followed by a manual curation of the abstract and the URL accessibility. DaTo features a user-friendly query interface, providing extensible URL-related annotations, such as the status, the location and the country of the URL. A graphical interaction network browser has also been integrated into the DaTo web interface to facilitate exploration of the relationship between different tools and databases with respect to their ontology-based semantic similarity. Using DaTo, the geographical locations, the health statuses, as well as the journal associations were evaluated with respect to the historical development of bioinformatics tools and databases over the last 20 years. We hope it will inspire the biological community to gain a systematic insight into bioinformatics resources. DaTo is accessible via http://bis.zju.edu.cn/DaTo/.
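    DaTo's first step, pulling exact URLs out of abstract text, can be sketched with a simple regular expression. The pattern and the sample abstract below are illustrative assumptions, not DaTo's actual pipeline:

```python
import re

# Rough URL matcher: http(s) URLs plus bare www. hosts; stops at whitespace
# and common trailing punctuation. Illustrative, not DaTo's real extractor.
URL_RE = re.compile(r"https?://[^\s)\]>,]+|www\.[^\s)\]>,]+")

abstract = ("We present ToolX, available at http://bis.zju.edu.cn/DaTo/ "
            "with mirrors at www.example.org/tool.")
# Strip a sentence-final period that the greedy character class picks up.
urls = [u.rstrip(".") for u in URL_RE.findall(abstract)]
print(urls)  # ['http://bis.zju.edu.cn/DaTo/', 'www.example.org/tool']
```

    After extraction, each candidate URL would still need the manual accessibility check that the DaTo curators describe.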

  12. SpliceDisease database: linking RNA splicing and disease.

    PubMed

    Wang, Juan; Zhang, Jie; Li, Kaibo; Zhao, Wei; Cui, Qinghua

    2012-01-01

    RNA splicing is an important aspect of gene regulation in many organisms. Splicing of RNA is regulated by complicated mechanisms involving numerous RNA-binding proteins and the intricate network of interactions among them. Mutations in cis-acting splicing elements or their regulatory proteins have been shown to be involved in human diseases. Defects in the pre-mRNA splicing process have emerged as a common disease-causing mechanism. Therefore, a database integrating RNA splicing and disease associations would be helpful for understanding not only the RNA splicing but also its contribution to disease. In the SpliceDisease database, we manually curated 2337 splicing mutation disease entries involving 303 genes and 370 diseases, which have been supported experimentally in 898 publications. The SpliceDisease database provides information including the change of the nucleotide in the sequence, the location of the mutation on the gene, the reference PubMed ID and a detailed description of the relationship among gene mutations, splicing defects and diseases. We standardized the names of the diseases and genes and provided links for these genes to NCBI and the UCSC genome browser for further annotation and genomic sequences. For the location of the mutation, we give direct links from the entry to the respective position/region in the genome browser. Users can freely browse, search and download the data in SpliceDisease at http://cmbi.bjmu.edu.cn/sdisease.

  13. The Lactamase Engineering Database: a critical survey of TEM sequences in public databases

    PubMed Central

    Thai, Quan Ke; Bös, Fabian; Pleiss, Jürgen

    2009-01-01

    Background TEM β-lactamases are the main cause of resistance to β-lactam antibiotics. Sequence information about TEM β-lactamases is found mainly in the NCBI peptide database and in the TEM mutation table. While the TEM mutation table is manually curated by experts in the lactamase field, who guarantee reliable and consistent information, the rapidly growing sequence and annotation information in the NCBI peptide database is sometimes inconsistent. Therefore, the Lactamase Engineering Database (LacED) has been developed to collect TEM β-lactamase sequences from the NCBI peptide database and the TEM mutation table, systematically compare sequence information and naming, identify inconsistencies, and thus provide a versatile tool for reconciliation of data and for investigation of the sequence-function relationship. Description The LacED currently provides 2399 sequence entries and 37 structure entries. Sequence information on 150 different TEM β-lactamases was derived from the TEM mutation table, which assigns a unique number to each protein classified as a TEM β-lactamase. 293 TEM-like proteins were found in the NCBI protein database, but only 113 TEM β-lactamases were common to both data sets. The 180 TEM β-lactamases from the NCBI protein database that have not yet been assigned a TEM number fall into three classes: (1) 89 proteins from microbial organisms and 35 proteins from cloning or expression vectors had a new mutation profile; (2) 55 proteins had inconsistent annotation in terms of TEM assignment or reported mutation profile; (3) 39 proteins are fragments. The LacED is web accessible and contains multisequence alignments, structure information and reconciled annotation of TEM β-lactamases. The LacED is updated weekly and supplies all data for download. Conclusion The Lactamase Engineering Database enables a systematic analysis of TEM β-lactamase sequence and annotation data from different data sources, and thus provides a valuable tool to

  14. Cancer Stem Cells Therapeutic Target Database: The First Comprehensive Database for Therapeutic Targets of Cancer Stem Cells.

    PubMed

    Hu, Xiaoqing; Cong, Ye; Luo, Huizhe Howard; Wu, Sijin; Zhao, Liyuan Eric; Liu, Quentin; Yang, Yongliang

    2017-02-01

    Cancer stem cells (CSCs) are a subpopulation of tumor cells that have strong self-renewal capabilities and may contribute to the failure of conventional cancer therapies. Hence, therapeutics homing in on CSCs represent a novel and promising approach that may eradicate malignant tumors. However, the lack of information on validated CSC targets has greatly hindered the development of CSC-directed therapeutics. Herein, we describe the Cancer Stem Cells Therapeutic Target Database (CSCTT), the first online database to provide a rich bioinformatics resource for the display, search, and analysis of structure, function, and related annotation for therapeutic targets of cancer stem cells. CSCTT contains 135 proteins that are potential CSC targets, with validated experimental evidence manually curated from the existing literature. Proteins are carefully annotated with a detailed description of protein families, biological processes, related diseases, and experimental evidence. In addition, CSCTT has compiled 213 documented therapeutic methods for cancer stem cells, including 118 small molecules and 20 biotherapy methods. The CSCTT may serve as a useful platform for the development of CSC-directed therapeutics against various malignant tumors. The CSCTT database is freely available to the public at http://www.csctt.org/. Stem Cells Translational Medicine 2017;6:331-334.

  15. The Candidate Cancer Gene Database: a database of cancer driver genes from forward genetic screens in mice.

    PubMed

    Abbott, Kenneth L; Nyre, Erik T; Abrahante, Juan; Ho, Yen-Yi; Isaksson Vogel, Rachel; Starr, Timothy K

    2015-01-01

    Identification of cancer driver gene mutations is crucial for advancing cancer therapeutics. Due to the overwhelming number of passenger mutations in the human tumor genome, it is difficult to pinpoint causative driver genes. Using transposon mutagenesis in mice, many laboratories have conducted forward genetic screens and identified thousands of candidate driver genes that are highly relevant to human cancer. Unfortunately, this information is difficult to access and utilize because it is scattered across multiple publications using different mouse genome builds and strength metrics. To improve access to these findings and facilitate meta-analyses, we developed the Candidate Cancer Gene Database (CCGD, http://ccgd-starrlab.oit.umn.edu/). The CCGD is a manually curated database containing a unified description of all identified candidate driver genes and the genomic locations of transposon common insertion sites (CISs) from all currently published transposon-based screens. To demonstrate relevance to human cancer, we performed a modified gene set enrichment analysis using KEGG pathways and show that human cancer pathways are highly enriched in the database. We also used hierarchical clustering to identify pathways enriched in blood cancers compared to solid cancers. The CCGD is a novel resource available to scientists interested in the identification of genetic drivers of cancer.
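The CCGD record mentions a modified gene set enrichment analysis against KEGG pathways but does not give the procedure. As a hedged sketch only, the core of a standard over-representation test is a hypergeometric tail probability; the gene counts below are toy values, not CCGD data:

```python
from math import comb

def enrichment_pvalue(k, M, K, n):
    """P(X >= k): probability that at least k of n candidate genes fall in a
    pathway of size K, drawing without replacement from M background genes."""
    total = comb(M, n)
    return sum(comb(K, i) * comb(M - K, n - i)
               for i in range(k, min(K, n) + 1)) / total

# Toy example: 3 of 5 candidate genes hit a 5-gene pathway
# in a 20-gene background.
p = enrichment_pvalue(3, 20, 5, 5)
print(round(p, 4))  # a small p suggests the pathway is enriched
```

In practice such p-values are computed per pathway and corrected for multiple testing before calling a pathway enriched.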

  16. Recreation Manual.

    ERIC Educational Resources Information Center

    North Dakota Farmers Union, Jamestown. Dept. of Youth Activities.

    Suggestions for recreational activities are outlined in this manual. Instructions are given for games to play in small places, home or party games, paper and pencil games, children's singing games, and dances. Ideas for crafts and special parties are also included. (SW)

  17. Boilermaking Manual.

    ERIC Educational Resources Information Center

    British Columbia Dept. of Education, Victoria.

    This manual is intended (1) to provide an information resource to supplement the formal training program for boilermaker apprentices; (2) to assist the journeyworker to build on present knowledge to increase expertise and qualify for formal accreditation in the boilermaking trade; and (3) to serve as an on-the-job reference with sound, up-to-date…

  18. Carbohydrate structure database merged from bacterial, archaeal, plant and fungal parts.

    PubMed

    Toukach, Philip V; Egorova, Ksenia S

    2016-01-04

    The Carbohydrate Structure Databases (CSDBs, http://csdb.glycoscience.ru) store structural, bibliographic, taxonomic, NMR spectroscopic, and other data on natural carbohydrates and their derivatives published in the scientific literature. The CSDB project was launched in 2005 for bacterial saccharides (as BCSDB) and later comprised two parts, the Bacterial CSDB and the Plant&Fungal CSDB, which were merged into a single CSDB in March 2015. The combined CSDB includes information on bacterial and archaeal glycans and derivatives (coverage close to complete), as well as on plant and fungal glycans and glycoconjugates (almost all structures published up to 1998). CSDB is regularly updated via manual expert annotation of original publications. Both newly annotated data and data imported from other databases are manually curated. The CSDB data are exportable in a number of modern formats, such as GlycoRDF. CSDB also provides services for simulation of ¹H, ¹³C and 2D NMR spectra of saccharides, NMR-based structure prediction, and glycan-based taxon clustering, among others.

  19. Carbohydrate structure database merged from bacterial, archaeal, plant and fungal parts

    PubMed Central

    Toukach, Philip V.; Egorova, Ksenia S.

    2016-01-01

    The Carbohydrate Structure Databases (CSDBs, http://csdb.glycoscience.ru) store structural, bibliographic, taxonomic, NMR spectroscopic, and other data on natural carbohydrates and their derivatives published in the scientific literature. The CSDB project was launched in 2005 for bacterial saccharides (as BCSDB) and later comprised two parts, the Bacterial CSDB and the Plant&Fungal CSDB, which were merged into a single CSDB in March 2015. The combined CSDB includes information on bacterial and archaeal glycans and derivatives (coverage close to complete), as well as on plant and fungal glycans and glycoconjugates (almost all structures published up to 1998). CSDB is regularly updated via manual expert annotation of original publications. Both newly annotated data and data imported from other databases are manually curated. The CSDB data are exportable in a number of modern formats, such as GlycoRDF. CSDB also provides services for simulation of ¹H, ¹³C and 2D NMR spectra of saccharides, NMR-based structure prediction, and glycan-based taxon clustering, among others. PMID:26286194

  20. Disease model curation improvements at Mouse Genome Informatics

    PubMed Central

    Bello, Susan M.; Richardson, Joel E.; Davis, Allan P.; Wiegers, Thomas C.; Mattingly, Carolyn J.; Dolan, Mary E.; Smith, Cynthia L.; Blake, Judith A.; Eppig, Janan T.

    2012-01-01

    Optimal curation of human diseases requires an ontology or structured vocabulary that contains terms familiar to end users, is robust enough to support multiple levels of annotation granularity, is limited to disease terms and is stable enough to avoid extensive reannotation following updates. At Mouse Genome Informatics (MGI), we currently use disease terms from Online Mendelian Inheritance in Man (OMIM) to curate mouse models of human disease. While OMIM provides highly detailed disease records that are familiar to many in the medical community, it lacks the structure to support multilevel annotation. To improve disease annotation at MGI, we evaluated the merged Medical Subject Headings (MeSH) and OMIM disease vocabulary created by the Comparative Toxicogenomics Database (CTD) project. Overlaying MeSH onto OMIM provides hierarchical access to broad disease terms, a feature missing from OMIM. We created an extended version of the vocabulary to meet the genetic disease-specific curation needs at MGI. Here we describe our evaluation of the CTD application, the extensions made by MGI and the strengths and weaknesses of this approach. Database URL: http://www.informatics.jax.org/ PMID:22434831

  1. The Web-Based DNA Vaccine Database DNAVaxDB and Its Usage for Rational DNA Vaccine Design.

    PubMed

    Racz, Rebecca; He, Yongqun

    2016-01-01

    A DNA vaccine is a vaccine that uses a mammalian expression vector to express one or more protein antigens and is administered in vivo to induce an adaptive immune response. Since the 1990s, a significant amount of research has been performed on DNA vaccines and the mechanisms behind them. To meet the needs of the DNA vaccine research community, we created DNAVaxDB ( http://www.violinet.org/dnavaxdb ), the first Web-based database and analysis resource of experimentally verified DNA vaccines. All the data in DNAVaxDB, which includes plasmids, antigens, vaccines, and sources, is manually curated and experimentally verified. This chapter describes the DNAVaxDB system in detail and shows how the DNA vaccine database, combined with the Vaxign vaccine design tool, can be used for the rational design of a DNA vaccine against a pathogen such as Mycobacterium bovis.

  2. The Comparative Toxicogenomics Database: update 2017

    PubMed Central

    Davis, Allan Peter; Grondin, Cynthia J.; Johnson, Robin J.; Sciaky, Daniela; King, Benjamin L.; McMorran, Roy; Wiegers, Jolene; Wiegers, Thomas C.; Mattingly, Carolyn J.

    2017-01-01

    The Comparative Toxicogenomics Database (CTD; http://ctdbase.org/) provides information about interactions between chemicals and gene products, and their relationships to diseases. Core CTD content (chemical-gene, chemical-disease and gene-disease interactions manually curated from the literature) is integrated with itself as well as with select external datasets to generate expanded networks and predict novel associations. Today, core CTD includes more than 30.5 million toxicogenomic connections relating chemicals/drugs, genes/proteins, diseases, taxa, Gene Ontology (GO) annotations, pathways, and gene interaction modules. In this update, we report a 33% increase in our core data content since 2015, describe our new exposure module (which harmonizes exposure science information with core toxicogenomic data) and introduce a novel dataset of GO-disease inferences (which identify common molecular underpinnings for seemingly unrelated pathologies). These advancements centralize and contextualize real-world chemical exposures with molecular pathways to help scientists generate testable hypotheses in an effort to understand the etiology and mechanisms underlying environmentally influenced diseases. PMID:27651457

  3. Biological databases for human research.

    PubMed

    Zou, Dong; Ma, Lina; Yu, Jun; Zhang, Zhang

    2015-02-01

    The completion of the Human Genome Project lays a foundation for systematically studying the human genome, from evolutionary history to precision medicine against diseases. With the explosive growth of biological data, an increasing number of biological databases have been developed in support of human-related research. Here we present a collection of human-related biological databases and provide a mini-review that classifies them into different categories according to their data types. As human-related databases continue to grow not only in count but also in volume, challenges lie ahead in big data storage, processing, exchange and curation.

  4. Clean and Cold Sample Curation

    NASA Technical Reports Server (NTRS)

    Allen, C. C.; Agee, C. B.; Beer, R.; Cooper, B. L.

    2000-01-01

    Curation of Mars samples includes both samples that are returned to Earth, and samples that are collected, examined, and archived on Mars. Both kinds of curation operations will require careful planning to ensure that the samples are not contaminated by the instruments that are used to collect and contain them. In both cases, sample examination and subdivision must take place in an environment that is organically, inorganically, and biologically clean. Some samples will need to be prepared for analysis under ultra-clean or cryogenic conditions. Inorganic and biological cleanliness are achievable separately by cleanroom and biosafety lab techniques. Organic cleanliness to the <50 ng/sq cm level requires material control and sorbent removal - techniques being applied in our Class 10 cleanrooms and sample processing gloveboxes.

  5. Tools and Databases of the KOMICS Web Portal for Preprocessing, Mining, and Dissemination of Metabolomics Data

    PubMed Central

    Enomoto, Mitsuo; Morishita, Yoshihiko; Kurabayashi, Atsushi; Iijima, Yoko; Ogata, Yoshiyuki; Nakajima, Daisuke; Suzuki, Hideyuki; Shibata, Daisuke

    2014-01-01

    A metabolome—the collection of comprehensive quantitative data on metabolites in an organism—has been increasingly utilized for applications such as data-intensive systems biology, disease diagnostics, biomarker discovery, and assessment of food quality. A considerable number of tools and databases have been developed to date for the analysis of data generated by various combinations of chromatography and mass spectrometry. We report here a web portal named KOMICS (The Kazusa Metabolomics Portal), where the tools and databases that we developed are available for free to academic users. KOMICS includes the tools and databases for preprocessing, mining, visualization, and publication of metabolomics data. Improvements in the annotation of unknown metabolites and dissemination of comprehensive metabolomic data are the primary aims behind the development of this portal. For this purpose, PowerGet and FragmentAlign include a manual curation function for the results of metabolite feature alignments. A metadata-specific wiki-based database, Metabolonote, functions as a hub of web resources related to the submitters' work. This feature is expected to increase citation of the submitters' work, thereby promoting data publication. As an example of the practical use of KOMICS, a workflow for a study on Jatropha curcas is presented. The tools and databases available at KOMICS should contribute to enhanced production, interpretation, and utilization of metabolomic Big Data. PMID:24949426

  6. Tools and databases of the KOMICS web portal for preprocessing, mining, and dissemination of metabolomics data.

    PubMed

    Sakurai, Nozomu; Ara, Takeshi; Enomoto, Mitsuo; Motegi, Takeshi; Morishita, Yoshihiko; Kurabayashi, Atsushi; Iijima, Yoko; Ogata, Yoshiyuki; Nakajima, Daisuke; Suzuki, Hideyuki; Shibata, Daisuke

    2014-01-01

    A metabolome--the collection of comprehensive quantitative data on metabolites in an organism--has been increasingly utilized for applications such as data-intensive systems biology, disease diagnostics, biomarker discovery, and assessment of food quality. A considerable number of tools and databases have been developed to date for the analysis of data generated by various combinations of chromatography and mass spectrometry. We report here a web portal named KOMICS (The Kazusa Metabolomics Portal), where the tools and databases that we developed are available for free to academic users. KOMICS includes the tools and databases for preprocessing, mining, visualization, and publication of metabolomics data. Improvements in the annotation of unknown metabolites and dissemination of comprehensive metabolomic data are the primary aims behind the development of this portal. For this purpose, PowerGet and FragmentAlign include a manual curation function for the results of metabolite feature alignments. A metadata-specific wiki-based database, Metabolonote, functions as a hub of web resources related to the submitters' work. This feature is expected to increase citation of the submitters' work, thereby promoting data publication. As an example of the practical use of KOMICS, a workflow for a study on Jatropha curcas is presented. The tools and databases available at KOMICS should contribute to enhanced production, interpretation, and utilization of metabolomic Big Data.

  7. FR database 1.0: a resource focused on fruit development and ripening

    PubMed Central

    Yue, Junyang; Ma, Xiaojing; Ban, Rongjun; Huang, Qianli; Wang, Wenjie; Liu, Jia; Liu, Yongsheng

    2015-01-01

    Fruits represent a unique growing period in the life cycle of higher plants. They provide essential nutrients and have beneficial effects on human health. Characterizing the genes involved in fruit development and ripening is fundamental to understanding this biological process and to improving horticultural crops. Although numerous characterized genes participate in regulating fruit development and ripening at different stages, no dedicated bioinformatic resource for fruit development and ripening has been available. In this study, we have developed such a database, FR database 1.0, by manual curation of 38 423 articles published before 1 April 2014 and by integrating protein interactomes and several transcriptome datasets. It provides detailed information for 904 genes derived from 53 organisms reported to participate in fleshy fruit development and ripening. Genes from climacteric and non-climacteric fruits are also annotated; several interesting Gene Ontology (GO) terms are enriched for these two gene sets, and seven ethylene-related GO terms are found only in the climacteric fruit group. Furthermore, protein–protein interaction analysis integrating information from FR database suggests a functional network that affects fleshy fruit size formation. Collectively, FR database will be a valuable platform for comprehensive understanding and future experiments in fruit biology. Database URL: http://www.fruitech.org/ PMID:25725058

  8. Sharing and community curation of mass spectrometry data with GNPS

    PubMed Central

    Nguyen, Don Duy; Watrous, Jeramie; Kapono, Clifford A; Luzzatto-Knaan, Tal; Porto, Carla; Bouslimani, Amina; Melnik, Alexey V; Meehan, Michael J; Liu, Wei-Ting; Crüsemann, Max; Boudreau, Paul D; Esquenazi, Eduardo; Sandoval-Calderón, Mario; Kersten, Roland D; Pace, Laura A; Quinn, Robert A; Duncan, Katherine R; Hsu, Cheng-Chih; Floros, Dimitrios J; Gavilan, Ronnie G; Kleigrewe, Karin; Northen, Trent; Dutton, Rachel J; Parrot, Delphine; Carlson, Erin E; Aigle, Bertrand; Michelsen, Charlotte F; Jelsbak, Lars; Sohlenkamp, Christian; Pevzner, Pavel; Edlund, Anna; McLean, Jeffrey; Piel, Jörn; Murphy, Brian T; Gerwick, Lena; Liaw, Chih-Chuang; Yang, Yu-Liang; Humpf, Hans-Ulrich; Maansson, Maria; Keyzers, Robert A; Sims, Amy C; Johnson, Andrew R.; Sidebottom, Ashley M; Sedio, Brian E; Klitgaard, Andreas; Larson, Charles B; Boya P., Cristopher A; Torres-Mendoza, Daniel; Gonzalez, David J; Silva, Denise B; Marques, Lucas M; Demarque, Daniel P; Pociute, Egle; O'Neill, Ellis C; Briand, Enora; Helfrich, Eric J. N.; Granatosky, Eve A; Glukhov, Evgenia; Ryffel, Florian; Houson, Hailey; Mohimani, Hosein; Kharbush, Jenan J; Zeng, Yi; Vorholt, Julia A; Kurita, Kenji L; Charusanti, Pep; McPhail, Kerry L; Nielsen, Kristian Fog; Vuong, Lisa; Elfeki, Maryam; Traxler, Matthew F; Engene, Niclas; Koyama, Nobuhiro; Vining, Oliver B; Baric, Ralph; Silva, Ricardo R; Mascuch, Samantha J; Tomasi, Sophie; Jenkins, Stefan; Macherla, Venkat; Hoffman, Thomas; Agarwal, Vinayak; Williams, Philip G; Dai, Jingqui; Neupane, Ram; Gurr, Joshua; Rodríguez, Andrés M. C.; Lamsa, Anne; Zhang, Chen; Dorrestein, Kathleen; Duggan, Brendan M; Almaliti, Jehad; Allard, Pierre-Marie; Phapale, Prasad; Nothias, Louis-Felix; Alexandrov, Theodore; Litaudon, Marc; Wolfender, Jean-Luc; Kyle, Jennifer E; Metz, Thomas O; Peryea, Tyler; Nguyen, Dac-Trung; VanLeer, Danielle; Shinn, Paul; Jadhav, Ajit; Müller, Rolf; Waters, Katrina M; Shi, Wenyuan; Liu, Xueting; Zhang, Lixin; Knight, Rob; Jensen, Paul R; Palsson, Bernhard O; Pogliano, Kit; Linington, Roger G; Gutiérrez, Marcelino; Lopes, Norberto P; Gerwick, William H; Moore, Bradley S; Dorrestein, Pieter C; Bandeira, Nuno

    2017-01-01

    The potential of the diverse chemistries present in natural products (NP) for biotechnology and medicine remains untapped because NP databases are not searchable with raw data and the NP community has no way to share data other than in published papers. Although mass spectrometry techniques are well suited to high-throughput characterization of natural products, there is a pressing need for an infrastructure to enable sharing and curation of data. We present Global Natural Products Social molecular networking (GNPS, http://gnps.ucsd.edu), an open-access knowledge base for community-wide organization and sharing of raw, processed or identified tandem mass spectrometry (MS/MS) data. In GNPS, crowdsourced curation of freely available community-wide reference MS libraries will underpin improved annotations. Data-driven social networking should facilitate identification of spectra and foster collaborations. We also introduce the concept of ‘living data’ through continuous reanalysis of deposited data. PMID:27504778
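GNPS builds its molecular networks by scoring pairwise spectral similarity; GNPS itself uses a modified cosine score that also tolerates precursor m/z shifts. The plain cosine on intensity vectors binned to a shared m/z grid, shown below with made-up peak intensities, conveys the basic idea only:

```python
# Illustrative sketch, not the GNPS implementation: plain cosine similarity
# between two MS/MS spectra represented as intensity vectors on a common
# m/z grid. Intensities below are invented.
from math import sqrt

def cosine_score(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = sqrt(sum(x * x for x in a))
    nb = sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

spec1 = [0.0, 1.0, 0.5, 0.0]
spec2 = [0.0, 0.8, 0.6, 0.1]
print(round(cosine_score(spec1, spec2), 3))  # 0.979
```

Spectrum pairs whose score exceeds a chosen cutoff become edges in the molecular network, clustering related chemistries together.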

  9. National Radiobiology Archives Distributed Access user's manual

    SciTech Connect

    Watson, C.; Smith, S. ); Prather, J. )

    1991-11-01

    This User's Manual describes installation and use of the National Radiobiology Archives (NRA) Distributed Access package. The package consists of a distributed subset of information representative of the NRA databases and database access software which provide an introduction to the scope and style of the NRA Information Systems.

  10. SUBTLE Manual.

    DTIC Science & Technology

    1981-01-01

    behavior). The syntax of SUBTLE is a prefix version of the language of predicate calculus and is identical to that used by MRS [Genesereth, Greiner, Smith…detect incompleteness or inconsistency. For further information on SHAM, the reader should see [Grinberg and Lark]. Chapter 2 of this manual describes…execution continues at the next sequential statement. (CONDITION <pred> <then>) which is identical to the first form except that there is no <else> function

  11. MET network in PubMed: a text-mined network visualization and curation system

    PubMed Central

    Dai, Hong-Jie; Su, Chu-Hsien; Lai, Po-Ting; Huang, Ming-Siang; Jonnagaddala, Jitendra; Rose Jue, Toni; Rao, Shruti; Chou, Hui-Jou; Milacic, Marija; Singh, Onkar; Syed-Abdul, Shabbir; Hsu, Wen-Lian

    2016-01-01

    Metastasis is the dissemination of a cancer/tumor from one organ to another, and it is the most dangerous stage of cancer progression, causing more than 90% of cancer deaths. Improving the understanding of the complicated cellular mechanisms underlying metastasis requires investigations of the signaling pathways. To this end, we developed a METastasis (MET) network visualization and curation tool to assist metastasis researchers in retrieving network information of interest while browsing through the large volume of studies in PubMed. MET can recognize relations among genes, cancers, tissues and organs of metastasis mentioned in the literature through text-mining techniques, and then produce a visualization of all mined relations in a metastasis network. To facilitate the curation process, MET was developed as a browser extension that allows curators to review and edit concepts and relations related to metastasis directly in PubMed. PubMed users can also view the metastatic networks integrated from the large collection of research papers directly through MET. For the BioCreative 2015 interactive track (IAT), a curation task was proposed to curate metastatic networks among PubMed abstracts. Six curators participated in the proposed task and a post-IAT task, curating 963 unique metastatic relations from 174 PubMed abstracts using MET. Database URL: http://btm.tmu.edu.tw/metastasisway PMID:27242035

  12. MET network in PubMed: a text-mined network visualization and curation system.

    PubMed

    Dai, Hong-Jie; Su, Chu-Hsien; Lai, Po-Ting; Huang, Ming-Siang; Jonnagaddala, Jitendra; Rose Jue, Toni; Rao, Shruti; Chou, Hui-Jou; Milacic, Marija; Singh, Onkar; Syed-Abdul, Shabbir; Hsu, Wen-Lian

    2016-01-01

    Metastasis is the dissemination of a cancer/tumor from one organ to another, and it is the most dangerous stage of cancer progression, causing more than 90% of cancer deaths. Improving the understanding of the complicated cellular mechanisms underlying metastasis requires investigations of the signaling pathways. To this end, we developed a METastasis (MET) network visualization and curation tool to assist metastasis researchers in retrieving network information of interest while browsing through the large volume of studies in PubMed. MET can recognize relations among genes, cancers, tissues and organs of metastasis mentioned in the literature through text-mining techniques, and then produce a visualization of all mined relations in a metastasis network. To facilitate the curation process, MET was developed as a browser extension that allows curators to review and edit concepts and relations related to metastasis directly in PubMed. PubMed users can also view the metastatic networks integrated from the large collection of research papers directly through MET. For the BioCreative 2015 interactive track (IAT), a curation task was proposed to curate metastatic networks among PubMed abstracts. Six curators participated in the proposed task and a post-IAT task, curating 963 unique metastatic relations from 174 PubMed abstracts using MET. Database URL: http://btm.tmu.edu.tw/metastasisway.

  13. The GOA database: Gene Ontology annotation updates for 2015

    PubMed Central

    Huntley, Rachael P.; Sawford, Tony; Mutowo-Meullenet, Prudence; Shypitsyna, Aleksandra; Bonilla, Carlos; Martin, Maria J.; O'Donovan, Claire

    2015-01-01

    The Gene Ontology Annotation (GOA) resource (http://www.ebi.ac.uk/GOA) provides evidence-based Gene Ontology (GO) annotations to proteins in the UniProt Knowledgebase (UniProtKB). Manual annotations provided by UniProt curators are supplemented by manual and automatic annotations from model organism databases and specialist annotation groups. GOA currently supplies 368 million GO annotations to almost 54 million proteins in more than 480 000 taxonomic groups, five times the number of proteins it covered 4 years ago. As a member of the GO Consortium, we adhere to the most up-to-date Consortium-agreed annotation guidelines via quality control checks that ensure the GOA resource supplies high-quality functional information for proteins from a wide range of species. Annotations from GOA are freely available and are accessible through a powerful web browser as well as a variety of annotation file formats. PMID:25378336
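GOA distributes its annotations in the GO Consortium's tab-separated GAF format, whose header lines start with "!". A minimal reader for the first seven standard GAF 2.x columns might look as follows; the example line is invented for illustration, not a real GOA record:

```python
# Minimal GAF 2.x line reader (sketch). Only the first seven standard
# columns are extracted; real GAF files carry additional columns
# (aspect, taxon, date, assigning database, etc.).
GAF_COLS = ["db", "db_object_id", "symbol", "qualifier",
            "go_id", "reference", "evidence_code"]

def parse_gaf_line(line):
    """Return a dict for an annotation line, or None for a header line."""
    if line.startswith("!"):
        return None  # comment/header line
    fields = line.rstrip("\n").split("\t")
    return dict(zip(GAF_COLS, fields[:7]))

# Invented example line (tab-separated; trailing columns elided)
example = "UniProtKB\tP12345\tGENE1\t\tGO:0005515\tPMID:12345678\tIPI\t..."
rec = parse_gaf_line(example)
print(rec["go_id"])  # GO:0005515
```

Iterating such a reader over a downloaded GOA GAF file yields one record per protein-GO association, which is how the annotation file formats mentioned above are typically consumed.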

  14. The Listeria monocytogenes strain 10403S BioCyc database.

    PubMed

    Orsi, Renato H; Bergholz, Teresa M; Wiedmann, Martin; Boor, Kathryn J

    2015-01-01

    Listeria monocytogenes is a food-borne pathogen of humans and other animals. Its striking ability to survive several stresses usually used for food preservation makes L. monocytogenes one of the biggest concerns to the food industry, while the high mortality of listeriosis in specific groups of humans makes it a great concern for public health. Previous studies have shown that a regulatory network involving alternative sigma (σ) factors and transcription factors is pivotal to stress survival. However, few studies have evaluated the metabolic networks controlled by these regulatory mechanisms. The L. monocytogenes BioCyc database uses strain 10403S as a model. Computer-generated initial annotation for all genes allowed for identification, annotation and display of predicted reactions and pathways carried out by a single cell. Further ongoing manual curation based on published data, as well as database mining for selected genes, allowed more refined annotation of functions, which in turn allowed annotation of new pathways and fine-tuning of previously defined pathways into more L. monocytogenes-specific ones. Using RNA-Seq data, several transcription start sites and promoter regions were mapped to the 10403S genome and annotated within the database. Additionally, the identification of promoter regions and a comprehensive review of available literature allowed the annotation of several regulatory interactions involving σ factors and transcription factors. The L. monocytogenes 10403S BioCyc database is a new resource for researchers studying Listeria and related organisms. It allows users to (i) have a comprehensive view of all reactions and pathways predicted to take place within the cell in the cellular overview, as well as to (ii) upload their own data, such as differential expression data, to visualize the data in the scope of predicted pathways and regulatory networks and to carry on enrichment analyses using several different annotations

  15. The Genomes OnLine Database (GOLD) v.5: a metadata management system based on a four level (meta)genome project classification.

    PubMed

    Reddy, T B K; Thomas, Alex D; Stamatis, Dimitri; Bertsch, Jon; Isbandi, Michelle; Jansson, Jakob; Mallajosyula, Jyothi; Pagani, Ioanna; Lobos, Elizabeth A; Kyrpides, Nikos C

    2015-01-01

    The Genomes OnLine Database (GOLD; http://www.genomesonline.org) is a comprehensive online resource to catalog and monitor genetic studies worldwide. GOLD provides up-to-date status on complete and ongoing sequencing projects along with a broad array of curated metadata. Here we report version 5 (v.5) of the database. The newly designed database schema and web user interface support several new features including the implementation of a four level (meta)genome project classification system and a simplified, intuitive web interface to access reports and launch search tools. The database currently hosts information for about 19,200 studies, 56,000 Biosamples, 56,000 sequencing projects and 39,400 analysis projects. More than just a catalog of worldwide genome projects, GOLD is a manually curated, quality-controlled metadata warehouse. The problems encountered in integrating disparate and varying quality data into GOLD are briefly highlighted. GOLD fully supports and follows the Genomic Standards Consortium (GSC) Minimum Information standards.

  16. The Genomes OnLine Database (GOLD) v.5: a metadata management system based on a four level (meta)genome project classification

    PubMed Central

    Reddy, T.B.K.; Thomas, Alex D.; Stamatis, Dimitri; Bertsch, Jon; Isbandi, Michelle; Jansson, Jakob; Mallajosyula, Jyothi; Pagani, Ioanna; Lobos, Elizabeth A.; Kyrpides, Nikos C.

    2015-01-01

    The Genomes OnLine Database (GOLD; http://www.genomesonline.org) is a comprehensive online resource to catalog and monitor genetic studies worldwide. GOLD provides up-to-date status on complete and ongoing sequencing projects along with a broad array of curated metadata. Here we report version 5 (v.5) of the database. The newly designed database schema and web user interface support several new features including the implementation of a four level (meta)genome project classification system and a simplified intuitive web interface to access reports and launch search tools. The database currently hosts information for about 19 200 studies, 56 000 Biosamples, 56 000 sequencing projects and 39 400 analysis projects. More than just a catalog of worldwide genome projects, GOLD is a manually curated, quality-controlled metadata warehouse. The problems encountered in integrating disparate and varying quality data into GOLD are briefly highlighted. GOLD fully supports and follows the Genomic Standards Consortium (GSC) Minimum Information standards. PMID:25348402

  17. The Genomes OnLine Database (GOLD) v.5: a metadata management system based on a four level (meta)genome project classification

    SciTech Connect

    Reddy, Tatiparthi B. K.; Thomas, Alex D.; Stamatis, Dimitri; Bertsch, Jon; Isbandi, Michelle; Jansson, Jakob; Mallajosyula, Jyothi; Pagani, Ioanna; Lobos, Elizabeth A.; Kyrpides, Nikos C.

    2014-10-27

    The Genomes OnLine Database (GOLD; http://www.genomesonline.org) is a comprehensive online resource to catalog and monitor genetic studies worldwide. GOLD provides up-to-date status on complete and ongoing sequencing projects along with a broad array of curated metadata. Within this paper, we report version 5 (v.5) of the database. The newly designed database schema and web user interface support several new features including the implementation of a four level (meta)genome project classification system and a simplified intuitive web interface to access reports and launch search tools. The database currently hosts information for about 19 200 studies, 56 000 Biosamples, 56 000 sequencing projects and 39 400 analysis projects. More than just a catalog of worldwide genome projects, GOLD is a manually curated, quality-controlled metadata warehouse. The problems encountered in integrating disparate and varying quality data into GOLD are briefly highlighted. Lastly, GOLD fully supports and follows the Genomic Standards Consortium (GSC) Minimum Information standards.
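The four-level (meta)genome project classification (study → biosample → sequencing project → analysis project) can be sketched as a simple nested data model; the field names and GOLD-style identifiers below are illustrative assumptions, not GOLD's actual schema:

```python
# Hypothetical sketch of GOLD's four-level project hierarchy as plain
# dataclasses. Identifiers and fields are invented for illustration.
from dataclasses import dataclass, field

@dataclass
class AnalysisProject:
    gold_id: str
    analysis_type: str      # e.g. "Metagenome Analysis"

@dataclass
class SequencingProject:
    gold_id: str
    status: str             # e.g. "Complete" or "Incomplete"
    analyses: list = field(default_factory=list)

@dataclass
class Biosample:
    gold_id: str
    projects: list = field(default_factory=list)

@dataclass
class Study:
    gold_id: str
    name: str
    biosamples: list = field(default_factory=list)

# Build one study with one biosample, sequencing project and analysis.
study = Study("Gs0000001", "Hypothetical soil metagenome study")
sample = Biosample("Gb0000001")
seq = SequencingProject("Gp0000001", "Complete")
seq.analyses.append(AnalysisProject("Ga0000001", "Metagenome Analysis"))
sample.projects.append(seq)
study.biosamples.append(sample)

# Walk the hierarchy from study down to analysis projects.
n_analyses = sum(len(p.analyses) for b in study.biosamples for p in b.projects)
print(n_analyses)
```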

  18. HistoneDB 2.0: a histone database with variants—an integrated resource to explore histones and their variants

    PubMed Central

    Draizen, Eli J.; Shaytan, Alexey K.; Mariño-Ramírez, Leonardo; Talbert, Paul B.; Landsman, David; Panchenko, Anna R.

    2016-01-01

    Compaction of DNA into chromatin is a characteristic feature of eukaryotic organisms. The core (H2A, H2B, H3, H4) and linker (H1) histone proteins are responsible for this compaction through the formation of nucleosomes and higher order chromatin aggregates. Moreover, histones are intricately involved in chromatin functioning and provide a means for genome dynamic regulation through specific histone variants and histone post-translational modifications. ‘HistoneDB 2.0 – with variants’ is a comprehensive database of histone protein sequences, classified by histone types and variants. All entries in the database are supplemented by rich sequence and structural annotations with many interactive tools to explore and compare sequences of different variants from various organisms. The core of the database is a manually curated set of histone sequences grouped into 30 different variant subsets with variant-specific annotations. The curated set is supplemented by an automatically extracted set of histone sequences from the non-redundant protein database using algorithms trained on the curated set. The interactive web site supports various searching strategies in both datasets: browsing of phylogenetic trees; on-demand generation of multiple sequence alignments with feature annotations; classification of histone-like sequences and browsing of the taxonomic diversity for every histone variant. HistoneDB 2.0 is a resource for the interactive comparative analysis of histone protein sequences and their implications for chromatin function. Database URL: http://www.ncbi.nlm.nih.gov/projects/HistoneDB2.0 PMID:26989147
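The automatic extraction of histone-like sequences could, in its simplest conceivable form, assign a query to the curated variant with the best percent identity; the sequences below are toy stand-ins, and the trained algorithms HistoneDB actually uses are far more sophisticated:

```python
# Toy nearest-neighbor classification of a query against a curated set
# by percent identity. All sequences are invented fragments.
def identity(a, b):
    n = min(len(a), len(b))
    return sum(x == y for x, y in zip(a, b)) / n

curated = {
    "H2A": "SGRGKQGGKARAKAK",  # hypothetical curated fragment
    "H4":  "SGRGKGGKGLGKGGA",  # hypothetical curated fragment
}
query = "SGRGKQGGKARAKSK"

# Pick the curated variant with the highest identity to the query.
best = max(curated, key=lambda v: identity(curated[v], query))
print(best)
```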

  19. The Human Intermediate Filament Database: comprehensive information on a gene family involved in many human diseases.

    PubMed

    Szeverenyi, Ildiko; Cassidy, Andrew J; Chung, Cheuk Wang; Lee, Bernett T K; Common, John E A; Ogg, Stephen C; Chen, Huijia; Sim, Shu Yin; Goh, Walter L P; Ng, Kee Woei; Simpson, John A; Chee, Li Lian; Eng, Goi Hui; Li, Bin; Lunny, Declan P; Chuon, Danny; Venkatesh, Aparna; Khoo, Kian Hoe; McLean, W H Irwin; Lim, Yun Ping; Lane, E Birgitte

    2008-03-01

    We describe a revised and expanded database on human intermediate filament proteins, a major component of the eukaryotic cytoskeleton. The family of 70 intermediate filament genes (including those encoding keratins, desmins, and lamins) is now known to be associated with a wide range of diverse diseases, at least 72 distinct human pathologies, including skin blistering, muscular dystrophy, cardiomyopathy, premature aging syndromes, neurodegenerative disorders, and cataract. To date, the database catalogs 1,274 manually curated pathogenic sequence variants and 170 allelic variants in intermediate filament genes from over 459 peer-reviewed research articles. Unrelated cases were collected from all six sequence homology groups and the sequence variations were described at cDNA and protein levels with links to the related diseases and reference articles. The mutations and polymorphisms are presented in parallel with data on protein structure, gene, and chromosomal location and basic information on associated diseases. Detailed statistics relating to the variant records in the database are displayed by homology group, mutation type, affected domain, associated diseases, and nucleic and amino acid substitutions. Multiple sequence alignment algorithms can be run from queries to determine DNA or protein sequence conservation. Literature sources can be interrogated within the database and external links are provided to public databases. The database is freely and publicly accessible online at www.interfil.org (last accessed 13 September 2007). Users can query the database by various keywords and the search results can be downloaded. It is anticipated that the Human Intermediate Filament Database (HIFD) will provide a useful resource to study human genome variations for basic scientists, clinicians, and students alike.

  20. Annotation error in public databases: misannotation of molecular function in enzyme superfamilies.

    PubMed

    Schnoes, Alexandra M; Brown, Shoshana D; Dodevski, Igor; Babbitt, Patricia C

    2009-12-01

    Due to the rapid release of new data from genome sequencing projects, the majority of protein sequences in public databases have not been experimentally characterized; rather, sequences are annotated using computational analysis. The level of misannotation and the types of misannotation in large public databases are currently unknown and have not been analyzed in depth. We have investigated the misannotation levels for molecular function in four public protein sequence databases (UniProtKB/Swiss-Prot, GenBank NR, UniProtKB/TrEMBL, and KEGG) for a model set of 37 enzyme families for which extensive experimental information is available. The manually curated database Swiss-Prot shows the lowest annotation error levels (close to 0% for most families); the two other protein sequence databases (GenBank NR and TrEMBL) and the protein sequences in the KEGG pathways database exhibit similar and surprisingly high levels of misannotation that average 5%-63% across the six superfamilies studied. For 10 of the 37 families examined, the level of misannotation in one or more of these databases is >80%. Examination of the NR database over time shows that misannotation has increased from 1993 to 2005. The types of misannotation that were found fall into several categories, most associated with "overprediction" of molecular function. These results suggest that misannotation in enzyme superfamilies containing multiple families that catalyze different reactions is a larger problem than has been recognized. Strategies are suggested for addressing some of the systematic problems contributing to these high levels of misannotation.
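The study's core measurement, comparing database annotations against experimentally supported functions, amounts to a per-entry mismatch rate; a sketch with invented records, not the study's data:

```python
# Estimate a misannotation rate by comparing database function labels
# against an experimentally supported "gold standard". Records invented.
gold = {"P1": "enolase", "P2": "mandelate racemase", "P3": "enolase"}
db   = {"P1": "enolase", "P2": "enolase",            "P3": "enolase"}

wrong = sum(1 for protein, fn in db.items() if gold[protein] != fn)
rate = wrong / len(db)
print(rate)
```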

  1. HIVsirDB: A Database of HIV Inhibiting siRNAs

    PubMed Central

    Thakur, Nishant; Sharma, Arun; Raghava, Gajendra P. S.; Kumar, Manoj

    2011-01-01

    Background Human immunodeficiency virus (HIV) is responsible for millions of deaths every year. The current treatment involves the use of multiple antiretroviral agents that may harm patients due to their toxic nature. RNA interference (RNAi), a potent candidate for the future treatment of HIV, uses short interfering RNA (siRNA/shRNA) to silence HIV genes. In this study, attempts have been made to create HIVsirDB, a database of siRNAs responsible for silencing HIV genes. Description HIVsirDB is a manually curated database of HIV-inhibiting siRNAs that provides comprehensive information about each siRNA or shRNA. Information was collected and compiled from the literature and public resources. This database contains around 750 siRNAs, including 75 partially complementary siRNAs differing by one or more bases from the target sites, and over 100 escape mutant sequences. The HIVsirDB structure contains sixteen fields including siRNA sequence, HIV strain, targeted genome region, efficacy and conservation of target sequences. To facilitate users, many tools have been integrated in this database, including: (i) siRNAmap for mapping siRNAs on a target sequence, (ii) HIVsirblast for BLAST searches against the database, and (iii) siRNAalign for aligning siRNAs. Conclusion HIVsirDB is a freely accessible database of siRNAs which can silence or degrade HIV genes. It covers 26 types of HIV strains and 28 cell types. This database will be very useful for developing models for predicting the efficacy of HIV-inhibiting siRNAs. In summary, this is a useful resource for researchers working in the field of siRNA-based HIV therapy. The HIVsirDB database is accessible at http://crdd.osdd.net/raghava/hivsir/. PMID:22022467
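Mapping an siRNA onto its target, as a tool like siRNAmap does, reduces conceptually to locating the guide strand's reverse complement in the target; the sequences below are invented for illustration, not real HIV sequences:

```python
# Locate an siRNA guide on a target RNA by exact antisense match.
# Both sequences are invented examples.
def revcomp(seq):
    """Reverse complement of an RNA string."""
    return seq.translate(str.maketrans("ACGU", "UGCA"))[::-1]

target = "AUGGGUGCGAGAGCGUCAGUAUUAAGCGGG"
sirna_guide = "UGACGCUCUCGCACCCAU"  # antisense to part of the target

# The guide binds where its reverse complement appears in the target.
pos = target.find(revcomp(sirna_guide))
print(pos)
```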

  2. ArachnoServer: a database of protein toxins from spiders

    PubMed Central

    2009-01-01

    Background Venomous animals incapacitate their prey using complex venoms that can contain hundreds of unique protein toxins. The realisation that many of these toxins may have pharmaceutical and insecticidal potential due to their remarkable potency and selectivity against target receptors has led to an explosion in the number of new toxins being discovered and characterised. From an evolutionary perspective, spiders are the most successful venomous animals and they maintain by far the largest pool of toxic peptides. However, at present, there are no databases dedicated to spider toxins and hence it is difficult to realise their full potential as drugs, insecticides, and pharmacological probes. Description We have developed ArachnoServer, a manually curated database that provides detailed information about proteinaceous toxins from spiders. Key features of ArachnoServer include a new molecular target ontology designed especially for venom toxins, the most up-to-date taxonomic information available, and a powerful advanced search interface. Toxin information can be browsed through dynamic trees, and each toxin has a dedicated page summarising all available information about its sequence, structure, and biological activity. ArachnoServer currently manages 567 protein sequences, 334 nucleic acid sequences, and 51 protein structures. Conclusion ArachnoServer provides a single source of high-quality information about proteinaceous spider toxins that will be an invaluable resource for pharmacologists, neuroscientists, toxinologists, medicinal chemists, ion channel scientists, clinicians, and structural biologists. ArachnoServer is available online at http://www.arachnoserver.org. PMID:19674480

  3. The human DEPhOsphorylation database DEPOD: a 2015 update

    PubMed Central

    Duan, Guangyou; Li, Xun; Köhn, Maja

    2015-01-01

    Phosphatases are crucial enzymes in health and disease, but the knowledge of their biological roles is still limited. Identifying substrates continues to be a great challenge. To support the research on phosphatase–kinase–substrate networks we present here an update on the human DEPhOsphorylation Database: DEPOD (http://www.depod.org or http://www.koehn.embl.de/depod). DEPOD is a manually curated open access database providing human phosphatases, their protein and non-protein substrates, dephosphorylation sites, pathway involvements and external links to kinases and small molecule modulators. All internal data are fully searchable including a BLAST application. Since the first release, more human phosphatases and substrates, their associated signaling pathways (also from new sources), and interacting proteins for all phosphatases and protein substrates have been added into DEPOD. The user interface has been further optimized; for example, the interactive human phosphatase–substrate network contains now a ‘highlight node’ function for phosphatases, which includes the visualization of neighbors in the network. PMID:25332398

  4. The human DEPhOsphorylation database DEPOD: a 2015 update.

    PubMed

    Duan, Guangyou; Li, Xun; Köhn, Maja

    2015-01-01

    Phosphatases are crucial enzymes in health and disease, but the knowledge of their biological roles is still limited. Identifying substrates continues to be a great challenge. To support the research on phosphatase-kinase-substrate networks we present here an update on the human DEPhOsphorylation Database: DEPOD (http://www.depod.org or http://www.koehn.embl.de/depod). DEPOD is a manually curated open access database providing human phosphatases, their protein and non-protein substrates, dephosphorylation sites, pathway involvements and external links to kinases and small molecule modulators. All internal data are fully searchable including a BLAST application. Since the first release, more human phosphatases and substrates, their associated signaling pathways (also from new sources), and interacting proteins for all phosphatases and protein substrates have been added into DEPOD. The user interface has been further optimized; for example, the interactive human phosphatase-substrate network contains now a 'highlight node' function for phosphatases, which includes the visualization of neighbors in the network.
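The 'highlight node' function's neighbor visualization is, at bottom, an adjacency lookup in the phosphatase-substrate network; a sketch over invented edges (not DEPOD data):

```python
# Find a node's neighbors in an undirected phosphatase-substrate network.
# The edges are invented illustrations.
edges = [("PTEN", "AKT1"), ("PTEN", "IRS1"), ("PTP1B", "IRS1")]

def neighbors(node):
    """All nodes sharing an edge with `node`, sorted for stable output."""
    return sorted({b for a, b in edges if a == node} |
                  {a for a, b in edges if b == node})

print(neighbors("IRS1"))
```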

  5. The Pfam protein families database: towards a more sustainable future

    PubMed Central

    Finn, Robert D.; Coggill, Penelope; Eberhardt, Ruth Y.; Eddy, Sean R.; Mistry, Jaina; Mitchell, Alex L.; Potter, Simon C.; Punta, Marco; Qureshi, Matloob; Sangrador-Vegas, Amaia; Salazar, Gustavo A.; Tate, John; Bateman, Alex

    2016-01-01

    In the last two years the Pfam database (http://pfam.xfam.org) has undergone a substantial reorganisation to reduce the effort involved in making a release, thereby permitting more frequent releases. Arguably the most significant of these changes is that Pfam is now primarily based on the UniProtKB reference proteomes, with the counts of matched sequences and species reported on the website restricted to this smaller set. Building families on reference proteome sequences brings greater stability, which decreases the amount of manual curation required to maintain them. It also reduces the number of sequences displayed on the website, whilst still providing access to many important model organisms. Matches to the full UniProtKB database are, however, still available and Pfam annotations for individual UniProtKB sequences can still be retrieved. Some Pfam entries (1.6%) which have no matches to reference proteomes remain; we are working with UniProt to see if sequences from them can be incorporated into reference proteomes. Pfam-B, the automatically generated supplement to Pfam, has been removed. The current release (Pfam 29.0) includes 16 295 entries and 559 clans. The facility to view the relationship between families within a clan has been improved by the introduction of a new tool. PMID:26673716

  6. Text mining in the biocuration workflow: applications for literature curation at WormBase, dictyBase and TAIR.

    PubMed

    Van Auken, Kimberly; Fey, Petra; Berardini, Tanya Z; Dodson, Robert; Cooper, Laurel; Li, Donghui; Chan, Juancarlos; Li, Yuling; Basu, Siddhartha; Muller, Hans-Michael; Chisholm, Rex; Huala, Eva; Sternberg, Paul W

    2012-01-01

    WormBase, dictyBase and The Arabidopsis Information Resource (TAIR) are model organism databases containing information about Caenorhabditis elegans and other nematodes, the social amoeba Dictyostelium discoideum and related Dictyostelids and the flowering plant Arabidopsis thaliana, respectively. Each database curates multiple data types from the primary research literature. In this article, we describe the curation workflow at WormBase, with particular emphasis on our use of text-mining tools (BioCreative 2012, Workshop Track II). We then describe the application of a specific component of that workflow, Textpresso for Cellular Component Curation (CCC), to Gene Ontology (GO) curation at dictyBase and TAIR (BioCreative 2012, Workshop Track III). We find that, with organism-specific modifications, Textpresso can be used by dictyBase and TAIR to annotate gene products to GO's Cellular Component (CC) ontology.

  7. TarNet: An Evidence-Based Database for Natural Medicine Research

    PubMed Central

    Ren, Guomin; Sun, Guibo; Sun, Xiaobo

    2016-01-01

    Background Complex diseases seriously threaten human health. Drug discovery approaches based on “single genes, single drugs, and single targets” are limited in targeting complex diseases. The development of new multicomponent drugs for complex diseases is imperative, and the establishment of a suitable solution for drug group-target protein network analysis is a key scientific problem that must be addressed. Herbal medicines have formed the basis of sophisticated systems of traditional medicine and have given rise to some key drugs that remain in use today. The search for new molecules is currently taking a different route, whereby scientific principles of ethnobotany and ethnopharmacognosy are being used by chemists in the discovery of different sources and classes of compounds. Results In this study, we developed TarNet, a manually curated database and platform of traditional medicinal plants with natural compounds that includes potential bio-target information. We gathered information on proteins that are related to or affected by medicinal plant ingredients and data on protein–protein interactions (PPIs). TarNet includes in-depth information on both plant–compound–protein relationships and PPIs. Additionally, TarNet can provide researchers with network construction analyses of biological pathways and PPIs associated with specific diseases. Researchers can upload a gene or protein list mapped to our manually curated PPI database to generate relevant networks. Multiple functions are accessible for network topological calculations, subnetwork analyses, pathway analyses, and compound–protein relationships. Conclusions TarNet will serve as a useful analytical tool that will provide information on medicinal plant compound-affected proteins (potential targets) and system-level analyses for systems biology and network pharmacology researchers. TarNet is freely available at http://www.herbbol.org:8001/tarnet.
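The network topological calculations TarNet offers start from simple quantities such as node degree; a sketch on a tiny hypothetical compound-protein edge list (all names invented, not TarNet entries):

```python
# Compute node degrees in a compound-protein network and find a hub.
# The edge list is an invented illustration.
from collections import defaultdict

edges = [
    ("ginsenoside_Rg1", "AKT1"),
    ("ginsenoside_Rg1", "TP53"),
    ("quercetin", "TP53"),
    ("quercetin", "EGFR"),
]

degree = defaultdict(int)
for compound, protein in edges:
    degree[compound] += 1
    degree[protein] += 1

# Highest-degree node; ties are broken by insertion order.
hub = max(degree, key=degree.get)
print(hub, degree[hub])
```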

  8. The RadAssessor manual

    SciTech Connect

    Seitz, Sharon L.

    2007-02-01

    This manual describes the functions and capabilities that are available from the RadAssessor database and demonstrates how to retrieve and view its information. You’ll learn how to start the database application, how to log in, how to use the common commands, and how to use the online help if you have a question or need extra guidance. RadAssessor can be viewed from any standard web browser. Therefore, you will not need to install any special software before using RadAssessor.

  9. Data Curation Is for Everyone! The Case for Master's and Baccalaureate Institutional Engagement with Data Curation

    ERIC Educational Resources Information Center

    Shorish, Yasmeen

    2012-01-01

    This article describes the fundamental challenges to data curation, how these challenges may be compounded for smaller institutions, and how data management is an essential and manageable component of data curation. Data curation is often discussed within the confines of large research universities. As a result, master's and baccalaureate…

  10. HNdb: an integrated database of gene and protein information on head and neck squamous cell carcinoma

    PubMed Central

    Henrique, Tiago; José Freitas da Silveira, Nelson; Henrique Cunha Volpato, Arthur; Mioto, Mayra Mataruco; Carolina Buzzo Stefanini, Ana; Bachir Fares, Adil; Gustavo da Silva Castro Andrade, João; Masson, Carolina; Verónica Mendoza López, Rossana; Daumas Nunes, Fabio; Paulo Kowalski, Luis; Severino, Patricia; Tajara, Eloiza Helena

    2016-01-01

    The total amount of scientific literature has grown rapidly in recent years. Specifically, there are several million citations in the field of cancer. This makes it difficult, if not impossible, to manually retrieve relevant information on the mechanisms that govern tumor behavior or the neoplastic process. Furthermore, cancer is a complex disease or, more accurately, a set of diseases. The heterogeneity that permeates many tumors is particularly evident in head and neck (HN) cancer, one of the most common types of cancer worldwide. In this study, we present HNdb, a free database that aims to provide a unified and comprehensive resource of information on genes and proteins involved in HN squamous cell carcinoma, covering data on genomics, transcriptomics, proteomics, literature citations and also cross-references of external databases. Different literature searches of MEDLINE abstracts were performed using specific Medical Subject Headings (MeSH terms) for oral, oropharyngeal, hypopharyngeal and laryngeal squamous cell carcinomas. A curated gene-to-publication assignment yielded a total of 1370 genes related to HN cancer. The diversity of results allowed the identification of novel and mostly unexplored gene associations, revealing, for example, that processes linked to response to steroid hormone stimulus are significantly enriched in genes related to HN carcinomas. Thus, our database expands the possibilities for gene network investigation, providing potential hypotheses to be tested. Database URL: http://www.gencapo.famerp.br/hndb PMID:27013077

  11. CyanoLyase: a database of phycobilin lyase sequences, motifs and functions

    PubMed Central

    Bretaudeau, Anthony; Coste, François; Humily, Florian; Garczarek, Laurence; Le Corguillé, Gildas; Six, Christophe; Ratin, Morgane; Collin, Olivier; Schluchter, Wendy M.; Partensky, Frédéric

    2013-01-01

    CyanoLyase (http://cyanolyase.genouest.org/) is a manually curated sequence and motif database of phycobilin lyases and related proteins. These enzymes catalyze the covalent ligation of chromophores (phycobilins) to specific binding sites of phycobiliproteins (PBPs). The latter constitute the building bricks of phycobilisomes, the major light-harvesting systems of cyanobacteria and red algae. Phycobilin lyase sequences are poorly annotated in public databases. Sequences included in CyanoLyase were retrieved from all available genomes of these organisms and a few others by similarity searches using biochemically characterized enzyme sequences and then classified into 3 clans and 32 families. Amino acid motifs were computed for each family using Protomata learner. CyanoLyase also includes BLAST and a novel pattern matching tool (Protomatch) that allow users to rapidly retrieve and annotate lyases from any new genome. In addition, it provides phylogenetic analyses of all phycobilin lyase families, describes their function, their presence/absence in all genomes of the database (phyletic profiles) and predicts the chromophorylation of PBPs in each strain. The site also includes a thorough bibliography about phycobilin lyases and genomes included in the database. This resource should be useful to scientists and companies interested in natural or artificial PBPs, which have a number of biotechnological applications, notably as fluorescent markers. PMID:23175607
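Motif-based annotation of the kind a pattern matching tool performs can be approximated with a regular-expression scan; the motif and sequences here are invented examples, not CyanoLyase family motifs:

```python
# Scan protein sequences for a family motif written as a regex.
# Motif and sequences are invented for illustration.
import re

# Hypothetical motif: H, any residue, two hydrophobics, then E.
motif = re.compile(r"H.[AVLIMF]{2}E")

candidates = {
    "seq1": "MKTAHGVLEQRS",  # contains HGVLE -> matches
    "seq2": "MKTAAGVLEQRS",  # no H anchor -> no match
}
hits = [name for name, seq in candidates.items() if motif.search(seq)]
print(hits)
```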

  12. Kin-Driver: a database of driver mutations in protein kinases.

    PubMed

    Simonetti, Franco L; Tornador, Cristian; Nabau-Moretó, Nuria; Molina-Vila, Miguel A; Marino-Buslje, Cristina

    2014-01-01

    Somatic mutations in protein kinases (PKs) are frequent driver events in many human tumors, while germ-line mutations are associated with hereditary diseases. Here we present Kin-driver, the first database that compiles driver mutations in PKs with experimental evidence demonstrating their functional role. Kin-driver is an expert-curated database that pays special attention to activating mutations (AMs) and can serve as a validation set to develop new generation tools focused on the prediction of gain-of-function driver mutations. It also offers an easy and intuitive environment to facilitate the visualization and analysis of mutations in PKs. Because all mutations are mapped onto a multiple sequence alignment, analogue positions between kinases can be identified and tentative new mutations can be proposed for study by annotation transfer. Finally, our database can also be of use to clinical and translational laboratories, helping them to identify uncommon AMs that can correlate with response to new antitumor drugs. The website was developed using PHP and JavaScript, which are supported by all major browsers; the database was built using MySQL server. Kin-driver is available at: http://kin-driver.leloir.org.ar/
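Identifying analogue positions between kinases via a multiple sequence alignment comes down to mapping sequence positions to alignment columns and back; the aligned fragments below are invented, not Kin-driver data:

```python
# Map a 1-based ungapped residue position to a 0-based alignment column
# and back, to transfer a mutation between aligned kinases (toy data).
def seq_to_col(aligned, pos):
    """1-based ungapped position -> 0-based alignment column."""
    count = 0
    for col, ch in enumerate(aligned):
        if ch != "-":
            count += 1
            if count == pos:
                return col
    raise ValueError("position beyond sequence")

def col_to_seq(aligned, col):
    """0-based alignment column -> 1-based ungapped position."""
    return sum(1 for ch in aligned[:col + 1] if ch != "-")

kinase_a = "MK-TLGE"   # invented aligned fragment
kinase_b = "MKATL-E"   # invented aligned fragment

col = seq_to_col(kinase_a, 4)        # 4th residue of kinase A
print(col, col_to_seq(kinase_b, col))  # analogue position in kinase B
```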

  13. TBC2health: a database of experimentally validated health-beneficial effects of tea bioactive compounds.

    PubMed

    Zhang, Shihua; Xuan, Hongdong; Zhang, Liang; Fu, Sicong; Wang, Yijun; Yang, Hua; Tai, Yuling; Song, Youhong; Zhang, Jinsong; Ho, Chi-Tang; Li, Shaowen; Wan, Xiaochun

    2016-07-06

    Tea is one of the most consumed beverages in the world. Numerous studies have shown exceptional health benefits (e.g. antioxidation, cancer prevention) of tea owing to its various bioactive components. However, data from these extensively published papers have not been made available in a central database. To lay a foundation for improving the understanding of tea's health functions, we established the TBC2health database, which currently documents 1338 relationships between 497 tea bioactive compounds and 206 diseases (or phenotypes) manually culled from over 300 published articles. Each entry in TBC2health contains comprehensive information about a bioactive relationship that can be accessed in three aspects: (i) compound information, (ii) disease (or phenotype) information and (iii) evidence and reference. Using the curated bioactive relationships, a bipartite network was reconstructed and the corresponding network (or sub-network) visualization and topological analyses are provided for users. This database has a user-friendly interface for entry browsing, search and download. In addition, TBC2health provides a submission page and several useful tools (e.g. BLAST, molecular docking) to facilitate use of the database. Consequently, TBC2health can serve as a valuable bioinformatics platform for the exploration of beneficial effects of tea on human health. TBC2health is freely available at http://camellia.ahau.edu.cn/TBC2health.
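A bipartite network built from curated compound-disease relationships can be projected onto one side, e.g. linking two diseases whenever they share a compound; the pairs below are invented illustrations, not TBC2health entries:

```python
# Project a bipartite compound-disease network onto diseases:
# two diseases are linked if some compound relates to both. Toy data.
from itertools import combinations
from collections import defaultdict

pairs = [
    ("EGCG", "colorectal cancer"),
    ("EGCG", "obesity"),
    ("theanine", "anxiety"),
    ("caffeine", "obesity"),
]

by_compound = defaultdict(set)
for compound, disease in pairs:
    by_compound[compound].add(disease)

projection = set()
for diseases in by_compound.values():
    for a, b in combinations(sorted(diseases), 2):
        projection.add((a, b))
print(projection)
```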

  14. The Human Gene Mutation Database: towards a comprehensive repository of inherited mutation data for medical research, genetic diagnosis and next-generation sequencing studies.

    PubMed

    Stenson, Peter D; Mort, Matthew; Ball, Edward V; Evans, Katy; Hayden, Matthew; Heywood, Sally; Hussain, Michelle; Phillips, Andrew D; Cooper, David N

    2017-03-27

    The Human Gene Mutation Database (HGMD®) constitutes a comprehensive collection of published germline mutations in nuclear genes that underlie, or are closely associated with human inherited disease. At the time of writing (March 2017), the database contained in excess of 203,000 different gene lesions identified in over 8000 genes manually curated from over 2600 journals. With new mutation entries currently accumulating at a rate exceeding 17,000 per annum, HGMD represents de facto the central unified gene/disease-oriented repository of heritable mutations causing human genetic disease used worldwide by researchers, clinicians, diagnostic laboratories and genetic counsellors, and is an essential tool for the annotation of next-generation sequencing data. The public version of HGMD (http://www.hgmd.org) is freely available to registered users from academic institutions and non-profit organisations whilst the subscription version (HGMD Professional) is available to academic, clinical and commercial users under license via QIAGEN Inc.

  15. Analysis of curated and predicted plastid subproteomes of Arabidopsis. Subcellular compartmentalization leads to distinctive proteome properties.

    PubMed

    Sun, Qi; Emanuelsson, Olof; van Wijk, Klaas J

    2004-06-01

    Carefully curated proteomes of the inner envelope membrane, the thylakoid membrane, and the thylakoid lumen of chloroplasts from Arabidopsis were assembled based on published, well-documented localizations. These curated proteomes were evaluated for distribution of physical-chemical parameters, with the goal of extracting parameters for improved subcellular prediction and subsequent identification of additional (low abundant) components of each membrane system. The assembly of rigorously curated subcellular proteomes is in itself also important as a parts list for plant and systems biology. Transmembrane and subcellular prediction strategies were evaluated using the curated data sets. The three curated proteomes differ strongly in average isoelectric point and protein size, as well as transmembrane distribution. Removal of the cleavable, N-terminal transit peptide sequences greatly affected isoelectric point and size distribution. Unexpectedly, the Cys content was much lower for the thylakoid proteomes than for the inner envelope. This likely relates to the role of the thylakoid membrane in light-driven electron transport and helps to avoid unwanted oxidation-reduction reactions. A rule of thumb is suggested for discriminating between predicted integral inner envelope membrane proteins and integral thylakoid membrane proteins. Using a combination of predictors and experimentally derived parameters, four plastid subproteomes were predicted from the fully annotated Arabidopsis genome. These predicted subproteomes were analyzed for their properties and compared to the curated proteomes. The sensitivity and accuracy of the prediction strategies are discussed. Data can be extracted from the new plastid proteome database (http://ppdb.tc.cornell.edu).
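The Cys-content comparison described above can be reproduced on any sequence set in a few lines; the sequences here are invented stand-ins for curated envelope and thylakoid proteins, not the actual Arabidopsis data:

```python
# Compare cysteine fraction between two sets of protein sequences.
# All sequences are invented illustrations.
def cys_fraction(seqs):
    """Fraction of residues that are Cys across a set of sequences."""
    total = sum(len(s) for s in seqs)
    return sum(s.count("C") for s in seqs) / total

inner_envelope = ["MACCKLWC", "MCVLACDE"]   # hypothetical, Cys-rich
thylakoid_lumen = ["MAKLWDEG", "MSVLAEDE"]  # hypothetical, Cys-poor

envelope_richer = cys_fraction(inner_envelope) > cys_fraction(thylakoid_lumen)
print(envelope_richer)
```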

  16. The IPD and IMGT/HLA database: allele variant databases.

    PubMed

    Robinson, James; Halliwell, Jason A; Hayhurst, James D; Flicek, Paul; Parham, Peter; Marsh, Steven G E

    2015-01-01

    The Immuno Polymorphism Database (IPD) was developed to provide a centralized system for the study of polymorphism in genes of the immune system. Through the IPD project we have established a central platform for the curation and publication of locus-specific databases involved either directly or related to the function of the Major Histocompatibility Complex in a number of different species. We have collaborated with specialist groups or nomenclature committees that curate the individual sections before they are submitted to IPD for online publication. IPD consists of five core databases, with the IMGT/HLA Database as the primary database. Through the work of the various nomenclature committees, the HLA Informatics Group and in collaboration with the European Bioinformatics Institute we are able to provide public access to this data through the website http://www.ebi.ac.uk/ipd/. The IPD project continues to develop with new tools being added to address scientific developments, such as Next Generation Sequencing, and to address user feedback and requests. Regular updates to the website ensure that new and confirmatory sequences are dispersed to the immunogenetics community, and the wider research and clinical communities.

  17. The IPD and IMGT/HLA database: allele variant databases

    PubMed Central

    Robinson, James; Halliwell, Jason A.; Hayhurst, James D.; Flicek, Paul; Parham, Peter; Marsh, Steven G. E.

    2015-01-01

    The Immuno Polymorphism Database (IPD) was developed to provide a centralized system for the study of polymorphism in genes of the immune system. Through the IPD project we have established a central platform for the curation and publication of locus-specific databases involved either directly or related to the function of the Major Histocompatibility Complex in a number of different species. We have collaborated with specialist groups or nomenclature committees that curate the individual sections before they are submitted to IPD for online publication. IPD consists of five core databases, with the IMGT/HLA Database as the primary database. Through the work of the various nomenclature committees, the HLA Informatics Group and in collaboration with the European Bioinformatics Institute we are able to provide public access to this data through the website http://www.ebi.ac.uk/ipd/. The IPD project continues to develop with new tools being added to address scientific developments, such as Next Generation Sequencing, and to address user feedback and requests. Regular updates to the website ensure that new and confirmatory sequences are dispersed to the immunogenetics community, and the wider research and clinical communities. PMID:25414341

  18. How should the completeness and quality of curated nanomaterial data be evaluated?

    NASA Astrophysics Data System (ADS)

    Marchese Robinson, Richard L.; Lynch, Iseult; Peijnenburg, Willie; Rumble, John; Klaessig, Fred; Marquardt, Clarissa; Rauscher, Hubert; Puzyn, Tomasz; Purian, Ronit; Åberg, Christoffer; Karcher, Sandra; Vriens, Hanne; Hoet, Peter; Hoover, Mark D.; Hendren, Christine Ogilvie; Harper, Stacey L.

    2016-05-01

    Nanotechnology is of increasing significance. Curation of nanomaterial data into electronic databases offers opportunities to better understand and predict nanomaterials' behaviour. This supports innovation in, and regulation of, nanotechnology. It is commonly understood that curated data need to be sufficiently complete and of sufficient quality to serve their intended purpose. However, assessing data completeness and quality is non-trivial in general and is arguably especially difficult in the nanoscience area, given its highly multidisciplinary nature. The current article, part of the Nanomaterial Data Curation Initiative series, addresses how to assess the completeness and quality of (curated) nanomaterial data. In order to address this key challenge, a variety of related issues are discussed: the meaning and importance of data completeness and quality, existing approaches to their assessment and the key challenges associated with evaluating the completeness and quality of curated nanomaterial data. Considerations which are specific to the nanoscience area and lessons which can be learned from other relevant scientific disciplines are considered. Hence, the scope of this discussion ranges from physicochemical characterisation requirements for nanomaterials and interference of nanomaterials with nanotoxicology assays to broader issues such as minimum information checklists, toxicology data quality schemes and computational approaches that facilitate evaluation of the completeness and quality of (curated) data. This discussion is informed by a literature review and a survey of key nanomaterial data curation stakeholders. Finally, drawing upon this discussion, recommendations are presented concerning the central question: how should the completeness and quality of curated nanomaterial data be evaluated?
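
The minimum-information checklists mentioned above suggest one simple, automatable completeness check: score each curated record by the fraction of checklist fields it actually populates. A minimal sketch, assuming a hypothetical five-field checklist (the field names are invented for illustration and are not drawn from the article):

```python
# Score a curated record against a minimum-information checklist.
# REQUIRED_FIELDS is a hypothetical checklist, not from the article.
REQUIRED_FIELDS = ["core_composition", "size_nm", "surface_charge_mv",
                   "assay_type", "endpoint_value"]

def completeness(record: dict) -> float:
    """Fraction of required fields that are present and non-empty."""
    present = sum(1 for f in REQUIRED_FIELDS
                  if record.get(f) not in (None, "", []))
    return present / len(REQUIRED_FIELDS)

record = {"core_composition": "TiO2", "size_nm": 21.0, "assay_type": "MTT"}
print(f"{completeness(record):.2f}")  # 3 of 5 fields present -> 0.60
```

Quality assessment is far harder to automate than completeness, which is one reason the article treats the two separately; a score like this only flags missing metadata, not wrong metadata.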

  19. Plant Omics Data Center: An Integrated Web Repository for Interspecies Gene Expression Networks with NLP-Based Curation

    PubMed Central

    Ohyanagi, Hajime; Takano, Tomoyuki; Terashima, Shin; Kobayashi, Masaaki; Kanno, Maasa; Morimoto, Kyoko; Kanegae, Hiromi; Sasaki, Yohei; Saito, Misa; Asano, Satomi; Ozaki, Soichi; Kudo, Toru; Yokoyama, Koji; Aya, Koichiro; Suwabe, Keita; Suzuki, Go; Aoki, Koh; Kubo, Yasutaka; Watanabe, Masao; Matsuoka, Makoto; Yano, Kentaro

    2015-01-01

    Comprehensive integration of large-scale omics resources such as genomes, transcriptomes and metabolomes will provide deeper insights into broader aspects of molecular biology. For better understanding of plant biology, we aim to construct a next-generation sequencing (NGS)-derived gene expression network (GEN) repository for a broad range of plant species. So far we have incorporated information about 745 high-quality mRNA sequencing (mRNA-Seq) samples from eight plant species (Arabidopsis thaliana, Oryza sativa, Solanum lycopersicum, Sorghum bicolor, Vitis vinifera, Solanum tuberosum, Medicago truncatula and Glycine max) from the public short read archive, digitally profiled the entire set of gene expression profiles, and drew GENs by using correspondence analysis (CA) to take advantage of gene expression similarities. In order to understand the evolutionary significance of the GENs from multiple species, they were linked according to the orthology of each node (gene) among species. In addition to other gene expression information, functional annotation of the genes will facilitate biological comprehension. Currently we are improving the given gene annotations with natural language processing (NLP) techniques and manual curation. Here we introduce the current status of our analyses and the web database, PODC (Plant Omics Data Center; http://bioinf.mind.meiji.ac.jp/podc/), now open to the public, providing GENs, functional annotations and additional comprehensive omics resources. PMID:25505034

  20. Trust, but verify: On the importance of chemical structure curation in cheminformatics and QSAR modeling research

    PubMed Central

    Fourches, Denis; Muratov, Eugene; Tropsha, Alexander

    2010-01-01

    Molecular modelers and cheminformaticians typically analyze experimental data generated by other scientists. Consequently, when it comes to data accuracy, cheminformaticians are always at the mercy of data providers who may inadvertently publish (partially) erroneous data. Thus, dataset curation is crucial for any cheminformatics analysis such as similarity searching, clustering, QSAR modeling, virtual screening, etc., especially now that the availability of chemical datasets in the public domain has skyrocketed. Despite the obvious importance of this preliminary step in the computational analysis of any dataset, there appears to be no commonly accepted guidance or set of procedures for chemical data curation. The main objective of this paper is to emphasize the need for a standardized chemical data curation strategy that should be followed at the onset of any molecular modeling investigation. Herein, we discuss several simple but important steps for cleaning chemical records in a database, including the removal of a fraction of the data that cannot be appropriately handled by conventional cheminformatics techniques. Such steps include the removal of inorganic and organometallic compounds, counterions, salts and mixtures; structure validation; ring aromatization; normalization of specific chemotypes; curation of tautomeric forms; and the deletion of duplicates. To emphasize the importance of data curation as a mandatory step in data analysis, we discuss several case studies where chemical curation of the original “raw” database enabled a successful modeling study (specifically, QSAR analysis) or resulted in a significant improvement of the model's prediction accuracy. We also demonstrate that in some cases rigorously developed QSAR models could even be used to correct erroneous biological data associated with chemical compounds. We believe that good practices for curation of chemical records outlined in this paper will be of value to all
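
Two of the cleaning steps listed above, salt/counterion removal and duplicate deletion, can be illustrated with a deliberately simplified sketch. Real pipelines should use a cheminformatics toolkit such as RDKit for proper salt stripping, aromatization, and canonicalization; keeping the largest dot-separated SMILES fragment by string length, as done here, is only a crude stand-in:

```python
# Toy curation sketch: strip counterions/salts, then delete duplicates.
# NOT a substitute for toolkit-based standardization (e.g. RDKit).
def strip_salts(smiles: str) -> str:
    """Keep the largest '.'-separated component of a SMILES string."""
    return max(smiles.split("."), key=len)

def curate(records: list[str]) -> list[str]:
    seen, clean = set(), []
    for smi in records:
        parent = strip_salts(smi)
        if parent not in seen:       # duplicate deletion
            seen.add(parent)
            clean.append(parent)
    return clean

raw = ["CCO", "CCO.Cl", "c1ccccc1C(=O)O.[Na+]"]
print(curate(raw))  # ['CCO', 'c1ccccc1C(=O)O']
```

Note that without canonicalization, two different SMILES spellings of the same molecule would slip past the duplicate check; that is exactly why the paper treats normalization and tautomer curation as prerequisites for deduplication.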

  1. Cancer Stem Cells Therapeutic Target Database: The First Comprehensive Database for Therapeutic Targets of Cancer Stem Cells.

    PubMed

    Hu, Xiaoqing; Cong, Ye; Luo, Huizhe Howard; Wu, Sijin; Zhao, Liyuan Eric; Liu, Quentin; Yang, Yongliang

    2016-09-02

    Cancer stem cells (CSCs) are a subpopulation of tumor cells that have strong self-renewal capabilities and may contribute to the failure of conventional cancer therapies. Hence, therapeutics homing in on CSCs represent a novel and promising approach that may eradicate malignant tumors. However, the lack of information on validated targets of CSCs has greatly hindered the development of CSC-directed therapeutics. Herein, we describe the Cancer Stem Cells Therapeutic Target Database (CSCTT), the first online database to provide a rich bioinformatics resource for the display, search, and analysis of structure, function, and related annotation for therapeutic targets of cancer stem cells. CSCTT contains 135 proteins that are potential targets of CSCs, with validated experimental evidence manually curated from the existing literature. Proteins are carefully annotated with a detailed description of protein families, biological processes, related diseases, and experimental evidence. In addition, CSCTT has compiled 213 documented therapeutic methods for cancer stem cells, including 118 small molecules and 20 biotherapy methods. The CSCTT may serve as a useful platform for the development of CSC-directed therapeutics against various malignant tumors. The CSCTT database is freely available to the public at http://www.csctt.org/. SIGNIFICANCE: Although the definition and role of cancer stem cells (CSCs, also called tumor-initiating cells) remain a topic of much debate, increasing evidence suggests that CSCs may be the driving force behind chemotherapy/radiotherapy resistance, as well as metastasis. Consequently, the elimination or differentiation of CSCs is critical for treating malignant tumors and improving clinical outcomes. Unfortunately, the progress of research into the development of anti-CSC therapeutics has been rather slow, and no anti-CSC drugs are yet in clinical use. Hence, there is an urgent need to develop a database that compiles useful information for

  2. Electronic Commerce user manual

    SciTech Connect

    Not Available

    1992-04-10

    This User Manual supports the Electronic Commerce Standard System. The Electronic Commerce Standard System is being developed for the Department of Defense by the Technology Information Systems Program at the Lawrence Livermore National Laboratory, operated by the University of California for the Department of Energy. The Electronic Commerce Standard System, or EC as it is known, provides the capability for organizations to conduct business electronically instead of through paper transactions. Electronic Commerce and Computer Aided Acquisition and Logistics Support are two major projects under the DoD's Corporate Information Management program, whose objective is to make DoD business transactions faster and less costly by using computer networks instead of paper forms and postage. EC runs on computers that use the UNIX operating system and provides a standard set of applications and tools that are bound together by a common command and menu system. These applications and tools may vary according to the requirements of the customer or location and may be customized to meet the specific needs of an organization. Local applications can be integrated into the menu system under the Special Databases & Applications option on the EC main menu. These local applications will be documented in the appendices of this manual. This integration capability provides users with a common environment of standard and customized applications.

  3. Electronic Commerce user manual

    SciTech Connect

    Not Available

    1992-04-10

    This User Manual supports the Electronic Commerce Standard System. The Electronic Commerce Standard System is being developed for the Department of Defense by the Technology Information Systems Program at the Lawrence Livermore National Laboratory, operated by the University of California for the Department of Energy. The Electronic Commerce Standard System, or EC as it is known, provides the capability for organizations to conduct business electronically instead of through paper transactions. Electronic Commerce and Computer Aided Acquisition and Logistics Support are two major projects under the DoD's Corporate Information Management program, whose objective is to make DoD business transactions faster and less costly by using computer networks instead of paper forms and postage. EC runs on computers that use the UNIX operating system and provides a standard set of applications and tools that are bound together by a common command and menu system. These applications and tools may vary according to the requirements of the customer or location and may be customized to meet the specific needs of an organization. Local applications can be integrated into the menu system under the Special Databases & Applications option on the EC main menu. These local applications will be documented in the appendices of this manual. This integration capability provides users with a common environment of standard and customized applications.

  4. Astromaterials Curation Online Resources for Principal Investigators

    NASA Technical Reports Server (NTRS)

    Todd, Nancy S.; Zeigler, Ryan A.; Mueller, Lina

    2017-01-01

    The Astromaterials Acquisition and Curation office at NASA Johnson Space Center curates all of NASA's extraterrestrial samples, the most extensive set of astromaterials samples available to the research community worldwide. The office allocates 1500 individual samples to researchers and students each year and has served the planetary research community for 45+ years. The Astromaterials Curation office provides access to its sample data repository and digital resources to support the research needs of sample investigators and to aid in the selection and request of samples for scientific study. These resources can be found on the Astromaterials Acquisition and Curation website at https://curator.jsc.nasa.gov. To better serve our users, we have engaged in several activities to enhance the data available for astromaterials samples, to improve the accessibility and performance of the website, and to address user feedback. We have also put plans in place for continuing improvements to our existing data products.

  5. Asphalt Raking. Instructor Manual. Trainee Manual.

    ERIC Educational Resources Information Center

    Laborers-AGC Education and Training Fund, Pomfret Center, CT.

    This packet consists of the instructor and trainee manuals for an asphalt raking course. The instructor manual contains a course schedule for 4 days of instruction, content outline, and instructor outline. The trainee manual is divided into five sections: safety, asphalt basics, placing methods, repair and patching, and clean-up and maintenance.…

  6. Expression Atlas update—a database of gene and transcript expression from microarray- and sequencing-based functional genomics experiments

    PubMed Central

    Petryszak, Robert; Burdett, Tony; Fiorelli, Benedetto; Fonseca, Nuno A.; Gonzalez-Porta, Mar; Hastings, Emma; Huber, Wolfgang; Jupp, Simon; Keays, Maria; Kryvych, Nataliya; McMurry, Julie; Marioni, John C.; Malone, James; Megy, Karine; Rustici, Gabriella; Tang, Amy Y.; Taubert, Jan; Williams, Eleanor; Mannion, Oliver; Parkinson, Helen E.; Brazma, Alvis

    2014-01-01

    Expression Atlas (http://www.ebi.ac.uk/gxa) is a value-added database providing information about gene, protein and splice variant expression in different cell types, organism parts, developmental stages, diseases and other biological and experimental conditions. The database consists of selected high-quality microarray and RNA-sequencing experiments from ArrayExpress that have been manually curated, annotated with Experimental Factor Ontology terms and processed using standardized microarray and RNA-sequencing analysis methods. The new version of Expression Atlas introduces the concept of ‘baseline’ expression, i.e. gene and splice variant abundance levels in healthy or untreated conditions, such as tissues or cell types. Differential gene expression data benefit from an in-depth curation of experimental intent, resulting in biologically meaningful ‘contrasts’, i.e. instances of differential pairwise comparisons between two sets of biological replicates. Other novel aspects of Expression Atlas are its strict quality control of raw experimental data, up-to-date RNA-sequencing analysis methods, expression data at the level of gene sets, as well as genes and a more powerful search interface designed to maximize the biological value provided to the user. PMID:24304889

  7. Use of Semantic Technology to Create Curated Data Albums

    NASA Technical Reports Server (NTRS)

    Ramachandran, Rahul; Kulkarni, Ajinkya; Li, Xiang; Sainju, Roshan; Bakare, Rohan; Basyal, Sabin; Fox, Peter (Editor); Norack, Tom (Editor)

    2014-01-01

    One of the continuing challenges in any Earth science investigation is the discovery and access of useful science content from the increasingly large volumes of Earth science data and related information available online. Current Earth science data systems are designed with the assumption that researchers access data primarily by instrument or geophysical parameter. Those who know exactly the data sets they need can obtain the specific files using these systems. However, in cases where researchers are interested in studying an event of research interest, they must manually assemble a variety of relevant data sets by searching the different distributed data systems. Consequently, there is a need to design and build specialized search and discovery tools in Earth science that can filter through large volumes of distributed online data and information and only aggregate the relevant resources needed to support climatology and case studies. This paper presents a specialized search and discovery tool that automatically creates curated Data Albums. The tool was designed to enable key elements of the search process such as dynamic interaction and sense-making. The tool supports dynamic interaction via different modes of interactivity and visual presentation of information. The compilation of information and data into a Data Album is analogous to a shoebox within the sense-making framework. This tool automates most of the tedious information/data gathering tasks for researchers. Data curation by the tool is achieved via an ontology-based, relevancy ranking algorithm that filters out non-relevant information and data. The curation enables better search results as compared to the simple keyword searches provided by existing data systems in Earth science.
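
The ontology-based relevancy ranking described above can be approximated, at its simplest, as scoring each candidate resource by its term overlap with the ontology terms associated with the event, and discarding low scorers. The event terms, resource names, and threshold below are illustrative assumptions, not details taken from the paper:

```python
# Sketch of relevancy ranking: score candidate resources by keyword
# overlap with an event's ontology terms; filter out non-relevant ones.
def rank_resources(event_terms: set[str],
                   resources: dict[str, set[str]],
                   threshold: float = 0.25) -> list[tuple[str, float]]:
    scored = []
    for name, keywords in resources.items():
        overlap = len(event_terms & keywords) / len(event_terms)
        if overlap >= threshold:            # drop non-relevant resources
            scored.append((name, overlap))
    return sorted(scored, key=lambda x: x[1], reverse=True)

hurricane = {"hurricane", "precipitation", "wind", "sst"}
candidates = {
    "TRMM rainfall granule": {"precipitation", "hurricane", "radar"},
    "Land cover map": {"vegetation", "ndvi"},
    "Buoy SST series": {"sst", "ocean", "wind"},
}
print(rank_resources(hurricane, candidates))
```

A production system would weight terms by ontology relationships (parents, synonyms) rather than exact overlap, but the filter-then-rank shape is the same.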

  8. Use of Semantic Technology to Create Curated Data Albums

    NASA Technical Reports Server (NTRS)

    Ramachandran, Rahul; Kulkarni, Ajinkya; Li, Xiang; Sainju, Roshan; Bakare, Rohan; Basyal, Sabin

    2014-01-01

    One of the continuing challenges in any Earth science investigation is the discovery and access of useful science content from the increasingly large volumes of Earth science data and related information available online. Current Earth science data systems are designed with the assumption that researchers access data primarily by instrument or geophysical parameter. Those who know exactly the data sets they need can obtain the specific files using these systems. However, in cases where researchers are interested in studying an event of research interest, they must manually assemble a variety of relevant data sets by searching the different distributed data systems. Consequently, there is a need to design and build specialized search and discovery tools in Earth science that can filter through large volumes of distributed online data and information and only aggregate the relevant resources needed to support climatology and case studies. This paper presents a specialized search and discovery tool that automatically creates curated Data Albums. The tool was designed to enable key elements of the search process such as dynamic interaction and sense-making. The tool supports dynamic interaction via different modes of interactivity and visual presentation of information. The compilation of information and data into a Data Album is analogous to a shoebox within the sense-making framework. This tool automates most of the tedious information/data gathering tasks for researchers. Data curation by the tool is achieved via an ontology-based, relevancy ranking algorithm that filters out nonrelevant information and data. The curation enables better search results as compared to the simple keyword searches provided by existing data systems in Earth science.

  9. ChemEx: information extraction system for chemical data curation

    PubMed Central

    2012-01-01

    Background: Manual curation of chemical data from publications is error-prone and time-consuming, and it is hard to keep the resulting data sets up to date. Automatic information extraction can be used as a tool to reduce these problems. Since chemical structures are usually described in images, information extraction needs to combine structure image recognition and text mining. Results: We have developed ChemEx, a chemical information extraction system. ChemEx processes both text and images in publications. The text annotator extracts compound, organism, and assay entities from text content, while structure image recognition enables translation of chemical raster images into a machine-readable format. A user can view annotated text along with summarized information on compounds, the organisms that produce those compounds, and assay tests. Conclusions: ChemEx facilitates and speeds up chemical data curation by extracting compounds, organisms, and assays from a large collection of publications. The software and corpus can be downloaded from http://www.biotec.or.th/isl/ChemEx. PMID:23282330

  10. Hayabusa Sample Curation in the JAXA's Planetary Material Curation Facility

    NASA Astrophysics Data System (ADS)

    Okada, T.; Abe, M.; Fujimura, A.; Yada, T.; Ishibashi, Y.; Uesugi, M.; Yuzuru, K.; Yakame, S.; Nakamura, T.; Noguchi, T.; Okazaki, R.; Zolensky, M.; Sandford, S.; Ueno, M.; Mukai, T.; Yoshikawa, M.; Kawaguchi, J.

    2011-12-01

    Hayabusa successfully returned its reentry capsule to Australia on June 13, 2010. As detailed previously [1], a series of processes has been carried out in JAXA's Planetary Material Curation Facility to introduce the sample container of the reentry capsule into a pure-nitrogen-filled clean chamber without exposure to water or oxygen, retrieve the fine particles found inside the container, characterize them with a scanning electron microscope (SEM) with energy dispersive X-ray spectroscopy (EDX), classify them into mineral or rock types, and store them for future analysis. Some of those particles have been delivered for initial analysis to catalogue them [2-10]. The facility has had to develop new methodologies and techniques to pick up the recovered samples, which are much finer than originally expected. One of these is an electrostatic micro-probe for pickup, and a trial has started to slice the fine samples for detailed analysis of extra-fine structures. An electrostatic nano-probe for use in the SEM is also being considered and developed. To maximize the scientific outputs, the analyses must proceed based on more advanced methodology and sophisticated ideas. So far we have identified these samples as materials from the S-class asteroid 25143 Itokawa due to their consistency with results from remote near-infrared and X-ray spectroscopy: about 1500 ultra-fine particles (mostly smaller than 10 microns) caught by Teflon spatula scooping, and about 100 fine particles (mostly 20-200 microns) collected by compulsory fall onto silica glass plates. A future schedule for sample distribution must be planned. The initial analyses are still in progress, and we will distribute more of the particles recovered. Part of the particles will then be distributed to NASA, based on the Memorandum of Understanding (MOU) between Japan and the U.S.A. for the Hayabusa mission. Finally, in the near future an international Announcement of Opportunity (AO) for sample analyses will be opened to any interested researchers.

  11. Human Transporter Database: Comprehensive Knowledge and Discovery Tools in the Human Transporter Genes

    PubMed Central

    Ye, Adam Y.; Liu, Qing-Rong; Li, Chuan-Yun; Zhao, Min; Qu, Hong

    2014-01-01

    Transporters are essential in the homeostatic exchange of endogenous and exogenous substances at the systemic, organ, cellular, and subcellular levels. Gene mutations of transporters are often related to pharmacogenetic traits. Recent developments in high-throughput technologies on genomics, transcriptomics and proteomics allow in-depth studies of transporter genes in normal cellular processes and diverse disease conditions. The flood of high-throughput data has resulted in an urgent need for an updated knowledgebase with curated, organized, and annotated human transporters in an easily accessible form. Using a pipeline combining automated keyword queries, sequence similarity searches and manual curation of transporters, we collected 1,555 human non-redundant transporter genes to develop the Human Transporter Database (HTD) (http://htd.cbi.pku.edu.cn). Based on the extensive annotations, global properties of the transporter genes were illustrated, such as expression patterns and polymorphisms in relation to their ligands. We noted that the human transporters were enriched in many fundamental biological processes such as oxidative phosphorylation and cardiac muscle contraction, and significantly associated with Mendelian and complex diseases such as epilepsy and sudden infant death syndrome. Overall, HTD provides a well-organized interface to facilitate research communities in searching detailed molecular and genetic information on transporters for the development of personalized medicine. PMID:24558441

  12. OpenTrials: towards a collaborative open database of all available information on all clinical trials.

    PubMed

    Goldacre, Ben; Gray, Jonathan

    2016-04-08

    OpenTrials is a collaborative and open database for all available structured data and documents on all clinical trials, threaded together by individual trial. With a versatile and expandable data schema, it is initially designed to host and match the following documents and data for each trial: registry entries; links, abstracts, or texts of academic journal papers; portions of regulatory documents describing individual trials; structured data on methods and results extracted by systematic reviewers or other researchers; clinical study reports; and additional documents such as blank consent forms, blank case report forms, and protocols. The intention is to create an open, freely re-usable index of all such information and to increase discoverability, facilitate research, identify inconsistent data, enable audits on the availability and completeness of this information, support advocacy for better data and drive up standards around open data in evidence-based medicine. The project has phase I funding. This will allow us to create a practical data schema and populate the database initially through web-scraping, basic record linkage techniques, crowd-sourced curation around selected drug areas, and import of existing sources of structured data and documents. It will also allow us to create user-friendly web interfaces onto the data and conduct user engagement workshops to optimise the database and interface designs. Where other projects have set out to manually and perfectly curate a narrow range of information on a smaller number of trials, we aim to use a broader range of techniques and attempt to match a very large quantity of information on all trials. We are currently seeking feedback and additional sources of structured data.

  13. Curation of Samples from Mars

    NASA Astrophysics Data System (ADS)

    Lindstrom, D.; Allen, C.

    One of the strong scientific reasons for returning samples from Mars is to search for evidence of current or past life in the samples. Because of the remote possibility that the samples may contain life forms that are hazardous to the terrestrial biosphere, the National Research Council has recommended that all samples returned from Mars be kept under strict biological containment until tests show that they can safely be released to other laboratories. It is possible that Mars samples may contain only scarce or subtle traces of life or prebiotic chemistry that could readily be overwhelmed by terrestrial contamination. Thus, the facilities used to contain, process, and analyze samples from Mars must have a combination of high-level biocontainment and organic/inorganic chemical cleanliness that is unprecedented. We have been conducting feasibility studies and developing designs for a facility that would be at least as capable as current maximum-containment BSL-4 (BioSafety Level 4) laboratories, while simultaneously maintaining cleanliness levels exceeding those of the cleanest electronics manufacturing labs. Unique requirements for the processing of Mars samples have inspired a program to develop handling techniques that are much more precise and reliable than the approach (currently used for lunar samples) of employing gloved human hands in nitrogen-filled gloveboxes. Individual samples from Mars are expected to be much smaller than lunar samples, the total mass of samples returned by each mission being 0.5-1 kg, compared with many tens of kg of lunar samples returned by each of the six Apollo missions. Smaller samples require much more of the processing to be done under microscopic observation. In addition, the requirements for cleanliness and high-level containment would be difficult to satisfy while using traditional gloveboxes. JSC has constructed a laboratory to test concepts and technologies important to future sample curation. The Advanced Curation

  14. SynDB: a Synapse protein DataBase based on synapse ontology

    PubMed Central

    Zhang, Wuxue; Zhang, Yong; Zheng, Hui; Zhang, Chen; Xiong, Wei; Olyarchuk, John G.; Walker, Michael; Xu, Weifeng; Zhao, Min; Zhao, Shuqi; Zhou, Zhuan; Wei, Liping

    2007-01-01

    A synapse is the junction across which a nerve impulse passes from an axon terminal to a neuron, muscle cell or gland cell. The functions and building molecules of the synapse are essential to almost all neurobiological processes. To describe synaptic structures and functions, we have developed Synapse Ontology (SynO), a hierarchical representation that includes 177 terms with hundreds of synonyms and branches up to eight levels deep. We associated 125 additional protein keywords and 109 InterPro domains with these SynO terms. Using a combination of automated keyword searches, domain searches and manual curation, we collected 14 000 non-redundant synapse-related proteins, including 3000 in human. We extensively annotated the proteins with information about sequence, structure, function, expression, pathways, interactions and disease associations and with hyperlinks to external databases. The data are stored and presented in the Synapse protein DataBase (SynDB). SynDB can be interactively browsed by SynO, Gene Ontology (GO), domain families, species, chromosomal locations or Tribe-MCL clusters. It can also be searched by text (including Boolean operators) or by sequence similarity. SynDB is the most comprehensive database to date for synaptic proteins. PMID:17098931
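
    The keyword-and-domain screening step described above can be sketched as a simple annotation filter. This is an illustrative reconstruction, not the SynDB pipeline: the protein entries and keyword list below are hypothetical, and the real workflow also used InterPro domain matches followed by manual curation.

```python
# Illustrative protein annotations and SynO-style keywords (hypothetical
# entries; the real pipeline also matched InterPro domains).
proteins = {
    "SYN1": "synapsin; synaptic vesicle trafficking at the presynaptic terminal",
    "DLG4": "PSD-95; scaffolding protein of the postsynaptic density",
    "ALB":  "serum albumin; major plasma protein",
}
keywords = {"synaptic", "presynaptic", "postsynaptic", "synapse"}

def keyword_candidates(proteins, keywords):
    """Select proteins whose annotation mentions any synapse keyword.
    Candidates would still go to manual curation in the real workflow."""
    hits = set()
    for name, annotation in proteins.items():
        tokens = set(annotation.lower().replace(";", " ").split())
        if tokens & keywords:
            hits.add(name)
    return hits

print(sorted(keyword_candidates(proteins, keywords)))  # ['DLG4', 'SYN1']
```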

  15. PhenoMiner: from text to a database of phenotypes associated with OMIM diseases

    PubMed Central

    Collier, Nigel; Groza, Tudor; Smedley, Damian; Robinson, Peter N.; Oellrich, Anika; Rebholz-Schuhmann, Dietrich

    2015-01-01

    Scientific and clinical phenotypes reported in the experimental literature have been manually curated to build high-quality databases such as the Online Mendelian Inheritance in Man (OMIM). However, the identification and harmonization of phenotype descriptions struggle with the diversity of human expressivity. We introduce a novel automated extraction approach called PhenoMiner that exploits full parsing and conceptual analysis. Apriori association mining is then used to identify relationships to human diseases. We applied PhenoMiner to the BMC open access collection and identified 13 636 phenotype candidates. We identified 28 155 phenotype-disorder hypotheses covering 4898 phenotypes and 1659 Mendelian disorders. Analysis showed: (i) the semantic distribution of the extracted terms against linked ontologies; (ii) a comparison of term overlap with the Human Phenotype Ontology (HP); (iii) moderate support for phenotype-disorder pairs in both OMIM and the literature; (iv) strong associations of phenotype-disorder pairs to known disease-gene pairs using PhenoDigm. The full list of PhenoMiner phenotypes (S1), phenotype-disorder associations (S2), association-filtered linked data (S3) and user database documentation (S5) is available as supplementary data and can be downloaded at http://github.com/nhcollier/PhenoMiner under a Creative Commons Attribution 4.0 license. Database URL: phenominer.mml.cam.ac.uk PMID:26507285
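
    The Apriori-style association step can be illustrated with support and confidence computed over co-mention "transactions". A minimal sketch with made-up data; the term sets below are hypothetical and not PhenoMiner output:

```python
# Each "transaction" is the set of phenotype and disorder terms
# co-mentioned in one article (illustrative data only).
transactions = [
    {"short stature", "scoliosis", "Marfan syndrome"},
    {"short stature", "Marfan syndrome"},
    {"scoliosis", "Marfan syndrome"},
    {"short stature", "scoliosis"},
]

def support(itemset, transactions):
    """Fraction of transactions containing every item in the itemset."""
    hits = sum(1 for t in transactions if itemset <= t)
    return hits / len(transactions)

def confidence(antecedent, consequent, transactions):
    """support(A and C) / support(A): strength of the rule A -> C."""
    return (support(antecedent | consequent, transactions)
            / support(antecedent, transactions))

# A candidate phenotype -> disorder hypothesis:
print(support({"short stature", "Marfan syndrome"}, transactions))  # 0.5
print(confidence({"short stature"}, {"Marfan syndrome"}, transactions))
```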

  16. Reflections on curative health care in Nicaragua.

    PubMed Central

    Slater, R G

    1989-01-01

    Improved health care in Nicaragua is a major priority of the Sandinista revolution; it has been pursued by major reforms of the national health care system, something few developing countries have attempted. In addition to its internationally recognized advances in public health, considerable progress has been made in health care delivery by expanding curative medical services through training more personnel and building more facilities to fulfill a commitment to free universal health coverage. The very uneven quality of medical care is the leading problem facing curative medicine now. Underlying factors include the difficulty of adequately training the greatly increased number of new physicians. Misdiagnosis and mismanagement continue to be major problems. The curative medical system is not well coordinated with the preventive sector. Recent innovations include initiation of a "medicina integral" residency, similar to family practice. Despite its inadequacies and the handicaps of war and poverty, the Nicaraguan curative medical system has made important progress. PMID:2705603

  17. Construction of biological networks from unstructured information based on a semi-automated curation workflow.

    PubMed

    Szostak, Justyna; Ansari, Sam; Madan, Sumit; Fluck, Juliane; Talikka, Marja; Iskandar, Anita; De Leon, Hector; Hofmann-Apitius, Martin; Peitsch, Manuel C; Hoeng, Julia

    2015-06-17

    Capture and representation of scientific knowledge in a structured format are essential to improve the understanding of biological mechanisms involved in complex diseases. Biological knowledge and knowledge about standardized terminologies are difficult to capture from literature in a usable form. A semi-automated knowledge extraction workflow is presented that was developed to allow users to extract causal and correlative relationships from scientific literature and to transcribe them into the computable and human-readable Biological Expression Language (BEL). The workflow combines state-of-the-art linguistic tools for recognition of various entities and extraction of knowledge from literature sources. Unlike most other approaches, the workflow outputs the results to a curation interface for manual curation and converts them into BEL documents that can be compiled to form biological networks. We developed a new semi-automated knowledge extraction workflow that was designed to capture and organize scientific knowledge and reduce the required curation skills and effort for this task. The workflow was used to build a network that represents the cellular and molecular mechanisms implicated in atherosclerotic plaque destabilization in an apolipoprotein-E-deficient (ApoE(-/-)) mouse model. The network was generated using knowledge extracted from the primary literature. The resultant atherosclerotic plaque destabilization network contains 304 nodes and 743 edges supported by 33 PubMed referenced articles. A comparison between the semi-automated and conventional curation processes showed similar results, but significantly reduced curation effort for the semi-automated process. Creating structured knowledge from unstructured text is an important step for the mechanistic interpretation and reusability of knowledge. Our new semi-automated knowledge extraction workflow reduced the curation skills and effort required to capture and organize scientific knowledge.
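
    Compiling curated causal statements into a network of nodes and edges, as the workflow does for BEL documents, can be sketched as follows. The triples use a simplified BEL-like notation and are illustrative, not actual curated content:

```python
from collections import defaultdict

# Illustrative causal statements as (subject, relation, object) triples
# in a simplified BEL-like notation (hypothetical content).
statements = [
    ("p(MMP9)", "increases", "bp(plaque destabilization)"),
    ("p(TIMP1)", "decreases", "act(p(MMP9))"),
    ("a(oxLDL)", "increases", "p(MMP9)"),
]

def compile_network(statements):
    """Compile triples into an adjacency map; return (nodes, edge count, adj)."""
    adj = defaultdict(list)
    nodes = set()
    for subj, rel, obj in statements:
        adj[subj].append((rel, obj))
        nodes.update((subj, obj))
    edges = sum(len(v) for v in adj.values())
    return nodes, edges, adj

nodes, edges, adj = compile_network(statements)
print(len(nodes), edges)  # 5 3
```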

  18. LiverCancerMarkerRIF: a liver cancer biomarker interactive curation system combining text mining and expert annotations

    PubMed Central

    Dai, Hong-Jie; Wu, Johnny Chi-Yang; Lin, Wei-San; Reyes, Aaron James F.; dela Rosa, Mira Anne C.; Syed-Abdul, Shabbir; Tsai, Richard Tzong-Han; Hsu, Wen-Lian

    2014-01-01

    Biomarkers are biomolecules in the human body that can indicate disease states and abnormal biological processes. Biomarkers are often used during clinical trials to identify patients with cancers. Although biomedical research related to biomarkers has increased over the years and substantial effort has been expended to obtain results in these studies, the specific results obtained often contain ambiguities, and the results might contradict each other. Therefore, the information gathered from these studies must be appropriately integrated and organized to facilitate experimentation on biomarkers. In this study, we used liver cancer as the target and developed a text-mining–based curation system named LiverCancerMarkerRIF, which allows users to retrieve biomarker-related narrations and curators to curate supporting evidence on liver cancer biomarkers directly while browsing PubMed. In contrast to most of the other curation tools that require curators to navigate away from PubMed and accommodate distinct user interfaces or Web sites to complete the curation process, our system provides a user-friendly method for accessing text-mining–aided information and a concise interface to assist curators while they remain at the PubMed Web site. Biomedical text-mining techniques are applied to automatically recognize biomedical concepts such as genes, microRNA, diseases and investigative technologies, which can be used to evaluate the potential of a certain gene as a biomarker. Through the participation in the BioCreative IV user-interactive task, we examined the feasibility of using this novel type of augmented browsing-based curation method, and collaborated with curators to curate biomarker evidential sentences related to liver cancer. The positive feedback received from curators indicates that the proposed method can be effectively used for curation. A publicly available online database containing all the aforementioned information has been constructed at http

  19. Curating NASA's Past, Present, and Future Extraterrestrial Sample Collections

    NASA Technical Reports Server (NTRS)

    McCubbin, F. M.; Allton, J. H.; Evans, C. A.; Fries, M. D.; Nakamura-Messenger, K.; Righter, K.; Zeigler, R. A.; Zolensky, M.; Stansbery, E. K.

    2016-01-01

    The Astromaterials Acquisition and Curation Office (henceforth referred to herein as NASA Curation Office) at NASA Johnson Space Center (JSC) is responsible for curating all of NASA's extraterrestrial samples. Under the governing document, NASA Policy Directive (NPD) 7100.10E "Curation of Extraterrestrial Materials", JSC is charged with "...curation of all extra-terrestrial material under NASA control, including future NASA missions." The Directive goes on to define Curation as including "...documentation, preservation, preparation, and distribution of samples for research, education, and public outreach." Here we describe some of the past, present, and future activities of the NASA Curation Office.

  20. BGMUT: NCBI dbRBC database of allelic variations of genes encoding antigens of blood group systems.

    PubMed

    Patnaik, Santosh Kumar; Helmberg, Wolfgang; Blumenfeld, Olga O

    2012-01-01

    Analogous to human leukocyte antigens, blood group antigens are surface markers on the erythrocyte cell membrane whose structures differ among individuals and which can be serologically identified. The Blood Group Antigen Gene Mutation Database (BGMUT) is an online repository of allelic variations in genes that determine the antigens of various human blood group systems. The database is manually curated with allelic information collated from scientific literature and from direct submissions from research laboratories. Currently, the database documents sequence variations of a total of 1251 alleles of all 40 gene loci that together are known to affect antigens of 30 human blood group systems. When available, information on the geographic or ethnic prevalence of an allele is also provided. The BGMUT website also has general information on the human blood group systems and the genes responsible for them. BGMUT is a part of the dbRBC resource of the National Center for Biotechnology Information, USA, and is available online at http://www.ncbi.nlm.nih.gov/projects/gv/rbc/xslcgi.fcgi?cmd=bgmut. The database should be of use to members of the transfusion medicine community, those interested in studies of genetic variation and related topics such as human migrations, and students as well as members of the general public.

  1. Biofuel Database

    National Institute of Standards and Technology Data Gateway

    Biofuel Database (Web, free access)   This database brings together structural, biological, and thermodynamic data for enzymes that are either in current use or are being considered for use in the production of biofuels.

  2. The OCLC Terminal: An Instruction Manual.

    ERIC Educational Resources Information Center

    Arnold, Jane; And Others

    Prepared for users of OCLC terminals at the Michigan State University library, this manual provides a brief description of the contents of the OCLC database, instructions for constructing search requests in OCLC, guidelines for interpreting OCLC search results, a key to "reading" the OCLC screen, an overview of typical terminal responses…

  3. RefSeq curation and annotation of antizyme and antizyme inhibitor genes in vertebrates

    PubMed Central

    Rajput, Bhanu; Murphy, Terence D.; Pruitt, Kim D.

    2015-01-01

    Polyamines are ubiquitous cations that are involved in regulating fundamental cellular processes such as cell growth and proliferation; hence, their intracellular concentration is tightly regulated. Antizyme and antizyme inhibitor have a central role in maintaining cellular polyamine levels. Antizyme is unique in that it is expressed via a novel programmed ribosomal frameshifting mechanism. Conventional computational tools are unable to predict a programmed frameshift, resulting in misannotation of antizyme transcripts and proteins on transcript and genomic sequences. Correct annotation of a programmed frameshifting event requires manual evaluation. Our goal was to provide an accurately curated and annotated Reference Sequence (RefSeq) data set of antizyme transcript and protein records across a broad taxonomic scope that would serve as standards for accurate representation of these gene products. As antizyme and antizyme inhibitor proteins are functionally connected, we also curated antizyme inhibitor genes to more fully represent the elegant biology of polyamine regulation. Manual review of genes for three members of the antizyme family and two members of the antizyme inhibitor family in 91 vertebrate organisms resulted in a total of 461 curated RefSeq records. PMID:26170238

  4. Database Administrator

    ERIC Educational Resources Information Center

    Moore, Pam

    2010-01-01

    The Internet and electronic commerce (e-commerce) generate lots of data. Data must be stored, organized, and managed. Database administrators, or DBAs, work with database software to find ways to do this. They identify user needs, set up computer databases, and test systems. They ensure that systems perform as they should and add people to the…

  5. The BioGRID interaction database: 2015 update.

    PubMed

    Chatr-Aryamontri, Andrew; Breitkreutz, Bobby-Joe; Oughtred, Rose; Boucher, Lorrie; Heinicke, Sven; Chen, Daici; Stark, Chris; Breitkreutz, Ashton; Kolas, Nadine; O'Donnell, Lara; Reguly, Teresa; Nixon, Julie; Ramage, Lindsay; Winter, Andrew; Sellam, Adnane; Chang, Christie; Hirschman, Jodi; Theesfeld, Chandra; Rust, Jennifer; Livstone, Michael S; Dolinski, Kara; Tyers, Mike

    2015-01-01

    The Biological General Repository for Interaction Datasets (BioGRID: http://thebiogrid.org) is an open access database that houses genetic and protein interactions curated from the primary biomedical literature for all major model organism species and humans. As of September 2014, the BioGRID contains 749,912 interactions as drawn from 43,149 publications that represent 30 model organisms. This interaction count represents a 50% increase compared to our previous 2013 BioGRID update. BioGRID data are freely distributed through partner model organism databases and meta-databases and are directly downloadable in a variety of formats. In addition to general curation of the published literature for the major model species, BioGRID undertakes themed curation projects in areas of particular relevance for biomedical sciences, such as the ubiquitin-proteasome system and various human disease-associated interaction networks. BioGRID curation is coordinated through an Interaction Management System (IMS) that facilitates the compilation of interaction records through structured evidence codes, phenotype ontologies, and gene annotation. The BioGRID architecture has been improved in order to support a broader range of interaction and post-translational modification types, to allow the representation of more complex multi-gene/protein interactions, to account for cellular phenotypes through structured ontologies, to expedite curation through semi-automated text-mining approaches, and to enhance curation quality control.

  6. CancerHSP: anticancer herbs database of systems pharmacology

    NASA Astrophysics Data System (ADS)

    Tao, Weiyang; Li, Bohui; Gao, Shuo; Bai, Yaofei; Shar, Piar Ali; Zhang, Wenjuan; Guo, Zihu; Sun, Ke; Fu, Yingxue; Huang, Chao; Zheng, Chunli; Mu, Jiexin; Pei, Tianli; Wang, Yuan; Li, Yan; Wang, Yonghua

    2015-06-01

    The numerous natural products and their bioactivity potentially afford an extraordinary resource for new drug discovery and have been employed in cancer treatment. However, the underlying pharmacological mechanisms of most natural anticancer compounds remain elusive, which has become one of the major obstacles in developing novel effective anticancer agents. Here, to address these unmet needs, we developed an anticancer herbs database of systems pharmacology (CancerHSP), which records anticancer herbs related information through manual curation. Currently, CancerHSP contains 2439 anticancer herbal medicines with 3575 anticancer ingredients. For each ingredient, the molecular structure and nine key ADME parameters are provided. Moreover, we also provide the anticancer activities of these compounds based on 492 different cancer cell lines. Further, the protein targets of the compounds are predicted by state-of-the-art methods or collected from the literature. CancerHSP will help reveal the molecular mechanisms of natural anticancer products and accelerate anticancer drug development, especially facilitate future investigations on drug repositioning and drug discovery. CancerHSP is freely available on the web at http://lsp.nwsuaf.edu.cn/CancerHSP.php.

  7. CancerHSP: anticancer herbs database of systems pharmacology.

    PubMed

    Tao, Weiyang; Li, Bohui; Gao, Shuo; Bai, Yaofei; Shar, Piar Ali; Zhang, Wenjuan; Guo, Zihu; Sun, Ke; Fu, Yingxue; Huang, Chao; Zheng, Chunli; Mu, Jiexin; Pei, Tianli; Wang, Yuan; Li, Yan; Wang, Yonghua

    2015-06-15

    The numerous natural products and their bioactivity potentially afford an extraordinary resource for new drug discovery and have been employed in cancer treatment. However, the underlying pharmacological mechanisms of most natural anticancer compounds remain elusive, which has become one of the major obstacles in developing novel effective anticancer agents. Here, to address these unmet needs, we developed an anticancer herbs database of systems pharmacology (CancerHSP), which records anticancer herbs related information through manual curation. Currently, CancerHSP contains 2439 anticancer herbal medicines with 3575 anticancer ingredients. For each ingredient, the molecular structure and nine key ADME parameters are provided. Moreover, we also provide the anticancer activities of these compounds based on 492 different cancer cell lines. Further, the protein targets of the compounds are predicted by state-of-the-art methods or collected from the literature. CancerHSP will help reveal the molecular mechanisms of natural anticancer products and accelerate anticancer drug development, especially facilitate future investigations on drug repositioning and drug discovery. CancerHSP is freely available on the web at http://lsp.nwsuaf.edu.cn/CancerHSP.php.

  8. RepeatsDB: a database of tandem repeat protein structures

    PubMed Central

    Di Domenico, Tomás; Potenza, Emilio; Walsh, Ian; Gonzalo Parra, R.; Giollo, Manuel; Minervini, Giovanni; Piovesan, Damiano; Ihsan, Awais; Ferrari, Carlo; Kajava, Andrey V.; Tosatto, Silvio C.E.

    2014-01-01

    RepeatsDB (http://repeatsdb.bio.unipd.it/) is a database of annotated tandem repeat protein structures. Tandem repeats pose a difficult problem for the analysis of protein structures, as the underlying sequence can be highly degenerate. Several repeat types have been studied over the years, but their annotation was done on a case-by-case basis, thus making large-scale analysis difficult. We developed RepeatsDB to fill this gap. Using state-of-the-art repeat detection methods and manual curation, we systematically annotated the Protein Data Bank, predicting 10 745 repeat structures. In all, 2797 structures were classified according to a recently proposed classification schema, which was expanded to accommodate new findings. In addition, detailed annotations were performed in a subset of 321 proteins. These annotations feature information on start and end positions for the repeat regions and units. RepeatsDB is an ongoing effort to systematically classify and annotate structural protein repeats in a consistent way. It provides users with the possibility to access and download high-quality datasets either interactively or programmatically through web services. PMID:24311564
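
    The detailed annotations pair each repeat region with its constituent units as start/end residue positions. A small sketch of that data shape, with hypothetical coordinates and a consistency check that units lie ordered and non-overlapping inside their region:

```python
# Hypothetical repeat annotation: a region and its units as (start, end)
# residue positions, loosely modeled on the detailed-annotation idea.
region = (12, 131)
units = [(12, 45), (46, 79), (80, 113), (114, 131)]

def units_consistent(region, units):
    """Check that units are ordered, non-overlapping and inside the region."""
    start, end = region
    prev_end = start - 1
    for u_start, u_end in units:
        if not (start <= u_start <= u_end <= end):
            return False
        if u_start <= prev_end:  # overlaps the previous unit
            return False
        prev_end = u_end
    return True

print(units_consistent(region, units))  # True
```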

  9. IPD--the Immuno Polymorphism Database.

    PubMed

    Robinson, James; Mistry, Kavita; McWilliam, Hamish; Lopez, Rodrigo; Marsh, Steven G E

    2010-01-01

    The Immuno Polymorphism Database (IPD) (http://www.ebi.ac.uk/ipd/) is a set of specialist databases related to the study of polymorphic genes in the immune system. The IPD project works with specialist groups or nomenclature committees who provide and curate individual sections before they are submitted to IPD for online publication. The IPD project stores all the data in a set of related databases. IPD currently consists of four databases: IPD-KIR, which contains the allelic sequences of Killer-cell Immunoglobulin-like Receptors; IPD-MHC, a database of sequences of the Major Histocompatibility Complex of different species; IPD-human platelet antigens, alloantigens expressed only on platelets; and IPD-ESTDAB, which provides access to the European Searchable Tumour cell-line database, a cell bank of immunologically characterised melanoma cell lines. The data is currently available online from the website and ftp directory.

  10. IPD--the Immuno Polymorphism Database.

    PubMed

    Robinson, James; Halliwell, Jason A; McWilliam, Hamish; Lopez, Rodrigo; Marsh, Steven G E

    2013-01-01

    The Immuno Polymorphism Database (IPD), http://www.ebi.ac.uk/ipd/ is a set of specialist databases related to the study of polymorphic genes in the immune system. The IPD project works with specialist groups or nomenclature committees who provide and curate individual sections before they are submitted to IPD for online publication. The IPD project stores all the data in a set of related databases. IPD currently consists of four databases: IPD-KIR, which contains the allelic sequences of killer-cell immunoglobulin-like receptors; IPD-MHC, a database of sequences of the major histocompatibility complex of different species; IPD-HPA, alloantigens expressed only on platelets; and IPD-ESTDAB, which provides access to the European Searchable Tumour Cell-Line Database, a cell bank of immunologically characterized melanoma cell lines. The data is currently available online from the website and FTP directory. This article describes the latest updates and additional tools added to the IPD project.

  11. IPD—the Immuno Polymorphism Database

    PubMed Central

    Robinson, James; Mistry, Kavita; McWilliam, Hamish; Lopez, Rodrigo; Marsh, Steven G. E.

    2010-01-01

    The Immuno Polymorphism Database (IPD) (http://www.ebi.ac.uk/ipd/) is a set of specialist databases related to the study of polymorphic genes in the immune system. The IPD project works with specialist groups or nomenclature committees who provide and curate individual sections before they are submitted to IPD for online publication. The IPD project stores all the data in a set of related databases. IPD currently consists of four databases: IPD-KIR, which contains the allelic sequences of Killer-cell Immunoglobulin-like Receptors; IPD-MHC, a database of sequences of the Major Histocompatibility Complex of different species; IPD-human platelet antigens, alloantigens expressed only on platelets; and IPD-ESTDAB, which provides access to the European Searchable Tumour cell-line database, a cell bank of immunologically characterised melanoma cell lines. The data is currently available online from the website and ftp directory. PMID:19875415

  12. AtomPy: an open atomic-data curation environment

    NASA Astrophysics Data System (ADS)

    Bautista, Manuel; Mendoza, Claudio; Boswell, Josiah S; Ajoku, Chukwuemeka

    2014-06-01

    We present a cloud-computing environment for atomic data curation, networking among atomic data providers and users, teaching-and-learning, and interfacing with spectral modeling software. The system is based on Google-Drive Sheets, Pandas (Python Data Analysis Library) DataFrames, and IPython Notebooks for open community-driven curation of atomic data for scientific and technological applications. The atomic model for each ionic species is contained in a multi-sheet Google-Drive workbook, where the atomic parameters from all known public sources are progressively stored. Metadata (provenance, community discussion, etc.) accompanying every entry in the database are stored through Notebooks. Education tools on the physics of atomic processes as well as their relevance to plasma and spectral modeling are based on IPython Notebooks that integrate written material, images, videos, and active computer-tool workflows. Data processing workflows and collaborative software developments are encouraged and managed through the GitHub social network. Relevant issues this platform intends to address are: (i) data quality by allowing open access to both data producers and users in order to attain completeness, accuracy, consistency, provenance and currentness; (ii) comparisons of different datasets to facilitate accuracy assessment; (iii) downloading to local data structures (i.e. Pandas DataFrames) for further manipulation and analysis by prospective users; and (iv) data preservation by avoiding the discard of outdated sets.
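
    The idea of progressively storing atomic parameters from multiple sources, with provenance kept alongside each value so that datasets can be compared for accuracy assessment, can be sketched in plain Python. The sources, transitions and values below are hypothetical, not AtomPy content:

```python
# Hypothetical atomic-parameter records from two sources for one ion;
# each value keeps its provenance, mimicking the multi-sheet workbook idea.
source_a = {"name": "NIST-like compilation",
            "data": {("2s-2p", "A_ki"): 6.3e8}}
source_b = {"name": "R-matrix calculation",
            "data": {("2s-2p", "A_ki"): 6.1e8, ("2s-3p", "A_ki"): 1.2e7}}

def merge_with_provenance(*sources):
    """Collect every reported value per (transition, parameter) key,
    tagged with its source name, so datasets can be compared side by side."""
    merged = {}
    for src in sources:
        for key, value in src["data"].items():
            merged.setdefault(key, []).append((src["name"], value))
    return merged

merged = merge_with_provenance(source_a, source_b)
for key, entries in sorted(merged.items()):
    print(key, entries)
```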

  13. MAHMI database: a comprehensive MetaHit-based resource for the study of the mechanism of action of the human microbiota

    PubMed Central

    Blanco-Míguez, Aitor; Gutiérrez-Jácome, Alberto; Fdez-Riverola, Florentino; Lourenço, Anália; Sánchez, Borja

    2017-01-01

    The Mechanism of Action of the Human Microbiome (MAHMI) database is a unique resource that provides comprehensive information about the sequence of potential immunomodulatory and antiproliferative peptides encrypted in the proteins produced by the human gut microbiota. Currently, the MAHMI database contains over 300 million peptide entries, with detailed information about peptide sequence, sources and potential bioactivity. The reference peptide data section is curated manually by domain experts. The in silico peptide data section is populated automatically through the systematic processing of publicly available exoproteomes of the human microbiome. Bioactivity prediction is based on the global alignment of the automatically processed peptides with experimentally validated immunomodulatory and antiproliferative peptides in the reference section. MAHMI provides researchers with a comparative tool for inspecting the potential immunomodulatory or antiproliferative bioactivity of new amino acidic sequences and identifying promising peptides to be further investigated. Moreover, researchers are welcome to submit new experimental evidence on peptide bioactivity, namely, empiric and structural data, as a proactive, expert means to keep the database updated and improve the implemented bioactivity prediction method. Bioactive peptides identified by MAHMI have a huge biotechnological potential, including the manipulation of aberrant immune responses and the design of new functional ingredients/foods based on the genetic sequences of the human microbiome. Hopefully, the resources provided by MAHMI will be useful to those researching gastrointestinal disorders of autoimmune and inflammatory nature, such as Inflammatory Bowel Diseases. MAHMI database is routinely updated and is available free of charge. Database URL: http://mahmi.org/ PMID:28077565
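
    The bioactivity-prediction step rests on global alignment of candidate peptides against validated reference peptides. A minimal Needleman-Wunsch scoring sketch, assuming a simple match/mismatch/gap scheme rather than MAHMI's actual scoring parameters; the peptide sequences are hypothetical:

```python
def global_alignment_score(a, b, match=1, mismatch=-1, gap=-1):
    """Needleman-Wunsch global alignment score with a linear gap penalty."""
    n, m = len(a), len(b)
    prev = [j * gap for j in range(m + 1)]  # first DP row: all-gap prefix
    for i in range(1, n + 1):
        cur = [i * gap] + [0] * m
        for j in range(1, m + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            cur[j] = max(diag, prev[j] + gap, cur[j - 1] + gap)
        prev = cur
    return prev[m]

# Rank hypothetical reference peptides by similarity to a query peptide.
references = ["VPGVG", "IPAVF", "YPFPG"]
query = "VPAVG"
best = max(references, key=lambda r: global_alignment_score(query, r))
print(best)  # VPGVG
```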

  14. Nutrient Control Design Manual

    EPA Science Inventory

    The Nutrient Control Design Manual will present an extensive state-of-the-technology review of the engineering design and operation of nitrogen and phosphorous control technologies and techniques applied at municipal wastewater treatment plants (WWTPs). This manual will present ...

  15. Value, but high costs in post-deposition data curation

    PubMed Central

    ten Hoopen, Petra; Amid, Clara; Luigi Buttigieg, Pier; Pafilis, Evangelos; Bravakos, Panos; Cerdeño-Tárraga, Ana M.; Gibson, Richard; Kahlke, Tim; Legaki, Aglaia; Narayana Murthy, Kada; Papastefanou, Gabriella; Pereira, Emiliano; Rossello, Marc; Luisa Toribio, Ana; Cochrane, Guy

    2016-01-01

    Discoverability of sequence data in primary data archives is proportional to the richness of contextual information associated with the data. Here, we describe an exercise in the improvement of contextual information surrounding sample records associated with metagenomics sequence reads available in the European Nucleotide Archive. We outline the annotation process and summarize findings of this effort aimed at increasing usability of publicly available environmental data. Furthermore, we emphasize the benefits of such an exercise and detail its costs. We conclude that such a third party annotation approach is expensive and has value as an element of curation, but should form only part of a more sustainable submitter-driven approach. Database URL: http://www.ebi.ac.uk/ena PMID:26861660

  16. Nutrient Control Design Manual

    EPA Science Inventory

    The purpose of this EPA design manual is to provide updated, state‐of‐the‐technology design guidance on nitrogen and phosphorus control at municipal Wastewater Treatment Plants (WWTPs). Similar to previous EPA manuals, this manual contains extensive information on the principles ...

  17. KDYNA user's manual

    SciTech Connect

    Levatin, J.A.L.; Attia, A.V.; Hallquist, J.O.

    1990-09-28

    This report is a complete user's manual for KDYNA, the Earth Sciences version of DYNA2D. Because most features of DYNA2D have been retained in KDYNA, much of this manual is identical to the DYNA2D user's manual.

  18. Correcting Inconsistencies and Errors in Bacterial Genome Metadata Using an Automated Curation Tool in Excel (AutoCurE).

    PubMed

    Schmedes, Sarah E; King, Jonathan L; Budowle, Bruce

    2015-01-01

    Whole-genome data are invaluable for large-scale comparative genomic studies. Current sequencing technologies have made it feasible to sequence entire bacterial genomes with relative ease and speed, at a substantially reduced cost per nucleotide and hence per genome. More than 3,000 bacterial genomes have been sequenced and are available at the finished status. Publicly available genomes can be readily downloaded; however, there are challenges to verify the specific supporting data contained within the download and to identify errors and inconsistencies that may be present within the organizational data content and metadata. AutoCurE, an automated tool for bacterial genome database curation in Excel, was developed to facilitate local database curation of supporting data that accompany downloaded genomes from the National Center for Biotechnology Information. AutoCurE provides an automated approach to curate local genomic databases by flagging inconsistencies or errors by comparing the downloaded supporting data to the genome reports to verify genome name, RefSeq accession numbers, the presence of archaea, BioProject/UIDs, and sequence file descriptions. Flags are generated for nine metadata fields if there are inconsistencies between the downloaded genomes and genome reports and if erroneous or missing data are evident. AutoCurE is an easy-to-use tool for local database curation for large-scale genome data prior to downstream analyses.
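
    The flagging logic, comparing downloaded supporting data against a genome report field by field, can be sketched as below. The record layout and field names are hypothetical, not AutoCurE's actual Excel schema:

```python
# Hypothetical records: one from a downloaded genome's supporting data,
# one from the corresponding genome report (field names are illustrative).
downloaded = {
    "genome_name": "Bacillus subtilis 168",
    "refseq_accession": "NC_000964.3",
    "bioproject": "PRJNA128",
}
report = {
    "genome_name": "Bacillus subtilis 168",
    "refseq_accession": "NC_000964.2",  # stale accession version
    "bioproject": "PRJNA128",
}

def flag_inconsistencies(downloaded, report, fields):
    """Return a flag for each field that is missing or mismatched."""
    flags = []
    for f in fields:
        a, b = downloaded.get(f), report.get(f)
        if a is None or b is None:
            flags.append((f, "missing"))
        elif a != b:
            flags.append((f, f"mismatch: {a!r} != {b!r}"))
    return flags

fields = ["genome_name", "refseq_accession", "bioproject"]
for field, issue in flag_inconsistencies(downloaded, report, fields):
    print(field, "->", issue)
```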

  19. Pubcast and Genecast: Browsing and exploring publications and associated curated content in biology through mobile devices.

    PubMed

    Goldweber, Scott; Theodore, Jamal; Torcivia-Rodriguez, John; Simonyan, Vahan; Mazumder, Raja

    2016-03-23

    Services such as Facebook, Amazon, and eBay were once solely accessed from stationary computers. These web services are now being used increasingly on mobile devices. We acknowledge this new reality by providing users a way to access publications and a curated cancer mutation database on their mobile device with daily automated updates.

  20. Surveying the maize community for their diversity and pedigree visualization needs to prioritize tool development and curation

    Technology Transfer Automated Retrieval System (TEKTRAN)

    The Maize Genetics and Genomics Database (MaizeGDB) team prepared a survey to identify breeders’ needs for visualizing pedigrees, diversity data, and haplotypes in order to prioritize tool development and curation efforts at MaizeGDB. The survey was distributed to the maize research community on beh...

  1. Opportunities for text mining in the FlyBase genetic literature curation workflow.

    PubMed

    McQuilton, Peter

    2012-01-01

    FlyBase is the model organism database for Drosophila genetic and genomic information. Over the last 20 years, FlyBase has had to adapt and change to keep abreast of advances in biology and database design. We are continually looking for ways to improve curation efficiency and efficacy. Genetic literature curation focuses on the extraction of genetic entities (e.g. genes, mutant alleles, transgenic constructs) and their associated phenotypes and Gene Ontology terms from the published literature. Over 2000 Drosophila research articles are now published every year. These articles are becoming ever more data-rich and there is a growing need for text mining to shoulder some of the burden of paper triage and data extraction. In this article, we describe our curation workflow, along with some of the problems and bottlenecks therein, and highlight the opportunities for text mining. We do so in the hope of encouraging the BioCreative community to help us to develop effective methods to mine this torrent of information. DATABASE URL: http://flybase.org

  2. Curating NASA's Extraterrestrial Samples - Past, Present, and Future

    NASA Technical Reports Server (NTRS)

    Allen, Carlton; Allton, Judith; Lofgren, Gary; Righter, Kevin; Zolensky, Michael

    2010-01-01

    Curation of extraterrestrial samples is the critical interface between sample return missions and the international research community. The Astromaterials Acquisition and Curation Office at the NASA Johnson Space Center (JSC) is responsible for curating NASA's extraterrestrial samples. Under the governing document, NASA Policy Directive (NPD) 7100.10E "Curation of Extraterrestrial Materials," JSC is charged with ". . . curation of all extraterrestrial material under NASA control, including future NASA missions." The Directive goes on to define Curation as including documentation, preservation, preparation, and distribution of samples for research, education, and public outreach.

  3. Curating NASA's Extraterrestrial Samples - Past, Present, and Future

    NASA Technical Reports Server (NTRS)

    Allen, Carlton; Allton, Judith; Lofgren, Gary; Righter, Kevin; Zolensky, Michael

    2011-01-01

    Curation of extraterrestrial samples is the critical interface between sample return missions and the international research community. The Astromaterials Acquisition and Curation Office at the NASA Johnson Space Center (JSC) is responsible for curating NASA's extraterrestrial samples. Under the governing document, NASA Policy Directive (NPD) 7100.10E "Curation of Extraterrestrial Materials", JSC is charged with ". . . curation of all extraterrestrial material under NASA control, including future NASA missions." The Directive goes on to define Curation as including "documentation, preservation, preparation, and distribution of samples for research, education, and public outreach."

  4. A framework for organizing cancer-related variations from existing databases, publications and NGS data using a High-performance Integrated Virtual Environment (HIVE).

    PubMed

    Wu, Tsung-Jung; Shamsaddini, Amirhossein; Pan, Yang; Smith, Krista; Crichton, Daniel J; Simonyan, Vahan; Mazumder, Raja

    2014-01-01

    Years of sequence feature curation by UniProtKB/Swiss-Prot, PIR-PSD, NCBI-CDD, RefSeq and other database biocurators has led to a rich repository of information on functional sites of genes and proteins. This information along with variation-related annotation can be used to scan human short sequence reads from next-generation sequencing (NGS) pipelines for presence of non-synonymous single-nucleotide variations (nsSNVs) that affect functional sites. This and similar workflows are becoming more important because thousands of NGS data sets are being made available through projects such as The Cancer Genome Atlas (TCGA), and researchers want to evaluate their biomarkers in genomic data. BioMuta, an integrated sequence feature database, provides a framework for automated and manual curation and integration of cancer-related sequence features so that they can be used in NGS analysis pipelines. Sequence feature information in BioMuta is collected from the Catalogue of Somatic Mutations in Cancer (COSMIC), ClinVar, UniProtKB and through biocuration of information available from publications. Additionally, nsSNVs identified through automated analysis of NGS data from TCGA are also included in the database. Because of the petabytes of data and information present in NGS primary repositories, a platform HIVE (High-performance Integrated Virtual Environment) for storing, analyzing, computing and curating NGS data and associated metadata has been developed. Using HIVE, 31 979 nsSNVs were identified in TCGA-derived NGS data from breast cancer patients. All variations identified through this process are stored in a Curated Short Read archive, and the nsSNVs from the tumor samples are included in BioMuta. Currently, BioMuta has 26 cancer types with 13 896 small-scale and 308 986 large-scale study-derived variations. Integration of variation data allows identifications of novel or common nsSNVs that can be prioritized in validation studies. Database URL: BioMuta: http
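
    The scanning step described above, checking nsSNV calls against curated functional-site annotations, can be sketched as follows. This is a toy illustration under assumed data shapes (protein accession plus residue position), not BioMuta's or HIVE's actual pipeline, and the annotations shown are invented:

```python
# Hypothetical curated functional-site annotations, keyed by
# (protein accession, residue position). Entries are illustrative only.
functional_sites = {
    ("P04637", 273): "DNA-binding region",
    ("P04637", 175): "zinc-binding region",
}

def nssnvs_at_functional_sites(variants, sites):
    """Keep only variant calls that land on an annotated functional site."""
    return [(protein, pos, change, sites[(protein, pos)])
            for protein, pos, change in variants
            if (protein, pos) in sites]

# The R273H call lands on an annotated site; P72R does not.
calls = [("P04637", 273, "R273H"), ("P04637", 72, "P72R")]
print(nssnvs_at_functional_sites(calls, functional_sites))
```

    The real workflow operates on short reads and integrated annotations from COSMIC, ClinVar and UniProtKB, but the core join is the same position-keyed lookup.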

  5. Astromaterials Acquisition and Curation Office (KT) Overview

    NASA Technical Reports Server (NTRS)

    Allen, Carlton

    2014-01-01

    The Astromaterials Acquisition and Curation Office has the unique responsibility to curate NASA's extraterrestrial samples - from past and forthcoming missions - into the indefinite future. Currently, curation includes documentation, preservation, physical security, preparation, and distribution of samples from the Moon, asteroids, comets, the solar wind, and the planet Mars. Each of these sample sets has a unique history and comes from a unique environment. The curation laboratories and procedures developed over 40 years have proven both necessary and sufficient to serve the evolving needs of a worldwide research community. A new generation of sample return missions to destinations across the solar system is being planned and proposed. The curators are developing the tools and techniques to meet the challenges of these new samples. Extraterrestrial samples pose unique curation requirements. These samples were formed and exist under conditions strikingly different from those on the Earth's surface. Terrestrial contamination would destroy much of the scientific significance of extraterrestrial materials. To preserve the research value of these precious samples, contamination must be minimized, understood, and documented. In addition, the samples must be preserved - as far as possible - from physical and chemical alteration. The elaborate curation facilities at JSC were designed and constructed, and have been operated for many years, to keep sample contamination and alteration to a minimum. 
Currently, JSC curates seven collections of extraterrestrial samples: (a) Lunar rocks and soils collected by the Apollo astronauts, (b) Meteorites collected on dedicated expeditions to Antarctica, (c) Cosmic dust collected by high-altitude NASA aircraft, (d) Solar wind atoms collected by the Genesis spacecraft, (e) Comet particles collected by the Stardust spacecraft, (f) Interstellar dust particles collected by the Stardust spacecraft, and (g) Asteroid soil particles collected

  6. NASA's Astromaterials Curation Digital Repository: Enabling Research Through Increased Access to Sample Data, Metadata and Imagery

    NASA Astrophysics Data System (ADS)

    Evans, C. A.; Todd, N. S.

    2014-12-01

    The Astromaterials Acquisition & Curation Office at NASA's Johnson Space Center (JSC) is the designated facility for curating all of NASA's extraterrestrial samples. Today, the suite of collections includes the lunar samples from the Apollo missions, cosmic dust particles falling into the Earth's atmosphere, meteorites collected in Antarctica, comet and interstellar dust particles from the Stardust mission, asteroid particles from Japan's Hayabusa mission, solar wind atoms collected during the Genesis mission, and space-exposed hardware from several missions. To support planetary science research on these samples, JSC's Astromaterials Curation Office hosts NASA's Astromaterials Curation digital repository and data access portal [http://curator.jsc.nasa.gov/], providing descriptions of the missions and collections, and critical information about each individual sample. Our office is designing and implementing several informatics initiatives to better serve the planetary research community. First, we are re-hosting the basic database framework by consolidating legacy databases for individual collections and providing a uniform access point for information (descriptions, imagery, classification) on all of our samples. Second, we continue to upgrade and host digital compendia that summarize and highlight published findings on the samples (e.g., lunar samples, meteorites from Mars). We host high resolution imagery of samples as it becomes available, including newly scanned images of historical prints from the Apollo missions. Finally we are creating plans to collect and provide new data, including 3D imagery, point cloud data, micro CT data, and external links to other data sets on selected samples. Together, these individual efforts will provide unprecedented digital access to NASA's Astromaterials, enabling preservation of the samples through more specific and targeted requests, and supporting new planetary science research and collaborations on the samples.

  7. If we build it, will they come? Curation and use of the ESO telescope bibliography

    NASA Astrophysics Data System (ADS)

    Grothkopf, Uta; Meakins, Silvia; Bordelon, Dominic

    2015-12-01

    The ESO Telescope Bibliography (telbib) is a database of refereed papers published by the ESO users community. It links data in the ESO Science Archive with the published literature, and vice versa. Developed and maintained by the ESO library, telbib also provides insights into the organization's research output and impact as measured through bibliometric studies. Curating telbib is a multi-step process that involves extensive tagging of the database records. Based on selected use cases, this talk will explain how the rich metadata provide parameters for reports and statistics in order to investigate the performance of ESO's facilities and to understand trends and developments in the publishing behaviour of the user community.

  8. Image Databases.

    ERIC Educational Resources Information Center

    Pettersson, Rune

    Different kinds of pictorial databases are described with respect to aims, user groups, search possibilities, storage, and distribution. Some specific examples are given for databases used for the following purposes: (1) labor markets for artists; (2) document management; (3) telling a story; (4) preservation (archives and museums); (5) research;…

  9. Maize databases

    Technology Transfer Automated Retrieval System (TEKTRAN)

    This chapter is a succinct overview of maize data held in the species-specific database MaizeGDB (the Maize Genomics and Genetics Database), and selected multi-species data repositories, such as Gramene/Ensembl Plants, Phytozome, UniProt and the National Center for Biotechnology Information (NCBI), ...

  10. Plant Genome DataBase Japan (PGDBj): A Portal Website for the Integration of Plant Genome-Related Databases

    PubMed Central

    Asamizu, Erika; Ichihara, Hisako; Nakaya, Akihiro; Nakamura, Yasukazu; Hirakawa, Hideki; Ishii, Takahiro; Tamura, Takuro; Fukami-Kobayashi, Kaoru; Nakajima, Yukari; Tabata, Satoshi

    2014-01-01

    The Plant Genome DataBase Japan (PGDBj, http://pgdbj.jp/?ln=en) is a portal website that aims to integrate plant genome-related information from databases (DBs) and the literature. The PGDBj is comprised of three component DBs and a cross-search engine, which provides a seamless search over the contents of the DBs. The three DBs are as follows. (i) The Ortholog DB, providing gene cluster information based on the amino acid sequence similarity. Over 500,000 amino acid sequences of 20 Viridiplantae species were subjected to reciprocal BLAST searches and clustered. Sequences from plant genome DBs (e.g. TAIR10 and RAP-DB) were also included in the cluster with a direct link to the original DB. (ii) The Plant Resource DB, integrating the SABRE DB, which provides cDNA and genome sequence resources accumulated and maintained in the RIKEN BioResource Center and National BioResource Projects. (iii) The DNA Marker DB, providing manually or automatically curated information of DNA markers, quantitative trait loci and related linkage maps, from the literature and external DBs. As the PGDBj targets various plant species, including model plants, algae, and crops important as food, fodder and biofuel, researchers in the field of basic biology as well as a wide range of agronomic fields are encouraged to perform searches using DNA sequences, gene names, traits and phenotypes of interest. The PGDBj will return the search results from the component DBs and various types of linked external DBs. PMID:24363285

  11. ComPPI: a cellular compartment-specific database for protein-protein interaction network analysis.

    PubMed

    Veres, Daniel V; Gyurkó, Dávid M; Thaler, Benedek; Szalay, Kristóf Z; Fazekas, Dávid; Korcsmáros, Tamás; Csermely, Peter

    2015-01-01

    Here we present ComPPI, a cellular compartment-specific database of proteins and their interactions enabling an extensive, compartmentalized protein-protein interaction network analysis (URL: http://ComPPI.LinkGroup.hu). ComPPI enables the user to filter biologically unlikely interactions, where the two interacting proteins have no common subcellular localizations and to predict novel properties, such as compartment-specific biological functions. ComPPI is an integrated database covering four species (S. cerevisiae, C. elegans, D. melanogaster and H. sapiens). The compilation of nine protein-protein interaction and eight subcellular localization data sets had four curation steps including a manually built, comprehensive hierarchical structure of >1600 subcellular localizations. ComPPI provides confidence scores for protein subcellular localizations and protein-protein interactions. ComPPI has user-friendly search options for individual proteins giving their subcellular localization, their interactions and the likelihood of their interactions considering the subcellular localization of their interacting partners. Download options of search results, whole-proteomes, organelle-specific interactomes and subcellular localization data are available on its website. Due to its novel features, ComPPI is useful for the analysis of experimental results in biochemistry and molecular biology, as well as for proteome-wide studies in bioinformatics and network science helping cellular biology, medicine and drug design.
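
    The core filter ComPPI describes, discarding interactions whose partners share no subcellular localization, reduces to a set intersection. A minimal sketch with invented proteins and localizations (not ComPPI's code or data):

```python
# Hypothetical localization assignments; real ComPPI draws on eight
# localization data sets and a >1600-term hierarchy.
localizations = {
    "A": {"nucleus", "cytosol"},
    "B": {"cytosol"},
    "C": {"mitochondrion"},
}

def plausible(interactions, loc):
    """Keep interactions whose partners share at least one localization."""
    return [(p, q) for p, q in interactions
            if loc.get(p, set()) & loc.get(q, set())]

print(plausible([("A", "B"), ("A", "C")], localizations))  # → [('A', 'B')]
```

    ComPPI additionally weights this with confidence scores rather than a hard yes/no, but the localization-overlap test is the underlying idea.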

  12. ComPPI: a cellular compartment-specific database for protein–protein interaction network analysis

    PubMed Central

    Veres, Daniel V.; Gyurkó, Dávid M.; Thaler, Benedek; Szalay, Kristóf Z.; Fazekas, Dávid; Korcsmáros, Tamás; Csermely, Peter

    2015-01-01

    Here we present ComPPI, a cellular compartment-specific database of proteins and their interactions enabling an extensive, compartmentalized protein–protein interaction network analysis (URL: http://ComPPI.LinkGroup.hu). ComPPI enables the user to filter biologically unlikely interactions, where the two interacting proteins have no common subcellular localizations and to predict novel properties, such as compartment-specific biological functions. ComPPI is an integrated database covering four species (S. cerevisiae, C. elegans, D. melanogaster and H. sapiens). The compilation of nine protein–protein interaction and eight subcellular localization data sets had four curation steps including a manually built, comprehensive hierarchical structure of >1600 subcellular localizations. ComPPI provides confidence scores for protein subcellular localizations and protein–protein interactions. ComPPI has user-friendly search options for individual proteins giving their subcellular localization, their interactions and the likelihood of their interactions considering the subcellular localization of their interacting partners. Download options of search results, whole-proteomes, organelle-specific interactomes and subcellular localization data are available on its website. Due to its novel features, ComPPI is useful for the analysis of experimental results in biochemistry and molecular biology, as well as for proteome-wide studies in bioinformatics and network science helping cellular biology, medicine and drug design. PMID:25348397

  13. MediaDB: a database of microbial growth conditions in defined media.

    PubMed

    Richards, Matthew A; Cassen, Victor; Heavner, Benjamin D; Ajami, Nassim E; Herrmann, Andrea; Simeonidis, Evangelos; Price, Nathan D

    2014-01-01

    Isolating pure microbial cultures and cultivating them in the laboratory on defined media is used to more fully characterize the metabolism and physiology of organisms. However, identifying an appropriate growth medium for a novel isolate remains a challenging task. Even organisms with sequenced and annotated genomes can be difficult to grow, despite our ability to build genome-scale metabolic networks that connect genomic data with metabolic function. The scientific literature is scattered with information about defined growth media used successfully for cultivating a wide variety of organisms, but to date there exists no centralized repository to inform efforts to cultivate less characterized organisms by bridging the gap between genomic data and compound composition for growth media. Here we present MediaDB, a manually curated database of defined media that have been used for cultivating organisms with sequenced genomes, with an emphasis on organisms with metabolic network models. The database is accessible online, can be queried by keyword searches or downloaded in its entirety, and can generate exportable individual media formulation files. The data assembled in MediaDB facilitate comparative studies of organism growth media, serve as a starting point for formulating novel growth media, and contribute to formulating media for in silico investigation of metabolic networks. MediaDB is freely available for public use at https://mediadb.systemsbiology.net.
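
    The keyword search the abstract mentions can be pictured as a case-insensitive match over a medium's name, organism, and compound list. The records and fields below are invented for illustration and are not MediaDB's schema:

```python
# Hypothetical defined-media records; MediaDB's actual entries carry
# full compound compositions and literature references.
media = [
    {"name": "M9 minimal medium", "organism": "Escherichia coli",
     "compounds": ["glucose", "NH4Cl", "Na2HPO4", "KH2PO4"]},
    {"name": "Defined lactate medium", "organism": "Shewanella oneidensis",
     "compounds": ["sodium lactate", "NH4Cl", "HEPES"]},
]

def search_media(records, keyword):
    """Case-insensitive keyword match over name, organism, and compounds."""
    kw = keyword.lower()
    return [r["name"] for r in records
            if kw in r["name"].lower()
            or kw in r["organism"].lower()
            or any(kw in c.lower() for c in r["compounds"])]

print(search_media(media, "lactate"))  # → ['Defined lactate medium']
```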

  14. SubtiWiki 2.0--an integrated database for the model organism Bacillus subtilis.

    PubMed

    Michna, Raphael H; Zhu, Bingyao; Mäder, Ulrike; Stülke, Jörg

    2016-01-04

    To understand living cells, we need knowledge of each of their parts as well as of the interactions among these parts. To gain rapid and comprehensive access to this information, annotation databases are required. Here, we present SubtiWiki 2.0, the integrated database for the model bacterium Bacillus subtilis (http://subtiwiki.uni-goettingen.de/). SubtiWiki provides text-based access to published information about the genes and proteins of B. subtilis as well as presentations of metabolic and regulatory pathways. Moreover, manually curated protein-protein interaction diagrams are linked to the protein pages. Finally, expression data are shown for gene expression under 104 different conditions as well as absolute protein quantification for cytoplasmic proteins. To facilitate mobile use of SubtiWiki, we have now expanded it with Apps available for iOS and Android devices. Importantly, the App allows users to link private notes and pictures to the gene/protein pages. Today, SubtiWiki has become one of the most complete collections of knowledge on a living organism in a single resource.

  15. DDMGD: the database of text-mined associations between genes methylated in diseases from different species.

    PubMed

    Bin Raies, Arwa; Mansour, Hicham; Incitti, Roberto; Bajic, Vladimir B

    2015-01-01

    Gathering information about associations between methylated genes and diseases is important for disease diagnosis and treatment decisions. Recent advancements in epigenetics research allow for large-scale discovery of associations of genes methylated in diseases across different species. Searching manually for such information is not easy, as it is scattered across a large number of electronic publications and repositories. Therefore, we developed the DDMGD database (http://www.cbrc.kaust.edu.sa/ddmgd/) to provide a comprehensive repository of information related to genes methylated in diseases that can be found through text mining. DDMGD's scope is not limited to a particular group of genes, diseases or species. Using the text mining system DEMGD we developed earlier and additional post-processing, we extracted associations of genes methylated in different diseases from PubMed Central articles and PubMed abstracts. The accuracy of extracted associations is 82% as estimated on 2500 hand-curated entries. DDMGD provides a user-friendly interface facilitating retrieval of these associations ranked according to confidence scores. Submission of new associations to DDMGD is also supported. A comparison analysis of DDMGD with several other databases focused on genes methylated in diseases shows that DDMGD is comprehensive and includes most of the recent information on genes methylated in diseases.

  16. MetaboLights: An Open-Access Database Repository for Metabolomics Data.

    PubMed

    Kale, Namrata S; Haug, Kenneth; Conesa, Pablo; Jayseelan, Kalaivani; Moreno, Pablo; Rocca-Serra, Philippe; Nainala, Venkata Chandrasekhar; Spicer, Rachel A; Williams, Mark; Li, Xuefei; Salek, Reza M; Griffin, Julian L; Steinbeck, Christoph

    2016-03-24

    MetaboLights is the first general purpose, open-access database repository for cross-platform and cross-species metabolomics research at the European Bioinformatics Institute (EMBL-EBI). Based upon the open-source ISA framework, MetaboLights provides Metabolomics Standards Initiative (MSI) compliant metadata and raw experimental data associated with metabolomics experiments. Users can upload their study datasets into the MetaboLights Repository. These studies are then automatically assigned a stable and unique identifier (e.g., MTBLS1) that can be used for publication reference. The MetaboLights Reference Layer associates metabolites with metabolomics studies in the archive and is extensively annotated with data fields such as structural and chemical information, NMR and MS spectra, target species, metabolic pathways, and reactions. The database is manually curated with no specific release schedules. MetaboLights is also recommended by journals for metabolomics data deposition. This unit provides a guide to using MetaboLights, downloading experimental data, and depositing metabolomics datasets using user-friendly submission tools.

  17. ATLAS: A database linking binding affinities with structures for wild-type and mutant TCR-pMHC complexes.

    PubMed

    Borrman, Tyler; Cimons, Jennifer; Cosiano, Michael; Purcaro, Michael; Pierce, Brian G; Baker, Brian M; Weng, Zhiping

    2017-02-03

    The ATLAS (Altered TCR Ligand Affinities and Structures) database (https://zlab.umassmed.edu/atlas/web/) is a manually curated repository containing the binding affinities for wild-type and mutant T cell receptors (TCRs) and their antigens, peptides presented by the major histocompatibility complex (pMHC). The database links experimentally measured binding affinities with the corresponding three dimensional (3D) structures for TCR-pMHC complexes. The user can browse and search affinities, structures, and experimental details for TCRs, peptides, and MHCs of interest. We expect this database to facilitate the development of next-generation protein design algorithms targeting TCR-pMHC interactions. ATLAS can be easily parsed using modeling software that builds protein structures for training and testing. As an example, we provide structural models for all mutant TCRs in ATLAS, built using the Rosetta program. Utilizing these structures, we report a correlation of 0.63 between experimentally measured changes in binding energies and our predicted changes. Proteins 2017. © 2017 Wiley Periodicals, Inc.
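
    The reported 0.63 is a Pearson correlation between experimentally measured and predicted changes in binding energy. A minimal sketch of that calculation, with toy numbers rather than ATLAS data:

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

measured = [0.5, 1.2, -0.3, 2.1]   # toy ddG values, not ATLAS entries
predicted = [0.7, 0.9, -0.1, 1.8]
print(round(pearson(measured, predicted), 3))
```

    In the evaluation the abstract describes, `measured` would come from the curated affinities and `predicted` from energies of the Rosetta-built mutant models.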

  18. Curating and Nudging in Virtual CLIL Environments

    ERIC Educational Resources Information Center

    Nielsen, Helle Lykke

    2014-01-01

    Foreign language teachers can benefit substantially from the notions of curation and nudging when scaffolding CLIL activities on the internet. This article shows how these principles can be integrated into CLILstore, a free multimedia-rich learning tool with seamless access to online dictionaries, and presents feedback from first and second year…

  19. Curating Media Learning: Towards a Porous Expertise

    ERIC Educational Resources Information Center

    McDougall, Julian; Potter, John

    2015-01-01

    This article combines research results from a range of projects with two consistent themes. Firstly, we explore the potential for curation to offer a productive metaphor for the convergence of digital media learning across and between home/lifeworld and formal educational/system-world spaces--or between the public and private spheres. Secondly, we…

  20. The Role of Community-Driven Data Curation for Enterprises

    NASA Astrophysics Data System (ADS)

    Curry, Edward; Freitas, Andre; O'Riáin, Sean

    With increased utilization of data within their operational and strategic processes, enterprises need to ensure data quality and accuracy. Data curation is a process that can ensure the quality of data and its fitness for use. Traditional approaches to curation are struggling with increased data volumes, and near real-time demands for curated data. In response, curation teams have turned to community crowd-sourcing and semi-automated metadata tools for assistance. This chapter provides an overview of data curation, discusses the business motivations for curating data and investigates the role of community-based data curation, focusing on internal communities and pre-competitive data collaborations. The chapter is supported by case studies from Wikipedia, The New York Times, Thomson Reuters, Protein Data Bank and ChemSpider upon which best practices for both social and technical aspects of community-driven data curation are described.

  1. How Workflow Documentation Facilitates Curation Planning

    NASA Astrophysics Data System (ADS)

    Wickett, K.; Thomer, A. K.; Baker, K. S.; DiLauro, T.; Asangba, A. E.

    2013-12-01

    The description of the specific processes and artifacts that led to the creation of a data product provides a detailed picture of data provenance in the form of a workflow. The Site-Based Data Curation project, hosted by the Center for Informatics Research in Science and Scholarship at the University of Illinois, has been investigating how workflows can be used in developing curation processes and policies that move curation "upstream" in the research process. The team has documented an individual workflow for geobiology data collected during a single field trip to Yellowstone National Park. This specific workflow suggests a generalized process for field data collection comprising three distinct stages: a Planning Stage, a Fieldwork Stage, and a Processing and Analysis Stage. Beyond supplying an account of data provenance, the workflow has allowed the team to identify 1) points of intervention for curation processes and 2) data products that are likely candidates for sharing or deposit. Although these objects may be viewed by individual researchers as 'intermediate' data products, discussions with geobiology researchers have suggested that with appropriate packaging and description they may serve as valuable observational data for other researchers. Curation interventions may include the introduction of regularized data formats during the planning process, data description procedures, the identification and use of established controlled vocabularies, and data quality and validation procedures. We propose a poster that shows the individual workflow and our generalization into a three-stage process. We plan to discuss with attendees how well the three-stage view applies to other types of field-based research, likely points of intervention, and what kinds of interventions are appropriate and feasible in the example workflow.

  2. CSTEM User Manual

    NASA Technical Reports Server (NTRS)

    Hartle, M.; McKnight, R. L.

    2000-01-01

    This manual is a combination of a user manual, theory manual, and programmer manual. The reader is assumed to have some previous exposure to the finite element method. This manual is written with the idea that the CSTEM (Coupled Structural Thermal Electromagnetic-Computer Code) user needs to have a basic understanding of what the code is actually doing in order to properly use the code. For that reason, the underlying theory and methods used in the code are described to a basic level of detail. The manual gives an overview of the CSTEM code: how the code came into existence, a basic description of what the code does, and the order in which it happens (a flowchart). Appendices provide a listing and very brief description of every file used by the CSTEM code, including the type of file it is, what routine regularly accesses the file, and what routine opens the file, as well as special features included in CSTEM.

  3. The Danish Bladder Cancer Database

    PubMed Central

    Hansen, Erik; Larsson, Heidi; Nørgaard, Mette; Thind, Peter; Jensen, Jørgen Bjerggaard

    2016-01-01

    Aim of database The aim of the Danish Bladder Cancer Database (DaBlaCa-data) is to monitor the treatment of all patients diagnosed with invasive bladder cancer (BC) in Denmark. Study population All patients diagnosed with BC in Denmark from 2012 onward were included in the study. Results presented in this paper are predominantly from the 2013 population. Main variables In 2013, 970 patients were diagnosed with BC in Denmark and were included in a preliminary report from the database. A total of 458 (47%) patients were diagnosed with non-muscle-invasive BC (non-MIBC) and 512 (53%) were diagnosed with muscle-invasive BC (MIBC). A total of 300 (31%) patients underwent cystectomy. Among the 135 patients diagnosed with MIBC, who were 75 years of age or younger, 67 (50%) received neoadjuvent chemotherapy prior to cystectomy. In 2013, a total of 147 patients were treated with curative-intended radiation therapy. Descriptive data One-year mortality was 28% (95% confidence interval [CI]: 15–21%). One-year cancer-specific mortality was 25% (95% CI: 22–27%). One-year mortality after cystectomy was 14% (95% CI: 10–18%). Ninety-day mortality after cystectomy was 3% (95% CI: 1–5%) in 2013. One-year mortality following curative-intended radiation therapy was 32% (95% CI: 24–39%) and 1-year cancer-specific mortality was 23% (95% CI: 16–31%) in 2013. Conclusion This preliminary DaBlaCa-data report showed that the treatment of MIBC in Denmark overall meets high international academic standards. The database is able to identify Danish BC patients and monitor treatment and mortality. In the future, DaBlaCa-data will be a valuable data source and expansive observational studies on BC will be available. PMID:27822081

  4. Industrial labor relations manual

    NASA Technical Reports Server (NTRS)

    1992-01-01

    The NASA Industrial Labor Relations Manual provides internal guidelines and procedures to assist NASA Field Installations in dealing with contractor labor-management disputes and Service Contract Act variance hearings, and to provide Labor Union Representatives with access to NASA for the purpose of maintaining schedules and goals in connection with vital NASA programs. This manual will be revised by page changes as revisions become necessary. Initial distribution of this manual has been made to NASA Headquarters and Field Installations.

  5. Radiological Control Manual

    SciTech Connect

    Not Available

    1993-04-01

    This manual has been prepared by Lawrence Berkeley Laboratory to provide guidance for site-specific additions, supplements, and clarifications to the DOE Radiological Control Manual. The guidance provided in this manual is based on the requirements given in Title 10 Code of Federal Regulations Part 835, Radiation Protection for Occupational Workers, DOE Order 5480.11, Radiation Protection for Occupational Workers, and the DOE Radiological Control Manual. The topics covered are (1) excellence in radiological control, (2) radiological standards, (3) conduct of radiological work, (4) radioactive materials, (5) radiological health support operations, (6) training and qualification, and (7) radiological records.

  6. EMSL Operations Manual

    SciTech Connect

    Foster, Nancy S.

    2009-03-25

    This manual is a general resource tool to assist EMSL users and Laboratory staff within EMSL in locating official policy, practice, and subject-matter experts. It is not intended to replace or amend any formal Battelle policy or practice. Users of this manual should rely only on Battelle’s Standard Based Management System (SBMS) for official policy. No contractual commitment or right of any kind is created by this manual. Battelle management reserves the right to alter, change, or delete any information contained within this manual without prior notice.

  7. EMSL Operations Manual

    SciTech Connect

    Foster, Nancy S.

    2009-06-18

    This manual is a general resource tool to assist EMSL users and Laboratory staff within EMSL in locating official policy, practice, and subject-matter experts. It is not intended to replace or amend any formal Battelle policy or practice. Users of this manual should rely only on Battelle’s Standard Based Management System (SBMS) for official policy. No contractual commitment or right of any kind is created by this manual. Battelle management reserves the right to alter, change, or delete any information contained within this manual without prior notice.

  8. Instruct coders' manual

    NASA Technical Reports Server (NTRS)

    Friend, J.

    1971-01-01

    A manual designed both as an instructional manual for beginning coders and as a reference manual for the coding language INSTRUCT is presented. The manual includes the major programs necessary to implement the teaching system and lists the limitations of the current implementation. A detailed description is given of how to code a lesson, what buttons to push, and what utility programs to use. Suggestions are given for debugging coded lessons, along with the error messages that may be received during assembly or while running a lesson.

  9. Business Modeling System (BMS) users manual

    SciTech Connect

    Pennewell, W.J.; White, S.A.; Mott, T.B.

    1988-08-01

    The Business Modeling System (BMS) Users Manual was produced for the NAVMIPPS Project Office (CODE 5N) of the Navy Finance Center (NAVFINCEN) by members of the Martin Marietta Energy Systems Navy Military Integrated Personnel and Pay Strategy (NAVMIPPS) Project Team. The manual was developed to aid users of the Business Modeling System, which was previously delivered to the NAVFINCEN Operations Directorate (CODE 6) by the NAVMIPPS Project Team. The BMS comprises CODE 6 data and a set of dBASE III PLUS programs developed to store the data in varying formats designed to aid in the formulation and evaluation of CODE 6 reorganization options. This user manual contains instructions for database installation, online query, and preformatted report program use. Appendix A is a list of the thirty-two CODE 6 suborganizations from which the BMS data were collected. Appendix B is the BMS menu/function hierarchy. Appendix C is a compilation of sample report output.

  10. Demonstration and Validation Assets: User Manual Development

    SciTech Connect

    2008-06-30

    This report documents the development of a database-supported user manual for DEMVAL assets in the NSTI area of operations and focuses on providing comprehensive user information on DEMVAL assets serving businesses with national security technology applications in southern New Mexico. The DEMVAL asset program is being developed as part of the NSPP, funded by both the Department of Energy (DOE) and NNSA. This report describes the development of a comprehensive user manual system for delivering indexed DEMVAL asset information, to be used in marketing and visibility materials, to NSTI clients, prospective clients, stakeholders, and any person or organization seeking it. The data about area DEMVAL asset providers are organized in an SQL database with an updateable application structure that optimizes ease of access and customizes searchability for the user.

  11. NASA's Earth Observatory Natural Event Tracker: Curating Metadata for Linking Data and Images to Natural Events

    NASA Astrophysics Data System (ADS)

    Ward, K.

    2015-12-01

    On any given date, there are multiple natural events occurring on our planet. Storms, wildfires, volcanoes and algal blooms can be analyzed and represented using multiple dataset parameters. These parameters, in turn, may be visualized in multiple ways and disseminated via multiple web services. Given these multiple-to-multiple relationships, we already have the makings of a microverse of linked data. In an attempt to begin putting this microverse to practical use, NASA's Earth Observatory Group has developed a prototype system called the Earth Observatory Natural Event Tracker (EONET). EONET is a metadata-driven service that is exploring digital curation as a means of adding value to the intersection of natural event-related data and existing web service-enabled visualization systems. A curated natural events database maps specific events to topical groups (e.g., storms, fires, volcanoes), from those groups to related web service visualization systems and, eventually, to the source data products themselves. I will discuss the complexities that arise from attempting to map event types to dataset parameters, and the issues of granularity that come from trying to define exactly what is, and what constrains, a single natural event, particularly in a system where one of the end goals is to provide a group-curated database.
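    The curated event-to-group-to-service mapping described above can be sketched as a small many-to-many lookup. This is an illustrative model only, not the real EONET API; all event IDs, group names, and service URLs below are invented.

    ```python
    # Hypothetical sketch of an EONET-style curated metadata store: events map
    # to topical groups, and each group links to web services that can
    # visualize the underlying data. All identifiers here are illustrative.

    EVENTS = {
        "EV-001": {"title": "Wildfire, northern California", "groups": ["fires"]},
        "EV-002": {"title": "Tropical storm, Atlantic", "groups": ["storms"]},
        "EV-003": {"title": "Volcanic activity, Hawaii", "groups": ["volcanoes", "fires"]},
    }

    GROUP_SERVICES = {
        "fires": ["https://example.org/wms/thermal-anomalies"],
        "storms": ["https://example.org/wms/precipitation"],
        "volcanoes": ["https://example.org/wms/so2-column"],
    }

    def services_for_event(event_id):
        """Resolve an event, via its topical groups, to every linked service."""
        urls = []
        for group in EVENTS[event_id]["groups"]:
            for url in GROUP_SERVICES.get(group, []):
                if url not in urls:  # an event in several groups may share services
                    urls.append(url)
        return urls
    ```

    A single event in two groups (EV-003) resolves to services from both, which is exactly the many-to-many linkage the abstract describes.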

  12. Data Albums: An Event Driven Search, Aggregation and Curation Tool for Earth Science

    NASA Technical Reports Server (NTRS)

    Ramachandran, Rahul; Kulkarni, Ajinkya; Maskey, Manil; Bakare, Rohan; Basyal, Sabin; Li, Xiang; Flynn, Shannon

    2014-01-01

    Approaches used in Earth science research such as case study analysis and climatology studies involve discovering and gathering diverse data sets and information to support the research goals. To gather relevant data and information for case studies and climatology analysis is both tedious and time consuming. Current Earth science data systems are designed with the assumption that researchers access data primarily by instrument or geophysical parameter. In cases where researchers are interested in studying a significant event, they have to manually assemble a variety of datasets relevant to it by searching the different distributed data systems. This paper presents a specialized search, aggregation and curation tool for Earth science to address these challenges. The search tool automatically creates curated 'Data Albums', aggregated collections of information related to a specific event, containing links to relevant data files [granules] from different instruments, tools and services for visualization and analysis, and information about the event contained in news reports, images or videos to supplement research analysis. Curation in the tool is driven via an ontology-based relevancy ranking algorithm to filter out non-relevant information and data.
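    The ontology-based relevancy ranking the abstract mentions can be illustrated with a toy version: weight each ontology concept, score an item by the weighted concepts it mentions, and drop items below a threshold. This is a sketch under my own assumptions, not the authors' algorithm; the concepts and weights are invented.

    ```python
    # Toy ontology-driven relevancy ranking (illustrative, not the paper's
    # implementation): concepts carry weights for a hurricane case study,
    # and low-scoring items are filtered out of the curated "Data Album".

    ONTOLOGY = {  # concept -> relevance weight (negative = off-topic signal)
        "hurricane": 3.0,
        "precipitation": 2.0,
        "wind speed": 2.0,
        "sports": -2.0,
    }

    def relevance(text):
        """Sum the weights of every ontology concept the text mentions."""
        t = text.lower()
        return sum(w for concept, w in ONTOLOGY.items() if concept in t)

    def curate(items, threshold=2.0):
        """Rank candidate items and keep only those above the threshold."""
        ranked = sorted(items, key=relevance, reverse=True)
        return [item for item in ranked if relevance(item) >= threshold]
    ```

    A news item that mentions the storm only in a sports context scores below the threshold and is filtered out, which is the behavior the curation step is after.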

  13. Semantic mediation in the national geologic map database (US)

    USGS Publications Warehouse

    Percy, D.; Richard, S.; Soller, D.

    2008-01-01

    Controlled language is the primary challenge in merging heterogeneous databases of geologic information. Each agency or organization produces databases with different schema, and different terminology for describing the objects within. In order to make some progress toward merging these databases using current technology, we have developed software and a workflow that allows for the "manual semantic mediation" of these geologic map databases. Enthusiastic support from many state agencies (stakeholders and data stewards) has shown that the community supports this approach. Future implementations will move toward a more Artificial Intelligence-based approach, using expert-systems or knowledge-bases to process data based on the training sets we have developed manually.
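    The "manual semantic mediation" workflow described above amounts to hand-built crosswalks that map each agency's local terminology onto one shared vocabulary. A minimal sketch, with invented agency names and geologic terms:

    ```python
    # Hand-curated crosswalks from each agency's local terms to a shared
    # vocabulary. The agencies and terms are invented for illustration;
    # real mediation tables are built by domain experts, as in the paper.

    CROSSWALKS = {
        "agency_a": {"Qal": "alluvium", "Kg": "granite"},
        "agency_b": {"alluvial deposits": "alluvium", "granitic rocks": "granite"},
    }

    def mediate(agency, local_term):
        """Translate one agency's local term into the shared vocabulary."""
        return CROSSWALKS[agency].get(local_term)

    def merge_records(records):
        """Normalize (agency, term) pairs so merged databases share one vocabulary."""
        return [mediate(agency, term) for agency, term in records]
    ```

    Once every record passes through such a crosswalk, heterogeneous databases can be merged and queried with a single controlled vocabulary; the expert-curated tables can later serve as training sets for the AI-based approach the abstract anticipates.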

  14. Indoor Air Quality Manual.

    ERIC Educational Resources Information Center

    Baldwin Union Free School District, NY.

    This manual identifies ways to improve a school's indoor air quality (IAQ) and discusses practical actions that can be carried out by school staff in managing air quality. The manual includes discussions of the many sources contributing to school indoor air pollution and the preventive planning for each including renovation and repair work,…

  15. Technical Manual. The ACT®

    ERIC Educational Resources Information Center

    ACT, Inc., 2014

    2014-01-01

    This manual contains technical information about the ACT® college readiness assessment. The principal purpose of this manual is to document the technical characteristics of the ACT in light of its intended purposes. ACT regularly conducts research as part of the ongoing formative evaluation of its programs. The research is intended to ensure that…

  16. Automatic Versus Manual Indexing

    ERIC Educational Resources Information Center

    Vander Meulen, W. A.; Janssen, P. J. F. C.

    1977-01-01

    A comparative evaluation of results in terms of recall and precision from queries submitted to systems with automatic and manual subject indexing. Differences were attributed to query formulation. The effectiveness of automatic indexing was found equivalent to manual indexing. (Author/KP)

  17. Biology Laboratory Safety Manual.

    ERIC Educational Resources Information Center

    Case, Christine L.

    The Centers for Disease Control (CDC) recommends that schools prepare or adapt a biosafety manual, and that instructors develop a list of safety procedures applicable to their own lab and distribute it to each student. In this way, safety issues will be brought to each student's attention. This document is an example of such a manual. It contains…

  18. School District Energy Manual.

    ERIC Educational Resources Information Center

    Association of School Business Officials International, Reston, VA.

    This manual serves as an energy conservation reference and management guide for school districts. The School District Energy Program (SDEP) is designed to provide information and/or assistance to school administrators planning to implement a comprehensive energy management program. The manual consists of 15 parts. Part 1 describes the SDEP; Parts…

  19. Police Robbery Control Manual.

    ERIC Educational Resources Information Center

    Ward, Richard H.; And Others

    The manual was designed as a practical guide for police department personnel in developing robbery control programs. An introductory chapter defines types of robberies and suggests administrative and operational uses of the manual. Research and control strategies are reported according to five robbery types: street (visible and non-visible),…

  20. Manual for Refugee Sponsorship.

    ERIC Educational Resources Information Center

    Brewer, Kathleen; Taran, Patrick A.

    This manual provides guidelines for religious congregations interested in setting up programs to sponsor refugees. The manual describes the psychological implications of the refugee experience and discusses initial steps in organizing for sponsorship. Detailed information is provided for sponsors regarding: finances and mobilization of resources;…

  1. Materials inventory management manual

    NASA Technical Reports Server (NTRS)

    1992-01-01

    This NASA Materials Inventory Management Manual (NHB 4100.1) is issued pursuant to Section 203(c)(1) of the National Aeronautics and Space Act of 1958 (42 USC 2473). It sets forth policy, performance standards, and procedures governing the acquisition, management and use of materials. This Manual is effective upon receipt.

  2. School Reform Resource Manual.

    ERIC Educational Resources Information Center

    Mid-Continent Research for Education and Learning, Aurora, CO.

    This manual is designed to help schools make successful school reform a reality. It provides the background and perspectives necessary for a school constituency to understand the current climate of education reform in the United States and what is known about successful school reform. The manual also provides inquiry-based techniques for…

  3. Boating Safety Training Manual.

    ERIC Educational Resources Information Center

    Coast Guard, Washington, DC.

    The training manual serves as the text for the Coast Guard's boating safety 32-hour course and for the D-8 Qualification Code Recertification Course. The manual is designed for self-study or for use with an instructor-led course. Each chapter concludes with a quiz to be used as a review of chapter content. Opening chapters review the use of the…

  4. Learning Resources Evaluations Manual.

    ERIC Educational Resources Information Center

    Nunes, Evelyn H., Ed.

    This manual contains evaluations of 196 instructional products listed in Virginia's Adult Basic Education Curricula Resource Catalog. It is intended as a convenient reference manual for making informed decisions concerning materials for adult learners in adult basic education, English-as-a-Second-Language instruction, and general educational…

  5. Dental Charting. Student's Manual.

    ERIC Educational Resources Information Center

    Weaver, Trudy Karlene; Apfel, Maura

    This manual is part of a series dealing with skills and information needed by students in dental assisting. The individualized student materials are suitable for classroom, laboratory, or cooperative training programs. This student manual contains four units covering the following topics: dental anatomical terminology; tooth numbering systems;…

  6. [Physiotherapy as manual therapy].

    PubMed

    Torstensen, T A; Nielsen, L L; Jensen, R; Reginiussen, T; Wiesener, T; Kirkesola, G; Mengshoel, A M

    1999-05-30

    Manual therapy includes methods where the therapist's hands are used to stretch, mobilize or manipulate the spinal column, paravertebral structures or extremity joints. The aims of these methods are to relieve pain and improve function. In Norway only specially qualified physiotherapists and chiropractors are authorized to perform manipulation of joints (high velocity thrust techniques). To become a qualified manual therapist in Norway one must have a minimum of two years of clinical practice as a physiotherapist followed by two years of full-time postgraduate training in manual therapy (a total of six years). Historically, the Norwegian manual therapy system was developed in the 1950s by physiotherapists and medical doctors in England (James Cyriax and James Mennell) and Norway. As a result, doctors allowed physiotherapists to use manipulation as a treatment method for both spinal and peripheral joints. In 1957 the Norwegian health authorities introduced reimbursement for manual therapy performed by physiotherapists.

  7. Experiment Databases

    NASA Astrophysics Data System (ADS)

    Vanschoren, Joaquin; Blockeel, Hendrik

    Next to running machine learning algorithms based on inductive queries, much can be learned by immediately querying the combined results of many prior studies. Indeed, all around the globe, thousands of machine learning experiments are being executed on a daily basis, generating a constant stream of empirical information on machine learning techniques. While the information contained in these experiments might have many uses beyond their original intent, results are typically described very concisely in papers and discarded afterwards. If we properly store and organize these results in central databases, they can be immediately reused for further analysis, thus boosting future research. In this chapter, we propose the use of experiment databases: databases designed to collect all the necessary details of these experiments, and to intelligently organize them in online repositories to enable fast and thorough analysis of a myriad of collected results. They constitute an additional, queryable source of empirical meta-data based on principled descriptions of algorithm executions, without reimplementing the algorithms in an inductive database. As such, they engender a very dynamic, collaborative approach to experimentation, in which experiments can be freely shared, linked together, and immediately reused by researchers all over the world. They can be set up for personal use, to share results within a lab or to create open, community-wide repositories. Here, we provide a high-level overview of their design, and use an existing experiment database to answer various interesting research questions about machine learning algorithms and to verify a number of recent studies.
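    The core idea, storing principled descriptions of algorithm runs so they can be queried later, can be sketched with a relational table of experiment records. This is a minimal illustration with an invented schema and made-up scores, not the chapter's actual database design.

    ```python
    # Minimal "experiment database" sketch: each row records one algorithm
    # run (algorithm, dataset, score). A single SQL query then answers a
    # meta-level question over all collected experiments. Schema and values
    # are illustrative.
    import sqlite3

    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE runs (algorithm TEXT, dataset TEXT, accuracy REAL)")
    conn.executemany(
        "INSERT INTO runs VALUES (?, ?, ?)",
        [("svm", "iris", 0.95), ("svm", "wine", 0.90),
         ("tree", "iris", 0.91), ("tree", "wine", 0.88)],
    )

    # Which algorithm has the best mean accuracy across all stored studies?
    best = conn.execute(
        "SELECT algorithm, AVG(accuracy) AS mean_acc FROM runs "
        "GROUP BY algorithm ORDER BY mean_acc DESC"
    ).fetchall()
    ```

    Because every run is described in the same structured way, questions that would otherwise require re-running experiments reduce to queries over accumulated meta-data.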

  8. A Reflection on a Data Curation Journey

    PubMed Central

    van Zyl, Christa

    2015-01-01

    This commentary is a reflection on experience of data preservation and sharing (i.e., data curation) practices developed in a South African research organization. The lessons learned from this journey have echoes in the findings and recommendations emerging from the present study in Low and Middle-Income Countries (LMIC) and may usefully contribute to more general reflection on the management of change in data practice. PMID:26297756

  9. Data Curation Education in Research Centers (DCERC)

    NASA Astrophysics Data System (ADS)

    Marlino, M. R.; Mayernik, M. S.; Kelly, K.; Allard, S.; Tenopir, C.; Palmer, C.; Varvel, V. E., Jr.

    2012-12-01

    Digital data both enable and constrain scientific research. Scientists are enabled by digital data to develop new research methods, utilize new data sources, and investigate new topics, but they also face new data collection, management, and preservation burdens. The current data workforce consists primarily of scientists who receive little formal training in data management and data managers who are typically educated through on-the-job training. The Data Curation Education in Research Centers (DCERC) program is investigating a new model for educating data professionals to contribute to scientific research. DCERC is a collaboration between the University of Illinois at Urbana-Champaign Graduate School of Library and Information Science, the University of Tennessee School of Information Sciences, and the National Center for Atmospheric Research. The program is organized around a foundations course in data curation and provides field experiences in research and data centers for both master's and doctoral students. This presentation will outline the aims and the structure of the DCERC program and discuss results and lessons learned from the first set of summer internships in 2012. Four masters students participated and worked with both data mentors and science mentors, gaining first hand experiences in the issues, methods, and challenges of scientific data curation. They engaged in a diverse set of topics, including climate model metadata, observational data management workflows, and data cleaning, documentation, and ingest processes within a data archive. The students learned current data management practices and challenges while developing expertise and conducting research. They also made important contributions to NCAR data and science teams by evaluating data management workflows and processes, preparing data sets to be archived, and developing recommendations for particular data management activities. The master's student interns will return in summer of 2013

  10. Sharing and community curation of mass spectrometry data with Global Natural Products Social Molecular Networking.

    PubMed

    Wang, Mingxun; Carver, Jeremy J; Phelan, Vanessa V; Sanchez, Laura M; Garg, Neha; Peng, Yao; Nguyen, Don Duy; Watrous, Jeramie; Kapono, Clifford A; Luzzatto-Knaan, Tal; Porto, Carla; Bouslimani, Amina; Melnik, Alexey V; Meehan, Michael J; Liu, Wei-Ting; Crüsemann, Max; Boudreau, Paul D; Esquenazi, Eduardo; Sandoval-Calderón, Mario; Kersten, Roland D; Pace, Laura A; Quinn, Robert A; Duncan, Katherine R; Hsu, Cheng-Chih; Floros, Dimitrios J; Gavilan, Ronnie G; Kleigrewe, Karin; Northen, Trent; Dutton, Rachel J; Parrot, Delphine; Carlson, Erin E; Aigle, Bertrand; Michelsen, Charlotte F; Jelsbak, Lars; Sohlenkamp, Christian; Pevzner, Pavel; Edlund, Anna; McLean, Jeffrey; Piel, Jörn; Murphy, Brian T; Gerwick, Lena; Liaw, Chih-Chuang; Yang, Yu-Liang; Humpf, Hans-Ulrich; Maansson, Maria; Keyzers, Robert A; Sims, Amy C; Johnson, Andrew R; Sidebottom, Ashley M; Sedio, Brian E; Klitgaard, Andreas; Larson, Charles B; Boya P, Cristopher A; Torres-Mendoza, Daniel; Gonzalez, David J; Silva, Denise B; Marques, Lucas M; Demarque, Daniel P; Pociute, Egle; O'Neill, Ellis C; Briand, Enora; Helfrich, Eric J N; Granatosky, Eve A; Glukhov, Evgenia; Ryffel, Florian; Houson, Hailey; Mohimani, Hosein; Kharbush, Jenan J; Zeng, Yi; Vorholt, Julia A; Kurita, Kenji L; Charusanti, Pep; McPhail, Kerry L; Nielsen, Kristian Fog; Vuong, Lisa; Elfeki, Maryam; Traxler, Matthew F; Engene, Niclas; Koyama, Nobuhiro; Vining, Oliver B; Baric, Ralph; Silva, Ricardo R; Mascuch, Samantha J; Tomasi, Sophie; Jenkins, Stefan; Macherla, Venkat; Hoffman, Thomas; Agarwal, Vinayak; Williams, Philip G; Dai, Jingqui; Neupane, Ram; Gurr, Joshua; Rodríguez, Andrés M C; Lamsa, Anne; Zhang, Chen; Dorrestein, Kathleen; Duggan, Brendan M; Almaliti, Jehad; Allard, Pierre-Marie; Phapale, Prasad; Nothias, Louis-Felix; Alexandrov, Theodore; Litaudon, Marc; Wolfender, Jean-Luc; Kyle, Jennifer E; Metz, Thomas O; Peryea, Tyler; Nguyen, Dac-Trung; VanLeer, Danielle; Shinn, Paul; Jadhav, Ajit; Müller, Rolf; Waters, Katrina M; 
Shi, Wenyuan; Liu, Xueting; Zhang, Lixin; Knight, Rob; Jensen, Paul R; Palsson, Bernhard Ø; Pogliano, Kit; Linington, Roger G; Gutiérrez, Marcelino; Lopes, Norberto P; Gerwick, William H; Moore, Bradley S; Dorrestein, Pieter C; Bandeira, Nuno

    2016-08-09

    The potential of the diverse chemistries present in natural products (NP) for biotechnology and medicine remains untapped because NP databases are not searchable with raw data and the NP community has no way to share data other than in published papers. Although mass spectrometry (MS) techniques are well-suited to high-throughput characterization of NP, there is a pressing need for an infrastructure to enable sharing and curation of data. We present Global Natural Products Social Molecular Networking (GNPS; http://gnps.ucsd.edu), an open-access knowledge base for community-wide organization and sharing of raw, processed or identified tandem mass (MS/MS) spectrometry data. In GNPS, crowdsourced curation of freely available community-wide reference MS libraries will underpin improved annotations. Data-driven social-networking should facilitate identification of spectra and foster collaborations. We also introduce the concept of 'living data' through continuous reanalysis of deposited data.
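    Molecular networking, the technique GNPS is built around, links MS/MS spectra whose fragment patterns are similar. A minimal sketch follows, using a plain cosine on shared fragment peaks; note this simplifies the modified-cosine scoring GNPS actually uses, and the spectra are toy data.

    ```python
    # Simplified molecular-networking sketch: spectra are {m/z: intensity}
    # maps, similarity is plain cosine over shared peaks (GNPS uses a more
    # elaborate modified-cosine score), and spectra above a threshold are
    # connected as network edges.
    import math

    def cosine(spec_a, spec_b):
        """Cosine similarity between two spectra on their shared m/z peaks."""
        shared = set(spec_a) & set(spec_b)
        dot = sum(spec_a[mz] * spec_b[mz] for mz in shared)
        norm_a = math.sqrt(sum(v * v for v in spec_a.values()))
        norm_b = math.sqrt(sum(v * v for v in spec_b.values()))
        return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

    def network(spectra, threshold=0.7):
        """Return edges between spectra whose similarity exceeds the threshold."""
        ids = list(spectra)
        return [(a, b) for i, a in enumerate(ids) for b in ids[i + 1:]
                if cosine(spectra[a], spectra[b]) >= threshold]
    ```

    In the real system, edges between an unknown spectrum and a library spectrum propagate annotations from the crowdsourced reference libraries the abstract describes.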

  11. Product Manuals: A Consumer Perspective.

    ERIC Educational Resources Information Center

    Showers, Linda S.; And Others

    1993-01-01

    Qualitative analysis of insights from consumer focus groups on product manual usage reveals consumer perceptions and preferences regarding manual and safety message format. Results can be used to improve manual design and content. (JOW)

  12. Creating a VAPEPS database: A VAPEPS tutorial

    NASA Technical Reports Server (NTRS)

    Graves, George

    1989-01-01

    A procedural method is outlined for creating a Vibroacoustic Payload Environment Prediction System (VAPEPS) Database. The method of presentation employs flowcharts of sequential VAPEPS Commands used to create a VAPEPS Database. The commands are accompanied by explanatory text to the right of the command in order to minimize the need for repetitive reference to the VAPEPS user's manual. The method is demonstrated by examples of varying complexity. It is assumed that the reader has acquired a basic knowledge of the VAPEPS software program.

  13. Solubility Database

    National Institute of Standards and Technology Data Gateway

    SRD 106 IUPAC-NIST Solubility Database (Web, free access)   These solubilities are compiled from 18 volumes of the International Union of Pure and Applied Chemistry (IUPAC)-NIST Solubility Data Series. The database includes liquid-liquid, solid-liquid, and gas-liquid systems. Typical solvents and solutes include water, seawater, heavy water, inorganic compounds, and a variety of organic compounds such as hydrocarbons, halogenated hydrocarbons, alcohols, acids, esters and nitrogen compounds. There are over 67,500 solubility measurements and over 1800 references.

  14. The Danish Melanoma Database

    PubMed Central

    Hölmich, Lisbet Rosenkrantz; Klausen, Siri; Spaun, Eva; Schmidt, Grethe; Gad, Dorte; Svane, Inge Marie; Schmidt, Henrik; Lorentzen, Henrik Frank; Ibfelt, Else Helene

    2016-01-01

    Aim of database The aim of the database is to monitor and improve the treatment and survival of melanoma patients. Study population All Danish patients with cutaneous melanoma and in situ melanomas must be registered in the Danish Melanoma Database (DMD). In 2014, 2,525 patients with invasive melanoma and 780 with in situ tumors were registered. The coverage is currently 93% compared with the Danish Pathology Register. Main variables The main variables include demographic, clinical, and pathological characteristics, including Breslow’s tumor thickness, ± ulceration, mitoses, and tumor–node–metastasis stage. Information about the date of diagnosis, treatment, type of surgery, including safety margins, results of lymphoscintigraphy in patients for whom this was indicated (tumors > T1a), results of sentinel node biopsy, pathological evaluation hereof, and follow-up information, including recurrence, nature, and treatment hereof is registered. In case of death, the cause and date are included. Currently, all data are entered manually; however, data catchment from the existing registries is planned to be included shortly. Descriptive data The DMD is an old research database, but new as a clinical quality register. The coverage is high, and the performance in the five Danish regions is quite similar due to strong adherence to guidelines provided by the Danish Melanoma Group. The list of monitored indicators is constantly expanding, and annual quality reports are issued. Several important scientific studies are based on DMD data. Conclusion DMD holds unique detailed information about tumor characteristics, the surgical treatment, and follow-up of Danish melanoma patients. Registration and monitoring is currently expanding to encompass even more clinical parameters to benefit both patient treatment and research. PMID:27822097

  15. Digital Management and Curation of the National Rock and Ore Collections at NMNH, Smithsonian

    NASA Astrophysics Data System (ADS)

    Cottrell, E.; Andrews, B.; Sorensen, S. S.; Hale, L. J.

    2011-12-01

    The National Museum of Natural History, Smithsonian Institution, is home to the world's largest curated rock collection. The collection houses 160,680 physical rock and ore specimen lots ("samples"), all of which already have a digital record that can be accessed by the public through a searchable web interface (http://collections.mnh.si.edu/search/ms/). In addition, there are 66 accessions pending that when catalogued will add approximately 60,000 specimen lots. NMNH's collections are digitally managed on the KE EMu° platform which has emerged as the premier system for managing collections in natural history museums worldwide. In 2010 the Smithsonian released an ambitious 5 year Digitization Strategic Plan. In Mineral Sciences, new digitization efforts in the next five years will focus on integrating various digital resources for volcanic specimens. EMu sample records will link to the corresponding records for physical eruption information housed within the database of Smithsonian's Global Volcanism Program (GVP). Linkages are also planned between our digital records and geochemical databases (like EarthChem or PetDB) maintained by third parties. We anticipate that these linkages will increase the use of NMNH collections as well as engender new scholarly directions for research. Another large project the museum is currently undertaking involves the integration of the functionality of in-house designed Transaction Management software with the EMu database. This will allow access to the details (borrower, quantity, date, and purpose) of all loans of a given specimen through its catalogue record. We hope this will enable cross-referencing and fertilization of research ideas while avoiding duplicate efforts. While these digitization efforts are critical, we propose that the greatest challenge to sample curation is not posed by digitization and that a global sample registry alone will not ensure that samples are available for reuse. 
We suggest instead that the ability

  16. MaizeGDB update: New tools, data, and interface for the maize model organism database

    Technology Transfer Automated Retrieval System (TEKTRAN)

    MaizeGDB is a highly curated, community-oriented database and informatics service to researchers focused on the crop plant and model organism Zea mays ssp. mays. Although some form of the maize community database has existed over the last 25 years, there have only been two major releases. In 1991, ...

  17. CottonGen: a genomics, genetics and breeding database for cotton research

    Technology Transfer Automated Retrieval System (TEKTRAN)

    CottonGen (http://www.cottongen.org) is a curated and integrated web-based relational database providing access to publicly available genomic, genetic and breeding data for cotton. CottonGen supercedes CottonDB and the Cotton Marker Database, with enhanced tools for easier data sharing, mining, vis...

  18. CMS RATFOR System Manual.

    DTIC Science & Technology

    1979-07-01

    This is the System Manual for the RATFOR preprocessor on the IBM CMS timesharing system, by Stephen M. Choquette and Richard J. Orgass. Included in this paper is a language description of RATFOR, an

  19. Fire Protection Program Manual

    SciTech Connect

    Sharry, J A

    2012-05-18

    This manual documents the Lawrence Livermore National Laboratory (LLNL) Fire Protection Program. Department of Energy (DOE) Order 420.1B, Facility Safety, requires LLNL to have a comprehensive and effective fire protection program that protects LLNL personnel and property, the public and the environment. The manual provides LLNL and its facilities with general information and guidance for meeting DOE 420.1B requirements. The recommended readers for this manual are: fire protection officers, fire protection engineers, fire fighters, facility managers, directorate assurance managers, facility coordinators, and ES and H team members.

  20. OCDB: a database collecting genes, miRNAs and drugs for obsessive-compulsive disorder

    PubMed Central

    Privitera, Anna P.; Distefano, Rosario; Wefer, Hugo A.; Ferro, Alfredo; Pulvirenti, Alfredo; Giugno, Rosalba

    2015-01-01

    Obsessive-compulsive disorder (OCD) is a psychiatric condition characterized by intrusive and unwilling thoughts (obsessions) giving rise to anxiety. The patients feel obliged to perform behaviors (compulsions) induced by the obsessions. The World Health Organization ranks OCD as one of the 10 most disabling medical conditions. In the class of Anxiety Disorders, OCD is a pathology that shows a hereditary component. Consequently, an online resource collecting and integrating scientific discoveries and genetic evidence about OCD would be helpful to improve the current knowledge on this disorder. We have developed a manually curated database, OCD Database (OCDB), collecting the relations between candidate genes in OCD, microRNAs (miRNAs) involved in the pathophysiology of OCD and drugs used in its treatment. We have screened articles from PubMed and MEDLINE. For each gene, the bibliographic references with a brief description of the gene and the experimental conditions are shown. The database also lists the polymorphisms within genes and their chromosomal regions. OCDB data is enriched with both validated and predicted miRNA-target and drug-target information. The transcription factor regulations, which are also included, are taken from DAVID and TransmiR. Moreover, a scoring function ranks the relevance of data in the OCDB context. The database is also integrated with the main online resources (PubMed, Entrez-gene, HGNC, dbSNP, DrugBank, miRBase, PubChem, Kegg, Disease-ontology and ChEBI). The web interface has been developed using phpMyAdmin and Bootstrap software. This allows users (i) to browse data by category and (ii) to navigate the database by searching genes, miRNAs, drugs, SNPs, regions, drug targets and articles. The data can be exported in textual format, and the whole database in .sql or tabular format. OCDB is an essential resource to support genome-wide analysis, genetic and pharmacological studies. 
It also facilitates the evaluation of genetic data
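
    The relational, multi-entity organization described above lends itself to simple join queries. Below is a minimal sketch of the kind of gene-drug lookup the OCDB interface supports; the SQLite schema, table names, and example rows are invented for illustration and are not the actual OCDB schema.

    ```python
    import sqlite3

    # Hypothetical, minimal schema illustrating the kind of relational
    # queries OCDB supports; table and column names are invented here.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE gene (symbol TEXT PRIMARY KEY, description TEXT)")
    conn.execute("CREATE TABLE drug_target (drug TEXT, gene TEXT)")
    conn.executemany("INSERT INTO gene VALUES (?, ?)",
                     [("SLC6A4", "serotonin transporter"),
                      ("COMT", "catechol-O-methyltransferase")])
    conn.executemany("INSERT INTO drug_target VALUES (?, ?)",
                     [("fluoxetine", "SLC6A4"), ("clomipramine", "SLC6A4")])

    # Find drugs targeting a candidate gene, as a user search might.
    rows = conn.execute(
        "SELECT d.drug FROM drug_target d JOIN gene g ON d.gene = g.symbol "
        "WHERE g.symbol = ? ORDER BY d.drug", ("SLC6A4",)).fetchall()
    print([r[0] for r in rows])  # ['clomipramine', 'fluoxetine']
    ```

    The same join pattern generalizes to the other entity pairs the abstract lists (gene-miRNA, gene-SNP, drug-article), which is why a relational backend with per-category browsing is a natural fit.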

  1. Research Problems in Data Curation: Outcomes from the Data Curation Education in Research Centers Program

    NASA Astrophysics Data System (ADS)

    Palmer, C. L.; Mayernik, M. S.; Weber, N.; Baker, K. S.; Kelly, K.; Marlino, M. R.; Thompson, C. A.

    2013-12-01

    The need for data curation is being recognized in numerous institutional settings as national research funding agencies extend data archiving mandates to cover more types of research grants. Data curation, however, is not only a practical challenge. It presents many conceptual and theoretical challenges that must be investigated to design appropriate technical systems, social practices and institutions, policies, and services. This presentation reports on outcomes from an investigation of research problems in data curation conducted as part of the Data Curation Education in Research Centers (DCERC) program. DCERC is developing a new model for educating data professionals to contribute to scientific research. The program is organized around foundational courses and field experiences in research and data centers for both master's and doctoral students. The initiative is led by the Graduate School of Library and Information Science at the University of Illinois at Urbana-Champaign, in collaboration with the School of Information Sciences at the University of Tennessee, and library and data professionals at the National Center for Atmospheric Research (NCAR). At the doctoral level DCERC is educating future faculty and researchers in data curation and establishing a research agenda to advance the field. The doctoral seminar, Research Problems in Data Curation, was developed and taught in 2012 by the DCERC principal investigator and two doctoral fellows at the University of Illinois. It was designed to define the problem space of data curation, examine relevant concepts and theories related to both technical and social perspectives, and articulate research questions that are either unexplored or under-theorized in the current literature. There was a particular emphasis on the Earth and environmental sciences, with guest speakers brought in from NCAR, the National Snow and Ice Data Center (NSIDC), and Rensselaer Polytechnic Institute. Through the assignments, students

  2. VAPEPS user's reference manual, version 5.0

    NASA Technical Reports Server (NTRS)

    Park, D. M.

    1988-01-01

    This is the reference manual for the VibroAcoustic Payload Environment Prediction System (VAPEPS). The system consists of a computer program and a vibroacoustic database. The purpose of the system is to collect vibroacoustic measurements from flight events and ground tests, to retrieve these data, and to provide a means of using them to predict future payload environments. This manual describes the operating language of the program. Topics covered include database commands, Statistical Energy Analysis (SEA) prediction commands, stress prediction commands, and general computational commands.

  3. CHPVDB ‐ a sequence annotation database for Chandipura Virus

    PubMed Central

    Dikhit, Manas Ranjan; Rana, Sindhu Prava; Das, Pradeep; Sahoo, Ganesh Chandra

    2009-01-01

    Databases containing proteomic information have become indispensable for virology studies. As the gap between the amount of sequence information and functional characterization widens, increasing efforts are being directed to the development of databases. For virologists, it is therefore desirable to have a single data collection point that integrates research-related data from different domains. CHPVDB is our effort to provide virologists with such a one-step information center. We describe herein the creation of CHPVDB, a new database that integrates information on different proteins into a single resource. For basic curation of protein information, the database relies on features from other selected databases, servers and published reports. This database facilitates significant relationships between molecular analysis, cleavage sites, and possible protein functional families assigned to different proteins of Chandipura virus (CHPV) by SVMProt and related tools. Availability: The database is freely available at http://chpvdb.biomedinformri.org/. PMID:19293996

  4. PAH Mutation Analysis Consortium Database: 1997. Prototype for relational locus-specific mutation databases.

    PubMed Central

    Nowacki, P M; Byck, S; Prevost, L; Scriver, C R

    1998-01-01

    PAHdb (http://www.mcgill.ca/pahdb) is a curated relational database (Fig. 1) of nucleotide variation in the human PAH cDNA (GenBank U49897). Among 328 different mutations by state (Fig. 2), the majority are rare mutations causing hyperphenylalaninemia (HPA) (OMIM 261600); the remainder are polymorphic variants without apparent effect on phenotype. PAHdb modules contain mutations, polymorphic haplotypes, genotype-phenotype correlations, expression analysis, sources of information and the reference sequence; the database also contains pages of clinical information and data on three ENU mouse orthologues of human HPA. Only six different mutations account for 60% of human HPA chromosomes worldwide; mutations stratify by population and geographic region, and the Oriental and Caucasian mutation sets are different (Fig. 3). PAHdb provides curated electronic publication, and one third of its incoming reports are direct submissions. Each different mutation receives a systematic (nucleotide) name and a unique identifier (UID). Data are accessed both through a Newsletter and a search engine on the website; the integrity of the database is ensured by keeping the curated template offline. There have been >6500 online interrogations of the website. PMID:9399840

  5. IPD--the Immuno Polymorphism Database.

    PubMed

    Robinson, James; Waller, Matthew J; Stoehr, Peter; Marsh, Steven G E

    2005-01-01

    The Immuno Polymorphism Database (IPD) (http://www.ebi.ac.uk/ipd/) is a set of specialist databases related to the study of polymorphic genes in the immune system. IPD currently consists of four databases: IPD-KIR, which contains the allelic sequences of Killer-cell Immunoglobulin-like Receptors; IPD-MHC, a database of sequences of the Major Histocompatibility Complex of different species; IPD-HPA, alloantigens expressed only on platelets; and IPD-ESTDAB, which provides access to the European Searchable Tumour Cell-Line Database, a cell bank of immunologically characterized melanoma cell lines. The IPD project works with specialist groups or nomenclature committees who provide and curate individual sections before they are submitted to IPD for online publication. The IPD project stores all the data in a set of related databases. Sections with similar data, such as IPD-KIR and IPD-MHC, share the same database structure. The sharing of a common database structure makes it easier to implement common tools for data submission and retrieval. The data are currently available online from the website and ftp directory; files will also be made available in different formats to download from the website and ftp server. The data will also be included in the SRS, BLAST and FASTA search engines at the European Bioinformatics Institute.

  6. Peace Corps Aquaculture Training Manual. Training Manual T0057.

    ERIC Educational Resources Information Center

    Peace Corps, Washington, DC. Information Collection and Exchange Div.

    This Peace Corps training manual was developed from two existing manuals to provide a comprehensive training program in fish production for Peace Corps volunteers. The manual encompasses the essential elements of the University of Oklahoma program that has been training volunteers in aquaculture for 25 years. The 22 chapters of the manual are…

  7. Interactive Office user's manual

    NASA Technical Reports Server (NTRS)

    Montgomery, Edward E.; Lowers, Benjamin; Nabors, Terri L.

    1990-01-01

    Given here is a user's manual for Interactive Office (IO), an executive office tool for organization and planning, written specifically for Macintosh. IO is a paperless management tool to automate a related group of individuals into one productive system.

  8. Geochemical engineering reference manual

    SciTech Connect

    Owen, L.B.; Michels, D.E.

    1984-01-01

    The following topics are included in this manual: physical and chemical properties of geothermal brine and steam, scale and solids control, processing spent brine for reinjection, control of noncondensable gas emissions, and geothermal mineral recovery. (MHR)

  9. Case Resolution Manual

    EPA Pesticide Factsheets

    This Case Resolution Manual (CRM) is intended to provide procedural guidance to ECRCO case managers to ensure EPA’s prompt, effective, and efficient resolution of civil rights cases consistent with science and the civil rights laws.

  10. MARSAME Manual and Resources

    EPA Pesticide Factsheets

    MARSAME provides technical information on survey approaches to determine proper disposition of materials and equipment (M&E). MARSAME is a supplement to the Multi-Agency Radiation Survey and Site Investigation Manual (MARSSIM).

  11. AromaDeg, a novel database for phylogenomics of aerobic bacterial degradation of aromatics.

    PubMed

    Duarte, Márcia; Jauregui, Ruy; Vilchez-Vargas, Ramiro; Junca, Howard; Pieper, Dietmar H

    2014-01-01

    Understanding prokaryotic transformation of recalcitrant pollutants and in-situ metabolic networks requires the integration of massive amounts of biological data. Decades of biochemical studies together with novel next-generation sequencing data have exponentially increased information on aerobic aromatic degradation pathways. However, the majority of protein sequences in public databases have not been experimentally characterized, and homology-based methods are still the most routinely used approach to assign protein function, allowing the propagation of misannotations. AromaDeg is a web-based resource targeting aerobic degradation of aromatics that comprises recently updated (September 2013) and manually curated databases constructed based on a phylogenomic approach. Grounded in phylogenetic analyses of protein sequences of key catabolic protein families and of proteins of documented function, AromaDeg allows query and data mining of novel genomic, metagenomic or metatranscriptomic data sets. Essentially, each query sequence that matches a given protein family of AromaDeg is associated with a specific cluster of a given phylogenetic tree, and further functional annotation and/or substrate specificity may be inferred from the neighboring cluster members with experimentally validated function. This allows a detailed characterization of individual protein superfamilies as well as high-throughput functional classifications. Thus, AromaDeg addresses the deficiencies of homology-based protein function prediction, combining phylogenetic tree construction and integration of experimental data to obtain more accurate annotations of new biological data related to aerobic aromatic biodegradation pathways. We plan to expand AromaDeg to other enzyme families involved in aromatic degradation and to update it regularly. Database URL: http://aromadeg.siona.helmholtz-hzi.de
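
    The cluster-assignment idea behind AromaDeg can be illustrated in miniature: a query is placed with its most similar curated reference, and function is inferred from that cluster's experimentally validated members. The sequences, cluster names, and annotations below are invented, and toy string similarity stands in for real phylogenetic placement.

    ```python
    from difflib import SequenceMatcher

    # Toy illustration of the phylogenomic idea: assign a query sequence to
    # the curated cluster of its most similar reference, then inherit that
    # cluster's validated function. All data below are invented.
    reference = {
        "ring_hydroxylating_A": ("MKTAYIAK", "benzoate dioxygenase"),
        "ring_hydroxylating_B": ("MKTPYLGV", "toluene dioxygenase"),
    }

    def assign_cluster(query):
        # Pick the reference whose sequence is most similar to the query.
        best = max(reference.items(),
                   key=lambda kv: SequenceMatcher(None, query, kv[1][0]).ratio())
        cluster, (_, function) = best
        return cluster, function

    print(assign_cluster("MKTAYIAR"))
    ```

    In the real resource the placement is done against precomputed phylogenetic trees rather than pairwise similarity, which is precisely what lets it avoid the misannotation propagation of plain best-hit transfer.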

  13. DiseaseMeth version 2.0: a major expansion and update of the human disease methylation database

    PubMed Central

    Xiong, Yichun; Wei, Yanjun; Gu, Yue; Zhang, Shumei; Lyu, Jie; Zhang, Bin; Chen, Chuangeng; Zhu, Jiang; Wang, Yihan; Liu, Hongbo; Zhang, Yan

    2017-01-01

    The human disease methylation database (DiseaseMeth, http://bioinfo.hrbmu.edu.cn/diseasemeth/) is an interactive database that aims to present the most complete collection and annotation of aberrant DNA methylation in human diseases, especially various cancers. Recently, the high-throughput microarray and sequencing technologies have promoted the production of methylome data that contain comprehensive knowledge of human diseases. In this DiseaseMeth update, we have increased the number of samples from 3610 to 32 701, the number of diseases from 72 to 88 and the disease–gene associations from 216 201 to 679 602. DiseaseMeth version 2.0 provides an expanded comprehensive list of disease–gene associations based on manual curation from experimental studies and computational identification from high-throughput methylome data. Besides the data expansion, we also updated the search engine and visualization tools. In particular, we enhanced the differential analysis tools, which now enable online automated identification of DNA methylation abnormalities in human disease in a case-control or disease–disease manner. To facilitate further mining of the disease methylome, three new web tools were developed for cluster analysis, functional annotation and survival analysis. DiseaseMeth version 2.0 should be a useful resource platform for further understanding the molecular mechanisms of human diseases. PMID:27899673
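
    A case-control differential analysis of the kind DiseaseMeth automates can be sketched as a two-sample test per gene. The sketch below computes Welch's t statistic on invented methylation beta values; the real service also derives p-values and handles genome-wide multiple testing.

    ```python
    from statistics import mean, variance
    from math import sqrt

    # Minimal sketch of a case-control differential methylation test for
    # one gene: Welch's t statistic on beta values (0..1). The sample
    # values are invented for illustration.
    cases    = [0.82, 0.76, 0.91, 0.88, 0.79]   # disease samples
    controls = [0.41, 0.35, 0.52, 0.47, 0.44]   # normal samples

    def welch_t(a, b):
        va, vb = variance(a) / len(a), variance(b) / len(b)
        return (mean(a) - mean(b)) / sqrt(va + vb)

    t = welch_t(cases, controls)
    print(round(t, 2))  # a large |t| flags the gene as differentially methylated
    ```

    Run per gene across a methylome, this is the core of a case-control comparison; a disease-disease comparison simply substitutes the second disease's samples for the controls.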

  14. SubtiWiki 2.0—an integrated database for the model organism Bacillus subtilis

    PubMed Central

    Michna, Raphael H.; Zhu, Bingyao; Mäder, Ulrike; Stülke, Jörg

    2016-01-01

    To understand living cells, we need knowledge of each of their parts as well as of the interactions of these parts. To gain rapid and comprehensive access to this information, annotation databases are required. Here, we present SubtiWiki 2.0, the integrated database for the model bacterium Bacillus subtilis (http://subtiwiki.uni-goettingen.de/). SubtiWiki provides text-based access to published information about the genes and proteins of B. subtilis as well as presentations of metabolic and regulatory pathways. Moreover, manually curated protein-protein interaction diagrams are linked to the protein pages. Finally, expression data are shown for gene expression under 104 different conditions as well as absolute protein quantification for cytoplasmic proteins. To facilitate mobile use of SubtiWiki, we have now expanded it with apps available for iOS and Android devices. Importantly, the apps allow users to link private notes and pictures to the gene/protein pages. Today, SubtiWiki has become one of the most complete collections of knowledge on a living organism in a single resource. PMID:26433225

  15. Improving the Acquisition and Management of Sample Curation Data

    NASA Technical Reports Server (NTRS)

    Todd, Nancy S.; Evans, Cindy A.; Labasse, Dan

    2011-01-01

    This paper discusses the current sample documentation processes used during and after a mission, examines the challenges and special considerations needed for designing effective sample curation data systems, and looks at the results of a simulated sample return mission and the lessons learned from this simulation. In addition, it introduces a new data architecture for an integrated sample curation data system being implemented at the NASA Astromaterials Acquisition and Curation department and discusses how it improves on existing data management systems.

  16. Research resources: curating the new eagle-i discovery system

    PubMed Central

    Vasilevsky, Nicole; Johnson, Tenille; Corday, Karen; Torniai, Carlo; Brush, Matthew; Segerdell, Erik; Wilson, Melanie; Shaffer, Chris; Robinson, David; Haendel, Melissa

    2012-01-01

    Development of biocuration processes and guidelines for new data types or projects is a challenging task. Each project finds its way toward defining annotation standards and ensuring data consistency with varying degrees of planning and different tools to support and/or report on consistency. Further, this process may be data type specific even within the context of a single project. This article describes our experiences with eagle-i, a 2-year pilot project to develop a federated network of data repositories in which unpublished, unshared or otherwise ‘invisible’ scientific resources could be inventoried and made accessible to the scientific community. During the course of eagle-i development, the main challenges we experienced related to the difficulty of collecting and curating data while the system and the data model were simultaneously built, and a deficiency and diversity of data management strategies in the laboratories from which the source data was obtained. We discuss our approach to biocuration and the importance of improving information management strategies to the research process, specifically with regard to the inventorying and usage of research resources. Finally, we highlight the commonalities and differences between eagle-i and similar efforts with the hope that our lessons learned will assist other biocuration endeavors. Database URL: www.eagle-i.net PMID:22434835

  17. Crowdsourcing and curation: perspectives from biology and natural language processing

    PubMed Central

    Hirschman, Lynette; Fort, Karën; Boué, Stéphanie; Kyrpides, Nikos; Islamaj Doğan, Rezarta; Cohen, Kevin Bretonnel

    2016-01-01

    Crowdsourcing is increasingly utilized for performing tasks in both natural language processing and biocuration. Although there have been many applications of crowdsourcing in these fields, there have been fewer high-level discussions of the methodology and its applicability to biocuration. This paper explores crowdsourcing for biocuration through several case studies that highlight different ways of leveraging ‘the crowd’; these raise issues about the kind(s) of expertise needed, the motivations of participants, and questions related to feasibility, cost and quality. The paper is an outgrowth of a panel session held at BioCreative V (Seville, September 9–11, 2015). The session consisted of four short talks, followed by a discussion. In their talks, the panelists explored the role of expertise and the potential to improve crowd performance by training; the challenge of decomposing tasks to make them amenable to crowdsourcing; and the capture of biological data and metadata through community editing. Database URL: http://www.mitre.org/publications/technical-papers/crowdsourcing-and-curation-perspectives PMID:27504010

  18. Italian Rett database and biobank.

    PubMed

    Sampieri, Katia; Meloni, Ilaria; Scala, Elisa; Ariani, Francesca; Caselli, Rossella; Pescucci, Chiara; Longo, Ilaria; Artuso, Rosangela; Bruttini, Mirella; Mencarelli, Maria Antonietta; Speciale, Caterina; Causarano, Vincenza; Hayek, Giuseppe; Zappella, Michele; Renieri, Alessandra; Mari, Francesca

    2007-04-01

    Rett syndrome is the second most common cause of severe mental retardation in females, with an incidence of approximately 1 out of 10,000 live female births. In addition to the classic form, a number of Rett variants have been described. MECP2 gene mutations are responsible for about 90% of classic cases and for a lower percentage of variant cases. Recently, CDKL5 mutations have been identified in the early onset seizures variant and other atypical Rett patients. While the high percentage of MECP2 mutations in classic patients supports the hypothesis of a single disease gene, the low frequency of mutated variant cases suggests genetic heterogeneity. Since 1998, we have performed clinical evaluation and molecular analysis of a large number of Italian Rett patients. The Italian Rett Syndrome (RTT) database has been developed to share data and samples of our RTT collection with the scientific community (http://www.biobank.unisi.it). This is the first RTT database that has been connected with a biobank. It allows the user to immediately visualize the list of available RTT samples and, using the "Search by" tool, to rapidly select those with specific clinical and molecular features. By contacting bank curators, users can request the samples of interest for their studies. This database encourages collaboration projects with clinicians and researchers from around the world and provides important resources that will help to better define the pathogenic mechanisms underlying Rett syndrome.

  19. FreeSolv: A database of experimental and calculated hydration free energies, with input files

    PubMed Central

    Mobley, David L.; Guthrie, J. Peter

    2014-01-01

    This work provides a curated database of experimental and calculated hydration free energies for small neutral molecules in water, along with molecular structures, input files, references, and annotations. We call this the Free Solvation Database, or FreeSolv. Experimental values were taken from prior literature and will continue to be curated, with updated experimental references and data added as they become available. Calculated values are based on alchemical free energy calculations using molecular dynamics simulations. These used the GAFF small molecule force field in TIP3P water with AM1-BCC charges. Values were calculated with the GROMACS simulation package, with full details given in references cited within the database itself. This database builds in part on a previous, 504-molecule database containing similar information. However, additional curation of both experimental data and calculated values has been done here, and the total number of molecules is now up to 643. Additional information is now included in the database, such as SMILES strings, PubChem compound IDs, accurate reference DOIs, and others. One version of the database is provided in the Supporting Information of this article, but as ongoing updates are envisioned, the database is now versioned and hosted online. In addition to providing the database, this work describes its construction process. The database is available free-of-charge via http://www.escholarship.org/uc/item/6sd403pz. PMID:24928188
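
    Working with a FreeSolv-style table is straightforward. The sketch below parses a small invented sample (the semicolon-delimited column layout and the compound rows are assumptions for illustration, not the actual distributed file format) and computes the root-mean-square deviation between experimental and calculated hydration free energies.

    ```python
    import csv, io
    from math import sqrt

    # Invented FreeSolv-style sample; the real database additionally carries
    # structures, input files, references, DOIs and annotations.
    sample = """compound_id;smiles;expt_kcal;calc_kcal
    mobley_0001;CCO;-5.00;-4.87
    mobley_0002;c1ccccc1;-0.86;-1.12
    mobley_0003;CO;-5.10;-4.75
    """

    rows = list(csv.DictReader(io.StringIO(sample), delimiter=";"))
    # RMS deviation between experimental and calculated values (kcal/mol).
    rmsd = sqrt(sum((float(r["expt_kcal"]) - float(r["calc_kcal"])) ** 2
                    for r in rows) / len(rows))
    print(round(rmsd, 3))
    ```

    Comparisons like this are the main use of pairing curated experimental values with alchemical free energy calculations in one table.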

  1. Hybrid curation of gene–mutation relations combining automated extraction and crowdsourcing

    PubMed Central

    Burger, John D.; Doughty, Emily; Khare, Ritu; Wei, Chih-Hsuan; Mishra, Rajashree; Aberdeen, John; Tresner-Kirsch, David; Wellner, Ben; Kann, Maricel G.; Lu, Zhiyong; Hirschman, Lynette

    2014-01-01

    Background: This article describes capture of biological information using a hybrid approach that combines natural language processing to extract biological entities and crowdsourcing with annotators recruited via Amazon Mechanical Turk to judge correctness of candidate biological relations. These techniques were applied to extract gene–mutation relations from biomedical abstracts with the goal of supporting production-scale capture of gene–mutation–disease findings as an open source resource for personalized medicine. Results: The hybrid system could be configured to provide good performance for gene–mutation extraction (precision ∼82%; recall ∼70% against an expert-generated gold standard) at a cost of $0.76 per abstract. This demonstrates that crowd labor platforms such as Amazon Mechanical Turk can be used to recruit quality annotators, even in an application requiring subject matter expertise; aggregated Turker judgments for gene–mutation relations exceeded 90% accuracy. Over half of the precision errors were due to mismatches against the gold standard hidden from annotator view (e.g. incorrect EntrezGene identifier or incorrect mutation position extracted), or incomplete task instructions (e.g. the need to exclude nonhuman mutations). Conclusions: The hybrid curation model provides a readily scalable cost-effective approach to curation, particularly if coupled with expert human review to filter precision errors. We plan to generalize the framework and make it available as open source software. Database URL: http://www.mitre.org/publications/technical-papers/hybrid-curation-of-gene-mutation-relations-combining-automated PMID:25246425
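
    Aggregating Turker judgments can be sketched as a majority vote with an agreement score per candidate relation. The worker responses below are invented, and the paper's actual aggregation may weight annotators differently; this only illustrates the shape of the step.

    ```python
    from collections import Counter

    # Invented worker responses for two candidate gene-mutation relations.
    judgments = {
        ("BRAF", "V600E"): ["yes", "yes", "yes", "no", "yes"],
        ("TP53", "R175H"): ["yes", "no", "no", "no", "no"],
    }

    def aggregate(votes):
        # Majority label plus the fraction of workers who agreed with it.
        label, count = Counter(votes).most_common(1)[0]
        return label, count / len(votes)

    for relation, votes in judgments.items():
        print(relation, aggregate(votes))
    ```

    Relations whose agreement falls below a chosen threshold are natural candidates for the expert review pass that the conclusions recommend for filtering precision errors.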

  2. 3DSwap: curated knowledgebase of proteins involved in 3D domain swapping.

    PubMed

    Shameer, Khader; Shingate, Prashant N; Manjunath, S C P; Karthika, M; Pugalenthi, Ganesan; Sowdhamini, Ramanathan

    2011-01-01

    Three-dimensional domain swapping is a unique protein structural phenomenon in which two or more protein chains in a protein oligomer share a common structural segment between individual chains. This phenomenon is observed in an array of protein structures in oligomeric conformation. Protein structures in swapped conformations perform diverse functional roles and are also associated with deposition diseases in humans. We have performed in-depth literature curation and structural bioinformatics analyses to develop an integrated knowledgebase of proteins involved in 3D domain swapping. The hallmark of 3D domain swapping is the presence of distinct structural segments such as the hinge and swapped regions. We have curated the literature to delineate the boundaries of these regions. In addition, we have defined several new concepts, such as the 'secondary major interface' to represent the interface properties arising as a result of 3D domain swapping, and a new quantitative measure for the 'extent of swapping' in structures. The catalog of proteins reported in the 3DSwap knowledgebase has been generated using an integrated structural bioinformatics workflow of database searches, literature curation, structure visualization and sequence-structure-function analyses. The current version of the 3DSwap knowledgebase reports 293 protein structures; analysis of such a compendium of protein structures will further the understanding of the molecular factors driving 3D domain swapping.
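
    One plausible reading of an 'extent of swapping' measure is the fraction of a chain's residues lying in the annotated swapped region; the actual 3DSwap definition may differ. A minimal sketch with invented residue boundaries:

    ```python
    # Sketch of a hypothetical 'extent of swapping' measure: the fraction of
    # a chain's residues inside the curated swapped-region boundaries.
    # The definition and the example boundaries are assumptions here.
    def extent_of_swapping(chain_length, swapped_start, swapped_end):
        # Residue boundaries are inclusive and 1-based, as in PDB numbering.
        return (swapped_end - swapped_start + 1) / chain_length

    # A 120-residue chain whose C-terminal 30 residues are swapped.
    print(round(extent_of_swapping(120, 91, 120), 3))  # 0.25
    ```

    Whatever the exact formula, a normalized measure like this is what makes the 293 curated structures comparable across protein families of very different sizes.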

  3. SHARP User Manual

    SciTech Connect

    Yu, Y. Q.; Shemon, E. R.; Thomas, J. W.; Mahadevan, Vijay S.; Rahaman, Ronald O.; Solberg, Jerome

    2016-03-31

    SHARP is an advanced modeling and simulation toolkit for the analysis of nuclear reactors. It comprises several components, including physics modeling tools, tools to integrate the physics codes for multi-physics analyses, and a set of tools to couple the codes within the MOAB framework. Physics modules currently include the neutronics code PROTEUS, the thermal-hydraulics code Nek5000, and the structural mechanics code Diablo. This manual focuses on performing multi-physics calculations with the SHARP toolkit. Manuals for the three individual physics modules are available with the SHARP distribution, allowing the user either to carry out the primary multi-physics calculation with basic knowledge or to perform further advanced development with in-depth knowledge of these codes. This manual provides step-by-step instructions on employing SHARP, including how to download and install the code, how to build the drivers for a test case, how to perform a calculation, and how to visualize the results. Since SHARP has some specific library and environment dependencies, it is highly recommended that the user read this manual prior to installing SHARP. Verification test cases are included to check proper installation of each module. New users should first follow the step-by-step instructions provided for a test problem in this manual to understand the basic procedure of using SHARP before applying it to their own analyses. Both reference output and scripts are provided along with the test cases in order to verify correct installation and execution of the SHARP package. At the end of this manual, detailed instructions are provided on how to create a new test case so that the user can perform novel multi-physics calculations with SHARP. Frequently asked questions are listed at the end of this manual to help the user troubleshoot issues.

  4. OCDB: a database collecting genes, miRNAs and drugs for obsessive-compulsive disorder.

    PubMed

    Privitera, Anna P; Distefano, Rosario; Wefer, Hugo A; Ferro, Alfredo; Pulvirenti, Alfredo; Giugno, Rosalba

    2015-01-01

    Obsessive-compulsive disorder (OCD) is a psychiatric condition characterized by intrusive and unwilling thoughts (obsessions) giving rise to anxiety. Patients feel obliged to perform behaviors (compulsions) induced by the obsessions. The World Health Organization ranks OCD as one of the 10 most disabling medical conditions. Within the class of anxiety disorders, OCD is a pathology that shows a hereditary component. Consequently, an online resource collecting and integrating scientific discoveries and genetic evidence about OCD would help improve current knowledge of this disorder. We have developed a manually curated database, the OCD Database (OCDB), collecting the relations between candidate genes in OCD, microRNAs (miRNAs) involved in the pathophysiology of OCD, and drugs used in its treatment. We have screened articles from PubMed and MEDLINE. For each gene, the bibliographic references are shown with a brief description of the gene and the experimental conditions. The database also lists the polymorphisms within genes and their chromosomal regions. OCDB data is enriched with both validated and predicted miRNA-target and drug-target information. Transcription factor regulations, which are also included, are taken from DAVID and TransmiR. Moreover, a scoring function ranks the relevance of data in the OCDB context. The database is also integrated with the main online resources (PubMed, Entrez Gene, HGNC, dbSNP, DrugBank, miRBase, PubChem, KEGG, Disease Ontology and ChEBI). The web interface has been developed using phpMyAdmin and Bootstrap software. This allows users (i) to browse data by category and (ii) to navigate the database by searching genes, miRNAs, drugs, SNPs, regions, drug targets and articles. The data can be exported in textual format, as can the whole database in .sql or tabular format. OCDB is an essential resource to support genome-wide analysis, genetic and pharmacological studies. It also facilitates the evaluation of genetic data

  5. Standardization in generating and reporting genetically modified rodent malaria parasites: the RMgmDB database.

    PubMed

    Khan, Shahid M; Kroeze, Hans; Franke-Fayard, Blandine; Janse, Chris J

    2013-01-01

    Genetically modified Plasmodium parasites are central reagents for gene-function studies in malaria research. The Rodent Malaria genetically modified DataBase (RMgmDB) (www.pberghei.eu) is a manually curated web-based repository that contains information on genetically modified rodent malaria parasites. It provides easy and rapid access to information on the genotype and phenotype of genetically modified mutant and reporter parasites. Here, we provide guidelines for generating and describing rodent malaria parasite mutants. Standardization in describing mutant genotypes and phenotypes is important not only to enhance publication quality but also to facilitate cross-linking and mining of data from multiple sources, and should permit information derived from mutant parasites to be used in integrative systems biology approaches. We also provide guidelines on how to submit to RMgmDB information on non-published mutants, mutants that do not exhibit a clear phenotype, and negative attempts to disrupt/mutate genes. Such information helps prevent unnecessary duplication of experiments in different laboratories, and can provide indirect evidence that these genes are essential for blood-stage development.

  6. Genomes OnLine Database (GOLD) v.6: data updates and feature enhancements

    PubMed Central

    Mukherjee, Supratim; Stamatis, Dimitri; Bertsch, Jon; Ovchinnikova, Galina; Verezemska, Olena; Isbandi, Michelle; Thomas, Alex D.; Ali, Rida; Sharma, Kaushal; Kyrpides, Nikos C.; Reddy, T. B. K.

    2017-01-01

    The Genomes Online Database (GOLD) (https://gold.jgi.doe.gov) is a manually curated data management system that catalogs sequencing projects with associated metadata from around the world. In the current version of GOLD (v.6), all projects are organized based on a four-level classification system in the form of a Study, Organism (for isolates) or Biosample (for environmental samples), Sequencing Project and Analysis Project. Currently, GOLD provides information for 26 117 Studies, 239 100 Organisms, 15 887 Biosamples, 97 212 Sequencing Projects and 78 579 Analysis Projects. These are integrated with over 312 metadata fields, of which 58 are controlled vocabularies with 2067 terms. The web interface facilitates submission of a diverse range of Sequencing Projects (such as isolate genome, single-cell genome, metagenome, metatranscriptome) and complex Analysis Projects (such as genome from metagenome, or combined assembly from multiple Sequencing Projects). GOLD provides a seamless interface with the Integrated Microbial Genomes (IMG) system and supports and promotes the Genomic Standards Consortium (GSC) Minimum Information standards. This paper describes the data updates and additional features added during the last two years. PMID:27794040

  7. Evolutionary Characters, Phenotypes and Ontologies: Curating Data from the Systematic Biology Literature

    PubMed Central

    Dahdul, Wasila M.; Balhoff, James P.; Engeman, Jeffrey; Grande, Terry; Hilton, Eric J.; Kothari, Cartik; Lapp, Hilmar; Lundberg, John G.; Midford, Peter E.; Vision, Todd J.; Westerfield, Monte; Mabee, Paula M.

    2010-01-01

    Background The wealth of phenotypic descriptions documented in the published articles, monographs, and dissertations of phylogenetic systematics is traditionally reported in a free-text format, and it is therefore largely inaccessible for linkage to biological databases for genetics, development, and phenotypes, and difficult to manage for large-scale integrative work. The Phenoscape project aims to represent these complex and detailed descriptions with rich and formal semantics that are amenable to computation and integration with phenotype data from other fields of biology. This entails reconceptualizing the traditional free-text characters into the computable Entity-Quality (EQ) formalism using ontologies. Methodology/Principal Findings We used ontologies and the EQ formalism to curate a collection of 47 phylogenetic studies on ostariophysan fishes (including catfishes, characins, minnows, knifefishes) and their relatives with the goal of integrating these complex phenotype descriptions with information from an existing model organism database (zebrafish, http://zfin.org). We developed a curation workflow for the collection of character, taxonomic and specimen data from these publications. A total of 4,617 phenotypic characters (10,512 states) for 3,449 taxa, primarily species, were curated into EQ formalism (for a total of 12,861 EQ statements) using anatomical and taxonomic terms from teleost-specific ontologies (Teleost Anatomy Ontology and Teleost Taxonomy Ontology) in combination with terms from a quality ontology (Phenotype and Trait Ontology). Standards and guidelines for consistently and accurately representing phenotypes were developed in response to the challenges that were evident from two annotation experiments and from feedback from curators. 
Conclusions/Significance The challenges we encountered and many of the curation standards and methods for improving consistency that we developed are generally applicable to any effort to represent phenotypes

  8. Atomic Databases

    NASA Astrophysics Data System (ADS)

    Mendoza, Claudio

    2000-10-01

    Atomic and molecular data are required in a variety of fields, ranging from traditional astronomy, atmospheric science and fusion research to fast-growing technologies such as lasers, lighting, low-temperature plasmas, plasma-assisted etching and radiotherapy. In this context, a number of research groups, both theoretical and experimental, scattered around the world attend to most of this data demand, but the implementation of atomic databases has grown independently out of sheer necessity. In some cases this growth has been associated with the data production process or with data centers involved in data collection and evaluation; in others it has been the result of individual initiatives that have proved quite successful. In any case, the development and maintenance of atomic databases call for a number of skills and an entrepreneurial spirit that are not usually associated with most physics researchers. In the present report we highlight developments in this area over the past five years and discuss what we think are the main issues that have to be addressed.

  9. Field Procedures Manual: Shady Rest, California, and Sulphur Springs, New Mexico

    SciTech Connect

    Goff, S.

    1988-12-01

    This Field Procedures Manual is the comprehensive operations guide to be used to curate samples obtained from the Shady Rest site at Mammoth, California, and the Sulphur Springs site at Valles caldera, New Mexico (VC-2A). These sites are diamond drilling projects in small-diameter holes that will produce continuous core. Fluid samples will also be of primary importance at both of these sites. Detailed core and fluid handling procedures are therefore the major focus of this manual. The manual provides a comprehensive operations guide for the well-site geoscientists working at the Department of Energy/Office of Basic Energy Science (DOE/OBES) Continental Scientific Drilling Program (CSDP)/Thermal Regimes drill sites. These procedures modify and improve those in previous DOE/OBES field manuals. 1 ref., 7 figs.

  10. Global Identification of Protein Post-translational Modifications in a Single-Pass Database Search.

    PubMed

    Shortreed, Michael R; Wenger, Craig D; Frey, Brian L; Sheynkman, Gloria M; Scalf, Mark; Keller, Mark P; Attie, Alan D; Smith, Lloyd M

    2015-11-06

    Bottom-up proteomics database search algorithms used for peptide identification cannot comprehensively identify post-translational modifications (PTMs) in a single pass because of high false discovery rates (FDRs). A new approach to database searching enables global PTM (G-PTM) identification by looking exclusively for curated PTMs, thereby avoiding the FDR penalty incurred by conventional variable-modification searches. We identified over 2200 unique, high-confidence modified peptides comprising 26 different PTM types in a single-pass database search.
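    The G-PTM idea of searching only curated modifications, rather than every chemically possible one, can be illustrated with a toy mass calculation. The residue masses below are standard monoisotopic values, but the peptide, the curated PTM table, and all function names are invented for illustration and are not the authors' implementation:

    ```python
    # Toy illustration of the G-PTM idea: instead of enumerating every
    # chemically possible modification of a peptide, consider only the
    # PTMs previously curated for it, shrinking the candidate space.
    WATER = 18.0106
    RESIDUE_MASS = {"G": 57.0215, "A": 71.0371, "S": 87.0320,
                    "P": 97.0528, "K": 128.0949}

    # Hypothetical curated PTMs per peptide: (name, mass shift in Da).
    CURATED_PTMS = {
        "SAKG": [("Phospho", 79.9663), ("Acetyl", 42.0106)],
    }

    def candidate_masses(peptide):
        """Unmodified mass plus one mass per curated PTM -- nothing else."""
        base = sum(RESIDUE_MASS[aa] for aa in peptide) + WATER
        return [(None, base)] + [(name, base + shift)
                                 for name, shift in CURATED_PTMS.get(peptide, [])]

    for name, mass in candidate_masses("SAKG"):
        print(name, round(mass, 4))
    ```

    A real search engine would compare each candidate mass against observed precursor masses; the point of the sketch is only that the curated table bounds the number of candidates per peptide.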

  11. KID - an algorithm for fast and efficient text mining used to automatically generate a database containing kinetic information of enzymes

    PubMed Central

    2010-01-01

    Background The amount of available biological information is rapidly increasing, and the focus of biological research has moved from single components to networks and even larger projects aiming at the analysis, modelling and simulation of biological networks as well as large-scale comparison of cellular properties. It is therefore essential that biological knowledge be easily accessible. However, most information is contained in the written literature in an unstructured way, so methods for the systematic extraction of knowledge directly from the primary literature must be deployed. Description Here we present a text mining algorithm for the extraction of kinetic information such as KM, Ki, kcat, etc., as well as associated information such as enzyme names, EC numbers, ligands, organisms, localisations, pH and temperatures. Using this rule- and dictionary-based approach, it was possible to extract 514,394 kinetic parameters of 13 categories (KM, Ki, kcat, kcat/KM, Vmax, IC50, S0.5, Kd, Ka, t1/2, pI, nH, specific activity, Vmax/KM) from about 17 million PubMed abstracts and combine them with other data in the abstract. A manual verification of approx. 1,000 randomly chosen results yielded a recall between 51% and 84% and a precision ranging from 55% to 96%, depending on the category searched. The results were stored in a database and are available as "KID the KInetic Database" via the internet. Conclusions The presented algorithm delivers a considerable amount of information and may therefore help accelerate the research and automated analysis required for today's systems biology approaches. The database obtained by analysing PubMed abstracts may be a valuable help in the field of chemical and biological kinetics. It is completely based upon text mining and therefore complements manually curated databases. The database is available at http://kid.tu-bs.de. The source code of the algorithm is provided under the GNU General Public Licence and available on
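    A rule- and dictionary-based extraction step of the kind the abstract describes can be sketched with a single regular expression. The pattern, the unit dictionary, and the example sentence below are our own simplifications, not KID's actual rules:

    ```python
    import re

    # Hypothetical rule: a kinetic parameter name, optional filler text
    # ("of", "=", "was"), a numeric value, and a unit from a small
    # dictionary, e.g. "Km of 2.5 mM" or "kcat = 150 s-1".
    PATTERN = re.compile(
        r"\b(Km|Ki|kcat|Kd|IC50|Vmax)\b"   # parameter category
        r"\D{0,15}?"                       # short non-digit filler
        r"(\d+(?:\.\d+)?)\s*"              # numeric value
        r"(mM|uM|nM|s-1|U/mg)"             # unit dictionary
    )

    def extract_kinetics(abstract):
        """Return (parameter, value, unit) triples found in free text."""
        return PATTERN.findall(abstract)

    text = ("The enzyme showed a Km of 2.5 mM for glucose and "
            "a kcat = 150 s-1 at pH 7.4.")
    print(extract_kinetics(text))
    ```

    A production system like the one described would add dictionaries for enzyme names, EC numbers and organisms, and rules to associate each value with the right enzyme and substrate mentioned nearby.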

  12. Maintenance Manual for NATICK’s Footwear Database

    DTIC Science & Technology

    1992-01-01

    radiculopathy; metatarsal stress fracture (calcaneal stress fractures are rare); tarsal tunnel syndrome; gout; Reiter's disease; ankylosing... syndrome plantar fascia crush injury foot injury BIOMECHANICS foot powder/antiperspirants barefoot adaptation fracture biomechanics friction blister... electromyography frost bite energy transfer iliotibial band syndrome load injury modelling injury rates sensory attenuation knee injury lower extremity

  13. User Manual for NATICK’s Footwear Database

    DTIC Science & Technology

    1992-01-01

    are rare); tarsal tunnel syndrome; gout; Reiter's disease; ankylosing spondylitis; psoriatic arthropathy; and rheumatoid arthritis. Page 2 Figure 37b... chondromalacia lower extremity morphology compartment syndrome plantar fascia crush injury foot injury BIOMECHANICS foot powder/antiperspirants... barefoot adaptation fracture biomechanics friction blister electromyography frost bite energy transfer iliotibial band syndrome load injury modelling injury

  14. Salinas : theory manual.

    SciTech Connect

    Walsh, Timothy Francis; Reese, Garth M.; Bhardwaj, Manoj Kumar

    2011-11-01

    Salinas provides a massively parallel implementation of structural dynamics finite element analysis, required for high-fidelity, validated models used in modal, vibration, static and shock analysis of structural systems. This manual describes the theory behind many of the constructs in Salinas. For a more detailed description of how to use Salinas, we refer the reader to the Salinas User's Notes. Many of the constructs in Salinas are pulled directly from published material; where possible, those materials are referenced herein. However, certain functions in Salinas are specific to our implementation, and we try to be far more complete in those areas. The theory manual was developed from several sources, including general notes, a programmer's notes manual, the user's notes and, of course, material in the open literature.

  15. Aircraft operations management manual

    NASA Technical Reports Server (NTRS)

    1992-01-01

    The NASA aircraft operations program is a multifaceted, highly diverse entity that directly supports the agency mission in aeronautical research and development, space science and applications, space flight, astronaut readiness training, and related activities through research and development, program support, and mission management aircraft operations flights. Users of the program are interagency, inter-government, international, and the business community. This manual provides guidelines to establish policy for the management of NASA aircraft resources, aircraft operations, and related matters. This policy is an integral part of and must be followed when establishing field installation policy and procedures covering the management of NASA aircraft operations. Each operating location will develop appropriate local procedures that conform with the requirements of this handbook. This manual should be used in conjunction with other governing instructions, handbooks, and manuals.

  16. Nuclear material operations manual

    SciTech Connect

    Tyler, R.P.

    1981-02-01

    This manual provides a concise and comprehensive documentation of the operating procedures currently practiced at Sandia National Laboratories with regard to the management, control, and accountability of nuclear materials. The manual is divided into chapters which are devoted to the separate functions performed in nuclear material operations-management, control, accountability, and safeguards, and the final two chapters comprise a document which is also issued separately to provide a summary of the information and operating procedures relevant to custodians and users of radioactive and nuclear materials. The manual also contains samples of the forms utilized in carrying out nuclear material activities. To enhance the clarity of presentation, operating procedures are presented in the form of playscripts in which the responsible organizations and necessary actions are clearly delineated in a chronological fashion from the initiation of a transaction to its completion.

  17. Developing a policy manual.

    PubMed

    Hotta, Tracey A

    2013-01-01

    Do you really need a policy and procedure manual in the office? Frequently these manuals are seen sitting on the shelf, collecting dust. The answer is yes, for a number of very important reasons. A policy and procedure manual is a tool to set guidelines and expectations on the basis of the mission and vision of the office. A well-written manual is a powerful training tool for new staff, helping them get a feel for the office culture. Furthermore, it is a provincial or state legislative requirement that can reduce management's concern about potential legal issues or problems. If an office does not have a manual to set guidelines, employees may be forced to make their own decisions to solve problems, which often results in confusion, inconsistencies, and mistakes.

  18. The Role of Preoperative TIPSS to Facilitate Curative Gastric Surgery

    SciTech Connect

    Norton, S.A.; Vickers, J.; Callaway, M.P.; Alderson, D.

    2003-08-15

    The use of TIPSS to facilitate radical curative upper gastrointestinal surgery has not been reported. We describe a case in which curative gastric resection was performed for carcinoma of the stomach after a preoperative TIPSS and embolization of a large gastric varix in a patient with portal hypertension.

  19. RELAP-7 Theory Manual

    SciTech Connect

    Berry, Ray Alden; Zou, Ling; Zhao, Haihua; Zhang, Hongbin; Peterson, John William; Martineau, Richard Charles; Kadioglu, Samet Yucel; Andrs, David

    2016-03-01

    This document summarizes the physical models and mathematical formulations used in the RELAP-7 code. The MOOSE-based RELAP-7 code is an ongoing development effort; the MOOSE framework enables rapid development, and the efforts and results to date demonstrate that the RELAP-7 project is on a path to success. This theory manual documents the main features implemented in the RELAP-7 code. Because the code is under continuing development, the manual will evolve with periodic updates to keep it current with the state of the development, implementation, and model additions and revisions.

  20. Manuals of Cultural Systems

    NASA Astrophysics Data System (ADS)

    Ballonoff, Paul

    2014-10-01

    Ethnography often studies social networks including empirical descriptions of marriages and families. We initially concentrate on a special subset of networks which we call configurations. We show that descriptions of the possible outcomes of viable histories form a manual, and an orthoalgebra. We then study cases where family sizes vary, and show that this also forms a manual. In fact, it demonstrates adiabatic invariance, a property often associated with physical system conservation laws, and which here expresses conservation of the viability of a cultural system.

  1. Equipment Management Manual

    NASA Technical Reports Server (NTRS)

    1992-01-01

    The NASA Equipment Management Manual (NHB 4200.1) is issued pursuant to Section 203(c)(1) of the National Aeronautics and Space Act of 1958, as amended (42 USC 2473), and sets forth policy, uniform performance standards, and procedural guidance to NASA personnel for the acquisition, management, and use of NASA-owned equipment. This revision is effective upon receipt. This is a controlled manual, issued in loose-leaf form, and revised through page changes. Additional copies for internal use may be obtained through normal distribution.

  2. Fastener Design Manual

    NASA Technical Reports Server (NTRS)

    Barrett, Richard T.

    1990-01-01

    This manual was written for design engineers to enable them to choose appropriate fasteners for their designs. Subject matter includes fastener material selection, platings, lubricants, corrosion, locking methods, washers, inserts, thread types and classes, fatigue loading, and fastener torque. A section on design criteria covers the derivation of torque formulas, loads on a fastener group, combining simultaneous shear and tension loads, pullout load for tapped holes, grip length, head styles, and fastener strengths. The second half of this manual presents general guidelines and selection criteria for rivets and lockbolts.

  3. Brain Tumor Database, a free relational database for collection and analysis of brain tumor patient information.

    PubMed

    Bergamino, Maurizio; Hamilton, David J; Castelletti, Lara; Barletta, Laura; Castellan, Lucio

    2015-03-01

    In this study, we describe the development and utilization of a relational database designed to manage the clinical and radiological data of patients with brain tumors. The Brain Tumor Database was implemented using MySQL v.5.0, while the graphical user interface was created using PHP and HTML, thus making it easily accessible through a web browser. This web-based approach allows for multiple institutions to potentially access the database. The BT Database can record brain tumor patient information (e.g. clinical features, anatomical attributes, and radiological characteristics) and be used for clinical and research purposes. Analytic tools to automatically generate statistics and different plots are provided. The BT Database is a free and powerful user-friendly tool with a wide range of possible clinical and research applications in neurology and neurosurgery. The BT Database graphical user interface source code and manual are freely available at http://tumorsdatabase.altervista.org.
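    As a rough illustration of the relational design such a database implies (the article names MySQL and PHP; SQLite is used here only to keep the sketch self-contained, and all table and column names are hypothetical, not the BT Database's actual schema):

    ```python
    import sqlite3

    # Illustrative two-table schema: clinical data in `patient`,
    # radiological data in `scan`, linked by a foreign key.
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
    CREATE TABLE patient (
        id        INTEGER PRIMARY KEY,
        age       INTEGER,
        diagnosis TEXT
    );
    CREATE TABLE scan (
        id         INTEGER PRIMARY KEY,
        patient_id INTEGER REFERENCES patient(id),
        modality   TEXT,          -- e.g. 'MRI T1', 'CT'
        volume_ml  REAL           -- radiological tumor volume
    );
    """)
    conn.execute("INSERT INTO patient VALUES (1, 54, 'glioblastoma')")
    conn.execute("INSERT INTO scan VALUES (1, 1, 'MRI T1', 32.5)")

    # A typical clinical-research query: mean tumor volume per diagnosis.
    row = conn.execute("""
        SELECT p.diagnosis, AVG(s.volume_ml)
        FROM patient p JOIN scan s ON s.patient_id = p.id
        GROUP BY p.diagnosis
    """).fetchone()
    print(row)
    ```

    The relational split is what enables the kind of automatic statistics and plots the abstract mentions: aggregate queries over joins rather than per-patient spreadsheets.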

  4. Sample Curation at a Lunar Outpost

    NASA Technical Reports Server (NTRS)

    Allen, Carlton C.; Lofgren, Gary E.; Treiman, A. H.; Lindstrom, Marilyn L.

    2007-01-01

    The six Apollo surface missions returned 2,196 individual rock and soil samples, with a total mass of 381.6 kg. Samples were collected based on visual examination by the astronauts and consultation with geologists in the science back room in Houston. The samples were photographed during collection, packaged in uniquely identified containers, and transported to the Lunar Module. All samples collected on the Moon were returned to Earth. NASA's upcoming return to the Moon will be different. Astronauts will have extended stays at an outpost and will collect more samples than they will return. They will need curation and analysis facilities on the Moon in order to carefully select samples for return to Earth.

  5. Stackfile Database

    NASA Technical Reports Server (NTRS)

    deVarvalho, Robert; Desai, Shailen D.; Haines, Bruce J.; Kruizinga, Gerhard L.; Gilmer, Christopher

    2013-01-01

    This software provides storage, retrieval, and analysis functionality for managing satellite altimetry data. It improves on the efficiency and analysis capabilities of existing database software with greater flexibility and documentation. It offers flexibility in the type of data that can be stored, and efficient retrieval across either the spatial domain or the time domain. Built-in analysis tools are provided for frequently performed altimetry tasks. This software package is used for storing and manipulating satellite measurement data. It was developed with a focus on handling the requirements of repeat-track altimetry missions such as Topex and Jason; it was, however, designed to work with a wide variety of satellite measurement data (e.g., the Gravity Recovery And Climate Experiment, GRACE). The software consists of several command-line tools for importing, retrieving, and analyzing satellite measurement data.
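    Efficient retrieval across the time domain, as described above, typically relies on keeping records sorted by timestamp so a range query is a pair of binary searches. A minimal sketch with invented record fields, not the package's actual storage format:

    ```python
    from bisect import bisect_left, bisect_right

    # Altimetry-style records kept sorted by timestamp:
    # (time, latitude, longitude, sea surface height) -- fields illustrative.
    records = sorted([
        (1000.0, 66.2, -12.1, 1.31),
        (1010.0, 66.4, -12.0, 1.29),
        (1020.0, 66.6, -11.9, 1.33),
    ])
    times = [r[0] for r in records]

    def query_time(t0, t1):
        """All records with t0 <= time <= t1, via two binary searches."""
        return records[bisect_left(times, t0):bisect_right(times, t1)]

    print(query_time(1005.0, 1020.0))
    ```

    Spatial-domain retrieval would need a second index (e.g. binning by latitude/longitude), which is the usual reason such packages maintain more than one access path over the same measurements.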

  6. Hayabusa-returned sample curation in the Planetary Material Sample Curation Facility of JAXA

    NASA Astrophysics Data System (ADS)

    Yada, Toru; Fujimura, Akio; Abe, Masanao; Nakamura, Tomoki; Noguchi, Takaaki; Okazaki, Ryuji; Nagao, Keisuke; Ishibashi, Yukihiro; Shirai, Kei; Zolensky, Michael E.; Sandford, Scott; Okada, Tatsuaki; Uesugi, Masayuki; Karouji, Yuzuru; Ogawa, Maho; Yakame, Shogo; Ueno, Munetaka; Mukai, Toshifumi; Yoshikawa, Makoto; Kawaguchi, Junichiro

    2014-02-01

    The Planetary Material Sample Curation Facility of JAXA (PMSCF/JAXA) was established in Sagamihara, Kanagawa, Japan, to curate planetary material samples returned from space with minimal terrestrial contamination. Its performance for curating Hayabusa-returned samples was verified with a series of comprehensive tests and rehearsals. After the Hayabusa spacecraft accomplished a round-trip flight to asteroid 25143 Itokawa and returned its reentry capsule to Earth in June 2010, the capsule was brought to the PMSCF/JAXA and subjected to a series of processes to extract the recovered samples. The particles recovered from the sample catcher were analyzed by electron microscopy, assigned IDs, grouped into four categories, and preserved in dimples on quartz slide glasses. Some fraction of them has been distributed for initial analyses at NASA and will be distributed through an international announcement of opportunity (AO), but a certain fraction will be preserved in vacuum for future analyses.

  7. IPD—the Immuno Polymorphism Database

    PubMed Central

    Robinson, James; Halliwell, Jason A.; McWilliam, Hamish; Lopez, Rodrigo; Marsh, Steven G. E.

    2013-01-01

    The Immuno Polymorphism Database (IPD), http://www.ebi.ac.uk/ipd/ is a set of specialist databases related to the study of polymorphic genes in the immune system. The IPD project works with specialist groups or nomenclature committees who provide and curate individual sections before they are submitted to IPD for online publication. The IPD project stores all the data in a set of related databases. IPD currently consists of four databases: IPD-KIR, contains the allelic sequences of killer-cell immunoglobulin-like receptors, IPD-MHC, a database of sequences of the major histocompatibility complex of different species; IPD-HPA, alloantigens expressed only on platelets; and IPD-ESTDAB, which provides access to the European Searchable Tumour Cell-Line Database, a cell bank of immunologically characterized melanoma cell lines. The data is currently available online from the website and FTP directory. This article describes the latest updates and additional tools added to the IPD project. PMID:23180793

  8. Turkmen Language Manual.

    ERIC Educational Resources Information Center

    Tyson, David; Clark, Larry

    The manual of standard Turkmen language is designed to teach basic language skills that Peace Corps volunteers need during a tour in Turkmenistan. An introductory section gives information about the Turkmen language, including a brief history, notes on the alphabet, vowel and consonant sounds, rules of vowel harmony, and specific grammatical forms…

  9. NASCAP user's manual, 1978

    NASA Technical Reports Server (NTRS)

    Cassidy, J. J., III

    1978-01-01

    NASCAP simulates the charging process for a complex object in either tenuous plasma (geosynchronous orbit) or ground test (electron gun source) environment. Program control words, the structure of user input files, and various user options available are described in this computer programmer's user manual.

  10. Matematica 3. Manual.

    ERIC Educational Resources Information Center

    D'Alu, Maria Jose Miranda de Sousa

    This teachers manual accompanies a mathematics textbook, written in Portuguese, for third graders. It closely follows the objectives and methodology of the major curricula used throughout the schools in the United States. The 11 chapters deal with: numeration (0-999,999); addition with and without regrouping; subtraction with and without…

  11. Outdoor Education Manual.

    ERIC Educational Resources Information Center

    Gooyers, Cobina; And Others

    Designed for teachers to provide students with an awareness of the world of nature which surrounds them, the manual presents the philosophy of outdoor education, goals and objectives of the school program, planning for outdoor education, the Wildwood Programs, sequential program planning for students, program booking and resource list. Content…

  12. Manual for CLE Lecturers.

    ERIC Educational Resources Information Center

    Shellaberger, Donna J.

    This manual is designed to help lawyers develop the skills needed to present effective, stimulating continuing legal education (CLE) lectures. It focuses on the particular purpose and nature of CLE lecturing, relationships and interplay of personalities in CLE, commitments and constraints which lecturers should observe, program structure and…

  13. Aquatic Microbiology Laboratory Manual.

    ERIC Educational Resources Information Center

    Cooper, Robert C.; And Others

    This laboratory manual presents information and techniques dealing with aquatic microbiology as it relates to environmental health science, sanitary engineering, and environmental microbiology. The contents are divided into three categories: (1) ecological and physiological considerations; (2) public health aspects; and (3)microbiology of water…

  14. Child Welfare Policy Manual

    ERIC Educational Resources Information Center

    Administration for Children & Families, 2008

    2008-01-01

    This document conveys mandatory policies that have their basis in Federal Law and/or program regulations. It also provides interpretations of Federal Statutes and program regulations initiated by inquiries from State Child Welfare agencies or Administration for Children and Families (ACF) Regional Offices. The manual replaces the Children's…

  15. Rescue Manual. Module 8.

    ERIC Educational Resources Information Center

    Ohio State Univ., Columbus. Instructional Materials Lab.

    This learner manual for rescuers covers the current techniques or practices required in the rescue service. The eighth of 10 modules contains 6 chapters: (1) trench rescue; (2) shoring and tunneling techniques; (3) farm accident rescue; (4) wilderness search and rescue; (5) aircraft rescue; and (6) helicopter information. Key points, an…

  16. Environmental Science Laboratory Manual.

    ERIC Educational Resources Information Center

    Strobbe, Maurice A.

    The objective of this manual is to provide a set of basic analytical procedures commonly used to determine environmental quality. Procedures are designed to be used in an introductory course in environmental science and are explicit enough to be performed by either the non-science or beginning science student. Stressing ecology and…

  17. Book Repair Manual.

    ERIC Educational Resources Information Center

    Milevski, Robert J.

    1995-01-01

    This book repair manual developed for the Illinois Cooperative Conservation Program includes book structure and book problems, book repair procedures for 4 specific problems, a description of adhesive bindings, a glossary, an annotated list of 11 additional readings, book repair supplies and suppliers, and specifications for book repair kits. (LRW)

  18. General Drafting. Technical Manual.

    ERIC Educational Resources Information Center

    Department of the Army, Washington, DC.

    The manual provides instructional guidance and reference material in the principles and procedures of general drafting and constitutes the primary study text for personnel in drafting as a military occupational specialty. Included is information on drafting equipment and its use; line weights, conventions and formats; lettering; engineering charts…

  19. Operations Policy Manual, 2012

    ERIC Educational Resources Information Center

    Teacher Education Accreditation Council, 2011

    2011-01-01

    The Teacher Education Accreditation Council's (TEAC's) "Operations Policy Manual" outlines all of TEAC's current policies and procedures related to TEAC members, TEAC administration, and the public, and includes the Bylaws of the Teacher Education Accreditation Council. Contents include: (1) Policies Related to TEAC Members; (2) Policies Related…

  20. Operations Policy Manual, 2010

    ERIC Educational Resources Information Center

    Teacher Education Accreditation Council, 2010

    2010-01-01

    The Teacher Education Accreditation Council (TEAC) "Operations Policy Manual" outlines all of TEAC's current policies and procedures related to TEAC members, TEAC administration, and the public, and includes the Bylaws of the Teacher Education Accreditation Council, Inc. An index is also included.

  1. Consumer Decisions. Student Manual.

    ERIC Educational Resources Information Center

    Florida State Dept. of Education, Tallahassee. Div. of Vocational Education.

    This student manual covers five areas relating to consumer decisions. Titles of the five sections are Consumer Law, Consumer Decision Making, Buying a Car, Convenience Foods, and Books for Preschool Children. Each section may contain some or all of these materials: list of objectives, informative sections, questions on the information and answers,…

  2. System Documentation Manual.

    ERIC Educational Resources Information Center

    Semmel, Melvyn I.; Olson, Jerry

    The document is a system documentation manual of the Computer-Assisted Teacher Training System (CATTS) developed by the Center for Innovation in Teaching the Handicapped (Indiana University). CATTS is characterized as a system capable of providing continuous, instantaneous, and/or delayed feedback of relevant teacher-student interaction data to a…

  3. Small Town Renewal Manual.

    ERIC Educational Resources Information Center

    Kenyon, Peter

    Over the last 2 decades, the loss of population and businesses in many small, inland, and remote Australian rural communities has intensified, largely because of the stress and uncertainty of volatile world commodity markets. This manual presents a range of survival and revival strategies that some communities have used to build resilient…

  4. HEMPDS user's manual

    SciTech Connect

    Warren, K.H.

    1983-02-01

    HEMPDS, the double-slide version of two-dimensional HEMP, allows slide lines to intersect in any direction, thus making use of triangular zones. This revised user's manual aids the physicist, computer scientist, and computer technician in using, maintaining, and converting HEMPDS. Equations, EOS models, and sample problems are included.

  5. Rescue Manual. Module 3.

    ERIC Educational Resources Information Center

    Ohio State Univ., Columbus. Instructional Materials Lab.

    This learner manual for rescuers covers the current techniques or practices required in the rescue service. The third of 10 modules contains 4 chapters: (1) forcible entry; (2) structure search and rescue; (3) rescue operations involving electricity; and (4) cutting torches. Key points, an introduction, and conclusion accompany substantive…

  6. Rescue Manual. Module 1.

    ERIC Educational Resources Information Center

    Ohio State Univ., Columbus. Instructional Materials Lab.

    This learner manual for rescuers covers the current techniques or practices required in the rescue service. The first of 10 modules contains 9 chapters: (1) introduction; (2) occupational stresses in rescue operations; (3) size-up; (4) critique; (5) reports and recordkeeping; (6) tools and equipment for rescue operations; (7) planning for…

  7. Rescue Manual. Module 6.

    ERIC Educational Resources Information Center

    Ohio State Univ., Columbus. Instructional Materials Lab.

    This learner manual for rescuers covers the current techniques or practices required in the rescue service. The sixth of 10 modules contains 4 chapters: (1) industrial rescue; (2) rescue from a confined space; (3) extrication from heavy equipment; and (4) rescue operations involving elevators. Key points, an introduction, and conclusion accompany…

  8. Rescue Manual. Module 4.

    ERIC Educational Resources Information Center

    Ohio State Univ., Columbus. Instructional Materials Lab.

    This learner manual for rescuers covers the current techniques or practices required in the rescue service. The fourth of 10 modules contains 8 chapters: (1) construction and characteristics of rescue rope; (2) knots, bends, and hitches; (3) critical angles; (4) raising systems; (5) rigging; (6) using the brake-bar rack for rope rescue; (7) rope…

  9. Rescue Manual. Module 7.

    ERIC Educational Resources Information Center

    Ohio State Univ., Columbus. Instructional Materials Lab.

    This learner manual for rescuers covers the current techniques or practices required in the rescue service. The seventh of 10 modules contains information on extrication from vehicles. Key points, an introduction, and conclusion accompany substantive material in this module. In addition, suggested tools and equipment for extrication procedures are…

  10. Rescue Manual. Module 10.

    ERIC Educational Resources Information Center

    Ohio State Univ., Columbus. Instructional Materials Lab.

    This learner manual for rescuers covers the current techniques or practices required in the rescue service. The 10th of 10 modules contains a 16-page glossary of rescue terms and 3 appendices: (1) 4 computer programs and 32 other technical assistance materials available for hazardous materials; (2) hazardous materials resources--60 phone numbers,…

  11. Rescue Manual. Module 2.

    ERIC Educational Resources Information Center

    Ohio State Univ., Columbus. Instructional Materials Lab.

    This learner manual for rescuers covers the current techniques or practices required in the rescue service. The second of 10 modules contains 5 chapters: (1) patient care and handling techniques; (2) rescue carries and drags; (3) emergency vehicle operations; (4) self-contained breathing apparatus; and (5) protective clothing. Key points, an…

  12. Rescue Manual. Module 9.

    ERIC Educational Resources Information Center

    Ohio State Univ., Columbus. Instructional Materials Lab.

    This learner manual for rescuers covers the current techniques or practices required in the rescue service. The ninth of 10 modules contains 7 chapters: (1) ice characteristics; (2) river characteristics and tactics for rescue; (3) water rescue techniques; (4) water rescue/recovery operations; (5) dive operations; (6) water rescue equipment; and…

  13. Southern Ductile Training Manual.

    ERIC Educational Resources Information Center

    Alabama State Dept. of Education, Montgomery.

    This instructor's manual contains the materials required to conduct the competency-based workplace literacy program that was developed to help employees at a foundry that has evolved from a small, family-owned business into a major foundry group with several automated production systems. The workplace literacy program consists of 24 lessons in…

  14. Consumer Education Reference Manual.

    ERIC Educational Resources Information Center

    Tennessee Univ., Knoxville. State Agency for Title I.

    This manual contains information for consumer education, which is defined as the process of imparting to an individual the skills, concepts, knowledges, and insights required to help each person evolve his or her own values, evaluate alternative choices in the marketplace, manage personal resources effectively, and obtain the best buys for his or…

  15. Child Study Manual.

    ERIC Educational Resources Information Center

    Polk County Public Schools, Bartow, FL.

    This manual from the Polk County Public Schools in Bartow, Florida, was developed for school personnel and parents to aid in their efforts to develop successful behavioral and academic interventions to enhance the educational outcomes for students. Section 1 describes the child study team which is comprised of an administrator, guidance counselor,…

  16. Regulations and Procedures Manual

    SciTech Connect

    Young, Lydia J.

    2011-07-25

    The purpose of the Regulations and Procedures Manual (RPM) is to provide LBNL personnel with a reference to University and Lawrence Berkeley National Laboratory (LBNL or Laboratory) policies and regulations by outlining normal practices and answering most policy questions that arise in the day-to-day operations of Laboratory organizations. Much of the information in this manual has been condensed from detail provided in LBNL procedure manuals, Department of Energy (DOE) directives, and Contract DE-AC02-05CH11231. This manual is not intended, however, to replace any of those documents. RPM sections on personnel apply only to employees who are not represented by unions. Personnel policies pertaining to employees represented by unions may be found in their labor agreements. Questions concerning policy interpretation should be directed to the LBNL organization responsible for the particular policy. A link to the Managers Responsible for RPM Sections is available on the RPM home page. If it is not clear which organization is responsible for a policy, please contact Requirements Manager Lydia Young or the RPM Editor.

  17. Sewer Maintenance Manual.

    ERIC Educational Resources Information Center

    Ontario Ministry of the Environment, Toronto.

    Outlined are practices and procedures that should be followed in order to protect and fully realize the benefits of sewer systems and also to maximize service and minimize inconveniences to the public. Written in practical terms, the manual is designed to be of immediate use to municipal employees and others involved in sewer maintenance…

  18. Workplace ESL Teachers Manual.

    ERIC Educational Resources Information Center

    Reyes, Andy F.

    The manual is intended for teachers in workplace English-as-a-Second-Language (ESL) programs, and contains ideas and techniques that both experienced and less experienced teachers in a wide variety of workplace ESL classes might find helpful. Sections address the following topics: (1) helpful hints for creating a successful educational environment…

  19. Rescue Manual. Module 5.

    ERIC Educational Resources Information Center

    Ohio State Univ., Columbus. Instructional Materials Lab.

    This learner manual for rescuers covers the current techniques or practices required in the rescue service. The fifth of 10 modules contains information on hazardous materials. Key points, an introduction, and conclusion accompany substantive material in this module. In addition, the module contains a Department of Transportation guide chart on…

  20. Affirmative Action Program Manual.

    ERIC Educational Resources Information Center

    Ventura County Community Coll. District, CA.

    Guidelines relating to the affirmative action program of the Ventura County Community College District are provided in this manual. Affirmative action is defined as, "A set of specific and result-oriented procedures to which a contractor commits himself/herself to apply every good faith effort. The objective of those procedures, plus such efforts,…

  1. Christmas Tree Category Manual.

    ERIC Educational Resources Information Center

    Bowman, James S.; Turmel, Jon P.

    This manual provides information needed to meet the standards for pesticide applicator certification. Pests and diseases of Christmas tree plantations are identified and discussed. Section one deals with weeds and woody plants and the application, formulation, and effects of herbicides in controlling them. Section two discusses specific diseases…

  2. ALA Standards Manual.

    ERIC Educational Resources Information Center

    American Library Association, Chicago, IL. Committee on Standards.

    This American Library Association (ALA) policy statement and procedure manual is intended for use in the preparation of all standards issued by ALA and its component units to insure coordination of format and correlation of content of ALA standards. A brief discussion of the purpose of standards is offered, followed by definitions of four types of…

  3. Drywall Finishing Manual.

    ERIC Educational Resources Information Center

    Lengert, Gerald

    This manual, a self-study guide for apprentices in the drywall finishing trade in British Columbia, attempts to establish standards for the trade. It tells how to produce a properly taped and filled drywall surface and describes what that surface should look like. The standards emphasize quality work that can be realistically achieved on the job.…

  4. Manual for physical fitness

    NASA Technical Reports Server (NTRS)

    Coleman, A. E.

    1981-01-01

    Training manual used for preflight conditioning of NASA astronauts is written for an audience with diverse backgrounds and interests. It suggests programs for various levels of fitness, including sample starter programs, safe progression schedules, and stretching exercises. Related information on equipment needs, environmental considerations, and precautions can help readers design safe and effective running programs.

  5. Disaster Preparedness Manual.

    ERIC Educational Resources Information Center

    Michael, Douglas O.

    Prepared for use by the staff of a community college library, this manual describes the natural, building, and human hazards which can result in disaster in a library and outlines a set of disaster prevention measures and salvage procedures. A list of salvage priorities, floor plans for all three levels of Bourke Memorial Library, and staff duties…

  6. Facilities Data System Manual.

    ERIC Educational Resources Information Center

    Acridge, Charles W.; Ford, Tim M.

    The purposes of this manual are to set forth the scope and procedures for the maintenance and operation of the University of California Facilities Data System (FDX) and to serve as a reference document for users of the system. FDX is an information system providing planning and management data about the existing physical plant. That is, it…

  7. Water Chemistry Laboratory Manual.

    ERIC Educational Resources Information Center

    Jenkins, David; And Others

    This manual of laboratory experiments in water chemistry serves a dual function of illustrating fundamental chemical principles of dilute aqueous systems and of providing the student with some familiarity with the chemical measurements commonly used in water and wastewater analysis. Experiments are grouped in categories on the basis of similar…

  8. Bicycle Law Enforcement Manual.

    ERIC Educational Resources Information Center

    Hunter, William W.; Stutts, Jane C.

    This manual is an attempt to draw together relevant resources and information for localities interested in developing a bicycle law enforcement operation. It is divided into five major sections. Section I explains the need for and importance of bicycle law enforcement. In section II are presented examples of past and current bicycle law…

  9. Enucleation Procedure Manual.

    ERIC Educational Resources Information Center

    Davis, Kevin; Poston, George

    This manual provides information on the enucleation procedure (removal of the eyes for organ banks). An introductory section focuses on the anatomy of the eye and defines each of the parts. Diagrams of the eye are provided. A list of enucleation materials follows. Other sections present outlines of (1) a sterile procedure; (2) preparation for eye…

  10. Ohio School Design Manual.

    ERIC Educational Resources Information Center

    Ohio School Facilities Commission, Columbus.

    This manual presents guidance to facility designers, school administrators, staff, and students for the development of school facilities being constructed under Ohio's Classroom Facilities Assistance Program. It provides critical analysis of individual spaces and material/system components necessary for the construction of elementary and secondary…

  11. Therapeutic Recreation Practicum Manual.

    ERIC Educational Resources Information Center

    Schneegas, Kay

    This manual provides information on the practicum program offered by Moraine Valley Community College (MVCC) for students in its therapeutic recreation program. Sections I and II outline the rationale and goals for providing practical, on-the-job work experiences for therapeutic recreation students. Section III specifies MVCC's responsibilities…

  12. Fiscal Accounting Manual.

    ERIC Educational Resources Information Center

    California State Dept. of Housing and Community Development, Sacramento. Indian Assistance Program.

    Written in simple, easy to understand form, the manual provides a vehicle for the untrained person in bookkeeping to control funds received from grants for Indian Tribal Councils and Indian organizations. The method used to control grants (federal, state, or private) is fund accounting, designed to organize rendering services on a non-profit…

  13. The Tree Worker's Manual.

    ERIC Educational Resources Information Center

    Smithyman, S. J.

    This manual is designed to prepare students for entry-level positions as tree care professionals. Addressed in the individual chapters of the guide are the following topics: the tree service industry; clothing, equipment, and tools; tree workers; basic tree anatomy; techniques of pruning; procedures for climbing and working in the tree; aerial…

  14. Outdoor Education Manual.

    ERIC Educational Resources Information Center

    Ontario Teachers' Federation, Toronto.

    A guide and introduction to outdoor education for the classroom teacher, the manual lists 5 aims and objectives of outdoor education and discusses the means for reaching these objectives through field trips, camping, outdoor learning centers, and outdoor teaching in school environs. Selected activities are described by subject area: arts and…

  15. Graphics mini manual

    NASA Technical Reports Server (NTRS)

    Taylor, Nancy L.; Randall, Donald P.; Bowen, John T.; Johnson, Mary M.; Roland, Vincent R.; Matthews, Christine G.; Gates, Raymond L.; Skeens, Kristi M.; Nolf, Scott R.; Hammond, Dana P.

    1990-01-01

    The computer graphics capabilities available at the Center are introduced and their use is explained. More specifically, the manual identifies and describes the various graphics software and hardware components, details the interfaces between these components, and provides information concerning the use of these components at LaRC.

  16. Learning Resources Evaluations Manual.

    ERIC Educational Resources Information Center

    Nunes, Evelyn H., Ed.

    This Learning Resources Evaluation Manual (LREM) contains evaluations of 140 instructional products listed in the 1994 supplement to Virginia's Adult Education Curricula Resource Catalog. A table of contents lists topics/subjects and page numbers. Some titles that are useful under more than one category are cross-listed for easy reference. These…

  17. Program Evaluation: Resource Manual.

    ERIC Educational Resources Information Center

    Hayden, David L.

    Intended for administrators and evaluators, the manual identifies models useful in evaluating special education programs in Maryland. An introduction to program evaluation defines the concept of educational program evaluation, notes its purpose, and addresses its current status in the field of special education. Chapter 2 goes into greater depth…

  18. Regulations and Procedures Manual

    SciTech Connect

    Young, Lydia

    2010-09-30

    The purpose of the Regulations and Procedures Manual (RPM) is to provide Laboratory personnel with a reference to University and Lawrence Berkeley National Laboratory policies and regulations by outlining the normal practices and answering most policy questions that arise in the day-to-day operations of Laboratory departments. Much of the information in this manual has been condensed from detail provided in Laboratory procedure manuals, Department of Energy (DOE) directives, and Contract DE-AC02-05CH11231. This manual is not intended, however, to replace any of those documents. The sections on personnel apply only to employees who are not represented by unions. Personnel policies pertaining to employees represented by unions may be found in their labor agreements. Questions concerning policy interpretation should be directed to the department responsible for the particular policy. A link to the Managers Responsible for RPM Sections is available on the RPM home page. If it is not clear which department should be called, please contact the Associate Laboratory Director of Operations.

  19. CE Programmer's Manual.

    ERIC Educational Resources Information Center

    Lund, Brishkai; McGechaen, Sandy

    Designed as a reference guide for continuing education planners, this manual provides basic information about the process of developing and delivering programs and courses. Material is organized to match job skills described in the systematic approach called DACUM--Designing a Curriculum--which reflects all skills required to be a continuing…

  20. DFLOW USER'S MANUAL

    EPA Science Inventory

    DFLOW is a computer program for estimating design stream flows for use in water quality studies. The manual describes the use of the program on both the EPA's IBM mainframe system and on a personal computer (PC). The mainframe version of DFLOW can extract a river's daily flow rec...
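
    Design stream flows of the kind DFLOW estimates are commonly expressed as xQy statistics (e.g., 7Q10: the lowest 7-day average flow expected to recur once in 10 years). The sketch below illustrates the idea with a simple empirical plotting position; it is not DFLOW's actual method (which fits a statistical distribution to the annual minima), and the function name and data layout are hypothetical.

```python
from statistics import mean

def xqy_design_flow(daily_flows_by_year, x=7, y=10):
    """Estimate an xQy low-flow statistic (e.g., 7Q10): the lowest
    x-day average flow with an average recurrence interval of y years.

    daily_flows_by_year: dict mapping year -> list of daily flows.
    Uses a Weibull plotting position on the annual minima; tools such
    as DFLOW instead fit a distribution to the minima.
    """
    annual_minima = []
    for year, flows in daily_flows_by_year.items():
        # Lowest x-day running average observed within the year.
        x_day_means = [mean(flows[i:i + x]) for i in range(len(flows) - x + 1)]
        annual_minima.append(min(x_day_means))
    annual_minima.sort()  # ascending: driest years first
    n = len(annual_minima)
    target = 1.0 / y  # non-exceedance probability of the y-year low flow
    for rank, q in enumerate(annual_minima, start=1):
        if rank / (n + 1) >= target:
            return q
    return annual_minima[-1]
```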

  1. Outdoor Education Manual.

    ERIC Educational Resources Information Center

    Nashville - Davidson County Metropolitan Public Schools, TN.

    Creative ways to use the outdoors as a part of the regular school curriculum are outlined in this teacher's manual for the elementary grades. Presented for consideration are the general objectives of outdoor education, suggestions for evaluating outdoor education experiences, and techniques for teaching outdoor education. The purpose and functions…

  2. "Leader Attributes Inventory" Manual.

    ERIC Educational Resources Information Center

    Moss, Jerome, Jr.; And Others

    This manual, which is designed to assist potential users of the Leader Attributes Inventory (LAI) and individuals studying leadership and its measurement, presents the rationale and psychometric characteristics of the LAI and guidelines for using it. Described in chapter 1 are the context in which the LAI was developed and the conceptualization of…

  3. Manual on Occupational Analysis.

    ERIC Educational Resources Information Center

    Hermann, Graham D.

    This manual on occupational analysis, developed in Australia, is organized in three sections. The first section provides a framework for occupational analysis (OA) and a discussion of possible outputs from an OA from each of three phases: (1) determining the nature and scope of the occupational area; (2) developing a competencies list; and (3)…

  4. Manual for Youth Coordinators.

    ERIC Educational Resources Information Center

    President's Council on Youth Opportunity, Washington, DC.

    This manual was designed primarily for use by coordinators responsible for developing comprehensive community youth opportunity programs of employment, education, and recreation, but the material may also be of assistance to community and business leaders, educators, and others involved in expanding local opportunities for young people. Contents…

  5. Day Care Evaluation Manual.

    ERIC Educational Resources Information Center

    Council for Community Services in Metropolitan Chicago, IL.

    This manual presents instruments for evaluating the program and facilities of day care centers and family day care homes serving nonhandicapped children aged 3-5. Chapter 1 discusses child care evaluation in general and outlines the rationale underlying this evaluation system (including the principle that day care evaluation should assess program…

  6. Collection Assessment Manual.

    ERIC Educational Resources Information Center

    Hall, Blaine H.

    This manual is designed to help bibliographers, librarians, and other materials selectors plan and conduct systematic collection evaluations using both collection centered and client centered techniques. Topics covered in five chapters are: (1) planning the assessment; (2) collection-centered techniques, comprising the compilation of statistics,…

  7. Young Adult Services Manual.

    ERIC Educational Resources Information Center

    Boegen, Anne, Ed.

    Designed to offer guidelines, ideas and help to those who provide library service to young adults, this manual includes information about the provision of young adult (YA) services in six sections. The first section, which addresses planning and administration, includes a definition of a young adult and a checklist for determining community needs…

  8. Records Management Manual.

    ERIC Educational Resources Information Center

    Alaska State Dept. of Education, Juneau. State Archives and Records Management.

    This manual, prepared primarily for state government agencies, describes the organization and management of Alaska government records. Information is presented in nine topic areas: (1) Alaska's Archives and Records Management Program, which describes the program, its mission, services available, and employee responsibilities; (2) Records in…

  9. Manually operated coded switch

    DOEpatents

    Barnette, Jon H.

    1978-01-01

    The disclosure relates to a manually operated recodable coded switch in which a code may be inserted, tried, and used to actuate a lever controlling an external device. After attempting a code, the switch's code wheels must be returned to their zero positions before another try is made.
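
    The interlock the abstract describes (enter a code, try it, and return the wheels to zero before another attempt) can be modeled as a small state machine. The class and method names below are hypothetical; this is a toy sketch of the protocol, not the patented mechanism.

```python
class CodedSwitch:
    """Toy model of a manually operated recodable coded switch:
    the lever actuates only when the entered code matches, and the
    wheels must return to zero before another try after a failure."""

    def __init__(self, code):
        self.code = list(code)          # recodable stored combination
        self.wheels = [0] * len(code)   # current wheel positions
        self.must_reset = False         # set after a failed attempt

    def set_wheels(self, positions):
        # A failed try locks out new codes until a zero reset.
        if self.must_reset and any(positions):
            raise RuntimeError("return wheels to zero before another try")
        self.must_reset = False
        self.wheels = list(positions)

    def try_code(self):
        """Attempt to actuate the lever; returns True on success."""
        if self.wheels == self.code:
            return True
        self.must_reset = True
        return False
```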

  10. Educator Effectiveness Administrative Manual

    ERIC Educational Resources Information Center

    Pennsylvania Department of Education, 2014

    2014-01-01

    The goal of this manual is to provide guidance in the evaluation of educators, highlight critical components of effectiveness training, and offer opportunities for professional growth. The term "educator" includes teachers, all professional and temporary professional employees, education specialists, and school administrators/principals.…

  11. Cooperative Office Education Manual.

    ERIC Educational Resources Information Center

    Clemson Univ., SC. Vocational Education Media Center.

    This manual, intended for inexperienced and experienced coordinators, school administrators, and guidance personnel, is designed to provide practical suggestions for initiating, developing, operating, coordinating, improving, and evaluating cooperative office education programs. Major content is presented primarily in outline form under the…

  12. Discretionary Grants Administration Manual.

    ERIC Educational Resources Information Center

    Office of Human Development Services (DHHS), Washington, DC.

    This manual sets forth applicable administrative policies and procedures to recipients of discretionary project grants or cooperative agreements awarded by program offices in the Office of Human Development Services (HDS). It is intended to serve as a basic reference for project directors and business officers of recipient organizations who are…

  13. Radiological Defense Manual.

    ERIC Educational Resources Information Center

    Defense Civil Preparedness Agency (DOD), Washington, DC.

    Originally prepared for use as a student textbook in Radiological Defense (RADEF) courses, this manual provides the basic technical information necessary for an understanding of RADEF. It also briefly discusses the need for RADEF planning and expected postattack emergency operations. There are 14 chapters covering these major topics: introduction…

  14. Student Attendance Accounting Manual.

    ERIC Educational Resources Information Center

    Freitas, Joseph M.

    In response to state legislation authorizing procedures for changes in academic calendars and measurement of student workload in California community colleges, this manual from the Chancellor's Office provides guidelines for student attendance accounting. Chapter 1 explains general items such as the academic calendar, admissions policies, student…

  15. Volunteer Recording Program Manual.

    ERIC Educational Resources Information Center

    Arizona Braille and Talking Book Library, Phoenix.

    This manual for volunteers begins with a brief introduction to Arizona's Library for the Blind and Physically Handicapped, which is one of 56 libraries appointed by the Librarian of Congress to provide public library service to persons with visual or physical impairments. Introductory materials include explanations of the general policies and…

  16. Peer Leadership Manual.

    ERIC Educational Resources Information Center

    Lovell, Nikki; Hachmeister, Paula

    This is a manual for peer counselors and parents in an alcohol and drug abuse prevention program for teenagers. The document opens with the training objectives for the peer helpers: to know yourself, to be a resource, and to promote and establish a drug-free peer group and drug-free activities in school. Discussion on these topics is provided: (1)…

  17. Seed Treatment. Manual 92.

    ERIC Educational Resources Information Center

    Missouri Univ., Columbia. Agricultural Experiment Station.

    This training manual provides information needed to meet minimum EPA standards for certification as a commercial applicator of pesticides in the seed treatment category. The text discusses pests commonly associated with seeds; seed treatment pesticides; labels; chemicals and seed treatment equipment; requirements of federal and state seed laws;…

  18. A curated census of autophagy-modulating proteins and small molecules

    PubMed Central

    Lorenzi, Philip L; Claerhout, Sofie; Mills, Gordon B; Weinstein, John N

    2014-01-01

    Autophagy, a programmed process in which cell contents are delivered to lysosomes for degradation, appears to have both tumor-suppressive and tumor-promoting functions; both stimulation and inhibition of autophagy have been reported to induce cancer cell death, and particular genes and proteins have been associated both positively and negatively with autophagy. To provide a basis for incisive analysis of those complexities and ambiguities and to guide development of new autophagy-targeted treatments for cancer, we have compiled a comprehensive, curated inventory of autophagy modulators by integrating information from published siRNA screens, multiple pathway analysis algorithms, and extensive, manually curated text-mining of the literature. The resulting inventory includes 739 proteins and 385 chemicals (including drugs, small molecules, and metabolites). Because autophagy is still at an early stage of investigation, we provide extensive analysis of our sources of information and their complex relationships with each other. We conclude with a discussion of novel strategies that could potentially be used to target autophagy for cancer therapy. PMID:24906121
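    The integration step described above can be sketched as a simple evidence tally: proteins supported by multiple independent sources (siRNA screens, pathway-analysis algorithms, text mining) are retained with an evidence count. The gene sets below are invented for illustration and are not the paper's actual inventory.

```python
from collections import Counter

# Hypothetical hit sets from three independent evidence sources
# (invented for illustration; not the published inventory).
sirna_hits = {"ATG5", "ATG7", "BECN1", "MTOR"}
pathway_hits = {"ATG5", "BECN1", "ULK1", "MTOR"}
textmining_hits = {"ATG7", "BECN1", "MTOR", "SQSTM1"}

# Count how many sources support each protein.
evidence = Counter()
for source in (sirna_hits, pathway_hits, textmining_hits):
    evidence.update(source)

# Keep proteins with at least two independent lines of evidence.
consensus = sorted(p for p, n in evidence.items() if n >= 2)
print(consensus)  # ['ATG5', 'ATG7', 'BECN1', 'MTOR']
```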

  19. Identifying relevant data for a biological database: handcrafted rules versus machine learning.

    PubMed

    Sehgal, Aditya Kumar; Das, Sanmay; Noto, Keith; Saier, Milton H; Elkan, Charles

    2011-01-01

    With well over 1,000 specialized biological databases in use today, the task of automatically identifying novel, relevant data for such databases is increasingly important. In this paper, we describe practical machine learning approaches for identifying MEDLINE documents and Swiss-Prot/TrEMBL protein records, for incorporation into a specialized biological database of transport proteins named TCDB. We show that both learning approaches outperform rules created by hand by a human expert. As one of the first case studies involving two different approaches to updating a deployed database, both the methods compared and the results will be of interest to curators of many specialized databases.
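    The machine-learning side of this comparison can be illustrated with a minimal text classifier. The multinomial naive Bayes below is a generic stand-in, not the paper's actual models, and the training snippets are invented.

```python
import math
from collections import Counter

# A generic multinomial naive Bayes classifier (a stand-in, not the
# paper's actual models). Training snippets are invented for illustration.
def tokenize(text):
    return text.lower().split()

def train(docs):
    """docs: list of (text, label) pairs -> per-label word counts, doc totals."""
    counts, totals = {}, Counter()
    for text, label in docs:
        counts.setdefault(label, Counter()).update(tokenize(text))
        totals[label] += 1
    return counts, totals

def predict(text, counts, totals):
    vocab = {w for c in counts.values() for w in c}
    best, best_lp = None, float("-inf")
    for label, c in counts.items():
        lp = math.log(totals[label] / sum(totals.values()))
        denom = sum(c.values()) + len(vocab)  # Laplace smoothing
        for w in tokenize(text):
            lp += math.log((c[w] + 1) / denom)
        if lp > best_lp:
            best, best_lp = label, lp
    return best

training = [
    ("membrane transport protein permease efflux", "relevant"),
    ("transporter channel substrate translocation", "relevant"),
    ("histone acetylation chromatin transcription", "irrelevant"),
    ("ribosome translation initiation factor", "irrelevant"),
]
counts, totals = train(training)
print(predict("novel efflux transporter characterized", counts, totals))
```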

  20. Design database for quantitative trait loci (QTL) data warehouse, data mining, and meta-analysis.

    PubMed

    Hu, Zhi-Liang; Reecy, James M; Wu, Xiao-Lin

    2012-01-01

    A database can be used to warehouse quantitative trait loci (QTL) data from multiple sources for comparison, genomic data mining, and meta-analysis. A robust database design involves sound data structure logistics, meaningful data transformations, normalization, and proper user interface designs. This chapter starts with a brief review of relational database basics and concentrates on issues associated with curation of QTL data into a relational database, with emphasis on the principles of data normalization and structure optimization. In addition, some simple examples of QTL data mining and meta-analysis are included. These examples are provided to help readers better understand the potential and importance of sound database design.
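    The normalization principle discussed above can be illustrated with a deliberately simplified schema (hypothetical, not any production QTL database's actual design): repeated trait and publication details are factored into their own tables and referenced by foreign keys, and a join reassembles the flat view needed for comparison or meta-analysis.

```python
import sqlite3

# Hypothetical, minimal schema illustrating normalization: trait and
# publication details live in their own tables; QTL rows reference them.
con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE trait (
    trait_id   INTEGER PRIMARY KEY,
    name       TEXT UNIQUE NOT NULL
);
CREATE TABLE publication (
    pub_id     INTEGER PRIMARY KEY,
    pubmed_id  TEXT UNIQUE NOT NULL
);
CREATE TABLE qtl (
    qtl_id     INTEGER PRIMARY KEY,
    trait_id   INTEGER NOT NULL REFERENCES trait(trait_id),
    pub_id     INTEGER NOT NULL REFERENCES publication(pub_id),
    chromosome TEXT NOT NULL,
    start_cm   REAL,
    end_cm     REAL
);
""")
con.execute("INSERT INTO trait (name) VALUES ('backfat thickness')")
con.execute("INSERT INTO publication (pubmed_id) VALUES ('12345678')")
con.execute("INSERT INTO qtl (trait_id, pub_id, chromosome, start_cm, end_cm) "
            "VALUES (1, 1, '6', 45.0, 62.5)")

# A join reassembles the denormalized view used for meta-analysis.
row = con.execute("""
    SELECT t.name, p.pubmed_id, q.chromosome
    FROM qtl q JOIN trait t USING (trait_id) JOIN publication p USING (pub_id)
""").fetchone()
print(row)  # ('backfat thickness', '12345678', '6')
```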

  1. PrionHome: a database of prions and other sequences relevant to prion phenomena.

    PubMed

    Harbi, Djamel; Parthiban, Marimuthu; Gendoo, Deena M A; Ehsani, Sepehr; Kumar, Manish; Schmitt-Ulms, Gerold; Sowdhamini, Ramanathan; Harrison, Paul M

    2012-01-01

    Prions are units of propagation of an altered state of a protein or proteins; prions can propagate from organism to organism, through cooption of other protein copies. Prions contain no necessary nucleic acids, and are important both as pathogenic agents and as a potential force in epigenetic phenomena. The original prions were derived from a misfolded form of the mammalian Prion Protein PrP. Infection by these prions causes neurodegenerative diseases. Other prions cause non-Mendelian inheritance in budding yeast, and sometimes act as diseases of yeast. We report the bioinformatic construction of the PrionHome, a database of >2000 prion-related sequences. The data was collated from various public and private resources and filtered for redundancy. The data was then processed according to a transparent classification system of prionogenic sequences (i.e., sequences that can make prions), prionoids (i.e., proteins that propagate like prions between individual cells), and other prion-related phenomena. There are eight PrionHome classifications for sequences. The first four classifications are derived from experimental observations: prionogenic sequences, prionoids, other prion-related phenomena, and prion interactors. The second four classifications are derived from sequence analysis: orthologs, paralogs, pseudogenes, and candidate-prionogenic sequences. Database entries list: supporting information for PrionHome classifications, prion-determinant areas (where relevant), and disordered and compositionally-biased regions. Also included are literature references for the PrionHome classifications, transcripts and genomic coordinates, and structural data (including comparative models made for the PrionHome from manually curated alignments). We provide database usage examples for both vertebrate and fungal prion contexts. Using the database data, we have performed a detailed analysis of the compositional biases in known budding-yeast prionogenic sequences, showing

  2. PrionHome: A Database of Prions and Other Sequences Relevant to Prion Phenomena

    PubMed Central

    Harbi, Djamel; Parthiban, Marimuthu; Gendoo, Deena M. A.; Ehsani, Sepehr; Kumar, Manish; Schmitt-Ulms, Gerold; Sowdhamini, Ramanathan; Harrison, Paul M.

    2012-01-01

    Prions are units of propagation of an altered state of a protein or proteins; prions can propagate from organism to organism, through cooption of other protein copies. Prions contain no necessary nucleic acids, and are important both as pathogenic agents and as a potential force in epigenetic phenomena. The original prions were derived from a misfolded form of the mammalian Prion Protein PrP. Infection by these prions causes neurodegenerative diseases. Other prions cause non-Mendelian inheritance in budding yeast, and sometimes act as diseases of yeast. We report the bioinformatic construction of the PrionHome, a database of >2000 prion-related sequences. The data was collated from various public and private resources and filtered for redundancy. The data was then processed according to a transparent classification system of prionogenic sequences (i.e., sequences that can make prions), prionoids (i.e., proteins that propagate like prions between individual cells), and other prion-related phenomena. There are eight PrionHome classifications for sequences. The first four classifications are derived from experimental observations: prionogenic sequences, prionoids, other prion-related phenomena, and prion interactors. The second four classifications are derived from sequence analysis: orthologs, paralogs, pseudogenes, and candidate-prionogenic sequences. Database entries list: supporting information for PrionHome classifications, prion-determinant areas (where relevant), and disordered and compositionally-biased regions. Also included are literature references for the PrionHome classifications, transcripts and genomic coordinates, and structural data (including comparative models made for the PrionHome from manually curated alignments). We provide database usage examples for both vertebrate and fungal prion contexts. Using the database data, we have performed a detailed analysis of the compositional biases in known budding-yeast prionogenic sequences, showing

  3. The CARLSBAD database: a confederated database of chemical bioactivities.

    PubMed

    Mathias, Stephen L; Hines-Kay, Jarrett; Yang, Jeremy J; Zahoransky-Kohalmi, Gergely; Bologa, Cristian G; Ursu, Oleg; Oprea, Tudor I

    2013-01-01

    Many bioactivity databases offer information regarding the biological activity of small molecules on protein targets. Information in these databases is often hard to resolve with certainty because of subsetting different data in a variety of formats; use of different bioactivity metrics; use of different identifiers for chemicals and proteins; and having to access different query interfaces, respectively. Given the multitude of data sources, interfaces and standards, it is challenging to gather relevant facts and make appropriate connections and decisions regarding chemical-protein associations. The CARLSBAD database has been developed as an integrated resource, focused on high-quality subsets from several bioactivity databases, which are aggregated and presented in a uniform manner, suitable for the study of the relationships between small molecules and targets. In contrast to data collection resources, CARLSBAD provides a single normalized activity value of a given type for each unique chemical-protein target pair. Two types of scaffold perception methods have been implemented and are available for datamining: HierS (hierarchical scaffolds) and MCES (maximum common edge subgraph). The 2012 release of CARLSBAD contains 439 985 unique chemical structures, mapped onto 1 420 889 unique bioactivities, and annotated with 277 140 HierS scaffolds and 54 135 MCES chemical patterns, respectively. Of the 890 323 unique structure-target pairs curated in CARLSBAD, 13.95% are aggregated from multiple structure-target values: 94 975 are aggregated from two bioactivities, 14 544 from three, 7 930 from four and 2214 have five bioactivities, respectively. CARLSBAD captures bioactivities and tags for 1435 unique chemical structures of active pharmaceutical ingredients (i.e. 'drugs'). CARLSBAD processing resulted in a net 17.3% data reduction for chemicals, 34.3% reduction for bioactivities, 23% reduction for HierS and 25% reduction for MCES, respectively. The CARLSBAD database

  4. The CARLSBAD Database: A Confederated Database of Chemical Bioactivities

    PubMed Central

    Mathias, Stephen L.; Hines-Kay, Jarrett; Yang, Jeremy J.; Zahoransky-Kohalmi, Gergely; Bologa, Cristian G.; Ursu, Oleg; Oprea, Tudor I.

    2013-01-01

    Many bioactivity databases offer information regarding the biological activity of small molecules on protein targets. Information in these databases is often hard to resolve with certainty because of subsetting different data in a variety of formats; use of different bioactivity metrics; use of different identifiers for chemicals and proteins; and having to access different query interfaces, respectively. Given the multitude of data sources, interfaces and standards, it is challenging to gather relevant facts and make appropriate connections and decisions regarding chemical–protein associations. The CARLSBAD database has been developed as an integrated resource, focused on high-quality subsets from several bioactivity databases, which are aggregated and presented in a uniform manner, suitable for the study of the relationships between small molecules and targets. In contrast to data collection resources, CARLSBAD provides a single normalized activity value of a given type for each unique chemical–protein target pair. Two types of scaffold perception methods have been implemented and are available for datamining: HierS (hierarchical scaffolds) and MCES (maximum common edge subgraph). The 2012 release of CARLSBAD contains 439 985 unique chemical structures, mapped onto 1 420 889 unique bioactivities, and annotated with 277 140 HierS scaffolds and 54 135 MCES chemical patterns, respectively. Of the 890 323 unique structure–target pairs curated in CARLSBAD, 13.95% are aggregated from multiple structure–target values: 94 975 are aggregated from two bioactivities, 14 544 from three, 7 930 from four and 2214 have five bioactivities, respectively. CARLSBAD captures bioactivities and tags for 1435 unique chemical structures of active pharmaceutical ingredients (i.e. ‘drugs’). CARLSBAD processing resulted in a net 17.3% data reduction for chemicals, 34.3% reduction for bioactivities, 23% reduction for HierS and 25% reduction for MCES, respectively. The CARLSBAD
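    The single-normalized-value idea can be sketched as follows. CARLSBAD's actual aggregation rules are more involved; here, multiple reported activities for the same (chemical, target, activity-type) triple are simply collapsed to their median, and the records are invented.

```python
from collections import defaultdict
from statistics import median

# Invented activity records: (chemical, target, activity type, value).
records = [
    ("CHEM1", "TRGT1", "pIC50", 6.2),
    ("CHEM1", "TRGT1", "pIC50", 6.8),
    ("CHEM1", "TRGT1", "pIC50", 7.1),
    ("CHEM2", "TRGT1", "pKi",   8.0),
]

# Group all values reported for the same chemical-target-type triple.
grouped = defaultdict(list)
for chem, target, atype, value in records:
    grouped[(chem, target, atype)].append(value)

# Collapse each group to a single normalized value (median here).
aggregated = {key: median(vals) for key, vals in grouped.items()}
print(aggregated[("CHEM1", "TRGT1", "pIC50")])  # 6.8
```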

  5. Introduction to Hydraulics. Student's Manual.

    ERIC Educational Resources Information Center

    Texas Univ., Austin. Extension Instruction and Materials Center.

    This manual on hydraulics is one of a series of individualized instructional materials for students. The manual is self-paced, but is designed to be used under the supervision of an instructor. The manual contains 10 assignments, each with all the information needed, a list of objectives that should be met, and exercise questions that can help in…

  6. Proteome-wide Subcellular Topologies of E. coli Polypeptides Database (STEPdb)*

    PubMed Central

    Orfanoudaki, Georgia; Economou, Anastassios

    2014-01-01

    Cell compartmentalization serves both the isolation and the specialization of cell functions. After synthesis in the cytoplasm, over a third of all proteins are targeted to other subcellular compartments. Knowing how proteins are distributed within the cell and how they interact is a prerequisite for understanding it as a whole. Surface and secreted proteins are important pathogenicity determinants. Here we present the STEP database (STEPdb) that contains a comprehensive characterization of subcellular localization and topology of the complete proteome of Escherichia coli. Two widely used E. coli proteomes (K-12 and BL21) are presented organized into thirteen subcellular classes. STEPdb exploits the wealth of genetic, proteomic, biochemical, and functional information on protein localization, secretion, and targeting in E. coli, one of the best understood model organisms. Subcellular annotations were derived from a combination of bioinformatics prediction, proteomic, biochemical, functional, topological data and extensive literature re-examination that were refined through manual curation. Strong experimental support for the location of 1553 out of 4303 proteins was based on 426 articles and some experimental indications for another 526. Annotations were provided for another 320 proteins based on firm bioinformatic predictions. STEPdb is the first database that contains an extensive set of peripheral IM proteins (PIM proteins) and includes their graphical visualization into complexes, cellular functions, and interactions. It also summarizes all currently known protein export machineries of E. coli K-12 and pairs them, where available, with the secretory proteins that use them. It catalogs the Sec- and TAT-utilizing secretomes and summarizes their topological features such as signal peptides and transmembrane regions, transmembrane topologies and orientations. It also catalogs physicochemical and structural features that influence topology such as abundance

  7. Analysis of disease-associated objects at the Rat Genome Database.

    PubMed

    Wang, Shur-Jen; Laulederkind, Stanley J F; Hayman, G T; Smith, Jennifer R; Petri, Victoria; Lowry, Timothy F; Nigam, Rajni; Dwinell, Melinda R; Worthey, Elizabeth A; Munzenmaier, Diane H; Shimoyama, Mary; Jacob, Howard J

    2013-01-01

    The Rat Genome Database (RGD) is the premier resource for genetic, genomic and phenotype data for the laboratory rat, Rattus norvegicus. In addition to organizing biological data from rats, the RGD team focuses on manual curation of gene-disease associations for rat, human and mouse. In this work, we have analyzed disease-associated strains, quantitative trait loci (QTL) and genes from rats. These disease objects form the basis for seven disease portals. Among disease portals, the cardiovascular disease and obesity/metabolic syndrome portals have the highest number of rat strains and QTL. These two portals share 398 rat QTL, and these shared QTL are highly concentrated on rat chromosomes 1 and 2. For disease-associated genes, we performed gene ontology (GO) enrichment analysis across portals using RatMine enrichment widgets. Fifteen GO terms, five from each GO aspect, were selected to profile enrichment patterns of each portal. Of the selected biological process (BP) terms, 'regulation of programmed cell death' was the top enriched term across all disease portals except in the obesity/metabolic syndrome portal where 'lipid metabolic process' was the most enriched term. 'Cytosol' and 'nucleus' were common cellular component (CC) annotations for disease genes, but only the cancer portal genes were highly enriched with 'nucleus' annotations. Similar enrichment patterns were observed in a parallel analysis using the DAVID functional annotation tool. The relationship between the preselected 15 GO terms and disease terms was examined reciprocally by retrieving rat genes annotated with these preselected terms. The individual GO term-annotated gene list showed enrichment in physiologically related diseases. For example, the 'regulation of blood pressure' genes were enriched with cardiovascular disease annotations, and the 'lipid metabolic process' genes with obesity annotations. Furthermore, we were able to enhance enrichment of neurological diseases by combining 'G
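    The GO enrichment analyses mentioned above rest on a standard hypergeometric test, which can be computed directly: given N annotated genes overall, K carrying a GO term, and a disease set of n genes of which k carry the term, the upper tail gives the enrichment p-value. The counts below are invented for illustration.

```python
from math import comb

def enrichment_p(N, K, n, k):
    """Hypergeometric upper-tail P(X >= k): probability of seeing k or more
    term-annotated genes in a random sample of n genes out of N total,
    of which K carry the term."""
    return sum(comb(K, i) * comb(N - K, n - i)
               for i in range(k, min(K, n) + 1)) / comb(N, n)

# Invented counts: 20 000 genes, 500 with the term, disease set of 100
# genes of which 10 carry the term (expected ~2.5 under the null).
p = enrichment_p(N=20000, K=500, n=100, k=10)
print(p < 0.01)  # True: strong enrichment in this toy example
```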

  8. NUCLEAR SCIENCE REFERENCES CODING MANUAL

    SciTech Connect

    WINCHELL,D.F.

    2007-04-01

    This manual is intended as a guide for Nuclear Science References (NSR) compilers. The basic conventions followed at the National Nuclear Data Center (NNDC), which are compatible with the maintenance and updating of and retrieval from the Nuclear Science References (NSR) file, are outlined. The NSR database originated at the Nuclear Data Project (NDP) at Oak Ridge National Laboratory as part of a project for systematic evaluation of nuclear structure data (see W. B. Ewbank, ORNL-5397 (1978)). Each entry in this computer file corresponds to a bibliographic reference that is uniquely identified by a Keynumber and is describable by a Topic and Keywords. It has been used since 1969 to produce bibliographic citations for evaluations published in Nuclear Data Sheets. Periodic additions to the file were published as the "Recent References" issues of Nuclear Data Sheets prior to 2005. In October 1980, the maintenance and updating of the NSR file became the responsibility of the NNDC at Brookhaven National Laboratory. The basic structure and contents of the NSR file remained unchanged during the transfer. In Chapter 2, the elements of the NSR file such as the valid record identifiers, record contents, and text fields are enumerated. Relevant comments regarding a new entry into the NSR file and assignment of a keynumber are also given in Chapter 2. In Chapter 3, the format for keyword abstracts is given, followed by specific examples; for each TOPIC, the criteria for inclusion of an article as an entry into the NSR file as well as coding procedures are described. Authors preparing keyword abstracts either to be published in a journal (e.g., Nucl. Phys. A) or to be sent directly to NNDC (e.g., Phys. Rev. C) should follow the illustrations in Chapter 3. The scope of the literature covered at the NNDC, the categorization into Primary and Secondary sources, etc., is discussed in Chapter 4. Useful information regarding permitted character sets, recommended abbreviations, etc., is

  9. The IUPHAR/BPS Guide to PHARMACOLOGY in 2016: towards curated quantitative interactions between 1300 protein targets and 6000 ligands

    PubMed Central

    Southan, Christopher; Sharman, Joanna L.; Benson, Helen E.; Faccenda, Elena; Pawson, Adam J.; Alexander, Stephen P. H.; Buneman, O. Peter; Davenport, Anthony P.; McGrath, John C.; Peters, John A.; Spedding, Michael; Catterall, William A.; Fabbro, Doriano; Davies, Jamie A.

    2016-01-01

    The IUPHAR/BPS Guide to PHARMACOLOGY (GtoPdb, http://www.guidetopharmacology.org) provides expert-curated molecular interactions between successful and potential drugs and their targets in the human genome. Developed by the International Union of Basic and Clinical Pharmacology (IUPHAR) and the British Pharmacological Society (BPS), this resource, and its earlier incarnation as IUPHAR-DB, is described in our 2014 publication. This update incorporates changes over the intervening seven database releases. The unique model of content capture is based on established and new target class subcommittees collaborating with in-house curators. Most information comes from journal articles, but we now also index kinase cross-screening panels. Targets are specified by UniProtKB IDs. Small molecules are defined by PubChem Compound Identifiers (CIDs); ligand capture also includes peptides and clinical antibodies. We have extended the capture of ligands and targets linked via published quantitative binding data (e.g. Ki, IC50 or Kd). The resulting pharmacological relationship network now defines a data-supported druggable genome encompassing 7% of human proteins. The database also provides an expanded substrate for the biennially published compendium, the Concise Guide to PHARMACOLOGY. This article covers content increase, entity analysis, revised curation strategies, new website features and expanded download options. PMID:26464438

  10. The IUPHAR/BPS Guide to PHARMACOLOGY in 2016: towards curated quantitative interactions between 1300 protein targets and 6000 ligands.

    PubMed

    Southan, Christopher; Sharman, Joanna L; Benson, Helen E; Faccenda, Elena; Pawson, Adam J; Alexander, Stephen P H; Buneman, O Peter; Davenport, Anthony P; McGrath, John C; Peters, John A; Spedding, Michael; Catterall, William A; Fabbro, Doriano; Davies, Jamie A

    2016-01-04

    The IUPHAR/BPS Guide to PHARMACOLOGY (GtoPdb, http://www.guidetopharmacology.org) provides expert-curated molecular interactions between successful and potential drugs and their targets in the human genome. Developed by the International Union of Basic and Clinical Pharmacology (IUPHAR) and the British Pharmacological Society (BPS), this resource, and its earlier incarnation as IUPHAR-DB, is described in our 2014 publication. This update incorporates changes over the intervening seven database releases. The unique model of content capture is based on established and new target class subcommittees collaborating with in-house curators. Most information comes from journal articles, but we now also index kinase cross-screening panels. Targets are specified by UniProtKB IDs. Small molecules are defined by PubChem Compound Identifiers (CIDs); ligand capture also includes peptides and clinical antibodies. We have extended the capture of ligands and targets linked via published quantitative binding data (e.g. Ki, IC50 or Kd). The resulting pharmacological relationship network now defines a data-supported druggable genome encompassing 7% of human proteins. The database also provides an expanded substrate for the biennially published compendium, the Concise Guide to PHARMACOLOGY. This article covers content increase, entity analysis, revised curation strategies, new website features and expanded download options.
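    Quantitative binding values such as Ki or IC50 are commonly compared on a negative log molar scale (e.g. pKi = -log10(Ki in molar)). A minimal converter, with a hypothetical function name and an invented example value:

```python
import math

def p_value(affinity_nm):
    """Convert an affinity in nanomolar to its -log10(molar) form
    (e.g. Ki in nM -> pKi). Function name is illustrative."""
    return -math.log10(affinity_nm * 1e-9)

print(p_value(10.0))  # a 10 nM ligand corresponds to a pKi of 8.0
```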

  11. Agile data management for curation of genomes to watershed datasets

    NASA Astrophysics Data System (ADS)

    Varadharajan, C.; Agarwal, D.; Faybishenko, B.; Versteeg, R.

    2015-12-01

    A software platform is being developed for data management and assimilation [DMA] as part of the U.S. Department of Energy's Genomes to Watershed Sustainable Systems Science Focus Area 2.0. The DMA components and capabilities are driven by the project science priorities and the development is based on agile development techniques. The goal of the DMA software platform is to enable users to integrate and synthesize diverse and disparate field, laboratory, and simulation datasets, including geological, geochemical, geophysical, microbiological, hydrological, and meteorological data across a range of spatial and temporal scales. The DMA objectives are (a) developing an integrated interface to the datasets, (b) storing field monitoring data, laboratory analytical results of water and sediments samples collected into a database, (c) providing automated QA/QC analysis of data and (d) working with data providers to modify high-priority field and laboratory data collection and reporting procedures as needed. The first three objectives are driven by user needs, while the last objective is driven by data management needs. The project needs and priorities are reassessed regularly with the users. After each user session we identify development priorities to match the identified user priorities. For instance, data QA/QC and collection activities have focused on the data and products needed for on-going scientific analyses (e.g. water level and geochemistry). We have also developed, tested and released a broker and portal that integrates diverse datasets from two different databases used for curation of project data. The development of the user interface was based on a user-centered design process involving several user interviews and constant interaction with data providers. The initial version focuses on the most requested feature - i.e. finding the data needed for analyses through an intuitive interface. 
Once the data is found, the user can immediately plot and download data
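    A minimal sketch of the kind of automated QA/QC pass described above: flag readings that fall outside a plausible range or jump implausibly fast between samples. The thresholds, units, and values below are invented for illustration.

```python
def qaqc(readings, lo=0.0, hi=15.0, max_step=1.0):
    """readings: successive water-level values (metres, hypothetical units).
    Returns a 'pass'/'fail' flag per reading based on a range check and
    a step (rate-of-change) check against the previous value."""
    flags = []
    prev = None
    for value in readings:
        bad_range = not (lo <= value <= hi)
        bad_step = prev is not None and abs(value - prev) > max_step
        flags.append("fail" if (bad_range or bad_step) else "pass")
        prev = value
    return flags

print(qaqc([3.2, 3.3, 9.9, 3.4, -1.0]))
# ['pass', 'pass', 'fail', 'fail', 'fail']
```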

  12. Phylesystem: a git-based data store for community-curated phylogenetic estimates

    PubMed Central

    McTavish, Emily Jane; Hinchliff, Cody E.; Allman, James F.; Brown, Joseph W.; Cranston, Karen A.; Rees, Jonathan A.; Smith, Stephen A.

    2015-01-01

    Motivation: Phylogenetic estimates from published studies can be archived using general platforms like Dryad (Vision, 2010) or TreeBASE (Sanderson et al., 1994). Such services fulfill a crucial role in ensuring transparency and reproducibility in phylogenetic research. However, digital tree data files often require some editing (e.g. rerooting) to improve the accuracy and reusability of the phylogenetic statements. Furthermore, establishing the mapping between tip labels used in a tree and taxa in a single common taxonomy dramatically improves the ability of other researchers to reuse phylogenetic estimates. As the process of curating a published phylogenetic estimate is not error-free, retaining a full record of the provenance of edits to a tree is crucial for openness, allowing editors to receive credit for their work and making errors introduced during curation easier to correct. Results: Here, we report the development of software infrastructure to support the open curation of phylogenetic data by the community of biologists. The backend of the system provides an interface for the standard database operations of creating, reading, updating and deleting records by making commits to a git repository. The record of the history of edits to a tree is preserved by git’s version control features. Hosting this data store on GitHub (http://github.com/) provides open access to the data store using tools familiar to many developers. We have deployed a server running the ‘phylesystem-api’, which wraps the interactions with git and GitHub. The Open Tree of Life project has also developed and deployed a JavaScript application that uses the phylesystem-api and other web services to enable input and curation of published phylogenetic statements. Availability and implementation: Source code for the web service layer is available at https://github.com/OpenTreeOfLife/phylesystem-api. The data store can be cloned from: https://github.com/OpenTreeOfLife/phylesystem. A web
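    The core idea of the git-backed store can be illustrated in miniature (this is a toy, not the phylesystem-api itself): every update to a curated study is stored as a new commit that records its parent, so the full edit history is preserved and attributable.

```python
import hashlib
import json

# Toy content-addressed store mimicking git's commit chain: each commit
# hashes its document plus its parent's id, preserving full edit history.
class TinyStore:
    def __init__(self):
        self.commits = {}   # sha -> {"parent": sha or None, "doc": dict}
        self.head = None

    def commit(self, doc):
        payload = json.dumps({"parent": self.head, "doc": doc}, sort_keys=True)
        sha = hashlib.sha1(payload.encode()).hexdigest()
        self.commits[sha] = {"parent": self.head, "doc": doc}
        self.head = sha
        return sha

    def history(self):
        sha, out = self.head, []
        while sha is not None:
            out.append(sha)
            sha = self.commits[sha]["parent"]
        return out

store = TinyStore()
store.commit({"study": "ot_1", "tree": "((A,B),C);"})
store.commit({"study": "ot_1", "tree": "(A,(B,C));"})  # a curator reroots
print(len(store.history()))  # 2: both versions remain retrievable
```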

  13. ALACARTE user manual

    USGS Publications Warehouse

    Wentworth, Carl M.; Fitzgibbon, Todd T.

    1991-01-01

    Compilations begun in other digital systems can be imported for completion as digital databases in ARC/INFO. The digital files that represent a geologic map can be used to prepare near-publication-quality color plots of the maps with full symbology or to create high-quality printing negatives. These files also constitute a digital database that can be used for computer-based query and analysis as well as for digital distribution of the map and associated data.

  14. Enhancing a Pathway-Genome Database (PGDB) to Capture Subcellular Localization of Metabolites and Enzymes: The Nucleotide-Sugar Biosynthetic Pathways of Populus trichocarpa

    SciTech Connect

    Nag, A.; Karpinets, T. V.; Chang, C. H.; Bar-Peled, M.

    2012-01-01

    Understanding how cellular metabolism works and is regulated requires that the underlying biochemical pathways be adequately represented and integrated with large metabolomic data sets to establish a robust network model. Genetically engineering energy crops to be less recalcitrant to saccharification requires detailed knowledge of plant polysaccharide structures and a thorough understanding of the metabolic pathways involved in forming and regulating cell-wall synthesis. Nucleotide-sugars are building blocks for synthesis of cell wall polysaccharides. The biosynthesis of nucleotide-sugars is catalyzed by a multitude of enzymes that reside in different subcellular organelles, and precise representation of these pathways requires accurate capture of this biological compartmentalization. The lack of simple localization cues in genomic sequence data and annotations however leads to missing compartmentalization information for eukaryotes in automatically generated databases, such as the Pathway-Genome Databases (PGDBs) of the SRI Pathway Tools software that drives much biochemical knowledge representation on the internet. In this report, we provide an informal mechanism using the existing Pathway Tools framework to integrate protein and metabolite sub-cellular localization data with the existing representation of the nucleotide-sugar metabolic pathways in a prototype PGDB for Populus trichocarpa. The enhanced pathway representations have been successfully used to map SNP abundance data to individual nucleotide-sugar biosynthetic genes in the PGDB. The manually curated pathway representations are more conducive to the construction of a computational platform that will allow the simulation of natural and engineered nucleotide-sugar precursor fluxes into specific recalcitrant polysaccharide(s).

  15. Bigfoot Field Manual

    NASA Astrophysics Data System (ADS)

    Campbell, J. L.; Burrows, S.; Gower, S. T.; Cohen, W. B.

    1999-09-01

    The BigFoot Project is funded by the Earth Science Enterprise to collect and organize data to be used in the EOS Validation Program. The data collected by the BigFoot Project are unique in being ground-based observations coincident with satellite overpasses. In addition to collecting data, the BigFoot project will develop and test new algorithms for scaling point measurements to the same spatial scales as the EOS satellite products. This BigFoot Field Manual will be used to achieve completeness and consistency of data collected at four initial BigFoot sites and at future sites that may collect similar validation data. Therefore, validation datasets submitted to the ORNL DAAC that have been compiled in a manner consistent with the field manual will be especially valuable in the validation program.

  16. Qualified Citation Indexing Project. Information Retrieval Program 'SOUCIT.' Users Manual, Version 1.0.

    ERIC Educational Resources Information Center

    Aberdeen Univ. (Scotland). Univ. Teaching Centre.

    This operating manual was constructed for the first version of the experimental database system SOUCIT, a qualified citation index for educational technology designed by the Computer Services Unit of Robert Gordon's Institute of Technology (RGIT) in association with the Scottish Education Department. The database comprises 45 references generated…

  17. The Steward Observatory asteroid relational database

    NASA Technical Reports Server (NTRS)

    Sykes, Mark V.; Alvarez del Castillo, Elizabeth M.

    1991-01-01

    The Steward Observatory Asteroid Relational Database (SOARD) was created as a flexible tool for studying asteroid populations and sub-populations, probing the biases intrinsic to asteroid databases, ascertaining the completeness of data pertaining to specific problems, aiding the development of observational programs, and developing pedagogical materials. To date, SOARD has compiled an extensive list of data available on asteroids and made it accessible through a single menu-driven database program. Users may obtain tailored lists of asteroid properties for any subset of asteroids, or output files suitable for plotting spectral data on individual asteroids. The program has online help as well as user and programmer documentation manuals. SOARD has already provided data to fulfill requests from members of the astronomical community, and it continues to grow as data are added to the database and new features are added to the program.
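    The kind of tailored subset query a relational asteroid database enables can be sketched as follows. SOARD's actual schema and menu-driven interface are not documented in this record, so the table, columns, and values below are invented for illustration.

```python
# Illustrative sketch of a relational subset query (not SOARD's real schema).
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE asteroids (
    name TEXT, diameter_km REAL, albedo REAL, tax_class TEXT)""")
conn.executemany(
    "INSERT INTO asteroids VALUES (?, ?, ?, ?)",
    [("Ceres",  939.4, 0.09, "C"),
     ("Vesta",  525.4, 0.42, "V"),
     ("Pallas", 512.0, 0.10, "B"),
     ("Hygiea", 434.0, 0.07, "C")],
)

# Tailored list: names and diameters of dark (low-albedo) asteroids,
# largest first -- the sort of subset a user might request for plotting.
rows = conn.execute(
    "SELECT name, diameter_km FROM asteroids "
    "WHERE albedo < 0.1 ORDER BY diameter_km DESC"
).fetchall()
```

    Keeping the selection criteria declarative in SQL is what lets a single database program serve arbitrary subsets without per-study code.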

  18. IAC user manual

    NASA Technical Reports Server (NTRS)

    Vos, R. G.; Beste, D. L.; Gregg, J.

    1984-01-01

    The User Manual for the Integrated Analysis Capability (IAC) Level 1 system is presented. The IAC system currently supports the thermal, structures, controls and system dynamics technologies, and its development is influenced by the requirements for design/analysis of large space systems. The system has many features which make it applicable to general problems in engineering, and to management of data and software. Information includes basic IAC operation, executive commands, modules, solution paths, data organization and storage, IAC utilities, and module implementation.

  19. Force user's manual, revised

    NASA Technical Reports Server (NTRS)

    Jordan, Harry F.; Benten, Muhammad S.; Arenstorf, Norbert S.; Ramanan, Aruna V.

    1987-01-01

    A methodology for writing parallel programs for shared memory multiprocessors has been formalized as an extension to the Fortran language and implemented as a macro preprocessor. The extended language is known as the Force, and this manual describes how to write Force programs and execute them on the Flexible Computer Corporation Flex/32, the Encore Multimax and the Sequent Balance computers. The parallel extension macros are described in detail, but knowledge of Fortran is assumed.
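    The shared-memory model that Force programs express, every process running the same routine with barriers separating phases, can be sketched conceptually. This is Python threading for illustration only, not Force's Fortran macro syntax.

```python
# Conceptual sketch of the SPMD-with-barriers pattern (not Force syntax):
# each worker fills its own slice of a shared array, then a barrier
# guarantees all writes are complete before any worker reads the result.
import threading

NPROCS = 4
data = [0] * 8
totals = [0] * NPROCS
barrier = threading.Barrier(NPROCS)

def worker(rank):
    # Prescheduled work split: each rank owns a fixed stride of indices
    for i in range(rank, len(data), NPROCS):
        data[i] = i * i
    barrier.wait()              # synchronize: all writes done before reads
    totals[rank] = sum(data)    # every rank now sees the full array

threads = [threading.Thread(target=worker, args=(r,)) for r in range(NPROCS)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

    The barrier is the key primitive: without it, a fast worker could sum the array while slower workers were still writing their slices.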

  20. ASSIST internals reference manual

    NASA Technical Reports Server (NTRS)

    Johnson, Sally C.; Boerschlein, David P.

    1994-01-01

    The Abstract Semi-Markov Specification Interface to the SURE Tool (ASSIST) program was developed at NASA LaRC in order to analyze the reliability of virtually any fault-tolerant system. A user manual was developed to detail its use. Certain technical specifics are of no concern to the end user, yet are of importance to those who must maintain and/or verify the correctness of the tool. This document takes a detailed look into these technical issues.