Science.gov

Sample records for manually curated database

  1. TRIP Database: a manually curated database of protein–protein interactions for mammalian TRP channels

    PubMed Central

    Shin, Young-Cheul; Shin, Soo-Yong; So, Insuk; Kwon, Dongseop; Jeon, Ju-Hong

    2011-01-01

    Transient receptor potential (TRP) channels are a superfamily of Ca2+-permeable cation channels that translate cellular stimuli into electrochemical signals. Aberrant activity of TRP channels has been implicated in a variety of human diseases, such as neurological disorders, cardiovascular disease and cancer. To facilitate the understanding of the molecular network by which TRP channels are associated with biological and disease processes, we have developed the TRIP (TRansient receptor potential channel-Interacting Protein) Database (http://www.trpchannel.org), a manually curated database that aims to offer comprehensive information on protein–protein interactions (PPIs) of mammalian TRP channels. The TRIP Database was created by systematically curating 277 peer-reviewed publications; the current version documents 490 PPI pairs, 28 TRP channels and 297 cellular proteins. The TRIP Database provides a detailed summary of PPI data that fit into four categories: screening, validation, characterization and functional consequence. Users can find in-depth information specified in the literature on relevant analytical methods and experimental resources, such as gene constructs and cell/tissue types. The TRIP Database has user-friendly web interfaces with helpful features, including a search engine, an interaction map and a function for cross-referencing useful external databases. Our TRIP Database will provide a valuable tool to assist in understanding the molecular regulatory network of TRP channels. PMID:20851834
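
    The four evidence categories named above (screening, validation, characterization, functional consequence) suggest a simple record layout for each curated interaction. The sketch below is a hypothetical illustration of such a record; the class and field names are assumptions made for this example, not the actual TRIP schema, and the example values are placeholders.

```python
from dataclasses import dataclass, field

@dataclass
class TRPInteraction:
    """Hypothetical record for one curated TRP-channel PPI (not the real TRIP schema)."""
    trp_channel: str                                       # TRP channel name
    partner: str                                           # interacting cellular protein
    pmid: str                                              # source publication
    screening: list = field(default_factory=list)          # e.g. yeast two-hybrid
    validation: list = field(default_factory=list)         # e.g. co-immunoprecipitation
    characterization: list = field(default_factory=list)   # binding regions, constructs, cell types
    functional_consequence: str = ""                       # reported effect on channel function

# Purely illustrative entry; the values are placeholders, not curated data.
example = TRPInteraction(trp_channel="TRPC4", partner="calmodulin", pmid="00000000",
                         screening=["yeast two-hybrid"], validation=["co-IP"],
                         functional_consequence="modulates channel activity (placeholder)")
print(example.trp_channel, "-", example.partner)
```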

  2. The curation paradigm and application tool used for manual curation of the scientific literature at the Comparative Toxicogenomics Database

    PubMed Central

    Davis, Allan Peter; Wiegers, Thomas C.; Murphy, Cynthia G.; Mattingly, Carolyn J.

    2011-01-01

    The Comparative Toxicogenomics Database (CTD) is a public resource that promotes understanding about the effects of environmental chemicals on human health. CTD biocurators read the scientific literature and convert free-text information into a structured format using official nomenclature, integrating third party controlled vocabularies for chemicals, genes, diseases and organisms, and a novel controlled vocabulary for molecular interactions. Manual curation produces a robust, richly annotated dataset of highly accurate and detailed information. Currently, CTD describes over 349 000 molecular interactions between 6800 chemicals, 20 900 genes (for 330 organisms) and 4300 diseases that have been manually curated from over 25 400 peer-reviewed articles. These manually curated data are further integrated with other third party data (e.g. Gene Ontology, KEGG and Reactome annotations) to generate a wealth of toxicogenomic relationships. Here, we describe our approach to manual curation that uses a powerful and efficient paradigm involving mnemonic codes. This strategy allows biocurators to quickly capture detailed information from articles by generating simple statements using codes to represent the relationships between data types. The paradigm is versatile, expandable, and able to accommodate new data challenges that arise. We have incorporated this strategy into a web-based curation tool to further increase efficiency and productivity, implement quality control in real-time and accommodate biocurators working remotely. Database URL: http://ctd.mdibl.org PMID:21933848
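
    The mnemonic-code paradigm described above is easiest to see with a toy example: a curator writes a short coded statement, and software expands it into a structured interaction record. The code vocabulary and the statement syntax below are invented for illustration and are not CTD's actual notation.

```python
# Toy expansion of mnemonic-coded curation statements into structured records.
# The codes and the 'chemical|code|gene' syntax are hypothetical, not CTD's notation.

ACTION_CODES = {
    "exp+": "increases expression",
    "act-": "decreases activity",
    "bind": "affects binding",
}

def expand_statement(statement, pmid):
    """Turn a coded statement such as 'chemical X|exp+|GENE1' into a record."""
    chemical, code, gene = statement.split("|")
    return {"chemical": chemical, "gene": gene,
            "interaction": ACTION_CODES[code], "source_pmid": pmid}

if __name__ == "__main__":
    for s in ["chemical X|exp+|GENE1", "chemical Y|act-|GENE2"]:
        print(expand_statement(s, pmid="00000000"))
```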

  3. Manual curation is not sufficient for annotation of genomic databases

    PubMed Central

    Baumgartner, William A.; Cohen, K. Bretonnel; Fox, Lynne M.; Acquaah-Mensah, George; Hunter, Lawrence

    2008-01-01

    Motivation: Knowledge base construction has been an area of intense activity and great importance in the growth of computational biology. However, there is little or no history of work on the subject of evaluation of knowledge bases, either with respect to their contents or with respect to the processes by which they are constructed. This article proposes the application of a metric from software engineering known as the found/fixed graph to the problem of evaluating the processes by which genomic knowledge bases are built, as well as the completeness of their contents. Results: Well-understood patterns of change in the found/fixed graph are found to occur in two large publicly available knowledge bases. These patterns suggest that the current manual curation processes will take far too long to complete the annotations of even just the most important model organisms, and that at their current rate of production, they will never be sufficient for completing the annotation of all currently available proteomes. Contact: larry.hunter@uchsc.edu PMID:17646325
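
    For readers unfamiliar with the found/fixed metric, the bookkeeping is simple: track the cumulative number of annotation gaps "found" and the cumulative number "fixed" over time, and read the gap between the two curves as the open backlog. The sketch below shows that computation on an invented event log; the dates and counts are placeholders.

```python
from collections import Counter
from itertools import accumulate

# Hypothetical event log: ("YYYY-MM", "found") when a missing/incorrect annotation is
# reported, ("YYYY-MM", "fixed") when curators resolve one. Values are illustrative.
events = [
    ("2007-01", "found"), ("2007-01", "found"), ("2007-02", "fixed"),
    ("2007-02", "found"), ("2007-03", "found"), ("2007-03", "fixed"),
]

months = sorted({month for month, _ in events})
found = Counter(m for m, kind in events if kind == "found")
fixed = Counter(m for m, kind in events if kind == "fixed")

cum_found = list(accumulate(found[m] for m in months))
cum_fixed = list(accumulate(fixed[m] for m in months))

for month, f, x in zip(months, cum_found, cum_fixed):
    # A persistently widening (f - x) gap is the pattern the article interprets
    # as curation falling behind the rate at which new work is discovered.
    print(f"{month}: cumulative found={f}, fixed={x}, backlog={f - x}")
```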

  4. A comprehensive manually curated protein–protein interaction database for the Death Domain superfamily

    PubMed Central

    Kwon, Dongseop; Yoon, Jong Hwan; Shin, Soo-Yong; Jang, Tae-Ho; Kim, Hong-Gee; So, Insuk; Jeon, Ju-Hong; Park, Hyun Ho

    2012-01-01

    The Death Domain (DD) superfamily, which is one of the largest classes of protein interaction modules, plays a pivotal role in apoptosis, inflammation, necrosis and immune cell signaling pathways. Because aberrant or inappropriate DD superfamily-mediated signaling events are associated with various human diseases, such as cancers, neurodegenerative diseases and immunological disorders, the studies in these fields are of great biological and clinical importance. To facilitate the understanding of the molecular mechanisms by which the DD superfamily is associated with biological and disease processes, we have developed the DD database (http://www.deathdomain.org), a manually curated database that aims to offer comprehensive information on protein–protein interactions (PPIs) of the DD superfamily. The DD database was created by manually curating 295 peer-reviewed studies that were published in the literature; the current version documents 175 PPI pairs among the 99 DD superfamily proteins. The DD database provides a detailed summary of the DD superfamily proteins and their PPI data. Users can find in-depth information that is specified in the literature on relevant analytical methods, experimental resources and domain structures. Our database provides a definitive and valuable tool that assists researchers in understanding the signaling network that is mediated by the DD superfamily. PMID:22135292

  5. LMPID: A manually curated database of linear motifs mediating protein–protein interactions

    PubMed Central

    Sarkar, Debasree; Jana, Tanmoy; Saha, Sudipto

    2015-01-01

    Linear motifs (LMs), used by a subset of all protein–protein interactions (PPIs), bind to globular receptors or domains and play an important role in signaling networks. LMPID (Linear Motif mediated Protein Interaction Database) is a manually curated database which provides comprehensive experimentally validated information about the LMs mediating PPIs from all organisms on a single platform. About 2200 entries have been compiled by detailed manual curation of PubMed abstracts, of which about 1000 LM entries were annotated for the first time, as compared with the Eukaryotic LM resource. The users can submit their query through a user-friendly search page and browse the data in the alphabetical order of the bait gene names and according to the domains interacting with the LM. LMPID is freely accessible at http://bicresources.jcbose.ac.in/ssaha4/lmpid and contains 1750 unique LM instances found within 1181 baits interacting with 552 prey proteins. In summary, LMPID is an attempt to enrich the existing repertoire of resources available for studying the LMs implicated in PPIs; it may help in understanding the patterns of LMs that bind a specific domain, in developing prediction models to identify novel domain-specific LMs, and ultimately in predicting inhibitors/modulators of PPIs of interest. Database URL: http://bicresources.jcbose.ac.in/ssaha4/lmpid PMID:25776024
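
    Linear motifs are short sequence patterns, so they are naturally written as regular expressions. As a minimal illustration (not LMPID's own annotation pipeline), the sketch below scans a protein sequence for a classic class I SH3-binding consensus, [KR]xxPxxP; the toy sequence is made up.

```python
import re

# Class I SH3-binding consensus written loosely as a regex: [KR]..P..P
SH3_CLASS_I = re.compile(r"[KR]..P..P")

def find_motifs(sequence, pattern=SH3_CLASS_I):
    """Return (start, end, peptide) for every motif hit, using 1-based coordinates."""
    return [(m.start() + 1, m.end(), m.group()) for m in pattern.finditer(sequence)]

if __name__ == "__main__":
    toy_sequence = "MKTAYRALPKLPGGSAQDE"   # made-up sequence for illustration
    print(find_motifs(toy_sequence))       # -> [(6, 12, 'RALPKLP')]
```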

  6. Evola: Ortholog database of all human genes in H-InvDB with manual curation of phylogenetic trees.

    PubMed

    Matsuya, Akihiro; Sakate, Ryuichi; Kawahara, Yoshihiro; Koyanagi, Kanako O; Sato, Yoshiharu; Fujii, Yasuyuki; Yamasaki, Chisato; Habara, Takuya; Nakaoka, Hajime; Todokoro, Fusano; Yamaguchi, Kaori; Endo, Toshinori; Oota, Satoshi; Makalowski, Wojciech; Ikeo, Kazuho; Suzuki, Yoshiyuki; Hanada, Kousuke; Hashimoto, Katsuyuki; Hirai, Momoki; Iwama, Hisakazu; Saitou, Naruya; Hiraki, Aiko T; Jin, Lihua; Kaneko, Yayoi; Kanno, Masako; Murakami, Katsuhiko; Noda, Akiko Ogura; Saichi, Naomi; Sanbonmatsu, Ryoko; Suzuki, Mami; Takeda, Jun-ichi; Tanaka, Masayuki; Gojobori, Takashi; Imanishi, Tadashi; Itoh, Takeshi

    2008-01-01

    Orthologs are genes in different species that evolved from a common ancestral gene by speciation. Currently, with the rapid growth of transcriptome data of various species, more reliable orthology information is a prerequisite for further studies. However, detection of orthologs could be erroneous if pairwise distance-based methods, such as reciprocal BLAST searches, are utilized. Thus, as a sub-database of H-InvDB, an integrated database of annotated human genes (http://h-invitational.jp/), we constructed a fully curated database of evolutionary features of human genes, called 'Evola'. In the process of ortholog detection, computational analysis based on conserved genome synteny and transcript sequence similarity was followed by manual curation by researchers examining phylogenetic trees. In total, 18 968 human genes have orthologs among 11 vertebrates (chimpanzee, mouse, cow, chicken, zebrafish, etc.), either computationally detected or manually curated. Evola provides amino acid sequence alignments and phylogenetic trees of orthologs and homologs. In 'd(N)/d(S) view', natural selection on genes can be analyzed between human and other species. In 'Locus maps', all transcript variants and their exon/intron structures can be compared among orthologous gene loci. We expect Evola to serve as a comprehensive and reliable database to be utilized in comparative analyses for obtaining new knowledge about human genes. Evola is available at http://www.h-invitational.jp/evola/. PMID:17982176
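
    For readers unfamiliar with the 'd(N)/d(S) view', the ratio omega = dN/dS compares nonsynonymous substitutions per nonsynonymous site with synonymous substitutions per synonymous site; values below 1 suggest purifying selection, around 1 neutrality, and above 1 positive selection. The sketch below computes only the crude counting-based approximation (pN/pS) on made-up numbers; real dN/dS estimates are usually obtained with codon substitution models, so treat this as an illustration of the quantity rather than of Evola's pipeline.

```python
def dn_ds_approximation(nonsyn_subs, nonsyn_sites, syn_subs, syn_sites):
    """Crude pN/pS-style approximation of omega = dN/dS.

    Real pipelines correct for multiple hits and codon usage (e.g., maximum-likelihood
    codon models); this is only the naive ratio of proportions.
    """
    p_n = nonsyn_subs / nonsyn_sites   # nonsynonymous substitutions per nonsynonymous site
    p_s = syn_subs / syn_sites         # synonymous substitutions per synonymous site
    return p_n / p_s

if __name__ == "__main__":
    # Illustrative numbers only.
    omega = dn_ds_approximation(nonsyn_subs=12, nonsyn_sites=700,
                                syn_subs=30, syn_sites=300)
    print(f"omega ~= {omega:.2f}  (values well below 1 suggest purifying selection)")
```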

  7. miRSponge: a manually curated database for experimentally supported miRNA sponges and ceRNAs

    PubMed Central

    Wang, Peng; Zhi, Hui; Zhang, Yunpeng; Liu, Yue; Zhang, Jizhou; Gao, Yue; Guo, Maoni; Ning, Shangwei; Li, Xia

    2015-01-01

    In this study, we describe miRSponge, a manually curated database, which aims at providing an experimentally supported resource for microRNA (miRNA) sponges. Recent evidence suggests that miRNAs are themselves regulated by competing endogenous RNAs (ceRNAs) or ‘miRNA sponges’ that contain miRNA binding sites. These competitive molecules can sequester miRNAs to prevent them interacting with their natural targets to play critical roles in various biological and pathological processes. It has become increasingly important to develop a high quality database to record and store ceRNA data to support future studies. To this end, we have established the experimentally supported miRSponge database that contains data on 599 miRNA-sponge interactions and 463 ceRNA relationships from 11 species following manual curation of nearly 1200 published articles. Database classes include endogenously generated molecules including coding genes, pseudogenes, long non-coding RNAs and circular RNAs, along with exogenously introduced molecules including viral RNAs and artificially engineered sponges. Approximately 70% of the interactions were identified experimentally in disease states. miRSponge provides a user-friendly interface for convenient browsing, retrieval and downloading of datasets. A submission page is also included to allow researchers to submit newly validated miRNA sponge data. Database URL: http://www.bio-bigdata.net/miRSponge. PMID:26424084

  8. Lnc2Cancer: a manually curated database of experimentally supported lncRNAs associated with various human cancers

    PubMed Central

    Ning, Shangwei; Zhang, Jizhou; Wang, Peng; Zhi, Hui; Wang, Jianjian; Liu, Yue; Gao, Yue; Guo, Maoni; Yue, Ming; Wang, Lihua; Li, Xia

    2016-01-01

    Lnc2Cancer (http://www.bio-bigdata.net/lnc2cancer) is a manually curated database of cancer-associated long non-coding RNAs (lncRNAs) with experimental support that aims to provide a high-quality and integrated resource for exploring lncRNA deregulation in various human cancers. LncRNAs represent a large category of functional RNA molecules that play a significant role in human cancers. A curated collection and summary of deregulated lncRNAs in cancer is essential to thoroughly understand the mechanisms and functions of lncRNAs. Here, we developed the Lnc2Cancer database, which contains 1057 manually curated associations between 531 lncRNAs and 86 human cancers. Each association includes lncRNA and cancer name, the lncRNA expression pattern, experimental techniques, a brief functional description, the original reference and additional annotation information. Lnc2Cancer provides a user-friendly interface to conveniently browse, retrieve and download data. Lnc2Cancer also offers a submission page for researchers to submit newly validated lncRNA-cancer associations. With the rapidly increasing interest in lncRNAs, Lnc2Cancer will significantly improve our understanding of lncRNA deregulation in cancer and has the potential to be a timely and valuable resource. PMID:26481356

  9. CPAD, Curated Protein Aggregation Database: A Repository of Manually Curated Experimental Data on Protein and Peptide Aggregation

    PubMed Central

    Thangakani, A. Mary; Nagarajan, R.; Kumar, Sandeep; Sakthivel, R.; Velmurugan, D.; Gromiha, M. Michael

    2016-01-01

    Accurate distinction between peptide sequences that can form amyloid-fibrils or amorphous β-aggregates, identification of potential aggregation prone regions in proteins, and prediction of change in aggregation rate of a protein upon mutation(s) are critical to research on protein misfolding diseases, such as Alzheimer’s and Parkinson’s, as well as biotechnological production of protein-based therapeutics. We have developed a Curated Protein Aggregation Database (CPAD), which has collected results from experimental studies performed by the scientific community aimed at understanding protein/peptide aggregation. CPAD contains more than 2300 experimentally observed aggregation rates upon mutations in known amyloidogenic proteins. Each entry includes numerical values for the following parameters: change in rate of aggregation as measured by fluorescence intensity or turbidity, name and source of the protein, Uniprot and Protein Data Bank codes, single point as well as multiple mutations, and literature citation. The data in CPAD have been supplemented with five different types of additional information: (i) Amyloid fibril forming hexa-peptides, (ii) Amorphous β-aggregating hexa-peptides, (iii) Amyloid fibril forming peptides of different lengths, (iv) Amyloid fibril forming hexa-peptides whose crystal structures are available in the Protein Data Bank (PDB) and (v) Experimentally validated aggregation prone regions found in amyloidogenic proteins. Furthermore, CPAD is linked to other related databases and resources, such as Uniprot, Protein Data Bank, PUBMED, GAP, TANGO, WALTZ etc. We have set up a web interface with different search and display options so that users have the ability to get the data in multiple ways. CPAD is freely available at http://www.iitm.ac.in/bioinfo/CPAD/. The potential applications of CPAD have also been discussed. PMID:27043825
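
    Because each CPAD entry records how a mutation changes an aggregation rate, a typical downstream use is ranking mutations by how strongly they accelerate aggregation. The sketch below does this for a few invented records; the tuple layout, protein names and rate values are assumptions for illustration, not CPAD's actual export format.

```python
import math

# Hypothetical records: (protein, mutation, wild-type rate, mutant rate).
# Units and values are invented; CPAD's real fields and formats may differ.
records = [
    ("protein A", "A21G", 0.8, 2.4),
    ("protein A", "E22Q", 0.8, 0.5),
    ("protein B", "V30M", 1.1, 3.9),
]

def log2_fold_change(wt_rate, mut_rate):
    """Positive values mean the mutation accelerates aggregation."""
    return math.log2(mut_rate / wt_rate)

ranked = sorted(records, key=lambda r: log2_fold_change(r[2], r[3]), reverse=True)
for protein, mutation, wt, mut in ranked:
    print(f"{protein} {mutation}: log2 fold-change = {log2_fold_change(wt, mut):+.2f}")
```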

  10. PCOSDB: PolyCystic Ovary Syndrome Database for manually curated disease associated genes

    PubMed Central

    Jesintha Mary, Maniraja; Vetrivel, Umashankar; Munuswamy, Deecaraman; Melanathuru, Vijayalakshmi

    2016-01-01

    Polycystic ovary syndrome (PCOS) is a complex disorder affecting approximately 5–10 percent of all women of reproductive age. It is a multi-factorial endocrine disorder, which demonstrates menstrual disturbance, infertility, anovulation, hirsutism, hyperandrogenism and other symptoms. It has been indicated that differential expression of genes, genetic level variations, and other molecular alterations interplay in PCOS and are the target sites for clinical applications. Therefore, integrating the PCOS-associated genes along with their alterations and underpinning the underlying mechanisms could provide valuable information for understanding the disease. We manually curated the information from 234 published articles, including gene, molecular alteration, details of association, significance of association, ethnicity, age, drug, and other annotated summaries. PCOSDB is an online resource that brings comprehensive information about the disease, and the implications of various genes and their mechanisms. We present curated information from peer-reviewed literature, organized at various levels, including differentially expressed genes in PCOS and genetic variations, such as polymorphisms and mutations causing PCOS, across various ethnicities. We have covered both significant and non-significant associations along with conflicting studies. PCOSDB v1.0 contains 208 gene reports, 427 molecular alterations, and 46 phenotypes associated with PCOS. PMID:27212836

  11. EpiDBase: a manually curated database for small molecule modulators of epigenetic landscape

    PubMed Central

    Loharch, Saurabh; Bhutani, Isha; Jain, Kamal; Gupta, Pawan; Sahoo, Debendra K.; Parkesh, Raman

    2015-01-01

    We have developed EpiDBase (www.epidbase.org), an interactive database of small molecule ligands of epigenetic protein families by bringing together experimental, structural and chemoinformatic data in one place. Currently, EpiDBase encompasses 5784 unique ligands (11 422 entries) of various epigenetic markers such as writers, erasers and readers. The EpiDBase includes experimental IC50 values, ligand molecular weight, hydrogen bond donor and acceptor count, XlogP, number of rotatable bonds, number of aromatic rings, InChIKey, two-dimensional and three-dimensional (3D) chemical structures. A catalog of all EpiDBase ligands based on molecular weight is also provided. A structure editor is provided for 3D visualization of ligands. EpiDBase is integrated with tools like text search, disease-specific search, advanced search, substructure, and similarity analysis. Advanced analysis can be performed using substructure and OpenBabel-based chemical similarity fingerprints. The EpiDBase is curated to identify unique molecular scaffolds. Initially, molecules were selected by removing peptides, macrocycles and other complex structures and then processed for conformational sampling by generating 3D conformers. Subsequent filtering through Zinc Is Not Commercial (ZINC: a free database of commercially available compounds for virtual screening) and Lilly MedChem regular rules retained many distinctive drug-like molecules. These molecules were then analyzed for physicochemical properties using OpenBabel descriptors and clustered using various methods such as hierarchical clustering, binning partition and multidimensional scaling. EpiDBase provides comprehensive resources for further design, development and refinement of small molecule modulators of epigenetic markers. Database URL: www.epidbase.org PMID:25776023
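
    The similarity analysis described above rests on standard chemoinformatics machinery: bit-vector fingerprints compared with the Tanimoto coefficient, followed by hierarchical clustering. EpiDBase itself uses OpenBabel-based fingerprints; the sketch below shows the same idea with RDKit and SciPy simply because the code is compact, and the SMILES strings are arbitrary examples rather than EpiDBase ligands.

```python
# Tanimoto similarity plus hierarchical clustering over a few example molecules.
import numpy as np
from rdkit import Chem
from rdkit.Chem import AllChem, DataStructs
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import squareform

smiles = ["c1ccccc1O", "c1ccccc1N", "CCO", "CCN"]   # arbitrary example structures
mols = [Chem.MolFromSmiles(s) for s in smiles]
fps = [AllChem.GetMorganFingerprintAsBitVect(m, 2, nBits=2048) for m in mols]

n = len(fps)
dist = np.zeros((n, n))
for i in range(n):
    for j in range(i + 1, n):
        similarity = DataStructs.TanimotoSimilarity(fps[i], fps[j])
        dist[i, j] = dist[j, i] = 1.0 - similarity   # Tanimoto distance

tree = linkage(squareform(dist), method="average")   # average-linkage hierarchical clustering
print(fcluster(tree, t=2, criterion="maxclust"))     # e.g. cut the tree into two clusters
```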

  12. ONRLDB—manually curated database of experimentally validated ligands for orphan nuclear receptors: insights into new drug discovery

    PubMed Central

    Nanduri, Ravikanth; Bhutani, Isha; Somavarapu, Arun Kumar; Mahajan, Sahil; Parkesh, Raman; Gupta, Pawan

    2015-01-01

    Orphan nuclear receptors are potential therapeutic targets. The Orphan Nuclear Receptor Ligand Binding Database (ONRLDB) is an interactive, comprehensive and manually curated database of small molecule ligands targeting orphan nuclear receptors. Currently, ONRLDB consists of ∼11 000 ligands, of which ∼6500 are unique. All entries include information for the ligand, such as EC50 and IC50, number of aromatic rings and rotatable bonds, XlogP, hydrogen donor and acceptor count, molecular weight (MW) and structure. ONRLDB is a cross-platform database, where either the cognate small molecule modulators of a receptor or the cognate receptors to a ligand can be searched. The database can be searched using three methods: text search, advanced search or similarity search. Substructure search, cataloguing tools, and clustering tools can be used to perform advanced analysis of the ligand based on chemical similarity fingerprints, hierarchical clustering, binning partition and multidimensional scaling. These tools, together with the Tree function provided, deliver an interactive platform and a comprehensive resource for identification of common and unique scaffolds. As demonstrated, ONRLDB is designed to allow selection of ligands based on various properties and for designing novel ligands or to improve the existing ones. Database URL: http://www.onrldb.org/ PMID:26637529

  13. 3CDB: a manually curated database of chromosome conformation capture data.

    PubMed

    Yun, Xiaoxiao; Xia, Lili; Tang, Bixia; Zhang, Hui; Li, Feifei; Zhang, Zhihua

    2016-01-01

    Chromosome conformation capture (3C) is a biochemical technology to analyse contact frequencies between selected genomic sites in a cell population. Its recent genomic variants, e.g. Hi-C/chromatin interaction analysis by paired-end tag (ChIA-PET), have enabled the study of nuclear organization at an unprecedented level. However, due to the inherent low resolution and ultrahigh cost of Hi-C/ChIA-PET, 3C is still the gold standard for determining interactions between given regulatory DNA elements, such as enhancers and promoters. Therefore, we developed a database of 3C determined functional chromatin interactions (3CDB; http://3cdb.big.ac.cn). To construct 3CDB, we searched PubMed and Google Scholar with carefully designed keyword combinations and retrieved more than 5000 articles from which we manually extracted 3319 interactions in 17 species. Moreover, we proposed a systematic evaluation scheme for data reliability and classified the interactions into four categories. Contact frequencies are not directly comparable as a result of various modified 3C protocols employed among laboratories. Our evaluation scheme provides a plausible solution to this long-standing problem in the field. A user-friendly web interface was designed to assist quick searches in 3CDB. We believe that 3CDB will provide fundamental information for experimental design and phylogenetic analysis, as well as bridge the gap between molecular and systems biologists who must now contend with noisy high-throughput data. Database URL: http://3cdb.big.ac.cn. PMID:27081154

  14. A Tool for Biomarker Discovery in the Urinary Proteome: A Manually Curated Human and Animal Urine Protein Biomarker Database

    PubMed Central

    Shao, Chen; Li, Menglin; Li, Xundou; Wei, Lilong; Zhu, Lisi; Yang, Fan; Jia, Lulu; Mu, Yi; Wang, Jiangning; Guo, Zhengguang; Zhang, Dan; Yin, Jianrui; Wang, Zhigang; Sun, Wei; Zhang, Zhengguo; Gao, Youhe

    2011-01-01

    Urine is an important source of biomarkers. A single proteomics assay can identify hundreds of differentially expressed proteins between disease and control samples; however, the ability to select biomarker candidates with the most promise for further validation study remains difficult. A bioinformatics tool that allows accurate and convenient comparison of all of the existing related studies can markedly aid the development of this area. In this study, we constructed the Urinary Protein Biomarker (UPB) database to collect existing studies of urinary protein biomarkers from published literature. To ensure the quality of data collection, all literature was manually curated. The website (http://122.70.220.102/biomarker) allows users to browse the database by disease categories and search by protein IDs in bulk. Researchers can easily determine whether a biomarker candidate has already been identified by another group for the same disease or for other diseases, which allows for the confidence and disease specificity of their biomarker candidate to be evaluated. Additionally, the pathophysiological processes of the diseases can be studied using our database with the hypothesis that diseases that share biomarkers may have the same pathophysiological processes. Because of the natural relationship between urinary proteins and the urinary system, this database may be especially suitable for studying the pathogenesis of urological diseases. Currently, the database contains 553 and 275 records compiled from 174 and 31 publications of human and animal studies, respectively. We found that biomarkers identified by different proteomic methods had a poor overlap with each other. The differences between sample preparation and separation methods, mass spectrometers, and data analysis algorithms may be influencing factors. Biomarkers identified from animal models also overlapped poorly with those from human samples, but the overlap rate was not lower than that of human proteomics

  15. Curation accuracy of model organism databases.

    PubMed

    Keseler, Ingrid M; Skrzypek, Marek; Weerasinghe, Deepika; Chen, Albert Y; Fulcher, Carol; Li, Gene-Wei; Lemmer, Kimberly C; Mladinich, Katherine M; Chow, Edmond D; Sherlock, Gavin; Karp, Peter D

    2014-01-01

    Manual extraction of information from the biomedical literature, or biocuration, is the central methodology used to construct many biological databases. For example, the UniProt protein database, the EcoCyc Escherichia coli database and the Candida Genome Database (CGD) are all based on biocuration. Biological databases are used extensively by life science researchers, as online encyclopedias, as aids in the interpretation of new experimental data and as gold standards for the development of new bioinformatics algorithms. Although manual curation has been assumed to be highly accurate, we are aware of only one previous study of biocuration accuracy. We assessed the accuracy of EcoCyc and CGD by manually selecting curated assertions within randomly chosen EcoCyc and CGD gene pages and by then validating that the data found in the referenced publications supported those assertions. A database assertion is considered to be in error if that assertion could not be found in the publication cited for that assertion. We identified 10 errors in the 633 facts that we validated across the two databases, for an overall error rate of 1.58%, and individual error rates of 1.82% for CGD and 1.40% for EcoCyc. These data suggest that manual curation of the experimental literature by Ph.D.-level scientists is highly accurate. Database URL: http://ecocyc.org/, http://www.candidagenome.org/ PMID:24923819
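
    The headline figure follows directly from the reported counts, as the quick check below shows; the Wilson confidence interval is added here for context only and is not reported in the paper.

```python
import math

def error_rate(errors, facts):
    return errors / facts

def wilson_interval(errors, facts, z=1.96):
    """Approximate 95% Wilson score interval for a proportion (added for context)."""
    p = errors / facts
    denom = 1 + z**2 / facts
    center = (p + z**2 / (2 * facts)) / denom
    half = z * math.sqrt(p * (1 - p) / facts + z**2 / (4 * facts**2)) / denom
    return center - half, center + half

print(f"overall error rate: {error_rate(10, 633):.2%}")   # ~1.58%, as reported
low, high = wilson_interval(10, 633)
print(f"approximate 95% interval: {low:.2%} to {high:.2%}")
```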

  1. HAMAP: a database of completely sequenced microbial proteome sets and manually curated microbial protein families in UniProtKB/Swiss-Prot

    PubMed Central

    Lima, Tania; Auchincloss, Andrea H.; Coudert, Elisabeth; Keller, Guillaume; Michoud, Karine; Rivoire, Catherine; Bulliard, Virginie; de Castro, Edouard; Lachaize, Corinne; Baratin, Delphine; Phan, Isabelle; Bougueleret, Lydie; Bairoch, Amos

    2009-01-01

    The growth in the number of completely sequenced microbial genomes (bacterial and archaeal) has generated a need for a procedure that provides UniProtKB/Swiss-Prot-quality annotation to as many protein sequences as possible. We have devised a semi-automated system, HAMAP (High-quality Automated and Manual Annotation of microbial Proteomes), that uses manually built annotation templates for protein families to propagate annotation to all members of manually defined protein families, using very strict criteria. The HAMAP system is composed of two databases, the proteome database and the family database, and of an automatic annotation pipeline. The proteome database comprises biological and sequence information for each completely sequenced microbial proteome, and it offers several tools for CDS searches, BLAST options and retrieval of specific sets of proteins. The family database currently comprises more than 1500 manually curated protein families and their annotation templates that are used to annotate proteins that belong to one of the HAMAP families. On the HAMAP website, individual sequences as well as whole genomes can be scanned against all HAMAP families. The system provides warnings for the absence of conserved amino acid residues, unusual sequence length, etc. Thanks to the implementation of HAMAP, more than 200 000 microbial proteins have been fully annotated in UniProtKB/Swiss-Prot (HAMAP website: http://www.expasy.org/sprot/hamap). PMID:18849571
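
    The propagation step described above amounts to: if a protein passes a family's strict membership checks, copy the family's annotation template onto it; otherwise flag it for manual review. The sketch below is a toy version of that logic with a hypothetical family template; the length window and conserved-residue checks are illustrative stand-ins, not HAMAP's actual rules.

```python
# Toy template-based annotation propagation (not HAMAP's real rule engine).

FAMILY_TEMPLATE = {
    "protein_name": "Example enzyme",            # hypothetical template fields
    "ec_number": "1.1.1.1",
    "expected_length": (250, 320),               # strict length window (illustrative)
    "required_residues": {41: "C", 117: "H"},    # conserved positions (illustrative)
}

def propagate(sequence, template):
    """Return the template annotation if all strict checks pass, else a warning list."""
    warnings = []
    low, high = template["expected_length"]
    if not low <= len(sequence) <= high:
        warnings.append(f"unusual sequence length ({len(sequence)})")
    for pos, residue in template["required_residues"].items():
        if pos > len(sequence) or sequence[pos - 1] != residue:
            warnings.append(f"missing conserved residue {residue}{pos}")
    if warnings:
        return {"status": "needs manual review", "warnings": warnings}
    return {"status": "annotated",
            "protein_name": template["protein_name"],
            "ec_number": template["ec_number"]}

print(propagate("M" + "A" * 260, FAMILY_TEMPLATE))   # fails the conserved-residue checks
```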

  2. VaDE: a manually curated database of reproducible associations between various traits and human genomic polymorphisms.

    PubMed

    Nagai, Yoko; Takahashi, Yasuko; Imanishi, Tadashi

    2015-01-01

    Genome-wide association studies (GWASs) have identified numerous single nucleotide polymorphisms (SNPs) associated with the development of common diseases. However, it is clear that genetic risk factors of common diseases are heterogeneous among human populations. Therefore, we developed a database of genomic polymorphisms that are reproducibly associated with disease susceptibilities, drug responses and other traits for each human population: 'VarySysDB Disease Edition' (VaDE; http://bmi-tokai.jp/VaDE/). SNP-trait association data were obtained from the National Human Genome Research Institute GWAS (NHGRI GWAS) catalog and RAvariome, and we added detailed information of sample populations by curating original papers. In addition, we collected and curated original papers, and registered the detailed information of SNP-trait associations in VaDE. Then, we evaluated reproducibility of associations in each population by counting the number of significantly associated studies. VaDE provides literature-based SNP-trait association data and functional genomic region annotation for SNP functional research. SNP functional annotation data included experimental data of the ENCODE project, H-InvDB transcripts and the 1000 Genome Project. A user-friendly web interface was developed to assist quick search, easy download and fast swapping among viewers. We believe that our database will contribute to the future establishment of personalized medicine and increase our understanding of genetic factors underlying diseases. PMID:25361969
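
    The reproducibility assessment described above (counting significantly associated studies per population) can be expressed compactly. The sketch below groups hypothetical association records and labels a SNP-trait pair reproducible in a population when at least two independent studies report a significant association; the record layout, identifiers and the two-study threshold are assumptions for illustration, not VaDE's exact criteria.

```python
from collections import defaultdict

# Hypothetical curated records: (snp, trait, population, pmid, significant?)
records = [
    ("rs0000001", "trait X", "East Asian", "11111111", True),
    ("rs0000001", "trait X", "East Asian", "22222222", True),
    ("rs0000001", "trait X", "European",   "33333333", False),
]

significant_studies = defaultdict(set)
for snp, trait, population, pmid, significant in records:
    if significant:
        significant_studies[(snp, trait, population)].add(pmid)

for key, pmids in significant_studies.items():
    # Illustrative rule: at least two independent significant studies -> "reproducible".
    label = "reproducible" if len(pmids) >= 2 else "single report"
    print(key, "->", label, f"({len(pmids)} significant study/studies)")
```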

  3. BIAdb: A curated database of benzylisoquinoline alkaloids

    PubMed Central

    2010-01-01

    Background: Benzylisoquinoline is the structural backbone of many alkaloids with a wide variety of structures including papaverine, noscapine, codeine, morphine, apomorphine, berberine, protopine and tubocurarine. Many benzylisoquinoline alkaloids have been reported to show therapeutic properties and to act as novel medicines. Thus it is important to collect and compile benzylisoquinoline alkaloids in order to explore their usage in medicine. Description: We extracted information about benzylisoquinoline alkaloids from various sources like PubChem, KEGG, KNApSAcK and manual curation from literature. This information was processed and compiled in order to create a comprehensive database of benzylisoquinoline alkaloids, called BIAdb. The current version of BIAdb contains information about 846 unique benzylisoquinoline alkaloids, with multiple entries in terms of source and function, leading to a total of 2504 records. One of the major features of this database is that it provides data about 627 different plant species as sources of benzylisoquinoline alkaloids and 114 different types of function performed by these compounds. A large number of online tools have been integrated, which facilitate users in exploring the full potential of BIAdb. In order to provide additional information, we give external links to other resources/databases. One of the important features of this database is that it is tightly integrated with Drugpedia, which allows managing data in fixed/flexible format. Conclusions: A database of benzylisoquinoline compounds has been created, which provides comprehensive information about benzylisoquinoline alkaloids. This database will be very useful for those who are working in the field of drug discovery based on natural products. This database will also serve researchers working in the field of synthetic biology, as developing medicinally important alkaloids using synthetic processes is one of the important challenges. This database is available from http

  4. Activity, assay and target data curation and quality in the ChEMBL database.

    PubMed

    Papadatos, George; Gaulton, Anna; Hersey, Anne; Overington, John P

    2015-09-01

    The emergence of a number of publicly available bioactivity databases, such as ChEMBL, PubChem BioAssay and BindingDB, has raised awareness about the topics of data curation, quality and integrity. Here we provide an overview and discussion of the current and future approaches to activity, assay and target data curation of the ChEMBL database. This curation process involves several manual and automated steps and aims to: (1) maximise data accessibility and comparability; (2) improve data integrity and flag outliers, ambiguities and potential errors; and (3) add further curated annotations and mappings thus increasing the usefulness and accuracy of the ChEMBL data for all users and modellers in particular. Issues related to activity, assay and target data curation and integrity along with their potential impact for users of the data are discussed, alongside robust selection and filter strategies in order to avoid or minimise these, depending on the desired application. PMID:26201396

  5. DIDA: A curated and annotated digenic diseases database.

    PubMed

    Gazzo, Andrea M; Daneels, Dorien; Cilia, Elisa; Bonduelle, Maryse; Abramowicz, Marc; Van Dooren, Sonia; Smits, Guillaume; Lenaerts, Tom

    2016-01-01

    DIDA (DIgenic diseases DAtabase) is a novel database that provides for the first time detailed information on genes and associated genetic variants involved in digenic diseases, the simplest form of oligogenic inheritance. The database is accessible via http://dida.ibsquare.be and currently includes 213 digenic combinations involved in 44 different digenic diseases. These combinations are composed of 364 distinct variants, which are distributed over 136 distinct genes. The web interface provides browsing and search functionalities, as well as documentation and help pages, general database statistics and references to the original publications from which the data have been collected. The possibility to submit novel digenic data to DIDA is also provided. Creating this new repository was essential as current databases do not allow one to retrieve detailed records regarding digenic combinations. Genes, variants, diseases and digenic combinations in DIDA are annotated with manually curated information and information mined from other online resources. Next to providing a unique resource for the development of new analysis methods, DIDA gives clinical and molecular geneticists a tool to find the most comprehensive information on the digenic nature of their diseases of interest. PMID:26481352

  6. A Curated Database of Rodent Uterotrophic Bioactivity

    PubMed Central

    Kleinstreuer, Nicole C.; Ceger, Patricia C.; Allen, David G.; Strickland, Judy; Chang, Xiaoqing; Hamm, Jonathan T.; Casey, Warren M.

    2015-01-01

    Background: Novel in vitro methods are being developed to identify chemicals that may interfere with estrogen receptor (ER) signaling, but the results are difficult to put into biological context because of reliance on reference chemicals established using results from other in vitro assays and because of the lack of high-quality in vivo reference data. The Organisation for Economic Co-operation and Development (OECD)-validated rodent uterotrophic bioassay is considered the “gold standard” for identifying potential ER agonists. Objectives: We performed a comprehensive literature review to identify and evaluate data from uterotrophic studies and to analyze study variability. Methods: We reviewed 670 articles with results from 2,615 uterotrophic bioassays using 235 unique chemicals. Study descriptors, such as species/strain, route of administration, dosing regimen, lowest effect level, and test outcome, were captured in a database of uterotrophic results. Studies were assessed for adherence to six criteria that were based on uterotrophic regulatory test guidelines. Studies meeting all six criteria (458 bioassays on 118 unique chemicals) were considered guideline-like (GL) and were subsequently analyzed. Results: The immature rat model was used for 76% of the GL studies. Active outcomes were more prevalent across rat models (74% active) than across mouse models (36% active). Of the 70 chemicals with at least two GL studies, 18 (26%) had discordant outcomes and were classified as both active and inactive. Many discordant results were attributable to differences in study design (e.g., injection vs. oral dosing). Conclusions: This uterotrophic database provides a valuable resource for understanding in vivo outcome variability and for evaluating the performance of in vitro assays that measure estrogenic activity. Citation: Kleinstreuer NC, Ceger PC, Allen DG, Strickland J, Chang X, Hamm JT, Casey WM. 2016. A curated database of rodent uterotrophic bioactivity. Environ
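
    The discordance analysis reported above reduces to a simple grouping: among chemicals with at least two guideline-like (GL) studies, a chemical is discordant if it has both active and inactive outcomes. A minimal sketch with invented study records follows; by this kind of criterion the paper reports 18 of 70 chemicals (26%) as discordant.

```python
from collections import defaultdict

# Invented GL study outcomes: chemical -> list of "active"/"inactive" calls.
gl_outcomes = defaultdict(list)
for chemical, outcome in [("chemical A", "active"), ("chemical A", "inactive"),
                          ("chemical B", "active"), ("chemical B", "active"),
                          ("chemical C", "inactive")]:
    gl_outcomes[chemical].append(outcome)

multi_study = {c: calls for c, calls in gl_outcomes.items() if len(calls) >= 2}
discordant = [c for c, calls in multi_study.items()
              if "active" in calls and "inactive" in calls]

print(f"{len(discordant)} of {len(multi_study)} multi-study chemicals discordant "
      f"({len(discordant) / len(multi_study):.0%})")
```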

  7. Manual classification strategies in the ECOD database

    PubMed Central

    Cheng, Hua; Liao, Yuxing; Schaeffer, R. Dustin; Grishin, Nick V.

    2015-01-01

    ECOD (Evolutionary Classification Of protein Domains) is a comprehensive and up-to-date protein structure classification database. The majority of new structures released from the PDB (Protein Data Bank) every week already have close homologs in the ECOD hierarchy and thus can be reliably partitioned into domains and classified by software without manual intervention. However, those proteins that lack confidently detectable homologs require careful analysis by experts. Although many bioinformatics resources rely on expert curation to some degree, specific examples of how this curation occurs and in what cases it is necessary are not always described. Here, we illustrate the manual classification strategy in ECOD by example, focusing on two major issues in protein classification: domain partitioning and the relationship between homology and similarity scores. Most examples show recently released and manually classified PDB structures. We discuss multi-domain proteins, discordance between sequence and structural similarities, difficulties with assessing homology with scores, and integral membrane proteins homologous to soluble proteins. By timely assimilation of newly available structures into its hierarchy, ECOD strives to provide a most accurate and updated view of the protein structure world as a result of combined computational and expert-driven analysis. PMID:25917548

  8. Assisting manual literature curation for protein–protein interactions using BioQRator

    PubMed Central

    Kwon, Dongseop; Kim, Sun; Shin, Soo-Yong; Chatr-aryamontri, Andrew; Wilbur, W. John

    2014-01-01

    The time-consuming nature of manual curation and the rapid growth of biomedical literature severely limit the number of articles that database curators can scrutinize and annotate. Hence, semi-automatic tools can be a valid support to increase annotation throughput. Although a handful of curation assistant tools are already available, to date, little has been done to formally evaluate their benefit to biocuration. Moreover, most curation tools are designed for specific problems. Thus, it is not easy to apply an annotation tool for multiple tasks. BioQRator is a publicly available web-based tool for annotating biomedical literature. It was designed to support general tasks, i.e. any task annotating entities and relationships. In the BioCreative IV edition, BioQRator was tailored for protein–protein interaction (PPI) annotation by migrating information from PIE the search. The results obtained from six curators showed that the precision on the top 10 documents doubled with PIE the search compared with PubMed search results. It was also observed that the annotation time for a full PPI annotation task decreased for a beginner-intermediate level annotator. This finding is encouraging because text-mining techniques were not directly involved in the full annotation task and BioQRator can be easily integrated with any text-mining resources. Database URL: http://www.bioqrator.org/ PMID:25052701

  9. Automatic vs. manual curation of a multi-source chemical dictionary: the impact on text mining

    PubMed Central

    2010-01-01

    Background: Previously, we developed a combined dictionary dubbed Chemlist for the identification of small molecules and drugs in text based on a number of publicly available databases and tested it on an annotated corpus. To achieve an acceptable recall and precision we used a number of automatic and semi-automatic processing steps together with disambiguation rules. However, it remained to be investigated which impact an extensive manual curation of a multi-source chemical dictionary would have on chemical term identification in text. ChemSpider is a chemical database that has undergone extensive manual curation aimed at establishing valid chemical name-to-structure relationships. Results: We acquired the component of ChemSpider containing only manually curated names and synonyms. Rule-based term filtering, semi-automatic manual curation, and disambiguation rules were applied. We tested the dictionary from ChemSpider on an annotated corpus and compared the results with those for the Chemlist dictionary. The ChemSpider dictionary of ca. 80 k names was only a third to a quarter of the size of Chemlist, at around 300 k. The ChemSpider dictionary had a precision of 0.43 and a recall of 0.19 before the application of filtering and disambiguation and a precision of 0.87 and a recall of 0.19 after filtering and disambiguation. The Chemlist dictionary had a precision of 0.20 and a recall of 0.47 before the application of filtering and disambiguation and a precision of 0.67 and a recall of 0.40 after filtering and disambiguation. Conclusions: We conclude the following: (1) The ChemSpider dictionary achieved the best precision but the Chemlist dictionary had a higher recall and the best F-score; (2) Rule-based filtering and disambiguation is necessary to achieve a high precision for both the automatically generated and the manually curated dictionary. ChemSpider is available as a web service at http://www.chemspider.com/ and the Chemlist dictionary is freely available as an XML file in
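
    The F-score claim in the conclusions can be checked directly from the post-filtering precision and recall values reported above, since F1 is the harmonic mean of the two:

```python
def f1(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)

# Post-filtering values reported above.
print(f"ChemSpider dictionary: F1 = {f1(0.87, 0.19):.2f}")   # ~0.31
print(f"Chemlist dictionary:   F1 = {f1(0.67, 0.40):.2f}")   # ~0.50, hence the higher F-score
```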

  10. A computational platform to maintain and migrate manual functional annotations for BioCyc databases

    Technology Transfer Automated Retrieval System (TEKTRAN)

    Model organism databases are an important resource for information on biological pathways and genomic data. Such databases represent the accumulation of biological data, some of which has been manually curated from literature. An essential feature of these databases is the continuing data integratio...

  11. OntoMate: a text-mining tool aiding curation at the Rat Genome Database.

    PubMed

    Liu, Weisong; Laulederkind, Stanley J F; Hayman, G Thomas; Wang, Shur-Jen; Nigam, Rajni; Smith, Jennifer R; De Pons, Jeff; Dwinell, Melinda R; Shimoyama, Mary

    2015-01-01

    The Rat Genome Database (RGD) is the premier repository of rat genomic, genetic and physiologic data. Converting data from free text in the scientific literature to a structured format is one of the main tasks of all model organism databases. RGD spends considerable effort manually curating gene, Quantitative Trait Locus (QTL) and strain information. The rapidly growing volume of biomedical literature and the active research in the biological natural language processing (bioNLP) community have given RGD the impetus to adopt text-mining tools to improve curation efficiency. Recently, RGD has initiated a project to use OntoMate, an ontology-driven, concept-based literature search engine developed at RGD, as a replacement for the PubMed (http://www.ncbi.nlm.nih.gov/pubmed) search engine in the gene curation workflow. OntoMate tags abstracts with gene names, gene mutations, organism name and most of the 16 ontologies/vocabularies used at RGD. All terms/entities tagged to an abstract are listed with the abstract in the search results. All listed terms are linked both to data entry boxes and a term browser in the curation tool. OntoMate also provides user-activated filters for species, date and other parameters relevant to the literature search. Using the system for literature search and import has streamlined the process compared to using PubMed. The system was built with a scalable and open architecture, including features specifically designed to accelerate the RGD gene curation process. With the use of bioNLP tools, RGD has added more automation to its curation workflow. Database URL: http://rgd.mcw.edu. PMID:25619558
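
    At its core, the tagging step described above is dictionary and ontology term matching against abstract text. The sketch below is a deliberately simple matcher over a tiny invented vocabulary with placeholder identifiers; OntoMate's real pipeline uses proper bioNLP tooling, so treat this only as an illustration of the concept.

```python
import re

# Tiny invented vocabulary; the identifiers are placeholders, not real ontology IDs.
VOCABULARY = {
    "hypertension": "DISEASE:0000001",
    "blood pressure": "TRAIT:0000002",
    "Rattus norvegicus": "TAXON:0000003",
}

def tag_abstract(text, vocabulary=VOCABULARY):
    """Return (term, identifier, character offset) for each vocabulary hit."""
    hits = []
    for term, identifier in vocabulary.items():
        for match in re.finditer(re.escape(term), text, flags=re.IGNORECASE):
            hits.append((term, identifier, match.start()))
    return sorted(hits, key=lambda hit: hit[2])

abstract = "Blood pressure and hypertension phenotypes were measured in Rattus norvegicus."
for term, identifier, offset in tag_abstract(abstract):
    print(f"{offset:3d}  {term}  ->  {identifier}")
```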

  13. OntoMate: a text-mining tool aiding curation at the Rat Genome Database

    PubMed Central

    Liu, Weisong; Laulederkind, Stanley J. F.; Hayman, G. Thomas; Wang, Shur-Jen; Nigam, Rajni; Smith, Jennifer R.; De Pons, Jeff; Dwinell, Melinda R.; Shimoyama, Mary

    2015-01-01

    The Rat Genome Database (RGD) is the premier repository of rat genomic, genetic and physiologic data. Converting data from free text in the scientific literature to a structured format is one of the main tasks of all model organism databases. RGD spends considerable effort manually curating gene, Quantitative Trait Locus (QTL) and strain information. The rapidly growing volume of biomedical literature and the active research in the biological natural language processing (bioNLP) community have given RGD the impetus to adopt text-mining tools to improve curation efficiency. Recently, RGD has initiated a project to use OntoMate, an ontology-driven, concept-based literature search engine developed at RGD, as a replacement for the PubMed (http://www.ncbi.nlm.nih.gov/pubmed) search engine in the gene curation workflow. OntoMate tags abstracts with gene names, gene mutations, organism name and most of the 16 ontologies/vocabularies used at RGD. All terms/entities tagged to an abstract are listed with the abstract in the search results. All listed terms are linked both to data entry boxes and a term browser in the curation tool. OntoMate also provides user-activated filters for species, date and other parameters relevant to the literature search. Using the system for literature search and import has streamlined the process compared to using PubMed. The system was built with a scalable and open architecture, including features specifically designed to accelerate the RGD gene curation process. With the use of bioNLP tools, RGD has added more automation to its curation workflow. Database URL: http://rgd.mcw.edu PMID:25619558

  14. The MIntAct project--IntAct as a common curation platform for 11 molecular interaction databases.

    PubMed

    Orchard, Sandra; Ammari, Mais; Aranda, Bruno; Breuza, Lionel; Briganti, Leonardo; Broackes-Carter, Fiona; Campbell, Nancy H; Chavali, Gayatri; Chen, Carol; del-Toro, Noemi; Duesbury, Margaret; Dumousseau, Marine; Galeota, Eugenia; Hinz, Ursula; Iannuccelli, Marta; Jagannathan, Sruthi; Jimenez, Rafael; Khadake, Jyoti; Lagreid, Astrid; Licata, Luana; Lovering, Ruth C; Meldal, Birgit; Melidoni, Anna N; Milagros, Mila; Peluso, Daniele; Perfetto, Livia; Porras, Pablo; Raghunath, Arathi; Ricard-Blum, Sylvie; Roechert, Bernd; Stutz, Andre; Tognolli, Michael; van Roey, Kim; Cesareni, Gianni; Hermjakob, Henning

    2014-01-01

    IntAct (freely available at http://www.ebi.ac.uk/intact) is an open-source, open data molecular interaction database populated by data either curated from the literature or from direct data depositions. IntAct has developed a sophisticated web-based curation tool, capable of supporting both IMEx- and MIMIx-level curation. This tool is now utilized by multiple additional curation teams, all of whom annotate data directly into the IntAct database. Members of the IntAct team supply appropriate levels of training, perform quality control on entries and take responsibility for long-term data maintenance. Recently, the MINT and IntAct databases decided to merge their separate efforts to make optimal use of limited developer resources and maximize the curation output. All data manually curated by the MINT curators have been moved into the IntAct database at EMBL-EBI and are merged with the existing IntAct dataset. Both IntAct and MINT are active contributors to the IMEx consortium (http://www.imexconsortium.org). PMID:24234451

  15. Curating and Preserving the Big Canopy Database System: an Active Curation Approach using SEAD

    NASA Astrophysics Data System (ADS)

    Myers, J.; Cushing, J. B.; Lynn, P.; Weiner, N.; Ovchinnikova, A.; Nadkarni, N.; McIntosh, A.

    2015-12-01

    Modern research is increasingly dependent upon highly heterogeneous data and on the associated cyberinfrastructure developed to organize, analyze, and visualize that data. However, due to the complexity and custom nature of such combined data-software systems, it can be very challenging to curate and preserve them for the long term at reasonable cost and in a way that retains their scientific value. In this presentation, we describe how this challenge was met in preserving the Big Canopy Database (CanopyDB) system using an agile approach and leveraging the Sustainable Environment - Actionable Data (SEAD) DataNet project's hosted data services. The CanopyDB system was developed over more than a decade at Evergreen State College to address the needs of forest canopy researchers. It is an early yet sophisticated exemplar of the type of system that has become common in biological research and science in general, including multiple relational databases for different experiments, a custom database generation tool used to create them, an image repository, and desktop and web tools to access, analyze, and visualize this data. SEAD provides secure project spaces with a semantic content abstraction (typed content with arbitrary RDF metadata statements and relationships to other content), combined with a standards-based curation and publication pipeline resulting in packaged research objects with Digital Object Identifiers. Using SEAD, our cross-project team was able to incrementally ingest CanopyDB components (images, datasets, software source code, documentation, executables, and virtualized services) and to iteratively define and extend the metadata and relationships needed to document them. We believe that both the process, and the richness of the resultant standards-based (OAI-ORE) preservation object, hold lessons for the development of best-practice solutions for preserving scientific data in association with the tools and services needed to derive value from it.
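
    As a rough illustration of the semantic content abstraction mentioned above (typed content plus arbitrary RDF metadata statements), the following sketch attaches a few RDF statements to a CanopyDB component using rdflib; the namespace, URIs and property choices are illustrative assumptions, not SEAD's actual vocabulary:
```python
# Hypothetical sketch of attaching RDF metadata statements to a dataset
# component, in the spirit of SEAD's semantic content abstraction.
# Namespace, URIs and property choices here are illustrative assumptions.
from rdflib import Graph, Literal, Namespace
from rdflib.namespace import DCTERMS, RDF

EX = Namespace("http://example.org/canopydb/")  # placeholder namespace

g = Graph()
g.bind("dcterms", DCTERMS)
g.bind("ex", EX)

dataset = EX["image-repository"]
g.add((dataset, RDF.type, EX.Component))
g.add((dataset, DCTERMS.title, Literal("CanopyDB image repository")))
g.add((dataset, DCTERMS.isPartOf, EX["canopydb-system"]))

print(g.serialize(format="turtle"))
```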

  16. A Manual Curation Strategy to Improve Genome Annotation: Application to a Set of Haloarchaeal Genomes

    PubMed Central

    Pfeiffer, Friedhelm; Oesterhelt, Dieter

    2015-01-01

    Genome annotation errors are a persistent problem that impedes research in the biosciences. A manual curation effort is described that attempts to produce high-quality genome annotations for a set of haloarchaeal genomes (Halobacterium salinarum and Hbt. hubeiense, Haloferax volcanii and Hfx. mediterranei, Natronomonas pharaonis and Nmn. moolapensis, Haloquadratum walsbyi strains HBSQ001 and C23, Natrialba magadii, Haloarcula marismortui and Har. hispanica, and Halohasta litchfieldiae). Genomes are checked for missing genes, start codon misassignments, and disrupted genes. Assignments of a specific function are preferably based on experimentally characterized homologs (Gold Standard Proteins). To avoid overannotation, which is a major source of database errors, we restrict annotation to only general function assignments when support for a specific substrate assignment is insufficient. This strategy results in annotations that are resistant to the plethora of errors that compromise public databases. Annotation consistency is rigorously validated for ortholog pairs from the genomes surveyed. The annotation is regularly crosschecked against the UniProt database to further improve annotations and increase the level of standardization. Enhanced genome annotations are submitted to public databases (EMBL/GenBank, UniProt), to the benefit of the scientific community. The enhanced annotations are also publicly available via HaloLex. PMID:26042526
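
    The ortholog-consistency validation described above can be sketched as a simple cross-check of annotation strings for ortholog pairs; the locus tags and annotations below are invented examples, not entries from the curated genomes:
```python
# Minimal sketch of an annotation-consistency check: flag ortholog pairs
# whose functional annotations disagree. The pairs and annotations are
# invented placeholders.

annotations = {
    "HVO_0001": "threonine synthase",
    "OE_1001F": "threonine synthase",
    "HVO_0002": "ABC transporter, substrate unassigned",
    "OE_1002F": "phosphate ABC transporter",
}

ortholog_pairs = [("HVO_0001", "OE_1001F"), ("HVO_0002", "OE_1002F")]

for a, b in ortholog_pairs:
    if annotations[a] != annotations[b]:
        print(f"inconsistent annotation: {a} vs {b}")
        print(f"  {annotations[a]}  <->  {annotations[b]}")
```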

  17. VIRsiRNAdb: a curated database of experimentally validated viral siRNA/shRNA

    PubMed Central

    Thakur, Nishant; Qureshi, Abid; Kumar, Manoj

    2012-01-01

    RNAi technology has emerged over the past decade as a potential modality to inhibit viruses. In the literature, a few siRNA databases have been reported that focus on targeting human and mammalian genes, but databases of experimentally validated viral siRNAs are lacking. We have developed VIRsiRNAdb, a manually curated database with comprehensive details of 1358 siRNAs/shRNAs targeting viral genome regions. Further, wherever available, information regarding alternative efficacies of over 300 siRNAs derived from different assays has also been incorporated. Important fields included in the database are siRNA sequence, virus subtype, target genome region, cell type, target object, experimental assay, efficacy, off-target and siRNA matching with reference viral sequences. The database also provides users with advanced search, browsing, data submission, links to external databases and useful siRNA analysis tools, especially siTarAlign, which aligns siRNAs with reference viral genomes or user-defined sequences. VIRsiRNAdb contains extensive details of siRNA/shRNA targeting 42 important human viruses including influenza virus, hepatitis B virus, HPV and SARS coronavirus. VIRsiRNAdb would prove useful for researchers in selecting the best viral siRNA for antiviral therapeutics development and also for developing better viral siRNA design tools. The database is freely available at http://crdd.osdd.net/servers/virsirnadb. PMID:22139916
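
    The "siRNA matching with reference viral sequences" field can be illustrated with a minimal exact-match sketch; siTarAlign itself performs a proper alignment, and the sequences below are toy examples:
```python
# Minimal sketch: check whether an siRNA guide strand is perfectly
# antisense to some window of a reference viral sequence (exact match
# only). Sequences below are invented toy examples.

def revcomp(seq: str) -> str:
    """Reverse complement of a DNA string."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]

def matches(guide_rna: str, reference_dna: str) -> bool:
    """True if the guide is exactly antisense to a window of the reference."""
    guide_dna = guide_rna.upper().replace("U", "T")
    return revcomp(guide_dna) in reference_dna.upper()

reference = "ATGGCTAGCTAGGATCCGATCGTACGTTAGC"  # toy reference sequence
guide = "UACGAUCGGAUCCUAGCUAGC"               # toy 21-nt antisense guide
print(matches(guide, reference))              # True
```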

  18. HPIDB 2.0: a curated database for host-pathogen interactions.

    PubMed

    Ammari, Mais G; Gresham, Cathy R; McCarthy, Fiona M; Nanduri, Bindu

    2016-01-01

    Identification and analysis of host-pathogen interactions (HPI) is essential to study infectious diseases. However, HPI data are sparse in existing molecular interaction databases, especially for agricultural host-pathogen systems. Therefore, resources that annotate, predict and display the HPI that underpin infectious diseases are critical for developing novel intervention strategies. HPIDB 2.0 (http://www.agbase.msstate.edu/hpi/main.html) is a resource for HPI data, and contains 45,238 manually curated entries in the current release. Since the first description of the database in 2010, multiple enhancements to HPIDB data and interface services were made that are described here. Notably, HPIDB 2.0 now provides targeted biocuration of molecular interaction data. As a member of the International Molecular Exchange consortium, annotations provided by HPIDB 2.0 curators meet community standards to provide detailed contextual experimental information and facilitate data sharing. Moreover, HPIDB 2.0 provides access to rapidly available community annotations that capture minimum molecular interaction information to address immediate researcher needs for HPI network analysis. In addition to curation, HPIDB 2.0 integrates HPI from existing external sources and contains tools to infer additional HPI where annotated data are scarce. Compared to other interaction databases, our data collection approach ensures HPIDB 2.0 users access the most comprehensive HPI data from a wide range of pathogens and their hosts (594 pathogen and 70 host species, as of February 2016). Improvements also include enhanced search capacity, addition of Gene Ontology functional information, and implementation of network visualization. The changes made to HPIDB 2.0 content and interface ensure that users, especially agricultural researchers, are able to easily access and analyse high quality, comprehensive HPI data. All HPIDB 2.0 data are updated regularly, are publicly available for direct
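
    The HPI network analysis use case mentioned above can be sketched by loading curated interaction pairs into a graph; the pairs below are invented placeholders rather than actual HPIDB 2.0 records:
```python
# Minimal sketch of building a host-pathogen interaction network from
# curated pairs and ranking host proteins by pathogen connectivity.
# The interaction pairs are invented placeholders.
import networkx as nx

hpi_pairs = [
    ("host:TLR4", "pathogen:effector X"),
    ("host:TLR4", "pathogen:effector Y"),
    ("host:ACTB", "pathogen:effector Y"),
]

g = nx.Graph()
g.add_edges_from(hpi_pairs)

# Rank host proteins by how many pathogen proteins they interact with.
host_degree = {n: d for n, d in g.degree() if n.startswith("host:")}
for protein, degree in sorted(host_degree.items(), key=lambda kv: -kv[1]):
    print(protein, degree)
```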

  19. HPIDB 2.0: a curated database for host–pathogen interactions

    PubMed Central

    Ammari, Mais G.; Gresham, Cathy R.; McCarthy, Fiona M.; Nanduri, Bindu

    2016-01-01

    Identification and analysis of host–pathogen interactions (HPI) is essential to study infectious diseases. However, HPI data are sparse in existing molecular interaction databases, especially for agricultural host–pathogen systems. Therefore, resources that annotate, predict and display the HPI that underpin infectious diseases are critical for developing novel intervention strategies. HPIDB 2.0 (http://www.agbase.msstate.edu/hpi/main.html) is a resource for HPI data, and contains 45,238 manually curated entries in the current release. Since the first description of the database in 2010, multiple enhancements to HPIDB data and interface services were made that are described here. Notably, HPIDB 2.0 now provides targeted biocuration of molecular interaction data. As a member of the International Molecular Exchange consortium, annotations provided by HPIDB 2.0 curators meet community standards to provide detailed contextual experimental information and facilitate data sharing. Moreover, HPIDB 2.0 provides access to rapidly available community annotations that capture minimum molecular interaction information to address immediate researcher needs for HPI network analysis. In addition to curation, HPIDB 2.0 integrates HPI from existing external sources and contains tools to infer additional HPI where annotated data are scarce. Compared to other interaction databases, our data collection approach ensures HPIDB 2.0 users access the most comprehensive HPI data from a wide range of pathogens and their hosts (594 pathogen and 70 host species, as of February 2016). Improvements also include enhanced search capacity, addition of Gene Ontology functional information, and implementation of network visualization. The changes made to HPIDB 2.0 content and interface ensure that users, especially agricultural researchers, are able to easily access and analyse high quality, comprehensive HPI data. All HPIDB 2.0 data are updated regularly, are publicly available for direct

  20. SORGOdb: Superoxide Reductase Gene Ontology curated DataBase

    PubMed Central

    2011-01-01

    Background Superoxide reductases (SOR) catalyse the reduction of superoxide anions to hydrogen peroxide and are involved in the oxidative stress defences of anaerobic and facultative anaerobic organisms. Genes encoding SOR were discovered recently and suffer from annotation problems. These genes, named sor, are short and the transfer of annotations from previously characterized neelaredoxin, desulfoferrodoxin, superoxide reductase and rubredoxin oxidase has been heterogeneous. Consequently, many sor remain anonymous or mis-annotated. Description SORGOdb is an exhaustive database of SOR that proposes a new classification based on domain architecture. SORGOdb supplies a simple user-friendly web-based database for retrieving and exploring relevant information about the proposed SOR families. The database can be queried using an organism name, a locus tag or phylogenetic criteria, and also offers sequence similarity searches using BlastP. Genes encoding SOR have been re-annotated in all available genome sequences (prokaryotic and eukaryotic (complete and in draft) genomes, updated in May 2010). Conclusions SORGOdb contains 325 non-redundant and curated SOR, from 274 organisms. It proposes a new classification of SOR into seven different classes and allows biologists to explore and analyze sor in order to establish correlations between the class of SOR and organism phenotypes. SORGOdb is freely available at http://sorgo.genouest.org/index.php. PMID:21575179
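
    A classification keyed to domain architecture can be sketched as a simple lookup from an ordered tuple of domains to a class label; the domain names and labels below are illustrative assumptions and do not reproduce SORGOdb's actual seven-class scheme:
```python
# Hypothetical sketch of classification by domain architecture, in the
# spirit of SORGOdb's approach. Domain names and class labels below are
# illustrative assumptions only.

def classify_sor(domains: tuple) -> str:
    """Assign a (made-up) class label from an ordered tuple of domain names."""
    if domains == ("Desulfoferrodoxin_N", "SOR_catalytic"):
        return "two-domain SOR (desulfoferrodoxin-like)"
    if domains == ("SOR_catalytic",):
        return "single-domain SOR (neelaredoxin-like)"
    return "unclassified"

print(classify_sor(("SOR_catalytic",)))
print(classify_sor(("Desulfoferrodoxin_N", "SOR_catalytic")))
```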

  1. Active Design Database (ADDB) user's manual

    SciTech Connect

    Schwarz, R.L.; Nations, J.A.; Rosser, J.H.

    1991-02-01

    This manual is a guide to the Active Design Database (ADDB) on the Martin Marietta Energy Systems, Inc., IBM 3084 unclassified computer. The ADDB is an index to all CADAM models in the unclassified CADAM database and provides query and report capabilities. Section 2.0 of this manual presents an overview of the ADDB, describing the system's purpose; the functions it performs; hardware, software, and security requirements; and help and error functions. Section 3.0 describes how to access the system and how to operate the system functions using Database 2 (DB2), Time Sharing Option (TSO), and Interactive System Productivity Facility (ISPF) features employed by this system. Appendix A contains a dictionary of data elements maintained by the system. The data values are collected from the unclassified CADAM database. Appendix B provides a printout of the system help and error screens.

  2. PHYMYCO-DB: A Curated Database for Analyses of Fungal Diversity and Evolution

    PubMed Central

    Mahé, Stéphane; Duhamel, Marie; Le Calvez, Thomas; Guillot, Laetitia; Sarbu, Ludmila; Bretaudeau, Anthony; Collin, Olivier; Dufresne, Alexis; Kiers, E. Toby; Vandenkoornhuyse, Philippe

    2012-01-01

    Background In environmental sequencing studies, fungi can be identified based on nucleic acid sequences, using either highly variable sequences as species barcodes or conserved sequences containing a high-quality phylogenetic signal. For the latter, identification relies on phylogenetic analyses and the adoption of the phylogenetic species concept. Such analysis requires that the reference sequences are well identified and deposited in public-access databases. However, many entries in the public sequence databases are problematic in terms of quality and reliability and these data require screening to ensure correct phylogenetic interpretation. Methods and Principal Findings To facilitate phylogenetic inferences and phylogenetic assignment, we introduce a fungal sequence database. The database PHYMYCO-DB comprises fungal sequences from GenBank that have been filtered to satisfy stringent sequence quality criteria. For the first release, two widely used molecular taxonomic markers were chosen: the nuclear SSU rRNA and EF1-α gene sequences. Following the automatic extraction and filtration, a manual curation is performed to remove problematic sequences while preserving relevant sequences useful for phylogenetic studies. As a result of curation, ∼20% of the automatically filtered sequences have been removed from the database. To demonstrate how PHYMYCO-DB can be employed, we test a set of environmental Chytridiomycota sequences obtained from deep sea samples. Conclusion PHYMYCO-DB offers the tools necessary to: (i) extract high quality fungal sequences for each of the 5 fungal phyla, at all taxonomic levels, (ii) extract already performed alignments, to act as ‘reference alignments’, (iii) launch alignments of personal sequences along with stored data. A total of 9120 SSU rRNA and 672 EF1-α high-quality fungal sequences are now available. The PHYMYCO-DB is accessible through the URL http://phymycodb.genouest.org/. PMID:23028445
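
    The automatic filtering step that precedes manual curation can be sketched with Biopython; the PHYMYCO-DB quality criteria are not reproduced here, so the minimum length, ambiguity threshold and input file name below are assumptions:
```python
# Hedged sketch of automatic pre-filtering before manual curation.
# The thresholds and the input file name are assumptions, not the actual
# PHYMYCO-DB criteria.
from Bio import SeqIO

MIN_LENGTH = 1000              # assumed minimum SSU rRNA length
MAX_AMBIGUOUS_FRACTION = 0.01  # assumed tolerance for ambiguous bases

def passes_quality(record) -> bool:
    seq = str(record.seq).upper()
    ambiguous = sum(seq.count(b) for b in "NRYSWKMBDHV")
    return len(seq) >= MIN_LENGTH and ambiguous / len(seq) <= MAX_AMBIGUOUS_FRACTION

kept = [r for r in SeqIO.parse("ssu_rrna_candidates.fasta", "fasta")  # hypothetical file
        if passes_quality(r)]
print(f"{len(kept)} sequences retained for manual curation")
```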

  3. PMRD: a curated database for genes and mutants involved in plant male reproduction

    PubMed Central

    2012-01-01

    Background Male reproduction is an essential biological event in the plant life cycle separating the diploid sporophyte and haploid gametophyte generations, which involves expression of approximately 20,000 genes. The control of male reproduction is also of economic importance for plant breeding and hybrid seed production. With the advent of forward and reverse genetics and genomic technologies, a large number of male reproduction-related genes have been identified. Thus it is extremely challenging for individual researchers to systematically collect, and continually update, all the available information on genes and mutants related to plant male reproduction. The aim of this study is to manually curate such gene and mutant information and provide a web-accessible resource to facilitate the effective study of plant male reproduction. Description Plant Male Reproduction Database (PMRD) is a comprehensive resource for browsing and retrieving knowledge on genes and mutants related to plant male reproduction. It is based upon literature and biological databases and includes 506 male sterile genes and 484 mutants with defects of male reproduction from a variety of plant species. Based on Gene Ontology (GO) annotations and literature, information relating to a further 3697 male reproduction-related genes was systematically collected and included, and using in-text curation, gene expression and phenotypic information were captured from the literature. PMRD provides a web interface which allows users to easily access the curated annotations and genomic information, including full names, symbols, locations, sequences, expression patterns, functions of genes, mutant phenotypes, male sterile categories, and corresponding publications. PMRD also provides mini tools to search and browse expression patterns of genes in microarray datasets, run BLAST searches, convert gene IDs and generate gene networks. In addition, a Mediawiki engine and a forum have been integrated within the

  4. MicRhoDE: a curated database for the analysis of microbial rhodopsin diversity and evolution

    PubMed Central

    Boeuf, Dominique; Audic, Stéphane; Brillet-Guéguen, Loraine; Caron, Christophe; Jeanthon, Christian

    2015-01-01

    Microbial rhodopsins are a diverse group of photoactive transmembrane proteins found in all three domains of life and in viruses. Today, microbial rhodopsin research is a flourishing research field in which new understandings of rhodopsin diversity, function and evolution are contributing to broader microbiological and molecular knowledge. Here, we describe MicRhoDE, a comprehensive, high-quality and freely accessible database that facilitates analysis of the diversity and evolution of microbial rhodopsins. Rhodopsin sequences isolated from a vast array of marine and terrestrial environments were manually collected and curated. To each rhodopsin sequence are associated related metadata, including predicted spectral tuning of the protein, putative activity and function, taxonomy for sequences that can be linked to a 16S rRNA gene, sampling date and location, and supporting literature. The database currently covers 7857 aligned sequences from more than 450 environmental samples or organisms. Based on a robust phylogenetic analysis, we introduce an operational classification system with multiple phylogenetic levels ranging from superclusters to species-level operational taxonomic units. An integrated pipeline for online sequence alignment and phylogenetic tree construction is also provided. With a user-friendly interface and integrated online bioinformatics tools, this unique resource should be highly valuable for upcoming studies of the biogeography, diversity, distribution and evolution of microbial rhodopsins. Database URL: http://micrhode.sb-roscoff.fr. PMID:26286928
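
    The integrated alignment-and-tree step can be approximated offline with Biopython's distance-based tree construction; the input file name and method choices are placeholders and do not reflect MicRhoDE's actual pipeline settings:
```python
# Hedged sketch: build a neighbour-joining tree from a pre-aligned FASTA
# file of rhodopsin sequences. File name and distance model are assumptions.
from Bio import AlignIO, Phylo
from Bio.Phylo.TreeConstruction import DistanceCalculator, DistanceTreeConstructor

alignment = AlignIO.read("rhodopsins_aligned.fasta", "fasta")  # hypothetical input
distances = DistanceCalculator("identity").get_distance(alignment)
tree = DistanceTreeConstructor().nj(distances)  # neighbour-joining tree
Phylo.draw_ascii(tree)
```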

  5. MicRhoDE: a curated database for the analysis of microbial rhodopsin diversity and evolution.

    PubMed

    Boeuf, Dominique; Audic, Stéphane; Brillet-Guéguen, Loraine; Caron, Christophe; Jeanthon, Christian

    2015-01-01

    Microbial rhodopsins are a diverse group of photoactive transmembrane proteins found in all three domains of life and in viruses. Today, microbial rhodopsin research is a flourishing research field in which new understandings of rhodopsin diversity, function and evolution are contributing to broader microbiological and molecular knowledge. Here, we describe MicRhoDE, a comprehensive, high-quality and freely accessible database that facilitates analysis of the diversity and evolution of microbial rhodopsins. Rhodopsin sequences isolated from a vast array of marine and terrestrial environments were manually collected and curated. To each rhodopsin sequence are associated related metadata, including predicted spectral tuning of the protein, putative activity and function, taxonomy for sequences that can be linked to a 16S rRNA gene, sampling date and location, and supporting literature. The database currently covers 7857 aligned sequences from more than 450 environmental samples or organisms. Based on a robust phylogenetic analysis, we introduce an operational classification system with multiple phylogenetic levels ranging from superclusters to species-level operational taxonomic units. An integrated pipeline for online sequence alignment and phylogenetic tree construction is also provided. With a user-friendly interface and integrated online bioinformatics tools, this unique resource should be highly valuable for upcoming studies of the biogeography, diversity, distribution and evolution of microbial rhodopsins. Database URL: http://micrhode.sb-roscoff.fr. PMID:26286928

  6. RiceWiki: a wiki-based database for community curation of rice genes.

    PubMed

    Zhang, Zhang; Sang, Jian; Ma, Lina; Wu, Gang; Wu, Hao; Huang, Dawei; Zou, Dong; Liu, Siqi; Li, Ang; Hao, Lili; Tian, Ming; Xu, Chao; Wang, Xumin; Wu, Jiayan; Xiao, Jingfa; Dai, Lin; Chen, Ling-Ling; Hu, Songnian; Yu, Jun

    2014-01-01

    Rice is the most important staple food for a large part of the world's human population and also a key model organism for biological studies of crops as well as other related plants. Here we present RiceWiki (http://ricewiki.big.ac.cn), a wiki-based, publicly editable and open-content platform for community curation of rice genes. Most existing related biological databases are based on expert curation; with the exponentially exploding volume of rice knowledge and other relevant data, however, expert curation becomes increasingly laborious and time-consuming, struggling with the flood of data and requiring a large number of people to get involved if knowledge is to be kept up-to-date, accurate and comprehensive. Unlike extant relevant databases, RiceWiki features harnessing collective intelligence in community curation of rice genes, quantifying users' contributions in each curated gene and providing explicit authorship for each contributor in any given gene, with the aim of exploiting the full potential of the scientific community for rice knowledge curation. Based on community curation, RiceWiki bears the potential to make it possible to build a rice encyclopedia by and for the scientific community that harnesses community intelligence for collaborative knowledge curation, covers all aspects of biological knowledge and keeps evolving with novel knowledge. PMID:24136999

  7. BioSharing: curated and crowd-sourced metadata standards, databases and data policies in the life sciences

    PubMed Central

    McQuilton, Peter; Gonzalez-Beltran, Alejandra; Rocca-Serra, Philippe; Thurston, Milo; Lister, Allyson; Maguire, Eamonn; Sansone, Susanna-Assunta

    2016-01-01

    BioSharing (http://www.biosharing.org) is a manually curated, searchable portal of three linked registries. These resources cover standards (terminologies, formats and models, and reporting guidelines), databases, and data policies in the life sciences, broadly encompassing the biological, environmental and biomedical sciences. Launched in 2011 and built by the same core team as the successful MIBBI portal, BioSharing harnesses community curation to collate and cross-reference resources across the life sciences from around the world. BioSharing makes these resources findable and accessible (the core of the FAIR principle). Every record is designed to be interlinked, providing a detailed description not only on the resource itself, but also on its relations with other life science infrastructures. Serving a variety of stakeholders, BioSharing cultivates a growing community, to which it offers diverse benefits. It is a resource for funding bodies and journal publishers to navigate the metadata landscape of the biological sciences; an educational resource for librarians and information advisors; a publicising platform for standard and database developers/curators; and a research tool for bench and computer scientists to plan their work. BioSharing is working with an increasing number of journals and other registries, for example linking standards and databases to training material and tools. Driven by an international Advisory Board, the BioSharing user-base has grown by over 40% (by unique IP address), in the last year thanks to successful engagement with researchers, publishers, librarians, developers and other stakeholders via several routes, including a joint RDA/Force11 working group and a collaboration with the International Society for Biocuration. In this article, we describe BioSharing, with a particular focus on community-led curation. Database URL: https://www.biosharing.org PMID:27189610

  8. BioSharing: curated and crowd-sourced metadata standards, databases and data policies in the life sciences.

    PubMed

    McQuilton, Peter; Gonzalez-Beltran, Alejandra; Rocca-Serra, Philippe; Thurston, Milo; Lister, Allyson; Maguire, Eamonn; Sansone, Susanna-Assunta

    2016-01-01

    BioSharing (http://www.biosharing.org) is a manually curated, searchable portal of three linked registries. These resources cover standards (terminologies, formats and models, and reporting guidelines), databases, and data policies in the life sciences, broadly encompassing the biological, environmental and biomedical sciences. Launched in 2011 and built by the same core team as the successful MIBBI portal, BioSharing harnesses community curation to collate and cross-reference resources across the life sciences from around the world. BioSharing makes these resources findable and accessible (the core of the FAIR principle). Every record is designed to be interlinked, providing a detailed description not only on the resource itself, but also on its relations with other life science infrastructures. Serving a variety of stakeholders, BioSharing cultivates a growing community, to which it offers diverse benefits. It is a resource for funding bodies and journal publishers to navigate the metadata landscape of the biological sciences; an educational resource for librarians and information advisors; a publicising platform for standard and database developers/curators; and a research tool for bench and computer scientists to plan their work. BioSharing is working with an increasing number of journals and other registries, for example linking standards and databases to training material and tools. Driven by an international Advisory Board, the BioSharing user-base has grown by over 40% (by unique IP address), in the last year thanks to successful engagement with researchers, publishers, librarians, developers and other stakeholders via several routes, including a joint RDA/Force11 working group and a collaboration with the International Society for Biocuration. In this article, we describe BioSharing, with a particular focus on community-led curation. Database URL: https://www.biosharing.org. PMID:27189610

  9. Instruction manual for the Wahoo computerized database

    SciTech Connect

    Lasota, D.; Watts, K.

    1995-05-01

    As part of our research on the Lisburne Group, we have developed a powerful relational computerized database to accommodate the huge amounts of data generated by our multi-disciplinary research project. The Wahoo database has data files on petrographic data, conodont analyses, locality and sample data, well logs and diagenetic (cement) studies. Chapter 5 is essentially an instruction manual that summarizes some of the unique attributes and operating procedures of the Wahoo database. The main purpose of a database is to allow users to manipulate their data and produce reports and graphs for presentation. We present a variety of data tables in appendices at the end of this report, each encapsulating a small part of the data contained in the Wahoo database. All the data are sorted and listed by map index number and stratigraphic position (depth). The Locality data table (Appendix A) lists the stratigraphic sections examined in our study. It gives names of study areas, stratigraphic units studied, locality information, and researchers. Most localities are keyed to a geologic map that shows the distribution of the Lisburne Group and location of our sections in ANWR. Petrographic reports (Appendix B) are detailed summaries of data on the composition and texture of the Lisburne Group carbonates. The relative abundance of different carbonate grains (allochems) and carbonate texture are listed using symbols that portray data in a format similar to stratigraphic columns. This enables researchers to recognize trends in the evolution of the Lisburne carbonate platform and to check their paleoenvironmental interpretations in a stratigraphic context. Some of the figures in Chapter 1 were made using the Wahoo database.

  10. Geroprotectors.org: a new, structured and curated database of current therapeutic interventions in aging and age-related disease

    PubMed Central

    Moskalev, Alexey; Chernyagina, Elizaveta; de Magalhães, João Pedro; Barardo, Diogo; Thoppil, Harikrishnan; Shaposhnikov, Mikhail; Budovsky, Arie; Fraifeld, Vadim E.; Garazha, Andrew; Tsvetkov, Vasily; Bronovitsky, Evgeny; Bogomolov, Vladislav; Scerbacov, Alexei; Kuryan, Oleg; Gurinovich, Roman; Jellen, Leslie C.; Kennedy, Brian; Mamoshina, Polina; Dobrovolskaya, Evgeniya; Aliper, Alex; Kaminsky, Dmitry; Zhavoronkov, Alex

    2015-01-01

    As the level of interest in aging research increases, there is a growing number of geroprotectors, or therapeutic interventions that aim to extend the healthy lifespan and repair or reduce aging-related damage in model organisms and, eventually, in humans. There is a clear need for a manually-curated database of geroprotectors to compile and index their effects on aging and age-related diseases and link these effects to relevant studies and multiple biochemical and drug databases. Here, we introduce the first such resource, Geroprotectors (http://geroprotectors.org). Geroprotectors is a public, rapidly explorable database that catalogs over 250 experiments involving over 200 known or candidate geroprotectors that extend lifespan in model organisms. Each compound has a comprehensive profile complete with biochemistry, mechanisms, and lifespan effects in various model organisms, along with information ranging from chemical structure, side effects, and toxicity to FDA drug status. These are presented in a visually intuitive, efficient framework fit for casual browsing or in-depth research alike. Data are linked to the source studies or databases, providing quick and convenient access to original data. The Geroprotectors database facilitates cross-study, cross-organism, and cross-discipline analysis and saves countless hours of inefficient literature and web searching. Geroprotectors is a one-stop, knowledge-sharing, time-saving resource for researchers seeking healthy aging solutions. PMID:26342919

  11. National Solar Radiation Database 1991-2010 Update: User's Manual

    SciTech Connect

    Wilcox, S. M.

    2012-08-01

    This user's manual provides information on the updated 1991-2010 National Solar Radiation Database. Included are data format descriptions, data sources, production processes, and information about data uncertainty.

  12. National Solar Radiation Database 1991-2005 Update: User's Manual

    SciTech Connect

    Wilcox, S.

    2007-04-01

    This manual describes how to obtain and interpret the data products from the updated 1991-2005 National Solar Radiation Database (NSRDB). This is an update of the original 1961-1990 NSRDB released in 1992.

  13. A CTD–Pfizer collaboration: manual curation of 88 000 scientific articles text mined for drug–disease and drug–phenotype interactions

    PubMed Central

    Davis, Allan Peter; Wiegers, Thomas C.; Roberts, Phoebe M.; King, Benjamin L.; Lay, Jean M.; Lennon-Hopkins, Kelley; Sciaky, Daniela; Johnson, Robin; Keating, Heather; Greene, Nigel; Hernandez, Robert; McConnell, Kevin J.; Enayetallah, Ahmed E.; Mattingly, Carolyn J.

    2013-01-01

    Improving the prediction of chemical toxicity is a goal common to both environmental health research and pharmaceutical drug development. To improve safety detection assays, it is critical to have a reference set of molecules with well-defined toxicity annotations for training and validation purposes. Here, we describe a collaboration between safety researchers at Pfizer and the research team at the Comparative Toxicogenomics Database (CTD) to text mine and manually review a collection of 88 629 articles relating over 1 200 pharmaceutical drugs to their potential involvement in cardiovascular, neurological, renal and hepatic toxicity. In 1 year, CTD biocurators curated 254 173 toxicogenomic interactions (152 173 chemical–disease, 58 572 chemical–gene, 5 345 gene–disease and 38 083 phenotype interactions). All chemical–gene–disease interactions are fully integrated with public CTD, and phenotype interactions can be downloaded. We describe Pfizer’s text-mining process to collate the articles, and CTD’s curation strategy, performance metrics, enhanced data content and new module to curate phenotype information. As well, we show how data integration can connect phenotypes to diseases. This curation can be leveraged for information about toxic endpoints important to drug safety and help develop testable hypotheses for drug–disease events. The availability of these detailed, contextualized, high-quality annotations curated from seven decades’ worth of the scientific literature should help facilitate new mechanistic screening assays for pharmaceutical compound survival. This unique partnership demonstrates the importance of resource sharing and collaboration between public and private entities and underscores the complementary needs of the environmental health science and pharmaceutical communities. Database URL: http://ctdbase.org/ PMID:24288140
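
    The interaction totals quoted above can be cross-checked with a short sum over the per-category counts:
```python
# Consistency check of the curated interaction counts reported above.
parts = {
    "chemical-disease": 152_173,
    "chemical-gene": 58_572,
    "gene-disease": 5_345,
    "phenotype": 38_083,
}
total = sum(parts.values())
assert total == 254_173
print(total)  # 254173
```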

  14. Hydrologic database user's manual

    SciTech Connect

    Champman, J.B.; Gray, K.J.; Thompson, C.B.

    1993-09-01

    The Hydrologic Database is an electronic filing cabinet containing water-related data for the Nevada Test Site (NTS). The purpose of the database is to enhance research on hydrologic issues at the NTS by providing efficient access to information gathered by a variety of scientists. Data are often generated for specific projects and are reported to DOE in the context of specific project goals. The originators of the database recognized that much of this information has a general value that transcends project-specific requirements. Allowing researchers access to information generated by a wide variety of projects can prevent needless duplication of data-gathering efforts and can augment new data collection and interpretation. In addition, collecting this information in the database ensures that the results are not lost at the end of discrete projects as long as the database is actively maintained. This document is a guide to using the database.

  15. DEPOT database: Reference manual and user's guide

    SciTech Connect

    Clancey, P.; Logg, C.

    1991-03-01

    DEPOT has been developed to provide tracking for the Stanford Linear Collider (SLC) control system equipment. For each piece of equipment entered into the database, complete location, service, maintenance, modification, certification, and radiation exposure histories can be maintained. To facilitate data entry accuracy, efficiency, and consistency, barcoding technology has been used extensively. DEPOT has been an important tool in improving the reliability of the microsystems controlling SLC. This document describes the components of the DEPOT database, the elements in the database records, and the use of the supporting programs for entering data, searching the database, and producing reports from the information.

  16. Human and chicken TLR pathways: manual curation and computer-based orthology analysis

    PubMed Central

    Gillespie, Marc; Shamovsky, Veronica; D’Eustachio, Peter

    2011-01-01

    The innate immune responses mediated by Toll-like receptors (TLR) provide an evolutionarily well-conserved first line of defense against microbial pathogens. In the Reactome Knowledgebase we previously integrated annotations of human TLR molecular functions with those of over 4000 other human proteins involved in processes such as adaptive immunity, DNA replication, signaling, and intermediary metabolism, and have linked these annotations to external resources, including PubMed, UniProt, EntrezGene, Ensembl, and the Gene Ontology to generate a resource suitable for data mining, pathway analysis, and other systems biology approaches. We have now used a combination of manual expert curation and computer-based orthology analysis to generate a set of annotations for TLR molecular function in the chicken (Gallus gallus). Mammalian and avian lineages diverged approximately 300 million years ago, and the avian TLR repertoire consists of both orthologs and distinct new genes. The work described here centers on the molecular biology of TLR3, the host receptor that mediates responses to viral and other double-stranded polynucleotides, as a paradigm for our approach to integrated manual and computationally based annotation and data analysis. It tests the quality of computationally generated annotations projected from human onto other species and supports a systems biology approach to analysis of virus-activated signaling pathways and identification of clinically useful antiviral measures. PMID:21052677
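
    The orthology-based projection of annotations from human to chicken can be sketched as copying pathway labels across an ortholog mapping; the mapping and annotations below are invented placeholders, not Reactome data:
```python
# Minimal sketch of projecting pathway annotations from human proteins
# onto chicken orthologs, in the spirit of the combined manual/orthology
# approach described above. The ortholog table and annotations are
# invented placeholders.

human_annotations = {
    "TLR3": ["TLR3-mediated dsRNA response"],
    "TICAM1": ["TLR3-mediated dsRNA response"],
}

human_to_chicken = {"TLR3": "ggTLR3", "TICAM1": "ggTICAM1"}  # hypothetical orthologs

chicken_annotations = {
    chicken: list(human_annotations[human])
    for human, chicken in human_to_chicken.items()
    if human in human_annotations
}
print(chicken_annotations)
```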

  17. mycoCLAP, the database for characterized lignocellulose-active proteins of fungal origin: resource and text mining curation support

    PubMed Central

    Strasser, Kimchi; McDonnell, Erin; Nyaga, Carol; Wu, Min; Wu, Sherry; Almeida, Hayda; Meurs, Marie-Jean; Kosseim, Leila; Powlowski, Justin; Butler, Greg; Tsang, Adrian

    2015-01-01

    Enzymes active on components of lignocellulosic biomass are used for industrial applications ranging from food processing to biofuels production. These include a diverse array of glycoside hydrolases, carbohydrate esterases, polysaccharide lyases and oxidoreductases. Fungi are prolific producers of these enzymes, spurring fungal genome sequencing efforts to identify and catalogue the genes that encode them. To facilitate the functional annotation of these genes, biochemical data on over 800 fungal lignocellulose-degrading enzymes have been collected from the literature and organized into the searchable database, mycoCLAP (http://mycoclap.fungalgenomics.ca). First implemented in 2011, and updated as described here, mycoCLAP is capable of ranking search results according to closest biochemically characterized homologues: this improves the quality of the annotation, and significantly decreases the time required to annotate novel sequences. The database is freely available to the scientific community, as are the open source applications based on natural language processing developed to support the manual curation of mycoCLAP. Database URL: http://mycoclap.fungalgenomics.ca PMID:25754864
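
    Ranking search results by the closest biochemically characterized homologue can be sketched as a sort over similarity scores; the enzymes and percent identities below are invented, and in practice the scores would come from a sequence search such as BLAST:
```python
# Hedged sketch of ranking candidate annotations by similarity to the
# closest characterized homologue. Identity values are invented.

hits = [
    {"characterized_enzyme": "endoglucanase Cel7B", "identity": 62.5},
    {"characterized_enzyme": "cellobiohydrolase Cel7A", "identity": 88.1},
    {"characterized_enzyme": "xylanase XynA", "identity": 31.0},
]

ranked = sorted(hits, key=lambda h: h["identity"], reverse=True)
best = ranked[0]
print(f"closest characterized homologue: {best['characterized_enzyme']} "
      f"({best['identity']:.1f}% identity)")
```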

  18. The SIB Swiss Institute of Bioinformatics’ resources: focus on curated databases

    PubMed Central

    2016-01-01

    The SIB Swiss Institute of Bioinformatics (www.isb-sib.ch) provides world-class bioinformatics databases, software tools, services and training to the international life science community in academia and industry. These solutions allow life scientists to turn the exponentially growing amount of data into knowledge. Here, we provide an overview of SIB's resources and competence areas, with a strong focus on curated databases and SIB's most popular and widely used resources. In particular, SIB's Bioinformatics resource portal ExPASy features over 150 resources, including UniProtKB/Swiss-Prot, ENZYME, PROSITE, neXtProt, STRING, UniCarbKB, SugarBindDB, SwissRegulon, EPD, arrayMap, Bgee, SWISS-MODEL Repository, OMA, OrthoDB and other databases, which are briefly described in this article. PMID:26615188

  19. The SIB Swiss Institute of Bioinformatics' resources: focus on curated databases.

    PubMed

    2016-01-01

    The SIB Swiss Institute of Bioinformatics (www.isb-sib.ch) provides world-class bioinformatics databases, software tools, services and training to the international life science community in academia and industry. These solutions allow life scientists to turn the exponentially growing amount of data into knowledge. Here, we provide an overview of SIB's resources and competence areas, with a strong focus on curated databases and SIB's most popular and widely used resources. In particular, SIB's Bioinformatics resource portal ExPASy features over 150 resources, including UniProtKB/Swiss-Prot, ENZYME, PROSITE, neXtProt, STRING, UniCarbKB, SugarBindDB, SwissRegulon, EPD, arrayMap, Bgee, SWISS-MODEL Repository, OMA, OrthoDB and other databases, which are briefly described in this article. PMID:26615188

  20. CeCaFDB: a curated database for the documentation, visualization and comparative analysis of central carbon metabolic flux distributions explored by 13C-fluxomics.

    PubMed

    Zhang, Zhengdong; Shen, Tie; Rui, Bin; Zhou, Wenwei; Zhou, Xiangfei; Shang, Chuanyu; Xin, Chenwei; Liu, Xiaoguang; Li, Gang; Jiang, Jiansi; Li, Chao; Li, Ruiyuan; Han, Mengshu; You, Shanping; Yu, Guojun; Yi, Yin; Wen, Han; Liu, Zhijie; Xie, Xiaoyao

    2015-01-01

    The Central Carbon Metabolic Flux Database (CeCaFDB, available at http://www.cecafdb.org) is a manually curated, multipurpose and open-access database for the documentation, visualization and comparative analysis of the quantitative flux results of central carbon metabolism among microbes and animal cells. It encompasses records for more than 500 flux distributions among 36 organisms and includes information regarding the genotype, culture medium, growth conditions and other specific information gathered from hundreds of journal articles. In addition to its comprehensive literature-derived data, the CeCaFDB supports a common text search function among the data and interactive visualization of the curated flux distributions with compartmentation information based on the Cytoscape Web API, which facilitates data interpretation. The CeCaFDB offers four modules to calculate a similarity score or to perform an alignment between the flux distributions. One of the modules was built using an integer programming algorithm for flux distribution alignment that was specifically designed for this study. Based on these modules, the CeCaFDB also supports an extensive flux distribution comparison function among the curated data. The CeCaFDB is designed to address the broad demands of biochemists, metabolic engineers, systems biologists and members of the -omics community. PMID:25392417
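
    Comparing two flux distributions over their shared reactions can be sketched with a simple cosine similarity; CeCaFDB's alignment modules are more elaborate, and the reaction names and flux values below are invented:
```python
# Hedged sketch of comparing two central-carbon flux distributions over
# their shared reactions. Flux values are invented placeholders.
import math

flux_a = {"PGI": 78.0, "PFK": 85.0, "PYK": 120.0, "PPC": 15.0}
flux_b = {"PGI": 60.0, "PFK": 90.0, "PYK": 110.0, "ED": 20.0}

common = sorted(set(flux_a) & set(flux_b))
va = [flux_a[r] for r in common]
vb = [flux_b[r] for r in common]

cosine = sum(x * y for x, y in zip(va, vb)) / (
    math.sqrt(sum(x * x for x in va)) * math.sqrt(sum(y * y for y in vb)))
print(f"{len(common)} shared reactions, cosine similarity = {cosine:.3f}")
```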

  1. Strategies for annotation and curation of translational databases: the eTUMOUR project.

    PubMed

    Julià-Sapé, Margarida; Lurgi, Miguel; Mier, Mariola; Estanyol, Francesc; Rafael, Xavier; Candiota, Ana Paula; Barceló, Anna; García, Alina; Martínez-Bisbal, M Carmen; Ferrer-Luna, Rubén; Moreno-Torres, Ángel; Celda, Bernardo; Arús, Carles

    2012-01-01

    The eTUMOUR (eT) multi-centre project gathered in vivo and ex vivo magnetic resonance (MR) data, as well as transcriptomic and clinical information from brain tumour patients, with the purpose of improving the diagnostic and prognostic evaluation of future patients. In order to carry this out, among other work, a database--the eTDB--was developed. In addition to complex permission rules and software and management quality control (QC), it was necessary to develop anonymization, processing and data visualization tools for the data uploaded. It was also necessary to develop sophisticated curation strategies that involved, on the one hand, dedicated fields for QC-generated meta-data, specialized queries and global permissions for senior curators and, on the other, a set of metrics to quantify its contents. The indispensable dataset (ID), completeness and pairedness indices were set. The database contains 1317 cases created as a result of the eT project and 304 from a previous project, INTERPRET. The number of cases fulfilling the ID was 656. Completeness and pairedness were heterogeneous, depending on the data type involved. PMID:23180768
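
    A completeness metric of the kind mentioned above can be sketched as the fraction of indispensable fields that are filled for a case; the field names and the exact definition are assumptions, since the eTDB indices are not specified in detail here:
```python
# Hedged sketch of a per-case completeness metric in the spirit of the
# indices described above. Field names and the fraction-of-filled-fields
# definition are assumptions.

INDISPENSABLE_FIELDS = ["diagnosis", "short_echo_MRS", "long_echo_MRS", "consent"]

def completeness(case: dict) -> float:
    filled = sum(1 for f in INDISPENSABLE_FIELDS if case.get(f) not in (None, ""))
    return filled / len(INDISPENSABLE_FIELDS)

case = {"diagnosis": "glioblastoma", "short_echo_MRS": "uploaded",
        "long_echo_MRS": None, "consent": "signed"}
print(f"completeness = {completeness(case):.2f}")  # 0.75
```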

  2. Laminin database: a tool to retrieve high-throughput and curated data for studies on laminins.

    PubMed

    Golbert, Daiane C F; Linhares-Lacerda, Leandra; Almeida, Luiz G; Correa-de-Santana, Eliane; de Oliveira, Alice R; Mundstein, Alex S; Savino, Wilson; de Vasconcelos, Ana T R

    2011-01-01

    The Laminin (LM) database, hosted at http://www.lm.lncc.br, is the first database focusing on a non-collagenous extracellular matrix protein family, the LMs. Part of the knowledge available in this website is automatically retrieved, whereas a significant amount of information is curated and annotated, thus placing LM-database beyond a simple repository of data. The home page gives an overview of the rationale for the database, and readers can access a tutorial to facilitate navigation in the website, which in turn is presented with tabs subdivided into LMs, receptors, extracellular binding and other related proteins. Each tab opens into a given LM or LM-related molecule, where the reader finds a series of further tabs for 'protein', 'gene structure', 'gene expression', 'tissue distribution' and 'therapy'. Data are separated as a function of species, comprising Homo sapiens, Mus musculus and Rattus norvegicus. Furthermore, there is a specific tab displaying the LM nomenclatures. Another tab provides a direct link to PubMed, which can then be consulted in a specific way, in terms of the biological functions of each molecule, knockout animals and genetic diseases, immune response and lymphomas/leukemias. LM-database will hopefully be a relevant tool for retrieving information concerning LMs in health and disease, particularly regarding the hemopoietic system. PMID:21087995

  3. The Developmental Brain Disorders Database (DBDB): A Curated Neurogenetics Knowledge Base With Clinical and Research Applications

    PubMed Central

    Mirzaa, Ghayda M.; Millen, Kathleen J.; Barkovich, A. James; Dobyns, William B.; Paciorkowski, Alex R.

    2014-01-01

    The number of single genes associated with neurodevelopmental disorders has increased dramatically over the past decade. The identification of causative genes for these disorders is important to clinical outcome as it allows for accurate assessment of prognosis, genetic counseling, delineation of natural history, inclusion in clinical trials, and in some cases determines therapy. Clinicians face the challenge of correctly identifying neurodevelopmental phenotypes, recognizing syndromes, and prioritizing the best candidate genes for testing. However, there is no central repository of definitions for many phenotypes, leading to errors of diagnosis. Additionally, there is no system of levels of evidence linking genes to phenotypes, making it difficult for clinicians to know which genes are most strongly associated with a given condition. We have developed the Developmental Brain Disorders Database (DBDB: https://www.dbdb.urmc.rochester.edu/home), a publicly available, online-curated repository of genes, phenotypes, and syndromes associated with neurodevelopmental disorders. DBDB contains the first referenced ontology of developmental brain phenotypes, and uses a novel system of levels of evidence for gene-phenotype associations. It is intended to assist clinicians in arriving at the correct diagnosis, select the most appropriate genetic test for that phenotype, and improve the care of patients with developmental brain disorders. For researchers interested in the discovery of novel genes for developmental brain disorders, DBDB provides a well-curated source of important genes against which research sequencing results can be compared. Finally, DBDB allows novel observations about the landscape of the neurogenetics knowledge base. PMID:24700709

  4. Strategies for annotation and curation of translational databases: the eTUMOUR project

    PubMed Central

    Julià-Sapé, Margarida; Lurgi, Miguel; Mier, Mariola; Estanyol, Francesc; Rafael, Xavier; Candiota, Ana Paula; Barceló, Anna; García, Alina; Martínez-Bisbal, M. Carmen; Ferrer-Luna, Rubén; Moreno-Torres, Àngel; Celda, Bernardo; Arús, Carles

    2012-01-01

    The eTUMOUR (eT) multi-centre project gathered in vivo and ex vivo magnetic resonance (MR) data, as well as transcriptomic and clinical information from brain tumour patients, with the purpose of improving the diagnostic and prognostic evaluation of future patients. In order to carry this out, among other work, a database—the eTDB—was developed. In addition to complex permission rules and software and management quality control (QC), it was necessary to develop anonymization, processing and data visualization tools for the data uploaded. It was also necessary to develop sophisticated curation strategies that involved, on the one hand, dedicated fields for QC-generated meta-data, specialized queries and global permissions for senior curators and, on the other, a set of metrics to quantify its contents. The indispensable dataset (ID), completeness and pairedness indices were set. The database contains 1317 cases created as a result of the eT project and 304 from a previous project, INTERPRET. The number of cases fulfilling the ID was 656. Completeness and pairedness were heterogeneous, depending on the data type involved. PMID:23180768

  5. NCG 5.0: updates of a manually curated repository of cancer genes and associated properties from cancer mutational screenings

    PubMed Central

    An, Omer; Dall'Olio, Giovanni M.; Mourikis, Thanos P.; Ciccarelli, Francesca D.

    2016-01-01

    The Network of Cancer Genes (NCG, http://ncg.kcl.ac.uk/) is a manually curated repository of cancer genes derived from the scientific literature. Due to the increasing amount of cancer genomic data, we have introduced a more robust procedure to extract cancer genes from published cancer mutational screenings and two curators independently reviewed each publication. NCG release 5.0 (August 2015) collects 1571 cancer genes from 175 published studies that describe 188 mutational screenings of 13 315 cancer samples from 49 cancer types and 24 primary sites. In addition to collecting cancer genes, NCG also provides information on the experimental validation that supports the role of these genes in cancer and annotates their properties (duplicability, evolutionary origin, expression profile, function and interactions with proteins and miRNAs). PMID:26516186

  6. ProPepper: a curated database for identification and analysis of peptide and immune-responsive epitope composition of cereal grain protein families

    PubMed Central

    Juhász, Angéla; Haraszi, Réka; Maulis, Csaba

    2015-01-01

    ProPepper is a database that contains prolamin proteins identified from true grasses (Poaceae), their peptides obtained with single- and multi-enzyme in silico digestions as well as linear T- and B-cell-specific epitopes that are responsible for wheat-related food disorders. The integrated database and analysis platform contains datasets that are collected from multiple public databases (UniprotKB, IEDB, NCBI GenBank), manually curated and annotated, and interpreted in three main data tables: Protein-, Peptide- and Epitope list views that are cross-connected by unique identifications. Altogether 21 genera and 80 different species are represented. Currently, the database contains 2146 unique and complete protein sequences related to 2618 GenBank entries and 35 657 unique peptide sequences that are a result of 575 110 unique digestion events obtained by in silico digestion methods involving six proteolytic enzymes and their combinations. The interface allows advanced global and parametric search functions along with a download option, with direct connections to the relevant public databases. Database URL: https://propepper.net PMID:26450949
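
    A single-enzyme in silico digestion of the kind stored in ProPepper can be sketched with the textbook trypsin rule (cleave after K or R unless followed by P); the input sequence is a toy example, not a ProPepper entry:
```python
# Minimal sketch of an in silico tryptic digestion. Only the standard
# trypsin rule is implemented, and the sequence is a toy example.
import re

def trypsin_digest(protein: str) -> list:
    """Split a protein sequence after K/R except when the next residue is P."""
    return [p for p in re.split(r"(?<=[KR])(?!P)", protein) if p]

toy_sequence = "VRVPVPQLQPQNPSQQQPQEQVPLVQQQQF"  # invented prolamin-like fragment
print(trypsin_digest(toy_sequence))
```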

  7. Biomedical text summarization to support genetic database curation: using Semantic MEDLINE to create a secondary database of genetic information

    PubMed Central

    Fiszman, Marcelo; Hurdle, John F; Rindflesch, Thomas C

    2010-01-01

    Objective: This paper examines the development and evaluation of an automatic summarization system in the domain of molecular genetics. The system is a potential component of an advanced biomedical information management application called Semantic MEDLINE and could assist librarians in developing secondary databases of genetic information extracted from the primary literature. Methods: An existing summarization system was modified for identifying biomedical text relevant to the genetic etiology of disease. The summarization system was evaluated on the task of identifying data describing genes associated with bladder cancer in MEDLINE citations. A gold standard was produced using records from Genetics Home Reference and Online Mendelian Inheritance in Man. Genes in text found by the system were compared to the gold standard. Recall, precision, and F-measure were calculated. Results: The system achieved recall of 46%, and precision of 88% (F-measure = 0.61) by taking Gene References into Function (GeneRIFs) into account. Conclusion: The new summarization schema for genetic etiology has potential as a component in Semantic MEDLINE to support the work of data curators. PMID:20936065
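
    The evaluation described in the Methods and Results can be sketched as a set comparison between system output and the gold standard; the gene lists below are invented, so the scores differ from the reported 46% recall, 88% precision and 0.61 F-measure:
```python
# Minimal sketch of the set-based evaluation: compare genes found by the
# system against a gold-standard list and compute recall, precision and
# F-measure. Gene lists are invented placeholders.

gold = {"FGFR3", "TP53", "RB1", "HRAS", "ERBB2"}
found = {"FGFR3", "TP53", "KRT5"}

tp = len(gold & found)
recall = tp / len(gold)
precision = tp / len(found)
f_measure = 2 * precision * recall / (precision + recall)
print(f"recall={recall:.2f} precision={precision:.2f} F={f_measure:.2f}")
```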

  8. EpimiR: a database of curated mutual regulation between miRNAs and epigenetic modifications.

    PubMed

    Dai, Enyu; Yu, Xuexin; Zhang, Yan; Meng, Fanlin; Wang, Shuyuan; Liu, Xinyi; Liu, Dianming; Wang, Jing; Li, Xia; Jiang, Wei

    2014-01-01

    As two kinds of important gene expression regulators, both epigenetic modification and microRNA (miRNA) can play significant roles in a wide range of human diseases. Recently, many studies have demonstrated that epigenetics and miRNA can affect each other in various ways. In this study, we established the EpimiR database, which collects 1974 regulations between 19 kinds of epigenetic modifications (such as DNA methylation, histone acetylation, H3K4me3, H3S10p) and 617 miRNAs across seven species (including Homo sapiens, Mus musculus, Rattus norvegicus, Gallus gallus, Epstein-Barr virus, Canis familiaris and Arabidopsis thaliana) from >300 references in the literature. These regulations can be divided into two parts: miR2Epi (103 entries describing how miRNA regulates epigenetic modification) and Epi2miR (1871 entries describing how epigenetic modification affects miRNA). Each entry of EpimiR not only contains basic descriptions of the validated experiment (method, species, reference and so on) but also clearly illuminates the regulatory pathway between epigenetics and miRNA. As a supplement to the curated information, EpimiR also gathers predicted epigenetic features (such as predicted transcription start sites and upstream CpG islands) associated with miRNAs to guide users' future biological experiments. Finally, EpimiR offers download and submission pages. Thus, EpimiR provides a fairly comprehensive repository of the mutual regulation between epigenetic modifications and miRNAs, which will promote research on the regulatory mechanisms of epigenetics and miRNA. Database URL: http://bioinfo.hrbmu.edu.cn/EpimiR/. PMID:24682734

  9. NCBI Reference Sequence (RefSeq): a curated non-redundant sequence database of genomes, transcripts and proteins.

    PubMed

    Pruitt, Kim D; Tatusova, Tatiana; Maglott, Donna R

    2005-01-01

    The National Center for Biotechnology Information (NCBI) Reference Sequence (RefSeq) database (http://www.ncbi.nlm.nih.gov/RefSeq/) provides a non-redundant collection of sequences representing genomic data, transcripts and proteins. Although the goal is to provide a comprehensive dataset representing the complete sequence information for any given species, the database pragmatically includes sequence data that are currently publicly available in the archival databases. The database incorporates data from over 2400 organisms and includes over one million proteins representing significant taxonomic diversity spanning prokaryotes, eukaryotes and viruses. Nucleotide and protein sequences are explicitly linked, and the sequences are linked to other resources including the NCBI Map Viewer and Gene. Sequences are annotated to include coding regions, conserved domains, variation, references, names, database cross-references, and other features using a combined approach of collaboration and other input from the scientific community, automated annotation, propagation from GenBank and curation by NCBI staff. PMID:15608248

  10. RNRdb, a curated database of the universal enzyme family ribonucleotide reductase, reveals a high level of misannotation in sequences deposited to Genbank

    PubMed Central

    2009-01-01

    Background Ribonucleotide reductases (RNRs) catalyse the only known de novo pathway for deoxyribonucleotide synthesis, and are therefore essential to DNA-based life. While ribonucleotide reduction has a single evolutionary origin, significant differences between RNRs nevertheless exist, notably in cofactor requirements, subunit composition and allosteric regulation. These differences result in distinct operational constraints (anaerobicity, iron/oxygen dependence and cobalamin dependence), and form the basis for the classification of RNRs into three classes. Description In RNRdb (Ribonucleotide Reductase database), we have collated and curated all known RNR protein sequences with the aim of providing a resource for exploration of RNR diversity and distribution. By comparing expert manual annotations with annotations stored in Genbank, we find that significant inaccuracies exist in larger databases. To our surprise, only 23% of protein sequences included in RNRdb are correctly annotated across the key attributes of class, role and function, with 17% being incorrectly annotated across all three categories. This illustrates the utility of specialist databases for applications where a high degree of annotation accuracy may be important. The database houses information on annotation, distribution and diversity of RNRs, and links to solved RNR structures, and can be searched through a BLAST interface. RNRdb is accessible through a public web interface at http://rnrdb.molbio.su.se. Conclusion RNRdb is a specialist database that provides a reliable annotation and classification resource for RNR proteins, as well as a tool to explore distribution patterns of RNR classes. The recent expansion in available genome sequence data has provided us with a picture of RNR distribution that is more complex than believed only a few years ago; our database indicates that RNRs of all three classes are found across all three cellular domains. Moreover, we find a number of organisms that

  11. PASmiR: a literature-curated database for miRNA molecular regulation in plant response to abiotic stress

    PubMed Central

    2013-01-01

    Background Over 200 published studies of more than 30 plant species have reported a role for miRNAs in regulating responses to abiotic stresses. However, data from these individual reports have not been collected into a single database. The lack of a curated database of stress-related miRNAs limits research in this field, and thus a cohesive database system is needed for data deposit and further application. Description PASmiR, a literature-curated and web-accessible database, was developed to provide detailed, searchable descriptions of miRNA molecular regulation in different plant abiotic stresses. PASmiR currently includes data from ~200 published studies, representing 1038 regulatory relationships between 682 miRNAs and 35 abiotic stresses in 33 plant species. PASmiR’s interface allows users to retrieve miRNA-stress regulatory entries by keyword search using plant species, abiotic stress, and miRNA identifier. Each entry returned by a keyword query contains detailed regulation information for a specific miRNA, including species name, miRNA identifier, stress name, miRNA expression pattern, detection method for miRNA expression, the literature reference, and target gene(s) of the miRNA extracted from the corresponding reference or miRBase. Users can also contribute novel regulatory entries by using a web-based submission page. The PASmiR database is freely accessible at two URLs: http://hi.ustc.edu.cn:8080/PASmiR and http://pcsb.ahau.edu.cn:8080/PASmiR. Conclusion The PASmiR database provides a solid platform for collection, standardization, and searching of miRNA-abiotic stress regulation data in plants. As such, this database will be a comprehensive repository for miRNA regulatory mechanisms involved in plant response to abiotic stresses for the plant stress physiology community. PMID:23448274
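    As a minimal sketch of the kind of keyword retrieval described above, the snippet below filters regulatory entries by species and stress. The field names and the two records are hypothetical and do not reflect PASmiR's actual schema or content.

```python
# Hypothetical entries and field names illustrating keyword retrieval of
# miRNA-abiotic stress regulation records; not PASmiR's actual schema or data.
entries = [
    {"species": "Oryza sativa", "miRNA": "miR393", "stress": "drought",
     "expression": "up-regulated", "method": "qRT-PCR", "targets": ["TIR1"]},
    {"species": "Arabidopsis thaliana", "miRNA": "miR398", "stress": "salinity",
     "expression": "down-regulated", "method": "northern blot", "targets": ["CSD1", "CSD2"]},
]

def search(records, **criteria):
    """Return records whose fields contain every given keyword (case-insensitive)."""
    def matches(record):
        return all(str(value).lower() in str(record.get(field, "")).lower()
                   for field, value in criteria.items())
    return [r for r in records if matches(r)]

print(search(entries, species="arabidopsis", stress="salinity"))
```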

  12. Facts from text: can text mining help to scale-up high-quality manual curation of gene products with ontologies?

    PubMed

    Winnenburg, Rainer; Wächter, Thomas; Plake, Conrad; Doms, Andreas; Schroeder, Michael

    2008-11-01

    The biomedical literature can be seen as a large, integrated but unstructured data repository. Extracting facts from literature and making them accessible is approached from two directions: manual curation efforts develop ontologies and vocabularies to annotate gene products based on statements in papers. Text mining aims to automatically identify entities and their relationships in text using information retrieval and natural language processing techniques. Manual curation is highly accurate but time-consuming, and does not scale with the ever-increasing growth of the literature. Text mining as a high-throughput computational technique scales well, but is error-prone due to the complexity of natural language. How can both be married to combine scalability and accuracy? Here, we review the state-of-the-art text mining approaches that are relevant to annotation and discuss available online services analysing biomedical literature by means of text mining techniques, which could also be utilised by annotation projects. We then examine how far text mining has already been utilised in existing annotation projects and conclude how these techniques could be tightly integrated into the manual annotation process through novel authoring systems to scale up high-quality manual curation. PMID:19060303

  13. Improving the Discoverability and Availability of Sample Data and Imagery in NASA's Astromaterials Curation Digital Repository Using a New Common Architecture for Sample Databases

    NASA Technical Reports Server (NTRS)

    Todd, N. S.; Evans, C.

    2015-01-01

    The Astromaterials Acquisition and Curation Office at NASA's Johnson Space Center (JSC) is the designated facility for curating all of NASA's extraterrestrial samples. The suite of collections includes the lunar samples from the Apollo missions, cosmic dust particles falling into the Earth's atmosphere, meteorites collected in Antarctica, comet and interstellar dust particles from the Stardust mission, asteroid particles from the Japanese Hayabusa mission, and solar wind atoms collected during the Genesis mission. To support planetary science research on these samples, NASA's Astromaterials Curation Office hosts the Astromaterials Curation Digital Repository, which provides descriptions of the missions and collections, and critical information about each individual sample. Our office is implementing several informatics initiatives with the goal of better serving the planetary research community. One of these initiatives aims to increase the availability and discoverability of sample data and images through the use of a newly designed common architecture for Astromaterials Curation databases.

  14. The Rat Genome Database curation tool suite: a set of optimized software tools enabling efficient acquisition, organization, and presentation of biological data

    PubMed Central

    Laulederkind, Stanley J. F.; Shimoyama, Mary; Hayman, G. Thomas; Lowry, Timothy F.; Nigam, Rajni; Petri, Victoria; Smith, Jennifer R.; Wang, Shur-Jen; de Pons, Jeff; Kowalski, George; Liu, Weisong; Rood, Wes; Munzenmaier, Diane H.; Dwinell, Melinda R.; Twigger, Simon N.; Jacob, Howard J.

    2011-01-01

    The Rat Genome Database (RGD) is the premier repository of rat genomic and genetic data and currently houses over 40 000 rat gene records as well as human and mouse orthologs, 1771 rat and 1911 human quantitative trait loci (QTLs) and 2209 rat strains. Biological information curated for these data objects includes disease associations, phenotypes, pathways, molecular functions, biological processes and cellular components. A suite of tools has been developed to aid curators in acquiring and validating data objects, assigning nomenclature, attaching biological information to objects and making connections among data types. The software used to assign nomenclature, to create and edit objects and to make annotations to the data objects has been specifically designed to make the curation process as fast and efficient as possible. The user interfaces have been adapted to the work routines of the curators, creating a suite of tools that is intuitive and powerful. Database URL: http://rgd.mcw.edu PMID:21321022

  15. UNSODA UNSATURATED SOIL HYDRAULIC DATABASE USER'S MANUAL VERSION 1.0

    EPA Science Inventory

    This report contains general documentation and serves as a user manual of the UNSODA program. UNSODA is a database of unsaturated soil hydraulic properties (water retention, hydraulic conductivity, and soil water diffusivity), basic soil properties (particle-size distribution, b...

  16. MALIDUP: a database of manually constructed structure alignments for duplicated domain pairs.

    PubMed

    Cheng, Hua; Kim, Bong-Hyun; Grishin, Nick V

    2008-03-01

    We describe MALIDUP (manual alignments of duplicated domains), a database of 241 pairwise structure alignments for homologous domains originated by internal duplication within the same polypeptide chain. Since duplicated domains within a protein frequently diverge in function and thus in sequence, this would be the first database of structurally similar homologs that is not strongly biased by sequence or functional similarity. Our manual alignments in most cases agree with the automatic structural alignments generated by several commonly used programs. This carefully constructed database could be used in studies on protein evolution and as a reference for testing structure alignment programs. The database is available at http://prodata.swmed.edu/malidup. PMID:17932926

  17. Data Curation for the Exploitation of Large Earth Observation Products Databases - The MEA system

    NASA Astrophysics Data System (ADS)

    Mantovani, Simone; Natali, Stefano; Barboni, Damiano; Cavicchi, Mario; Della Vecchia, Andrea

    2014-05-01

    National space agencies, under the umbrella of the European Space Agency, are working intensively to provide solutions for the management and exploitation of Big Data and the related knowledge (metadata, software tools and services). The continuously increasing amount of long-term and historic data held in EO facilities in the form of online datasets and archives, the incoming satellite observation platforms that will generate an impressive amount of new data, and the new EU approach to data distribution policy make it necessary to address technologies for the long-term management of these data sets, including their consolidation, preservation, distribution, continuation and curation across multiple missions. The management of long EO data time series from continuing or historic missions - with more than 20 years of data available already today - requires technical solutions and technologies which differ considerably from the ones exploited by existing systems. Several tools, both open source and commercial, already provide technologies to handle data and metadata preparation, access and visualization via OGC standard interfaces. This study describes the Multi-sensor Evolution Analysis (MEA) system and the Data Curation concept as approached and implemented within the ASIM and EarthServer projects, funded by the European Space Agency and the European Commission, respectively.

  18. Selective databases distributed on the basis of Frascati manual

    PubMed Central

    Kujundzic, Enes; Masic, Izet

    2013-01-01

    Introduction The answer to the question of what a database is and what its relevance to scientific research might be is not a simple one. We would not be wrong to say that it is, basically, a kind of information resource, often incomparably richer than, for example, a single book or journal. Discussion and conclusion Databases appeared, as a form of knowledge storage and retrieval, in the information age in which we are all participants and witnesses. Thanks to the technical possibilities of information networks, a database can be searched for a wealth of more or less relevant information, including scientific content. Databases are divided into bibliographic databases, citation databases and full-text databases. The paper briefly presents the most important online databases together with their website links. Thanks to these online databases, scientific knowledge spreads far more easily and usefully. PMID:23572867

  19. Practical guidelines addressing ethical issues pertaining to the curation of human locus-specific variation databases (LSDBs)

    PubMed Central

    Povey, Sue; Al Aqeel, Aida I; Cambon-Thomsen, Anne; Dalgleish, Raymond; den Dunnen, Johan T; Firth, Helen V; Greenblatt, Marc S; Barash, Carol Isaacson; Parker, Michael; Patrinos, George P; Savige, Judith; Sobrido, Maria-Jesus; Winship, Ingrid; Cotton, Richard GH

    2010-01-01

    More than 1,000 Web-based locus-specific variation databases (LSDBs) are listed on the Website of the Human Genetic Variation Society (HGVS). These individual efforts, which often relate phenotype to genotype, are a valuable source of information for clinicians, patients, and their families, as well as for basic research. The initiators of the Human Variome Project recently recognized that having access to some of the immense resources of unpublished information already present in diagnostic laboratories would provide critical data to help manage genetic disorders. However, there are significant ethical issues involved in sharing these data worldwide. An international working group presents second-generation guidelines addressing ethical issues relating to the curation of human LSDBs that provide information via a Web-based interface. It is intended that these should help current and future curators and may also inform the future decisions of ethics committees and legislators. These guidelines have been reviewed by the Ethics Committee of the Human Genome Organization (HUGO). Hum Mutat 31:–6, 2010. © 2010 Wiley-Liss, Inc. PMID:20683926

  20. R-Syst::diatom: an open-access and curated barcode database for diatoms and freshwater monitoring.

    PubMed

    Rimet, Frédéric; Chaumeil, Philippe; Keck, François; Kermarrec, Lenaïg; Vasselon, Valentin; Kahlert, Maria; Franc, Alain; Bouchez, Agnès

    2016-01-01

    Diatoms are micro-algal indicators of freshwater pollution. Current standardized methodologies are based on microscopic determinations, which are time consuming and prone to identification uncertainties. The use of DNA-barcoding has been proposed as a way to avoid these flaws. Combining barcoding with next-generation sequencing enables collection of a large quantity of barcodes from natural samples. These barcodes are identified as certain diatom taxa by comparing the sequences to a reference barcoding library using algorithms. Proof of concept was recently demonstrated for synthetic and natural communities and underlined the importance of the quality of this reference library. We present an open-access and curated reference barcoding database for diatoms, called R-Syst::diatom, developed in the framework of R-Syst, the network of systematics supported by INRA (French National Institute for Agricultural Research), see http://www.rsyst.inra.fr/en. R-Syst::diatom links DNA-barcodes to their taxonomical identifications, and is dedicated to identifying barcodes from natural samples. The data come from two sources: a culture collection of freshwater algae maintained at INRA, in which new strains are regularly deposited and barcoded, and the NCBI (National Center for Biotechnology Information) nucleotide database. Two kinds of barcodes were chosen to support the database: 18S (18S ribosomal RNA) and rbcL (Ribulose-1,5-bisphosphate carboxylase/oxygenase), because of their efficiency. Data are curated using innovative (Declic) and classical bioinformatic tools (BLAST, classical phylogenies) and up-to-date taxonomy (catalogues and peer-reviewed papers). Every 6 months R-Syst::diatom is updated. The database is available through the R-Syst microalgae website (http://www.rsyst.inra.fr/) and a platform dedicated to next-generation sequencing data analysis, virtual_BiodiversityL@b (https://galaxy-pgtp.pierroton.inra.fr/). We present here the content of the library regarding the

  1. R-Syst::diatom: an open-access and curated barcode database for diatoms and freshwater monitoring

    PubMed Central

    Rimet, Frédéric; Chaumeil, Philippe; Keck, François; Kermarrec, Lenaïg; Vasselon, Valentin; Kahlert, Maria; Franc, Alain; Bouchez, Agnès

    2016-01-01

    Diatoms are micro-algal indicators of freshwater pollution. Current standardized methodologies are based on microscopic determinations, which are time consuming and prone to identification uncertainties. The use of DNA-barcoding has been proposed as a way to avoid these flaws. Combining barcoding with next-generation sequencing enables collection of a large quantity of barcodes from natural samples. These barcodes are identified as certain diatom taxa by comparing the sequences to a reference barcoding library using algorithms. Proof of concept was recently demonstrated for synthetic and natural communities and underlined the importance of the quality of this reference library. We present an open-access and curated reference barcoding database for diatoms, called R-Syst::diatom, developed in the framework of R-Syst, the network of systematics supported by INRA (French National Institute for Agricultural Research), see http://www.rsyst.inra.fr/en. R-Syst::diatom links DNA-barcodes to their taxonomical identifications, and is dedicated to identifying barcodes from natural samples. The data come from two sources: a culture collection of freshwater algae maintained at INRA, in which new strains are regularly deposited and barcoded, and the NCBI (National Center for Biotechnology Information) nucleotide database. Two kinds of barcodes were chosen to support the database: 18S (18S ribosomal RNA) and rbcL (Ribulose-1,5-bisphosphate carboxylase/oxygenase), because of their efficiency. Data are curated using innovative (Declic) and classical bioinformatic tools (BLAST, classical phylogenies) and up-to-date taxonomy (catalogues and peer-reviewed papers). Every 6 months R-Syst::diatom is updated. The database is available through the R-Syst microalgae website (http://www.rsyst.inra.fr/) and a platform dedicated to next-generation sequencing data analysis, virtual_BiodiversityL@b (https://galaxy-pgtp.pierroton.inra.fr/). We present here the content of the library regarding the

  2. NEMiD: A Web-Based Curated Microbial Diversity Database with Geo-Based Plotting

    PubMed Central

    Bhattacharjee, Kaushik; Joshi, Santa Ram

    2014-01-01

    The majority of the Earth's microbes remain unknown, and their potential utility cannot be exploited until they are discovered and characterized. They provide wide scope for the development of new strains as well as biotechnological uses. The documentation and bioprospection of microorganisms carry enormous significance considering their relevance to human welfare. There is thus an urgent need to develop a database with emphasis on the microbial diversity of the largest untapped reservoirs in the biosphere. The data annotated in the North-East India Microbial database (NEMiD) were obtained by the isolation and characterization of microbes from different parts of the Eastern Himalayan region. The database was constructed as a relational database management system (RDBMS) for data storage in MySQL in the back-end on a Linux server and implemented in an Apache/PHP environment. This database provides a base for understanding the soil microbial diversity pattern in this megabiodiversity hotspot and indicates the distribution patterns of various organisms along with identification. The NEMiD database is freely available at www.mblabnehu.info/nemid/. PMID:24714636

  3. NEMiD: a web-based curated microbial diversity database with geo-based plotting.

    PubMed

    Bhattacharjee, Kaushik; Joshi, Santa Ram

    2014-01-01

    The majority of the Earth's microbes remain unknown, and their potential utility cannot be exploited until they are discovered and characterized. They provide wide scope for the development of new strains as well as biotechnological uses. The documentation and bioprospection of microorganisms carry enormous significance considering their relevance to human welfare. There is thus an urgent need to develop a database with emphasis on the microbial diversity of the largest untapped reservoirs in the biosphere. The data annotated in the North-East India Microbial database (NEMiD) were obtained by the isolation and characterization of microbes from different parts of the Eastern Himalayan region. The database was constructed as a relational database management system (RDBMS) for data storage in MySQL in the back-end on a Linux server and implemented in an Apache/PHP environment. This database provides a base for understanding the soil microbial diversity pattern in this megabiodiversity hotspot and indicates the distribution patterns of various organisms along with identification. The NEMiD database is freely available at www.mblabnehu.info/nemid/. PMID:24714636
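    The abstract above describes a classical relational layout (MySQL behind an Apache/PHP front end). The sketch below illustrates such a layout with Python's built-in sqlite3 standing in for the MySQL back end; the table and column names, coordinates and records are hypothetical, not NEMiD's actual schema.

```python
# A minimal relational layout for geo-referenced microbial isolate records, sketched
# with sqlite3 in place of a MySQL back end. All names and values are hypothetical.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE site (
    site_id   INTEGER PRIMARY KEY,
    locality  TEXT,
    latitude  REAL,
    longitude REAL
);
CREATE TABLE isolate (
    isolate_id INTEGER PRIMARY KEY,
    taxon      TEXT,
    phylum     TEXT,
    site_id    INTEGER REFERENCES site(site_id)
);
""")
conn.execute("INSERT INTO site VALUES (1, 'Eastern Himalayan forest soil', 27.3, 92.0)")
conn.execute("INSERT INTO isolate VALUES (1, 'Bacillus sp.', 'Firmicutes', 1)")

# Geo-based lookup: which isolates come from which locality?
for row in conn.execute("""
    SELECT i.taxon, s.locality, s.latitude, s.longitude
    FROM isolate i JOIN site s USING (site_id)
"""):
    print(row)
```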

  4. TWRS information locator database system administrator's manual

    SciTech Connect

    Knutson, B.J., Westinghouse Hanford

    1996-09-13

    This document is a guide for use by the Tank Waste Remediation System (TWRS) Information Locator Database (ILD) System Administrator. The TWRS ILD System is an inventory of information used in the TWRS Systems Engineering process to represent the TWRS Technical Baseline. The inventory is maintained in the form of a relational database developed in Paradox 4.5.

  5. Classifying the bacterial gut microbiota of termites and cockroaches: A curated phylogenetic reference database (DictDb).

    PubMed

    Mikaelyan, Aram; Köhler, Tim; Lampert, Niclas; Rohland, Jeffrey; Boga, Hamadi; Meuser, Katja; Brune, Andreas

    2015-10-01

    Recent developments in sequencing technology have given rise to a large number of studies that assess bacterial diversity and community structure in termite and cockroach guts based on large amplicon libraries of 16S rRNA genes. Although these studies have revealed important ecological and evolutionary patterns in the gut microbiota, classification of the short sequence reads is limited by the taxonomic depth and resolution of the reference databases used in the respective studies. Here, we present a curated reference database for accurate taxonomic analysis of the bacterial gut microbiota of dictyopteran insects. The Dictyopteran gut microbiota reference Database (DictDb) is based on the Silva database but was significantly expanded by the addition of clones from 11 mostly unexplored termite and cockroach groups, which increased the inventory of bacterial sequences from dictyopteran guts by 26%. The taxonomic depth and resolution of DictDb was significantly improved by a general revision of the taxonomic guide tree for all important lineages, including a detailed phylogenetic analysis of the Treponema and Alistipes complexes, the Fibrobacteres, and the TG3 phylum. The performance of this first documented version of DictDb (v. 3.0) using the revised taxonomic guide tree in the classification of short-read libraries obtained from termites and cockroaches was highly superior to that of the current Silva and RDP databases. DictDb uses an informative nomenclature that is consistent with the literature also for clades of uncultured bacteria and provides an invaluable tool for anyone exploring the gut community structure of termites and cockroaches. PMID:26283320
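    Classifying short amplicon reads against a curated reference such as DictDb is normally done with dedicated tools (for example naive Bayesian classifiers trained on the taxonomic guide tree). As a deliberately naive stand-in for those tools, the sketch below assigns a read to the reference taxon sharing the largest fraction of its k-mers; the reference sequences and the read are toy examples, not DictDb entries.

```python
# A naive k-mer based stand-in for the classifiers actually used with curated 16S
# reference databases. Toy reference sequences and read; illustration only.
def kmers(seq, k=8):
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}

references = {
    "Treponema cluster I": "ATGGCGTACGTTAGCCGTTAACGGATCCGTTAGCAATGCC",
    "Alistipes complex":   "TTGACCGTAGGCTTAACGGTACCGATCGGATTACCGGTAA",
}

def classify(read, references, k=8):
    read_kmers = kmers(read, k)
    scores = {taxon: len(read_kmers & kmers(ref, k)) / len(read_kmers)
              for taxon, ref in references.items()}
    return max(scores, key=scores.get), scores

read = "GCGTACGTTAGCCGTTAACGGATCC"
taxon, scores = classify(read, references)
print(taxon, scores)  # best-scoring taxon and the per-taxon k-mer overlap fractions
```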

  6. The Coral Trait Database, a curated database of trait information for coral species from the global oceans.

    PubMed

    Madin, Joshua S; Anderson, Kristen D; Andreasen, Magnus Heide; Bridge, Tom C L; Cairns, Stephen D; Connolly, Sean R; Darling, Emily S; Diaz, Marcela; Falster, Daniel S; Franklin, Erik C; Gates, Ruth D; Hoogenboom, Mia O; Huang, Danwei; Keith, Sally A; Kosnik, Matthew A; Kuo, Chao-Yang; Lough, Janice M; Lovelock, Catherine E; Luiz, Osmar; Martinelli, Julieta; Mizerek, Toni; Pandolfi, John M; Pochon, Xavier; Pratchett, Morgan S; Putnam, Hollie M; Roberts, T Edward; Stat, Michael; Wallace, Carden C; Widman, Elizabeth; Baird, Andrew H

    2016-01-01

    Trait-based approaches advance ecological and evolutionary research because traits provide a strong link to an organism's function and fitness. Trait-based research might lead to a deeper understanding of the functions of, and services provided by, ecosystems, thereby improving management, which is vital in the current era of rapid environmental change. Coral reef scientists have long collected trait data for corals; however, these are difficult to access and often under-utilized in addressing large-scale questions. We present the Coral Trait Database initiative that aims to bring together physiological, morphological, ecological, phylogenetic and biogeographic trait information into a single repository. The database houses species- and individual-level data from published field and experimental studies alongside contextual data that provide important framing for analyses. In this data descriptor, we release data for 56 traits for 1547 species, and present a collaborative platform on which other trait data are being actively federated. Our overall goal is for the Coral Trait Database to become an open-source, community-led data clearinghouse that accelerates coral reef research. PMID:27023900

  7. The Coral Trait Database, a curated database of trait information for coral species from the global oceans

    PubMed Central

    Madin, Joshua S.; Anderson, Kristen D.; Andreasen, Magnus Heide; Bridge, Tom C.L.; Cairns, Stephen D.; Connolly, Sean R.; Darling, Emily S.; Diaz, Marcela; Falster, Daniel S.; Franklin, Erik C.; Gates, Ruth D.; Hoogenboom, Mia O.; Huang, Danwei; Keith, Sally A.; Kosnik, Matthew A.; Kuo, Chao-Yang; Lough, Janice M.; Lovelock, Catherine E.; Luiz, Osmar; Martinelli, Julieta; Mizerek, Toni; Pandolfi, John M.; Pochon, Xavier; Pratchett, Morgan S.; Putnam, Hollie M.; Roberts, T. Edward; Stat, Michael; Wallace, Carden C.; Widman, Elizabeth; Baird, Andrew H.

    2016-01-01

    Trait-based approaches advance ecological and evolutionary research because traits provide a strong link to an organism’s function and fitness. Trait-based research might lead to a deeper understanding of the functions of, and services provided by, ecosystems, thereby improving management, which is vital in the current era of rapid environmental change. Coral reef scientists have long collected trait data for corals; however, these are difficult to access and often under-utilized in addressing large-scale questions. We present the Coral Trait Database initiative that aims to bring together physiological, morphological, ecological, phylogenetic and biogeographic trait information into a single repository. The database houses species- and individual-level data from published field and experimental studies alongside contextual data that provide important framing for analyses. In this data descriptor, we release data for 56 traits for 1547 species, and present a collaborative platform on which other trait data are being actively federated. Our overall goal is for the Coral Trait Database to become an open-source, community-led data clearinghouse that accelerates coral reef research. PMID:27023900

  8. Database Changes (Post-Publication). ERIC Processing Manual, Section X.

    ERIC Educational Resources Information Center

    Brandhorst, Ted, Ed.

    The purpose of this section is to specify the procedure for making changes to the ERIC database after the data involved have been announced in the abstract journals RIE or CIJE. As a matter of general ERIC policy, a document or journal article is not re-announced or re-entered into the database as a new accession for the purpose of accomplishing a…

  9. Nuclear Energy Infrastructure Database Description and User’s Manual

    SciTech Connect

    Heidrich, Brenden

    2015-11-01

    In 2014, the Deputy Assistant Secretary for Science and Technology Innovation initiated the Nuclear Energy (NE)–Infrastructure Management Project by tasking the Nuclear Science User Facilities, formerly the Advanced Test Reactor National Scientific User Facility, to create a searchable and interactive database of all pertinent NE-supported and -related infrastructure. This database, known as the Nuclear Energy Infrastructure Database (NEID), is used for analyses to establish needs, redundancies, efficiencies, distributions, etc., to best understand the utility of NE’s infrastructure and inform the content of infrastructure calls. The Nuclear Science User Facilities developed the database by utilizing data and policy direction from a variety of reports from the U.S. Department of Energy, the National Research Council, the International Atomic Energy Agency, and various other federal and civilian resources. The NEID currently contains data on 802 research and development instruments housed in 377 facilities at 84 institutions in the United States and abroad. The effort to maintain and expand the database is ongoing. Detailed information on many facilities must be gathered from associated institutions and added to complete the database. The data must be validated and kept current to capture facility and instrumentation status as well as to cover new acquisitions and retirements. This document provides a short tutorial on the navigation of the NEID web portal at NSUF-Infrastructure.INL.gov.

  10. DREMECELS: A Curated Database for Base Excision and Mismatch Repair Mechanisms Associated Human Malignancies

    PubMed Central

    Shukla, Ankita; Singh, Tiratha Raj

    2016-01-01

    DNA repair mechanisms act as warriors combating the various damaging processes that lead to critical malignancies. DREMECELS was designed around the malignancies with frequent alterations in DNA repair pathways, that is, colorectal and endometrial cancers associated with Lynch syndrome (also known as HNPCC). Since Lynch syndrome carries a high risk (~40–60%) for both cancers, we decided to cover all three diseases in this portal. Although a large population is presently affected by these malignancies and many resources are available for various cancer types, no database archives information on the genes specific to these cancers and disorders. The database contains 156 genes and two repair mechanisms, base excision repair (BER) and mismatch repair (MMR). Other parameters include some of the regulatory processes that play roles in the progression of these diseases due to incompetent repair mechanisms, specifically BER and MMR. Our database mainly provides qualitative and quantitative information on these cancer types along with methylation, drug sensitivity, miRNA, copy number variation (CNV) and somatic mutation data. This database serves the scientific community by providing integrated information on these disease types, thus supporting diagnostic and therapeutic processes. The repository should be an excellent companion for researchers and biomedical professionals and facilitate the understanding of such critical diseases. DREMECELS is publicly available at http://www.bioinfoindia.org/dremecels. PMID:27276067

  11. dbEM: A database of epigenetic modifiers curated from cancerous and normal genomes

    PubMed Central

    Singh Nanda, Jagpreet; Kumar, Rahul; Raghava, Gajendra P. S.

    2016-01-01

    We have developed a database called dbEM (database of Epigenetic Modifiers) to maintain the genomic information of about 167 epigenetic modifiers/proteins, which are considered potential cancer targets. In dbEM, modifiers are classified on a functional basis and comprise 48 histone methyltransferases, 33 chromatin remodelers and 31 histone demethylases. dbEM maintains genomic information such as mutations, copy number variation and gene expression in thousands of tumor samples, cancer cell lines and healthy samples. This information is obtained from public resources, viz. COSMIC, CCLE and the 1000 Genomes Project. Gene essentiality data retrieved from the COLT database further highlight the importance of various epigenetic proteins for cancer survival. We have also reported the sequence profiles, tertiary structures and post-translational modifications of these epigenetic proteins in cancer. dbEM also contains information on 54 drug molecules against different epigenetic proteins. A wide range of tools has been integrated into dbEM, e.g. search, BLAST, alignment and profile-based prediction. In our analysis, we found that the epigenetic proteins DNMT3A, HDAC2, KDM6A and TET2 are highly mutated in a variety of cancers. We are confident that dbEM will be very useful in cancer research, particularly in the field of epigenetic protein-based cancer therapeutics. This database is available to the public at URL: http://crdd.osdd.net/raghava/dbem. PMID:26777304

  12. BioModels Database: An enhanced, curated and annotated resource for published quantitative kinetic models

    PubMed Central

    2010-01-01

    Background Quantitative models of biochemical and cellular systems are used to answer a variety of questions in the biological sciences. The number of published quantitative models is growing steadily thanks to increasing interest in the use of models as well as the development of improved software systems and the availability of better, cheaper computer hardware. To maximise the benefits of this growing body of models, the field needs centralised model repositories that will encourage, facilitate and promote model dissemination and reuse. Ideally, the models stored in these repositories should be extensively tested and encoded in community-supported and standardised formats. In addition, the models and their components should be cross-referenced with other resources in order to allow their unambiguous identification. Description BioModels Database http://www.ebi.ac.uk/biomodels/ is aimed at addressing exactly these needs. It is a freely-accessible online resource for storing, viewing, retrieving, and analysing published, peer-reviewed quantitative models of biochemical and cellular systems. The structure and behaviour of each simulation model distributed by BioModels Database are thoroughly checked; in addition, model elements are annotated with terms from controlled vocabularies as well as linked to relevant data resources. Models can be examined online or downloaded in various formats. Reaction network diagrams generated from the models are also available in several formats. BioModels Database also provides features such as online simulation and the extraction of components from large scale models into smaller submodels. Finally, the system provides a range of web services that external software systems can use to access up-to-date data from the database. Conclusions BioModels Database has become a recognised reference resource for systems biology. It is being used by the community in a variety of ways; for example, it is used to benchmark different simulation

  13. Proteome-wide post-translational modification statistics: frequency analysis and curation of the swiss-prot database.

    PubMed

    Khoury, George A; Baliban, Richard C; Floudas, Christodoulos A

    2011-09-13

    Post-translational modifications (PTMs) broadly contribute to the recent explosion of proteomic data and possess a complexity surpassing that of protein design. PTMs are the chemical modification of a protein after its translation, and have wide effects broadening its range of functionality. Based on previous estimates, it is widely believed that more than half of proteins are glycoproteins. Whereas mutations can only occur once per position, different forms of post-translational modifications may occur in tandem. With the number and abundances of modifications constantly being discovered, there is no method to readily assess their relative levels. Here we report the relative abundances of each PTM found experimentally and putatively, from high-quality, manually curated, proteome-wide data, and show that at best, less than one-fifth of proteins are glycosylated. We make available to the academic community a continuously updated resource (http://selene.princeton.edu/PTMCuration) containing the statistics so scientists can assess "how many" of each PTM exists. PMID:22034591
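    As an illustration of the frequency analysis described above, the sketch below counts the relative abundance of each modification type and the fraction of glycosylated proteins from a handful of hypothetical (protein, PTM) annotation pairs; it is not the authors' pipeline or data.

```python
# Frequency analysis over hypothetical per-protein PTM annotations: relative abundance
# of each modification type and fraction of glycosylated proteins. Illustration only.
from collections import Counter

# (protein accession, PTM type) pairs as they might be extracted from curated records
annotations = [
    ("P00001", "Phosphoserine"), ("P00001", "N-linked glycosylation"),
    ("P00002", "Phosphothreonine"), ("P00003", "Acetyllysine"),
    ("P00003", "Phosphoserine"), ("P00004", "Phosphoserine"),
]

ptm_counts = Counter(ptm for _, ptm in annotations)
total = sum(ptm_counts.values())
for ptm, count in ptm_counts.most_common():
    print(f"{ptm}: {count / total:.0%} of all PTM annotations")

glycosylated = {acc for acc, ptm in annotations if "glycosylation" in ptm.lower()}
proteins = {acc for acc, _ in annotations}
print(f"Glycosylated proteins: {len(glycosylated) / len(proteins):.0%}")
```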

  14. TRENDS: A flight test relational database user's guide and reference manual

    NASA Technical Reports Server (NTRS)

    Bondi, M. J.; Bjorkman, W. S.; Cross, J. L.

    1994-01-01

    This report is designed to be a user's guide and reference manual for users intending to access rotorcraft test data via TRENDS, the relational database system which was developed as a tool for the aeronautical engineer with no programming background. This report has been written to assist novice and experienced TRENDS users. TRENDS is a complete system for retrieving, searching, and analyzing both numerical and narrative data, and for displaying time history and statistical data in graphical and numerical formats. This manual provides a 'guided tour' and a 'user's guide' for the new and intermediate-skilled users. Examples of the use of each menu item within TRENDS are provided in the Menu Reference section of the manual, including full coverage of TIMEHIST, one of the key tools. This manual is written around the XV-15 Tilt Rotor database, but does include an appendix on the UH-60 Blackhawk database. This user's guide and reference manual establishes a referable source for the research community and augments NASA TM-101025, TRENDS: The Aeronautical Post-Test Database Management System, Jan. 1990, written by the same authors.

  15. ODIN. Online Database Information Network: ODIN Policy & Procedure Manual.

    ERIC Educational Resources Information Center

    Townley, Charles T.; And Others

    Policies and procedures are outlined for the Online Database Information Network (ODIN), a cooperative of libraries in south-central Pennsylvania, which was organized to improve library services through technology. The first section covers organization and goals, members, and responsibilities of the administrative council and libraries. Patrons…

  16. The FOCUS Database: The Nation's Premier Resource in Dropout Prevention. Instruction Manual.

    ERIC Educational Resources Information Center

    National Dropout Prevention Center, Clemson, SC.

    This booklet is an instruction manual for those using the FOCUS database, an information source on dropout prevention of the National Dropout Prevention Center. An introduction lists the FOCUS files, which include Program Profiles, Calendar of Events, Resource Materials Library, Organizations, and Consultants and Speakers. Also given is the…

  17. The development of an Ada programming support environment database: SEAD (Software Engineering and Ada Database), user's manual

    NASA Technical Reports Server (NTRS)

    Liaw, Morris; Evesson, Donna

    1988-01-01

    This is a manual for users of the Software Engineering and Ada Database (SEAD). SEAD was developed to provide an information resource to NASA and NASA contractors with respect to Ada-based resources and activities that are available or underway either in NASA or elsewhere in the worldwide Ada community. The sharing of such information will reduce the duplication of effort while improving quality in the development of future software systems. The manual describes the organization of the data in SEAD, the user interface from logging in to logging out, and concludes with a ten chapter tutorial on how to use the information in SEAD. Two appendices provide quick reference for logging into SEAD and using the keyboard of an IBM 3270 or VT100 computer terminal.

  18. Development of an Ada programming support environment database SEAD (Software Engineering and Ada Database) administration manual

    NASA Technical Reports Server (NTRS)

    Liaw, Morris; Evesson, Donna

    1988-01-01

    Software Engineering and Ada Database (SEAD) was developed to provide an information resource to NASA and NASA contractors with respect to Ada-based resources and activities which are available or underway either in NASA or elsewhere in the worldwide Ada community. The sharing of such information will reduce duplication of effort while improving quality in the development of future software systems. SEAD data is organized into five major areas: information regarding education and training resources which are relevant to the life cycle of Ada-based software engineering projects such as those in the Space Station program; research publications relevant to NASA projects such as the Space Station Program and conferences relating to Ada technology; the latest progress reports on Ada projects completed or in progress both within NASA and throughout the free world; Ada compilers and other commercial products that support Ada software development; and reusable Ada components generated both within NASA and from elsewhere in the free world. This classified listing of reusable components shall include descriptions of tools, libraries, and other components of interest to NASA. Sources for the data include technical newsletters and periodicals, conference proceedings, the Ada Information Clearinghouse, product vendors, and project sponsors and contractors.

  19. Genetic Variations and Diseases in UniProtKB/Swiss-Prot: The Ins and Outs of Expert Manual Curation

    PubMed Central

    Famiglietti, Maria Livia; Estreicher, Anne; Gos, Arnaud; Bolleman, Jerven; Géhant, Sébastien; Breuza, Lionel; Bridge, Alan; Poux, Sylvain; Redaschi, Nicole; Bougueleret, Lydie; Xenarios, Ioannis

    2014-01-01

    During the last few years, next-generation sequencing (NGS) technologies have accelerated the detection of genetic variants resulting in the rapid discovery of new disease-associated genes. However, the wealth of variation data made available by NGS alone is not sufficient to understand the mechanisms underlying disease pathogenesis and manifestation. Multidisciplinary approaches combining sequence and clinical data with prior biological knowledge are needed to unravel the role of genetic variants in human health and disease. In this context, it is crucial that these data are linked, organized, and made readily available through reliable online resources. The Swiss-Prot section of the Universal Protein Knowledgebase (UniProtKB/Swiss-Prot) provides the scientific community with a collection of information on protein functions, interactions, biological pathways, as well as human genetic diseases and variants, all manually reviewed by experts. In this article, we present an overview of the information content of UniProtKB/Swiss-Prot to show how this knowledgebase can support researchers in the elucidation of the mechanisms leading from a molecular defect to a disease phenotype. PMID:24848695

  20. PFR²: a curated database of planktonic foraminifera 18S ribosomal DNA as a resource for studies of plankton ecology, biogeography and evolution.

    PubMed

    Morard, Raphaël; Darling, Kate F; Mahé, Frédéric; Audic, Stéphane; Ujiié, Yurika; Weiner, Agnes K M; André, Aurore; Seears, Heidi A; Wade, Christopher M; Quillévéré, Frédéric; Douady, Christophe J; Escarguel, Gilles; de Garidel-Thoron, Thibault; Siccha, Michael; Kucera, Michal; de Vargas, Colomban

    2015-11-01

    Planktonic foraminifera (Rhizaria) are ubiquitous marine pelagic protists producing calcareous shells with conspicuous morphology. They play an important role in the marine carbon cycle, and their exceptional fossil record serves as the basis for biochronostratigraphy and past climate reconstructions. A major worldwide sampling effort over the last two decades has resulted in the establishment of multiple large collections of cryopreserved individual planktonic foraminifera samples. Thousands of 18S rDNA partial sequences have been generated, representing all major known morphological taxa across their worldwide oceanic range. This comprehensive data coverage provides an opportunity to assess patterns of molecular ecology and evolution in a holistic way for an entire group of planktonic protists. We combined all available published and unpublished genetic data to build PFR², the Planktonic foraminifera Ribosomal Reference database. The first version of the database includes 3322 reference 18S rDNA sequences belonging to 32 of the 47 known morphospecies of extant planktonic foraminifera, collected from 460 oceanic stations. All sequences have been rigorously taxonomically curated using a six-rank annotation system fully resolved to the morphological species level and linked to a series of metadata. The PFR² website, available at http://pfr2.sb-roscoff.fr, allows downloading the entire database or specific sections, as well as the identification of new planktonic foraminiferal sequences. Its novel, fully documented curation process integrates advances in morphological and molecular taxonomy. It allows for an increase in its taxonomic resolution and assures that integrity is maintained by including a complete contingency tracking of annotations and assuring that the annotations remain internally consistent. PMID:25828689

  1. Scaling drug indication curation through crowdsourcing

    PubMed Central

    Khare, Ritu; Burger, John D.; Aberdeen, John S.; Tresner-Kirsch, David W.; Corrales, Theodore J.; Hirchman, Lynette; Lu, Zhiyong

    2015-01-01

    Motivated by the high cost of human curation of biological databases, there is an increasing interest in using computational approaches to assist human curators and accelerate the manual curation process. Towards the goal of cataloging drug indications from FDA drug labels, we recently developed LabeledIn, a human-curated drug indication resource for 250 clinical drugs. Its development required over 40 h of human effort across 20 weeks, despite using well-defined annotation guidelines. In this study, we aim to investigate the feasibility of scaling drug indication annotation through a crowdsourcing technique where an unknown network of workers can be recruited through the technical environment of Amazon Mechanical Turk (MTurk). To translate the expert-curation task of cataloging indications into human intelligence tasks (HITs) suitable for the average workers on MTurk, we first simplify the complex task such that each HIT only involves a worker making a binary judgment of whether a highlighted disease, in context of a given drug label, is an indication. In addition, this study is novel in the crowdsourcing interface design where the annotation guidelines are encoded into user options. For evaluation, we assess the ability of our proposed method to achieve high-quality annotations in a time-efficient and cost-effective manner. We posted over 3000 HITs drawn from 706 drug labels on MTurk. Within 8 h of posting, we collected 18 775 judgments from 74 workers, and achieved an aggregated accuracy of 96% on 450 control HITs (where gold-standard answers are known), at a cost of $1.75 per drug label. On the basis of these results, we conclude that our crowdsourcing approach not only results in significant cost and time saving, but also leads to accuracy comparable to that of domain experts. Database URL: ftp://ftp.ncbi.nlm.nih.gov/pub/lu/LabeledIn/Crowdsourcing/. PMID:25797061
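    A core step in such a crowdsourcing pipeline is aggregating the redundant binary judgments collected per HIT and scoring the aggregate against control HITs with known answers. The sketch below shows a majority-vote version of that step with hypothetical judgments and gold labels; it is not the LabeledIn code, and majority voting is only one possible aggregation rule.

```python
# Aggregate redundant binary worker judgments per HIT by majority vote and score the
# aggregate against control HITs with known answers. Judgments and gold are hypothetical.
from collections import defaultdict

# (hit_id, worker_id, judgment): True if the highlighted disease is an indication
judgments = [
    ("hit1", "w1", True), ("hit1", "w2", True), ("hit1", "w3", False),
    ("hit2", "w1", False), ("hit2", "w4", False), ("hit2", "w5", False),
]
gold = {"hit1": True, "hit2": False}  # control HITs with known answers

votes = defaultdict(list)
for hit_id, _, judgment in judgments:
    votes[hit_id].append(judgment)

aggregated = {hit_id: sum(v) > len(v) / 2 for hit_id, v in votes.items()}  # majority vote
controls = [hit_id for hit_id in aggregated if hit_id in gold]
accuracy = sum(aggregated[h] == gold[h] for h in controls) / len(controls)
print(f"Aggregated accuracy on control HITs: {accuracy:.0%}")  # 100% for this toy example
```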

  2. Protein sequence databases.

    PubMed

    Apweiler, Rolf; Bairoch, Amos; Wu, Cathy H

    2004-02-01

    A variety of protein sequence databases exist, ranging from simple sequence repositories, which store data with little or no manual intervention in the creation of the records, to expertly curated universal databases that cover all species and in which the original sequence data are enhanced by the manual addition of further information in each sequence record. As the focus of researchers moves from the genome to the proteins encoded by it, these databases will play an even more important role as central comprehensive resources of protein information. Several of the leading protein sequence databases are discussed here, with special emphasis on the databases now provided by the Universal Protein Knowledgebase (UniProt) consortium. PMID:15036160

  3. The needs for chemistry standards, database tools and data curation at the chemical-biology interface (SLAS meeting)

    EPA Science Inventory

    This presentation will highlight known challenges with the production of high quality chemical databases and outline recent efforts made to address these challenges. Specific examples will be provided illustrating these challenges within the U.S. Environmental Protection Agency ...

  4. Egas: a collaborative and interactive document curation platform.

    PubMed

    Campos, David; Lourenço, Jóni; Matos, Sérgio; Oliveira, José Luís

    2014-01-01

    With the overwhelming amount of biomedical textual information being produced, several manual curation efforts have been set up to extract and store concepts and their relationships into structured resources. As manual annotation is a demanding and expensive task, computerized solutions were developed to perform such tasks automatically. However, high-end information extraction techniques are still not widely used by biomedical research communities, mainly because of the lack of standards and limitations in usability. Interactive annotation tools intend to fill this gap, taking advantage of automatic techniques and existing knowledge bases to assist expert curators in their daily tasks. This article presents Egas, a web-based platform for biomedical text mining and assisted curation with highly usable interfaces for manual and automatic in-line annotation of concepts and relations. A comprehensive set of de facto standard knowledge bases are integrated and indexed to provide straightforward concept normalization features. Real-time collaboration and conversation functionalities allow discussing details of the annotation task as well as providing instant feedback of curator's interactions. Egas also provides interfaces for on-demand management of the annotation task settings and guidelines, and supports standard formats and literature services to import and export documents. By taking advantage of Egas, we participated in the BioCreative IV interactive annotation task, targeting the assisted identification of protein-protein interactions described in PubMed abstracts related to neuropathological disorders. When evaluated by expert curators, it obtained positive scores in terms of usability, reliability and performance. These results, together with the provided innovative features, place Egas as a state-of-the-art solution for fast and accurate curation of information, facilitating the task of creating and updating knowledge bases and annotated resources. Database

  5. PhytoREF: a reference database of the plastidial 16S rRNA gene of photosynthetic eukaryotes with curated taxonomy.

    PubMed

    Decelle, Johan; Romac, Sarah; Stern, Rowena F; Bendif, El Mahdi; Zingone, Adriana; Audic, Stéphane; Guiry, Michael D; Guillou, Laure; Tessier, Désiré; Le Gall, Florence; Gourvil, Priscillia; Dos Santos, Adriana L; Probert, Ian; Vaulot, Daniel; de Vargas, Colomban; Christen, Richard

    2015-11-01

    Photosynthetic eukaryotes have a critical role as the main producers in most ecosystems of the biosphere. The ongoing environmental metabarcoding revolution opens the perspective for holistic ecosystems biological studies of these organisms, in particular the unicellular microalgae that often lack distinctive morphological characters and have complex life cycles. To interpret environmental sequences, metabarcoding necessarily relies on taxonomically curated databases containing reference sequences of the targeted gene (or barcode) from identified organisms. To date, no such reference framework exists for photosynthetic eukaryotes. In this study, we built the PhytoREF database that contains 6490 plastidial 16S rDNA reference sequences that originate from a large diversity of eukaryotes representing all known major photosynthetic lineages. We compiled 3333 amplicon sequences available from public databases and 879 sequences extracted from plastidial genomes, and generated 411 novel sequences from cultured marine microalgal strains belonging to different eukaryotic lineages. A total of 1867 environmental Sanger 16S rDNA sequences were also included in the database. Stringent quality filtering and a phylogeny-based taxonomic classification were applied for each 16S rDNA sequence. The database mainly focuses on marine microalgae, but sequences from land plants (representing half of the PhytoREF sequences) and freshwater taxa were also included to broaden the applicability of PhytoREF to different aquatic and terrestrial habitats. PhytoREF, accessible via a web interface (http://phytoref.fr), is a new resource in molecular ecology to foster the discovery, assessment and monitoring of the diversity of photosynthetic eukaryotes using high-throughput sequencing. PMID:25740460

  6. Data Curation

    ERIC Educational Resources Information Center

    Mallon, Melissa, Ed.

    2012-01-01

    In their Top Trends of 2012, the Association of College and Research Libraries (ACRL) named data curation as one of the issues to watch in academic libraries in the near future (ACRL, 2012, p. 312). Data curation can be summarized as "the active and ongoing management of data through its life cycle of interest and usefulness to scholarship,…

  7. Tox-Database.net: a curated resource for data describing chemical triggered in vitro cardiac ion channels inhibition

    PubMed Central

    2012-01-01

    Background Drug safety issues are now recognized as a leading cause of drug withdrawals at various stages of development and at the post-approval stage. Among them, cardiotoxicity remains the main reason, despite the substantial effort put into in vitro and in vivo testing, with the main focus on hERG channel inhibition as the hypothesized surrogate of drug proarrhythmic potency. The large interest in the IKr current has resulted in the development of predictive tools and informative databases describing a drug's susceptibility to interactions with the hERG channel, yet there are no similar, publicly available sets of data describing the other ionic currents carried by human cardiomyocyte ion channels, which are recognized as an overlooked drug safety target. Discussion The aim of this database development and publication was to provide a scientifically useful, easily usable and clearly verifiable set of information describing not only IKr (hERG) inhibition but also inhibition data for other human cardiomyocyte-specific ionic channels (IKs, INa, ICa). Summary The broad range of data (chemical space and in vitro settings) and the easy-to-use interface make tox-database.net a useful tool for interested scientists. Database URL http://tox-database.net. PMID:22947121

  8. Structuring osteosarcoma knowledge: an osteosarcoma-gene association database based on literature mining and manual annotation.

    PubMed

    Poos, Kathrin; Smida, Jan; Nathrath, Michaela; Maugg, Doris; Baumhoer, Daniel; Neumann, Anna; Korsching, Eberhard

    2014-01-01

    Osteosarcoma (OS) is the most common primary bone cancer exhibiting high genomic instability. This genomic instability affects multiple genes and microRNAs to a varying extent depending on patient and tumor subtype. Extensive research is ongoing to identify genes including their gene products and microRNAs that correlate with disease progression and might be used as biomarkers for OS. However, the genomic complexity hampers the identification of reliable biomarkers. Up to now, clinico-pathological factors are the key determinants to guide prognosis and therapeutic treatments. Each day, new studies about OS are published and complicate the acquisition of information to support biomarker discovery and therapeutic improvements. Thus, it is necessary to provide a structured and annotated view on the current OS knowledge that is quickly and easily accessible to researchers of the field. Therefore, we developed a publicly available database and Web interface that serves as a resource for OS-associated genes and microRNAs. Genes and microRNAs were collected using an automated dictionary-based gene recognition procedure followed by manual review and annotation by experts of the field. In total, 911 genes and 81 microRNAs related to 1331 PubMed abstracts were collected (last update: 29 October 2013). Users can evaluate genes and microRNAs according to their potential prognostic and therapeutic impact, the experimental procedures, the sample types, the biological contexts and microRNA target gene interactions. Additionally, a pathway enrichment analysis of the collected genes highlights different aspects of OS progression. OS requires pathways commonly deregulated in cancer but also features OS-specific alterations like deregulated osteoclast differentiation. To our knowledge, this is the first effort of an OS database containing manually reviewed and annotated up-to-date OS knowledge. It might be a useful resource especially for the bone tumor research community, as specific
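
    The collection procedure described above pairs an automated dictionary-based gene recognition step with manual expert review. As a rough illustration of the dictionary-matching idea (not the authors' actual implementation; the gene symbols and abstract text are invented), a minimal sketch:

      import re

      # Toy gene dictionary; a real one would hold thousands of official
      # symbols and synonyms (the entries here are examples only).
      gene_dictionary = {"TP53", "RB1", "MYC", "CDKN2A"}

      def find_gene_mentions(text, dictionary):
          """Return dictionary symbols found as whole words in the text."""
          hits = set()
          for symbol in dictionary:
              if re.search(rf"\b{re.escape(symbol)}\b", text):
                  hits.add(symbol)
          return hits

      abstract = "Loss of RB1 and amplification of MYC were frequent in the cohort."
      print(sorted(find_gene_mentions(abstract, gene_dictionary)))  # ['MYC', 'RB1']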

  9. A Comparison of Computer-Based Bibliographic Database Searching vs. Manual Bibliographic Searching. Final CARE Grant Report #5964.

    ERIC Educational Resources Information Center

    Pritchard, Eileen E.; Rockman, Ilene F.

    The purpose of this study was to improve upon previous research investigations by analyzing the elements of cost effectiveness, precision, recall, citation overlap, and hours searching in a comparison between computerized database searching and manual searching, using the setting of an academic library environment with a diverse group of students…

  10. Scaling drug indication curation through crowdsourcing.

    PubMed

    Khare, Ritu; Burger, John D; Aberdeen, John S; Tresner-Kirsch, David W; Corrales, Theodore J; Hirschman, Lynette; Lu, Zhiyong

    2015-01-01

    Motivated by the high cost of human curation of biological databases, there is an increasing interest in using computational approaches to assist human curators and accelerate the manual curation process. Towards the goal of cataloging drug indications from FDA drug labels, we recently developed LabeledIn, a human-curated drug indication resource for 250 clinical drugs. Its development required over 40 h of human effort across 20 weeks, despite using well-defined annotation guidelines. In this study, we aim to investigate the feasibility of scaling drug indication annotation through crowdsourcing, in which an unknown network of workers is recruited through Amazon Mechanical Turk (MTurk). To translate the expert-curation task of cataloging indications into human intelligence tasks (HITs) suitable for average MTurk workers, we first simplify the complex task so that each HIT involves only a binary judgment of whether a highlighted disease, in the context of a given drug label, is an indication. In addition, this study is novel in its crowdsourcing interface design, in which the annotation guidelines are encoded into user options. For evaluation, we assess the ability of our proposed method to achieve high-quality annotations in a time-efficient and cost-effective manner. We posted over 3000 HITs drawn from 706 drug labels on MTurk. Within 8 h of posting, we collected 18 775 judgments from 74 workers, and achieved an aggregated accuracy of 96% on 450 control HITs (where gold-standard answers are known), at a cost of $1.75 per drug label. On the basis of these results, we conclude that our crowdsourcing approach not only results in significant cost and time savings, but also leads to accuracy comparable to that of domain experts. PMID:25797061
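
    Each HIT above is a binary judgment, and several workers judge the same highlighted disease. A common way to turn such redundant judgments into a single label is majority voting; the sketch below illustrates that aggregation step under assumed data, as a generic approach rather than the authors' exact pipeline.

      from collections import Counter

      # Worker judgments per HIT: True = "the highlighted disease is an indication".
      # The HIT identifiers and votes are invented for illustration.
      judgments = {
          "hit-001": [True, True, False, True, True],
          "hit-002": [False, False, True, False, False],
      }

      def majority_label(votes):
          """Aggregate redundant binary judgments by simple majority vote."""
          counts = Counter(votes)
          return counts.most_common(1)[0][0]

      labels = {hit: majority_label(votes) for hit, votes in judgments.items()}
      print(labels)  # {'hit-001': True, 'hit-002': False}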

  11. miRGate: a curated database of human, mouse and rat miRNA–mRNA targets

    PubMed Central

    Andrés-León, Eduardo; González Peña, Daniel; Gómez-López, Gonzalo; Pisano, David G.

    2015-01-01

    MicroRNAs (miRNAs) are small non-coding elements involved in the post-transcriptional down-regulation of gene expression through base pairing with messenger RNAs (mRNAs). Through this mechanism, several miRNA–mRNA pairs have been described as critical in the regulation of multiple cellular processes, including early embryonic development and pathological conditions. Many of these pairs (such as miR-15b/BCL2 in apoptosis or BART-6/BCL6 in diffuse large B-cell lymphomas) were experimentally discovered and/or computationally predicted. Available tools for target prediction are usually based on sequence matching, thermodynamics and conservation, among other approaches. Nevertheless, the main issue with miRNA–mRNA pair prediction is the limited overlap among the results of different prediction methods, or even with lists of experimentally validated pairs, despite the fact that all rely on similar principles. To circumvent this problem, we have developed miRGate, a database containing novel computationally predicted miRNA–mRNA pairs that are calculated using well-established algorithms. In addition, it includes an updated and complete dataset of sequences for both miRNAs and mRNA 3′-untranslated regions from human (including human viruses), mouse and rat, as well as experimentally validated data from four well-known databases. The underlying methodology of miRGate has been successfully applied to independent datasets providing predictions that were convincingly validated by functional assays. miRGate is an open resource available at http://mirgate.bioinfo.cnio.es. For programmatic access, we have provided a representational state transfer web service application programming interface that allows accessing the database at http://mirgate.bioinfo.cnio.es/API/. Database URL: http://mirgate.bioinfo.cnio.es PMID:25858286
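
    miRGate exposes a REST web service at http://mirgate.bioinfo.cnio.es/API/, but this record does not describe the endpoint paths or response format. The snippet below therefore only sketches how such a service might be queried with Python's requests library; the path and parameter shown are placeholders, not documented miRGate endpoints.

      import requests

      BASE_URL = "http://mirgate.bioinfo.cnio.es/API/"

      def query_mirgate(path, params=None):
          """Send a GET request to the miRGate REST API and return the raw response.

          The concrete path and parameter names used by callers are placeholders
          here; consult the miRGate API documentation for the real endpoints.
          """
          response = requests.get(BASE_URL + path, params=params, timeout=30)
          response.raise_for_status()
          return response.text

      # Hypothetical usage -- the endpoint name is an assumption for illustration only.
      # print(query_mirgate("human/miRNA_targets/hsa-miR-15b-5p"))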

  12. How much does curation cost?

    PubMed Central

    2016-01-01

    NIH administrators have recently expressed concerns about the cost of curation for biological databases. However, they did not articulate the exact costs of curation. Here we calculate the cost of biocuration of articles for the EcoCyc database as $219 per article over a 5-year period. That cost is 6–15% of typical open-access publication fees for biomedical articles, and we estimate that it is 0.088% of the cost of the overall research project that generated the experimental results. Thus, curation costs are small in an absolute sense, and represent a minuscule fraction of the cost of the research. PMID:27504008
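
    The figures quoted above can be reproduced with simple arithmetic. The sketch below back-calculates the implied open-access fee range and total project cost from the $219 per-article curation cost and the stated percentages; the derived dollar values are implications of the quoted numbers, not figures taken from the paper itself.

      cost_per_article = 219.0          # curation cost per article (from the abstract)

      # 6-15% of open-access publication fees  =>  implied fee range
      apc_high = cost_per_article / 0.06    # ~ $3,650
      apc_low = cost_per_article / 0.15     # ~ $1,460

      # 0.088% of the overall research project cost  =>  implied project cost
      project_cost = cost_per_article / 0.00088   # ~ $249,000

      print(f"Implied APC range: ${apc_low:,.0f} - ${apc_high:,.0f}")
      print(f"Implied project cost: ${project_cost:,.0f}")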

  13. 76 FR 30997 - National Transit Database: Amendments to Urbanized Area Annual Reporting Manual

    Federal Register 2010, 2011, 2012, 2013, 2014

    2011-05-27

    ... Federal Register (73 FR 7361) inviting comments on proposed amendments to the 2011 Annual Manual. This... Federal Register (75 FR 192) inviting comments on proposed amendments to the 2011 Annual Manual. FTA... Manual AGENCY: Federal Transit Administration (FTA), DOT. ACTION: Notice of Amendments to 2011...

  14. A hybrid human and machine resource curation pipeline for the Neuroscience Information Framework.

    PubMed

    Bandrowski, A E; Cachat, J; Li, Y; Müller, H M; Sternberg, P W; Ciccarese, P; Clark, T; Marenco, L; Wang, R; Astakhov, V; Grethe, J S; Martone, M E

    2012-01-01

    The breadth of information resources available to researchers on the Internet continues to expand, particularly in light of recently implemented data-sharing policies required by funding agencies. However, the nature of dense, multifaceted neuroscience data and the design of contemporary search engine systems make efficient, reliable and relevant discovery of such information a significant challenge. This challenge is specifically pertinent for online databases, whose dynamic content is 'hidden' from search engines. The Neuroscience Information Framework (NIF; http://www.neuinfo.org) was funded by the NIH Blueprint for Neuroscience Research to address the problem of finding and utilizing neuroscience-relevant resources such as software tools, data sets, experimental animals and antibodies across the Internet. From the outset, NIF sought to provide an accounting of available resources, while developing technical solutions for finding, accessing and utilizing them. The curators, therefore, are tasked with identifying and registering resources, examining data, writing configuration files to index and display data, and keeping the contents current. In the initial phases of the project, all aspects of the registration and curation processes were manual. However, as the number of resources grew, manual curation became impractical. This report describes our experiences and successes with developing automated resource discovery and semiautomated type characterization with text-mining scripts that facilitate curation team efforts to discover, integrate and display new content. We also describe the DISCO framework, a suite of automated web services that significantly reduce manual curation efforts to periodically check for resource updates. Lastly, we discuss DOMEO, a semi-automated annotation tool that improves the discovery and curation of resources that are not necessarily website-based (i.e. reagents, software tools). Although the ultimate goal of automation was to

  15. Egas: a collaborative and interactive document curation platform

    PubMed Central

    Campos, David; Lourenço, Jóni; Matos, Sérgio; Oliveira, José Luís

    2014-01-01

    With the overwhelming amount of biomedical textual information being produced, several manual curation efforts have been set up to extract and store concepts and their relationships into structured resources. As manual annotation is a demanding and expensive task, computerized solutions were developed to perform such tasks automatically. However, high-end information extraction techniques are still not widely used by biomedical research communities, mainly because of the lack of standards and limitations in usability. Interactive annotation tools intend to fill this gap, taking advantage of automatic techniques and existing knowledge bases to assist expert curators in their daily tasks. This article presents Egas, a web-based platform for biomedical text mining and assisted curation with highly usable interfaces for manual and automatic in-line annotation of concepts and relations. A comprehensive set of de facto standard knowledge bases are integrated and indexed to provide straightforward concept normalization features. Real-time collaboration and conversation functionalities allow discussing details of the annotation task as well as providing instant feedback of curator’s interactions. Egas also provides interfaces for on-demand management of the annotation task settings and guidelines, and supports standard formats and literature services to import and export documents. By taking advantage of Egas, we participated in the BioCreative IV interactive annotation task, targeting the assisted identification of protein–protein interactions described in PubMed abstracts related to neuropathological disorders. When evaluated by expert curators, it obtained positive scores in terms of usability, reliability and performance. These results, together with the provided innovative features, place Egas as a state-of-the-art solution for fast and accurate curation of information, facilitating the task of creating and updating knowledge bases and annotated resources

  16. Solid waste projection model: Database version 1. 0 technical reference manual

    SciTech Connect

    Carr, F.; Bowman, A.

    1990-11-01

    The Solid Waste Projection Model (SWPM) system is an analytical tool developed by Pacific Northwest Laboratory (PNL) for Westinghouse Hanford Company (WHC). The SWPM system provides a modeling and analysis environment that supports decisions in the process of evaluating various solid waste management alternatives. This document, one of a series describing the SWPM system, contains detailed information regarding the software and data structures utilized in developing the SWPM Version 1.0 Database. This document is intended for use by experienced database specialists and supports database maintenance, utility development, and database enhancement. Those interested in using the SWPM database should refer to the SWPM Database User's Guide. 14 figs., 6 tabs.

  17. Curation of characterized glycoside hydrolases of fungal origin.

    PubMed

    Murphy, Caitlin; Powlowski, Justin; Wu, Min; Butler, Greg; Tsang, Adrian

    2011-01-01

    Fungi produce a wide range of extracellular enzymes to break down plant cell walls, which are composed mainly of cellulose, lignin and hemicellulose. Among them are the glycoside hydrolases (GH), the largest and most diverse family of enzymes active on these substrates. To facilitate research and development of enzymes for the conversion of cell-wall polysaccharides into fermentable sugars, we have manually curated a comprehensive set of characterized fungal glycoside hydrolases. Characterized glycoside hydrolases were retrieved from protein and enzyme databases, as well as literature repositories. A total of 453 characterized glycoside hydrolases have been cataloged. They come from 131 different fungal species, most of which belong to the phylum Ascomycota. These enzymes represent 46 different GH activities and cover 44 of the 115 CAZy GH families. In addition to enzyme source and enzyme family, available biochemical properties such as temperature and pH optima, specific activity, kinetic parameters and substrate specificities were recorded. To simplify comparative studies, enzyme and species abbreviations have been standardized, Gene Ontology terms assigned and reference to supporting evidence provided. The annotated genes have been organized in a searchable, online database called mycoCLAP (Characterized Lignocellulose-Active Proteins of fungal origin). It is anticipated that this manually curated collection of biochemically characterized fungal proteins will be used to enhance functional annotation of novel GH genes. Database URL: http://mycoCLAP.fungalgenomics.ca/. PMID:21622642
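
    Each mycoCLAP entry records the enzyme source, GH family and the biochemical properties listed above. As a rough illustration of how such a curated record might be represented in code (the field names are assumptions, not mycoCLAP's actual schema, and the example values are for illustration only), a sketch:

      from dataclasses import dataclass, field
      from typing import Optional

      @dataclass
      class GlycosideHydrolaseRecord:
          """Illustrative record for a characterized fungal glycoside hydrolase."""
          enzyme_name: str
          species: str
          gh_family: str                       # CAZy GH family, e.g. "GH7"
          activity: str                        # enzymatic activity description
          temp_optimum_c: Optional[float] = None
          ph_optimum: Optional[float] = None
          substrates: list = field(default_factory=list)

      # Example values chosen for illustration only.
      record = GlycosideHydrolaseRecord(
          enzyme_name="Cel7A",
          species="Trichoderma reesei",
          gh_family="GH7",
          activity="cellobiohydrolase",
          temp_optimum_c=60.0,
          ph_optimum=5.0,
          substrates=["crystalline cellulose"],
      )
      print(record.gh_family, record.ph_optimum)  # GH7 5.0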

  18. Curation of characterized glycoside hydrolases of Fungal origin

    PubMed Central

    Murphy, Caitlin; Powlowski, Justin; Wu, Min; Butler, Greg; Tsang, Adrian

    2011-01-01

    Fungi produce a wide range of extracellular enzymes to break down plant cell walls, which are composed mainly of cellulose, lignin and hemicellulose. Among them are the glycoside hydrolases (GH), the largest and most diverse family of enzymes active on these substrates. To facilitate research and development of enzymes for the conversion of cell-wall polysaccharides into fermentable sugars, we have manually curated a comprehensive set of characterized fungal glycoside hydrolases. Characterized glycoside hydrolases were retrieved from protein and enzyme databases, as well as literature repositories. A total of 453 characterized glycoside hydrolases have been cataloged. They come from 131 different fungal species, most of which belong to the phylum Ascomycota. These enzymes represent 46 different GH activities and cover 44 of the 115 CAZy GH families. In addition to enzyme source and enzyme family, available biochemical properties such as temperature and pH optima, specific activity, kinetic parameters and substrate specificities were recorded. To simplify comparative studies, enzyme and species abbreviations have been standardized, Gene Ontology terms assigned and reference to supporting evidence provided. The annotated genes have been organized in a searchable, online database called mycoCLAP (Characterized Lignocellulose-Active Proteins of fungal origin). It is anticipated that this manually curated collection of biochemically characterized fungal proteins will be used to enhance functional annotation of novel GH genes. Database URL: http://mycoCLAP.fungalgenomics.ca/ PMID:21622642

  19. Tracking and coordinating an international curation effort for the CCDS Project

    PubMed Central

    Harte, Rachel A.; Farrell, Catherine M.; Loveland, Jane E.; Suner, Marie-Marthe; Wilming, Laurens; Aken, Bronwen; Barrell, Daniel; Frankish, Adam; Wallin, Craig; Searle, Steve; Diekhans, Mark; Harrow, Jennifer; Pruitt, Kim D.

    2012-01-01

    The Consensus Coding Sequence (CCDS) collaboration involves curators at multiple centers with a goal of producing a conservative set of high quality, protein-coding region annotations for the human and mouse reference genome assemblies. The CCDS data set reflects a ‘gold standard’ definition of best supported protein annotations, and corresponding genes, which pass a standard series of quality assurance checks and are supported by manual curation. This data set supports use of genome annotation information by human and mouse researchers for effective experimental design, analysis and interpretation. The CCDS project consists of analysis of automated whole-genome annotation builds to identify identical CDS annotations, quality assurance testing and manual curation support. Identical CDS annotations are tracked with a CCDS identifier (ID) and any future change to the annotated CDS structure must be agreed upon by the collaborating members. CCDS curation guidelines were developed to address some aspects of curation in order to improve initial annotation consistency and to reduce time spent in discussing proposed annotation updates. Here, we present the current status of the CCDS database and details on our procedures to track and coordinate our efforts. We also present the relevant background and reasoning behind the curation standards that we have developed for CCDS database treatment of transcripts that are nonsense-mediated decay (NMD) candidates, for transcripts containing upstream open reading frames, for identifying the most likely translation start codons and for the annotation of readthrough transcripts. Examples are provided to illustrate the application of these guidelines. Database URL: http://www.ncbi.nlm.nih.gov/CCDS/CcdsBrowse.cgi PMID:22434842
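
    The core of the CCDS pipeline described above is identifying CDS annotations that are identical between annotation centers so they can be tracked under one CCDS ID. A minimal sketch of that matching idea, assuming a CDS is reduced to its chromosome, strand and coding-exon coordinates (a simplification of the real procedure; the coordinates are invented):

      # Each center's CDS annotation is reduced to a hashable key; identical keys
      # from both centers would be eligible for a shared CCDS ID.
      def cds_key(annotation):
          return (annotation["chrom"], annotation["strand"], tuple(annotation["exons"]))

      center_a = [{"gene": "GENE1", "chrom": "1", "strand": "+", "exons": [(100, 200), (300, 400)]}]
      center_b = [{"gene": "GENE1", "chrom": "1", "strand": "+", "exons": [(100, 200), (300, 400)]}]

      keys_a = {cds_key(a): a["gene"] for a in center_a}
      keys_b = {cds_key(b): b["gene"] for b in center_b}

      matching = set(keys_a) & set(keys_b)
      print(len(matching))  # 1 -> one identical CDS, a candidate for a CCDS ID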

  20. Canto: an online tool for community literature curation

    PubMed Central

    Rutherford, Kim M.; Harris, Midori A.; Lock, Antonia; Oliver, Stephen G.; Wood, Valerie

    2014-01-01

    Motivation: Detailed curation of published molecular data is essential for any model organism database. Community curation enables researchers to contribute data from their papers directly to databases, supplementing the activity of professional curators and improving coverage of a growing body of literature. We have developed Canto, a web-based tool that provides an intuitive curation interface for both curators and researchers, to support community curation in the fission yeast database, PomBase. Canto supports curation using OBO ontologies, and can be easily configured for use with any species. Availability: Canto code and documentation are available under an Open Source license from http://curation.pombase.org/. Canto is a component of the Generic Model Organism Database (GMOD) project (http://www.gmod.org/). Contact: helpdesk@pombase.org PMID:24574118

  1. Solid Waste Projection Model: Database (Version 1.4). Technical reference manual

    SciTech Connect

    Blackburn, C.; Cillan, T.

    1993-09-01

    The Solid Waste Projection Model (SWPM) system is an analytical tool developed by Pacific Northwest Laboratory (PNL) for Westinghouse Hanford Company (WHC). The SWPM system provides a modeling and analysis environment that supports decisions in the process of evaluating various solid waste management alternatives. This document, one of a series describing the SWPM system, contains detailed information regarding the software and data structures utilized in developing the SWPM Version 1.4 Database. This document is intended for use by experienced database specialists and supports database maintenance, utility development, and database enhancement. Those interested in using the SWPM database should refer to the SWPM Database User's Guide. This document is available from the PNL Task M Project Manager (D. L. Stiles, 509-372-4358), the PNL Task L Project Manager (L. L. Armacost, 509-372-4304), the WHC Restoration Projects Section Manager (509-372-1443), or the WHC Waste Characterization Manager (509-372-1193).

  2. NeuroTransDB: highly curated and structured transcriptomic metadata for neurodegenerative diseases.

    PubMed

    Bagewadi, Shweta; Adhikari, Subash; Dhrangadhariya, Anjani; Irin, Afroza Khanam; Ebeling, Christian; Namasivayam, Aishwarya Alex; Page, Matthew; Hofmann-Apitius, Martin; Senger, Philipp

    2015-01-01

    Neurodegenerative diseases are chronic debilitating conditions, characterized by progressive loss of neurons, that represent a significant health care burden as the global elderly population continues to grow. Over the past decade, high-throughput technologies such as the Affymetrix GeneChip microarrays have provided new perspectives into the pathomechanisms underlying neurodegeneration. Public transcriptomic data repositories, namely Gene Expression Omnibus and curated ArrayExpress, enable researchers to conduct integrative meta-analysis, increasing the power to detect differentially regulated genes in disease and explore patterns of gene dysregulation across biologically related studies. The reliability of retrospective, large-scale integrative analyses depends on an appropriate combination of related datasets, in turn requiring detailed meta-annotations capturing the experimental setup. In most cases, we observe huge variation in compliance with defined standards for submitted metadata in public databases. Much of the information needed to complete or refine meta-annotations is distributed across the associated publications. For example, tissue preparation or comorbidity information is frequently described in an article's supplementary tables. Several value-added databases have employed additional manual efforts to overcome this limitation. However, none of these databases explicate annotations that distinguish human and animal models in the context of neurodegeneration. Therefore, adopting a more specific disease focus, in combination with dedicated disease ontologies, will better empower the selection of comparable studies with refined annotations to address the research question at hand. In this article, we describe the detailed development of NeuroTransDB, a manually curated database containing metadata annotations for neurodegenerative studies. The database contains more than 20 dimensions of metadata annotations within 31 mouse, 5 rat and 45 human studies, defined in

  3. Preliminary evaluation of the CellFinder literature curation pipeline for gene expression in kidney cells and anatomical parts

    PubMed Central

    Neves, Mariana; Damaschun, Alexander; Mah, Nancy; Lekschas, Fritz; Seltmann, Stefanie; Stachelscheid, Harald; Fontaine, Jean-Fred; Kurtz, Andreas; Leser, Ulf

    2013-01-01

    Biomedical literature curation is the process of automatically and/or manually deriving knowledge from scientific publications and recording it into specialized databases for structured delivery to users. It is a slow, error-prone, complex, costly and, yet, highly important task. Previous experiences have proven that text mining can assist in its many phases, especially, in triage of relevant documents and extraction of named entities and biological events. Here, we present the curation pipeline of the CellFinder database, a repository of cell research, which includes data derived from literature curation and microarrays to identify cell types, cell lines, organs and so forth, and especially patterns in gene expression. The curation pipeline is based on freely available tools in all text mining steps, as well as the manual validation of extracted data. Preliminary results are presented for a data set of 2376 full texts from which >4500 gene expression events in cell or anatomical part have been extracted. Validation of half of this data resulted in a precision of ∼50% of the extracted data, which indicates that we are on the right track with our pipeline for the proposed task. However, evaluation of the methods shows that there is still room for improvement in the named-entity recognition and that a larger and more robust corpus is needed to achieve a better performance for event extraction. Database URL: http://www.cellfinder.org/ PMID:23599415

  4. A visual review of the interactome of LRRK2: Using deep-curated molecular interaction data to represent biology.

    PubMed

    Porras, Pablo; Duesbury, Margaret; Fabregat, Antonio; Ueffing, Marius; Orchard, Sandra; Gloeckner, Christian Johannes; Hermjakob, Henning

    2015-04-01

    Molecular interaction databases are essential resources that enable access to a wealth of information on associations between proteins and other biomolecules. Network graphs generated from these data provide an understanding of the relationships between different proteins in the cell, and network analysis has become a widespread tool supporting -omics analysis. Meaningfully representing this information remains far from trivial and different databases strive to provide users with detailed records capturing the experimental details behind each piece of interaction evidence. A targeted curation approach is necessary to transfer published data generated by primarily low-throughput techniques into interaction databases. In this review we present an example highlighting the value of both targeted curation and the subsequent effective visualization of detailed features of manually curated interaction information. We have curated interactions involving LRRK2, a protein of largely unknown function linked to familial forms of Parkinson's disease, and hosted the data in the IntAct database. This LRRK2-specific dataset was then used to produce different visualization examples highlighting different aspects of the data: the level of confidence in the interaction based on orthogonal evidence, those interactions found under close-to-native conditions, and the enzyme-substrate relationships in different in vitro enzymatic assays. Finally, pathway annotation taken from the Reactome database was overlaid on top of interaction networks to bring biological functional context to interaction maps. PMID:25648416

  5. A visual review of the interactome of LRRK2: Using deep-curated molecular interaction data to represent biology

    PubMed Central

    Porras, Pablo; Duesbury, Margaret; Fabregat, Antonio; Ueffing, Marius; Orchard, Sandra; Gloeckner, Christian Johannes; Hermjakob, Henning

    2015-01-01

    Molecular interaction databases are essential resources that enable access to a wealth of information on associations between proteins and other biomolecules. Network graphs generated from these data provide an understanding of the relationships between different proteins in the cell, and network analysis has become a widespread tool supporting –omics analysis. Meaningfully representing this information remains far from trivial and different databases strive to provide users with detailed records capturing the experimental details behind each piece of interaction evidence. A targeted curation approach is necessary to transfer published data generated by primarily low-throughput techniques into interaction databases. In this review we present an example highlighting the value of both targeted curation and the subsequent effective visualization of detailed features of manually curated interaction information. We have curated interactions involving LRRK2, a protein of largely unknown function linked to familial forms of Parkinson's disease, and hosted the data in the IntAct database. This LRRK2-specific dataset was then used to produce different visualization examples highlighting different aspects of the data: the level of confidence in the interaction based on orthogonal evidence, those interactions found under close-to-native conditions, and the enzyme–substrate relationships in different in vitro enzymatic assays. Finally, pathway annotation taken from the Reactome database was overlaid on top of interaction networks to bring biological functional context to interaction maps. PMID:25648416

  6. SELCTV SYSTEM MANUAL FOR SELCTV AND REFER DATABASES AND THE SELCTV DATA MANAGEMENT PROGRAM

    EPA Science Inventory

    The SELCTV database is a compilation of the side effects of pesticides on arthropod predators and parasitoids that provide biological control of pest arthropods in the agricultural ecosystem. The primary source of side effects data is the published scientific literature; reference...

  7. GSOSTATS Database: USAF Synchronous Satellite Catalog Data Conversion Software. User's Guide and Software Maintenance Manual, Version 2.1

    NASA Technical Reports Server (NTRS)

    Mallasch, Paul G.; Babic, Slavoljub

    1994-01-01

    The United States Air Force (USAF) provides NASA Lewis Research Center with monthly reports containing the Synchronous Satellite Catalog and the associated Two Line Mean Element Sets. The USAF Synchronous Satellite Catalog supplies satellite orbital parameters collected by an automated monitoring system and provided to Lewis Research Center as text files on magnetic tape. Software was developed to facilitate automated formatting, data normalization, cross-referencing, and error correction of Synchronous Satellite Catalog files before loading into the NASA Geosynchronous Satellite Orbital Statistics Database System (GSOSTATS). This document contains the User's Guide and Software Maintenance Manual with information necessary for installation, initialization, start-up, operation, error recovery, and termination of the software application. It also contains implementation details, modification aids, and software source code adaptations for use in future revisions.
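
    The catalog files mentioned above include Two Line Mean Element Sets (TLEs). The sketch below parses a few orbital elements from line 2 of a TLE using the commonly documented fixed-column layout; the column positions and the synthetic sample line are assumptions for illustration, not taken from the GSOSTATS software.

      # Parse a few orbital elements from line 2 of a Two Line Element set.
      # The sample line is synthetic and the fixed-column positions follow the
      # commonly documented TLE layout (treated here as an assumption).
      def parse_tle_line2(line):
          return {
              "inclination_deg": float(line[8:16]),
              "raan_deg": float(line[17:25]),
              "eccentricity": float("0." + line[26:33].strip()),
              "arg_perigee_deg": float(line[34:42]),
              "mean_anomaly_deg": float(line[43:51]),
              "mean_motion_rev_per_day": float(line[52:63]),
          }

      sample = "2 00005  51.6400 208.9163 0006317  69.9862  25.2906 15.49560532123456"
      elements = parse_tle_line2(sample)
      print(elements["inclination_deg"], elements["mean_motion_rev_per_day"])  # 51.64 15.49560532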

  8. Cataloging the biomedical world of pain through semi-automated curation of molecular interactions.

    PubMed

    Jamieson, Daniel G; Roberts, Phoebe M; Robertson, David L; Sidders, Ben; Nenadic, Goran

    2013-01-01

    The vast collection of biomedical literature and its continued expansion has presented a number of challenges to researchers who require structured findings to stay abreast of and analyze molecular mechanisms relevant to their domain of interest. By structuring literature content into topic-specific machine-readable databases, the aggregate data from multiple articles can be used to infer trends that can be compared and contrasted with similar findings from topic-independent resources. Our study presents a generalized procedure for semi-automatically creating a custom topic-specific molecular interaction database through the use of text mining to assist manual curation. We apply the procedure to capture molecular events that underlie 'pain', a complex phenomenon with a large societal burden and unmet medical need. We describe how existing text mining solutions are used to build a pain-specific corpus, extract molecular events from it, add context to the extracted events and assess their relevance. The pain-specific corpus contains 765 692 documents from Medline and PubMed Central, from which we extracted 356 499 unique normalized molecular events, with 261 438 single protein events and 93 271 molecular interactions supplied by BioContext. Event chains are annotated with negation, speculation, anatomy, Gene Ontology terms, mutations, pain and disease relevance, which collectively provide detailed insight into how that event chain is associated with pain. The extracted relations are visualized in a wiki platform (wiki-pain.org) that enables efficient manual curation and exploration of the molecular mechanisms that underlie pain. Curation of 1500 grouped event chains ranked by pain relevance revealed 613 accurately extracted unique molecular interactions that in the future can be used to study the underlying mechanisms involved in pain. Our approach demonstrates that combining existing text mining tools with domain-specific terms and wiki-based visualization can

  9. Cataloging the biomedical world of pain through semi-automated curation of molecular interactions

    PubMed Central

    Jamieson, Daniel G.; Roberts, Phoebe M.; Robertson, David L.; Sidders, Ben; Nenadic, Goran

    2013-01-01

    The vast collection of biomedical literature and its continued expansion has presented a number of challenges to researchers who require structured findings to stay abreast of and analyze molecular mechanisms relevant to their domain of interest. By structuring literature content into topic-specific machine-readable databases, the aggregate data from multiple articles can be used to infer trends that can be compared and contrasted with similar findings from topic-independent resources. Our study presents a generalized procedure for semi-automatically creating a custom topic-specific molecular interaction database through the use of text mining to assist manual curation. We apply the procedure to capture molecular events that underlie ‘pain’, a complex phenomenon with a large societal burden and unmet medical need. We describe how existing text mining solutions are used to build a pain-specific corpus, extract molecular events from it, add context to the extracted events and assess their relevance. The pain-specific corpus contains 765 692 documents from Medline and PubMed Central, from which we extracted 356 499 unique normalized molecular events, with 261 438 single protein events and 93 271 molecular interactions supplied by BioContext. Event chains are annotated with negation, speculation, anatomy, Gene Ontology terms, mutations, pain and disease relevance, which collectively provide detailed insight into how that event chain is associated with pain. The extracted relations are visualized in a wiki platform (wiki-pain.org) that enables efficient manual curation and exploration of the molecular mechanisms that underlie pain. Curation of 1500 grouped event chains ranked by pain relevance revealed 613 accurately extracted unique molecular interactions that in the future can be used to study the underlying mechanisms involved in pain. Our approach demonstrates that combining existing text mining tools with domain-specific terms and wiki-based visualization can

  10. User manual for CSP_VANA: A check standards measurement and database program for microwave network analyzers

    SciTech Connect

    Duda, L.E.

    1997-10-01

    Vector network analyzers are a convenient way to measure scattering parameters of a variety of microwave devices. However, these instruments, unlike oscilloscopes for example, require a relatively high degree of user knowledge and expertise. Due to the complexity of the instrument and of the calibration process, there are many ways in which an incorrect measurement may be produced. The Microwave Project, which is part of SNL's Primary Standards laboratory, routinely uses check standards to verify that the network analyzer is operating properly. In the past, these measurements were recorded manually and, sometimes, interpretation of the results was problematic. To aid the measurement assurance process, a software program was developed to automatically measure a check standard and compare the new measurements with a historical database of measurements of the same device. The program acquires new measurement data from selected check standards, plots the new data against the mean and standard deviation of prior data for the same check standard, and updates the database files for the check standard. The program is entirely menu-driven requiring little additional work by the user. This report describes the function of the software, including a discussion of its capabilities, and the way in which the software is used in the lab.
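
    The program's core check, as described above, is comparing a new check-standard measurement against the mean and standard deviation of prior measurements of the same device. A minimal sketch of that comparison follows; the data values and the two-sigma control limit are assumptions for illustration, not the program's actual criteria.

      import statistics

      # Historical check-standard measurements (invented values for illustration).
      history = [0.501, 0.498, 0.503, 0.499, 0.502, 0.500]
      new_value = 0.511

      mean = statistics.mean(history)
      stdev = statistics.stdev(history)

      # Flag the new measurement if it falls outside an assumed two-sigma band.
      k = 2.0
      in_control = abs(new_value - mean) <= k * stdev
      print(f"mean={mean:.4f} stdev={stdev:.4f} in_control={in_control}")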

  11. Human events reference for ATHEANA (HERA) database description and preliminary user's manual

    SciTech Connect

    Auflick, J.L.; Hahn, H.A.; Pond, D.J.

    1998-05-27

    The Technique for Human Error Analysis (ATHEANA) is a newly developed human reliability analysis (HRA) methodology that aims to facilitate better representation and integration of human performance into probabilistic risk assessment (PRA) modeling and quantification by analyzing risk-significant operating experience in the context of existing behavioral science models. The fundamental premise of ATHEANA is that error-forcing contexts (EFCs), which refer to combinations of equipment/material conditions and performance shaping factors (PSFs), set up or create the conditions under which unsafe actions (UAs) can occur. Because ATHEANA relies heavily on the analysis of operational events that have already occurred as a mechanism for generating creative thinking about possible EFCs, a database, called the Human Events Reference for ATHEANA (HERA), has been developed to support the methodology. This report documents the initial development efforts for HERA.

  12. Human Events Reference for ATHEANA (HERA) Database Description and Preliminary User's Manual

    SciTech Connect

    Auflick, J.L.

    1999-08-12

    The Technique for Human Error Analysis (ATHEANA) is a newly developed human reliability analysis (HRA) methodology that aims to facilitate better representation and integration of human performance into probabilistic risk assessment (PRA) modeling and quantification by analyzing risk-significant operating experience in the context of existing behavioral science models. The fundamental premise of ATHEANA is that error forcing contexts (EFCs), which refer to combinations of equipment/material conditions and performance shaping factors (PSFs), set up or create the conditions under which unsafe actions (UAs) can occur. Because ATHEANA relies heavily on the analysis of operational events that have already occurred as a mechanism for generating creative thinking about possible EFCs, a database (db) of analytical operational events, called the Human Events Reference for ATHEANA (HERA), has been developed to support the methodology. This report documents the initial development efforts for HERA.

  13. The Transporter Classification Database

    PubMed Central

    Saier, Milton H.; Reddy, Vamsee S.; Tamang, Dorjee G.; Västermark, Åke

    2014-01-01

    The Transporter Classification Database (TCDB; http://www.tcdb.org) serves as a common reference point for transport protein research. The database contains more than 10 000 non-redundant proteins that represent all currently recognized families of transmembrane molecular transport systems. Proteins in TCDB are organized in a five level hierarchical system, where the first two levels are the class and subclass, the second two are the family and subfamily, and the last one is the transport system. Superfamilies that contain multiple families are included as hyperlinks to the five tier TC hierarchy. TCDB includes proteins from all types of living organisms and is the only transporter classification system that is both universal and recognized by the International Union of Biochemistry and Molecular Biology. It has been expanded by manual curation, contains extensive text descriptions providing structural, functional, mechanistic and evolutionary information, is supported by unique software and is interconnected to many other relevant databases. TCDB is of increasing usefulness to the international scientific community and can serve as a model for the expansion of database technologies. This manuscript describes an update of the database descriptions previously featured in NAR database issues. PMID:24225317
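
    TC accessions encode the five-level hierarchy described above (class, subclass, family, subfamily, transport system), conventionally written as dot-separated components such as 1.A.1.1.1. The sketch below splits such an identifier into its levels; the example accession is used purely for illustration.

      def parse_tc_number(tc_number):
          """Split a TC accession of the form class.subclass.family.subfamily.system."""
          parts = tc_number.split(".")
          if len(parts) != 5:
              raise ValueError(f"expected five dot-separated levels, got: {tc_number}")
          keys = ("class", "subclass", "family", "subfamily", "transport_system")
          return dict(zip(keys, parts))

      print(parse_tc_number("1.A.1.1.1"))
      # {'class': '1', 'subclass': 'A', 'family': '1', 'subfamily': '1', 'transport_system': '1'}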

  14. Implications of compositionality in the gene ontology for its curation and usage.

    PubMed

    Ogren, Philip V; Cohen, K Bretonnel; Hunter, Lawrence

    2005-01-01

    In this paper we argue that a richer underlying representational model for the Gene Ontology that captures the implicit compositional structure of GO terms could have a positive impact on two activities crucial to the success of GO: ontology curation and database annotation. We show that many of the new terms added to GO in a one-year span appear to be compositional variations of other terms. We found that 90.2% of the 3,652 new terms added between July 2003 and July 2004 exhibited characteristics of compositionality. We also examine annotations available from the GO Consortium website that are either manually curated or automatically generated. We found that 74.5% and 63.2% of GO terms are seldom, if ever, used in manual and automatic annotations, respectively. We show that there are features that tend to distinguish terms that are used from those that are not. In order to characterize the effect of compositionality on the combinatorial properties of GO, we employ finite state automata that represent sets of GO terms. This representational tool demonstrates how ontologies can grow very fast, and also shows that small conceptual changes can directly result in a large number of changes to the terminology. We argue that the curation and annotation findings we report are influenced by the combinatorial properties that present themselves in an ontology that does not have a model that properly captures the compositional structure of its terms. PMID:15759624
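
    The argument above is that many new GO terms are compositional variants of existing ones, so small conceptual additions multiply the terminology. A toy sketch of that combinatorial growth follows; the base processes and modifiers are invented, and the paper itself models term sets with finite state automata rather than this simple cross-product.

      from itertools import product

      # Invented base processes and compositional modifiers, for illustration only.
      base_processes = ["apoptosis", "cell migration", "DNA repair"]
      modifiers = ["", "regulation of ", "positive regulation of ", "negative regulation of "]

      terms = {f"{mod}{proc}" for mod, proc in product(modifiers, base_processes)}
      print(len(terms))   # 12 terms from 3 processes, 3 modifiers and the bare form
      print(sorted(terms)[:3])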

  15. Curation of Frozen Samples

    NASA Technical Reports Server (NTRS)

    Fletcher, L. A.; Allen, C. C.; Bastien, R.

    2008-01-01

    NASA's Johnson Space Center (JSC) and the Astromaterials Curator are charged by NPD 7100.10D with the curation of all of NASA's extraterrestrial samples, including those from future missions. This responsibility includes the development of new sample handling and preparation techniques; therefore, the Astromaterials Curator must begin developing procedures to preserve, prepare and ship samples at sub-freezing temperatures in order to enable future sample return missions. Such missions might include the return of future frozen samples from permanently-shadowed lunar craters, the nuclei of comets, the surface of Mars, etc. We are demonstrating the ability to curate samples under cold conditions by designing, installing and testing a cold curation glovebox. This glovebox will allow us to store, document, manipulate and subdivide frozen samples while quantifying and minimizing contamination throughout the curation process.

  16. NeuroTransDB: highly curated and structured transcriptomic metadata for neurodegenerative diseases

    PubMed Central

    Bagewadi, Shweta; Adhikari, Subash; Dhrangadhariya, Anjani; Irin, Afroza Khanam; Ebeling, Christian; Namasivayam, Aishwarya Alex; Page, Matthew; Hofmann-Apitius, Martin

    2015-01-01

    Neurodegenerative diseases are chronic debilitating conditions, characterized by progressive loss of neurons, that represent a significant health care burden as the global elderly population continues to grow. Over the past decade, high-throughput technologies such as the Affymetrix GeneChip microarrays have provided new perspectives into the pathomechanisms underlying neurodegeneration. Public transcriptomic data repositories, namely Gene Expression Omnibus and curated ArrayExpress, enable researchers to conduct integrative meta-analysis, increasing the power to detect differentially regulated genes in disease and explore patterns of gene dysregulation across biologically related studies. The reliability of retrospective, large-scale integrative analyses depends on an appropriate combination of related datasets, in turn requiring detailed meta-annotations capturing the experimental setup. In most cases, we observe huge variation in compliance with defined standards for submitted metadata in public databases. Much of the information needed to complete or refine meta-annotations is distributed across the associated publications. For example, tissue preparation or comorbidity information is frequently described in an article’s supplementary tables. Several value-added databases have employed additional manual efforts to overcome this limitation. However, none of these databases explicate annotations that distinguish human and animal models in the context of neurodegeneration. Therefore, adopting a more specific disease focus, in combination with dedicated disease ontologies, will better empower the selection of comparable studies with refined annotations to address the research question at hand. In this article, we describe the detailed development of NeuroTransDB, a manually curated database containing metadata annotations for neurodegenerative studies. The database contains more than 20 dimensions of metadata annotations within 31 mouse, 5 rat and 45 human studies, defined in

  17. The Arabidopsis Information Resource (TAIR): a model organism database providing a centralized, curated gateway to Arabidopsis biology, research materials and community.

    PubMed

    Rhee, Seung Yon; Beavis, William; Berardini, Tanya Z; Chen, Guanghong; Dixon, David; Doyle, Aisling; Garcia-Hernandez, Margarita; Huala, Eva; Lander, Gabriel; Montoya, Mary; Miller, Neil; Mueller, Lukas A; Mundodi, Suparna; Reiser, Leonore; Tacklind, Julie; Weems, Dan C; Wu, Yihe; Xu, Iris; Yoo, Daniel; Yoon, Jungwon; Zhang, Peifen

    2003-01-01

    Arabidopsis thaliana is the most widely-studied plant today. The concerted efforts of over 11 000 researchers and 4000 organizations around the world are generating a rich diversity and quantity of information and materials. This information is made available through a comprehensive on-line resource called the Arabidopsis Information Resource (TAIR) (http://arabidopsis.org), which is accessible via commonly used web browsers and can be searched and downloaded in a number of ways. In the last two years, efforts have been focused on increasing data content and diversity, functionally annotating genes and gene products with controlled vocabularies, and improving data retrieval, analysis and visualization tools. New information includes sequence polymorphisms, alleles, germplasms and phenotypes, Gene Ontology annotations, gene families, protein information, metabolic pathways, gene expression data from microarray experiments, and seed and DNA stocks. New data visualization and analysis tools include SeqViewer, which interactively displays the genome from the whole chromosome down to 10 kb of nucleotide sequence, and AraCyc, a metabolic pathway database and map tool that allows overlaying expression data onto the pathway diagrams. Finally, we have recently incorporated seed and DNA stock information from the Arabidopsis Biological Resource Center (ABRC) and implemented a shopping-cart style on-line ordering system. PMID:12519987

  18. Curating the Shelves

    ERIC Educational Resources Information Center

    Schiano, Deborah

    2013-01-01

    Curation, the gathering, organizing, and presenting of resources in a way that meets information needs and interests, makes sense for virtual as well as physical resources. A Northern New Jersey middle school library made the decision to curate its physical resources according to the needs of its users, and, in so doing, created a shelving system that is,…

  19. ZFIN: enhancements and updates to the zebrafish model organism database

    PubMed Central

    Bradford, Yvonne; Conlin, Tom; Dunn, Nathan; Fashena, David; Frazer, Ken; Howe, Douglas G.; Knight, Jonathan; Mani, Prita; Martin, Ryan; Moxon, Sierra A. T.; Paddock, Holly; Pich, Christian; Ramachandran, Sridhar; Ruef, Barbara J.; Ruzicka, Leyla; Bauer Schaper, Holle; Schaper, Kevin; Shao, Xiang; Singer, Amy; Sprague, Judy; Sprunger, Brock; Van Slyke, Ceri; Westerfield, Monte

    2011-01-01

    ZFIN, the Zebrafish Model Organism Database, http://zfin.org, serves as the central repository and web-based resource for zebrafish genetic, genomic, phenotypic and developmental data. ZFIN manually curates comprehensive data for zebrafish genes, phenotypes, genotypes, gene expression, antibodies, anatomical structures and publications. A wide-ranging collection of web-based search forms and tools facilitates access to integrated views of these data promoting analysis and scientific discovery. Data represented in ZFIN are derived from three primary sources: curation of zebrafish publications, individual research laboratories and collaborations with bioinformatics organizations. Data formats include text, images and graphical representations. ZFIN is a dynamic resource with data added daily as part of our ongoing curation process. Software updates are frequent. Here, we describe recent additions to ZFIN including (i) enhanced access to images, (ii) genomic features, (iii) genome browser, (iv) transcripts, (v) antibodies and (vi) a community wiki for protocols and antibodies. PMID:21036866

  20. DIP: The Database of Interacting Proteins

    DOE Data Explorer

    The DIP Database catalogs experimentally determined interactions between proteins. It combines information from a variety of sources to create a single, consistent set of protein-protein interactions. By interaction, the DIP Database creators mean that two amino acid chains were experimentally identified to bind to each other. The database lists such pairs to aid those studying a particular protein-protein interaction but also those investigating entire regulatory and signaling pathways as well as those studying the organisation and complexity of the protein interaction network at the cellular level. The data stored within the DIP database were curated both manually by expert curators and automatically using computational approaches that utilize the knowledge about the protein-protein interaction networks extracted from the most reliable, core subset of the DIP data. It is a relational database that can be searched by protein, sequence, motif, article information, and pathBLAST. The website also serves as an access point to a number of projects related to DIP, such as LiveDIP, The Database of Ligand-Receptor Partners (DLRP) and JDIP. Users have free and open access to DIP after login. [Taken from the DIP Guide and the DIP website] (Specialized Interface) (Registration Required)

  1. Reference sequence (RefSeq) database at NCBI: current status, taxonomic expansion, and functional annotation

    PubMed Central

    O'Leary, Nuala A.; Wright, Mathew W.; Brister, J. Rodney; Ciufo, Stacy; Haddad, Diana; McVeigh, Rich; Rajput, Bhanu; Robbertse, Barbara; Smith-White, Brian; Ako-Adjei, Danso; Astashyn, Alexander; Badretdin, Azat; Bao, Yiming; Blinkova, Olga; Brover, Vyacheslav; Chetvernin, Vyacheslav; Choi, Jinna; Cox, Eric; Ermolaeva, Olga; Farrell, Catherine M.; Goldfarb, Tamara; Gupta, Tripti; Haft, Daniel; Hatcher, Eneida; Hlavina, Wratko; Joardar, Vinita S.; Kodali, Vamsi K.; Li, Wenjun; Maglott, Donna; Masterson, Patrick; McGarvey, Kelly M.; Murphy, Michael R.; O'Neill, Kathleen; Pujar, Shashikant; Rangwala, Sanjida H.; Rausch, Daniel; Riddick, Lillian D.; Schoch, Conrad; Shkeda, Andrei; Storz, Susan S.; Sun, Hanzhen; Thibaud-Nissen, Francoise; Tolstoy, Igor; Tully, Raymond E.; Vatsan, Anjana R.; Wallin, Craig; Webb, David; Wu, Wendy; Landrum, Melissa J.; Kimchi, Avi; Tatusova, Tatiana; DiCuccio, Michael; Kitts, Paul; Murphy, Terence D.; Pruitt, Kim D.

    2016-01-01

    The RefSeq project at the National Center for Biotechnology Information (NCBI) maintains and curates a publicly available database of annotated genomic, transcript, and protein sequence records (http://www.ncbi.nlm.nih.gov/refseq/). The RefSeq project leverages the data submitted to the International Nucleotide Sequence Database Collaboration (INSDC) against a combination of computation, manual curation, and collaboration to produce a standard set of stable, non-redundant reference sequences. The RefSeq project augments these reference sequences with current knowledge including publications, functional features and informative nomenclature. The database currently represents sequences from more than 55 000 organisms (>4800 viruses, >40 000 prokaryotes and >10 000 eukaryotes; RefSeq release 71), ranging from a single record to complete genomes. This paper summarizes the current status of the viral, prokaryotic, and eukaryotic branches of the RefSeq project, reports on improvements to data access and details efforts to further expand the taxonomic representation of the collection. We also highlight diverse functional curation initiatives that support multiple uses of RefSeq data including taxonomic validation, genome annotation, comparative genomics, and clinical testing. We summarize our approach to utilizing available RNA-Seq and other data types in our manual curation process for vertebrate, plant, and other species, and describe a new direction for prokaryotic genomes and protein name management. PMID:26553804

  2. Mining clinical attributes of genomic variants through assisted literature curation in Egas.

    PubMed

    Matos, Sérgio; Campos, David; Pinho, Renato; Silva, Raquel M; Mort, Matthew; Cooper, David N; Oliveira, José Luís

    2016-01-01

    The veritable deluge of biological data over recent years has led to the establishment of a considerable number of knowledge resources that compile curated information extracted from the literature and store it in structured form, facilitating its use and exploitation. In this article, we focus on the curation of inherited genetic variants and associated clinical attributes, such as zygosity, penetrance or inheritance mode, and describe the use of Egas for this task. Egas is a web-based platform for text-mining assisted literature curation that focuses on usability through modern design solutions and simple user interactions. Egas offers a flexible and customizable tool that allows defining the concept types and relations of interest for a given annotation task, as well as the ontologies used for normalizing each concept type. Further, annotations may be performed on raw documents or on the results of automated concept identification and relation extraction tools. Users can inspect, correct or remove automatic text-mining results, manually add new annotations, and export the results to standard formats. Egas is compatible with the most recent versions of Google Chrome, Mozilla Firefox, Internet Explorer and Safari and is available for use at https://demo.bmd-software.com/egas/. Database URL: https://demo.bmd-software.com/egas/ PMID:27278817

  3. Mining clinical attributes of genomic variants through assisted literature curation in Egas

    PubMed Central

    Matos, Sérgio; Campos, David; Pinho, Renato; Silva, Raquel M.; Mort, Matthew; Cooper, David N.; Oliveira, José Luís

    2016-01-01

    The veritable deluge of biological data over recent years has led to the establishment of a considerable number of knowledge resources that compile curated information extracted from the literature and store it in structured form, facilitating its use and exploitation. In this article, we focus on the curation of inherited genetic variants and associated clinical attributes, such as zygosity, penetrance or inheritance mode, and describe the use of Egas for this task. Egas is a web-based platform for text-mining assisted literature curation that focuses on usability through modern design solutions and simple user interactions. Egas offers a flexible and customizable tool that allows defining the concept types and relations of interest for a given annotation task, as well as the ontologies used for normalizing each concept type. Further, annotations may be performed on raw documents or on the results of automated concept identification and relation extraction tools. Users can inspect, correct or remove automatic text-mining results, manually add new annotations, and export the results to standard formats. Egas is compatible with the most recent versions of Google Chrome, Mozilla Firefox, Internet Explorer and Safari and is available for use at https://demo.bmd-software.com/egas/. Database URL: https://demo.bmd-software.com/egas/ PMID:27278817

  4. Rice Annotation Database (RAD): a contig-oriented database for map-based rice genomics.

    PubMed

    Ito, Yuichi; Arikawa, Kohji; Antonio, Baltazar A; Ohta, Isamu; Naito, Shinji; Mukai, Yoshiyuki; Shimano, Atsuko; Masukawa, Masatoshi; Shibata, Michie; Yamamoto, Mayu; Ito, Yukiyo; Yokoyama, Junri; Sakai, Yasumichi; Sakata, Katsumi; Nagamura, Yoshiaki; Namiki, Nobukazu; Matsumoto, Takashi; Higo, Kenichi; Sasaki, Takuji

    2005-01-01

    A contig-oriented database for annotation of the rice genome has been constructed to facilitate map-based rice genomics. The Rice Annotation Database has the following functional features: (i) extensive effort of manual annotations of P1-derived artificial chromosome/bacterial artificial chromosome clones can be merged at chromosome and contig-level; (ii) concise visualization of the annotation information such as the predicted genes, results of various prediction programs (RiceHMM, Genscan, Genscan+, Fgenesh, GeneMark, etc.), homology to expressed sequence tag, full-length cDNA and protein; (iii) user-friendly clone/gene query system; (iv) download functions for nucleotide, amino acid and coding sequences; (v) analysis of various features of the genome (GC-content, average value, etc.); and (vi) genome-wide homology search (BLAST) of contig- and chromosome-level genome sequence to allow comparative analysis with the genome sequence of other organisms. As of October 2004, the database contains a total of 215 Mb sequence with relevant annotation results including 30 000 manually curated genes. The database can provide the latest information on manual annotation as well as a comprehensive structural analysis of various features of the rice genome. The database can be accessed at http://rad.dna.affrc.go.jp/. PMID:15608281
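
    Feature (vi), the genome-wide homology search, can be reproduced locally against a downloaded contig FASTA file with NCBI BLAST+. The short Python sketch below is illustrative only (the file names are placeholders and BLAST+ must already be installed); it is not code distributed by the Rice Annotation Database.

      import subprocess

      # Sketch of running a local homology search with NCBI BLAST+ against a
      # downloaded contig FASTA file. File names are placeholders and BLAST+ must
      # already be installed; this is not code distributed by the database.
      subprocess.run(["makeblastdb", "-in", "rice_contigs.fa",
                      "-dbtype", "nucl", "-out", "rice_contigs"], check=True)
      result = subprocess.run(
          ["blastn", "-query", "my_gene.fa", "-db", "rice_contigs",
           "-evalue", "1e-10", "-outfmt", "6"],   # outfmt 6 = tabular hits
          capture_output=True, text=True, check=True)
      for line in result.stdout.splitlines():
          query, subject, identity, length, *rest = line.split("\t")
          print(query, subject, identity, length)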

  5. HPMCD: the database of human microbial communities from metagenomic datasets and microbial reference genomes.

    PubMed

    Forster, Samuel C; Browne, Hilary P; Kumar, Nitin; Hunt, Martin; Denise, Hubert; Mitchell, Alex; Finn, Robert D; Lawley, Trevor D

    2016-01-01

    The Human Pan-Microbe Communities (HPMC) database (http://www.hpmcd.org/) provides a manually curated, searchable, metagenomic resource to facilitate investigation of human gastrointestinal microbiota. Over the past decade, the application of metagenome sequencing to elucidate the microbial composition and functional capacity present in the human microbiome has revolutionized many concepts in our basic biology. When sufficient high quality reference genomes are available, whole genome metagenomic sequencing can provide direct biological insights and high-resolution classification. The HPMC database provides species level, standardized phylogenetic classification of over 1800 human gastrointestinal metagenomic samples. This is achieved by combining a manually curated list of bacterial genomes from human faecal samples with over 21000 additional reference genomes representing bacteria, viruses, archaea and fungi with manually curated species classification and enhanced sample metadata annotation. A user-friendly, web-based interface provides the ability to search for (i) microbial groups associated with health or disease state, (ii) health or disease states and community structure associated with a microbial group, (iii) the enrichment of a microbial gene or sequence and (iv) enrichment of a functional annotation. The HPMC database enables detailed analysis of human microbial communities and supports research from basic microbiology and immunology to therapeutic development in human health and disease. PMID:26578596

  6. MGDB: a comprehensive database of genes involved in melanoma

    PubMed Central

    Zhang, Di; Zhu, Rongrong; Zhang, Hanqian; Zheng, Chun-Hou; Xia, Junfeng

    2015-01-01

    The Melanoma Gene Database (MGDB) is a manually curated catalog of molecular genetic data relating to genes involved in melanoma. The main purpose of this database is to establish a network of melanoma-related genes and to facilitate the mechanistic study of melanoma tumorigenesis. The entries describing the relationships between melanoma and genes in the current release were manually extracted from PubMed abstracts; the database currently contains 527 human melanoma genes (422 protein-coding and 105 non-coding genes). Each melanoma gene was annotated in seven different aspects (General Information, Expression, Methylation, Mutation, Interaction, Pathway and Drug). In addition, manually curated literature references have also been provided to support the inclusion of the gene in MGDB and establish its association with melanoma. MGDB has a user-friendly web interface with multiple browse and search functions. We hope MGDB will enrich our knowledge about melanoma genetics and serve as a useful complement to the existing public resources. Database URL: http://bioinfo.ahu.edu.cn:8080/Melanoma/index.jsp PMID:26424083

  7. MGDB: a comprehensive database of genes involved in melanoma.

    PubMed

    Zhang, Di; Zhu, Rongrong; Zhang, Hanqian; Zheng, Chun-Hou; Xia, Junfeng

    2015-01-01

    The Melanoma Gene Database (MGDB) is a manually curated catalog of molecular genetic data relating to genes involved in melanoma. The main purpose of this database is to establish a network of melanoma-related genes and to facilitate the mechanistic study of melanoma tumorigenesis. The entries describing the relationships between melanoma and genes in the current release were manually extracted from PubMed abstracts; the database currently contains 527 human melanoma genes (422 protein-coding and 105 non-coding genes). Each melanoma gene was annotated in seven different aspects (General Information, Expression, Methylation, Mutation, Interaction, Pathway and Drug). In addition, manually curated literature references have also been provided to support the inclusion of the gene in MGDB and establish its association with melanoma. MGDB has a user-friendly web interface with multiple browse and search functions. We hope MGDB will enrich our knowledge about melanoma genetics and serve as a useful complement to the existing public resources. Database URL: http://bioinfo.ahu.edu.cn:8080/Melanoma/index.jsp. PMID:26424083

  8. Southern African Treatment Resistance Network (SATuRN) RegaDB HIV drug resistance and clinical management database: supporting patient management, surveillance and research in southern Africa.

    PubMed

    Manasa, Justen; Lessells, Richard; Rossouw, Theresa; Naidu, Kevindra; Van Vuuren, Cloete; Goedhals, Dominique; van Zyl, Gert; Bester, Armand; Skingsley, Andrew; Stott, Katharine; Danaviah, Siva; Chetty, Terusha; Singh, Lavanya; Moodley, Pravi; Iwuji, Collins; McGrath, Nuala; Seebregts, Christopher J; de Oliveira, Tulio

    2014-01-01

    Substantial amounts of data have been generated from patient management and academic exercises designed to better understand the human immunodeficiency virus (HIV) epidemic and design interventions to control it. A number of specialized databases have been designed to manage huge data sets from HIV cohort, vaccine, host genomic and drug resistance studies. Besides databases from cohort studies, most of the online databases contain limited curated data and are thus sequence repositories. HIV drug resistance has been shown to have a great potential to derail the progress made thus far through antiretroviral therapy. Thus, a lot of resources have been invested in generating drug resistance data for patient management and surveillance purposes. Unfortunately, most of the data currently available relate to subtype B even though >60% of the epidemic is caused by HIV-1 subtype C. A consortium of clinicians, scientists, public health experts and policy makers working in southern Africa came together and formed a network, the Southern African Treatment and Resistance Network (SATuRN), with the aim of increasing curated HIV-1 subtype C and tuberculosis drug resistance data. This article describes the HIV-1 data curation process using the SATuRN Rega database. The data curation is a manual and time-consuming process done by clinical, laboratory and data curation specialists. Access to the highly curated data sets is through applications that are reviewed by the SATuRN executive committee. Examples of research outputs from the analysis of the curated data include trends in the level of transmitted drug resistance in South Africa, analysis of the levels of acquired resistance among patients failing therapy and factors associated with the absence of genotypic evidence of drug resistance among patients failing therapy. All these studies have been important for informing first- and second-line therapy. This database is a free password-protected open source database available on

  9. Assisted curation of regulatory interactions and growth conditions of OxyR in E. coli K-12

    PubMed Central

    Gama-Castro, Socorro; López-Fuentes, Alejandra; Balderas-Martínez, Yalbi Itzel; Clematide, Simon; Ellendorff, Tilia Renate; Santos-Zavaleta, Alberto; Marques-Madeira, Hernani; Collado-Vides, Julio

    2014-01-01

    Given the current explosion of data within original publications generated in the field of genomics, a recognized bottleneck is the transfer of such knowledge into comprehensive databases. We have for years organized knowledge on transcriptional regulation reported in the original literature of Escherichia coli K-12 into RegulonDB (http://regulondb.ccg.unam.mx), our database that is currently supported by >5000 papers. Here, we report a first step towards the automatic biocuration of growth conditions in this corpus. Using the OntoGene text-mining system (http://www.ontogene.org), we extracted and manually validated regulatory interactions and growth conditions in a new approach based on filters that enable the curator to select informative sentences from preprocessed full papers. Based on a set of 48 papers dealing with oxidative stress by OxyR, we were able to retrieve 100% of the OxyR regulatory interactions present in RegulonDB, including the transcription factors and their effect on target genes. Our strategy was also designed to extract the associated growth conditions, which it did. This result provides a proof of concept for a more direct and efficient curation process, and enables us to define the strategy of the subsequent steps to be implemented for a semi-automatic curation of original literature dealing with regulation of gene expression in bacteria. This project will enhance the efficiency and quality of the curation of knowledge present in the literature of gene regulation, and contribute to a significant increase in the encoding of the regulatory network of E. coli. RegulonDB Database URL: http://regulondb.ccg.unam.mx OntoGene URL: http://www.ontogene.org PMID:24903516
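
    The filter-based selection of informative sentences described above can be approximated in a few lines of Python. The sketch below is not the OntoGene system; the regulator name, target-gene list and sentences are illustrative examples only.

      import re

      # Sketch of a sentence filter for assisted curation (not the OntoGene pipeline):
      # keep only sentences that mention the regulator of interest and at least one
      # candidate target gene, so the curator reviews a short, informative list.
      REGULATOR = "OxyR"                              # example regulator
      TARGET_GENES = {"katG", "ahpC", "dps", "oxyS"}  # illustrative target list

      def informative_sentences(sentences):
          """Yield (sentence, matched_targets) pairs likely to describe interactions."""
          for s in sentences:
              if REGULATOR.lower() not in s.lower():
                  continue
              hits = {g for g in TARGET_GENES if re.search(rf"\b{re.escape(g)}\b", s)}
              if hits:
                  yield s, sorted(hits)

      paper_sentences = [
          "OxyR activates transcription of katG and ahpC under oxidative stress.",
          "Cells were grown in LB medium at 37 degrees Celsius.",
      ]
      for sentence, targets in informative_sentences(paper_sentences):
          print(targets, "<-", sentence)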

  10. Qrator: A web-based curation tool for glycan structures

    PubMed Central

    Eavenson, Matthew; Kochut, Krys J; Miller, John A; Ranzinger, René; Tiemeyer, Michael; Aoki, Kazuhiro; York, William S

    2015-01-01

    Most currently available glycan structure databases use their own proprietary structure representation schema and contain numerous annotation errors. These cause problems when glycan databases are used for the annotation or mining of data generated in the laboratory. Due to the complexity of glycan structures, curating these databases is often a tedious and labor-intensive process. However, rigorously validating glycan structures can be made easier with a curation workflow that incorporates a structure-matching algorithm that compares candidate glycans to a canonical tree that embodies structural features consistent with established mechanisms for the biosynthesis of a particular class of glycans. To this end, we have implemented Qrator, a web-based application that uses a combination of external literature and database references, user annotations and canonical trees to assist and guide researchers in making informed decisions while curating glycans. Using this application, we have started the curation of large numbers of N-glycans, O-glycans and glycosphingolipids. Our curation workflow allows creating and extending canonical trees for these classes of glycans, which have subsequently been used to improve the curation workflow. PMID:25165068
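
    The canonical-tree matching described above can be illustrated with a small Python example that checks whether a candidate glycan (a rooted tree of residues) can be embedded in a canonical tree for its class. The residue names, tree encoding and greedy matching are simplifications for illustration, not Qrator's actual schema or algorithm.

      # Sketch of canonical-tree matching: a candidate glycan (rooted tree of residues)
      # is accepted if it can be embedded in the canonical tree for its class. The
      # residue names, encoding and greedy matching are simplifications.
      def embeds(candidate, canonical):
          """True if the candidate tree can be mapped onto the canonical tree."""
          if candidate["residue"] != canonical["residue"]:
              return False
          remaining = list(canonical.get("children", []))
          for child in candidate.get("children", []):
              match = next((c for c in remaining if embeds(child, c)), None)
              if match is None:
                  return False
              remaining.remove(match)  # each canonical branch is used at most once
          return True

      canonical_core = {"residue": "GlcNAc", "children": [
          {"residue": "GlcNAc", "children": [
              {"residue": "Man", "children": [{"residue": "Man"}, {"residue": "Man"}]}]}]}
      candidate = {"residue": "GlcNAc", "children": [
          {"residue": "GlcNAc", "children": [
              {"residue": "Man", "children": [{"residue": "Man"}]}]}]}
      print(embeds(candidate, canonical_core))  # True: consistent with the canonical tree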

  11. Qrator: a web-based curation tool for glycan structures.

    PubMed

    Eavenson, Matthew; Kochut, Krys J; Miller, John A; Ranzinger, René; Tiemeyer, Michael; Aoki, Kazuhiro; York, William S

    2015-01-01

    Most currently available glycan structure databases use their own proprietary structure representation schema and contain numerous annotation errors. These cause problems when glycan databases are used for the annotation or mining of data generated in the laboratory. Due to the complexity of glycan structures, curating these databases is often a tedious and labor-intensive process. However, rigorously validating glycan structures can be made easier with a curation workflow that incorporates a structure-matching algorithm that compares candidate glycans to a canonical tree that embodies structural features consistent with established mechanisms for the biosynthesis of a particular class of glycans. To this end, we have implemented Qrator, a web-based application that uses a combination of external literature and database references, user annotations and canonical trees to assist and guide researchers in making informed decisions while curating glycans. Using this application, we have started the curation of large numbers of N-glycans, O-glycans and glycosphingolipids. Our curation workflow allows creating and extending canonical trees for these classes of glycans, which have subsequently been used to improve the curation workflow. PMID:25165068

  12. The Immune Epitope Database 2.0

    PubMed Central

    Vita, Randi; Zarebski, Laura; Greenbaum, Jason A.; Emami, Hussein; Hoof, Ilka; Salimi, Nima; Damle, Rohini; Sette, Alessandro; Peters, Bjoern

    2010-01-01

    The Immune Epitope Database (IEDB, www.iedb.org) provides a catalog of experimentally characterized B and T cell epitopes, as well as data on Major Histocompatibility Complex (MHC) binding and MHC ligand elution experiments. The database represents the molecular structures recognized by adaptive immune receptors and the experimental contexts in which these molecules were determined to be immune epitopes. Epitopes recognized in humans, nonhuman primates, rodents, pigs, cats and all other tested species are included. Both positive and negative experimental results are captured. Over the course of 4 years, the data from 180 978 experiments were curated manually from the literature, which covers ∼99% of all publicly available information on peptide epitopes mapped in infectious agents (excluding HIV) and 93% of those mapped in allergens. In addition, data that would otherwise be unavailable to the public from 129 186 experiments were submitted directly by investigators. The curation of epitopes related to autoimmunity is expected to be completed by the end of 2010. The database can be queried by epitope structure, source organism, MHC restriction, assay type or host organism, among other criteria. The database structure, as well as its querying, browsing and reporting interfaces, was completely redesigned for the IEDB 2.0 release, which became publicly available in early 2009. PMID:19906713

  13. The BioGRID interaction database: 2013 update.

    PubMed

    Chatr-Aryamontri, Andrew; Breitkreutz, Bobby-Joe; Heinicke, Sven; Boucher, Lorrie; Winter, Andrew; Stark, Chris; Nixon, Julie; Ramage, Lindsay; Kolas, Nadine; O'Donnell, Lara; Reguly, Teresa; Breitkreutz, Ashton; Sellam, Adnane; Chen, Daici; Chang, Christie; Rust, Jennifer; Livstone, Michael; Oughtred, Rose; Dolinski, Kara; Tyers, Mike

    2013-01-01

    The Biological General Repository for Interaction Datasets (BioGRID: http://thebiogrid.org) is an open access archive of genetic and protein interactions that are curated from the primary biomedical literature for all major model organism species. As of September 2012, BioGRID houses more than 500 000 manually annotated interactions from more than 30 model organisms. BioGRID maintains complete curation coverage of the literature for the budding yeast Saccharomyces cerevisiae, the fission yeast Schizosaccharomyces pombe and the model plant Arabidopsis thaliana. A number of themed curation projects in areas of biomedical importance are also supported. BioGRID has established collaborations and/or shares data records for the annotation of interactions and phenotypes with most major model organism databases, including Saccharomyces Genome Database, PomBase, WormBase, FlyBase and The Arabidopsis Information Resource. BioGRID also actively engages with the text-mining community to benchmark and deploy automated tools to expedite curation workflows. BioGRID data are freely accessible through both a user-defined interactive interface and in batch downloads in a wide variety of formats, including PSI-MI2.5 and tab-delimited files. BioGRID records can also be interrogated and analyzed with a series of new bioinformatics tools, which include a post-translational modification viewer, a graphical viewer, a REST service and a Cytoscape plugin. PMID:23203989

  14. Reactome: a database of reactions, pathways and biological processes.

    PubMed

    Croft, David; O'Kelly, Gavin; Wu, Guanming; Haw, Robin; Gillespie, Marc; Matthews, Lisa; Caudy, Michael; Garapati, Phani; Gopinath, Gopal; Jassal, Bijay; Jupe, Steven; Kalatskaya, Irina; Mahajan, Shahana; May, Bruce; Ndegwa, Nelson; Schmidt, Esther; Shamovsky, Veronica; Yung, Christina; Birney, Ewan; Hermjakob, Henning; D'Eustachio, Peter; Stein, Lincoln

    2011-01-01

    Reactome (http://www.reactome.org) is a collaboration among groups at the Ontario Institute for Cancer Research, Cold Spring Harbor Laboratory, New York University School of Medicine and The European Bioinformatics Institute, to develop an open source curated bioinformatics database of human pathways and reactions. Recently, we developed a new web site with improved tools for pathway browsing and data analysis. The Pathway Browser is a Systems Biology Graphical Notation (SBGN)-based visualization system that supports zooming, scrolling and event highlighting. It exploits PSICQUIC web services to overlay our curated pathways with molecular interaction data from the Reactome Functional Interaction Network and external interaction databases such as IntAct, BioGRID, ChEMBL, iRefIndex, MINT and STRING. Our Pathway and Expression Analysis tools enable ID mapping, pathway assignment and overrepresentation analysis of user-supplied data sets. To support pathway annotation and analysis in other species, we continue to make orthology-based inferences of pathways in non-human species, applying Ensembl Compara to identify orthologs of curated human proteins in each of 20 other species. The resulting inferred pathway sets can be browsed and analyzed with our Species Comparison tool. Collaborations are also underway to create manually curated data sets on the Reactome framework for chicken, Drosophila and rice. PMID:21067998

  15. BacMet: antibacterial biocide and metal resistance genes database

    PubMed Central

    Pal, Chandan; Bengtsson-Palme, Johan; Rensing, Christopher; Kristiansson, Erik; Larsson, D. G. Joakim

    2014-01-01

    Antibiotic resistance has become a major human health concern due to widespread use, misuse and overuse of antibiotics. In addition to antibiotics, antibacterial biocides and metals can contribute to the development and maintenance of antibiotic resistance in bacterial communities through co-selection. Information on metal and biocide resistance genes, including their sequences and molecular functions, is, however, scattered. Here, we introduce BacMet (http://bacmet.biomedicine.gu.se)—a manually curated database of antibacterial biocide- and metal-resistance genes based on an in-depth review of the scientific literature. The BacMet database contains 470 experimentally verified resistance genes. In addition, the database also contains 25 477 potential resistance genes collected from public sequence repositories. All resistance genes in the BacMet database have been organized according to their molecular function and induced resistance phenotype. PMID:24304895

  16. Curating NASA's Past, Present, and Future Astromaterial Sample Collections

    NASA Technical Reports Server (NTRS)

    Zeigler, R. A.; Allton, J. H.; Evans, C. A.; Fries, M. D.; McCubbin, F. M.; Nakamura-Messenger, K.; Righter, K.; Zolensky, M.; Stansbery, E. K.

    2016-01-01

    The Astromaterials Acquisition and Curation Office at NASA Johnson Space Center (hereafter JSC curation) is responsible for curating all of NASA's extraterrestrial samples. JSC presently curates 9 different astromaterials collections in seven different clean-room suites: (1) Apollo Samples (ISO (International Standards Organization) class 6 + 7); (2) Antarctic Meteorites (ISO 6 + 7); (3) Cosmic Dust Particles (ISO 5); (4) Microparticle Impact Collection (ISO 7; formerly called Space-Exposed Hardware); (5) Genesis Solar Wind Atoms (ISO 4); (6) Stardust Comet Particles (ISO 5); (7) Stardust Interstellar Particles (ISO 5); (8) Hayabusa Asteroid Particles (ISO 5); (9) OSIRIS-REx Spacecraft Coupons and Witness Plates (ISO 7). Additional cleanrooms are currently being planned to house samples from two new collections, Hayabusa 2 (2021) and OSIRIS-REx (2023). In addition to the labs that house the samples, we maintain a wide variety of infrastructure facilities required to support the clean rooms: HEPA-filtered air-handling systems, ultrapure dry gaseous nitrogen systems, an ultrapure water system, and cleaning facilities to provide clean tools and equipment for the labs. We also have sample preparation facilities for making thin sections, microtome sections, and even focused ion-beam sections. We routinely monitor the cleanliness of our clean rooms and infrastructure systems, including measurements of inorganic or organic contamination, weekly airborne particle counts, compositional and isotopic monitoring of liquid N2 deliveries, and daily UPW system monitoring. In addition to the physical maintenance of the samples, we track within our databases the current and ever changing characteristics (weight, location, etc.) of more than 250,000 individually numbered samples across our various collections, as well as more than 100,000 images, and countless "analog" records that record the sample processing records of each individual sample. JSC Curation is co-located with JSC

  17. MicroScope--an integrated microbial resource for the curation and comparative analysis of genomic and metabolic data.

    PubMed

    Vallenet, David; Belda, Eugeni; Calteau, Alexandra; Cruveiller, Stéphane; Engelen, Stefan; Lajus, Aurélie; Le Fèvre, François; Longin, Cyrille; Mornico, Damien; Roche, David; Rouy, Zoé; Salvignol, Gregory; Scarpelli, Claude; Thil Smith, Adam Alexander; Weiman, Marion; Médigue, Claudine

    2013-01-01

    MicroScope is an integrated platform dedicated to both the methodical updating of microbial genome annotation and to comparative analysis. The resource provides data from completed and ongoing genome projects (automatic and expert annotations), together with data sources from post-genomic experiments (i.e. transcriptomics, mutant collections) allowing users to perfect and improve the understanding of gene functions. MicroScope (http://www.genoscope.cns.fr/agc/microscope) combines tools and graphical interfaces to analyse genomes and to perform the manual curation of gene annotations in a comparative context. Since its first publication in January 2006, the system (previously named MaGe for Magnifying Genomes) has been continuously extended both in terms of data content and analysis tools. The last update of MicroScope was published in 2009 in the Database journal. Today, the resource contains data for >1600 microbial genomes, of which ∼300 are manually curated and maintained by biologists (1200 personal accounts today). Expert annotations are continuously gathered in the MicroScope database (∼50 000 a year), contributing to the improvement of the quality of microbial genomes annotations. Improved data browsing and searching tools have been added, original tools useful in the context of expert annotation have been developed and integrated and the website has been significantly redesigned to be more user-friendly. Furthermore, in the context of the European project Microme (Framework Program 7 Collaborative Project), MicroScope is becoming a resource providing for the curation and analysis of both genomic and metabolic data. An increasing number of projects are related to the study of environmental bacterial (meta)genomes that are able to metabolize a large variety of chemical compounds that may be of high industrial interest. PMID:23193269

  18. NeXO Web: the NeXO ontology database and visualization platform

    PubMed Central

    Dutkowski, Janusz; Ono, Keiichiro; Kramer, Michael; Yu, Michael; Pratt, Dexter; Demchak, Barry; Ideker, Trey

    2014-01-01

    The Network-extracted Ontology (NeXO) is a gene ontology inferred directly from large-scale molecular networks. While most ontologies are constructed through manual expert curation, NeXO uses a principled computational approach which integrates evidence from hundreds of thousands of individual gene and protein interactions to construct a global hierarchy of cellular components and processes. Here, we describe the development of the NeXO Web platform (http://www.nexontology.org)—an online database and graphical user interface for visualizing, browsing and performing term enrichment analysis using NeXO and the gene ontology. The platform applies state-of-the-art web technology and visualization techniques to provide an intuitive framework for investigating biological machinery captured by both data-driven and manually curated ontologies. PMID:24271398
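
    Term enrichment analysis of the kind offered by the platform above is conventionally computed with a hypergeometric test. The following minimal Python sketch, with made-up counts, shows the calculation; it is not NeXO Web code.

      from scipy.stats import hypergeom

      # Sketch of a term-enrichment test: given a query gene set, ask whether a term's
      # annotated genes are over-represented relative to the background. All counts
      # below are made up for illustration.
      def term_enrichment_p(n_background, n_term, n_query, n_overlap):
          """P(X >= n_overlap) under the hypergeometric null."""
          return hypergeom.sf(n_overlap - 1, n_background, n_term, n_query)

      # 6000 background genes, 150 annotated to the term, 40 query genes, 12 in both
      print(term_enrichment_p(6000, 150, 40, 12))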

  19. PomBase 2015: updates to the fission yeast database

    PubMed Central

    McDowall, Mark D.; Harris, Midori A.; Lock, Antonia; Rutherford, Kim; Staines, Daniel M.; Bähler, Jürg; Kersey, Paul J.; Oliver, Stephen G.; Wood, Valerie

    2015-01-01

    PomBase (http://www.pombase.org) is the model organism database for the fission yeast Schizosaccharomyces pombe. PomBase provides a central hub for the fission yeast community, supporting both exploratory and hypothesis-driven research. It provides users easy access to data ranging from the sequence level, to molecular and phenotypic annotations, through to the display of genome-wide high-throughput studies. Recent improvements to the site extend annotation specificity, improve usability and allow for monthly data updates. Both in-house curators and community researchers provide manually curated data to PomBase. The genome browser provides access to published high-throughput data sets and the genomes of three additional Schizosaccharomyces species (Schizosaccharomyces cryophilus, Schizosaccharomyces japonicus and Schizosaccharomyces octosporus). PMID:25361970

  20. CAZymes Analysis Toolkit (CAT): Web service for searching and analyzing carbohydrate-active enzymes in a newly sequenced organism using CAZy database

    SciTech Connect

    Karpinets, Tatiana V; Park, Byung; Syed, Mustafa H; Uberbacher, Edward C; Leuze, Michael Rex

    2010-01-01

    The Carbohydrate-Active Enzyme (CAZy) database provides a rich set of manually annotated enzymes that degrade, modify, or create glycosidic bonds. Despite rich and invaluable information stored in the database, software tools utilizing this information for annotation of newly sequenced genomes by CAZy families are limited. We have employed two annotation approaches to fill the gap between manually curated high-quality protein sequences collected in the CAZy database and the growing number of other protein sequences produced by genome or metagenome sequencing projects. The first approach is based on a similarity search against the entire non-redundant sequences of the CAZy database. The second approach performs annotation using links or correspondences between the CAZy families and protein family domains. The links were discovered using the association rule learning algorithm applied to sequences from the CAZy database. The approaches complement each other and in combination achieved high specificity and sensitivity when cross-evaluated with the manually curated genomes of Clostridium thermocellum ATCC 27405 and Saccharophagus degradans 2-40. The capability of the proposed framework to predict the function of unknown protein domains (DUF) and of hypothetical proteins in the genome of Neurospora crassa is demonstrated. The framework is implemented as a Web service, the CAZymes Analysis Toolkit (CAT), and is available at http://cricket.ornl.gov/cgi-bin/cat.cgi.
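
    The second approach, linking protein family domains to CAZy families by association rule learning, can be illustrated with a toy Python sketch. The domain identifiers, families, training pairs and confidence threshold below are illustrative, not the rules actually mined for CAT.

      from collections import defaultdict

      # Sketch of learning "protein-domain -> CAZy family" rules from annotated
      # sequences and applying them to new ones. Domains, families, training pairs
      # and the confidence threshold are illustrative.
      MIN_CONFIDENCE = 0.8

      def learn_rules(training):
          """training: iterable of (set_of_domains, cazy_family) pairs."""
          domain_total, pair_count = defaultdict(int), defaultdict(int)
          for domains, family in training:
              for d in domains:
                  domain_total[d] += 1
                  pair_count[(d, family)] += 1
          return {d: fam for (d, fam), n in pair_count.items()
                  if n / domain_total[d] >= MIN_CONFIDENCE}

      def annotate(domains, rules):
          return sorted({rules[d] for d in domains if d in rules})

      training = [({"PF00150"}, "GH5"), ({"PF00150"}, "GH5"),
                  ({"PF00232"}, "GH1"), ({"PF00232", "PF13472"}, "GH1")]
      rules = learn_rules(training)
      print(annotate({"PF00150", "PF99999"}, rules))  # ['GH5']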

  1. The National NeuroAIDS Tissue Consortium (NNTC) Database: an integrated database for HIV-related studies

    PubMed Central

    Cserhati, Matyas F.; Pandey, Sanjit; Beaudoin, James J.; Baccaglini, Lorena; Guda, Chittibabu; Fox, Howard S.

    2015-01-01

    We herein present the National NeuroAIDS Tissue Consortium-Data Coordinating Center (NNTC-DCC) database, which is the only available database for neuroAIDS studies that contains data in an integrated, standardized form. This database has been created in conjunction with the NNTC, which provides human tissue and biofluid samples to individual researchers to conduct studies focused on neuroAIDS. The database contains experimental datasets from 1206 subjects for the following categories (which are further broken down into subcategories): gene expression, genotype, proteins, endo-exo-chemicals, morphometrics and other (miscellaneous) data. The database also contains a wide variety of downloadable data and metadata for 95 HIV-related studies covering 170 assays from 61 principal investigators. The data represent 76 tissue types, 25 measurement types, and 38 technology types, and reaches a total of 33 017 407 data points. We used the ISA platform to create the database and develop a searchable web interface for querying the data. A gene search tool is also available, which searches for NCBI GEO datasets associated with selected genes. The database is manually curated with many user-friendly features, and is cross-linked to the NCBI, HUGO and PubMed databases. A free registration is required for qualified users to access the database. Database URL: http://nntc-dcc.unmc.edu PMID:26228431
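
    A gene-centric lookup against NCBI GEO DataSets, similar in spirit to the gene search tool described above, can be performed through the public Entrez E-utilities. The Python sketch below is illustrative (the query-term construction and the example gene are assumptions, and this is not NNTC-DCC code); it requires network access.

      import json
      import urllib.parse
      import urllib.request

      # Sketch of a gene-centric lookup against NCBI GEO DataSets via the public
      # Entrez E-utilities esearch endpoint. The query-term construction and the
      # example gene are assumptions for illustration; this is not NNTC-DCC code.
      def geo_datasets_for_gene(gene_symbol, organism="Homo sapiens", retmax=20):
          term = f"{gene_symbol}[All Fields] AND {organism}[Organism]"
          params = urllib.parse.urlencode(
              {"db": "gds", "term": term, "retmax": retmax, "retmode": "json"})
          url = f"https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi?{params}"
          with urllib.request.urlopen(url) as resp:
              return json.load(resp)["esearchresult"]["idlist"]

      print(geo_datasets_for_gene("APOE"))  # UIDs of matching GEO DataSets records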

  2. [Proliferative vitreoretinopathy: curative treatment].

    PubMed

    Chiquet, C; Rouberol, F

    2014-10-01

    Proliferative vitreoretinopathy (PVR), which causes contractile fibrocellular membranes that may prevent retinal reattachment, remains one of the most severe complications of rhegmatogenous retinal detachment (RD), with an incidence of 5-11%, and one of the most frequent causes of surgical failure (50-75%). Its severity is due to the complexity of the surgery required to treat patients, and to its uncertain anatomic and functional prognosis. Curative treatment of PVR includes vitrectomy, sometimes associated with phacoemulsification or scleral buckling; systematic peeling of epiretinal membranes, occasionally retinectomy; and systematic retinopexy by endolaser photocoagulation. The current preferred internal tamponade is silicone oil. Silicone oils of various densities are undergoing comparative studies. PMID:24997865

  3. Supervised Learning for Detection of Duplicates in Genomic Sequence Databases

    PubMed Central

    Zobel, Justin; Zhang, Xiuzhen; Verspoor, Karin

    2016-01-01

    Motivation: First identified as an issue in 1996, duplication in biological databases introduces redundancy and even leads to inconsistency when contradictory information appears. The amount of data makes purely manual de-duplication impractical, and existing automatic systems cannot detect duplicates as precisely as can experts. Supervised learning has the potential to address such problems by building automatic systems that learn from expert curation to detect duplicates precisely and efficiently. While machine learning is a mature approach in other duplicate detection contexts, it has seen only preliminary application in genomic sequence databases. Results: We developed and evaluated a supervised duplicate detection method based on an expert curated dataset of duplicates, containing over one million pairs across five organisms derived from genomic sequence databases. We selected 22 features to represent distinct attributes of the database records, and developed a binary model and a multi-class model. Both models achieve promising performance; under cross-validation, the binary model had over 90% accuracy in each of the five organisms, while the multi-class model maintains high accuracy and is more robust in generalisation. We performed an ablation study to quantify the impact of different sequence record features, finding that features derived from meta-data, sequence identity, and alignment quality impact performance most strongly. The study demonstrates machine learning can be an effective additional tool for de-duplication of genomic sequence databases. All data are available as described in the supplementary material. PMID:27489953
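
    The binary duplicate/non-duplicate model described above can be prototyped with standard machine-learning tooling. In the Python sketch below the feature matrix and labels are synthetic placeholders standing in for pairwise record features such as metadata similarity, sequence identity and alignment quality; the study's actual 22 features and expert-curated pairs are not reproduced.

      import numpy as np
      from sklearn.ensemble import RandomForestClassifier
      from sklearn.model_selection import cross_val_score

      # Sketch of a binary duplicate/non-duplicate classifier over record-pair
      # features. The feature matrix and labels below are synthetic placeholders
      # (three stand-in features), not the study's 22 features or curated pairs.
      rng = np.random.default_rng(0)
      X = rng.random((500, 3))                 # [metadata_sim, seq_identity, aln_coverage]
      y = (X.mean(axis=1) > 0.6).astype(int)   # toy rule: high overall similarity = duplicate

      model = RandomForestClassifier(n_estimators=100, random_state=0)
      scores = cross_val_score(model, X, y, cv=5)   # cross-validation, as in the study
      print("mean CV accuracy:", scores.mean())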

  4. The Protein-DNA Interface database

    PubMed Central

    2010-01-01

    The Protein-DNA Interface database (PDIdb) is a repository containing relevant structural information of Protein-DNA complexes solved by X-ray crystallography and available at the Protein Data Bank. The database includes a simple functional classification of the protein-DNA complexes that consists of three hierarchical levels: Class, Type and Subtype. This classification has been defined and manually curated by humans based on the information gathered from several sources that include PDB, PubMed, CATH, SCOP and COPS. The current version of the database contains only structures with resolution of 2.5 Å or higher, accounting for a total of 922 entries. The major aim of this database is to contribute to the understanding of the main rules that underlie the molecular recognition process between DNA and proteins. To this end, the database is focused on each specific atomic interface rather than on the separated binding partners. Therefore, each entry in this database consists of a single and independent protein-DNA interface. We hope that PDIdb will be useful to many researchers working in fields such as the prediction of transcription factor binding sites in DNA, the study of specificity determinants that mediate enzyme recognition events, engineering and design of new DNA binding proteins with distinct binding specificity and affinity, among others. Finally, due to its friendly and easy-to-use web interface, we hope that PDIdb will also serve educational and teaching purposes. PMID:20482798

  5. DoBISCUIT: a database of secondary metabolite biosynthetic gene clusters

    PubMed Central

    Ichikawa, Natsuko; Sasagawa, Machi; Yamamoto, Mika; Komaki, Hisayuki; Yoshida, Yumi; Yamazaki, Shuji; Fujita, Nobuyuki

    2013-01-01

    This article introduces DoBISCUIT (Database of BIoSynthesis clusters CUrated and InTegrated, http://www.bio.nite.go.jp/pks/), a literature-based, manually curated database of gene clusters for secondary metabolite biosynthesis. Bacterial secondary metabolites often show pharmacologically important activities and can serve as lead compounds and/or candidates for drug development. Biosynthesis of each secondary metabolite is catalyzed by a number of enzymes, usually encoded by a gene cluster. Although many scientific papers describe such gene clusters, the gene information is not always described in a comprehensive manner and the related information is rarely integrated. DoBISCUIT integrates the latest literature information and provides standardized gene/module/domain descriptions related to the gene clusters. PMID:23185043

  6. The Structure–Function Linkage Database

    PubMed Central

    Akiva, Eyal; Brown, Shoshana; Almonacid, Daniel E.; Barber, Alan E.; Custer, Ashley F.; Hicks, Michael A.; Huang, Conrad C.; Lauck, Florian; Mashiyama, Susan T.; Meng, Elaine C.; Mischel, David; Morris, John H.; Ojha, Sunil; Schnoes, Alexandra M.; Stryke, Doug; Yunes, Jeffrey M.; Ferrin, Thomas E.; Holliday, Gemma L.; Babbitt, Patricia C.

    2014-01-01

    The Structure–Function Linkage Database (SFLD, http://sfld.rbvi.ucsf.edu/) is a manually curated classification resource describing structure–function relationships for functionally diverse enzyme superfamilies. Members of such superfamilies are diverse in their overall reactions yet share a common ancestor and some conserved active site features associated with conserved functional attributes such as a partial reaction. Thus, despite their different functions, members of these superfamilies ‘look alike’, making them easy to misannotate. To address this complexity and enable rational transfer of functional features to unknowns only for those members for which we have sufficient functional information, we subdivide superfamily members into subgroups using sequence information, and lastly into families, sets of enzymes known to catalyze the same reaction using the same mechanistic strategy. Browsing and searching options in the SFLD provide access to all of these levels. The SFLD offers manually curated as well as automatically classified superfamily sets, both accompanied by search and download options for all hierarchical levels. Additional information includes multiple sequence alignments, tab-separated files of functional and other attributes, and sequence similarity networks. The latter provide a new and intuitively powerful way to visualize functional trends mapped to the context of sequence similarity. PMID:24271399
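
    A sequence similarity network of the kind mentioned above can be assembled from pairwise scores with a cutoff. The Python sketch below uses invented identifiers and identities purely for illustration; it is not SFLD code.

      import networkx as nx

      # Sketch of a sequence similarity network: sequences are nodes and an edge is
      # drawn when a pairwise score passes a cutoff. Identifiers and identities are
      # invented for illustration.
      pairwise_identity = {("enzA", "enzB"): 0.72, ("enzA", "enzC"): 0.31,
                           ("enzB", "enzC"): 0.35, ("enzC", "enzD"): 0.68}
      CUTOFF = 0.5

      ssn = nx.Graph()
      ssn.add_nodes_from({name for pair in pairwise_identity for name in pair})
      for (a, b), ident in pairwise_identity.items():
          if ident >= CUTOFF:
              ssn.add_edge(a, b, identity=ident)

      # connected components approximate putative subgroups/families
      print([sorted(component) for component in nx.connected_components(ssn)])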

  7. Biocuration at the Saccharomyces genome database.

    PubMed

    Skrzypek, Marek S; Nash, Robert S

    2015-08-01

    Saccharomyces Genome Database is an online resource dedicated to managing information about the biology and genetics of the model organism, yeast (Saccharomyces cerevisiae). This information is derived primarily from scientific publications through a process of human curation that involves manual extraction of data and their organization into a comprehensive system of knowledge. This system provides a foundation for further analysis of experimental data coming from research on yeast as well as other organisms. In this review we will demonstrate how biocuration and biocurators add a key component, the biological context, to our understanding of how genes, proteins, genomes and cells function and interact. We will explain the role biocurators play in sifting through the wealth of biological data to incorporate and connect key information. We will also discuss the many ways we assist researchers with their various research needs. We hope to convince the reader that manual curation is vital in converting the flood of data into organized and interconnected knowledge, and that biocurators play an essential role in the integration of scientific information into a coherent model of the cell. PMID:25997651

  8. Biocuration at the Saccharomyces Genome Database

    PubMed Central

    Skrzypek, Marek S.; Nash, Robert S.

    2015-01-01

    Saccharomyces Genome Database is an online resource dedicated to managing information about the biology and genetics of the model organism, yeast (Saccharomyces cerevisiae). This information is derived primarily from scientific publications through a process of human curation that involves manual extraction of data and their organization into a comprehensive system of knowledge. This system provides a foundation for further analysis of experimental data coming from research on yeast as well as other organisms. In this review we will demonstrate how biocuration and biocurators add a key component, the biological context, to our understanding of how genes, proteins, genomes and cells function and interact. We will explain the role biocurators play in sifting through the wealth of biological data to incorporate and connect key information. We will also discuss the many ways we assist researchers with their various research needs. We hope to convince the reader that manual curation is vital in converting the flood of data into organized and interconnected knowledge, and that biocurators play an essential role in the integration of scientific information into a coherent model of the cell. PMID:25997651

  9. DDRprot: a database of DNA damage response-related proteins

    PubMed Central

    Andrés-León, Eduardo; Cases, Ildefonso; Arcas, Aida; Rojas, Ana M.

    2016-01-01

    The DNA Damage Response (DDR) signalling network is an essential system that protects the genome’s integrity. The DDRprot database presented here is a resource that integrates manually curated information on the human DDR network and its sub-pathways. For each particular DDR protein, we present detailed information about its function. If involved in post-translational modifications (PTMs) with each other, we depict the position of the modified residue/s in the three-dimensional structures, when resolved structures are available for the proteins. All this information is linked to the original publication from where it was obtained. Phylogenetic information is also shown, including time of emergence and conservation across 47 selected species, family trees and sequence alignments of homologues. The DDRprot database can be queried by different criteria: pathways, species, evolutionary age or involvement in PTMs. Sequence searches using hidden Markov models can also be used. Database URL: http://ddr.cbbio.es. PMID:27577567

  10. DDRprot: a database of DNA damage response-related proteins.

    PubMed

    Andrés-León, Eduardo; Cases, Ildefonso; Arcas, Aida; Rojas, Ana M

    2016-01-01

    The DNA Damage Response (DDR) signalling network is an essential system that protects the genome's integrity. The DDRprot database presented here is a resource that integrates manually curated information on the human DDR network and its sub-pathways. For each particular DDR protein, we present detailed information about its function. If involved in post-translational modifications (PTMs) with each other, we depict the position of the modified residue/s in the three-dimensional structures, when resolved structures are available for the proteins. All this information is linked to the original publication from where it was obtained. Phylogenetic information is also shown, including time of emergence and conservation across 47 selected species, family trees and sequence alignments of homologues. The DDRprot database can be queried by different criteria: pathways, species, evolutionary age or involvement in PTMs. Sequence searches using hidden Markov models can also be used. Database URL: http://ddr.cbbio.es. PMID:27577567

  11. Building a genome database using an object-oriented approach.

    PubMed

    Barbasiewicz, Anna; Liu, Lin; Lang, B Franz; Burger, Gertraud

    2002-01-01

    GOBASE is a relational database that integrates data associated with mitochondria and chloroplasts. The most important data in GOBASE, i.e., molecular sequences and taxonomic information, are obtained from the public sequence data repository at the National Center for Biotechnology Information (NCBI), and are validated by our experts. Maintaining a curated genomic database comes with a towering labor cost, due to the sheer volume of available genomic sequences and the plethora of annotation errors and omissions in records retrieved from public repositories. Here we describe our approach to increase automation of the database population process, thereby reducing manual intervention. As a first step, we used Unified Modeling Language (UML) to construct a list of potential errors. Each case was evaluated independently, and an expert solution was devised and represented as a diagram. Subsequently, the UML diagrams were used as templates for writing object-oriented automation programs in the Java programming language. PMID:12542407
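
    The error-checking automation described above (UML-derived Java programs in the original work) amounts to rule-based validation of incoming records. The Python sketch below illustrates the idea with a few invented rules and fields; it is not the GOBASE implementation.

      from dataclasses import dataclass, field
      from typing import List, Optional

      # Sketch of rule-based validation of incoming sequence records. The fields and
      # rules are invented for illustration; the original automation was written in
      # Java from UML templates.
      VALID_GENETIC_CODES = {1, 2, 3, 4, 5, 11}

      @dataclass
      class Record:
          accession: str
          organism: str = ""
          genetic_code: Optional[int] = None
          sequence: str = ""
          problems: List[str] = field(default_factory=list)

      def validate(rec: Record) -> Record:
          if not rec.organism:
              rec.problems.append("missing organism/taxonomy information")
          if rec.genetic_code not in VALID_GENETIC_CODES:
              rec.problems.append(f"unexpected genetic code: {rec.genetic_code}")
          if set(rec.sequence.upper()) - set("ACGTN"):
              rec.problems.append("non-nucleotide characters in sequence")
          return rec

      rec = validate(Record("AB012345", organism="", genetic_code=99, sequence="ACGTX"))
      print(rec.accession, rec.problems)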

  12. The National NeuroAIDS Tissue Consortium (NNTC) Database: an integrated database for HIV-related studies.

    PubMed

    Cserhati, Matyas F; Pandey, Sanjit; Beaudoin, James J; Baccaglini, Lorena; Guda, Chittibabu; Fox, Howard S

    2015-01-01

    We herein present the National NeuroAIDS Tissue Consortium-Data Coordinating Center (NNTC-DCC) database, which is the only available database for neuroAIDS studies that contains data in an integrated, standardized form. This database has been created in conjunction with the NNTC, which provides human tissue and biofluid samples to individual researchers to conduct studies focused on neuroAIDS. The database contains experimental datasets from 1206 subjects for the following categories (which are further broken down into subcategories): gene expression, genotype, proteins, endo-exo-chemicals, morphometrics and other (miscellaneous) data. The database also contains a wide variety of downloadable data and metadata for 95 HIV-related studies covering 170 assays from 61 principal investigators. The data represent 76 tissue types, 25 measurement types, and 38 technology types, and reaches a total of 33,017,407 data points. We used the ISA platform to create the database and develop a searchable web interface for querying the data. A gene search tool is also available, which searches for NCBI GEO datasets associated with selected genes. The database is manually curated with many user-friendly features, and is cross-linked to the NCBI, HUGO and PubMed databases. A free registration is required for qualified users to access the database. PMID:26228431

  13. The Comparative Toxicogenomics Database's 10th year anniversary: update 2015

    PubMed Central

    Davis, Allan Peter; Grondin, Cynthia J.; Lennon-Hopkins, Kelley; Saraceni-Richards, Cynthia; Sciaky, Daniela; King, Benjamin L.; Wiegers, Thomas C.; Mattingly, Carolyn J.

    2015-01-01

    Ten years ago, the Comparative Toxicogenomics Database (CTD; http://ctdbase.org/) was developed out of a need to formalize, harmonize and centralize the information on numerous genes and proteins responding to environmental toxic agents across diverse species. CTD's initial approach was to facilitate comparisons of nucleotide and protein sequences of toxicologically significant genes by curating these sequences and electronically annotating them with chemical terms from their associated references. Since then, however, CTD has vastly expanded its scope to robustly represent a triad of chemical–gene, chemical–disease and gene–disease interactions that are manually curated from the scientific literature by professional biocurators using controlled vocabularies, ontologies and structured notation. Today, CTD includes 24 million toxicogenomic connections relating chemicals/drugs, genes/proteins, diseases, taxa, phenotypes, Gene Ontology annotations, pathways and interaction modules. In this 10th year anniversary update, we outline the evolution of CTD, including our increased data content, new ‘Pathway View’ visualization tool, enhanced curation practices, pilot chemical–phenotype results and impending exposure data set. The prototype database originally described in our first report has transformed into a sophisticated resource used actively today to help scientists develop and test hypotheses about the etiologies of environmentally influenced diseases. PMID:25326323

  14. The Comparative Toxicogenomics Database's 10th year anniversary: update 2015.

    PubMed

    Davis, Allan Peter; Grondin, Cynthia J; Lennon-Hopkins, Kelley; Saraceni-Richards, Cynthia; Sciaky, Daniela; King, Benjamin L; Wiegers, Thomas C; Mattingly, Carolyn J

    2015-01-01

    Ten years ago, the Comparative Toxicogenomics Database (CTD; http://ctdbase.org/) was developed out of a need to formalize, harmonize and centralize the information on numerous genes and proteins responding to environmental toxic agents across diverse species. CTD's initial approach was to facilitate comparisons of nucleotide and protein sequences of toxicologically significant genes by curating these sequences and electronically annotating them with chemical terms from their associated references. Since then, however, CTD has vastly expanded its scope to robustly represent a triad of chemical-gene, chemical-disease and gene-disease interactions that are manually curated from the scientific literature by professional biocurators using controlled vocabularies, ontologies and structured notation. Today, CTD includes 24 million toxicogenomic connections relating chemicals/drugs, genes/proteins, diseases, taxa, phenotypes, Gene Ontology annotations, pathways and interaction modules. In this 10th year anniversary update, we outline the evolution of CTD, including our increased data content, new 'Pathway View' visualization tool, enhanced curation practices, pilot chemical-phenotype results and impending exposure data set. The prototype database originally described in our first report has transformed into a sophisticated resource used actively today to help scientists develop and test hypotheses about the etiologies of environmentally influenced diseases. PMID:25326323

  15. Pancreatic Cancer Database

    PubMed Central

    Thomas, Joji Kurian; Kim, Min-Sik; Balakrishnan, Lavanya; Nanjappa, Vishalakshi; Raju, Rajesh; Marimuthu, Arivusudar; Radhakrishnan, Aneesha; Muthusamy, Babylakshmi; Khan, Aafaque Ahmad; Sakamuri, Sruthi; Tankala, Shantal Gupta; Singal, Mukul; Nair, Bipin; Sirdeshmukh, Ravi; Chatterjee, Aditi; Prasad, T S Keshava; Maitra, Anirban; Gowda, Harsha; Hruban, Ralph H; Pandey, Akhilesh

    2014-01-01

    Pancreatic cancer is the fourth leading cause of cancer-related death in the world. The etiology of pancreatic cancer is heterogeneous with a wide range of alterations that have already been reported at the level of the genome, transcriptome, and proteome. The past decade has witnessed a large number of experimental studies using high-throughput technology platforms to identify genes whose expression at the transcript or protein levels is altered in pancreatic cancer. Based on expression studies, a number of molecules have also been proposed as potential biomarkers for diagnosis and prognosis of this deadly cancer. Currently, there are no repositories which provide an integrative view of multiple Omics data sets from published research on pancreatic cancer. Here, we describe the development of a web-based resource, Pancreatic Cancer Database (PCD; http://www.pancreaticcancerdatabase.org), as a unified platform for pancreatic cancer research. PCD contains manually curated information pertaining to quantitative alterations in miRNA, mRNA, and proteins obtained from small-scale as well as high-throughput studies of pancreatic cancer tissues and cell lines. We believe that PCD will serve as an integrative platform for the scientific community involved in pancreatic cancer research. PMID:24839966

  16. The Protein Ensemble Database.

    PubMed

    Varadi, Mihaly; Tompa, Peter

    2015-01-01

    The scientific community's major conceptual notion of structural biology has recently shifted in emphasis from the classical structure-function paradigm due to the emergence of intrinsically disordered proteins (IDPs). As opposed to their folded cousins, these proteins are defined by the lack of a stable 3D fold and a high degree of inherent structural heterogeneity that is closely tied to their function. Due to their flexible nature, solution techniques such as small-angle X-ray scattering (SAXS), nuclear magnetic resonance (NMR) spectroscopy and fluorescence resonance energy transfer (FRET) are particularly well-suited for characterizing their biophysical properties. Computationally derived structural ensembles based on such experimental measurements provide models of the conformational sampling displayed by these proteins, and they may offer valuable insights into the functional consequences of inherent flexibility. The Protein Ensemble Database (http://pedb.vib.be) is the first openly accessible, manually curated online resource storing the ensemble models, protocols used during the calculation procedure, and underlying primary experimental data derived from SAXS and/or NMR measurements. By making this previously inaccessible data freely available to researchers, this novel resource is expected to promote the development of more advanced modelling methodologies, facilitate the design of standardized calculation protocols, and consequently lead to a better understanding of how function arises from the disordered state. PMID:26387108

  17. Genome databases

    SciTech Connect

    Courteau, J.

    1991-10-11

    Since the Genome Project began several years ago, a plethora of databases have been developed or are in the works. They range from the massive Genome Data Base at Johns Hopkins University, the central repository of all gene mapping information, to small databases focusing on single chromosomes or organisms. Some are publicly available, others are essentially private electronic lab notebooks. Still others limit access to a consortium of researchers working on, say, a single human chromosome. An increasing number incorporate sophisticated search and analytical software, while others operate as little more than data lists. In consultation with numerous experts in the field, a list has been compiled of some key genome-related databases. The list was not limited to map and sequence databases but also included the tools investigators use to interpret and elucidate genetic data, such as protein sequence and protein structure databases. Because a major goal of the Genome Project is to map and sequence the genomes of several experimental animals, including E. coli, yeast, fruit fly, nematode, and mouse, the available databases for those organisms are listed as well. The author also includes several databases that are still under development - including some ambitious efforts that go beyond data compilation to create what are being called electronic research communities, enabling many users, rather than just one or a few curators, to add or edit the data and tag it as raw or confirmed.

  18. CAZymes Analysis Toolkit (CAT): web service for searching and analyzing carbohydrate-active enzymes in a newly sequenced organism using CAZy database.

    PubMed

    Park, Byung H; Karpinets, Tatiana V; Syed, Mustafa H; Leuze, Michael R; Uberbacher, Edward C

    2010-12-01

    The Carbohydrate-Active Enzyme (CAZy) database provides a rich set of manually annotated enzymes that degrade, modify, or create glycosidic bonds. Despite rich and invaluable information stored in the database, software tools utilizing this information for annotation of newly sequenced genomes by CAZy families are limited. We have employed two annotation approaches to fill the gap between manually curated high-quality protein sequences collected in the CAZy database and the growing number of other protein sequences produced by genome or metagenome sequencing projects. The first approach is based on a similarity search against the entire nonredundant sequences of the CAZy database. The second approach performs annotation using links or correspondences between the CAZy families and protein family domains. The links were discovered using the association rule learning algorithm applied to sequences from the CAZy database. The approaches complement each other and in combination achieved high specificity and sensitivity when cross-evaluated with the manually curated genomes of Clostridium thermocellum ATCC 27405 and Saccharophagus degradans 2-40. The capability of the proposed framework to predict the function of unknown protein domains and of hypothetical proteins in the genome of Neurospora crassa is demonstrated. The framework is implemented as a Web service, the CAZymes Analysis Toolkit, and is available at http://cricket.ornl.gov/cgi-bin/cat.cgi. PMID:20696711

  19. VHLdb: A database of von Hippel-Lindau protein interactors and mutations.

    PubMed

    Tabaro, Francesco; Minervini, Giovanni; Sundus, Faiza; Quaglia, Federica; Leonardi, Emanuela; Piovesan, Damiano; Tosatto, Silvio C E

    2016-01-01

    Mutations in the von Hippel-Lindau tumor suppressor protein (pVHL) predispose carriers to tumors affecting specific target organs, such as the retina, epididymis, adrenal glands, pancreas and kidneys. Currently, more than 400 pVHL interacting proteins are either described in the literature or predicted in public databases. These data are scattered among several different sources, slowing down the comprehension of pVHL's biological role. Here we present VHLdb, a novel database collecting available interaction and mutation data on pVHL to provide novel integrated annotations. In VHLdb, pVHL interactors are organized according to two annotation levels, manual and automatic. Mutation data are easily accessible and a novel visualization tool has been implemented. A user-friendly feedback function to improve database content through community-driven curation is also provided. VHLdb presently contains 478 interactors, of which 117 have been manually curated, and 1,074 mutations. This makes it the largest available database for pVHL-related information. VHLdb is available from URL: http://vhldb.bio.unipd.it/. PMID:27511743

  20. MobiDB 2.0: an improved database of intrinsically disordered and mobile proteins

    PubMed Central

    Potenza, Emilio; Domenico, Tomás Di; Walsh, Ian; Tosatto, Silvio C.E.

    2015-01-01

    MobiDB (http://mobidb.bio.unipd.it/) is a database of intrinsically disordered and mobile proteins. Intrinsically disordered regions are key for the function of numerous proteins. Here we provide a new version of MobiDB, a centralized source aimed at providing the most complete picture on different flavors of disorder in protein structures covering all UniProt sequences (currently over 80 million). The database features three levels of annotation: manually curated, indirect and predicted. Manually curated data is extracted from the DisProt database. Indirect data is inferred from PDB structures that are considered an indication of intrinsic disorder. The 10 predictors currently included (three ESpritz flavors, two IUPred flavors, two DisEMBL flavors, GlobPlot, VSL2b and JRONN) enable MobiDB to provide disorder annotations for every protein in the absence of more reliable data. The new version also features a consensus annotation and classification for long disordered regions. In order to complement the disorder annotations, MobiDB features additional annotations from external sources. Annotations from the UniProt database include post-translational modifications and linear motifs. Pfam annotations are displayed in graphical form and are link-enabled, allowing the user to visit the corresponding Pfam page for further information. Experimental protein–protein interactions from STRING are also classified for disorder content. PMID:25361972
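
    A toy sketch of one simple way to derive a per-residue consensus from several binary disorder predictions by majority vote. The real MobiDB consensus rules are more elaborate, so treat this purely as an assumption-laden illustration.

```python
# Sketch only: majority-vote consensus over equal-length binary disorder predictions.
def consensus(predictions, threshold=0.5):
    """predictions: list of equal-length strings of '1' (disordered) / '0' (ordered)."""
    length = len(predictions[0])
    assert all(len(p) == length for p in predictions)
    votes = [sum(int(p[i]) for p in predictions) / len(predictions) for i in range(length)]
    return "".join("D" if v >= threshold else "O" for v in votes)

print(consensus(["11100", "11000", "11110"]))  # -> 'DDDOO'
```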

  1. VHLdb: A database of von Hippel-Lindau protein interactors and mutations

    PubMed Central

    Tabaro, Francesco; Minervini, Giovanni; Sundus, Faiza; Quaglia, Federica; Leonardi, Emanuela; Piovesan, Damiano; Tosatto, Silvio C. E.

    2016-01-01

    Mutations in the von Hippel-Lindau tumor suppressor protein (pVHL) predispose carriers to tumors affecting specific target organs, such as the retina, epididymis, adrenal glands, pancreas and kidneys. Currently, more than 400 pVHL interacting proteins are either described in the literature or predicted in public databases. These data are scattered among several different sources, slowing down the comprehension of pVHL’s biological role. Here we present VHLdb, a novel database collecting available interaction and mutation data on pVHL to provide novel integrated annotations. In VHLdb, pVHL interactors are organized according to two annotation levels, manual and automatic. Mutation data are easily accessible and a novel visualization tool has been implemented. A user-friendly feedback function to improve database content through community-driven curation is also provided. VHLdb presently contains 478 interactors, of which 117 have been manually curated, and 1,074 mutations. This makes it the largest available database for pVHL-related information. VHLdb is available from URL: http://vhldb.bio.unipd.it/. PMID:27511743

  2. JSC Stardust Curation Team

    NASA Technical Reports Server (NTRS)

    Zolensky, Michael E.

    2000-01-01

    STARDUST, a NASA Discovery-class mission, is the first to return samples from a comet. Grains from comet Wild 2's coma (the gas and dust envelope that surrounds the nucleus) will be collected, as well as interstellar dust. The mission, which launched on February 7, 1999, will encounter the comet on January 10, 2004. As the spacecraft passes through the coma, a tray of silica aerogel will be exposed, and coma grains will impact there and become captured. Following the collection, the aerogel tray is closed for return to Earth in 2006. A dust impact mass spectrometer on board the STARDUST spacecraft will be used to gather spectra of dust during the entire mission, including the coma passage. This instrument will be the best chance to obtain data on volatile grains, which will not be well-collected in the aerogel. The dust impact mass spectrometer will also be used to study the composition of interstellar grains. In the past 5 years, analysis of data from dust detectors aboard the Ulysses and Galileo spacecraft has revealed that there is a stream of interstellar dust flowing through our solar system. These grains will be captured during the cruise phase of the STARDUST mission, as the spacecraft travels toward the comet. The sample return capsule will parachute to Earth in February 2006, and will land in western Utah. Once on the ground, the sample return capsule will be placed into a dry nitrogen environment and flown to the curation lab at JSC.

  3. FLOR-ID: an interactive database of flowering-time gene networks in Arabidopsis thaliana

    PubMed Central

    Bouché, Frédéric; Lobet, Guillaume; Tocquin, Pierre; Périlleux, Claire

    2016-01-01

    Flowering is a hot topic in Plant Biology and important progress has been made in Arabidopsis thaliana toward unraveling the genetic networks involved. The increasing complexity and the explosion of literature, however, require the development of new tools for information management and updating. We therefore created an evolutive and interactive database of flowering time genes, named FLOR-ID (Flowering-Interactive Database), which is freely accessible at http://www.flor-id.org. The hand-curated database contains information on 306 genes and links to 1595 publications gathering the work of >4500 authors. Gene/protein functions and interactions within the flowering pathways were inferred from the analysis of related publications, included in the database and translated into interactive manually drawn snapshots. PMID:26476447

  4. FLOR-ID: an interactive database of flowering-time gene networks in Arabidopsis thaliana.

    PubMed

    Bouché, Frédéric; Lobet, Guillaume; Tocquin, Pierre; Périlleux, Claire

    2016-01-01

    Flowering is a hot topic in Plant Biology and important progress has been made in Arabidopsis thaliana toward unraveling the genetic networks involved. The increasing complexity and the explosion of literature, however, require the development of new tools for information management and updating. We therefore created an evolutive and interactive database of flowering time genes, named FLOR-ID (Flowering-Interactive Database), which is freely accessible at http://www.flor-id.org. The hand-curated database contains information on 306 genes and links to 1595 publications gathering the work of >4500 authors. Gene/protein functions and interactions within the flowering pathways were inferred from the analysis of related publications, included in the database and translated into interactive manually drawn snapshots. PMID:26476447

  5. Curation and Computational Design of Bioenergy-Related Metabolic Pathways

    SciTech Connect

    Karp, Peter D.

    2014-09-12

    Pathway Tools is a systems-biology software package written by SRI International (SRI) that produces Pathway/Genome Databases (PGDBs) for organisms with a sequenced genome. Pathway Tools also provides a wide range of capabilities for analyzing predicted metabolic networks and user-generated omics data. More than 5,000 academic, industrial, and government groups have licensed Pathway Tools. This user community includes researchers at all three DOE bioenergy centers, as well as academic and industrial metabolic engineering (ME) groups. An integral part of the Pathway Tools software is MetaCyc, a large, multiorganism database of metabolic pathways and enzymes that SRI and its academic collaborators manually curate. This project included two main goals: I. Enhance the MetaCyc content of bioenergy-related enzymes and pathways. II. Develop computational tools for engineering metabolic pathways that satisfy specified design goals, in particular for bioenergy-related pathways. In part I, SRI proposed to significantly expand the coverage of bioenergy-related metabolic information in MetaCyc, followed by the generation of organism-specific PGDBs for all energy-relevant organisms sequenced at the DOE Joint Genome Institute (JGI). Part I objectives included: 1: Expand the content of MetaCyc to include bioenergy-related enzymes and pathways. 2: Enhance the Pathway Tools software to enable display of complex polymer degradation processes. 3: Create new PGDBs for the energy-related organisms sequenced by JGI, update existing PGDBs with new MetaCyc content, and make these data available to JBEI via the BioCyc website. In part II, SRI proposed to develop an efficient computational tool for the engineering of metabolic pathways. Part II objectives included: 4: Develop computational tools for generating metabolic pathways that satisfy specified design goals, enabling users to specify parameters such as starting and ending compounds, and preferred or disallowed intermediate compounds

  6. TimeLineCurator: Interactive Authoring of Visual Timelines from Unstructured Text.

    PubMed

    Fulda, Johanna; Brehmer, Matthew; Munzner, Tamara

    2016-01-01

    We present TimeLineCurator, a browser-based authoring tool that automatically extracts event data from temporal references in unstructured text documents using natural language processing and encodes them along a visual timeline. Our goal is to facilitate the timeline creation process for journalists and others who tell temporal stories online. Current solutions involve manually extracting and formatting event data from source documents, a process that tends to be tedious and error prone. With TimeLineCurator, a prospective timeline author can quickly identify the extent of time encompassed by a document, as well as the distribution of events occurring along this timeline. Authors can speculatively browse possible documents to quickly determine whether they are appropriate sources of timeline material. TimeLineCurator provides controls for curating and editing events on a timeline, the ability to combine timelines from multiple source documents, and export curated timelines for online deployment. We evaluate TimeLineCurator through a benchmark comparison of entity extraction error against a manual timeline curation process, a preliminary evaluation of the user experience of timeline authoring, a brief qualitative analysis of its visual output, and a discussion of prospective use cases suggested by members of the target author communities following its deployment. PMID:26529709

  7. Current Challenges in Development of a Database of Three-Dimensional Chemical Structures

    PubMed Central

    Maeda, Miki H.

    2015-01-01

    We are developing a database named 3DMET, a three-dimensional structure database of natural metabolites. There are two major impediments to the creation of 3D chemical structures from a set of planar structure drawings: the limited accuracy of computer programs and insufficient human resources for manual curation. We have tested some 2D–3D converters to convert 2D structure files from external databases. These automatic conversion processes yielded an excessive number of improper conversions. To ascertain the quality of the conversions, we compared IUPAC International Chemical Identifier (InChI) and canonical SMILES notations before and after conversion. Structures whose notations correspond to each other were regarded as a correct conversion in our present work. We found that chiral inversion is the most serious source of improper conversions. In the current stage of our database construction, published books or articles have been resources for additions to our database. Chemicals are usually drawn as pictures on paper. To save human resources, an optical structure reader was introduced. The program was quite useful but some particular errors were observed during our operation. We hope our trials for producing correct 3D structures will help other developers of chemical programs and curators of chemical databases. PMID:26075200

  8. Current Challenges in Development of a Database of Three-Dimensional Chemical Structures.

    PubMed

    Maeda, Miki H

    2015-01-01

    We are developing a database named 3DMET, a three-dimensional structure database of natural metabolites. There are two major impediments to the creation of 3D chemical structures from a set of planar structure drawings: the limited accuracy of computer programs and insufficient human resources for manual curation. We have tested some 2D-3D converters to convert 2D structure files from external databases. These automatic conversion processes yielded an excessive number of improper conversions. To ascertain the quality of the conversions, we compared IUPAC International Chemical Identifier (InChI) and canonical SMILES notations before and after conversion. Structures whose notations correspond to each other were regarded as a correct conversion in our present work. We found that chiral inversion is the most serious source of improper conversions. In the current stage of our database construction, published books or articles have been resources for additions to our database. Chemicals are usually drawn as pictures on paper. To save human resources, an optical structure reader was introduced. The program was quite useful but some particular errors were observed during our operation. We hope our trials for producing correct 3D structures will help other developers of chemical programs and curators of chemical databases. PMID:26075200
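
    A minimal sketch of the kind of round-trip check described above: compare canonical (isomeric) SMILES before and after 3D coordinate generation to flag suspect conversions such as chiral inversion. It uses RDKit as a stand-in for the converters actually tested; the test molecule and workflow are illustrative assumptions, not the 3DMET pipeline.

```python
# Sketch only: flag conversions whose canonical isomeric SMILES change after 3D embedding.
from rdkit import Chem
from rdkit.Chem import AllChem

def smiles_changed_by_embedding(smiles_2d):
    """Return True if 3D embedding alters the canonical (isomeric) SMILES."""
    mol = Chem.MolFromSmiles(smiles_2d)
    if mol is None:
        return True  # unparsable input counts as a failed conversion
    before = Chem.MolToSmiles(mol, isomericSmiles=True)
    mol3d = Chem.AddHs(mol)
    AllChem.EmbedMolecule(mol3d, randomSeed=42)   # generate 3D coordinates
    Chem.AssignStereochemistryFrom3D(mol3d)       # re-perceive stereo from the 3D model
    after = Chem.MolToSmiles(Chem.RemoveHs(mol3d), isomericSmiles=True)
    return before != after

print(smiles_changed_by_embedding("C[C@H](N)C(=O)O"))  # L-alanine as a small test case
```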

  9. Lynx: a database and knowledge extraction engine for integrative medicine.

    PubMed

    Sulakhe, Dinanath; Balasubramanian, Sandhya; Xie, Bingqing; Feng, Bo; Taylor, Andrew; Wang, Sheng; Berrocal, Eduardo; Dave, Utpal; Xu, Jinbo; Börnigen, Daniela; Gilliam, T Conrad; Maltsev, Natalia

    2014-01-01

    We have developed Lynx (http://lynx.ci.uchicago.edu)--a web-based database and a knowledge extraction engine, supporting annotation and analysis of experimental data and generation of weighted hypotheses on molecular mechanisms contributing to human phenotypes and disorders of interest. Its underlying knowledge base (LynxKB) integrates various classes of information from >35 public databases and private collections, as well as manually curated data from our group and collaborators. Lynx provides advanced search capabilities and a variety of algorithms for enrichment analysis and network-based gene prioritization to assist the user in extracting meaningful knowledge from LynxKB and experimental data, whereas its service-oriented architecture provides public access to LynxKB and its analytical tools via user-friendly web services and interfaces. PMID:24270788

  10. Comprehensive coverage of cardiovascular disease data in the disease portals at the Rat Genome Database.

    PubMed

    Wang, Shur-Jen; Laulederkind, Stanley J F; Hayman, G Thomas; Petri, Victoria; Smith, Jennifer R; Tutaj, Marek; Nigam, Rajni; Dwinell, Melinda R; Shimoyama, Mary

    2016-08-01

    Cardiovascular diseases are complex diseases caused by a combination of genetic and environmental factors. To facilitate progress in complex disease research, the Rat Genome Database (RGD) provides the community with a disease portal where genome objects and biological data related to cardiovascular diseases are systematically organized. The purpose of this study is to present biocuration at RGD, including disease, genetic, and pathway data. The RGD curation team uses controlled vocabularies/ontologies to organize data curated from the published literature or imported from disease and pathway databases. These organized annotations are associated with genes, strains, and quantitative trait loci (QTLs), thus linking functional annotations to genome objects. Screen shots from the web pages are used to demonstrate the organization of annotations at RGD. The human cardiovascular disease genes identified by annotations were grouped according to data sources and their annotation profiles were compared by in-house tools and other enrichment tools available to the public. The analysis results show that the imported cardiovascular disease genes from ClinVar and OMIM are functionally different from the RGD manually curated genes in terms of pathway and Gene Ontology annotations. The inclusion of disease genes from other databases enriches the collection of disease genes not only in quantity but also in quality. PMID:27287925
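
    The annotation-profile comparison mentioned above boils down to standard over-representation statistics. A hedged sketch using Fisher's exact test from SciPy is given below; the 2x2 counts are invented solely to show the mechanics and are not taken from RGD.

```python
# Sketch only: test over-representation of a pathway annotation in one gene set vs another.
from scipy.stats import fisher_exact

# 2x2 table: rows = (annotated with pathway, not annotated),
#            cols = (imported disease genes, manually curated disease genes)
table = [[30, 12],
         [170, 188]]
odds_ratio, p_value = fisher_exact(table, alternative="two-sided")
print(f"odds ratio={odds_ratio:.2f}, p={p_value:.3g}")
```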

  11. Xenbase: expansion and updates of the Xenopus model organism database

    PubMed Central

    James-Zorn, Christina; Ponferrada, Virgilio G.; Jarabek, Chris J.; Burns, Kevin A.; Segerdell, Erik J.; Lee, Jacqueline; Snyder, Kevin; Bhattacharyya, Bishnu; Karpinka, J. Brad; Fortriede, Joshua; Bowes, Jeff B.; Zorn, Aaron M.; Vize, Peter D.

    2013-01-01

    Xenbase (http://www.xenbase.org) is a model organism database that provides genomic, molecular, cellular and developmental biology content to biomedical researchers working with the frog Xenopus, as well as Xenopus data to workers using other model organisms. As an amphibian, Xenopus serves as a useful evolutionary bridge between invertebrates and more complex vertebrates such as birds and mammals. Xenbase content is collated from a variety of external sources using automated and semi-automated pipelines, then processed via a combination of automated and manual annotation. A link-matching system allows for the wide variety of synonyms used to describe biological data on unique features, such as a gene or an anatomical entity, to be used by the database in an equivalent manner. Recent updates to the database include the Xenopus laevis genome, a new Xenopus tropicalis genome build, epigenomic data, collections of RNA and protein sequences associated with genes, more powerful gene expression searches, a community and curated wiki, an extensive set of manually annotated gene expression patterns and a new database module that contains data on over 700 antibodies that are useful for exploring Xenopus cell and developmental biology. PMID:23125366

  12. The ribosomal database project.

    PubMed Central

    Larsen, N; Olsen, G J; Maidak, B L; McCaughey, M J; Overbeek, R; Macke, T J; Marsh, T L; Woese, C R

    1993-01-01

    The Ribosomal Database Project (RDP) is a curated database that offers ribosome data along with related programs and services. The offerings include phylogenetically ordered alignments of ribosomal RNA (rRNA) sequences, derived phylogenetic trees, rRNA secondary structure diagrams and various software packages for handling, analyzing and displaying alignments and trees. The data are available via ftp and electronic mail. Certain analytic services are also provided by the electronic mail server. PMID:8332524

  13. Curation of the genome annotation of Pichia pastoris (Komagataella phaffii) CBS7435 from gene level to protein function.

    PubMed

    Valli, Minoska; Tatto, Nadine E; Peymann, Armin; Gruber, Clemens; Landes, Nils; Ekker, Heinz; Thallinger, Gerhard G; Mattanovich, Diethard; Gasser, Brigitte; Graf, Alexandra B

    2016-09-01

    As manually curated and non-automated BLAST analysis of the published Pichia pastoris genome sequences revealed many differences between the gene annotations of the strains GS115 and CBS7435, RNA-Seq analysis, supported by proteomics, was performed to improve the genome annotation. Detailed analyses of sequence alignments and protein domain predictions were made to extend the functional genome annotation to all P. pastoris sequences. This allowed the identification of 492 new ORFs, 4916 hypothetical UTRs and the correction of 341 incorrect ORF predictions, which were mainly due to the presence of upstream ATG or erroneous intron predictions. Moreover, 175 previously erroneously annotated ORFs need to be removed from the annotation. In total, we have annotated 5325 ORFs. Regarding the functionality of those genes, we improved all gene and protein descriptions. Thereby, the percentage of ORFs with functional annotation was increased from 48% to 73%. Furthermore, we defined functional groups, covering 25 biological cellular processes of interest, by grouping all genes that are part of the defined process. All data are presented in the newly launched genome browser and database available at www.pichiagenome.org. In summary, we present a wide spectrum of curation of the P. pastoris genome annotation from gene level to protein function. PMID:27388471

  14. HistoneDB 2.0: a histone database with variants--an integrated resource to explore histones and their variants.

    PubMed

    Draizen, Eli J; Shaytan, Alexey K; Mariño-Ramírez, Leonardo; Talbert, Paul B; Landsman, David; Panchenko, Anna R

    2016-01-01

    Compaction of DNA into chromatin is a characteristic feature of eukaryotic organisms. The core (H2A, H2B, H3, H4) and linker (H1) histone proteins are responsible for this compaction through the formation of nucleosomes and higher order chromatin aggregates. Moreover, histones are intricately involved in chromatin functioning and provide a means for genome dynamic regulation through specific histone variants and histone post-translational modifications. 'HistoneDB 2.0--with variants' is a comprehensive database of histone protein sequences, classified by histone types and variants. All entries in the database are supplemented by rich sequence and structural annotations with many interactive tools to explore and compare sequences of different variants from various organisms. The core of the database is a manually curated set of histone sequences grouped into 30 different variant subsets with variant-specific annotations. The curated set is supplemented by an automatically extracted set of histone sequences from the non-redundant protein database using algorithms trained on the curated set. The interactive web site supports various searching strategies in both datasets: browsing of phylogenetic trees; on-demand generation of multiple sequence alignments with feature annotations; classification of histone-like sequences and browsing of the taxonomic diversity for every histone variant. HistoneDB 2.0 is a resource for the interactive comparative analysis of histone protein sequences and their implications for chromatin function. Database URL: http://www.ncbi.nlm.nih.gov/projects/HistoneDB2.0. PMID:26989147

  15. Data mining in the MetaCyc family of pathway databases.

    PubMed

    Karp, Peter D; Paley, Suzanne; Altman, Tomer

    2013-01-01

    Pathway databases collect the bioreactions and molecular interactions that define the processes of life. The MetaCyc family of pathway databases consists of thousands of databases that were derived through computational inference of metabolic pathways from the MetaCyc pathway/genome database (PGDB). In some cases, these DBs underwent subsequent manual curation. Curated pathway DBs are now available for most of the major model organisms. Databases in the MetaCyc family are managed using the Pathway Tools software. This chapter presents methods for performing data mining on the MetaCyc family of pathway DBs. We discuss the major data access mechanisms for the family, which include data files in multiple formats; application programming interfaces (APIs) for the Lisp, Java, and Perl languages; and web services. We present an overview of the Pathway Tools schema, an understanding of which is needed to query the DBs. The chapter also presents several interactive data mining tools within Pathway Tools for performing omics data analysis. PMID:23192547

  16. Data Mining in the MetaCyc Family of Pathway Databases

    PubMed Central

    Karp, Peter D.; Paley, Suzanne; Altman, Tomer

    2013-01-01

    Pathway databases collect the bioreactions and molecular interactions that define the processes of life. The MetaCyc family of pathway databases consists of thousands of databases that were derived through computational inference of metabolic pathways from the MetaCyc Pathway/Genome Database (PGDB). In some cases these DBs underwent subsequent manual curation. Curated pathway DBs are now available for most of the major model organisms. Databases in the MetaCyc family are managed using the Pathway Tools software. This chapter presents methods for performing data mining on the MetaCyc family of pathway DBs. We discuss the major data access mechanisms for the family, which include data files in multiple formats; application programming interfaces (APIs) for the Lisp, Java, and Perl languages; and web services. We present an overview of the Pathway Tools schema, an understanding of which is needed to query the DBs. The chapter also presents several interactive data mining tools within Pathway Tools for performing omics data analysis. PMID:23192547
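
    One of the data access mechanisms listed above is the set of flat files exported for each PGDB. The sketch below parses a generic attribute-value file of the kind Pathway Tools writes (e.g. pathways.dat); the attribute names, separators and file name are assumptions based on the common format, so consult the Pathway Tools documentation before relying on them.

```python
# Sketch only: read a Pathway Tools-style attribute-value flat file record by record.
def parse_attribute_value_file(path):
    """Yield one dict of {attribute: [values]} per record (records end with '//')."""
    record = {}
    with open(path, encoding="utf-8", errors="replace") as fh:
        for line in fh:
            line = line.rstrip("\n")
            if line.startswith("#"):          # header/comment lines
                continue
            if line == "//":                  # record separator
                if record:
                    yield record
                record = {}
            elif " - " in line:
                attr, value = line.split(" - ", 1)
                record.setdefault(attr, []).append(value)
    if record:
        yield record

# Example usage (attribute names are assumed):
for rec in parse_attribute_value_file("pathways.dat"):
    print(rec.get("UNIQUE-ID", ["?"])[0], rec.get("COMMON-NAME", [""])[0])
```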

  17. The Rice Annotation Project Database (RAP-DB): 2008 update.

    PubMed

    Tanaka, Tsuyoshi; Antonio, Baltazar A; Kikuchi, Shoshi; Matsumoto, Takashi; Nagamura, Yoshiaki; Numa, Hisataka; Sakai, Hiroaki; Wu, Jianzhong; Itoh, Takeshi; Sasaki, Takuji; Aono, Ryo; Fujii, Yasuyuki; Habara, Takuya; Harada, Erimi; Kanno, Masako; Kawahara, Yoshihiro; Kawashima, Hiroaki; Kubooka, Hiromi; Matsuya, Akihiro; Nakaoka, Hajime; Saichi, Naomi; Sanbonmatsu, Ryoko; Sato, Yoshiharu; Shinso, Yuji; Suzuki, Mami; Takeda, Jun-ichi; Tanino, Motohiko; Todokoro, Fusano; Yamaguchi, Kaori; Yamamoto, Naoyuki; Yamasaki, Chisato; Imanishi, Tadashi; Okido, Toshihisa; Tada, Masahito; Ikeo, Kazuho; Tateno, Yoshio; Gojobori, Takashi; Lin, Yao-Cheng; Wei, Fu-Jin; Hsing, Yue-ie; Zhao, Qiang; Han, Bin; Kramer, Melissa R; McCombie, Richard W; Lonsdale, David; O'Donovan, Claire C; Whitfield, Eleanor J; Apweiler, Rolf; Koyanagi, Kanako O; Khurana, Jitendra P; Raghuvanshi, Saurabh; Singh, Nagendra K; Tyagi, Akhilesh K; Haberer, Georg; Fujisawa, Masaki; Hosokawa, Satomi; Ito, Yukiyo; Ikawa, Hiroshi; Shibata, Michie; Yamamoto, Mayu; Bruskiewich, Richard M; Hoen, Douglas R; Bureau, Thomas E; Namiki, Nobukazu; Ohyanagi, Hajime; Sakai, Yasumichi; Nobushima, Satoshi; Sakata, Katsumi; Barrero, Roberto A; Sato, Yutaka; Souvorov, Alexandre; Smith-White, Brian; Tatusova, Tatiana; An, Suyoung; An, Gynheung; OOta, Satoshi; Fuks, Galina; Messing, Joachim; Christie, Karen R; Lieberherr, Damien; Kim, HyeRan; Zuccolo, Andrea; Wing, Rod A; Nobuta, Kan; Green, Pamela J; Lu, Cheng; Meyers, Blake C; Chaparro, Cristian; Piegu, Benoit; Panaud, Olivier; Echeverria, Manuel

    2008-01-01

    The Rice Annotation Project Database (RAP-DB) was created to provide the genome sequence assembly of the International Rice Genome Sequencing Project (IRGSP), manually curated annotation of the sequence, and other genomics information that could be useful for a comprehensive understanding of rice biology. Since the last publication of the RAP-DB, the IRGSP genome has been revised and reassembled. In addition, a large number of rice-expressed sequence tags have been released, and functional genomics resources have been produced worldwide. Thus, we have thoroughly updated our genome annotation by manual curation of all the functional descriptions of rice genes. The latest version of the RAP-DB contains a variety of annotation data as follows: clone positions, structures and functions of 31 439 genes validated by cDNAs, RNA genes detected by massively parallel signature sequencing (MPSS) technology and sequence similarity, flanking sequences of mutant lines, transposable elements, etc. Other annotation data such as Gnomon can be displayed along with those of RAP for comparison. We have also developed a new keyword search system to allow the user to access useful information. The RAP-DB is available at: http://rapdb.dna.affrc.go.jp/ and http://rapdb.lab.nig.ac.jp/. PMID:18089549

  18. Managing biological networks by using text mining and computer-aided curation

    NASA Astrophysics Data System (ADS)

    Yu, Seok Jong; Cho, Yongseong; Lee, Min-Ho; Lim, Jongtae; Yoo, Jaesoo

    2015-11-01

    In order to understand a biological mechanism in a cell, a researcher must collect a large number of protein interactions, together with supporting experimental data, from experiments and the literature. Text mining systems that extract biological interactions from papers have been used to construct biological networks for a few decades. Even though text mining of the literature is necessary to construct a biological network, few systems with a text mining tool are available for biologists who want to construct their own biological networks. We have developed a biological network construction system called BioKnowledge Viewer that can generate a biological interaction network by using a text mining tool and biological taggers. It also includes Boolean simulation software that provides a biological modeling system for simulating models built with the text mining tool. A user can download PubMed articles and construct a biological network by using the Multi-level Knowledge Emergence Model (KMEM), MetaMap, and A Biomedical Named Entity Recognizer (ABNER) as text mining tools. To evaluate the system, we constructed an aging-related biological network that consists of 9,415 nodes (genes) by using manual curation. With network analysis, we found that several genes, including JNK, AP-1, and BCL-2, were highly connected in the aging network. We provide a semi-automatic curation environment so that users can obtain a graph database for managing text mining results generated in the server system and can navigate the network with BioKnowledge Viewer, which is freely available at http://bioknowledgeviewer.kisti.re.kr.
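
    To make the network-analysis step concrete, here is a toy sketch that loads text-mined interaction pairs into a graph and ranks genes by degree; it uses networkx and an invented edge list, and is not the BioKnowledge Viewer code.

```python
# Sketch only: build a graph from mined interaction pairs and rank genes by degree.
import networkx as nx

mined_pairs = [("JNK", "AP-1"), ("AP-1", "BCL-2"), ("JNK", "BCL-2"), ("BCL-2", "TP53")]

g = nx.Graph()
g.add_edges_from(mined_pairs)

# Degree serves here as a crude proxy for how central a gene is in the mined network.
for gene, degree in sorted(g.degree, key=lambda kv: kv[1], reverse=True):
    print(gene, degree)
```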

  19. Teacher Training in Curative Education.

    ERIC Educational Resources Information Center

    Juul, Kristen D.; Maier, Manfred

    1992-01-01

    This article considers the application of the philosophical and educational principles of Rudolf Steiner, called "anthroposophy," to the training of teachers and curative educators in the Waldorf schools. Special emphasis is on the Camphill movement which focuses on therapeutic schools and communities for children with special needs. (DB)

  20. Cognitive Curations of Collaborative Curricula

    ERIC Educational Resources Information Center

    Ackerman, Amy S.

    2015-01-01

    Assuming the role of learning curators, 22 graduate students (in-service teachers) addressed authentic problems (challenges) within their respective classrooms by selecting digital tools as part of implementation of interdisciplinary lesson plans. Students focused on formative assessment tools as a means to gather evidence to make improvements in…

  1. How should the completeness and quality of curated nanomaterial data be evaluated?

    PubMed

    Marchese Robinson, Richard L; Lynch, Iseult; Peijnenburg, Willie; Rumble, John; Klaessig, Fred; Marquardt, Clarissa; Rauscher, Hubert; Puzyn, Tomasz; Purian, Ronit; Åberg, Christoffer; Karcher, Sandra; Vriens, Hanne; Hoet, Peter; Hoover, Mark D; Hendren, Christine Ogilvie; Harper, Stacey L

    2016-05-21

    Nanotechnology is of increasing significance. Curation of nanomaterial data into electronic databases offers opportunities to better understand and predict nanomaterials' behaviour. This supports innovation in, and regulation of, nanotechnology. It is commonly understood that curated data need to be sufficiently complete and of sufficient quality to serve their intended purpose. However, assessing data completeness and quality is non-trivial in general and is arguably especially difficult in the nanoscience area, given its highly multidisciplinary nature. The current article, part of the Nanomaterial Data Curation Initiative series, addresses how to assess the completeness and quality of (curated) nanomaterial data. In order to address this key challenge, a variety of related issues are discussed: the meaning and importance of data completeness and quality, existing approaches to their assessment and the key challenges associated with evaluating the completeness and quality of curated nanomaterial data. Considerations which are specific to the nanoscience area and lessons which can be learned from other relevant scientific disciplines are considered. Hence, the scope of this discussion ranges from physicochemical characterisation requirements for nanomaterials and interference of nanomaterials with nanotoxicology assays to broader issues such as minimum information checklists, toxicology data quality schemes and computational approaches that facilitate evaluation of the completeness and quality of (curated) data. This discussion is informed by a literature review and a survey of key nanomaterial data curation stakeholders. Finally, drawing upon this discussion, recommendations are presented concerning the central question: how should the completeness and quality of curated nanomaterial data be evaluated? PMID:27143028
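
    A minimal sketch of a checklist-style completeness score of the kind discussed in the article is shown below; the required fields are illustrative assumptions, not a recommended minimum-information standard.

```python
# Sketch only: score a curated record against an assumed minimum-information checklist.
REQUIRED_FIELDS = ["core_composition", "size_nm", "surface_chemistry",
                   "assay", "dose", "endpoint"]

def completeness(record):
    """Return the fraction of required fields filled, plus the missing field names."""
    filled = [f for f in REQUIRED_FIELDS if record.get(f) not in (None, "", [])]
    return len(filled) / len(REQUIRED_FIELDS), sorted(set(REQUIRED_FIELDS) - set(filled))

score, missing = completeness({"core_composition": "TiO2", "size_nm": 21,
                               "assay": "MTT", "dose": "10 ug/mL"})
print(f"completeness={score:.2f}, missing={missing}")
```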

  2. TOWARDS PATHWAY CURATION THROUGH LITERATURE MINING – A CASE STUDY USING PHARMGKB

    PubMed Central

    RAVIKUMAR, K.E.; WAGHOLIKAR, KAVISHWAR B.; LIU, HONGFANG

    2014-01-01

    The creation of biological pathway knowledge bases is largely driven by manual curation based on evidence from the scientific literature. It is highly challenging for curators to keep up with the literature. Text mining applications have been developed in the last decade to assist human curators and speed up the curation pace; the majority of them aim to identify the most relevant papers for curation, with little attempt to directly extract pathway information from text. In this paper, we describe a rule-based literature mining system to extract pathway information from text. We evaluated the system using curated pharmacokinetic (PK) and pharmacodynamic (PD) pathways in PharmGKB. The system achieved an F-measure of 63.11% and 34.99% for entity extraction and event extraction, respectively, against all PubMed abstracts cited in PharmGKB. It may be possible to improve the system performance by incorporating statistical machine learning approaches. This study also helped us gain insights into the barriers towards automated event extraction from text for pathway curation. PMID:24297561

  3. Towards pathway curation through literature mining--a case study using PharmGKB.

    PubMed

    Ravikumar, K E; Wagholikar, Kavishwar B; Liu, Hongfang

    2014-01-01

    The creation of biological pathway knowledge bases is largely driven by manual curation based on evidence from the scientific literature. It is highly challenging for curators to keep up with the literature. Text mining applications have been developed in the last decade to assist human curators and speed up the curation pace; the majority of them aim to identify the most relevant papers for curation, with little attempt to directly extract pathway information from text. In this paper, we describe a rule-based literature mining system to extract pathway information from text. We evaluated the system using curated pharmacokinetic (PK) and pharmacodynamic (PD) pathways in PharmGKB. The system achieved an F-measure of 63.11% and 34.99% for entity extraction and event extraction, respectively, against all PubMed abstracts cited in PharmGKB. It may be possible to improve the system performance by incorporating statistical machine learning approaches. This study also helped us gain insights into the barriers towards automated event extraction from text for pathway curation. PMID:24297561
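
    To give a flavour of rule-based event extraction of this kind, here is a deliberately simplified sketch that matches agent-trigger-target patterns with a regular expression; the pattern and trigger list are assumptions for illustration and bear no relation to the authors' actual rules.

```python
# Sketch only: extract simple "<agent> <trigger> <target>" events from a sentence.
import re

EVENT_PATTERN = re.compile(
    r"\b(?P<agent>[A-Z][A-Za-z0-9-]+)\s+"
    r"(?P<trigger>inhibits|activates|phosphorylates|metabolizes)\s+"
    r"(?P<target>[A-Za-z][A-Za-z0-9-]+)\b"
)

def extract_events(sentence):
    """Return a list of matched event dictionaries (agent, trigger, target)."""
    return [m.groupdict() for m in EVENT_PATTERN.finditer(sentence)]

print(extract_events("CYP2D6 metabolizes tamoxifen, while PKA phosphorylates CREB."))
```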

  4. NPInter v3.0: an upgraded database of noncoding RNA-associated interactions.

    PubMed

    Hao, Yajing; Wu, Wei; Li, Hui; Yuan, Jiao; Luo, Jianjun; Zhao, Yi; Chen, Runsheng

    2016-01-01

    Despite the fact that a large quantity of noncoding RNAs (ncRNAs) have been identified, their functions remain unclear. To enable researchers to have a better understanding of ncRNAs' functions, we updated the NPInter database to version 3.0, which contains experimentally verified interactions between ncRNAs (excluding tRNAs and rRNAs), especially long noncoding RNAs (lncRNAs) and other biomolecules (proteins, mRNAs, miRNAs and genomic DNAs). In NPInter v3.0, interactions pertaining to ncRNAs are not only manually curated from scientific literature but also curated from high-throughput technologies. In addition, we also curated lncRNA-miRNA interactions from in silico predictions supported by AGO CLIP-seq data. When compared with NPInter v2.0, the interactions are more informative (with additional information on tissues or cell lines, binding sites, conservation, co-expression values and other features) and more organized (with divisions on data sets by data sources, tissues or cell lines, experiments and other criteria). NPInter v3.0 expands the data set to 491,416 interactions in 188 tissues (or cell lines) from 68 kinds of experimental technologies. NPInter v3.0 also improves the user interface and adds new web services, including a local UCSC Genome Browser to visualize binding sites. Additionally, NPInter v3.0 defined a high-confidence set of interactions and predicted the functions of lncRNAs in human and mouse based on the interactions curated in the database. NPInter v3.0 is available at http://www.bioinfo.org/NPInter/. Database URL: http://www.bioinfo.org/NPInter/. PMID:27087310

  5. NPInter v3.0: an upgraded database of noncoding RNA-associated interactions

    PubMed Central

    Hao, Yajing; Wu, Wei; Li, Hui; Yuan, Jiao; Luo, Jianjun; Zhao, Yi; Chen, Runsheng

    2016-01-01

    Despite the fact that a large quantity of noncoding RNAs (ncRNAs) have been identified, their functions remain unclear. To enable researchers to have a better understanding of ncRNAs’ functions, we updated the NPInter database to version 3.0, which contains experimentally verified interactions between ncRNAs (excluding tRNAs and rRNAs), especially long noncoding RNAs (lncRNAs) and other biomolecules (proteins, mRNAs, miRNAs and genomic DNAs). In NPInter v3.0, interactions pertaining to ncRNAs are not only manually curated from scientific literature but also curated from high-throughput technologies. In addition, we also curated lncRNA–miRNA interactions from in silico predictions supported by AGO CLIP-seq data. When compared with NPInter v2.0, the interactions are more informative (with additional information on tissues or cell lines, binding sites, conservation, co-expression values and other features) and more organized (with divisions on data sets by data sources, tissues or cell lines, experiments and other criteria). NPInter v3.0 expands the data set to 491,416 interactions in 188 tissues (or cell lines) from 68 kinds of experimental technologies. NPInter v3.0 also improves the user interface and adds new web services, including a local UCSC Genome Browser to visualize binding sites. Additionally, NPInter v3.0 defined a high-confidence set of interactions and predicted the functions of lncRNAs in human and mouse based on the interactions curated in the database. NPInter v3.0 is available at http://www.bioinfo.org/NPInter/. Database URL: http://www.bioinfo.org/NPInter/ PMID:27087310

  6. Kalium: a database of potassium channel toxins from scorpion venom.

    PubMed

    Kuzmenkov, Alexey I; Krylov, Nikolay A; Chugunov, Anton O; Grishin, Eugene V; Vassilevski, Alexander A

    2016-01-01

    Kalium (http://kaliumdb.org/) is a manually curated database that accumulates data on potassium channel toxins purified from scorpion venom (KTx). This database is an open-access resource, and provides easy access to pages of other databases of interest, such as UniProt, PDB, NCBI Taxonomy Browser, and PubMed. General achievements of Kalium are a strict and easy regulation of KTx classification based on the unified nomenclature supported by researchers in the field, removal of peptides with partial sequence and entries supported by transcriptomic information only, classification of β-family toxins, and addition of a novel λ-family. Molecules presented in the database can be processed by the Clustal Omega server using a one-click option. Molecular masses of mature peptides are calculated and available activity data are compiled for all KTx. We believe that Kalium is not only of high interest to professional toxinologists, but also of general utility to the scientific community. Database URL: http://kaliumdb.org/. PMID:27087309

  7. Kalium: a database of potassium channel toxins from scorpion venom

    PubMed Central

    Kuzmenkov, Alexey I.; Krylov, Nikolay A.; Chugunov, Anton O.; Grishin, Eugene V.; Vassilevski, Alexander A.

    2016-01-01

    Kalium (http://kaliumdb.org/) is a manually curated database that accumulates data on potassium channel toxins purified from scorpion venom (KTx). This database is an open-access resource, and provides easy access to pages of other databases of interest, such as UniProt, PDB, NCBI Taxonomy Browser, and PubMed. General achievements of Kalium are a strict and easy regulation of KTx classification based on the unified nomenclature supported by researchers in the field, removal of peptides with partial sequence and entries supported by transcriptomic information only, classification of β-family toxins, and addition of a novel λ-family. Molecules presented in the database can be processed by the Clustal Omega server using a one-click option. Molecular masses of mature peptides are calculated and available activity data are compiled for all KTx. We believe that Kalium is not only of high interest to professional toxinologists, but also of general utility to the scientific community. Database URL: http://kaliumdb.org/ PMID:27087309

  8. Clean and Cold Sample Curation

    NASA Technical Reports Server (NTRS)

    Allen, C. C.; Agee, C. B.; Beer, R.; Cooper, B. L.

    2000-01-01

    Curation of Mars samples includes both samples that are returned to Earth, and samples that are collected, examined, and archived on Mars. Both kinds of curation operations will require careful planning to ensure that the samples are not contaminated by the instruments that are used to collect and contain them. In both cases, sample examination and subdivision must take place in an environment that is organically, inorganically, and biologically clean. Some samples will need to be prepared for analysis under ultra-clean or cryogenic conditions. Inorganic and biological cleanliness are achievable separately by cleanroom and biosafety lab techniques. Organic cleanliness to the <50 ng/sq cm level requires material control and sorbent removal - techniques being applied in our Class 10 cleanrooms and sample processing gloveboxes.

  9. CTdatabase: a knowledge-base of high-throughput and curated data on cancer-testis antigens.

    PubMed

    Almeida, Luiz Gonzaga; Sakabe, Noboru J; deOliveira, Alice R; Silva, Maria Cristina C; Mundstein, Alex S; Cohen, Tzeela; Chen, Yao-Tseng; Chua, Ramon; Gurung, Sita; Gnjatic, Sacha; Jungbluth, Achim A; Caballero, Otávia L; Bairoch, Amos; Kiesler, Eva; White, Sarah L; Simpson, Andrew J G; Old, Lloyd J; Camargo, Anamaria A; Vasconcelos, Ana Tereza R

    2009-01-01

    The potency of the immune response has still to be harnessed effectively to combat human cancers. However, the discovery of T-cell targets in melanomas and other tumors has raised the possibility that cancer vaccines can be used to induce a therapeutically effective immune response against cancer. The targets, cancer-testis (CT) antigens, are immunogenic proteins preferentially expressed in normal gametogenic tissues and different histological types of tumors. Therapeutic cancer vaccines directed against CT antigens are currently in late-stage clinical trials testing whether they can delay or prevent recurrence of lung cancer and melanoma following surgical removal of primary tumors. CT antigens constitute a large, but ill-defined, family of proteins that exhibit a remarkably restricted expression. Currently, there is a considerable amount of information about these proteins, but the data are scattered through the literature and in several bioinformatic databases. The database presented here, CTdatabase (http://www.cta.lncc.br), unifies this knowledge to facilitate both the mining of the existing deluge of data, and the identification of proteins alleged to be CT antigens, but that do not have their characteristic restricted expression pattern. CTdatabase is more than a repository of CT antigen data, since all the available information was carefully curated and annotated with most data being specifically processed for CT antigens and stored locally. Starting from a compilation of known CT antigens, CTdatabase provides basic information including gene names and aliases, RefSeq accession numbers, genomic location, known splicing variants, gene duplications and additional family members. Gene expression at the mRNA level in normal and tumor tissues has been collated from publicly available data obtained by several different technologies. Manually curated data related to mRNA and protein expression, and antigen-specific immune responses in cancer patients are also

  10. CTdatabase: a knowledge-base of high-throughput and curated data on cancer-testis antigens

    PubMed Central

    Almeida, Luiz Gonzaga; Sakabe, Noboru J.; deOliveira, Alice R.; Silva, Maria Cristina C.; Mundstein, Alex S.; Cohen, Tzeela; Chen, Yao-Tseng; Chua, Ramon; Gurung, Sita; Gnjatic, Sacha; Jungbluth, Achim A.; Caballero, Otávia L.; Bairoch, Amos; Kiesler, Eva; White, Sarah L.; Simpson, Andrew J. G.; Old, Lloyd J.; Camargo, Anamaria A.; Vasconcelos, Ana Tereza R.

    2009-01-01

    The potency of the immune response has still to be harnessed effectively to combat human cancers. However, the discovery of T-cell targets in melanomas and other tumors has raised the possibility that cancer vaccines can be used to induce a therapeutically effective immune response against cancer. The targets, cancer-testis (CT) antigens, are immunogenic proteins preferentially expressed in normal gametogenic tissues and different histological types of tumors. Therapeutic cancer vaccines directed against CT antigens are currently in late-stage clinical trials testing whether they can delay or prevent recurrence of lung cancer and melanoma following surgical removal of primary tumors. CT antigens constitute a large, but ill-defined, family of proteins that exhibit a remarkably restricted expression. Currently, there is a considerable amount of information about these proteins, but the data are scattered through the literature and in several bioinformatic databases. The database presented here, CTdatabase (http://www.cta.lncc.br), unifies this knowledge to facilitate both the mining of the existing deluge of data, and the identification of proteins alleged to be CT antigens, but that do not have their characteristic restricted expression pattern. CTdatabase is more than a repository of CT antigen data, since all the available information was carefully curated and annotated with most data being specifically processed for CT antigens and stored locally. Starting from a compilation of known CT antigens, CTdatabase provides basic information including gene names and aliases, RefSeq accession numbers, genomic location, known splicing variants, gene duplications and additional family members. Gene expression at the mRNA level in normal and tumor tissues has been collated from publicly available data obtained by several different technologies. Manually curated data related to mRNA and protein expression, and antigen-specific immune responses in cancer patients are also

  11. ORegAnno 3.0: a community-driven resource for curated regulatory annotation.

    PubMed

    Lesurf, Robert; Cotto, Kelsy C; Wang, Grace; Griffith, Malachi; Kasaian, Katayoon; Jones, Steven J M; Montgomery, Stephen B; Griffith, Obi L

    2016-01-01

    The Open Regulatory Annotation database (ORegAnno) is a resource for curated regulatory annotation. It contains information about regulatory regions, transcription factor binding sites, RNA binding sites, regulatory variants, haplotypes, and other regulatory elements. ORegAnno differentiates itself from other regulatory resources by facilitating crowd-sourced interpretation and annotation of regulatory observations from the literature and highly curated resources. It contains a comprehensive annotation scheme that aims to describe both the elements and outcomes of regulatory events. Moreover, ORegAnno assembles these disparate data sources and annotations into a single, high quality catalogue of curated regulatory information. The current release is an update of the database previously featured in the NAR Database Issue, and now contains 1 948 307 records, across 18 species, with a combined coverage of 334 215 080 bp. Complete records, annotation, and other associated data are available for browsing and download at http://www.oreganno.org/. PMID:26578589

  12. ORegAnno 3.0: a community-driven resource for curated regulatory annotation

    PubMed Central

    Lesurf, Robert; Cotto, Kelsy C.; Wang, Grace; Griffith, Malachi; Kasaian, Katayoon; Jones, Steven J. M.; Montgomery, Stephen B.; Griffith, Obi L.

    2016-01-01

    The Open Regulatory Annotation database (ORegAnno) is a resource for curated regulatory annotation. It contains information about regulatory regions, transcription factor binding sites, RNA binding sites, regulatory variants, haplotypes, and other regulatory elements. ORegAnno differentiates itself from other regulatory resources by facilitating crowd-sourced interpretation and annotation of regulatory observations from the literature and highly curated resources. It contains a comprehensive annotation scheme that aims to describe both the elements and outcomes of regulatory events. Moreover, ORegAnno assembles these disparate data sources and annotations into a single, high quality catalogue of curated regulatory information. The current release is an update of the database previously featured in the NAR Database Issue, and now contains 1 948 307 records, across 18 species, with a combined coverage of 334 215 080 bp. Complete records, annotation, and other associated data are available for browsing and download at http://www.oreganno.org/. PMID:26578589

  13. WormBase 2014: new views of curated biology

    PubMed Central

    Harris, Todd W.; Baran, Joachim; Bieri, Tamberlyn; Cabunoc, Abigail; Chan, Juancarlos; Chen, Wen J.; Davis, Paul; Done, James; Grove, Christian; Howe, Kevin; Kishore, Ranjana; Lee, Raymond; Li, Yuling; Muller, Hans-Michael; Nakamura, Cecilia; Ozersky, Philip; Paulini, Michael; Raciti, Daniela; Schindelman, Gary; Tuli, Mary Ann; Auken, Kimberly Van; Wang, Daniel; Wang, Xiaodong; Williams, Gary; Wong, J. D.; Yook, Karen; Schedl, Tim; Hodgkin, Jonathan; Berriman, Matthew; Kersey, Paul; Spieth, John; Stein, Lincoln; Sternberg, Paul W.

    2014-01-01

    WormBase (http://www.wormbase.org/) is a highly curated resource dedicated to supporting research using the model organism Caenorhabditis elegans. With an electronic history predating the World Wide Web, WormBase contains information ranging from the sequence and phenotype of individual alleles to genome-wide studies generated using next-generation sequencing technologies. In recent years, we have expanded the contents to include data on additional nematodes of agricultural and medical significance, bringing the knowledge of C. elegans to bear on these systems and providing support for underserved research communities. Manual curation of the primary literature remains a central focus of the WormBase project, providing users with reliable, up-to-date and highly cross-linked information. In this update, we describe efforts to organize the original atomized and highly contextualized curated data into integrated syntheses of discrete biological topics. Next, we discuss our experiences coping with the vast increase in available genome sequences made possible through next-generation sequencing platforms. Finally, we describe some of the features and tools of the new WormBase Web site that help users better find and explore data of interest. PMID:24194605

  14. NESdb: a database of NES-containing CRM1 cargoes.

    PubMed

    Xu, Darui; Grishin, Nick V; Chook, Yuh Min

    2012-09-01

    The leucine-rich nuclear export signal (NES) is the only known class of targeting signal that directs macromolecules out of the cell nucleus. NESs are short stretches of 8-15 amino acids with regularly spaced hydrophobic residues that bind the export karyopherin CRM1. NES-containing proteins are involved in numerous cellular and disease processes. We compiled a database named NESdb that contains 221 NES-containing CRM1 cargoes that were manually curated from the published literature. Each NESdb entry is annotated with information about sequence and structure of both the NES and the cargo protein, as well as information about experimental evidence of NES-mapping and CRM1-mediated nuclear export. NESdb will be updated regularly and will serve as an important resource for nuclear export signals. NESdb is freely available to nonprofit organizations at http://prodata.swmed.edu/LRNes. PMID:22833564

  15. NESdb: a database of NES-containing CRM1 cargoes

    PubMed Central

    Xu, Darui; Grishin, Nick V.; Chook, Yuh Min

    2012-01-01

    The leucine-rich nuclear export signal (NES) is the only known class of targeting signal that directs macromolecules out of the cell nucleus. NESs are short stretches of 8–15 amino acids with regularly spaced hydrophobic residues that bind the export karyopherin CRM1. NES-containing proteins are involved in numerous cellular and disease processes. We compiled a database named NESdb that contains 221 NES-containing CRM1 cargoes that were manually curated from the published literature. Each NESdb entry is annotated with information about sequence and structure of both the NES and the cargo protein, as well as information about experimental evidence of NES-mapping and CRM1-mediated nuclear export. NESdb will be updated regularly and will serve as an important resource for nuclear export signals. NESdb is freely available to nonprofit organizations at http://prodata.swmed.edu/LRNes. PMID:22833564

  16. Rfam 12.0: updates to the RNA families database.

    PubMed

    Nawrocki, Eric P; Burge, Sarah W; Bateman, Alex; Daub, Jennifer; Eberhardt, Ruth Y; Eddy, Sean R; Floden, Evan W; Gardner, Paul P; Jones, Thomas A; Tate, John; Finn, Robert D

    2015-01-01

    The Rfam database (available at http://rfam.xfam.org) is a collection of non-coding RNA families represented by manually curated sequence alignments, consensus secondary structures and annotation gathered from corresponding Wikipedia, taxonomy and ontology resources. In this article, we detail updates and improvements to the Rfam data and website for the Rfam 12.0 release. We describe the upgrade of our search pipeline to use Infernal 1.1 and demonstrate its improved homology detection ability by comparison with the previous version. The new pipeline is easier for users to apply to their own data sets, and we illustrate its ability to annotate RNAs in genomic and metagenomic data sets of various sizes. Rfam has been expanded to include 260 new families, including the well-studied large subunit ribosomal RNA family, and for the first time includes information on short sequence- and structure-based RNA motifs present within families. PMID:25392425

  17. The MetaboLights repository: curation challenges in metabolomics.

    PubMed

    Salek, Reza M; Haug, Kenneth; Conesa, Pablo; Hastings, Janna; Williams, Mark; Mahendraker, Tejasvi; Maguire, Eamonn; González-Beltrán, Alejandra N; Rocca-Serra, Philippe; Sansone, Susanna-Assunta; Steinbeck, Christoph

    2013-01-01

    MetaboLights is the first general-purpose open-access curated repository for metabolomic studies, their raw experimental data and associated metadata, maintained by one of the major open-access data providers in molecular biology. Increases in the number of depositions, number of samples per study and the file size of data submitted to MetaboLights present a challenge for the objective of ensuring high-quality and standardized data in the context of diverse metabolomic workflows and data representations. Here, we describe the MetaboLights curation pipeline, its challenges and its practical application in quality control of complex data depositions. Database URL: http://www.ebi.ac.uk/metabolights. PMID:23630246

  18. The MetaboLights repository: curation challenges in metabolomics

    PubMed Central

    Salek, Reza M.; Haug, Kenneth; Conesa, Pablo; Hastings, Janna; Williams, Mark; Mahendraker, Tejasvi; Maguire, Eamonn; González-Beltrán, Alejandra N.; Rocca-Serra, Philippe; Sansone, Susanna-Assunta; Steinbeck, Christoph

    2013-01-01

    MetaboLights is the first general-purpose open-access curated repository for metabolomic studies, their raw experimental data and associated metadata, maintained by one of the major open-access data providers in molecular biology. Increases in the number of depositions, number of samples per study and the file size of data submitted to MetaboLights present a challenge for the objective of ensuring high-quality and standardized data in the context of diverse metabolomic workflows and data representations. Here, we describe the MetaboLights curation pipeline, its challenges and its practical application in quality control of complex data depositions. Database URL: http://www.ebi.ac.uk/metabolights PMID:23630246

  19. LncRNAWiki: harnessing community knowledge in collaborative curation of human long non-coding RNAs.

    PubMed

    Ma, Lina; Li, Ang; Zou, Dong; Xu, Xingjian; Xia, Lin; Yu, Jun; Bajic, Vladimir B; Zhang, Zhang

    2015-01-01

    Long non-coding RNAs (lncRNAs) perform a diversity of functions in numerous important biological processes and are implicated in many human diseases. In this report, we present lncRNAWiki (http://lncrna.big.ac.cn), a wiki-based platform that is open-content and publicly editable and aimed at community-based curation and collection of information on human lncRNAs. Current related databases depend primarily on curation by experts, making it laborious to annotate the exponentially accumulating information on lncRNAs, which inevitably requires collective efforts in community-based curation of lncRNAs. Unlike existing databases, lncRNAWiki features comprehensive integration of information on human lncRNAs obtained from multiple different resources and allows not only existing lncRNAs to be edited, updated and curated by different users but also the addition of newly identified lncRNAs by any user. It harnesses the community's collective knowledge in collecting, editing and annotating human lncRNAs and rewards community-curated efforts by providing explicit authorship based on quantified contributions. LncRNAWiki relies on the underlying knowledge of the scientific community for collective and collaborative curation of human lncRNAs and thus has the potential to serve as an up-to-date and comprehensive knowledgebase for human lncRNAs. PMID:25399417

  20. MET network in PubMed: a text-mined network visualization and curation system

    PubMed Central

    Dai, Hong-Jie; Su, Chu-Hsien; Lai, Po-Ting; Huang, Ming-Siang; Jonnagaddala, Jitendra; Rose Jue, Toni; Rao, Shruti; Chou, Hui-Jou; Milacic, Marija; Singh, Onkar; Syed-Abdul, Shabbir; Hsu, Wen-Lian

    2016-01-01

    Metastasis is the dissemination of a cancer/tumor from one organ to another, and it is the most dangerous stage during cancer progression, causing more than 90% of cancer deaths. Improving the understanding of the complicated cellular mechanisms underlying metastasis requires investigations of the signaling pathways. To this end, we developed a METastasis (MET) network visualization and curation tool to assist metastasis researchers in retrieving network information of interest while browsing through the large volume of studies in PubMed. MET can recognize relations among genes, cancers, tissues and organs of metastasis mentioned in the literature through text-mining techniques, and then produce a visualization of all mined relations in a metastasis network. To facilitate the curation process, MET is developed as a browser extension that allows curators to review and edit concepts and relations related to metastasis directly in PubMed. PubMed users can also view the metastatic networks integrated from the large collection of research papers directly through MET. For the BioCreative 2015 interactive track (IAT), a curation task was proposed to curate metastatic networks among PubMed abstracts. Six curators participated in the proposed task and a post-IAT task, curating 963 unique metastatic relations from 174 PubMed abstracts using MET. Database URL: http://btm.tmu.edu.tw/metastasisway PMID:27242035

  1. MET network in PubMed: a text-mined network visualization and curation system.

    PubMed

    Dai, Hong-Jie; Su, Chu-Hsien; Lai, Po-Ting; Huang, Ming-Siang; Jonnagaddala, Jitendra; Rose Jue, Toni; Rao, Shruti; Chou, Hui-Jou; Milacic, Marija; Singh, Onkar; Syed-Abdul, Shabbir; Hsu, Wen-Lian

    2016-01-01

    Metastasis is the dissemination of a cancer/tumor from one organ to another, and it is the most dangerous stage during cancer progression, causing more than 90% of cancer deaths. Improving the understanding of the complicated cellular mechanisms underlying metastasis requires investigations of the signaling pathways. To this end, we developed a METastasis (MET) network visualization and curation tool to assist metastasis researchers in retrieving network information of interest while browsing through the large volume of studies in PubMed. MET can recognize relations among genes, cancers, tissues and organs of metastasis mentioned in the literature through text-mining techniques, and then produce a visualization of all mined relations in a metastasis network. To facilitate the curation process, MET is developed as a browser extension that allows curators to review and edit concepts and relations related to metastasis directly in PubMed. PubMed users can also view the metastatic networks integrated from the large collection of research papers directly through MET. For the BioCreative 2015 interactive track (IAT), a curation task was proposed to curate metastatic networks among PubMed abstracts. Six curators participated in the proposed task and a post-IAT task, curating 963 unique metastatic relations from 174 PubMed abstracts using MET. Database URL: http://btm.tmu.edu.tw/metastasisway. PMID:27242035
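
    MET's internal data model is not spelled out in the abstract, but each curated relation essentially ties a gene and a cancer type to source and target organs with supporting literature. The sketch below shows one plausible way to represent such text-mined relations and aggregate them into a network; the field names, gene/cancer values and PMIDs are illustrative assumptions, not MET's actual schema or data.

```python
from collections import defaultdict
from dataclasses import dataclass

# Illustrative representation of text-mined metastasis relations of the kind
# MET displays, plus a helper that groups them into organ->organ edges.

@dataclass(frozen=True)
class MetastasisRelation:
    gene: str
    cancer: str
    from_organ: str
    to_organ: str
    pmid: str          # supporting abstract (fake values used below)

def build_network(relations):
    """Group relations into organ->organ edges, keeping supporting evidence."""
    edges = defaultdict(list)
    for rel in relations:
        edges[(rel.from_organ, rel.to_organ)].append(rel)
    return edges

if __name__ == "__main__":
    rels = [
        MetastasisRelation("TWIST1", "breast cancer", "breast", "lung", "00000001"),
        MetastasisRelation("CXCR4", "breast cancer", "breast", "bone", "00000002"),
    ]
    for (src, dst), evidence in build_network(rels).items():
        print(f"{src} -> {dst}: {len(evidence)} supporting relation(s)")
```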

  2. Methods and strategies for gene structure curation in WormBase

    PubMed Central

    Williams, G.W.; Davis, P.A.; Rogers, A.S.; Bieri, T.; Ozersky, P.; Spieth, J.

    2011-01-01

    The Caenorhabditis elegans genome sequence was published over a decade ago; this was the first published genome of a multi-cellular organism and now the WormBase project has had a decade of experience in curating this genome's sequence and gene structures. In one of its roles as a central repository for nematode biology, WormBase continues to refine the gene structure annotations using sequence similarity and other computational methods, as well as information from the literature- and community-submitted annotations. We describe the various methods of gene structure curation that have been tried by WormBase and the problems associated with each of them. We also describe the current strategy for gene structure curation, and introduce the WormBase ‘curation tool’, which integrates different data sources in order to identify new and correct gene structures. Database URL: http://www.wormbase.org/ PMID:21543339

  3. MeioBase: a comprehensive database for meiosis.

    PubMed

    Li, Hao; Meng, Fanrui; Guo, Chunce; Wang, Yingxiang; Xie, Xiaojing; Zhu, Tiansheng; Zhou, Shuigeng; Ma, Hong; Shan, Hongyan; Kong, Hongzhi

    2014-01-01

    Meiosis is a special type of cell division necessary for the sexual reproduction of all eukaryotes. Ever-expanding meiosis research calls for an effective and specialized database that is not yet readily available. To fill this gap, we have developed a knowledge database, MeioBase (http://meiosis.ibcas.ac.cn), which comprises two core parts, Resources and Tools. In the Resources part, a wealth of meiosis data collected by curation and manual review of the published literature and biological databases is integrated and organized into various sections, such as Cytology, Pathway, Species, Interaction, and Expression. In the Tools part, some useful tools have been integrated into MeioBase, such as Search, Download, Blast, Comparison, My Favorites, Submission, and Advice. With a simplified and efficient web interface, users are able to search against the database with gene model IDs or keywords, and batch download the data for local investigation. We believe that MeioBase can greatly facilitate research related to meiosis. PMID:25566299

  4. DBatVir: the database of bat-associated viruses.

    PubMed

    Chen, Lihong; Liu, Bo; Yang, Jian; Jin, Qi

    2014-01-01

    Emerging infectious diseases remain a significant threat to public health. Most emerging infectious disease agents in humans are of zoonotic origin. Bats are important reservoir hosts of many highly lethal zoonotic viruses and have been implicated in numerous emerging infectious disease events in recent years. It is essential to enhance our knowledge and understanding of the genetic diversity of the bat-associated viruses to prevent future outbreaks. To facilitate further research, we constructed the database of bat-associated viruses (DBatVir). Known viral sequences detected in bat samples were manually collected and curated, along with the related metadata, such as the sampling time, location, bat species and specimen type. Additional information concerning the bats, including common names, diet type, geographic distribution and phylogeny, was integrated into the database to bridge the gap between virologists and zoologists. The database currently covers >4100 bat-associated animal viruses of 23 viral families detected from 196 bat species in 69 countries worldwide. It provides an overview and snapshot of the current research regarding bat-associated viruses, which is essential now that the field is rapidly expanding. With a user-friendly interface and integrated online bioinformatics tools, DBatVir provides a convenient and powerful platform for virologists and zoologists to analyze the virome diversity of bats, as well as for epidemiologists and public health researchers to monitor and track current and future bat-related infectious diseases. Database URL: http://www.mgc.ac.cn/DBatVir/. PMID:24647629

  5. lncRNAdb v2.0: expanding the reference database for functional long noncoding RNAs.

    PubMed

    Quek, Xiu Cheng; Thomson, Daniel W; Maag, Jesper L V; Bartonicek, Nenad; Signal, Bethany; Clark, Michael B; Gloss, Brian S; Dinger, Marcel E

    2015-01-01

    Despite the prevalence of long noncoding RNA (lncRNA) genes in eukaryotic genomes, only a small proportion have been examined for biological function. lncRNAdb, available at http://lncrnadb.org, provides users with a comprehensive, manually curated reference database of 287 eukaryotic lncRNAs that have been described independently in the scientific literature. In addition to capturing a great proportion of the recent literature describing functions for individual lncRNAs, lncRNAdb now offers an improved user interface enabling greater accessibility to sequence information, expression data and the literature. The new features in lncRNAdb include the integration of Illumina Body Atlas expression profiles, nucleotide sequence information, a BLAST search tool and easy export of content via direct download or a REST API. lncRNAdb is now endorsed by RNAcentral and is in compliance with the International Nucleotide Sequence Database Collaboration. PMID:25332394

  6. lncRNAdb v2.0: expanding the reference database for functional long noncoding RNAs

    PubMed Central

    Quek, Xiu Cheng; Thomson, Daniel W.; Maag, Jesper L.V.; Bartonicek, Nenad; Signal, Bethany; Clark, Michael B.; Gloss, Brian S.; Dinger, Marcel E.

    2015-01-01

    Despite the prevalence of long noncoding RNA (lncRNA) genes in eukaryotic genomes, only a small proportion have been examined for biological function. lncRNAdb, available at http://lncrnadb.org, provides users with a comprehensive, manually curated reference database of 287 eukaryotic lncRNAs that have been described independently in the scientific literature. In addition to capturing a great proportion of the recent literature describing functions for individual lncRNAs, lncRNAdb now offers an improved user interface enabling greater accessibility to sequence information, expression data and the literature. The new features in lncRNAdb include the integration of Illumina Body Atlas expression profiles, nucleotide sequence information, a BLAST search tool and easy export of content via direct download or a REST API. lncRNAdb is now endorsed by RNAcentral and is in compliance with the International Nucleotide Sequence Database Collaboration. PMID:25332394
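
    The abstract notes that lncRNAdb content can be exported by direct download or through a REST API, although the endpoints themselves are not documented here. The sketch below shows the general shape of such a programmatic fetch; the route and the response fields are hypothetical placeholders rather than the documented lncRNAdb API, so the real endpoints should be taken from http://lncrnadb.org.

```python
import json
import urllib.request

# Sketch of a programmatic fetch from a REST-style database API.
# BASE_URL points at the real site, but the route and the response keys used
# below are assumptions for illustration only.

BASE_URL = "http://lncrnadb.org"

def fetch_record(name: str) -> dict:
    url = f"{BASE_URL}/api/transcript/{name}"  # hypothetical REST route
    with urllib.request.urlopen(url, timeout=30) as response:
        return json.loads(response.read().decode("utf-8"))

if __name__ == "__main__":
    record = fetch_record("HOTAIR")  # HOTAIR is a well-known curated lncRNA
    print(record.get("name"), record.get("species"))
```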

  7. Terminology Manual.

    ERIC Educational Resources Information Center

    Felber, Helmut

    A product of the International Information Center for Terminology (Infoterm), this manual is designed to serve as a reference tool for practitioners active in terminology work and documentation. The manual explores the basic ideas of the Vienna School of Terminology and explains developments in the area of applied computer-aided terminography…

  8. Resource Manual

    ERIC Educational Resources Information Center

    Human Development Institute, 2008

    2008-01-01

    This manual was designed primarily for use by individuals with developmental disabilities and related conditions. The main focus of this manual is to provide easy-to-read information concerning available resources, and to provide immediate contact information for the purpose of applying for resources and/or locating additional information. The…

  9. Management Manual.

    ERIC Educational Resources Information Center

    San Joaquin Delta Community Coll. District, CA.

    This manual articulates the rights, responsibilities, entitlements, and conditions of employment of management personnel at San Joaquin Delta College (SJDC). The manual first presents SJDC's mission statement and then discusses the college's management goals and priorities. An examination of SJDC's administrative organization and a list of…

  10. VIEWCACHE: An incremental pointer-base access method for distributed databases. Part 1: The universal index system design document. Part 2: The universal index system low-level design document. Part 3: User's guide. Part 4: Reference manual. Part 5: UIMS test suite

    NASA Technical Reports Server (NTRS)

    Kelley, Steve; Roussopoulos, Nick; Sellis, Timos

    1992-01-01

    The goal of the Universal Index System (UIS) is to provide an easy-to-use and reliable interface to many different kinds of database systems. The impetus for this system was to simplify database index management for users, thus encouraging the use of indexes. As the idea grew into an actual system design, the concept of increasing database performance by facilitating the use of time-saving techniques at the user level became a theme for the project. This Final Report describes the design and implementation of UIS and its language interfaces. It also includes the User's Guide and the Reference Manual.

  11. Enhanced annotations and features for comparing thousands of Pseudomonas genomes in the Pseudomonas genome database.

    PubMed

    Winsor, Geoffrey L; Griffiths, Emma J; Lo, Raymond; Dhillon, Bhavjinder K; Shay, Julie A; Brinkman, Fiona S L

    2016-01-01

    The Pseudomonas Genome Database (http://www.pseudomonas.com) is well known for the application of community-based annotation approaches for producing a high-quality Pseudomonas aeruginosa PAO1 genome annotation, and facilitating whole-genome comparative analyses with other Pseudomonas strains. To aid analysis of potentially thousands of complete and draft genome assemblies, this database and analysis platform was upgraded to integrate curated genome annotations and isolate metadata with enhanced tools for larger scale comparative analysis and visualization. Manually curated gene annotations are supplemented with improved computational analyses that help identify putative drug targets and vaccine candidates or assist with evolutionary studies by identifying orthologs, pathogen-associated genes and genomic islands. The database schema has been updated to integrate isolate metadata that will facilitate more powerful analysis of genomes across datasets in the future. We continue to place an emphasis on providing high-quality updates to gene annotations through regular review of the scientific literature and using community-based approaches including a major new Pseudomonas community initiative for the assignment of high-quality gene ontology terms to genes. As we further expand from thousands of genomes, we plan to provide enhancements that will aid data visualization and analysis arising from whole-genome comparative studies including more pan-genome and population-based approaches. PMID:26578582

  12. Enhanced annotations and features for comparing thousands of Pseudomonas genomes in the Pseudomonas genome database

    PubMed Central

    Winsor, Geoffrey L.; Griffiths, Emma J.; Lo, Raymond; Dhillon, Bhavjinder K.; Shay, Julie A.; Brinkman, Fiona S. L.

    2016-01-01

    The Pseudomonas Genome Database (http://www.pseudomonas.com) is well known for the application of community-based annotation approaches for producing a high-quality Pseudomonas aeruginosa PAO1 genome annotation, and facilitating whole-genome comparative analyses with other Pseudomonas strains. To aid analysis of potentially thousands of complete and draft genome assemblies, this database and analysis platform was upgraded to integrate curated genome annotations and isolate metadata with enhanced tools for larger scale comparative analysis and visualization. Manually curated gene annotations are supplemented with improved computational analyses that help identify putative drug targets and vaccine candidates or assist with evolutionary studies by identifying orthologs, pathogen-associated genes and genomic islands. The database schema has been updated to integrate isolate metadata that will facilitate more powerful analysis of genomes across datasets in the future. We continue to place an emphasis on providing high-quality updates to gene annotations through regular review of the scientific literature and using community-based approaches including a major new Pseudomonas community initiative for the assignment of high-quality gene ontology terms to genes. As we further expand from thousands of genomes, we plan to provide enhancements that will aid data visualization and analysis arising from whole-genome comparative studies including more pan-genome and population-based approaches. PMID:26578582

  13. Rice Annotation Project Database (RAP-DB): an integrative and interactive database for rice genomics.

    PubMed

    Sakai, Hiroaki; Lee, Sung Shin; Tanaka, Tsuyoshi; Numa, Hisataka; Kim, Jungsok; Kawahara, Yoshihiro; Wakimoto, Hironobu; Yang, Ching-chia; Iwamoto, Masao; Abe, Takashi; Yamada, Yuko; Muto, Akira; Inokuchi, Hachiro; Ikemura, Toshimichi; Matsumoto, Takashi; Sasaki, Takuji; Itoh, Takeshi

    2013-02-01

    The Rice Annotation Project Database (RAP-DB, http://rapdb.dna.affrc.go.jp/) has been providing a comprehensive set of gene annotations for the genome sequence of rice, Oryza sativa (japonica group) cv. Nipponbare. Since the first release in 2005, RAP-DB has been updated several times along with the genome assembly updates. Here, we present our newest RAP-DB based on the latest genome assembly, Os-Nipponbare-Reference-IRGSP-1.0 (IRGSP-1.0), which was released in 2011. We detected 37,869 loci by mapping transcript and protein sequences of 150 monocot species. To provide plant researchers with highly reliable and up to date rice gene annotations, we have been incorporating literature-based manually curated data, and 1,626 loci currently incorporate literature-based annotation data, including commonly used gene names or gene symbols. Transcriptional activities are shown at the nucleotide level by mapping RNA-Seq reads derived from 27 samples. We also mapped the Illumina reads of a Japanese leading japonica cultivar, Koshihikari, and a Chinese indica cultivar, Guangluai-4, to the genome and show alignments together with the single nucleotide polymorphisms (SNPs) and gene functional annotations through a newly developed browser, Short-Read Assembly Browser (S-RAB). We have developed two satellite databases, Plant Gene Family Database (PGFD) and Integrative Database of Cereal Gene Phylogeny (IDCGP), which display gene family and homologous gene relationships among diverse plant species. RAP-DB and the satellite databases offer simple and user-friendly web interfaces, enabling plant and genome researchers to access the data easily and facilitating a broad range of plant research topics. PMID:23299411

  14. Data Curation Is for Everyone! The Case for Master's and Baccalaureate Institutional Engagement with Data Curation

    ERIC Educational Resources Information Center

    Shorish, Yasmeen

    2012-01-01

    This article describes the fundamental challenges to data curation, how these challenges may be compounded for smaller institutions, and how data management is an essential and manageable component of data curation. Data curation is often discussed within the confines of large research universities. As a result, master's and baccalaureate…

  15. The Ribosomal Database Project.

    PubMed Central

    Maidak, B L; Larsen, N; McCaughey, M J; Overbeek, R; Olsen, G J; Fogel, K; Blandy, J; Woese, C R

    1994-01-01

    The Ribosomal Database Project (RDP) is a curated database that offers ribosome-related data, analysis services, and associated computer programs. The offerings include phylogenetically ordered alignments of ribosomal RNA (rRNA) sequences, derived phylogenetic trees, rRNA secondary structure diagrams, and various software for handling, analyzing and displaying alignments and trees. The data are available via anonymous ftp (rdp.life.uiuc.edu), electronic mail (server@rdp.life.uiuc.edu) and gopher (rdpgopher.life.uiuc.edu). The electronic mail server also provides ribosomal probe checking, approximate phylogenetic placement of user-submitted sequences, screening for chimeric nature of newly sequenced rRNAs, and automated alignment. PMID:7524021

  16. Manual de Carpinteria (Carpentry Manual).

    ERIC Educational Resources Information Center

    TomSing, Luisa B.

    This manual is part of a Mexican series of instructional materials designed for Spanish speaking adults who are in the process of becoming literate or have recently become literate in their native language. The manual describes a carpentry course that is structured to appeal to the student as a self-directing adult. The following units are…

  17. Maize Genetics and Genomics Database

    Technology Transfer Automated Retrieval System (TEKTRAN)

    The 2007 report for MaizeGDB lists the new hires who will focus on curation/outreach and the genome sequence, respectively. Currently all sequence in the database comes from a PlantGDB pipeline and is presented with deep links to external resources such as PlantGDB, Dana Farber, GenBank, the Arizona...

  18. Pathway Interaction Database (PID)

    Cancer.gov

    The National Cancer Institute (NCI) in collaboration with Nature Publishing Group has established the Pathway Interaction Database (PID) in order to provide a highly structured, curated collection of information about known biomolecular interactions and key cellular processes assembled into signaling pathways.

  19. Biological Databases for Behavioral Neurobiology

    PubMed Central

    Baker, Erich J.

    2014-01-01

    Databases are, at their core, abstractions of data and their intentionally derived relationships. They serve as a central organizing metaphor and repository, supporting or augmenting nearly all bioinformatics. Behavioral domains provide a unique stage for contemporary databases, as research in this area spans diverse data types, locations, and data relationships. This chapter provides foundational information on the diversity and prevalence of databases and on how data structures support the various needs of behavioral neuroscience analysis and interpretation. The focus is on the classes of databases, data curation, and advanced applications in bioinformatics, using examples largely drawn from research efforts in behavioral neuroscience. PMID:23195119

  20. ZFIN, the Zebrafish Model Organism Database: increased support for mutants and transgenics.

    PubMed

    Howe, Douglas G; Bradford, Yvonne M; Conlin, Tom; Eagle, Anne E; Fashena, David; Frazer, Ken; Knight, Jonathan; Mani, Prita; Martin, Ryan; Moxon, Sierra A Taylor; Paddock, Holly; Pich, Christian; Ramachandran, Sridhar; Ruef, Barbara J; Ruzicka, Leyla; Schaper, Kevin; Shao, Xiang; Singer, Amy; Sprunger, Brock; Van Slyke, Ceri E; Westerfield, Monte

    2013-01-01

    ZFIN, the Zebrafish Model Organism Database (http://zfin.org), is the central resource for zebrafish genetic, genomic, phenotypic and developmental data. ZFIN curators manually curate and integrate comprehensive data involving zebrafish genes, mutants, transgenics, phenotypes, genotypes, gene expressions, morpholinos, antibodies, anatomical structures and publications. Integrated views of these data, as well as data gathered through collaborations and data exchanges, are provided through a wide selection of web-based search forms. Among the vertebrate model organisms, zebrafish are uniquely well suited for rapid and targeted generation of mutant lines. The recent rapid production of mutants and transgenic zebrafish is making management of data associated with these resources particularly important to the research community. Here, we describe recent enhancements to ZFIN aimed at improving our support for mutant and transgenic lines, including (i) enhanced mutant/transgenic search functionality; (ii) more expressive phenotype curation methods; (iii) new downloads files and archival data access; (iv) incorporation of new data loads from laboratories undertaking large-scale generation of mutant or transgenic lines and (v) new GBrowse tracks for transgenic insertions, genes with antibodies and morpholinos. PMID:23074187

  1. Text mining in the biocuration workflow: applications for literature curation at WormBase, dictyBase and TAIR.

    PubMed

    Van Auken, Kimberly; Fey, Petra; Berardini, Tanya Z; Dodson, Robert; Cooper, Laurel; Li, Donghui; Chan, Juancarlos; Li, Yuling; Basu, Siddhartha; Muller, Hans-Michael; Chisholm, Rex; Huala, Eva; Sternberg, Paul W

    2012-01-01

    WormBase, dictyBase and The Arabidopsis Information Resource (TAIR) are model organism databases containing information about Caenorhabditis elegans and other nematodes, the social amoeba Dictyostelium discoideum and related Dictyostelids and the flowering plant Arabidopsis thaliana, respectively. Each database curates multiple data types from the primary research literature. In this article, we describe the curation workflow at WormBase, with particular emphasis on our use of text-mining tools (BioCreative 2012, Workshop Track II). We then describe the application of a specific component of that workflow, Textpresso for Cellular Component Curation (CCC), to Gene Ontology (GO) curation at dictyBase and TAIR (BioCreative 2012, Workshop Track III). We find that, with organism-specific modifications, Textpresso can be used by dictyBase and TAIR to annotate gene products to GO's Cellular Component (CC) ontology. PMID:23160413

  2. Text mining in the biocuration workflow: applications for literature curation at WormBase, dictyBase and TAIR

    PubMed Central

    Van Auken, Kimberly; Fey, Petra; Berardini, Tanya Z.; Dodson, Robert; Cooper, Laurel; Li, Donghui; Chan, Juancarlos; Li, Yuling; Basu, Siddhartha; Muller, Hans-Michael; Chisholm, Rex; Huala, Eva; Sternberg, Paul W.

    2012-01-01

    WormBase, dictyBase and The Arabidopsis Information Resource (TAIR) are model organism databases containing information about Caenorhabditis elegans and other nematodes, the social amoeba Dictyostelium discoideum and related Dictyostelids and the flowering plant Arabidopsis thaliana, respectively. Each database curates multiple data types from the primary research literature. In this article, we describe the curation workflow at WormBase, with particular emphasis on our use of text-mining tools (BioCreative 2012, Workshop Track II). We then describe the application of a specific component of that workflow, Textpresso for Cellular Component Curation (CCC), to Gene Ontology (GO) curation at dictyBase and TAIR (BioCreative 2012, Workshop Track III). We find that, with organism-specific modifications, Textpresso can be used by dictyBase and TAIR to annotate gene products to GO's Cellular Component (CC) ontology. PMID:23160413

  3. Hayabusa Sample Curation in the JAXA's Planetary Material Curation Facility

    NASA Astrophysics Data System (ADS)

    Okada, T.; Abe, M.; Fujimura, A.; Yada, T.; Ishibashi, Y.; Uesugi, M.; Yuzuru, K.; Yakame, S.; Nakamura, T.; Noguchi, T.; Okazaki, R.; Zolensky, M.; Sandford, S.; Ueno, M.; Mukai, T.; Yoshikawa, M.; Kawaguchi, J.

    2011-12-01

    Hayabusa successfully returned its reentry capsule to Australia on June 13th, 2010. As detailed previously [1], a series of processes was carried out in JAXA's Planetary Material Curation Facility to introduce the sample container of the reentry capsule into a pure-nitrogen-filled clean chamber without exposure to water or oxygen, retrieve the fine particles found inside the container, characterize them by scanning electron microscopy (SEM) with energy-dispersive X-ray spectroscopy (EDX), classify them into mineral or rock types, and store them for future analysis. Some of those particles were delivered for initial analysis to catalogue them [2-10]. The facility is required to develop new methodologies and refine techniques to pick up recovered samples much finer than originally expected. One of these is an electrostatic micro-probe for pickups, and a trial has started to slice the fine samples for detailed analysis of extra-fine structures. An electrostatic nano-probe for use in the SEM is also being considered and developed. To maximize the scientific outputs, the analyses must proceed based on more advanced methodology and sophisticated ideas. So far we have identified the samples as materials from the S-class asteroid 25143 Itokawa, owing to their consistency with results from remote near-infrared and X-ray spectroscopy: about 1500 ultra-fine particles (mostly smaller than 10 microns) caught by Teflon spatula scooping, and about 100 fine particles (mostly 20-200 microns) collected by compulsory fall onto silica glass plates. A future schedule for sample distribution must be planned. The initial analyses are still in progress, and we will distribute more of the recovered particles. Some of the particles will also be distributed to NASA, based on the Memorandum of Understanding (MOU) between Japan and the U.S.A. for the Hayabusa mission. Finally, in the near future an international Announcement of Opportunity (AO) for sample analyses will be open to any interested researchers.

  4. T3DB: an integrated database for bacterial type III secretion system

    PubMed Central

    2012-01-01

    Background Type III Secretion System (T3SS), which plays important roles in pathogenesis or symbiosis, is widely expressed in a variety of Gram-negative bacteria. However, the lack of a unique nomenclature for T3SS genes has hindered T3SS-related research. It is necessary to set up a knowledgebase integrating T3SS-related research data to facilitate communication between different research groups interested in different bacteria. Description A T3SS-related Database (T3DB) was developed. T3DB serves as an integrated platform for sequence collection, function annotation, and ortholog classification for T3SS-related apparatus, effector, chaperone and regulatory genes. The collection of T3SS-containing bacteria, T3SS-related genes, function annotation, and the ortholog information were all manually curated from the literature. BPBAac, a highly efficient T3SS effector prediction tool, was also implemented. Conclusions T3DB is the first systematic platform integrating well-annotated T3SS-related gene and protein information to facilitate T3SS and bacterial pathogenicity-related research. The newly constructed T3 ortholog clusters may facilitate effective communication between different research groups and will promote de novo discoveries. In addition, the manually curated high-quality effector and chaperone data are useful for feature analysis and evolutionary studies of these important proteins. PMID:22545727

  5. MEDIC: a practical disease vocabulary used at the Comparative Toxicogenomics Database

    PubMed Central

    Davis, Allan Peter; Wiegers, Thomas C.; Rosenstein, Michael C.; Mattingly, Carolyn J.

    2012-01-01

    The Comparative Toxicogenomics Database (CTD) is a public resource that promotes understanding about the effects of environmental chemicals on human health. CTD biocurators manually curate a triad of chemical–gene, chemical–disease and gene–disease relationships from the scientific literature. The CTD curation paradigm uses controlled vocabularies for chemicals, genes and diseases. To curate disease information, CTD first had to identify a source of controlled terms. Two resources seemed to be good candidates: the Online Mendelian Inheritance in Man (OMIM) and the ‘Diseases’ branch of the National Library of Medicine's Medical Subject Headings (MeSH). To maximize the advantages of both, CTD biocurators undertook a novel initiative to map the flat list of OMIM disease terms into the hierarchical nature of the MeSH vocabulary. The result is CTD’s ‘merged disease vocabulary’ (MEDIC), a unique resource that integrates OMIM terms, synonyms and identifiers with MeSH terms, synonyms, definitions, identifiers and hierarchical relationships. MEDIC is both a deep and broad vocabulary, composed of 9700 unique diseases described by more than 67 000 terms (including synonyms). It is freely available to download in various formats from CTD. While neither a true ontology nor a perfect solution, this vocabulary has nonetheless proved to be extremely successful and practical for our biocurators in generating over 2.5 million disease-associated toxicogenomic relationships in CTD. Other external databases have also begun to adopt MEDIC for their disease vocabulary. Here, we describe the construction, implementation, maintenance and use of MEDIC to raise awareness of this resource and to offer it as a putative scaffold in the formal construction of an official disease ontology. Database URL: http://ctd.mdibl.org/voc.go?type=disease PMID:22434833
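
    The core of MEDIC is a merge of the flat OMIM disease list into the hierarchical MeSH ‘Diseases’ branch. The toy sketch below illustrates the general idea of such a merge, slotting OMIM entries under curator-mapped MeSH parents; the identifiers, names, tree numbers and mapping are illustrative placeholders, not actual MEDIC content, and the real MEDIC build also merges synonyms, definitions and identifiers.

```python
import copy

# Toy merge of a flat OMIM term list into a hierarchical MeSH vocabulary.
# All identifiers, names, tree numbers and mappings are placeholders.

mesh_terms = {
    "MESH:D000001": {
        "name": "Example MeSH disease term",
        "tree_numbers": ["C00.000"],   # tree numbers encode the MeSH hierarchy
        "omim_children": [],
    },
}

omim_terms = {
    "OMIM:999999": {"name": "Hypothetical inherited disorder"},
}

# Curator-assigned mapping from each OMIM entry to its closest MeSH parent.
omim_to_mesh_parent = {"OMIM:999999": "MESH:D000001"}

def merge(mesh, omim, mapping):
    """Slot each OMIM term under its mapped MeSH parent, keeping both IDs."""
    merged = copy.deepcopy(mesh)
    for omim_id, parent_id in mapping.items():
        merged[parent_id]["omim_children"].append(
            {"id": omim_id, "name": omim[omim_id]["name"]}
        )
    return merged

if __name__ == "__main__":
    vocabulary = merge(mesh_terms, omim_terms, omim_to_mesh_parent)
    print(vocabulary["MESH:D000001"]["omim_children"])
```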

  6. How should the completeness and quality of curated nanomaterial data be evaluated?

    NASA Astrophysics Data System (ADS)

    Marchese Robinson, Richard L.; Lynch, Iseult; Peijnenburg, Willie; Rumble, John; Klaessig, Fred; Marquardt, Clarissa; Rauscher, Hubert; Puzyn, Tomasz; Purian, Ronit; Åberg, Christoffer; Karcher, Sandra; Vriens, Hanne; Hoet, Peter; Hoover, Mark D.; Hendren, Christine Ogilvie; Harper, Stacey L.

    2016-05-01

    Nanotechnology is of increasing significance. Curation of nanomaterial data into electronic databases offers opportunities to better understand and predict nanomaterials' behaviour. This supports innovation in, and regulation of, nanotechnology. It is commonly understood that curated data need to be sufficiently complete and of sufficient quality to serve their intended purpose. However, assessing data completeness and quality is non-trivial in general and is arguably especially difficult in the nanoscience area, given its highly multidisciplinary nature. The current article, part of the Nanomaterial Data Curation Initiative series, addresses how to assess the completeness and quality of (curated) nanomaterial data. In order to address this key challenge, a variety of related issues are discussed: the meaning and importance of data completeness and quality, existing approaches to their assessment and the key challenges associated with evaluating the completeness and quality of curated nanomaterial data. Considerations which are specific to the nanoscience area and lessons which can be learned from other relevant scientific disciplines are considered. Hence, the scope of this discussion ranges from physicochemical characterisation requirements for nanomaterials and interference of nanomaterials with nanotoxicology assays to broader issues such as minimum information checklists, toxicology data quality schemes and computational approaches that facilitate evaluation of the completeness and quality of (curated) data. This discussion is informed by a literature review and a survey of key nanomaterial data curation stakeholders. Finally, drawing upon this discussion, recommendations are presented concerning the central question: how should the completeness and quality of curated nanomaterial data be evaluated?

  7. LeishCyc: a biochemical pathways database for Leishmania major

    PubMed Central

    Doyle, Maria A; MacRae, James I; De Souza, David P; Saunders, Eleanor C; McConville, Malcolm J; Likić, Vladimir A

    2009-01-01

    Background Leishmania spp. are sandfly transmitted protozoan parasites that cause a spectrum of diseases in more than 12 million people worldwide. Much research is now focusing on how these parasites adapt to the distinct nutrient environments they encounter in the digestive tract of the sandfly vector and the phagolysosome compartment of mammalian macrophages. While data mining and annotation of the genomes of three Leishmania species has provided an initial inventory of predicted metabolic components and associated pathways, resources for integrating this information into metabolic networks and incorporating data from transcript, protein, and metabolite profiling studies is currently lacking. The development of a reliable, expertly curated, and widely available model of Leishmania metabolic networks is required to facilitate systems analysis, as well as discovery and prioritization of new drug targets for this important human pathogen. Description The LeishCyc database was initially built from the genome sequence of Leishmania major (v5.2), based on the annotation published by the Wellcome Trust Sanger Institute. LeishCyc was manually curated to remove errors, correct automated predictions, and add information from the literature. The ongoing curation is based on public sources, literature searches, and our own experimental and bioinformatics studies. In a number of instances we have improved on the original genome annotation, and, in some ambiguous cases, collected relevant information from the literature in order to help clarify gene or protein annotation in the future. All genes in LeishCyc are linked to the corresponding entry in GeneDB (Wellcome Trust Sanger Institute). Conclusion The LeishCyc database describes Leishmania major genes, gene products, metabolites, their relationships and biochemical organization into metabolic pathways. LeishCyc provides a systematic approach to organizing the evolving information about Leishmania biochemical networks and is

  8. REFEREE: BIBLIOGRAPHIC DATABASE MANAGER, DOCUMENTATION

    EPA Science Inventory

    The publication is the user's manual for 3.xx releases of REFEREE, a general-purpose bibliographic database management program for IBM-compatible microcomputers. The REFEREE software also is available from NTIS. The manual has two main sections--Quick Tour and References Guide--a...

  9. Trust, but verify: On the importance of chemical structure curation in cheminformatics and QSAR modeling research

    PubMed Central

    Fourches, Denis; Muratov, Eugene; Tropsha, Alexander

    2010-01-01

    Molecular modelers and cheminformaticians typically analyze experimental data generated by other scientists. Consequently, when it comes to data accuracy, cheminformaticians are always at the mercy of data providers who may inadvertently publish (partially) erroneous data. Thus, dataset curation is crucial for any cheminformatics analysis such as similarity searching, clustering, QSAR modeling, virtual screening, etc., especially nowadays when the availability of chemical datasets in public domain has skyrocketed in recent years. Despite the obvious importance of this preliminary step in the computational analysis of any dataset, there appears to be no commonly accepted guidance or set of procedures for chemical data curation. The main objective of this paper is to emphasize the need for a standardized chemical data curation strategy that should be followed at the onset of any molecular modeling investigation. Herein, we discuss several simple but important steps for cleaning chemical records in a database including the removal of a fraction of the data that cannot be appropriately handled by conventional cheminformatics techniques. Such steps include the removal of inorganic and organometallic compounds, counterions, salts and mixtures; structure validation; ring aromatization; normalization of specific chemotypes; curation of tautomeric forms; and the deletion of duplicates. To emphasize the importance of data curation as a mandatory step in data analysis, we discuss several case studies where chemical curation of the original “raw” database enabled the successful modeling study (specifically, QSAR analysis) or resulted in a significant improvement of model's prediction accuracy. We also demonstrate that in some cases rigorously developed QSAR models could be even used to correct erroneous biological data associated with chemical compounds. We believe that good practices for curation of chemical records outlined in this paper will be of value to all
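
    Several of the cleaning steps listed above (stripping counterions and salts, discarding inorganic and organometallic records, and removing duplicates via a canonical representation) can be sketched with an open-source toolkit such as RDKit. The example below is illustrative only: it assumes RDKit is installed, deliberately omits the harder steps (structure validation, chemotype normalization, tautomer curation), and is not the specific workflow prescribed by the authors.

```python
from rdkit import Chem
from rdkit.Chem.SaltRemover import SaltRemover

# Illustrative subset of chemical record curation: salt stripping, removal of
# inorganic/organometallic records, and deduplication via canonical SMILES.

ORGANIC_ELEMENTS = {"H", "C", "N", "O", "S", "P", "F", "Cl", "Br", "I", "B", "Se", "Si"}
_salt_remover = SaltRemover()

def curate(smiles_list):
    seen = set()
    curated = []
    for smi in smiles_list:
        mol = Chem.MolFromSmiles(smi)
        if mol is None:                      # unparsable structure
            continue
        mol = _salt_remover.StripMol(mol)    # drop common counterions/salts
        symbols = {atom.GetSymbol() for atom in mol.GetAtoms()}
        if "C" not in symbols or not symbols <= ORGANIC_ELEMENTS:
            continue                         # inorganic or organometallic record
        canonical = Chem.MolToSmiles(mol)    # canonical SMILES for deduplication
        if canonical and canonical not in seen:
            seen.add(canonical)
            curated.append(canonical)
    return curated

if __name__ == "__main__":
    # Expect ['CCO']: the salt form and 'OCC' collapse to the same canonical
    # SMILES, and '[Fe]' is filtered out as inorganic.
    print(curate(["CCO", "CCO.[Na+].[Cl-]", "[Fe]", "OCC"]))
```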

  10. IPD: the Immuno Polymorphism Database.

    PubMed

    Robinson, James; Marsh, Steven G E

    2007-01-01

    The Immuno Polymorphism Database (IPD) (http://www.ebi.ac.uk/ipd/) is a set of specialist databases related to the study of polymorphic genes in the immune system. IPD currently consists of four databases: IPD-KIR, which contains the allelic sequences of killer cell immunoglobulin-like receptors (KIRs); IPD-MHC, a database of sequences of the major histocompatibility complex (MHC) of different species; IPD-HPA, alloantigens expressed only on platelets; and IPD-ESTDAB, which provides access to the European Searchable Tumour Cell Line Database, a cell bank of immunologically characterized melanoma cell lines. The IPD project works with specialist groups or nomenclature committees who provide and curate individual sections before they are submitted to IPD for online publication. The IPD project stores all the data in a set of related databases. Those sections with similar data, such as IPD-KIR and IPD-MHC, share the same database structure. PMID:18449992