Science.gov

Sample records for manually curated database

  1. TRIP Database: a manually curated database of protein–protein interactions for mammalian TRP channels

    PubMed Central

    Shin, Young-Cheul; Shin, Soo-Yong; So, Insuk; Kwon, Dongseop; Jeon, Ju-Hong

    2011-01-01

    Transient receptor potential (TRP) channels are a superfamily of Ca2+-permeable cation channels that translate cellular stimuli into electrochemical signals. Aberrant activity of TRP channels has been implicated in a variety of human diseases, such as neurological disorders, cardiovascular disease and cancer. To facilitate the understanding of the molecular network by which TRP channels are associated with biological and disease processes, we have developed the TRIP (TRansient receptor potential channel-Interacting Protein) Database (http://www.trpchannel.org), a manually curated database that aims to offer comprehensive information on protein–protein interactions (PPIs) of mammalian TRP channels. The TRIP Database was created by systematically curating 277 peer-reviewed publications; the current version documents 490 PPI pairs, 28 TRP channels and 297 cellular proteins. The TRIP Database provides a detailed summary of PPI data that fit into four categories: screening, validation, characterization and functional consequence. Users can find in-depth information specified in the literature on relevant analytical methods and experimental resources, such as gene constructs and cell/tissue types. The TRIP Database has user-friendly web interfaces with helpful features, including a search engine, an interaction map and a function for cross-referencing useful external databases. Our TRIP Database will provide a valuable tool to assist in understanding the molecular regulatory network of TRP channels. PMID:20851834
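The four evidence categories above lend themselves to a simple structured record. The sketch below shows one possible shape for such an entry in Python; the field names, category strings and the example interaction are illustrative assumptions, not TRIP's actual schema.

```python
from dataclasses import dataclass

# Hypothetical TRIP-style PPI record; categories follow the abstract's
# four evidence classes, everything else is invented for illustration.
EVIDENCE_CATEGORIES = {"screening", "validation", "characterization",
                       "functional consequence"}

@dataclass
class PPIRecord:
    trp_channel: str   # e.g. a TRP channel gene symbol
    partner: str       # interacting cellular protein
    category: str      # one of EVIDENCE_CATEGORIES
    method: str        # analytical method reported in the paper
    pmid: str          # source literature

    def __post_init__(self):
        if self.category not in EVIDENCE_CATEGORIES:
            raise ValueError(f"unknown evidence category: {self.category}")

rec = PPIRecord("TRPC4", "STIM1", "validation",
                "co-immunoprecipitation", "20851834")
```

Validating the category at construction time keeps every stored record inside the controlled vocabulary, which is what makes category-based browsing possible downstream.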

  2. The curation paradigm and application tool used for manual curation of the scientific literature at the Comparative Toxicogenomics Database

    PubMed Central

    Davis, Allan Peter; Wiegers, Thomas C.; Murphy, Cynthia G.; Mattingly, Carolyn J.

    2011-01-01

    The Comparative Toxicogenomics Database (CTD) is a public resource that promotes understanding about the effects of environmental chemicals on human health. CTD biocurators read the scientific literature and convert free-text information into a structured format using official nomenclature, integrating third party controlled vocabularies for chemicals, genes, diseases and organisms, and a novel controlled vocabulary for molecular interactions. Manual curation produces a robust, richly annotated dataset of highly accurate and detailed information. Currently, CTD describes over 349 000 molecular interactions between 6800 chemicals, 20 900 genes (for 330 organisms) and 4300 diseases that have been manually curated from over 25 400 peer-reviewed articles. These manually curated data are further integrated with other third party data (e.g. Gene Ontology, KEGG and Reactome annotations) to generate a wealth of toxicogenomic relationships. Here, we describe our approach to manual curation that uses a powerful and efficient paradigm involving mnemonic codes. This strategy allows biocurators to quickly capture detailed information from articles by generating simple statements using codes to represent the relationships between data types. The paradigm is versatile, expandable, and able to accommodate new data challenges that arise. We have incorporated this strategy into a web-based curation tool to further increase efficiency and productivity, implement quality control in real-time and accommodate biocurators working remotely. Database URL: http://ctd.mdibl.org PMID:21933848
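The mnemonic-code idea can be sketched as a tiny expansion table: a curator types a compact coded statement, and the tool expands it into a structured interaction. The codes and vocabulary below are invented for illustration; CTD's actual code set is not reproduced here.

```python
# Hypothetical CTD-style mnemonic codes mapping to interaction verbs.
ACTION_CODES = {
    "exp": "increases expression of",
    "red": "decreases expression of",
    "b":   "binds to",
}

def expand(statement: str) -> str:
    """Expand a 'chemical|code|gene' shorthand into a readable
    structured interaction statement."""
    chemical, code, gene = statement.split("|")
    return f"{chemical} {ACTION_CODES[code]} {gene}"

print(expand("arsenic|exp|HMOX1"))  # arsenic increases expression of HMOX1
```

The efficiency gain comes from the curator typing only the short coded form while the tool guarantees that every expansion lands in the controlled vocabulary.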

  3. Manual curation is not sufficient for annotation of genomic databases

    PubMed Central

    Baumgartner, William A.; Cohen, K. Bretonnel; Fox, Lynne M.; Acquaah-Mensah, George; Hunter, Lawrence

    2008-01-01

    Motivation: Knowledge base construction has been an area of intense activity and great importance in the growth of computational biology. However, there is little or no history of work on the subject of evaluation of knowledge bases, either with respect to their contents or with respect to the processes by which they are constructed. This article proposes the application of a metric from software engineering known as the found/fixed graph to the problem of evaluating the processes by which genomic knowledge bases are built, as well as the completeness of their contents. Results: Well-understood patterns of change in the found/fixed graph are found to occur in two large publicly available knowledge bases. These patterns suggest that the current manual curation processes will take far too long to complete the annotations of even just the most important model organisms, and that at their current rate of production, they will never be sufficient for completing the annotation of all currently available proteomes. Contact: larry.hunter@uchsc.edu PMID:17646325
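A found/fixed graph is just a pair of cumulative curves over time: issues discovered ("found") versus issues resolved ("fixed"). A minimal sketch, with invented event data (the paper builds the curves from knowledge-base change logs):

```python
# Cumulative found/fixed curves from a time-ordered event stream.
# 'found' = an annotation target discovered; 'fixed' = one completed.
def found_fixed_curves(events):
    """events: list of (day, kind), kind in {'found', 'fixed'},
    sorted by day. Returns a list of (day, found_total, fixed_total)."""
    found = fixed = 0
    curve = []
    for day, kind in events:
        if kind == "found":
            found += 1
        else:
            fixed += 1
        curve.append((day, found, fixed))
    return curve

events = [(1, "found"), (2, "found"), (3, "fixed"),
          (5, "found"), (8, "fixed")]
curve = found_fixed_curves(events)
# A persistent, widening gap between the two curves is the pattern the
# article interprets as curation failing to keep pace with discovery.
```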

  4. A comprehensive manually curated protein–protein interaction database for the Death Domain superfamily

    PubMed Central

    Kwon, Dongseop; Yoon, Jong Hwan; Shin, Soo-Yong; Jang, Tae-Ho; Kim, Hong-Gee; So, Insuk; Jeon, Ju-Hong; Park, Hyun Ho

    2012-01-01

    The Death Domain (DD) superfamily, which is one of the largest classes of protein interaction modules, plays a pivotal role in apoptosis, inflammation, necrosis and immune cell signaling pathways. Because aberrant or inappropriate DD superfamily-mediated signaling events are associated with various human diseases, such as cancers, neurodegenerative diseases and immunological disorders, the studies in these fields are of great biological and clinical importance. To facilitate the understanding of the molecular mechanisms by which the DD superfamily is associated with biological and disease processes, we have developed the DD database (http://www.deathdomain.org), a manually curated database that aims to offer comprehensive information on protein–protein interactions (PPIs) of the DD superfamily. The DD database was created by manually curating 295 peer-reviewed studies that were published in the literature; the current version documents 175 PPI pairs among the 99 DD superfamily proteins. The DD database provides a detailed summary of the DD superfamily proteins and their PPI data. Users can find in-depth information that is specified in the literature on relevant analytical methods, experimental resources and domain structures. Our database provides a definitive and valuable tool that assists researchers in understanding the signaling network that is mediated by the DD superfamily. PMID:22135292

  5. LMPID: A manually curated database of linear motifs mediating protein–protein interactions

    PubMed Central

    Sarkar, Debasree; Jana, Tanmoy; Saha, Sudipto

    2015-01-01

    Linear motifs (LMs), used by a subset of all protein–protein interactions (PPIs), bind to globular receptors or domains and play an important role in signaling networks. LMPID (Linear Motif mediated Protein Interaction Database) is a manually curated database that provides comprehensive, experimentally validated information about the LMs mediating PPIs from all organisms on a single platform. About 2200 entries have been compiled by detailed manual curation of PubMed abstracts, of which about 1000 LM entries were annotated for the first time, as compared with the Eukaryotic LM resource. Users can submit their query through a user-friendly search page and browse the data in the alphabetical order of the bait gene names and according to the domains interacting with the LM. LMPID is freely accessible at http://bicresources.jcbose.ac.in/ssaha4/lmpid and contains 1750 unique LM instances found within 1181 baits interacting with 552 prey proteins. In summary, LMPID enriches the existing repertoire of resources available for studying the LMs implicated in PPIs; it may help in understanding the patterns of LMs that bind a specific domain, in developing prediction models to identify novel domain-specific LMs, and ultimately in predicting inhibitors/modulators of a PPI of interest. Database URL: http://bicresources.jcbose.ac.in/ssaha4/lmpid PMID:25776024
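Linear motifs are conventionally written as short regular expressions over amino-acid sequence, so scanning for one is a few lines of code. The sketch below searches a made-up protein sequence for the classic SH3-binding PxxP pattern; the sequence and motif choice are illustrative, not taken from LMPID itself.

```python
import re

# PxxP: proline, any two residues, proline - a common SH3-domain
# binding linear motif, written as a regex over one-letter codes.
MOTIF_PXXP = re.compile(r"P..P")

def find_motifs(seq: str):
    """Return (start_position, matched_subsequence) for each
    non-overlapping motif hit in the sequence."""
    return [(m.start(), m.group()) for m in MOTIF_PXXP.finditer(seq)]

hits = find_motifs("MKTAPSLPARRLMPQAP")  # [(4, 'PSLP'), (13, 'PQAP')]
```

Note that `finditer` reports non-overlapping matches only; overlapping motif instances would need a lookahead pattern.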

  6. Evola: Ortholog database of all human genes in H-InvDB with manual curation of phylogenetic trees.

    PubMed

    Matsuya, Akihiro; Sakate, Ryuichi; Kawahara, Yoshihiro; Koyanagi, Kanako O; Sato, Yoshiharu; Fujii, Yasuyuki; Yamasaki, Chisato; Habara, Takuya; Nakaoka, Hajime; Todokoro, Fusano; Yamaguchi, Kaori; Endo, Toshinori; Oota, Satoshi; Makalowski, Wojciech; Ikeo, Kazuho; Suzuki, Yoshiyuki; Hanada, Kousuke; Hashimoto, Katsuyuki; Hirai, Momoki; Iwama, Hisakazu; Saitou, Naruya; Hiraki, Aiko T; Jin, Lihua; Kaneko, Yayoi; Kanno, Masako; Murakami, Katsuhiko; Noda, Akiko Ogura; Saichi, Naomi; Sanbonmatsu, Ryoko; Suzuki, Mami; Takeda, Jun-ichi; Tanaka, Masayuki; Gojobori, Takashi; Imanishi, Tadashi; Itoh, Takeshi

    2008-01-01

    Orthologs are genes in different species that evolved from a common ancestral gene by speciation. Currently, with the rapid growth of transcriptome data of various species, more reliable orthology information is a prerequisite for further studies. However, detection of orthologs could be erroneous if pairwise distance-based methods, such as reciprocal BLAST searches, are utilized. Thus, as a sub-database of H-InvDB, an integrated database of annotated human genes (http://h-invitational.jp/), we constructed a fully curated database of evolutionary features of human genes, called 'Evola'. In the process of the ortholog detection, computational analysis based on conserved genome synteny and transcript sequence similarity was followed by manual curation by researchers examining phylogenetic trees. In total, 18 968 human genes have orthologs among 11 vertebrates (chimpanzee, mouse, cow, chicken, zebrafish, etc.), either computationally detected or manually curated. Evola provides amino acid sequence alignments and phylogenetic trees of orthologs and homologs. In 'd(N)/d(S) view', natural selection on genes can be analyzed between human and other species. In 'Locus maps', all transcript variants and their exon/intron structures can be compared among orthologous gene loci. We expect Evola to serve as a comprehensive and reliable database to be utilized in comparative analyses for obtaining new knowledge about human genes. Evola is available at http://www.h-invitational.jp/evola/. PMID:17982176
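The d(N)/d(S) view rests on a standard interpretation: the ratio of non-synonymous to synonymous substitution rates classifies the selection acting on a gene. A minimal sketch of that interpretation step, with invented rate values (estimating the rates from alignments, e.g. by maximum likelihood, is a separate and much harder step):

```python
# Classify selection from dN and dS rates. The threshold band 'eps'
# around 1.0 is an illustrative choice, not Evola's actual cutoff.
def selection_class(dn: float, ds: float, eps: float = 0.1) -> str:
    ratio = dn / ds
    if ratio < 1 - eps:
        return "purifying selection"   # non-synonymous changes removed
    if ratio > 1 + eps:
        return "positive selection"    # non-synonymous changes favoured
    return "neutral evolution"

assert selection_class(0.02, 0.40) == "purifying selection"
```

Most protein-coding genes fall well below a ratio of 1, which is why a gene with d(N)/d(S) > 1 between human and another species stands out in such a view.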

  7. miRSponge: a manually curated database for experimentally supported miRNA sponges and ceRNAs

    PubMed Central

    Wang, Peng; Zhi, Hui; Zhang, Yunpeng; Liu, Yue; Zhang, Jizhou; Gao, Yue; Guo, Maoni; Ning, Shangwei; Li, Xia

    2015-01-01

    In this study, we describe miRSponge, a manually curated database, which aims at providing an experimentally supported resource for microRNA (miRNA) sponges. Recent evidence suggests that miRNAs are themselves regulated by competing endogenous RNAs (ceRNAs) or ‘miRNA sponges’ that contain miRNA binding sites. These competitive molecules sequester miRNAs, preventing them from interacting with their natural targets, and play critical roles in various biological and pathological processes. It has become increasingly important to develop a high quality database to record and store ceRNA data to support future studies. To this end, we have established the experimentally supported miRSponge database, which contains data on 599 miRNA-sponge interactions and 463 ceRNA relationships from 11 species, following manual curation of nearly 1200 published articles. Database classes include endogenously generated molecules including coding genes, pseudogenes, long non-coding RNAs and circular RNAs, along with exogenously introduced molecules including viral RNAs and artificially engineered sponges. Approximately 70% of the interactions were identified experimentally in disease states. miRSponge provides a user-friendly interface for convenient browsing, retrieval and downloading of the dataset. A submission page is also included to allow researchers to submit newly validated miRNA sponge data. Database URL: http://www.bio-bigdata.net/miRSponge. PMID:26424084

  9. Lnc2Cancer: a manually curated database of experimentally supported lncRNAs associated with various human cancers.

    PubMed

    Ning, Shangwei; Zhang, Jizhou; Wang, Peng; Zhi, Hui; Wang, Jianjian; Liu, Yue; Gao, Yue; Guo, Maoni; Yue, Ming; Wang, Lihua; Li, Xia

    2016-01-01

    Lnc2Cancer (http://www.bio-bigdata.net/lnc2cancer) is a manually curated database of cancer-associated long non-coding RNAs (lncRNAs) with experimental support that aims to provide a high-quality and integrated resource for exploring lncRNA deregulation in various human cancers. LncRNAs represent a large category of functional RNA molecules that play a significant role in human cancers. A curated collection and summary of deregulated lncRNAs in cancer is essential to thoroughly understand the mechanisms and functions of lncRNAs. Here, we developed the Lnc2Cancer database, which contains 1057 manually curated associations between 531 lncRNAs and 86 human cancers. Each association includes lncRNA and cancer name, the lncRNA expression pattern, experimental techniques, a brief functional description, the original reference and additional annotation information. Lnc2Cancer provides a user-friendly interface to conveniently browse, retrieve and download data. Lnc2Cancer also offers a submission page for researchers to submit newly validated lncRNA-cancer associations. With the rapidly increasing interest in lncRNAs, Lnc2Cancer will significantly improve our understanding of lncRNA deregulation in cancer and has the potential to be a timely and valuable resource. PMID:26481356

  11. CPAD, Curated Protein Aggregation Database: A Repository of Manually Curated Experimental Data on Protein and Peptide Aggregation

    PubMed Central

    Thangakani, A. Mary; Nagarajan, R.; Kumar, Sandeep; Sakthivel, R.; Velmurugan, D.; Gromiha, M. Michael

    2016-01-01

    Accurate distinction between peptide sequences that can form amyloid-fibrils or amorphous β-aggregates, identification of potential aggregation prone regions in proteins, and prediction of change in aggregation rate of a protein upon mutation(s) are critical to research on protein misfolding diseases, such as Alzheimer’s and Parkinson’s, as well as biotechnological production of protein-based therapeutics. We have developed a Curated Protein Aggregation Database (CPAD), which collects results from experimental studies performed by the scientific community aimed at understanding protein/peptide aggregation. CPAD contains more than 2300 experimentally observed aggregation rates upon mutations in known amyloidogenic proteins. Each entry includes numerical values for the following parameters: change in rate of aggregation as measured by fluorescence intensity or turbidity, name and source of the protein, UniProt and Protein Data Bank codes, single point as well as multiple mutations, and literature citation. The data in CPAD have been supplemented with five different types of additional information: (i) Amyloid fibril forming hexa-peptides, (ii) Amorphous β-aggregating hexa-peptides, (iii) Amyloid fibril forming peptides of different lengths, (iv) Amyloid fibril forming hexa-peptides whose crystal structures are available in the Protein Data Bank (PDB) and (v) Experimentally validated aggregation prone regions found in amyloidogenic proteins. Furthermore, CPAD is linked to other related databases and resources, such as UniProt, the Protein Data Bank, PubMed, GAP, TANGO and WALTZ. We have set up a web interface with different search and display options so that users have the ability to get the data in multiple ways. CPAD is freely available at http://www.iitm.ac.in/bioinfo/CPAD/. The potential applications of CPAD have also been discussed. PMID:27043825

  13. PCOSDB: PolyCystic Ovary Syndrome Database for manually curated disease associated genes

    PubMed Central

    Jesintha Mary, Maniraja; Vetrivel, Umashankar; Munuswamy, Deecaraman; Melanathuru, Vijayalakshmi

    2016-01-01

    Polycystic ovary syndrome (PCOS) is a complex disorder affecting approximately 5–10 percent of all women of reproductive age. It is a multi-factorial endocrine disorder that manifests as menstrual disturbance, infertility, anovulation, hirsutism, hyperandrogenism and other symptoms. Differential expression of genes, genetic-level variations and other molecular alterations interplay in PCOS and are target sites for clinical applications. Therefore, integrating the PCOS-associated genes along with their alterations, and clarifying the underlying mechanisms, should provide valuable information for understanding the disease. We manually curated information from 234 published articles, including gene, molecular alteration, details of association, significance of association, ethnicity, age, drug and other annotated summaries. PCOSDB is an online resource that brings together comprehensive information about the disease and the implication of various genes and their mechanisms. We present curated information from peer-reviewed literature, organized at various levels, including differentially expressed genes in PCOS and genetic variations, such as polymorphisms and mutations causing PCOS across various ethnicities. We have covered both significant and non-significant associations, along with conflicting studies. PCOSDB v1.0 contains 208 gene reports, 427 molecular alterations and 46 phenotypes associated with PCOS. PMID:27212836

  14. EpiDBase: a manually curated database for small molecule modulators of epigenetic landscape

    PubMed Central

    Loharch, Saurabh; Bhutani, Isha; Jain, Kamal; Gupta, Pawan; Sahoo, Debendra K.; Parkesh, Raman

    2015-01-01

    We have developed EpiDBase (www.epidbase.org), an interactive database of small molecule ligands of epigenetic protein families, by bringing together experimental, structural and chemoinformatic data in one place. Currently, EpiDBase encompasses 5784 unique ligands (11 422 entries) of various epigenetic markers such as writers, erasers and readers. EpiDBase includes experimental IC50 values, ligand molecular weight, hydrogen bond donor and acceptor count, XlogP, number of rotatable bonds, number of aromatic rings, InChIKey, and two-dimensional and three-dimensional (3D) chemical structures. A catalog of all EpiDBase ligands based on molecular weight is also provided. A structure editor is provided for 3D visualization of ligands. EpiDBase is integrated with tools for text search, disease-specific search, advanced search, substructure search, and similarity analysis. Advanced analysis can be performed using substructure and OpenBabel-based chemical similarity fingerprints. The EpiDBase is curated to identify unique molecular scaffolds. Initially, molecules were selected by removing peptides, macrocycles and other complex structures and then processed for conformational sampling by generating 3D conformers. Subsequent filtering through Zinc Is Not Commercial (ZINC: a free database of commercially available compounds for virtual screening) and Lilly MedChem regular rules retained many distinctive drug-like molecules. These molecules were then analyzed for physicochemical properties using OpenBabel descriptors and clustered using various methods such as hierarchical clustering, binning partition and multidimensional scaling. EpiDBase provides comprehensive resources for further design, development and refinement of small molecule modulators of epigenetic markers. Database URL: www.epidbase.org PMID:25776023
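Fingerprint-based similarity search of the kind described here typically reduces to the Tanimoto coefficient over bit fingerprints. A minimal sketch with toy bit sets (in practice the fingerprints come from a toolkit such as OpenBabel, and the bit positions encode substructure features):

```python
# Tanimoto similarity over the sets of "on" bit positions of two
# chemical fingerprints: |A ∩ B| / |A ∪ B|.
def tanimoto(fp_a: set, fp_b: set) -> float:
    if not fp_a and not fp_b:
        return 0.0  # two empty fingerprints: define similarity as 0
    return len(fp_a & fp_b) / len(fp_a | fp_b)

# Toy fingerprints: bit positions are invented for illustration.
query = {1, 4, 9, 17, 23}
hit   = {1, 4, 9, 23, 31}
score = tanimoto(query, hit)  # 4 shared bits / 6 total bits
```

A similarity search then just ranks database ligands by this score against the query fingerprint and returns those above a cutoff (0.7 is a common, though arbitrary, choice).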

  16. ONRLDB—manually curated database of experimentally validated ligands for orphan nuclear receptors: insights into new drug discovery

    PubMed Central

    Nanduri, Ravikanth; Bhutani, Isha; Somavarapu, Arun Kumar; Mahajan, Sahil; Parkesh, Raman; Gupta, Pawan

    2015-01-01

    Orphan nuclear receptors are potential therapeutic targets. The Orphan Nuclear Receptor Ligand Binding Database (ONRLDB) is an interactive, comprehensive and manually curated database of small molecule ligands targeting orphan nuclear receptors. Currently, ONRLDB consists of ∼11 000 ligands, of which ∼6500 are unique. All entries include information for the ligand, such as EC50 and IC50, number of aromatic rings and rotatable bonds, XlogP, hydrogen donor and acceptor count, molecular weight (MW) and structure. ONRLDB is a cross-platform database, where either the cognate small molecule modulators of a receptor or the cognate receptors of a ligand can be searched. The database can be searched using three methods: text search, advanced search or similarity search. Substructure search, cataloguing tools, and clustering tools can be used to perform advanced analysis of a ligand based on chemical similarity fingerprints, hierarchical clustering, binning partition and multidimensional scaling. These tools, together with the Tree function provided, deliver an interactive platform and a comprehensive resource for identification of common and unique scaffolds. As demonstrated, ONRLDB is designed to allow selection of ligands based on various properties, and for designing novel ligands or improving existing ones. Database URL: http://www.onrldb.org/ PMID:26637529
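Property-based ligand selection of the kind ONRLDB's advanced search supports (MW, XlogP, H-bond donors/acceptors) can be sketched as a simple predicate over ligand records. The cutoffs below follow Lipinski's rule of five; the ligand dicts are invented examples, not ONRLDB records.

```python
# Lipinski rule-of-five drug-likeness filter over ligand property
# records. Field names are illustrative assumptions.
def drug_like(ligand: dict) -> bool:
    return (ligand["mw"] <= 500        # molecular weight
            and ligand["xlogp"] <= 5   # lipophilicity
            and ligand["hbd"] <= 5     # hydrogen-bond donors
            and ligand["hba"] <= 10)   # hydrogen-bond acceptors

ligands = [
    {"name": "L1", "mw": 342.4, "xlogp": 2.1, "hbd": 2, "hba": 5},
    {"name": "L2", "mw": 712.9, "xlogp": 6.3, "hbd": 4, "hba": 12},
]
hits = [lg["name"] for lg in ligands if drug_like(lg)]  # ['L1']
```

An advanced-search backend generalizes this by letting the user supply the cutoff ranges instead of hard-coding them.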

  17. 3CDB: a manually curated database of chromosome conformation capture data

    PubMed Central

    Yun, Xiaoxiao; Xia, Lili; Tang, Bixia; Zhang, Hui; Li, Feifei; Zhang, Zhihua

    2016-01-01

    Chromosome conformation capture (3C) is a biochemical technology to analyse contact frequencies between selected genomic sites in a cell population. Its recent genomic variants, e.g. Hi-C/chromatin interaction analysis by paired-end tag (ChIA-PET), have enabled the study of nuclear organization at an unprecedented level. However, due to the inherent low resolution and ultrahigh cost of Hi-C/ChIA-PET, 3C is still the gold standard for determining interactions between given regulatory DNA elements, such as enhancers and promoters. Therefore, we developed a database of 3C-determined functional chromatin interactions (3CDB; http://3cdb.big.ac.cn). To construct 3CDB, we searched PubMed and Google Scholar with carefully designed keyword combinations and retrieved more than 5000 articles, from which we manually extracted 3319 interactions in 17 species. Moreover, we proposed a systematic evaluation scheme for data reliability and classified the interactions into four categories. Contact frequencies are not directly comparable as a result of various modified 3C protocols employed among laboratories. Our evaluation scheme provides a plausible solution to this long-standing problem in the field. A user-friendly web interface was designed to assist quick searches in 3CDB. We believe that 3CDB will provide fundamental information for experimental design and phylogenetic analysis, as well as bridge the gap between molecular and systems biologists who must now contend with noisy high-throughput data. Database URL: http://3cdb.big.ac.cn PMID:27081154

  19. A Tool for Biomarker Discovery in the Urinary Proteome: A Manually Curated Human and Animal Urine Protein Biomarker Database

    PubMed Central

    Shao, Chen; Li, Menglin; Li, Xundou; Wei, Lilong; Zhu, Lisi; Yang, Fan; Jia, Lulu; Mu, Yi; Wang, Jiangning; Guo, Zhengguang; Zhang, Dan; Yin, Jianrui; Wang, Zhigang; Sun, Wei; Zhang, Zhengguo; Gao, Youhe

    2011-01-01

    Urine is an important source of biomarkers. A single proteomics assay can identify hundreds of differentially expressed proteins between disease and control samples; however, the ability to select biomarker candidates with the most promise for further validation study remains difficult. A bioinformatics tool that allows accurate and convenient comparison of all of the existing related studies can markedly aid the development of this area. In this study, we constructed the Urinary Protein Biomarker (UPB) database to collect existing studies of urinary protein biomarkers from published literature. To ensure the quality of data collection, all literature was manually curated. The website (http://122.70.220.102/biomarker) allows users to browse the database by disease categories and search by protein IDs in bulk. Researchers can easily determine whether a biomarker candidate has already been identified by another group for the same disease or for other diseases, which allows for the confidence and disease specificity of their biomarker candidate to be evaluated. Additionally, the pathophysiological processes of the diseases can be studied using our database with the hypothesis that diseases that share biomarkers may have the same pathophysiological processes. Because of the natural relationship between urinary proteins and the urinary system, this database may be especially suitable for studying the pathogenesis of urological diseases. Currently, the database contains 553 and 275 records compiled from 174 and 31 publications of human and animal studies, respectively. We found that biomarkers identified by different proteomic methods had a poor overlap with each other. The differences between sample preparation and separation methods, mass spectrometers, and data analysis algorithms may be influencing factors. Biomarkers identified from animal models also overlapped poorly with those from human samples, but the overlap rate was not lower than that of human proteomics
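The poor overlap between biomarker lists that the abstract reports can be quantified with a simple set comparison, for example the Jaccard index. The protein lists below are invented placeholders, not entries from the UPB database.

```python
def jaccard(a, b):
    """Fraction of biomarkers shared between two candidate sets."""
    a, b = set(a), set(b)
    return len(a & b) / len(a | b) if a | b else 0.0

# Toy candidate lists from two hypothetical proteomic workflows
gel_based = {"ALB", "TF", "UMOD", "A1AT"}
lc_msms   = {"UMOD", "A1AT", "KNG1", "CD59", "B2M"}
print(f"overlap: {jaccard(gel_based, lc_msms):.2f}")  # overlap: 0.29
```

A value near 0 indicates the two methods recover largely disjoint candidates, which is the pattern the authors observed between proteomic methods and between animal models and human samples.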

  20. Curation accuracy of model organism databases.

    PubMed

    Keseler, Ingrid M; Skrzypek, Marek; Weerasinghe, Deepika; Chen, Albert Y; Fulcher, Carol; Li, Gene-Wei; Lemmer, Kimberly C; Mladinich, Katherine M; Chow, Edmond D; Sherlock, Gavin; Karp, Peter D

    2014-01-01

    Manual extraction of information from the biomedical literature, or biocuration, is the central methodology used to construct many biological databases. For example, the UniProt protein database, the EcoCyc Escherichia coli database and the Candida Genome Database (CGD) are all based on biocuration. Biological databases are used extensively by life science researchers, as online encyclopedias, as aids in the interpretation of new experimental data and as gold standards for the development of new bioinformatics algorithms. Although manual curation has been assumed to be highly accurate, we are aware of only one previous study of biocuration accuracy. We assessed the accuracy of EcoCyc and CGD by manually selecting curated assertions within randomly chosen EcoCyc and CGD gene pages and by then validating that the data found in the referenced publications supported those assertions. A database assertion is considered to be in error if that assertion could not be found in the publication cited for that assertion. We identified 10 errors in the 633 facts that we validated across the two databases, for an overall error rate of 1.58%, and individual error rates of 1.82% for CGD and 1.40% for EcoCyc. These data suggest that manual curation of the experimental literature by Ph.D.-level scientists is highly accurate. Database URL: http://ecocyc.org/, http://www.candidagenome.org/ PMID:24923819
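The overall error rate reported above is straightforward to reproduce from the stated counts; the per-database rates cannot be recomputed here because the abstract does not break out error and assertion counts per database.

```python
def error_rate(errors, total):
    """Share of validated assertions whose cited publication did not support them."""
    return errors / total

# Figures reported in the abstract: 10 errors among 633 validated assertions
overall = error_rate(10, 633)
print(f"overall error rate: {overall:.2%}")  # overall error rate: 1.58%
```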

  1. HAMAP: a database of completely sequenced microbial proteome sets and manually curated microbial protein families in UniProtKB/Swiss-Prot

    PubMed Central

    Lima, Tania; Auchincloss, Andrea H.; Coudert, Elisabeth; Keller, Guillaume; Michoud, Karine; Rivoire, Catherine; Bulliard, Virginie; de Castro, Edouard; Lachaize, Corinne; Baratin, Delphine; Phan, Isabelle; Bougueleret, Lydie; Bairoch, Amos

    2009-01-01

    The growth in the number of completely sequenced microbial genomes (bacterial and archaeal) has generated a need for a procedure that provides UniProtKB/Swiss-Prot-quality annotation to as many protein sequences as possible. We have devised a semi-automated system, HAMAP (High-quality Automated and Manual Annotation of microbial Proteomes), that uses manually built annotation templates for protein families to propagate annotation to all members of manually defined protein families, using very strict criteria. The HAMAP system is composed of two databases, the proteome database and the family database, and of an automatic annotation pipeline. The proteome database comprises biological and sequence information for each completely sequenced microbial proteome, and it offers several tools for CDS searches, BLAST options and retrieval of specific sets of proteins. The family database currently comprises more than 1500 manually curated protein families and their annotation templates that are used to annotate proteins that belong to one of the HAMAP families. On the HAMAP website, individual sequences as well as whole genomes can be scanned against all HAMAP families. The system provides warnings for the absence of conserved amino acid residues, unusual sequence length, etc. Thanks to the implementation of HAMAP, more than 200 000 microbial proteins have been fully annotated in UniProtKB/Swiss-Prot (HAMAP website: http://www.expasy.org/sprot/hamap). PMID:18849571

  2. HAMAP: a database of completely sequenced microbial proteome sets and manually curated microbial protein families in UniProtKB/Swiss-Prot.

    PubMed

    Lima, Tania; Auchincloss, Andrea H; Coudert, Elisabeth; Keller, Guillaume; Michoud, Karine; Rivoire, Catherine; Bulliard, Virginie; de Castro, Edouard; Lachaize, Corinne; Baratin, Delphine; Phan, Isabelle; Bougueleret, Lydie; Bairoch, Amos

    2009-01-01

    The growth in the number of completely sequenced microbial genomes (bacterial and archaeal) has generated a need for a procedure that provides UniProtKB/Swiss-Prot-quality annotation to as many protein sequences as possible. We have devised a semi-automated system, HAMAP (High-quality Automated and Manual Annotation of microbial Proteomes), that uses manually built annotation templates for protein families to propagate annotation to all members of manually defined protein families, using very strict criteria. The HAMAP system is composed of two databases, the proteome database and the family database, and of an automatic annotation pipeline. The proteome database comprises biological and sequence information for each completely sequenced microbial proteome, and it offers several tools for CDS searches, BLAST options and retrieval of specific sets of proteins. The family database currently comprises more than 1500 manually curated protein families and their annotation templates that are used to annotate proteins that belong to one of the HAMAP families. On the HAMAP website, individual sequences as well as whole genomes can be scanned against all HAMAP families. The system provides warnings for the absence of conserved amino acid residues, unusual sequence length, etc. Thanks to the implementation of HAMAP, more than 200,000 microbial proteins have been fully annotated in UniProtKB/Swiss-Prot (HAMAP website: http://www.expasy.org/sprot/hamap). PMID:18849571
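A template-based propagation step of the kind HAMAP describes, with strict checks and warnings for unusual sequence length or missing conserved residues, might look like the sketch below. The field names and rule format are illustrative only, not HAMAP's actual schema.

```python
def propagate(template, protein):
    """Annotate a protein from a family template only when strict checks pass;
    return None otherwise so the protein is flagged for manual review.
    (Illustrative rule format, not HAMAP's real one.)"""
    lo, hi = template["length_range"]
    if not lo <= len(protein["seq"]) <= hi:
        return None  # unusual sequence length
    for pos, residue in template["required_residues"]:
        if protein["seq"][pos - 1] != residue:
            return None  # conserved residue absent
    return dict(protein, annotation=template["annotation"])

template = {"length_range": (5, 12),
            "required_residues": [(2, "K")],  # Lys required at position 2
            "annotation": "Putative family-X enzyme"}
print(propagate(template, {"id": "P1", "seq": "MKAALGG"}))  # annotated copy
print(propagate(template, {"id": "P2", "seq": "MA"}))       # None (too short)
```

The key design point mirrored here is that propagation is conservative: any failed check blocks automatic annotation rather than producing a low-confidence one.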

  3. VaDE: a manually curated database of reproducible associations between various traits and human genomic polymorphisms.

    PubMed

    Nagai, Yoko; Takahashi, Yasuko; Imanishi, Tadashi

    2015-01-01

    Genome-wide association studies (GWASs) have identified numerous single nucleotide polymorphisms (SNPs) associated with the development of common diseases. However, it is clear that genetic risk factors of common diseases are heterogeneous among human populations. Therefore, we developed a database of genomic polymorphisms that are reproducibly associated with disease susceptibilities, drug responses and other traits for each human population: 'VarySysDB Disease Edition' (VaDE; http://bmi-tokai.jp/VaDE/). SNP-trait association data were obtained from the National Human Genome Research Institute GWAS (NHGRI GWAS) catalog and RAvariome, and we added detailed information on sample populations by curating the original papers. In addition, we collected and curated original papers, and registered the detailed information on SNP-trait associations in VaDE. Then, we evaluated the reproducibility of associations in each population by counting the number of significantly associated studies. VaDE provides literature-based SNP-trait association data and functional genomic region annotation for SNP functional research. SNP functional annotation data included experimental data of the ENCODE project, H-InvDB transcripts and the 1000 Genomes Project. A user-friendly web interface was developed to assist quick search, easy download and fast swapping among viewers. We believe that our database will contribute to the future establishment of personalized medicine and increase our understanding of genetic factors underlying diseases. PMID:25361969
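The reproducibility evaluation described above, counting significant studies per population, can be sketched as a simple tally. The record format and the two-study threshold are illustrative assumptions; VaDE's actual criteria may differ.

```python
from collections import Counter

def reproducible(records, min_studies=2):
    """Return (SNP, trait, population) triples reported significant in at
    least `min_studies` independent studies. Illustrative tallying rule."""
    tally = Counter((r["snp"], r["trait"], r["population"])
                    for r in records if r["significant"])
    return {key for key, n in tally.items() if n >= min_studies}

records = [  # toy study records, not VaDE data
    {"snp": "rs123", "trait": "T2D", "population": "EAS", "significant": True},
    {"snp": "rs123", "trait": "T2D", "population": "EAS", "significant": True},
    {"snp": "rs123", "trait": "T2D", "population": "EUR", "significant": False},
]
print(reproducible(records))  # {('rs123', 'T2D', 'EAS')}
```

Keying the tally on the population as well as the SNP and trait captures the paper's point that a risk factor may replicate in one population but not another.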

  4. BIAdb: A curated database of benzylisoquinoline alkaloids

    PubMed Central

    2010-01-01

    Background Benzylisoquinoline is the structural backbone of many alkaloids with a wide variety of structures including papaverine, noscapine, codeine, morphine, apomorphine, berberine, protopine and tubocurarine. Many benzylisoquinoline alkaloids have been reported to show therapeutic properties and to act as novel medicines. Thus it is important to collect and compile benzylisoquinoline alkaloids in order to explore their usage in medicine. Description We extracted information about benzylisoquinoline alkaloids from various sources, including PubChem, KEGG and KNApSAcK, and by manual curation from the literature. This information was processed and compiled in order to create a comprehensive database of benzylisoquinoline alkaloids, called BIAdb. The current version of BIAdb contains information about 846 unique benzylisoquinoline alkaloids, with multiple entries in terms of source and function, leading to a total of 2504 records. One of the major features of this database is that it provides data about 627 different plant species as a source of benzylisoquinoline and 114 different types of function performed by these compounds. A large number of online tools have been integrated, which facilitate users in exploring the full potential of BIAdb. In order to provide additional information, we give external links to other resources/databases. One of the important features of this database is that it is tightly integrated with Drugpedia, which allows managing data in fixed/flexible format. Conclusions A database of benzylisoquinoline compounds has been created, which provides comprehensive information about benzylisoquinoline alkaloids. This database will be very useful for those who are working in the field of drug discovery based on natural products. This database will also serve researchers working in the field of synthetic biology, as developing medicinally important alkaloids through synthetic processes is one of the important challenges. This database is available from http

  5. Activity, assay and target data curation and quality in the ChEMBL database.

    PubMed

    Papadatos, George; Gaulton, Anna; Hersey, Anne; Overington, John P

    2015-09-01

    The emergence of a number of publicly available bioactivity databases, such as ChEMBL, PubChem BioAssay and BindingDB, has raised awareness about the topics of data curation, quality and integrity. Here we provide an overview and discussion of the current and future approaches to activity, assay and target data curation of the ChEMBL database. This curation process involves several manual and automated steps and aims to: (1) maximise data accessibility and comparability; (2) improve data integrity and flag outliers, ambiguities and potential errors; and (3) add further curated annotations and mappings thus increasing the usefulness and accuracy of the ChEMBL data for all users and modellers in particular. Issues related to activity, assay and target data curation and integrity along with their potential impact for users of the data are discussed, alongside robust selection and filter strategies in order to avoid or minimise these, depending on the desired application. PMID:26201396
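One class of error the curation process above aims to flag is the unit mix-up (e.g. nM recorded as µM), which shows up as activity values orders of magnitude from their peers. The sketch below flags such values with a median-fold rule; the threshold and the rule itself are illustrative, not ChEMBL's actual procedure.

```python
from statistics import median

def flag_outliers(potencies_nm, fold=100):
    """Flag activity values more than `fold`-fold from the median for one
    compound-target pair -- a common symptom of unit mix-ups. The 100-fold
    threshold is an illustrative assumption, not a ChEMBL rule."""
    m = median(potencies_nm)
    return [v for v in potencies_nm if v > m * fold or v < m / fold]

# Three concordant measurements plus one likely unit error
print(flag_outliers([12.0, 15.0, 9.0, 11000.0]))  # [11000.0]
```

Flagged values would then be reviewed rather than silently dropped, matching the abstract's emphasis on flagging outliers and ambiguities for users.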

  6. DIDA: A curated and annotated digenic diseases database.

    PubMed

    Gazzo, Andrea M; Daneels, Dorien; Cilia, Elisa; Bonduelle, Maryse; Abramowicz, Marc; Van Dooren, Sonia; Smits, Guillaume; Lenaerts, Tom

    2016-01-01

    DIDA (DIgenic diseases DAtabase) is a novel database that provides for the first time detailed information on genes and associated genetic variants involved in digenic diseases, the simplest form of oligogenic inheritance. The database is accessible via http://dida.ibsquare.be and currently includes 213 digenic combinations involved in 44 different digenic diseases. These combinations are composed of 364 distinct variants, which are distributed over 136 distinct genes. The web interface provides browsing and search functionalities, as well as documentation and help pages, general database statistics and references to the original publications from which the data have been collected. The possibility to submit novel digenic data to DIDA is also provided. Creating this new repository was essential as current databases do not allow one to retrieve detailed records regarding digenic combinations. Genes, variants, diseases and digenic combinations in DIDA are annotated with manually curated information and information mined from other online resources. Next to providing a unique resource for the development of new analysis methods, DIDA gives clinical and molecular geneticists a tool to find the most comprehensive information on the digenic nature of their diseases of interest. PMID:26481352

  7. A Curated Database of Rodent Uterotrophic Bioactivity

    PubMed Central

    Kleinstreuer, Nicole C.; Ceger, Patricia C.; Allen, David G.; Strickland, Judy; Chang, Xiaoqing; Hamm, Jonathan T.; Casey, Warren M.

    2015-01-01

    Background: Novel in vitro methods are being developed to identify chemicals that may interfere with estrogen receptor (ER) signaling, but the results are difficult to put into biological context because of reliance on reference chemicals established using results from other in vitro assays and because of the lack of high-quality in vivo reference data. The Organisation for Economic Co-operation and Development (OECD)-validated rodent uterotrophic bioassay is considered the “gold standard” for identifying potential ER agonists. Objectives: We performed a comprehensive literature review to identify and evaluate data from uterotrophic studies and to analyze study variability. Methods: We reviewed 670 articles with results from 2,615 uterotrophic bioassays using 235 unique chemicals. Study descriptors, such as species/strain, route of administration, dosing regimen, lowest effect level, and test outcome, were captured in a database of uterotrophic results. Studies were assessed for adherence to six criteria that were based on uterotrophic regulatory test guidelines. Studies meeting all six criteria (458 bioassays on 118 unique chemicals) were considered guideline-like (GL) and were subsequently analyzed. Results: The immature rat model was used for 76% of the GL studies. Active outcomes were more prevalent across rat models (74% active) than across mouse models (36% active). Of the 70 chemicals with at least two GL studies, 18 (26%) had discordant outcomes and were classified as both active and inactive. Many discordant results were attributable to differences in study design (e.g., injection vs. oral dosing). Conclusions: This uterotrophic database provides a valuable resource for understanding in vivo outcome variability and for evaluating the performance of in vitro assays that measure estrogenic activity. Citation: Kleinstreuer NC, Ceger PC, Allen DG, Strickland J, Chang X, Hamm JT, Casey WM. 2016. A curated database of rodent uterotrophic bioactivity. Environ
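The active/inactive/discordant classification over chemicals with at least two guideline-like studies can be sketched as below; the chemical names are placeholders and the exact call logic is an illustration of the scheme the abstract describes.

```python
from collections import defaultdict

def classify(results):
    """results: (chemical, outcome) pairs, one per guideline-like study.
    Chemicals with >= 2 studies get their unanimous outcome, or
    'discordant' when studies disagree (illustrative scheme)."""
    by_chem = defaultdict(list)
    for chem, outcome in results:
        by_chem[chem].append(outcome)
    calls = {}
    for chem, outcomes in by_chem.items():
        if len(outcomes) < 2:
            continue  # too few studies for a reproducibility call
        calls[chem] = outcomes[0] if len(set(outcomes)) == 1 else "discordant"
    return calls

studies = [("chemA", "active"), ("chemA", "active"),
           ("chemB", "active"), ("chemB", "inactive"),
           ("chemC", "active")]  # toy data, not database records
print(classify(studies))  # {'chemA': 'active', 'chemB': 'discordant'}
```

Applied to the curated data, this kind of tally is what yields the reported figure of 18 of 70 multi-study chemicals (26%) having discordant outcomes.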

  8. Manual classification strategies in the ECOD database

    PubMed Central

    Cheng, Hua; Liao, Yuxing; Schaeffer, R. Dustin; Grishin, Nick V.

    2015-01-01

    ECOD (Evolutionary Classification Of protein Domains) is a comprehensive and up-to-date protein structure classification database. The majority of new structures released from the PDB (Protein Data Bank) every week already have close homologs in the ECOD hierarchy and thus can be reliably partitioned into domains and classified by software without manual intervention. However, those proteins that lack confidently detectable homologs require careful analysis by experts. Although many bioinformatics resources rely on expert curation to some degree, specific examples of how this curation occurs and in what cases it is necessary are not always described. Here, we illustrate the manual classification strategy in ECOD by example, focusing on two major issues in protein classification: domain partitioning and the relationship between homology and similarity scores. Most examples show recently released and manually classified PDB structures. We discuss multi-domain proteins, discordance between sequence and structural similarities, difficulties with assessing homology with scores, and integral membrane proteins homologous to soluble proteins. By timely assimilation of newly available structures into its hierarchy, ECOD strives to provide a most accurate and updated view of the protein structure world as a result of combined computational and expert-driven analysis. PMID:25917548

  9. Assisting manual literature curation for protein–protein interactions using BioQRator

    PubMed Central

    Kwon, Dongseop; Kim, Sun; Shin, Soo-Yong; Chatr-aryamontri, Andrew; Wilbur, W. John

    2014-01-01

    The time-consuming nature of manual curation and the rapid growth of biomedical literature severely limit the number of articles that database curators can scrutinize and annotate. Hence, semi-automatic tools can be a valid support to increase annotation throughput. Although a handful of curation assistant tools are already available, to date, little has been done to formally evaluate their benefit to biocuration. Moreover, most curation tools are designed for specific problems. Thus, it is not easy to apply an annotation tool for multiple tasks. BioQRator is a publicly available web-based tool for annotating biomedical literature. It was designed to support general tasks, i.e. any task annotating entities and relationships. In the BioCreative IV edition, BioQRator was tailored for protein–protein interaction (PPI) annotation by migrating information from PIE the search. The results obtained from six curators showed that the precision on the top 10 documents doubled with PIE the search compared with PubMed search results. It was also observed that the annotation time for a full PPI annotation task decreased for a beginner-intermediate level annotator. This finding is encouraging because text-mining techniques were not directly involved in the full annotation task and BioQRator can be easily integrated with any text-mining resources. Database URL: http://www.bioqrator.org/ PMID:25052701

  10. Automatic vs. manual curation of a multi-source chemical dictionary: the impact on text mining

    PubMed Central

    2010-01-01

    Background Previously, we developed a combined dictionary dubbed Chemlist for the identification of small molecules and drugs in text based on a number of publicly available databases and tested it on an annotated corpus. To achieve an acceptable recall and precision we used a number of automatic and semi-automatic processing steps together with disambiguation rules. However, it remained to be investigated what impact an extensive manual curation of a multi-source chemical dictionary would have on chemical term identification in text. ChemSpider is a chemical database that has undergone extensive manual curation aimed at establishing valid chemical name-to-structure relationships. Results We acquired the component of ChemSpider containing only manually curated names and synonyms. Rule-based term filtering, semi-automatic manual curation, and disambiguation rules were applied. We tested the dictionary from ChemSpider on an annotated corpus and compared the results with those for the Chemlist dictionary. The ChemSpider dictionary of ca. 80 k names was only a third to a quarter of the size of Chemlist at around 300 k. The ChemSpider dictionary had a precision of 0.43 and a recall of 0.19 before the application of filtering and disambiguation and a precision of 0.87 and a recall of 0.19 after filtering and disambiguation. The Chemlist dictionary had a precision of 0.20 and a recall of 0.47 before the application of filtering and disambiguation and a precision of 0.67 and a recall of 0.40 after filtering and disambiguation. Conclusions We conclude the following: (1) The ChemSpider dictionary achieved the best precision but the Chemlist dictionary had a higher recall and the best F-score; (2) Rule-based filtering and disambiguation is necessary to achieve a high precision for both the automatically generated and the manually curated dictionary. ChemSpider is available as a web service at http://www.chemspider.com/ and the Chemlist dictionary is freely available as an XML file in
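The F-score conclusion quoted above follows directly from the post-filtering precision and recall figures, which can be checked with the standard F1 formula:

```python
def f1(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)

# Post-filtering figures quoted in the abstract
chemspider = f1(0.87, 0.19)  # high precision, low recall
chemlist   = f1(0.67, 0.40)  # lower precision, better recall
print(round(chemspider, 2), round(chemlist, 2))  # 0.31 0.5
```

Despite ChemSpider's much higher precision, its low recall leaves it with the lower F-score, which is why Chemlist wins on the combined measure.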

  11. A computational platform to maintain and migrate manual functional annotations for BioCyc databases

    Technology Transfer Automated Retrieval System (TEKTRAN)

    Model organism databases are an important resource for information on biological pathways and genomic data. Such databases represent the accumulation of biological data, some of which has been manually curated from literature. An essential feature of these databases is the continuing data integratio...

  12. OntoMate: a text-mining tool aiding curation at the Rat Genome Database.

    PubMed

    Liu, Weisong; Laulederkind, Stanley J F; Hayman, G Thomas; Wang, Shur-Jen; Nigam, Rajni; Smith, Jennifer R; De Pons, Jeff; Dwinell, Melinda R; Shimoyama, Mary

    2015-01-01

    The Rat Genome Database (RGD) is the premier repository of rat genomic, genetic and physiologic data. Converting data from free text in the scientific literature to a structured format is one of the main tasks of all model organism databases. RGD spends considerable effort manually curating gene, Quantitative Trait Locus (QTL) and strain information. The rapidly growing volume of biomedical literature and the active research in the biological natural language processing (bioNLP) community have given RGD the impetus to adopt text-mining tools to improve curation efficiency. Recently, RGD has initiated a project to use OntoMate, an ontology-driven, concept-based literature search engine developed at RGD, as a replacement for the PubMed (http://www.ncbi.nlm.nih.gov/pubmed) search engine in the gene curation workflow. OntoMate tags abstracts with gene names, gene mutations, organism name and most of the 16 ontologies/vocabularies used at RGD. All terms/entities tagged to an abstract are listed with the abstract in the search results. All listed terms are linked both to data entry boxes and a term browser in the curation tool. OntoMate also provides user-activated filters for species, date and other parameters relevant to the literature search. Using the system for literature search and import has streamlined the process compared to using PubMed. The system was built with a scalable and open architecture, including features specifically designed to accelerate the RGD gene curation process. With the use of bioNLP tools, RGD has added more automation to its curation workflow. Database URL: http://rgd.mcw.edu. PMID:25619558

  13. OntoMate: a text-mining tool aiding curation at the Rat Genome Database

    PubMed Central

    Liu, Weisong; Laulederkind, Stanley J. F.; Hayman, G. Thomas; Wang, Shur-Jen; Nigam, Rajni; Smith, Jennifer R.; De Pons, Jeff; Dwinell, Melinda R.; Shimoyama, Mary

    2015-01-01

    The Rat Genome Database (RGD) is the premier repository of rat genomic, genetic and physiologic data. Converting data from free text in the scientific literature to a structured format is one of the main tasks of all model organism databases. RGD spends considerable effort manually curating gene, Quantitative Trait Locus (QTL) and strain information. The rapidly growing volume of biomedical literature and the active research in the biological natural language processing (bioNLP) community have given RGD the impetus to adopt text-mining tools to improve curation efficiency. Recently, RGD has initiated a project to use OntoMate, an ontology-driven, concept-based literature search engine developed at RGD, as a replacement for the PubMed (http://www.ncbi.nlm.nih.gov/pubmed) search engine in the gene curation workflow. OntoMate tags abstracts with gene names, gene mutations, organism name and most of the 16 ontologies/vocabularies used at RGD. All terms/entities tagged to an abstract are listed with the abstract in the search results. All listed terms are linked both to data entry boxes and a term browser in the curation tool. OntoMate also provides user-activated filters for species, date and other parameters relevant to the literature search. Using the system for literature search and import has streamlined the process compared to using PubMed. The system was built with a scalable and open architecture, including features specifically designed to accelerate the RGD gene curation process. With the use of bioNLP tools, RGD has added more automation to its curation workflow. Database URL: http://rgd.mcw.edu PMID:25619558
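At its simplest, the abstract-tagging step that OntoMate performs amounts to matching lexicon terms against text. The sketch below is a deliberately naive exact-match version; the lexicon entries are invented, and a production bioNLP tagger would also handle synonyms, ambiguity and species-specific gene symbols.

```python
import re

def tag_abstract(text, lexicon):
    """Map ontology/vocabulary term IDs to exact-match hits in an abstract.
    Naive illustration of dictionary tagging, not OntoMate's algorithm."""
    return {term_id: term for term_id, term in lexicon.items()
            if re.search(rf"\b{re.escape(term)}\b", text, re.IGNORECASE)}

lexicon = {"GENE:0001": "Ace", "MP:0001402": "hypoactivity"}  # toy entries
abstract = "Ace mutant rats showed hypoactivity in open-field tests."
print(tag_abstract(abstract, lexicon))
```

The word-boundary anchors (`\b`) prevent a short gene symbol like "Ace" from matching inside unrelated words, a first line of defence against the false positives that make curator review necessary.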

  14. The MIntAct project--IntAct as a common curation platform for 11 molecular interaction databases.

    PubMed

    Orchard, Sandra; Ammari, Mais; Aranda, Bruno; Breuza, Lionel; Briganti, Leonardo; Broackes-Carter, Fiona; Campbell, Nancy H; Chavali, Gayatri; Chen, Carol; del-Toro, Noemi; Duesbury, Margaret; Dumousseau, Marine; Galeota, Eugenia; Hinz, Ursula; Iannuccelli, Marta; Jagannathan, Sruthi; Jimenez, Rafael; Khadake, Jyoti; Lagreid, Astrid; Licata, Luana; Lovering, Ruth C; Meldal, Birgit; Melidoni, Anna N; Milagros, Mila; Peluso, Daniele; Perfetto, Livia; Porras, Pablo; Raghunath, Arathi; Ricard-Blum, Sylvie; Roechert, Bernd; Stutz, Andre; Tognolli, Michael; van Roey, Kim; Cesareni, Gianni; Hermjakob, Henning

    2014-01-01

    IntAct (freely available at http://www.ebi.ac.uk/intact) is an open-source, open data molecular interaction database populated by data either curated from the literature or from direct data depositions. IntAct has developed a sophisticated web-based curation tool, capable of supporting both IMEx- and MIMIx-level curation. This tool is now utilized by multiple additional curation teams, all of whom annotate data directly into the IntAct database. Members of the IntAct team supply appropriate levels of training, perform quality control on entries and take responsibility for long-term data maintenance. Recently, the MINT and IntAct databases decided to merge their separate efforts to make optimal use of limited developer resources and maximize the curation output. All data manually curated by the MINT curators have been moved into the IntAct database at EMBL-EBI and are merged with the existing IntAct dataset. Both IntAct and MINT are active contributors to the IMEx consortium (http://www.imexconsortium.org). PMID:24234451

  15. Curating and Preserving the Big Canopy Database System: an Active Curation Approach using SEAD

    NASA Astrophysics Data System (ADS)

    Myers, J.; Cushing, J. B.; Lynn, P.; Weiner, N.; Ovchinnikova, A.; Nadkarni, N.; McIntosh, A.

    2015-12-01

    Modern research is increasingly dependent upon highly heterogeneous data and on the associated cyberinfrastructure developed to organize, analyze, and visualize that data. However, due to the complexity and custom nature of such combined data-software systems, it can be very challenging to curate and preserve them for the long term at reasonable cost and in a way that retains their scientific value. In this presentation, we describe how this challenge was met in preserving the Big Canopy Database (CanopyDB) system using an agile approach and leveraging the Sustainable Environment - Actionable Data (SEAD) DataNet project's hosted data services. The CanopyDB system was developed over more than a decade at Evergreen State College to address the needs of forest canopy researchers. It is an early yet sophisticated exemplar of the type of system that has become common in biological research and science in general, including multiple relational databases for different experiments, a custom database generation tool used to create them, an image repository, and desktop and web tools to access, analyze, and visualize this data. SEAD provides secure project spaces with a semantic content abstraction (typed content with arbitrary RDF metadata statements and relationships to other content), combined with a standards-based curation and publication pipeline resulting in packaged research objects with Digital Object Identifiers. Using SEAD, our cross-project team was able to incrementally ingest CanopyDB components (images, datasets, software source code, documentation, executables, and virtualized services) and to iteratively define and extend the metadata and relationships needed to document them. We believe that both the process, and the richness of the resultant standards-based (OAI-ORE) preservation object, hold lessons for the development of best-practice solutions for preserving scientific data in association with the tools and services needed to derive value from it.

  16. A Manual Curation Strategy to Improve Genome Annotation: Application to a Set of Haloarchael Genomes

    PubMed Central

    Pfeiffer, Friedhelm; Oesterhelt, Dieter

    2015-01-01

    Genome annotation errors are a persistent problem that impede research in the biosciences. A manual curation effort is described that attempts to produce high-quality genome annotations for a set of haloarchaeal genomes (Halobacterium salinarum and Hbt. hubeiense, Haloferax volcanii and Hfx. mediterranei, Natronomonas pharaonis and Nmn. moolapensis, Haloquadratum walsbyi strains HBSQ001 and C23, Natrialba magadii, Haloarcula marismortui and Har. hispanica, and Halohasta litchfieldiae). Genomes are checked for missing genes, start codon misassignments, and disrupted genes. Assignments of a specific function are preferably based on experimentally characterized homologs (Gold Standard Proteins). To avoid overannotation, which is a major source of database errors, we restrict annotation to only general function assignments when support for a specific substrate assignment is insufficient. This strategy results in annotations that are resistant to the plethora of errors that compromise public databases. Annotation consistency is rigorously validated for ortholog pairs from the genomes surveyed. The annotation is regularly crosschecked against the UniProt database to further improve annotations and increase the level of standardization. Enhanced genome annotations are submitted to public databases (EMBL/GenBank, UniProt), to the benefit of the scientific community. The enhanced annotations are also publicly available via HaloLex. PMID:26042526

  17. VIRsiRNAdb: a curated database of experimentally validated viral siRNA/shRNA

    PubMed Central

    Thakur, Nishant; Qureshi, Abid; Kumar, Manoj

    2012-01-01

    RNAi technology has emerged over the past decade as a potential modality to inhibit viruses. A few siRNA databases reported in the literature focus on targeting human and mammalian genes, but databases of experimentally validated viral siRNAs are lacking. We have developed VIRsiRNAdb, a manually curated database with comprehensive details of 1358 siRNAs/shRNAs targeting viral genome regions. Further, wherever available, information on the alternative efficacies of over 300 siRNAs derived from different assays has also been incorporated. Important fields in the database are siRNA sequence, virus subtype, target genome region, cell type, target object, experimental assay, efficacy, off-targets and siRNA matches against reference viral sequences. The database also provides advanced search, browsing, data submission, links to external databases and useful siRNA analysis tools, especially siTarAlign, which aligns siRNAs with reference viral genomes or user-defined sequences. VIRsiRNAdb contains extensive details of siRNAs/shRNAs targeting 42 important human viruses, including influenza virus, hepatitis B virus, HPV and SARS coronavirus. VIRsiRNAdb should prove useful for researchers in selecting the best viral siRNAs for antiviral therapeutics development and in developing better viral siRNA design tools. The database is freely available at http://crdd.osdd.net/servers/virsirnadb. PMID:22139916
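    At its core, matching an siRNA against a reference viral genome is a sliding-window comparison. A toy stand-in for that step (not siTarAlign's actual algorithm, and the sequences are invented):

```python
def best_sirna_match(sirna, genome):
    """Slide an siRNA target-site sequence along a reference genome and
    return (position, mismatches) of the best hit by Hamming distance."""
    n = len(sirna)
    best = (None, n + 1)
    for i in range(len(genome) - n + 1):
        mism = sum(a != b for a, b in zip(sirna, genome[i:i + n]))
        if mism < best[1]:
            best = (i, mism)
    return best

genome = "AUGGCUACGAUCGUAGCUAGGCAUGCUAGCUA"  # invented toy genome
print(best_sirna_match("GAUCGUAGCUAGGCAUGCU", genome))  # (8, 0): exact hit at position 8
```

    A real tool would additionally handle reverse complements and tolerate a mismatch budget when scoring off-targets.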

  18. HPIDB 2.0: a curated database for host–pathogen interactions

    PubMed Central

    Ammari, Mais G.; Gresham, Cathy R.; McCarthy, Fiona M.; Nanduri, Bindu

    2016-01-01

    Identification and analysis of host–pathogen interactions (HPI) is essential to study infectious diseases. However, HPI data are sparse in existing molecular interaction databases, especially for agricultural host–pathogen systems. Therefore, resources that annotate, predict and display the HPI that underpin infectious diseases are critical for developing novel intervention strategies. HPIDB 2.0 (http://www.agbase.msstate.edu/hpi/main.html) is a resource for HPI data, and contains 45,238 manually curated entries in the current release. Since the first description of the database in 2010, multiple enhancements to HPIDB data and interface services were made that are described here. Notably, HPIDB 2.0 now provides targeted biocuration of molecular interaction data. As a member of the International Molecular Exchange consortium, annotations provided by HPIDB 2.0 curators meet community standards to provide detailed contextual experimental information and facilitate data sharing. Moreover, HPIDB 2.0 provides access to rapidly available community annotations that capture minimum molecular interaction information to address immediate researcher needs for HPI network analysis. In addition to curation, HPIDB 2.0 integrates HPI from existing external sources and contains tools to infer additional HPI where annotated data are scarce. Compared to other interaction databases, our data collection approach ensures HPIDB 2.0 users access the most comprehensive HPI data from a wide range of pathogens and their hosts (594 pathogen and 70 host species, as of February 2016). Improvements also include enhanced search capacity, addition of Gene Ontology functional information, and implementation of network visualization. The changes made to HPIDB 2.0 content and interface ensure that users, especially agricultural researchers, are able to easily access and analyse high quality, comprehensive HPI data. All HPIDB 2.0 data are updated regularly and are publicly available for direct download.
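    Molecular-interaction resources in the IMEx consortium typically exchange records in the tab-separated PSI-MI TAB (MITAB) format, whose first two columns carry the interactor identifiers. A minimal sketch of pulling host-pathogen interactor pairs out of such a file (the sample line is invented, and this is an illustration rather than HPIDB's own parser):

```python
import csv
import io

def hpi_pairs(mitab_text):
    """Yield (interactor_a, interactor_b) from MITAB-style lines,
    where the first two tab-separated columns carry the IDs."""
    for row in csv.reader(io.StringIO(mitab_text), delimiter="\t"):
        if row and not row[0].startswith("#"):
            yield row[0], row[1]

sample = "uniprotkb:P12345\tuniprotkb:Q67890\tpsi-mi:..."  # invented example line
print(list(hpi_pairs(sample)))  # [('uniprotkb:P12345', 'uniprotkb:Q67890')]
```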

  20. SORGOdb: Superoxide Reductase Gene Ontology curated DataBase

    PubMed Central

    2011-01-01

    Background Superoxide reductases (SOR) catalyse the reduction of superoxide anions to hydrogen peroxide and are involved in the oxidative stress defences of anaerobic and facultative anaerobic organisms. Genes encoding SOR were discovered recently and suffer from annotation problems. These genes, named sor, are short and the transfer of annotations from previously characterized neelaredoxin, desulfoferrodoxin, superoxide reductase and rubredoxin oxidase has been heterogeneous. Consequently, many sor remain anonymous or mis-annotated. Description SORGOdb is an exhaustive database of SOR that proposes a new classification based on domain architecture. SORGOdb supplies a simple user-friendly web-based database for retrieving and exploring relevant information about the proposed SOR families. The database can be queried using an organism name, a locus tag or phylogenetic criteria, and also offers sequence similarity searches using BlastP. Genes encoding SOR have been re-annotated in all available genome sequences (prokaryotic and eukaryotic (complete and in draft) genomes, updated in May 2010). Conclusions SORGOdb contains 325 non-redundant and curated SOR, from 274 organisms. It proposes a new classification of SOR into seven different classes and allows biologists to explore and analyze sor in order to establish correlations between the class of SOR and organism phenotypes. SORGOdb is freely available at http://sorgo.genouest.org/index.php. PMID:21575179
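    Classification by domain architecture, as proposed above, amounts to mapping an ordered set of domains to a class label. A toy sketch of the idea; the domain names reflect the known one-domain (neelaredoxin-like) versus two-domain (desulfoferrodoxin-like) architectures, but the labels and lookup table are illustrations, not SORGOdb's actual seven classes:

```python
def classify_sor(domains):
    """Toy classifier assigning a SOR to a class from its ordered domain
    architecture. Class labels here are invented illustrations."""
    table = {
        ("desulforedoxin", "sor_catalytic"): "two-domain SOR",
        ("sor_catalytic",): "one-domain SOR",
    }
    return table.get(tuple(domains), "unclassified")

print(classify_sor(["desulforedoxin", "sor_catalytic"]))  # two-domain SOR
print(classify_sor(["sor_catalytic"]))                    # one-domain SOR
```

    Keying on architecture rather than on transferred names is what lets such a scheme rescue genes that were mis-annotated as neelaredoxin, desulfoferrodoxin or rubredoxin oxidase.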

  1. Active Design Database (ADDB) user's manual

    SciTech Connect

    Schwarz, R.L.; Nations, J.A.; Rosser, J.H.

    1991-02-01

    This manual is a guide to the Active Design Database (ADDB) on the Martin Marietta Energy Systems, Inc., IBM 3084 unclassified computer. The ADDB is an index to all CADAM models in the unclassified CADAM database and provides query and report capabilities. Section 2.0 of this manual presents an overview of the ADDB, describing the system's purpose; the functions it performs; hardware, software, and security requirements; and help and error functions. Section 3.0 describes how to access the system and how to operate the system functions using Database 2 (DB2), Time Sharing Option (TSO), and Interactive System Productivity Facility (ISPF) features employed by this system. Appendix A contains a dictionary of data elements maintained by the system. The data values are collected from the unclassified CADAM database. Appendix B provides a printout of the system help and error screens.

  2. PHYMYCO-DB: A Curated Database for Analyses of Fungal Diversity and Evolution

    PubMed Central

    Mahé, Stéphane; Duhamel, Marie; Le Calvez, Thomas; Guillot, Laetitia; Sarbu, Ludmila; Bretaudeau, Anthony; Collin, Olivier; Dufresne, Alexis; Kiers, E. Toby; Vandenkoornhuyse, Philippe

    2012-01-01

    Background In environmental sequencing studies, fungi can be identified based on nucleic acid sequences, using either highly variable sequences as species barcodes or conserved sequences containing a high-quality phylogenetic signal. For the latter, identification relies on phylogenetic analyses and the adoption of the phylogenetic species concept. Such analysis requires that the reference sequences are well identified and deposited in public-access databases. However, many entries in the public sequence databases are problematic in terms of quality and reliability and these data require screening to ensure correct phylogenetic interpretation. Methods and Principal Findings To facilitate phylogenetic inferences and phylogenetic assignment, we introduce a fungal sequence database. The database PHYMYCO-DB comprises fungal sequences from GenBank that have been filtered to satisfy stringent sequence quality criteria. For the first release, two widely used molecular taxonomic markers were chosen: the nuclear SSU rRNA and EF1-α gene sequences. Following the automatic extraction and filtration, a manual curation is performed to remove problematic sequences while preserving relevant sequences useful for phylogenetic studies. As a result of curation, ∼20% of the automatically filtered sequences have been removed from the database. To demonstrate how PHYMYCO-DB can be employed, we test a set of environmental Chytridiomycota sequences obtained from deep sea samples. Conclusion PHYMYCO-DB offers the tools necessary to: (i) extract high quality fungal sequences for each of the 5 fungal phyla, at all taxonomic levels, (ii) extract already performed alignments, to act as ‘reference alignments’, (iii) launch alignments of personal sequences along with stored data. A total of 9120 SSU rRNA and 672 EF1-α high-quality fungal sequences are now available. The PHYMYCO-DB is accessible through the URL http://phymycodb.genouest.org/. PMID:23028445
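    The automatic filtration step before manual curation can be pictured as a stringent per-sequence quality gate. A toy version (the length and ambiguity thresholds are invented examples, not PHYMYCO-DB's actual criteria):

```python
def passes_quality(seq, min_len=1000, max_ambiguous_frac=0.01):
    """Keep sequences that are long enough and nearly free of ambiguous
    bases. Thresholds are hypothetical illustrations."""
    seq = seq.upper()
    ambiguous = sum(1 for b in seq if b not in "ACGTU")
    return len(seq) >= min_len and ambiguous / len(seq) <= max_ambiguous_frac

print(passes_quality("ACGT" * 300))  # True  (1200 nt, no ambiguous bases)
print(passes_quality("ACGN" * 300))  # False (25% ambiguous bases)
```

    Sequences passing such gates would still need the manual curation step the authors describe, since automatic filters cannot catch misidentified taxa.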

  3. PMRD: a curated database for genes and mutants involved in plant male reproduction

    PubMed Central

    2012-01-01

    Background Male reproduction is an essential biological event in the plant life cycle separating the diploid sporophyte and haploid gametophyte generations, which involves expression of approximately 20,000 genes. The control of male reproduction is also of economic importance for plant breeding and hybrid seed production. With the advent of forward and reverse genetics and genomic technologies, a large number of male reproduction-related genes have been identified. Thus it is extremely challenging for individual researchers to systematically collect, and continually update, all the available information on genes and mutants related to plant male reproduction. The aim of this study is to manually curate such gene and mutant information and provide a web-accessible resource to facilitate the effective study of plant male reproduction. Description Plant Male Reproduction Database (PMRD) is a comprehensive resource for browsing and retrieving knowledge on genes and mutants related to plant male reproduction. It is based upon literature and biological databases and includes 506 male sterile genes and 484 mutants with defects of male reproduction from a variety of plant species. Based on Gene Ontology (GO) annotations and literature, information relating to a further 3697 male reproduction related genes were systematically collected and included, and using in-text curation, gene expression and phenotypic information were captured from the literature. PMRD provides a web interface which allows users to easily access the curated annotations and genomic information, including full names, symbols, locations, sequences, expression patterns, functions of genes, mutant phenotypes, male sterile categories, and corresponding publications. PMRD also provides mini tools to search and browse expression patterns of genes in microarray datasets, run BLAST searches, convert gene IDs and generate gene networks. In addition, a MediaWiki engine and a forum have been integrated within the
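    A gene-ID conversion mini tool of the kind mentioned above reduces to mapping identifiers through a lookup table while tracking misses. A sketch (the MSU-to-RAP-DB style identifier pair below is an invented example, not a verified mapping):

```python
def convert_ids(ids, mapping):
    """Toy gene-ID converter: map identifiers through a lookup table,
    returning the hits and the unmapped leftovers separately."""
    hits = {i: mapping[i] for i in ids if i in mapping}
    misses = [i for i in ids if i not in mapping]
    return hits, misses

mapping = {"LOC_Os01g01000": "Os01g0100100"}  # invented example pair
print(convert_ids(["LOC_Os01g01000", "LOC_Os99g99999"], mapping))
```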

  4. MicRhoDE: a curated database for the analysis of microbial rhodopsin diversity and evolution

    PubMed Central

    Boeuf, Dominique; Audic, Stéphane; Brillet-Guéguen, Loraine; Caron, Christophe; Jeanthon, Christian

    2015-01-01

    Microbial rhodopsins are a diverse group of photoactive transmembrane proteins found in all three domains of life and in viruses. Today, microbial rhodopsin research is a flourishing research field in which new understandings of rhodopsin diversity, function and evolution are contributing to broader microbiological and molecular knowledge. Here, we describe MicRhoDE, a comprehensive, high-quality and freely accessible database that facilitates analysis of the diversity and evolution of microbial rhodopsins. Rhodopsin sequences isolated from a vast array of marine and terrestrial environments were manually collected and curated. To each rhodopsin sequence are associated related metadata, including predicted spectral tuning of the protein, putative activity and function, taxonomy for sequences that can be linked to a 16S rRNA gene, sampling date and location, and supporting literature. The database currently covers 7857 aligned sequences from more than 450 environmental samples or organisms. Based on a robust phylogenetic analysis, we introduce an operational classification system with multiple phylogenetic levels ranging from superclusters to species-level operational taxonomic units. An integrated pipeline for online sequence alignment and phylogenetic tree construction is also provided. With a user-friendly interface and integrated online bioinformatics tools, this unique resource should be highly valuable for upcoming studies of the biogeography, diversity, distribution and evolution of microbial rhodopsins. Database URL: http://micrhode.sb-roscoff.fr. PMID:26286928
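    The "predicted spectral tuning" metadata can be illustrated with the best-known case: the single-residue switch at position 105 of proteorhodopsins, where leucine is associated with green-absorbing and glutamine with blue-absorbing variants. A sketch of that lookup (a simplification for illustration, not MicRhoDE's prediction pipeline):

```python
def predict_tuning(residue_105):
    """Predict proteorhodopsin spectral tuning from the residue at the
    position-105 switch (L ~ green-absorbing, Q ~ blue-absorbing)."""
    table = {
        "L": "green-absorbing (~525 nm)",
        "Q": "blue-absorbing (~490 nm)",
    }
    return table.get(residue_105, "unknown")

print(predict_tuning("L"))  # green-absorbing (~525 nm)
```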

  6. RiceWiki: a wiki-based database for community curation of rice genes.

    PubMed

    Zhang, Zhang; Sang, Jian; Ma, Lina; Wu, Gang; Wu, Hao; Huang, Dawei; Zou, Dong; Liu, Siqi; Li, Ang; Hao, Lili; Tian, Ming; Xu, Chao; Wang, Xumin; Wu, Jiayan; Xiao, Jingfa; Dai, Lin; Chen, Ling-Ling; Hu, Songnian; Yu, Jun

    2014-01-01

    Rice is the most important staple food for a large part of the world's human population and also a key model organism for biological studies of crops as well as other related plants. Here we present RiceWiki (http://ricewiki.big.ac.cn), a wiki-based, publicly editable and open-content platform for community curation of rice genes. Most existing related biological databases are based on expert curation; as the volume of rice knowledge and other relevant data grows exponentially, however, keeping knowledge up-to-date, accurate and comprehensive through expert curation becomes increasingly laborious and time-consuming, and demands that a large number of people get involved in rice knowledge curation. Unlike extant databases, RiceWiki harnesses collective intelligence for community curation of rice genes, quantifies each user's contribution to every curated gene and provides explicit authorship for each contributor to any given gene, with the aim of exploiting the full potential of the scientific community for rice knowledge curation. Based on community curation, RiceWiki has the potential to enable a rice encyclopedia built by and for the scientific community, one that harnesses community intelligence for collaborative knowledge curation, covers all aspects of biological knowledge and keeps evolving with new knowledge. PMID:24136999
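    Quantifying each user's contribution to a curated page can be done, in the simplest case, by crediting editors with the net characters their revisions add. A toy metric for illustration only, not RiceWiki's actual algorithm (editor names and gene text are invented):

```python
def contribution_shares(revisions):
    """Credit each editor with the net characters added by their
    revisions, then normalize to shares of the total."""
    credit, prev_len = {}, 0
    for editor, text in revisions:
        added = max(len(text) - prev_len, 0)
        credit[editor] = credit.get(editor, 0) + added
        prev_len = len(text)
    total = sum(credit.values()) or 1
    return {e: c / total for e, c in credit.items()}

revs = [
    ("alice", "OsABC1 encodes a kinase."),
    ("bob", "OsABC1 encodes a kinase expressed in anthers."),
]
print(contribution_shares(revs))
```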

  7. BioSharing: curated and crowd-sourced metadata standards, databases and data policies in the life sciences

    PubMed Central

    McQuilton, Peter; Gonzalez-Beltran, Alejandra; Rocca-Serra, Philippe; Thurston, Milo; Lister, Allyson; Maguire, Eamonn; Sansone, Susanna-Assunta

    2016-01-01

    BioSharing (http://www.biosharing.org) is a manually curated, searchable portal of three linked registries. These resources cover standards (terminologies, formats and models, and reporting guidelines), databases, and data policies in the life sciences, broadly encompassing the biological, environmental and biomedical sciences. Launched in 2011 and built by the same core team as the successful MIBBI portal, BioSharing harnesses community curation to collate and cross-reference resources across the life sciences from around the world. BioSharing makes these resources findable and accessible (the core of the FAIR principle). Every record is designed to be interlinked, providing a detailed description not only on the resource itself, but also on its relations with other life science infrastructures. Serving a variety of stakeholders, BioSharing cultivates a growing community, to which it offers diverse benefits. It is a resource for funding bodies and journal publishers to navigate the metadata landscape of the biological sciences; an educational resource for librarians and information advisors; a publicising platform for standard and database developers/curators; and a research tool for bench and computer scientists to plan their work. BioSharing is working with an increasing number of journals and other registries, for example linking standards and databases to training material and tools. Driven by an international Advisory Board, the BioSharing user-base has grown by over 40% (by unique IP address) in the last year, thanks to successful engagement with researchers, publishers, librarians, developers and other stakeholders via several routes, including a joint RDA/Force11 working group and a collaboration with the International Society for Biocuration. In this article, we describe BioSharing, with a particular focus on community-led curation. Database URL: https://www.biosharing.org PMID:27189610

  9. Instruction manual for the Wahoo computerized database

    SciTech Connect

    Lasota, D.; Watts, K.

    1995-05-01

    As part of our research on the Lisburne Group, we have developed a powerful relational computerized database to accommodate the huge amounts of data generated by our multi-disciplinary research project. The Wahoo database has data files on petrographic data, conodont analyses, locality and sample data, well logs and diagenetic (cement) studies. Chapter 5 is essentially an instruction manual that summarizes some of the unique attributes and operating procedures of the Wahoo database. The main purpose of a database is to allow users to manipulate their data and produce reports and graphs for presentation. We present a variety of data tables in appendices at the end of this report, each encapsulating a small part of the data contained in the Wahoo database. All the data are sorted and listed by map index number and stratigraphic position (depth). The Locality data table (Appendix A) lists the stratigraphic sections examined in our study. It gives names of study areas, stratigraphic units studied, locality information, and researchers. Most localities are keyed to a geologic map that shows the distribution of the Lisburne Group and location of our sections in ANWR. Petrographic reports (Appendix B) are detailed summaries of data on the composition and texture of the Lisburne Group carbonates. The relative abundance of different carbonate grains (allochems) and carbonate texture are listed using symbols that portray data in a format similar to stratigraphic columns. This enables researchers to recognize trends in the evolution of the Lisburne carbonate platform and to check their paleoenvironmental interpretations in a stratigraphic context. Some of the figures in Chapter 1 were made using the Wahoo database.

  10. Geroprotectors.org: a new, structured and curated database of current therapeutic interventions in aging and age-related disease

    PubMed Central

    Moskalev, Alexey; Chernyagina, Elizaveta; de Magalhães, João Pedro; Barardo, Diogo; Thoppil, Harikrishnan; Shaposhnikov, Mikhail; Budovsky, Arie; Fraifeld, Vadim E.; Garazha, Andrew; Tsvetkov, Vasily; Bronovitsky, Evgeny; Bogomolov, Vladislav; Scerbacov, Alexei; Kuryan, Oleg; Gurinovich, Roman; Jellen, Leslie C.; Kennedy, Brian; Mamoshina, Polina; Dobrovolskaya, Evgeniya; Aliper, Alex; Kaminsky, Dmitry; Zhavoronkov, Alex

    2015-01-01

    As the level of interest in aging research increases, there is a growing number of geroprotectors, or therapeutic interventions that aim to extend the healthy lifespan and repair or reduce aging-related damage in model organisms and, eventually, in humans. There is a clear need for a manually-curated database of geroprotectors to compile and index their effects on aging and age-related diseases and link these effects to relevant studies and multiple biochemical and drug databases. Here, we introduce the first such resource, Geroprotectors (http://geroprotectors.org). Geroprotectors is a public, rapidly explorable database that catalogs over 250 experiments involving over 200 known or candidate geroprotectors that extend lifespan in model organisms. Each compound has a comprehensive profile complete with biochemistry, mechanisms, and lifespan effects in various model organisms, along with information ranging from chemical structure, side effects, and toxicity to FDA drug status. These are presented in a visually intuitive, efficient framework fit for casual browsing or in-depth research alike. Data are linked to the source studies or databases, providing quick and convenient access to original data. The Geroprotectors database facilitates cross-study, cross-organism, and cross-discipline analysis and saves countless hours of inefficient literature and web searching. Geroprotectors is a one-stop, knowledge-sharing, time-saving resource for researchers seeking healthy aging solutions. PMID:26342919
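    Cross-organism analysis over such a catalog reduces to asking which compounds extend lifespan in more than one model organism. A sketch of that query (the record tuples are invented examples, though rapamycin's lifespan effects in mice and flies are well documented):

```python
def cross_organism_hits(records, min_organisms=2):
    """Return compounds reported to extend lifespan in at least
    `min_organisms` distinct model organisms.
    Each record is (compound, organism, lifespan_extended)."""
    seen = {}
    for compound, organism, extended in records:
        if extended:
            seen.setdefault(compound, set()).add(organism)
    return sorted(c for c, orgs in seen.items() if len(orgs) >= min_organisms)

records = [
    ("rapamycin", "mouse", True),
    ("rapamycin", "fly", True),
    ("drugX", "worm", True),   # hypothetical single-organism hit
]
print(cross_organism_hits(records))  # ['rapamycin']
```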

  11. National Solar Radiation Database 1991-2010 Update: User's Manual

    SciTech Connect

    Wilcox, S. M.

    2012-08-01

    This user's manual provides information on the updated 1991-2010 National Solar Radiation Database. Included are data format descriptions, data sources, production processes, and information about data uncertainty.

  12. National Solar Radiation Database 1991-2005 Update: User's Manual

    SciTech Connect

    Wilcox, S.

    2007-04-01

    This manual describes how to obtain and interpret the data products from the updated 1991-2005 National Solar Radiation Database (NSRDB). This is an update of the original 1961-1990 NSRDB released in 1992.

  13. A CTD–Pfizer collaboration: manual curation of 88 000 scientific articles text mined for drug–disease and drug–phenotype interactions

    PubMed Central

    Davis, Allan Peter; Wiegers, Thomas C.; Roberts, Phoebe M.; King, Benjamin L.; Lay, Jean M.; Lennon-Hopkins, Kelley; Sciaky, Daniela; Johnson, Robin; Keating, Heather; Greene, Nigel; Hernandez, Robert; McConnell, Kevin J.; Enayetallah, Ahmed E.; Mattingly, Carolyn J.

    2013-01-01

    Improving the prediction of chemical toxicity is a goal common to both environmental health research and pharmaceutical drug development. To improve safety detection assays, it is critical to have a reference set of molecules with well-defined toxicity annotations for training and validation purposes. Here, we describe a collaboration between safety researchers at Pfizer and the research team at the Comparative Toxicogenomics Database (CTD) to text mine and manually review a collection of 88 629 articles relating over 1 200 pharmaceutical drugs to their potential involvement in cardiovascular, neurological, renal and hepatic toxicity. In 1 year, CTD biocurators curated 254 173 toxicogenomic interactions (152 173 chemical–disease, 58 572 chemical–gene, 5 345 gene–disease and 38 083 phenotype interactions). All chemical–gene–disease interactions are fully integrated with public CTD, and phenotype interactions can be downloaded. We describe Pfizer’s text-mining process to collate the articles, and CTD’s curation strategy, performance metrics, enhanced data content and new module to curate phenotype information. As well, we show how data integration can connect phenotypes to diseases. This curation can be leveraged for information about toxic endpoints important to drug safety and help develop testable hypotheses for drug–disease events. The availability of these detailed, contextualized, high-quality annotations curated from seven decades’ worth of the scientific literature should help facilitate new mechanistic screening assays for pharmaceutical compound survival. This unique partnership demonstrates the importance of resource sharing and collaboration between public and private entities and underscores the complementary needs of the environmental health science and pharmaceutical communities. Database URL: http://ctdbase.org/ PMID:24288140
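    The data-integration idea mentioned above, connecting phenotypes to diseases, can be sketched as a join of chemical-phenotype pairs with chemical-disease pairs through their shared chemical. The example tuples are invented illustrations (doxorubicin's cardiotoxicity is documented, but these are not CTD records):

```python
def link_phenotypes_to_diseases(chem_pheno, chem_disease):
    """Infer (phenotype, disease, via_chemical) links through chemicals
    shared by a chemical-phenotype and a chemical-disease pair."""
    links = set()
    for chem, pheno in chem_pheno:
        for chem2, disease in chem_disease:
            if chem == chem2:
                links.add((pheno, disease, chem))
    return sorted(links)

print(link_phenotypes_to_diseases(
    [("doxorubicin", "QT prolongation")],       # invented example pair
    [("doxorubicin", "cardiotoxicity")]))
```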

  14. Hydrologic database user's manual

    SciTech Connect

    Chapman, J.B.; Gray, K.J.; Thompson, C.B.

    1993-09-01

    The Hydrologic Database is an electronic filing cabinet containing water-related data for the Nevada Test Site (NTS). The purpose of the database is to enhance research on hydrologic issues at the NTS by providing efficient access to information gathered by a variety of scientists. Data are often generated for specific projects and are reported to DOE in the context of specific project goals. The originators of the database recognized that much of this information has a general value that transcends project-specific requirements. Allowing researchers access to information generated by a wide variety of projects can prevent needless duplication of data-gathering efforts and can augment new data collection and interpretation. In addition, collecting this information in the database ensures that the results are not lost at the end of discrete projects as long as the database is actively maintained. This document is a guide to using the database.

  15. DEPOT database: Reference manual and user's guide

    SciTech Connect

    Clancey, P.; Logg, C.

    1991-03-01

    DEPOT has been developed to provide tracking for the Stanford Linear Collider (SLC) control system equipment. For each piece of equipment entered into the database, complete location, service, maintenance, modification, certification, and radiation exposure histories can be maintained. To facilitate data entry accuracy, efficiency, and consistency, barcoding technology has been used extensively. DEPOT has been an important tool in improving the reliability of the microsystems controlling SLC. This document describes the components of the DEPOT database, the elements in the database records, and the use of the supporting programs for entering data, searching the database, and producing reports from the information.

  16. Human and chicken TLR pathways: manual curation and computer-based orthology analysis

    PubMed Central

    Gillespie, Marc; Shamovsky, Veronica; D’Eustachio, Peter

    2011-01-01

    The innate immune responses mediated by Toll-like receptors (TLR) provide an evolutionarily well-conserved first line of defense against microbial pathogens. In the Reactome Knowledgebase we previously integrated annotations of human TLR molecular functions with those of over 4000 other human proteins involved in processes such as adaptive immunity, DNA replication, signaling, and intermediary metabolism, and have linked these annotations to external resources, including PubMed, UniProt, EntrezGene, Ensembl, and the Gene Ontology to generate a resource suitable for data mining, pathway analysis, and other systems biology approaches. We have now used a combination of manual expert curation and computer-based orthology analysis to generate a set of annotations for TLR molecular function in the chicken (Gallus gallus). Mammalian and avian lineages diverged approximately 300 million years ago, and the avian TLR repertoire consists of both orthologs and distinct new genes. The work described here centers on the molecular biology of TLR3, the host receptor that mediates responses to viral and other double-stranded polynucleotides, as a paradigm for our approach to integrated manual and computationally based annotation and data analysis. It tests the quality of computationally generated annotations projected from human onto other species and supports a systems biology approach to analysis of virus-activated signaling pathways and identification of clinically useful antiviral measures. PMID:21052677

  17. mycoCLAP, the database for characterized lignocellulose-active proteins of fungal origin: resource and text mining curation support

    PubMed Central

    Strasser, Kimchi; McDonnell, Erin; Nyaga, Carol; Wu, Min; Wu, Sherry; Almeida, Hayda; Meurs, Marie-Jean; Kosseim, Leila; Powlowski, Justin; Butler, Greg; Tsang, Adrian

    2015-01-01

    Enzymes active on components of lignocellulosic biomass are used for industrial applications ranging from food processing to biofuels production. These include a diverse array of glycoside hydrolases, carbohydrate esterases, polysaccharide lyases and oxidoreductases. Fungi are prolific producers of these enzymes, spurring fungal genome sequencing efforts to identify and catalogue the genes that encode them. To facilitate the functional annotation of these genes, biochemical data on over 800 fungal lignocellulose-degrading enzymes have been collected from the literature and organized into the searchable database, mycoCLAP (http://mycoclap.fungalgenomics.ca). First implemented in 2011, and updated as described here, mycoCLAP is capable of ranking search results according to closest biochemically characterized homologues: this improves the quality of the annotation, and significantly decreases the time required to annotate novel sequences. The database is freely available to the scientific community, as are the open source applications based on natural language processing developed to support the manual curation of mycoCLAP. Database URL: http://mycoclap.fungalgenomics.ca PMID:25754864

  18. The SIB Swiss Institute of Bioinformatics’ resources: focus on curated databases

    PubMed Central

    2016-01-01

    The SIB Swiss Institute of Bioinformatics (www.isb-sib.ch) provides world-class bioinformatics databases, software tools, services and training to the international life science community in academia and industry. These solutions allow life scientists to turn the exponentially growing amount of data into knowledge. Here, we provide an overview of SIB's resources and competence areas, with a strong focus on curated databases and SIB's most popular and widely used resources. In particular, SIB's Bioinformatics resource portal ExPASy features over 150 resources, including UniProtKB/Swiss-Prot, ENZYME, PROSITE, neXtProt, STRING, UniCarbKB, SugarBindDB, SwissRegulon, EPD, arrayMap, Bgee, SWISS-MODEL Repository, OMA, OrthoDB and other databases, which are briefly described in this article. PMID:26615188

  20. CeCaFDB: a curated database for the documentation, visualization and comparative analysis of central carbon metabolic flux distributions explored by 13C-fluxomics.

    PubMed

    Zhang, Zhengdong; Shen, Tie; Rui, Bin; Zhou, Wenwei; Zhou, Xiangfei; Shang, Chuanyu; Xin, Chenwei; Liu, Xiaoguang; Li, Gang; Jiang, Jiansi; Li, Chao; Li, Ruiyuan; Han, Mengshu; You, Shanping; Yu, Guojun; Yi, Yin; Wen, Han; Liu, Zhijie; Xie, Xiaoyao

    2015-01-01

    The Central Carbon Metabolic Flux Database (CeCaFDB, available at http://www.cecafdb.org) is a manually curated, multipurpose and open-access database for the documentation, visualization and comparative analysis of the quantitative flux results of central carbon metabolism among microbes and animal cells. It encompasses records for more than 500 flux distributions among 36 organisms and includes information regarding the genotype, culture medium, growth conditions and other specific information gathered from hundreds of journal articles. In addition to its comprehensive literature-derived data, the CeCaFDB supports a common text search function among the data and interactive visualization of the curated flux distributions with compartmentation information based on the Cytoscape Web API, which facilitates data interpretation. The CeCaFDB offers four modules to calculate a similarity score or to perform an alignment between the flux distributions. One of the modules was built using an integer programming algorithm for flux distribution alignment that was specifically designed for this study. Based on these modules, the CeCaFDB also supports an extensive flux distribution comparison function among the curated data. The CeCaFDB is carefully designed to address the broad demands of biochemists, metabolic engineers, systems biologists and members of the -omics community. PMID:25392417
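
    The abstract does not specify the algorithms behind the four similarity modules; as a hedged illustration of the simplest such score, the sketch below computes cosine similarity between two flux distributions restricted to the reactions they share. Reaction names and flux values are invented examples, not CeCaFDB data.

```python
import math

def flux_similarity(a: dict, b: dict) -> float:
    """Cosine similarity over the reactions shared by two flux maps."""
    shared = sorted(set(a) & set(b))
    if not shared:
        return 0.0
    va = [a[r] for r in shared]
    vb = [b[r] for r in shared]
    dot = sum(x * y for x, y in zip(va, vb))
    na = math.sqrt(sum(x * x for x in va))
    nb = math.sqrt(sum(y * y for y in vb))
    return dot / (na * nb) if na and nb else 0.0

# Invented central-carbon flux values (e.g. % of glucose uptake).
ecoli = {"PGI": 70.1, "PFK": 85.3, "PYK": 60.2}
yeast = {"PGI": 30.0, "PFK": 90.0, "G6PDH": 25.0}
print(round(flux_similarity(ecoli, yeast), 2))
```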

  1. Strategies for annotation and curation of translational databases: the eTUMOUR project.

    PubMed

    Julià-Sapé, Margarida; Lurgi, Miguel; Mier, Mariola; Estanyol, Francesc; Rafael, Xavier; Candiota, Ana Paula; Barceló, Anna; García, Alina; Martínez-Bisbal, M Carmen; Ferrer-Luna, Rubén; Moreno-Torres, Ángel; Celda, Bernardo; Arús, Carles

    2012-01-01

    The eTUMOUR (eT) multi-centre project gathered in vivo and ex vivo magnetic resonance (MR) data, as well as transcriptomic and clinical information from brain tumour patients, with the purpose of improving the diagnostic and prognostic evaluation of future patients. In order to carry this out, among other work, a database--the eTDB--was developed. In addition to complex permission rules and software and management quality control (QC), it was necessary to develop anonymization, processing and data visualization tools for the data uploaded. It was also necessary to develop sophisticated curation strategies that involved, on the one hand, dedicated fields for QC-generated meta-data, specialized queries and global permissions for senior curators and, on the other, a set of metrics to quantify the database's contents: the indispensable dataset (ID), completeness and pairedness indices. The database contains 1317 cases created as a result of the eT project and 304 from a previous project, INTERPRET. The number of cases fulfilling the ID was 656. Completeness and pairedness were heterogeneous, depending on the data type involved. PMID:23180768
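
    The abstract names the metrics but not their formulas; a plausible sketch is "completeness" as the fraction of expected fields a case fills, and the indispensable dataset (ID) as a required subset that must be fully present. The field names below are invented for illustration, not the eTDB schema.

```python
# Hypothetical required subset (the "indispensable dataset") and full field list.
INDISPENSABLE = {"mr_spectrum", "histology", "patient_age"}
ALL_FIELDS = INDISPENSABLE | {"transcriptomics", "follow_up"}

def completeness(case: dict) -> float:
    """Fraction of expected fields that are actually filled."""
    return sum(case.get(f) is not None for f in ALL_FIELDS) / len(ALL_FIELDS)

def fulfils_id(case: dict) -> bool:
    """True when every indispensable field is present."""
    return all(case.get(f) is not None for f in INDISPENSABLE)

case = {"mr_spectrum": "spectrum-data", "histology": "astrocytoma",
        "patient_age": 54, "transcriptomics": None, "follow_up": None}
print(fulfils_id(case), completeness(case))  # True 0.6
```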

  2. Laminin database: a tool to retrieve high-throughput and curated data for studies on laminins.

    PubMed

    Golbert, Daiane C F; Linhares-Lacerda, Leandra; Almeida, Luiz G; Correa-de-Santana, Eliane; de Oliveira, Alice R; Mundstein, Alex S; Savino, Wilson; de Vasconcelos, Ana T R

    2011-01-01

    The Laminin(LM)-database, hosted at http://www.lm.lncc.br, is the first database focusing on a non-collagenous extracellular matrix protein family, the LMs. Part of the knowledge available on this website is automatically retrieved, whereas a significant amount of information is curated and annotated, placing the LM-database beyond a simple repository of data. The home page gives an overview of the rationale for the database, and readers can access a tutorial to facilitate navigation of the website, which is organized into tabs subdivided into LMs, receptors, extracellular binding and other related proteins. Each tab opens into a given LM or LM-related molecule, where the reader finds a series of further tabs for 'protein', 'gene structure', 'gene expression', 'tissue distribution' and 'therapy'. Data are separated by species, comprising Homo sapiens, Mus musculus and Rattus norvegicus. Furthermore, a specific tab displays the LM nomenclatures, and another provides a direct link to PubMed, which can then be consulted specifically regarding the biological functions of each molecule, knockout animals and genetic diseases, immune response, and lymphomas/leukemias. The LM-database will hopefully be a relevant tool for retrieving information concerning LMs in health and disease, particularly regarding the hemopoietic system. PMID:21087995

  4. The Developmental Brain Disorders Database (DBDB): A Curated Neurogenetics Knowledge Base With Clinical and Research Applications

    PubMed Central

    Mirzaa, Ghayda M.; Millen, Kathleen J.; Barkovich, A. James; Dobyns, William B.; Paciorkowski, Alex R.

    2014-01-01

    The number of single genes associated with neurodevelopmental disorders has increased dramatically over the past decade. The identification of causative genes for these disorders is important to clinical outcome as it allows for accurate assessment of prognosis, genetic counseling, delineation of natural history, inclusion in clinical trials, and in some cases determines therapy. Clinicians face the challenge of correctly identifying neurodevelopmental phenotypes, recognizing syndromes, and prioritizing the best candidate genes for testing. However, there is no central repository of definitions for many phenotypes, leading to errors of diagnosis. Additionally, there is no system of levels of evidence linking genes to phenotypes, making it difficult for clinicians to know which genes are most strongly associated with a given condition. We have developed the Developmental Brain Disorders Database (DBDB: https://www.dbdb.urmc.rochester.edu/home), a publicly available, online-curated repository of genes, phenotypes, and syndromes associated with neurodevelopmental disorders. DBDB contains the first referenced ontology of developmental brain phenotypes, and uses a novel system of levels of evidence for gene-phenotype associations. It is intended to assist clinicians in arriving at the correct diagnosis, selecting the most appropriate genetic test for that phenotype, and improving the care of patients with developmental brain disorders. For researchers interested in the discovery of novel genes for developmental brain disorders, DBDB provides a well-curated source of important genes against which research sequencing results can be compared. Finally, DBDB allows novel observations about the landscape of the neurogenetics knowledge base. PMID:24700709
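
    A levels-of-evidence system naturally supports ranking candidate genes for testing. The sketch below orders candidates by evidence strength; the level names, numeric ordering, and gene/phenotype values are invented for illustration and are not DBDB's actual scale.

```python
# Hypothetical evidence scale: stronger evidence gets a higher rank.
EVIDENCE_RANK = {"multiple_families": 3, "single_family": 2, "case_report": 1}

candidates = [
    {"gene": "GENE_A", "phenotype": "microcephaly", "evidence": "case_report"},
    {"gene": "GENE_B", "phenotype": "microcephaly", "evidence": "multiple_families"},
    {"gene": "GENE_C", "phenotype": "microcephaly", "evidence": "single_family"},
]

# Strongest-evidence genes first, to prioritize genetic testing.
ranked = sorted(candidates, key=lambda c: EVIDENCE_RANK[c["evidence"]], reverse=True)
print([c["gene"] for c in ranked])  # ['GENE_B', 'GENE_C', 'GENE_A']
```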

  5. NCG 5.0: updates of a manually curated repository of cancer genes and associated properties from cancer mutational screenings

    PubMed Central

    An, Omer; Dall'Olio, Giovanni M.; Mourikis, Thanos P.; Ciccarelli, Francesca D.

    2016-01-01

    The Network of Cancer Genes (NCG, http://ncg.kcl.ac.uk/) is a manually curated repository of cancer genes derived from the scientific literature. Due to the increasing amount of cancer genomic data, we have introduced a more robust procedure to extract cancer genes from published cancer mutational screenings and two curators independently reviewed each publication. NCG release 5.0 (August 2015) collects 1571 cancer genes from 175 published studies that describe 188 mutational screenings of 13 315 cancer samples from 49 cancer types and 24 primary sites. In addition to collecting cancer genes, NCG also provides information on the experimental validation that supports the role of these genes in cancer and annotates their properties (duplicability, evolutionary origin, expression profile, function and interactions with proteins and miRNAs). PMID:26516186

  6. ProPepper: a curated database for identification and analysis of peptide and immune-responsive epitope composition of cereal grain protein families

    PubMed Central

    Juhász, Angéla; Haraszi, Réka; Maulis, Csaba

    2015-01-01

    ProPepper is a database that contains prolamin proteins identified from true grasses (Poaceae), their peptides obtained with single- and multi-enzyme in silico digestions as well as linear T- and B-cell-specific epitopes that are responsible for wheat-related food disorders. The integrated database and analysis platform contains datasets that are collected from multiple public databases (UniprotKB, IEDB, NCBI GenBank), manually curated and annotated, and interpreted in three main data tables: Protein-, Peptide- and Epitope list views that are cross-connected by unique identifications. Altogether 21 genera and 80 different species are represented. Currently, the database contains 2146 unique and complete protein sequences related to 2618 GenBank entries and 35 657 unique peptide sequences that are a result of 575 110 unique digestion events obtained by in silico digestion methods involving six proteolytic enzymes and their combinations. The interface allows advanced global and parametric search functions along with a download option, with direct connections to the relevant public databases. Database URL: https://propepper.net PMID:26450949
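
    One of the six proteolytic enzymes used for in silico digestion is presumably trypsin, whose standard cleavage rule (cut after K or R, except when the next residue is P) can be sketched as below. This is a generic illustration of the rule, not ProPepper's actual digestion code, and the example sequence is invented.

```python
def tryptic_digest(seq: str) -> list:
    """In silico trypsin digestion: cleave after K or R unless followed by P."""
    peptides, start = [], 0
    for i, aa in enumerate(seq):
        if aa in "KR" and (i + 1 == len(seq) or seq[i + 1] != "P"):
            peptides.append(seq[start:i + 1])
            start = i + 1
    if start < len(seq):
        peptides.append(seq[start:])  # C-terminal peptide with no trailing K/R
    return peptides

print(tryptic_digest("MQVDPKGQQFR"))  # ['MQVDPK', 'GQQFR']
```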

  7. Biomedical text summarization to support genetic database curation: using Semantic MEDLINE to create a secondary database of genetic information

    PubMed Central

    Fiszman, Marcelo; Hurdle, John F; Rindflesch, Thomas C

    2010-01-01

    Objective: This paper examines the development and evaluation of an automatic summarization system in the domain of molecular genetics. The system is a potential component of an advanced biomedical information management application called Semantic MEDLINE and could assist librarians in developing secondary databases of genetic information extracted from the primary literature. Methods: An existing summarization system was modified for identifying biomedical text relevant to the genetic etiology of disease. The summarization system was evaluated on the task of identifying data describing genes associated with bladder cancer in MEDLINE citations. A gold standard was produced using records from Genetics Home Reference and Online Mendelian Inheritance in Man. Genes in text found by the system were compared to the gold standard. Recall, precision, and F-measure were calculated. Results: The system achieved recall of 46%, and precision of 88% (F-measure = 0.61) by taking Gene References into Function (GeneRIFs) into account. Conclusion: The new summarization schema for genetic etiology has potential as a component in Semantic MEDLINE to support the work of data curators. PMID:20936065
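
    The reported scores combine via the balanced F-measure, F = 2PR/(P+R). A quick check against the published figures: the two-decimal inputs give approximately 0.604, consistent with the stated 0.61 once the unrounded counts are used.

```python
def f_measure(precision: float, recall: float, beta: float = 1.0) -> float:
    """Weighted harmonic mean of precision and recall (beta=1 is balanced F1)."""
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)

# Reported figures: precision 88%, recall 46%.
print(round(f_measure(0.88, 0.46), 3))  # 0.604
```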

  8. EpimiR: a database of curated mutual regulation between miRNAs and epigenetic modifications.

    PubMed

    Dai, Enyu; Yu, Xuexin; Zhang, Yan; Meng, Fanlin; Wang, Shuyuan; Liu, Xinyi; Liu, Dianming; Wang, Jing; Li, Xia; Jiang, Wei

    2014-01-01

    As two kinds of important gene expression regulators, both epigenetic modification and microRNA (miRNA) can play significant roles in a wide range of human diseases. Recently, many studies have demonstrated that epigenetics and miRNA can affect each other in various ways. In this study, we established the EpimiR database, which collects 1974 regulations between 19 kinds of epigenetic modifications (such as DNA methylation, histone acetylation, H3K4me3, H3S10p) and 617 miRNAs across seven species (including Homo sapiens, Mus musculus, Rattus norvegicus, Gallus gallus, Epstein-Barr virus, Canis familiaris and Arabidopsis thaliana) from >300 references in the literature. These regulations can be divided into two parts: miR2Epi (103 entries describing how miRNA regulates epigenetic modification) and Epi2miR (1871 entries describing how epigenetic modification affects miRNA). Each entry of EpimiR not only contains basic descriptions of the validated experiment (method, species, reference and so on) but also clearly illuminates the regulatory pathway between epigenetics and miRNA. As a supplement to the curated information, the EpimiR extends to gather predicted epigenetic features (such as predicted transcription start site, upstream CpG island) associated with miRNA for users to guide their future biological experiments. Finally, EpimiR offers download and submission pages. Thus, EpimiR provides a fairly comprehensive repository about the mutual regulation between epigenetic modifications and miRNAs, which will promote the research on the regulatory mechanism of epigenetics and miRNA. Database URL: http://bioinfo.hrbmu.edu.cn/EpimiR/. PMID:24682734
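
    The miR2Epi/Epi2miR split amounts to partitioning curated regulations by direction. The sketch below shows that partition over invented example entries; the field names are illustrative assumptions, not EpimiR's schema.

```python
# Each curated regulation records which side regulates which.
entries = [
    {"mirna": "hsa-miR-29b", "modification": "DNA methylation", "direction": "miR2Epi"},
    {"mirna": "hsa-miR-127", "modification": "histone acetylation", "direction": "Epi2miR"},
    {"mirna": "hsa-miR-1", "modification": "DNA methylation", "direction": "Epi2miR"},
]

# Partition into the two parts the abstract describes.
mir2epi = [e for e in entries if e["direction"] == "miR2Epi"]
epi2mir = [e for e in entries if e["direction"] == "Epi2miR"]
print(len(mir2epi), len(epi2mir))  # 1 2
```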

  9. NCBI Reference Sequence (RefSeq): a curated non-redundant sequence database of genomes, transcripts and proteins.

    PubMed

    Pruitt, Kim D; Tatusova, Tatiana; Maglott, Donna R

    2005-01-01

    The National Center for Biotechnology Information (NCBI) Reference Sequence (RefSeq) database (http://www.ncbi.nlm.nih.gov/RefSeq/) provides a non-redundant collection of sequences representing genomic data, transcripts and proteins. Although the goal is to provide a comprehensive dataset representing the complete sequence information for any given species, the database pragmatically includes sequence data that are currently publicly available in the archival databases. The database incorporates data from over 2400 organisms and includes over one million proteins representing significant taxonomic diversity spanning prokaryotes, eukaryotes and viruses. Nucleotide and protein sequences are explicitly linked, and the sequences are linked to other resources including the NCBI Map Viewer and Gene. Sequences are annotated to include coding regions, conserved domains, variation, references, names, database cross-references, and other features using a combined approach of collaboration and other input from the scientific community, automated annotation, propagation from GenBank and curation by NCBI staff. PMID:15608248

  10. RNRdb, a curated database of the universal enzyme family ribonucleotide reductase, reveals a high level of misannotation in sequences deposited to Genbank

    PubMed Central

    2009-01-01

    Background Ribonucleotide reductases (RNRs) catalyse the only known de novo pathway for deoxyribonucleotide synthesis, and are therefore essential to DNA-based life. While ribonucleotide reduction has a single evolutionary origin, significant differences between RNRs nevertheless exist, notably in cofactor requirements, subunit composition and allosteric regulation. These differences result in distinct operational constraints (anaerobicity, iron/oxygen dependence and cobalamin dependence), and form the basis for the classification of RNRs into three classes. Description In RNRdb (Ribonucleotide Reductase database), we have collated and curated all known RNR protein sequences with the aim of providing a resource for exploration of RNR diversity and distribution. By comparing expert manual annotations with annotations stored in Genbank, we find that significant inaccuracies exist in larger databases. To our surprise, only 23% of protein sequences included in RNRdb are correctly annotated across the key attributes of class, role and function, with 17% being incorrectly annotated across all three categories. This illustrates the utility of specialist databases for applications where a high degree of annotation accuracy may be important. The database houses information on annotation, distribution and diversity of RNRs and links to solved RNR structures, and can be searched through a BLAST interface. RNRdb is accessible through a public web interface at http://rnrdb.molbio.su.se. Conclusion RNRdb is a specialist database that provides a reliable annotation and classification resource for RNR proteins, as well as a tool to explore distribution patterns of RNR classes. The recent expansion in available genome sequence data has provided us with a picture of RNR distribution that is more complex than believed only a few years ago; our database indicates that RNRs of all three classes are found across all three cellular domains. Moreover, we find a number of organisms that

  11. PASmiR: a literature-curated database for miRNA molecular regulation in plant response to abiotic stress

    PubMed Central

    2013-01-01

    Background Over 200 published studies of more than 30 plant species have reported a role for miRNAs in regulating responses to abiotic stresses. However, data from these individual reports has not been collected into a single database. The lack of a curated database of stress-related miRNAs limits research in this field, and thus a cohesive database system should necessarily be constructed for data deposit and further application. Description PASmiR, a literature-curated and web-accessible database, was developed to provide detailed, searchable descriptions of miRNA molecular regulation in different plant abiotic stresses. PASmiR currently includes data from ~200 published studies, representing 1038 regulatory relationships between 682 miRNAs and 35 abiotic stresses in 33 plant species. PASmiR’s interface allows users to retrieve miRNA-stress regulatory entries by keyword search using plant species, abiotic stress, and miRNA identifier. Each entry upon keyword query contains detailed regulation information for a specific miRNA, including species name, miRNA identifier, stress name, miRNA expression pattern, detection method for miRNA expression, a literature reference, and target gene(s) of the miRNA extracted from the corresponding reference or miRBase. Users can also contribute novel regulatory entries by using a web-based submission page. The PASmiR database is freely accessible at two URLs: http://hi.ustc.edu.cn:8080/PASmiR and http://pcsb.ahau.edu.cn:8080/PASmiR. Conclusion The PASmiR database provides a solid platform for collection, standardization, and searching of miRNA-abiotic stress regulation data in plants. As such, this database will be a comprehensive repository for miRNA regulatory mechanisms involved in plant response to abiotic stresses for the plant stress physiology community. PMID:23448274
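
    The keyword retrieval the abstract describes (by species, stress, and miRNA identifier, in any combination) can be sketched as a simple filter over curated entries. The records and field names below are invented examples, not PASmiR data.

```python
records = [
    {"species": "Oryza sativa", "mirna": "miR393", "stress": "drought"},
    {"species": "Oryza sativa", "mirna": "miR169", "stress": "salinity"},
    {"species": "Zea mays", "mirna": "miR393", "stress": "drought"},
]

def search(entries, **criteria):
    """Return entries matching every supplied field exactly."""
    return [r for r in entries
            if all(r.get(k) == v for k, v in criteria.items())]

hits = search(records, mirna="miR393")
print(sorted(h["species"] for h in hits))  # ['Oryza sativa', 'Zea mays']
```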

  12. Facts from text: can text mining help to scale-up high-quality manual curation of gene products with ontologies?

    PubMed

    Winnenburg, Rainer; Wächter, Thomas; Plake, Conrad; Doms, Andreas; Schroeder, Michael

    2008-11-01

    The biomedical literature can be seen as a large integrated, but unstructured data repository. Extracting facts from literature and making them accessible is approached from two directions: manual curation efforts develop ontologies and vocabularies to annotate gene products based on statements in papers. Text mining aims to automatically identify entities and their relationships in text using information retrieval and natural language processing techniques. Manual curation is highly accurate but time-consuming, and does not scale with the ever-increasing growth of literature. Text mining as a high-throughput computational technique scales well, but is error-prone due to the complexity of natural language. How can both be married to combine scalability and accuracy? Here, we review the state-of-the-art text mining approaches that are relevant to annotation and discuss available online services analysing biomedical literature by means of text mining techniques, which could also be utilised by annotation projects. We then examine how far text mining has already been utilised in existing annotation projects and conclude how these techniques could be tightly integrated into the manual annotation process through novel authoring systems to scale up high-quality manual curation. PMID:19060303
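
    One common shape for the hybrid workflow this review discusses: a text miner proposes candidate (gene, ontology term) annotations with confidence scores, and a curation interface queues the most confident suggestions first while leaving acceptance to a human. The genes, GO terms, scores, and threshold below are invented for illustration.

```python
# (gene, GO term, text-mining confidence) -- invented example suggestions.
mined = [
    ("TP53", "GO:0006915", 0.95),   # apoptotic process
    ("BRCA1", "GO:0006281", 0.80),  # DNA repair
    ("EGFR", "GO:0007165", 0.40),   # signal transduction
]

# Curators review the most confident suggestions first; nothing is
# auto-accepted, so accuracy remains in human hands.
queue = sorted(mined, key=lambda m: m[2], reverse=True)
print([gene for gene, term, score in queue])  # ['TP53', 'BRCA1', 'EGFR']
```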

  13. Improving the Discoverability and Availability of Sample Data and Imagery in NASA's Astromaterials Curation Digital Repository Using a New Common Architecture for Sample Databases

    NASA Technical Reports Server (NTRS)

    Todd, N. S.; Evans, C.

    2015-01-01

    The Astromaterials Acquisition and Curation Office at NASA's Johnson Space Center (JSC) is the designated facility for curating all of NASA's extraterrestrial samples. The suite of collections includes the lunar samples from the Apollo missions, cosmic dust particles falling into the Earth's atmosphere, meteorites collected in Antarctica, comet and interstellar dust particles from the Stardust mission, asteroid particles from the Japanese Hayabusa mission, and solar wind atoms collected during the Genesis mission. To support planetary science research on these samples, NASA's Astromaterials Curation Office hosts the Astromaterials Curation Digital Repository, which provides descriptions of the missions and collections, and critical information about each individual sample. Our office is implementing several informatics initiatives with the goal of better serving the planetary research community. One of these initiatives aims to increase the availability and discoverability of sample data and images through the use of a newly designed common architecture for Astromaterials Curation databases.

  14. The Rat Genome Database curation tool suite: a set of optimized software tools enabling efficient acquisition, organization, and presentation of biological data

    PubMed Central

    Laulederkind, Stanley J. F.; Shimoyama, Mary; Hayman, G. Thomas; Lowry, Timothy F.; Nigam, Rajni; Petri, Victoria; Smith, Jennifer R.; Wang, Shur-Jen; de Pons, Jeff; Kowalski, George; Liu, Weisong; Rood, Wes; Munzenmaier, Diane H.; Dwinell, Melinda R.; Twigger, Simon N.; Jacob, Howard J.

    2011-01-01

    The Rat Genome Database (RGD) is the premier repository of rat genomic and genetic data and currently houses over 40 000 rat gene records as well as human and mouse orthologs, 1771 rat and 1911 human quantitative trait loci (QTLs) and 2209 rat strains. Biological information curated for these data objects includes disease associations, phenotypes, pathways, molecular functions, biological processes and cellular components. A suite of tools has been developed to aid curators in acquiring and validating data objects, assigning nomenclature, attaching biological information to objects and making connections among data types. The software used to assign nomenclature, to create and edit objects and to make annotations to the data objects has been specifically designed to make the curation process as fast and efficient as possible. The user interfaces have been adapted to the work routines of the curators, creating a suite of tools that is intuitive and powerful. Database URL: http://rgd.mcw.edu PMID:21321022

  15. UNSODA UNSATURATED SOIL HYDRAULIC DATABASE USER'S MANUAL VERSION 1.0

    EPA Science Inventory

    This report contains general documentation and serves as a user manual of the UNSODA program. UNSODA is a database of unsaturated soil hydraulic properties (water retention, hydraulic conductivity, and soil water diffusivity), basic soil properties (particle-size distribution, b...

  16. MALIDUP: a database of manually constructed structure alignments for duplicated domain pairs.

    PubMed

    Cheng, Hua; Kim, Bong-Hyun; Grishin, Nick V

    2008-03-01

    We describe MALIDUP (manual alignments of duplicated domains), a database of 241 pairwise structure alignments for homologous domains originated by internal duplication within the same polypeptide chain. Since duplicated domains within a protein frequently diverge in function and thus in sequence, this would be the first database of structurally similar homologs that is not strongly biased by sequence or functional similarity. Our manual alignments in most cases agree with the automatic structural alignments generated by several commonly used programs. This carefully constructed database could be used in studies on protein evolution and as a reference for testing structure alignment programs. The database is available at http://prodata.swmed.edu/malidup. PMID:17932926
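
    Using MALIDUP as a reference for testing structure alignment programs typically means scoring how many manually aligned residue pairs an automatic alignment reproduces. The sketch below computes that agreement fraction; the pair data are invented, not MALIDUP alignments.

```python
def alignment_agreement(reference: list, predicted: list) -> float:
    """Fraction of reference aligned residue pairs reproduced by a predicted
    alignment. Pairs are (position_in_domain1, position_in_domain2)."""
    reference_pairs = set(reference)
    shared = reference_pairs & set(predicted)
    return len(shared) / len(reference_pairs)

manual = [(1, 4), (2, 5), (3, 6), (4, 7)]      # curated reference alignment
automatic = [(1, 4), (2, 5), (3, 7), (4, 8)]   # program output to evaluate
print(alignment_agreement(manual, automatic))  # 0.5
```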

  17. Selective databases distributed on the basis of Frascati manual

    PubMed Central

    Kujundzic, Enes; Masic, Izet

    2013-01-01

    Introduction The answer to the question of what a database is and what its relevance is to scientific research is not a simple one. It is, at its core, a kind of information resource, often incomparably richer than, for example, a single book or journal. Discussion and conclusion Databases emerged as a form of storing and retrieving knowledge in the information age in which we now participate as witnesses. Thanks to the technical possibilities of information networks, a database can be searched for a wealth of more or less relevant information, including scientific content. Databases are divided into bibliographic databases, citation databases and full-text databases. The paper briefly presents the most important online databases together with their web site links. Thanks to these online databases, scientific knowledge spreads far more easily and usefully. PMID:23572867

  18. Data Curation for the Exploitation of Large Earth Observation Products Databases - The MEA system

    NASA Astrophysics Data System (ADS)

    Mantovani, Simone; Natali, Stefano; Barboni, Damiano; Cavicchi, Mario; Della Vecchia, Andrea

    2014-05-01

    National Space Agencies, under the umbrella of the European Space Agency, are working intensively to handle Big Data and to provide solutions for managing and exploiting the related knowledge (metadata, software tools and services). The continuously increasing amount of long-term and historic data held in EO facilities in the form of online datasets and archives, the incoming satellite observation platforms that will generate an impressive amount of new data, and the new EU approach to data distribution policy make it necessary to address technologies for the long-term management of these data sets, including their consolidation, preservation, distribution, continuation and curation across multiple missions. The management of long EO data time series from continuing or historic missions - with more than 20 years of data available already today - requires technical solutions and technologies which differ considerably from those exploited by existing systems. Several tools, both open source and commercial, already provide technologies to handle data and metadata preparation, access and visualization via OGC standard interfaces. This study describes the Multi-sensor Evolution Analysis (MEA) system and the Data Curation concept as approached and implemented within the ASIM and EarthServer projects, funded by the European Space Agency and the European Commission, respectively.

  19. Practical guidelines addressing ethical issues pertaining to the curation of human locus-specific variation databases (LSDBs)

    PubMed Central

    Povey, Sue; Al Aqeel, Aida I; Cambon-Thomsen, Anne; Dalgleish, Raymond; den Dunnen, Johan T; Firth, Helen V; Greenblatt, Marc S; Barash, Carol Isaacson; Parker, Michael; Patrinos, George P; Savige, Judith; Sobrido, Maria-Jesus; Winship, Ingrid; Cotton, Richard GH

    2010-01-01

    More than 1,000 Web-based locus-specific variation databases (LSDBs) are listed on the Website of the Human Genetic Variation Society (HGVS). These individual efforts, which often relate phenotype to genotype, are a valuable source of information for clinicians, patients, and their families, as well as for basic research. The initiators of the Human Variome Project recently recognized that having access to some of the immense resources of unpublished information already present in diagnostic laboratories would provide critical data to help manage genetic disorders. However, there are significant ethical issues involved in sharing these data worldwide. An international working group presents second-generation guidelines addressing ethical issues relating to the curation of human LSDBs that provide information via a Web-based interface. It is intended that these should help current and future curators and may also inform the future decisions of ethics committees and legislators. These guidelines have been reviewed by the Ethics Committee of the Human Genome Organization (HUGO). Hum Mutat 31:–6, 2010. © 2010 Wiley-Liss, Inc. PMID:20683926

  20. R-Syst::diatom: an open-access and curated barcode database for diatoms and freshwater monitoring

    PubMed Central

    Rimet, Frédéric; Chaumeil, Philippe; Keck, François; Kermarrec, Lenaïg; Vasselon, Valentin; Kahlert, Maria; Franc, Alain; Bouchez, Agnès

    2016-01-01

    Diatoms are micro-algal indicators of freshwater pollution. Current standardized methodologies are based on microscopic determinations, which are time-consuming and prone to identification uncertainties. The use of DNA-barcoding has been proposed as a way to avoid these flaws. Combining barcoding with next-generation sequencing enables collection of a large quantity of barcodes from natural samples. These barcodes are identified as particular diatom taxa by comparing the sequences to a reference barcoding library using algorithms. Proof of concept was recently demonstrated for synthetic and natural communities and underlined the importance of the quality of this reference library. We present an open-access and curated reference barcoding database for diatoms, called R-Syst::diatom, developed in the framework of R-Syst, the network of systematics supported by INRA (French National Institute for Agricultural Research); see http://www.rsyst.inra.fr/en. R-Syst::diatom links DNA-barcodes to their taxonomic identifications and is dedicated to identifying barcodes from natural samples. The data come from two sources: a culture collection of freshwater algae maintained at INRA, in which new strains are regularly deposited and barcoded, and the NCBI (National Center for Biotechnology Information) nucleotide database. Two kinds of barcodes were chosen to support the database because of their efficiency: 18S (18S ribosomal RNA) and rbcL (ribulose-1,5-bisphosphate carboxylase/oxygenase). Data are curated using innovative (Declic) and classical bioinformatic tools (BLAST, classical phylogenies) and up-to-date taxonomy (catalogues and peer-reviewed papers). R-Syst::diatom is updated every 6 months. The database is available through the R-Syst microalgae website (http://www.rsyst.inra.fr/) and a platform dedicated to next-generation sequencing data analysis, virtual_BiodiversityL@b (https://galaxy-pgtp.pierroton.inra.fr/).
We present here the content of the library regarding the

  1. R-Syst::diatom: an open-access and curated barcode database for diatoms and freshwater monitoring.

    PubMed

    Rimet, Frédéric; Chaumeil, Philippe; Keck, François; Kermarrec, Lenaïg; Vasselon, Valentin; Kahlert, Maria; Franc, Alain; Bouchez, Agnès

    2016-01-01

    Diatoms are micro-algal indicators of freshwater pollution. Current standardized methodologies are based on microscopic determinations, which are time-consuming and prone to identification uncertainties. The use of DNA-barcoding has been proposed as a way to avoid these flaws. Combining barcoding with next-generation sequencing enables collection of a large quantity of barcodes from natural samples. These barcodes are identified as particular diatom taxa by comparing the sequences to a reference barcoding library using algorithms. Proof of concept was recently demonstrated for synthetic and natural communities and underlined the importance of the quality of this reference library. We present an open-access and curated reference barcoding database for diatoms, called R-Syst::diatom, developed in the framework of R-Syst, the network of systematics supported by INRA (French National Institute for Agricultural Research); see http://www.rsyst.inra.fr/en. R-Syst::diatom links DNA-barcodes to their taxonomic identifications and is dedicated to identifying barcodes from natural samples. The data come from two sources: a culture collection of freshwater algae maintained at INRA, in which new strains are regularly deposited and barcoded, and the NCBI (National Center for Biotechnology Information) nucleotide database. Two kinds of barcodes were chosen to support the database because of their efficiency: 18S (18S ribosomal RNA) and rbcL (ribulose-1,5-bisphosphate carboxylase/oxygenase). Data are curated using innovative (Declic) and classical bioinformatic tools (BLAST, classical phylogenies) and up-to-date taxonomy (catalogues and peer-reviewed papers). R-Syst::diatom is updated every 6 months. The database is available through the R-Syst microalgae website (http://www.rsyst.inra.fr/) and a platform dedicated to next-generation sequencing data analysis, virtual_BiodiversityL@b (https://galaxy-pgtp.pierroton.inra.fr/).
We present here the content of the library regarding the
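
    The identification step the abstract describes, comparing environmental barcodes to a curated reference library, can be sketched as a nearest-reference search by pairwise identity. This is a deliberate simplification: the real pipeline relies on BLAST and curated phylogenies, and the taxa, sequences, and threshold below are invented for illustration.

```python
# Toy nearest-reference barcode classifier (invented data): each query is
# assigned to the reference taxon with the highest pairwise identity,
# provided it clears a minimum-identity threshold.

def identity(a: str, b: str) -> float:
    """Fraction of matching positions over the shorter sequence."""
    n = min(len(a), len(b))
    return sum(x == y for x, y in zip(a, b)) / n if n else 0.0

def classify(query: str, references: dict, threshold: float = 0.9):
    """Return (taxon, identity) for the best reference hit, or None."""
    taxon, seq = max(references.items(), key=lambda kv: identity(query, kv[1]))
    score = identity(query, seq)
    return (taxon, score) if score >= threshold else None

# Hypothetical reference library keyed by taxon name.
REFS = {
    "Nitzschia palea": "ATGGCTTACCGT",
    "Fragilaria crotonensis": "ATGCCTAACGGT",
}

hit = classify("ATGGCTTACCGA", REFS)  # one mismatch against the first reference
```

    A production classifier would also report ambiguous hits (several references tied near the threshold) rather than returning a single best taxon.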

  2. NEMiD: A Web-Based Curated Microbial Diversity Database with Geo-Based Plotting

    PubMed Central

    Bhattacharjee, Kaushik; Joshi, Santa Ram

    2014-01-01

    The majority of the Earth's microbes remain unknown, and their potential utility cannot be exploited until they are discovered and characterized. They provide wide scope for the development of new strains as well as for biotechnological uses. The documentation and bioprospection of microorganisms carry enormous significance considering their relevance to human welfare. This calls for an urgent need to develop a database with emphasis on the microbial diversity of the largest untapped reservoirs in the biosphere. The data annotated in the North-East India Microbial database (NEMiD) were obtained by the isolation and characterization of microbes from different parts of the Eastern Himalayan region. The database was constructed as a relational database management system (RDBMS), with data stored in MySQL on a Linux server and implemented in an Apache/PHP environment. This database provides a base for understanding the soil microbial diversity pattern in this megabiodiversity hotspot and indicates the distribution patterns of various organisms along with their identification. The NEMiD database is freely available at www.mblabnehu.info/nemid/. PMID:24714636

  3. NEMiD: a web-based curated microbial diversity database with geo-based plotting.

    PubMed

    Bhattacharjee, Kaushik; Joshi, Santa Ram

    2014-01-01

    The majority of the Earth's microbes remain unknown, and their potential utility cannot be exploited until they are discovered and characterized. They provide wide scope for the development of new strains as well as for biotechnological uses. The documentation and bioprospection of microorganisms carry enormous significance considering their relevance to human welfare. This calls for an urgent need to develop a database with emphasis on the microbial diversity of the largest untapped reservoirs in the biosphere. The data annotated in the North-East India Microbial database (NEMiD) were obtained by the isolation and characterization of microbes from different parts of the Eastern Himalayan region. The database was constructed as a relational database management system (RDBMS), with data stored in MySQL on a Linux server and implemented in an Apache/PHP environment. This database provides a base for understanding the soil microbial diversity pattern in this megabiodiversity hotspot and indicates the distribution patterns of various organisms along with their identification. The NEMiD database is freely available at www.mblabnehu.info/nemid/. PMID:24714636
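
    The relational back-end the abstract describes (MySQL behind Apache/PHP) can be illustrated with a minimal two-table sketch linking isolates to geo-referenced collection sites, which is what geo-based plotting needs. The schema, table names, and sample rows below are invented, not NEMiD's actual layout; SQLite stands in for MySQL so the sketch is self-contained.

```python
import sqlite3

# Minimal relational sketch (invented schema) of a curated isolate
# database with geo-referenced collection sites.
con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE site (
    site_id   INTEGER PRIMARY KEY,
    name      TEXT NOT NULL,
    latitude  REAL NOT NULL,
    longitude REAL NOT NULL
);
CREATE TABLE isolate (
    isolate_id INTEGER PRIMARY KEY,
    taxon      TEXT NOT NULL,
    site_id    INTEGER NOT NULL REFERENCES site(site_id)
);
""")
con.execute("INSERT INTO site VALUES (1, 'Cherrapunji', 25.28, 91.72)")
con.execute("INSERT INTO isolate VALUES (1, 'Bacillus sp.', 1)")

# Join isolates to their collection coordinates, as a plotting layer would.
row = con.execute(
    "SELECT i.taxon, s.latitude, s.longitude "
    "FROM isolate AS i JOIN site AS s USING (site_id)"
).fetchone()
```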

  4. TWRS information locator database system administrator`s manual

    SciTech Connect

    Knutson, B.J., Westinghouse Hanford

    1996-09-13

    This document is a guide for use by the Tank Waste Remediation System (TWRS) Information Locator Database (ILD) System Administrator. The TWRS ILD System is an inventory of information used in the TWRS Systems Engineering process to represent the TWRS Technical Baseline. The inventory is maintained in the form of a relational database developed in Paradox 4.5.

  5. Classifying the bacterial gut microbiota of termites and cockroaches: A curated phylogenetic reference database (DictDb).

    PubMed

    Mikaelyan, Aram; Köhler, Tim; Lampert, Niclas; Rohland, Jeffrey; Boga, Hamadi; Meuser, Katja; Brune, Andreas

    2015-10-01

    Recent developments in sequencing technology have given rise to a large number of studies that assess bacterial diversity and community structure in termite and cockroach guts based on large amplicon libraries of 16S rRNA genes. Although these studies have revealed important ecological and evolutionary patterns in the gut microbiota, classification of the short sequence reads is limited by the taxonomic depth and resolution of the reference databases used in the respective studies. Here, we present a curated reference database for accurate taxonomic analysis of the bacterial gut microbiota of dictyopteran insects. The Dictyopteran gut microbiota reference Database (DictDb) is based on the Silva database but was significantly expanded by the addition of clones from 11 mostly unexplored termite and cockroach groups, which increased the inventory of bacterial sequences from dictyopteran guts by 26%. The taxonomic depth and resolution of DictDb was significantly improved by a general revision of the taxonomic guide tree for all important lineages, including a detailed phylogenetic analysis of the Treponema and Alistipes complexes, the Fibrobacteres, and the TG3 phylum. The performance of this first documented version of DictDb (v. 3.0) using the revised taxonomic guide tree in the classification of short-read libraries obtained from termites and cockroaches was highly superior to that of the current Silva and RDP databases. DictDb uses an informative nomenclature that is consistent with the literature also for clades of uncultured bacteria and provides an invaluable tool for anyone exploring the gut community structure of termites and cockroaches. PMID:26283320

  6. The Coral Trait Database, a curated database of trait information for coral species from the global oceans.

    PubMed

    Madin, Joshua S; Anderson, Kristen D; Andreasen, Magnus Heide; Bridge, Tom C L; Cairns, Stephen D; Connolly, Sean R; Darling, Emily S; Diaz, Marcela; Falster, Daniel S; Franklin, Erik C; Gates, Ruth D; Hoogenboom, Mia O; Huang, Danwei; Keith, Sally A; Kosnik, Matthew A; Kuo, Chao-Yang; Lough, Janice M; Lovelock, Catherine E; Luiz, Osmar; Martinelli, Julieta; Mizerek, Toni; Pandolfi, John M; Pochon, Xavier; Pratchett, Morgan S; Putnam, Hollie M; Roberts, T Edward; Stat, Michael; Wallace, Carden C; Widman, Elizabeth; Baird, Andrew H

    2016-01-01

    Trait-based approaches advance ecological and evolutionary research because traits provide a strong link to an organism's function and fitness. Trait-based research might lead to a deeper understanding of the functions of, and services provided by, ecosystems, thereby improving management, which is vital in the current era of rapid environmental change. Coral reef scientists have long collected trait data for corals; however, these are difficult to access and often under-utilized in addressing large-scale questions. We present the Coral Trait Database initiative that aims to bring together physiological, morphological, ecological, phylogenetic and biogeographic trait information into a single repository. The database houses species- and individual-level data from published field and experimental studies alongside contextual data that provide important framing for analyses. In this data descriptor, we release data for 56 traits for 1547 species, and present a collaborative platform on which other trait data are being actively federated. Our overall goal is for the Coral Trait Database to become an open-source, community-led data clearinghouse that accelerates coral reef research. PMID:27023900

  7. The Coral Trait Database, a curated database of trait information for coral species from the global oceans

    PubMed Central

    Madin, Joshua S.; Anderson, Kristen D.; Andreasen, Magnus Heide; Bridge, Tom C.L.; Cairns, Stephen D.; Connolly, Sean R.; Darling, Emily S.; Diaz, Marcela; Falster, Daniel S.; Franklin, Erik C.; Gates, Ruth D.; Hoogenboom, Mia O.; Huang, Danwei; Keith, Sally A.; Kosnik, Matthew A.; Kuo, Chao-Yang; Lough, Janice M.; Lovelock, Catherine E.; Luiz, Osmar; Martinelli, Julieta; Mizerek, Toni; Pandolfi, John M.; Pochon, Xavier; Pratchett, Morgan S.; Putnam, Hollie M.; Roberts, T. Edward; Stat, Michael; Wallace, Carden C.; Widman, Elizabeth; Baird, Andrew H.

    2016-01-01

    Trait-based approaches advance ecological and evolutionary research because traits provide a strong link to an organism’s function and fitness. Trait-based research might lead to a deeper understanding of the functions of, and services provided by, ecosystems, thereby improving management, which is vital in the current era of rapid environmental change. Coral reef scientists have long collected trait data for corals; however, these are difficult to access and often under-utilized in addressing large-scale questions. We present the Coral Trait Database initiative that aims to bring together physiological, morphological, ecological, phylogenetic and biogeographic trait information into a single repository. The database houses species- and individual-level data from published field and experimental studies alongside contextual data that provide important framing for analyses. In this data descriptor, we release data for 56 traits for 1547 species, and present a collaborative platform on which other trait data are being actively federated. Our overall goal is for the Coral Trait Database to become an open-source, community-led data clearinghouse that accelerates coral reef research. PMID:27023900
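
    The species- and individual-level organization the abstract describes can be sketched as an aggregation of individual-level observations into a species-level trait summary. The trait name and values below are invented examples, not records from the Coral Trait Database.

```python
from collections import defaultdict

# Invented individual-level trait observations; a trait repository
# aggregates such records into species-level summaries.
observations = [
    {"species": "Acropora hyacinthus", "trait": "growth_rate", "value": 42.0},
    {"species": "Acropora hyacinthus", "trait": "growth_rate", "value": 38.0},
    {"species": "Porites lobata", "trait": "growth_rate", "value": 11.0},
]

def species_mean(obs, trait):
    """Mean value of one trait per species."""
    groups = defaultdict(list)
    for o in obs:
        if o["trait"] == trait:
            groups[o["species"]].append(o["value"])
    return {sp: sum(vs) / len(vs) for sp, vs in groups.items()}

means = species_mean(observations, "growth_rate")
```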

  8. Database Changes (Post-Publication). ERIC Processing Manual, Section X.

    ERIC Educational Resources Information Center

    Brandhorst, Ted, Ed.

    The purpose of this section is to specify the procedure for making changes to the ERIC database after the data involved have been announced in the abstract journals RIE or CIJE. As a matter of general ERIC policy, a document or journal article is not re-announced or re-entered into the database as a new accession for the purpose of accomplishing a…

  9. Nuclear Energy Infrastructure Database Description and User’s Manual

    SciTech Connect

    Heidrich, Brenden

    2015-11-01

    In 2014, the Deputy Assistant Secretary for Science and Technology Innovation initiated the Nuclear Energy (NE)–Infrastructure Management Project by tasking the Nuclear Science User Facilities, formerly the Advanced Test Reactor National Scientific User Facility, to create a searchable and interactive database of all pertinent NE-supported and -related infrastructure. This database, known as the Nuclear Energy Infrastructure Database (NEID), is used for analyses to establish needs, redundancies, efficiencies, distributions, etc., to best understand the utility of NE’s infrastructure and inform the content of infrastructure calls. The Nuclear Science User Facilities developed the database by utilizing data and policy direction from a variety of reports from the U.S. Department of Energy, the National Research Council, the International Atomic Energy Agency, and various other federal and civilian resources. The NEID currently contains data on 802 research and development instruments housed in 377 facilities at 84 institutions in the United States and abroad. The effort to maintain and expand the database is ongoing. Detailed information on many facilities must be gathered from associated institutions and added to complete the database. The data must be validated and kept current to capture facility and instrumentation status as well as to cover new acquisitions and retirements. This document provides a short tutorial on the navigation of the NEID web portal at NSUF-Infrastructure.INL.gov.

  10. DREMECELS: A Curated Database for Base Excision and Mismatch Repair Mechanisms Associated Human Malignancies

    PubMed Central

    Shukla, Ankita; Singh, Tiratha Raj

    2016-01-01

    DNA repair mechanisms act as warriors combating the various damaging processes that lead to critical malignancies. DREMECELS was designed considering the malignancies with frequent alterations in DNA repair pathways, that is, colorectal and endometrial cancers, associated with Lynch syndrome (also known as HNPCC). Since Lynch syndrome carries a high risk (~40–60%) for both cancers, we decided to cover all three diseases in this portal. Although a large population is presently affected by these malignancies, and many resources are available for various cancer types, no database archives information on the genes specific to only these cancers and disorders. The database contains 156 genes and covers two repair mechanisms, base excision repair (BER) and mismatch repair (MMR). Other parameters include some of the regulatory processes that play roles in the progression of these diseases owing to incompetent repair mechanisms, specifically BER and MMR. Our database mainly provides qualitative and quantitative information on these cancer types along with methylation, drug sensitivity, miRNA, copy number variation (CNV) and somatic mutation data. This database serves the scientific community by providing integrated information on these disease types, thus sustaining diagnostic and therapeutic processes. This repository will be an excellent accompaniment for researchers and biomedical professionals and will facilitate the understanding of such critical diseases. DREMECELS is publicly available at http://www.bioinfoindia.org/dremecels. PMID:27276067

  11. dbEM: A database of epigenetic modifiers curated from cancerous and normal genomes

    PubMed Central

    Singh Nanda, Jagpreet; Kumar, Rahul; Raghava, Gajendra P. S.

    2016-01-01

    We have developed a database called dbEM (database of Epigenetic Modifiers) to maintain genomic information on about 167 epigenetic modifiers/proteins, which are considered potential cancer targets. In dbEM, modifiers are classified on a functional basis and comprise 48 histone methyltransferases, 33 chromatin remodelers and 31 histone demethylases. dbEM maintains genomic information such as mutations, copy number variation and gene expression in thousands of tumor samples, cancer cell lines and healthy samples. This information is obtained from public resources, viz. COSMIC, CCLE and the 1000 Genomes Project. Gene essentiality data retrieved from the COLT database further highlight the importance of various epigenetic proteins for cancer survival. We have also reported the sequence profiles, tertiary structures and post-translational modifications of these epigenetic proteins in cancer. dbEM also contains information on 54 drug molecules against different epigenetic proteins. A wide range of tools has been integrated into dbEM, e.g. search, BLAST, alignment and profile-based prediction. In our analysis, we found that the epigenetic proteins DNMT3A, HDAC2, KDM6A and TET2 are highly mutated in a variety of cancers. We are confident that dbEM will be very useful in cancer research, particularly in the field of epigenetic-protein-based cancer therapeutics. This database is publicly available at http://crdd.osdd.net/raghava/dbem. PMID:26777304
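
    The kind of summary the abstract reports, such as which modifier genes are most frequently mutated across tumor samples, can be sketched as a simple tally. The gene symbols are those named in the abstract, but the sample identifiers and mutation calls below are invented.

```python
from collections import Counter

# Invented (sample, gene) mutation calls; dbEM aggregates such data from
# resources like COSMIC to show which modifiers are most often mutated.
mutations = [
    ("sample_1", "DNMT3A"),
    ("sample_2", "DNMT3A"),
    ("sample_2", "TET2"),
    ("sample_3", "HDAC2"),
]

def mutation_counts(calls):
    """Number of distinct samples carrying a mutation in each gene."""
    return Counter(gene for _, gene in set(calls))

ranked = mutation_counts(mutations).most_common()
```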

  12. BioModels Database: An enhanced, curated and annotated resource for published quantitative kinetic models

    PubMed Central

    2010-01-01

    Background Quantitative models of biochemical and cellular systems are used to answer a variety of questions in the biological sciences. The number of published quantitative models is growing steadily thanks to increasing interest in the use of models as well as the development of improved software systems and the availability of better, cheaper computer hardware. To maximise the benefits of this growing body of models, the field needs centralised model repositories that will encourage, facilitate and promote model dissemination and reuse. Ideally, the models stored in these repositories should be extensively tested and encoded in community-supported and standardised formats. In addition, the models and their components should be cross-referenced with other resources in order to allow their unambiguous identification. Description BioModels Database http://www.ebi.ac.uk/biomodels/ is aimed at addressing exactly these needs. It is a freely-accessible online resource for storing, viewing, retrieving, and analysing published, peer-reviewed quantitative models of biochemical and cellular systems. The structure and behaviour of each simulation model distributed by BioModels Database are thoroughly checked; in addition, model elements are annotated with terms from controlled vocabularies as well as linked to relevant data resources. Models can be examined online or downloaded in various formats. Reaction network diagrams generated from the models are also available in several formats. BioModels Database also provides features such as online simulation and the extraction of components from large scale models into smaller submodels. Finally, the system provides a range of web services that external software systems can use to access up-to-date data from the database. Conclusions BioModels Database has become a recognised reference resource for systems biology. 
It is being used by the community in a variety of ways; for example, it is used to benchmark different simulation

  13. Proteome-wide post-translational modification statistics: frequency analysis and curation of the swiss-prot database.

    PubMed

    Khoury, George A; Baliban, Richard C; Floudas, Christodoulos A

    2011-09-13

    Post-translational modifications (PTMs) broadly contribute to the recent explosion of proteomic data and possess a complexity surpassing that of protein design. PTMs are the chemical modification of a protein after its translation, and have wide effects broadening its range of functionality. Based on previous estimates, it is widely believed that more than half of proteins are glycoproteins. Whereas mutations can only occur once per position, different forms of post-translational modifications may occur in tandem. With the number and abundances of modifications constantly being discovered, there is no method to readily assess their relative levels. Here we report the relative abundances of each PTM found experimentally and putatively, from high-quality, manually curated, proteome-wide data, and show that at best, less than one-fifth of proteins are glycosylated. We make available to the academic community a continuously updated resource (http://selene.princeton.edu/PTMCuration) containing the statistics so scientists can assess "how many" of each PTM exists. PMID:22034591
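
    The frequency analysis the abstract describes can be sketched as a relative-abundance computation over curated per-site PTM annotations. The accession numbers and modification assignments below are invented examples, not Swiss-Prot data.

```python
from collections import Counter

# Invented per-site PTM annotations as (protein accession, modification);
# the frequency analysis reduces them to each PTM's share of all sites.
annotations = [
    ("P00001", "phosphorylation"),
    ("P00001", "glycosylation"),
    ("P00002", "phosphorylation"),
    ("P00002", "acetylation"),
]

def relative_abundance(rows):
    """Fraction of all annotated sites carried by each PTM type."""
    counts = Counter(ptm for _, ptm in rows)
    total = sum(counts.values())
    return {ptm: n / total for ptm, n in counts.items()}

freq = relative_abundance(annotations)
```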

  14. TRENDS: A flight test relational database user's guide and reference manual

    NASA Technical Reports Server (NTRS)

    Bondi, M. J.; Bjorkman, W. S.; Cross, J. L.

    1994-01-01

    This report is designed to be a user's guide and reference manual for users intending to access rotorcraft test data via TRENDS, the relational database system which was developed as a tool for the aeronautical engineer with no programming background. This report has been written to assist novice and experienced TRENDS users. TRENDS is a complete system for retrieving, searching, and analyzing both numerical and narrative data, and for displaying time history and statistical data in graphical and numerical formats. This manual provides a 'guided tour' and a 'user's guide' for the new and intermediate-skilled users. Examples of the use of each menu item within TRENDS are provided in the Menu Reference section of the manual, including full coverage for TIMEHIST, one of the key tools. This manual is written around the XV-15 Tilt Rotor database, but does include an appendix on the UH-60 Blackhawk database. This user's guide and reference manual establishes a referable source for the research community and augments NASA TM-101025, TRENDS: The Aeronautical Post-Test, Database Management System, Jan. 1990, written by the same authors.

  15. ODIN. Online Database Information Network: ODIN Policy & Procedure Manual.

    ERIC Educational Resources Information Center

    Townley, Charles T.; And Others

    Policies and procedures are outlined for the Online Database Information Network (ODIN), a cooperative of libraries in south-central Pennsylvania, which was organized to improve library services through technology. The first section covers organization and goals, members, and responsibilities of the administrative council and libraries. Patrons…

  16. The FOCUS Database: The Nation's Premier Resource in Dropout Prevention. Instruction Manual.

    ERIC Educational Resources Information Center

    National Dropout Prevention Center, Clemson, SC.

    This booklet is an instruction manual for those using the FOCUS database, an information source on dropout prevention of the National Dropout Prevention Center. An introduction lists the FOCUS files, which include Program Profiles, Calendar of Events, Resource Materials Library, Organizations, and Consultants and Speakers. Also given is the…

  17. The development of an Ada programming support environment database: SEAD (Software Engineering and Ada Database), user's manual

    NASA Technical Reports Server (NTRS)

    Liaw, Morris; Evesson, Donna

    1988-01-01

    This is a manual for users of the Software Engineering and Ada Database (SEAD). SEAD was developed to provide an information resource to NASA and NASA contractors with respect to Ada-based resources and activities that are available or underway either in NASA or elsewhere in the worldwide Ada community. The sharing of such information will reduce the duplication of effort while improving quality in the development of future software systems. The manual describes the organization of the data in SEAD, the user interface from logging in to logging out, and concludes with a ten chapter tutorial on how to use the information in SEAD. Two appendices provide quick reference for logging into SEAD and using the keyboard of an IBM 3270 or VT100 computer terminal.

  18. Development of an Ada programming support environment database SEAD (Software Engineering and Ada Database) administration manual

    NASA Technical Reports Server (NTRS)

    Liaw, Morris; Evesson, Donna

    1988-01-01

    Software Engineering and Ada Database (SEAD) was developed to provide an information resource to NASA and NASA contractors with respect to Ada-based resources and activities which are available or underway either in NASA or elsewhere in the worldwide Ada community. The sharing of such information will reduce duplication of effort while improving quality in the development of future software systems. SEAD data is organized into five major areas: information regarding education and training resources which are relevant to the life cycle of Ada-based software engineering projects such as those in the Space Station program; research publications relevant to NASA projects such as the Space Station Program and conferences relating to Ada technology; the latest progress reports on Ada projects completed or in progress both within NASA and throughout the free world; Ada compilers and other commercial products that support Ada software development; and reusable Ada components generated both within NASA and from elsewhere in the free world. This classified listing of reusable components shall include descriptions of tools, libraries, and other components of interest to NASA. Sources for the data include technical newsletters and periodicals, conference proceedings, the Ada Information Clearinghouse, product vendors, and project sponsors and contractors.

  19. Genetic Variations and Diseases in UniProtKB/Swiss-Prot: The Ins and Outs of Expert Manual Curation

    PubMed Central

    Famiglietti, Maria Livia; Estreicher, Anne; Gos, Arnaud; Bolleman, Jerven; Géhant, Sébastien; Breuza, Lionel; Bridge, Alan; Poux, Sylvain; Redaschi, Nicole; Bougueleret, Lydie; Xenarios, Ioannis

    2014-01-01

    During the last few years, next-generation sequencing (NGS) technologies have accelerated the detection of genetic variants resulting in the rapid discovery of new disease-associated genes. However, the wealth of variation data made available by NGS alone is not sufficient to understand the mechanisms underlying disease pathogenesis and manifestation. Multidisciplinary approaches combining sequence and clinical data with prior biological knowledge are needed to unravel the role of genetic variants in human health and disease. In this context, it is crucial that these data are linked, organized, and made readily available through reliable online resources. The Swiss-Prot section of the Universal Protein Knowledgebase (UniProtKB/Swiss-Prot) provides the scientific community with a collection of information on protein functions, interactions, biological pathways, as well as human genetic diseases and variants, all manually reviewed by experts. In this article, we present an overview of the information content of UniProtKB/Swiss-Prot to show how this knowledgebase can support researchers in the elucidation of the mechanisms leading from a molecular defect to a disease phenotype. PMID:24848695

  20. PFR²: a curated database of planktonic foraminifera 18S ribosomal DNA as a resource for studies of plankton ecology, biogeography and evolution.

    PubMed

    Morard, Raphaël; Darling, Kate F; Mahé, Frédéric; Audic, Stéphane; Ujiié, Yurika; Weiner, Agnes K M; André, Aurore; Seears, Heidi A; Wade, Christopher M; Quillévéré, Frédéric; Douady, Christophe J; Escarguel, Gilles; de Garidel-Thoron, Thibault; Siccha, Michael; Kucera, Michal; de Vargas, Colomban

    2015-11-01

    Planktonic foraminifera (Rhizaria) are ubiquitous marine pelagic protists producing calcareous shells with conspicuous morphology. They play an important role in the marine carbon cycle, and their exceptional fossil record serves as the basis for biochronostratigraphy and past climate reconstructions. A major worldwide sampling effort over the last two decades has resulted in the establishment of multiple large collections of cryopreserved individual planktonic foraminifera samples. Thousands of 18S rDNA partial sequences have been generated, representing all major known morphological taxa across their worldwide oceanic range. This comprehensive data coverage provides an opportunity to assess patterns of molecular ecology and evolution in a holistic way for an entire group of planktonic protists. We combined all available published and unpublished genetic data to build PFR², the Planktonic foraminifera Ribosomal Reference database. The first version of the database includes 3322 reference 18S rDNA sequences belonging to 32 of the 47 known morphospecies of extant planktonic foraminifera, collected from 460 oceanic stations. All sequences have been rigorously taxonomically curated using a six-rank annotation system fully resolved to the morphological species level and linked to a series of metadata. The PFR² website, available at http://pfr2.sb-roscoff.fr, allows downloading the entire database or specific sections, as well as the identification of new planktonic foraminiferal sequences. Its novel, fully documented curation process integrates advances in morphological and molecular taxonomy. It allows for an increase in its taxonomic resolution and assures that integrity is maintained by including a complete contingency tracking of annotations and assuring that the annotations remain internally consistent. PMID:25828689
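    A six-rank taxonomic annotation attached to each reference sequence can be illustrated with a small header parser. This is a sketch only: the pipe-delimited header layout, the rank names and the example record below are assumptions for illustration, not the actual PFR² file format.

    ```python
    # Hypothetical parser for a taxonomically annotated reference FASTA header,
    # assuming '>accession|rank1|...|rank6' (the real PFR2 layout may differ).
    RANKS = ["phylum", "order", "superfamily", "family", "genus", "species"]

    def parse_header(line):
        """Split a FASTA header into an accession and a rank->name dict."""
        fields = line.lstrip(">").strip().split("|")
        accession, taxonomy = fields[0], fields[1:]
        return accession, dict(zip(RANKS, taxonomy))

    acc, taxa = parse_header(
        ">AB123456|Foraminifera|Globigerinida|Globigerinoidea|"
        "Globigerinidae|Globigerinoides|Globigerinoides ruber"
    )
    print(acc, taxa["species"])
    ```

    A parser along these lines is what allows annotations to be checked for internal consistency, e.g. that every sequence assigned to a genus also carries the matching family.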

  1. Scaling drug indication curation through crowdsourcing

    PubMed Central

    Khare, Ritu; Burger, John D.; Aberdeen, John S.; Tresner-Kirsch, David W.; Corrales, Theodore J.; Hirchman, Lynette; Lu, Zhiyong

    2015-01-01

    Motivated by the high cost of human curation of biological databases, there is an increasing interest in using computational approaches to assist human curators and accelerate the manual curation process. Towards the goal of cataloging drug indications from FDA drug labels, we recently developed LabeledIn, a human-curated drug indication resource for 250 clinical drugs. Its development required over 40 h of human effort across 20 weeks, despite using well-defined annotation guidelines. In this study, we aim to investigate the feasibility of scaling drug indication annotation through a crowdsourcing technique where an unknown network of workers can be recruited through the technical environment of Amazon Mechanical Turk (MTurk). To translate the expert-curation task of cataloging indications into human intelligence tasks (HITs) suitable for the average workers on MTurk, we first simplify the complex task such that each HIT only involves a worker making a binary judgment of whether a highlighted disease, in context of a given drug label, is an indication. In addition, this study is novel in the crowdsourcing interface design where the annotation guidelines are encoded into user options. For evaluation, we assess the ability of our proposed method to achieve high-quality annotations in a time-efficient and cost-effective manner. We posted over 3000 HITs drawn from 706 drug labels on MTurk. Within 8 h of posting, we collected 18 775 judgments from 74 workers, and achieved an aggregated accuracy of 96% on 450 control HITs (where gold-standard answers are known), at a cost of $1.75 per drug label. On the basis of these results, we conclude that our crowdsourcing approach not only results in significant cost and time saving, but also leads to accuracy comparable to that of domain experts. Database URL: ftp://ftp.ncbi.nlm.nih.gov/pub/lu/LabeledIn/Crowdsourcing/. PMID:25797061
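    The aggregation step described above, combining many binary worker judgments per HIT and scoring them against control HITs with known gold-standard answers, can be sketched as a simple majority vote. This is a minimal illustration with invented data, not the authors' actual pipeline:

    ```python
    from collections import Counter

    def aggregate(judgments):
        """Majority vote over binary worker judgments (True = 'is an indication')."""
        return Counter(judgments).most_common(1)[0][0]

    def control_accuracy(hits):
        """Fraction of control HITs whose aggregated label matches the gold answer.

        `hits` is a list of (worker_judgments, gold_label) pairs.
        """
        correct = sum(aggregate(judgments) == gold for judgments, gold in hits)
        return correct / len(hits)

    # Toy control set: 3 HITs, each judged by three workers.
    hits = [
        ([True, True, False], True),    # majority True,  gold True  -> correct
        ([False, False, True], False),  # majority False, gold False -> correct
        ([True, False, False], True),   # majority False, gold True  -> wrong
    ]
    print(control_accuracy(hits))  # 2 of 3 control HITs correct
    ```

    The reported 96% accuracy on 450 control HITs is exactly this kind of score, computed on the real judgment data.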

  2. Protein sequence databases.

    PubMed

    Apweiler, Rolf; Bairoch, Amos; Wu, Cathy H

    2004-02-01

    A variety of protein sequence databases exist, ranging from simple sequence repositories, which store data with little or no manual intervention in the creation of the records, to expertly curated universal databases that cover all species and in which the original sequence data are enhanced by the manual addition of further information in each sequence record. As the focus of researchers moves from the genome to the proteins encoded by it, these databases will play an even more important role as central comprehensive resources of protein information. Several of the leading protein sequence databases are discussed here, with special emphasis on the databases now provided by the Universal Protein Knowledgebase (UniProt) consortium. PMID:15036160

  3. The needs for chemistry standards, database tools and data curation at the chemical-biology interface (SLAS meeting)

    EPA Science Inventory

    This presentation will highlight known challenges with the production of high quality chemical databases and outline recent efforts made to address these challenges. Specific examples will be provided illustrating these challenges within the U.S. Environmental Protection Agency ...

  4. PhytoREF: a reference database of the plastidial 16S rRNA gene of photosynthetic eukaryotes with curated taxonomy.

    PubMed

    Decelle, Johan; Romac, Sarah; Stern, Rowena F; Bendif, El Mahdi; Zingone, Adriana; Audic, Stéphane; Guiry, Michael D; Guillou, Laure; Tessier, Désiré; Le Gall, Florence; Gourvil, Priscillia; Dos Santos, Adriana L; Probert, Ian; Vaulot, Daniel; de Vargas, Colomban; Christen, Richard

    2015-11-01

    Photosynthetic eukaryotes have a critical role as the main producers in most ecosystems of the biosphere. The ongoing environmental metabarcoding revolution opens the perspective for holistic ecosystems biological studies of these organisms, in particular the unicellular microalgae that often lack distinctive morphological characters and have complex life cycles. To interpret environmental sequences, metabarcoding necessarily relies on taxonomically curated databases containing reference sequences of the targeted gene (or barcode) from identified organisms. To date, no such reference framework exists for photosynthetic eukaryotes. In this study, we built the PhytoREF database that contains 6490 plastidial 16S rDNA reference sequences that originate from a large diversity of eukaryotes representing all known major photosynthetic lineages. We compiled 3333 amplicon sequences available from public databases and 879 sequences extracted from plastidial genomes, and generated 411 novel sequences from cultured marine microalgal strains belonging to different eukaryotic lineages. A total of 1867 environmental Sanger 16S rDNA sequences were also included in the database. Stringent quality filtering and a phylogeny-based taxonomic classification were applied for each 16S rDNA sequence. The database mainly focuses on marine microalgae, but sequences from land plants (representing half of the PhytoREF sequences) and freshwater taxa were also included to broaden the applicability of PhytoREF to different aquatic and terrestrial habitats. PhytoREF, accessible via a web interface (http://phytoref.fr), is a new resource in molecular ecology to foster the discovery, assessment and monitoring of the diversity of photosynthetic eukaryotes using high-throughput sequencing. PMID:25740460

  5. Egas: a collaborative and interactive document curation platform.

    PubMed

    Campos, David; Lourenço, Jóni; Matos, Sérgio; Oliveira, José Luís

    2014-01-01

    With the overwhelming amount of biomedical textual information being produced, several manual curation efforts have been set up to extract and store concepts and their relationships into structured resources. As manual annotation is a demanding and expensive task, computerized solutions were developed to perform such tasks automatically. However, high-end information extraction techniques are still not widely used by biomedical research communities, mainly because of the lack of standards and limitations in usability. Interactive annotation tools intend to fill this gap, taking advantage of automatic techniques and existing knowledge bases to assist expert curators in their daily tasks. This article presents Egas, a web-based platform for biomedical text mining and assisted curation with highly usable interfaces for manual and automatic in-line annotation of concepts and relations. A comprehensive set of de facto standard knowledge bases are integrated and indexed to provide straightforward concept normalization features. Real-time collaboration and conversation functionalities allow discussing details of the annotation task as well as providing instant feedback of curator's interactions. Egas also provides interfaces for on-demand management of the annotation task settings and guidelines, and supports standard formats and literature services to import and export documents. By taking advantage of Egas, we participated in the BioCreative IV interactive annotation task, targeting the assisted identification of protein-protein interactions described in PubMed abstracts related to neuropathological disorders. When evaluated by expert curators, it obtained positive scores in terms of usability, reliability and performance. These results, together with the provided innovative features, place Egas as a state-of-the-art solution for fast and accurate curation of information, facilitating the task of creating and updating knowledge bases and annotated resources. Database

  6. Data Curation

    ERIC Educational Resources Information Center

    Mallon, Melissa, Ed.

    2012-01-01

    In their Top Trends of 2012, the Association of College and Research Libraries (ACRL) named data curation as one of the issues to watch in academic libraries in the near future (ACRL, 2012, p. 312). Data curation can be summarized as "the active and ongoing management of data through its life cycle of interest and usefulness to scholarship,…

  7. Tox-Database.net: a curated resource for data describing chemical triggered in vitro cardiac ion channels inhibition

    PubMed Central

    2012-01-01

    Background: Drug safety issues are now recognized as the leading cause of drug withdrawals, both at various stages of development and post-approval. Among them, cardiotoxicity remains the main reason, despite the substantial effort put into in vitro and in vivo testing, with the main focus on hERG channel inhibition as the hypothesized surrogate of a drug's proarrhythmic potency. The large interest in the IKr current has resulted in the development of predictive tools and informative databases describing a drug's susceptibility to interactions with the hERG channel, although there are no similar, publicly available sets of data describing other ionic currents driven by the human cardiomyocyte ionic channels, which are recognized as an overlooked drug safety target. Discussion: The aim of this database's development and publication was to provide a scientifically useful, easily usable and clearly verifiable set of information describing inhibition data not only for IKr (hERG) but also for other human cardiomyocyte-specific ionic channels (IKs, INa, ICa). Summary: The broad range of data (chemical space and in vitro settings) and the easy-to-use interface make tox-database.net a useful tool for interested scientists. Database URL: http://tox-database.net. PMID:22947121

  8. Structuring osteosarcoma knowledge: an osteosarcoma-gene association database based on literature mining and manual annotation.

    PubMed

    Poos, Kathrin; Smida, Jan; Nathrath, Michaela; Maugg, Doris; Baumhoer, Daniel; Neumann, Anna; Korsching, Eberhard

    2014-01-01

    Osteosarcoma (OS) is the most common primary bone cancer exhibiting high genomic instability. This genomic instability affects multiple genes and microRNAs to a varying extent depending on patient and tumor subtype. Massive research is ongoing to identify genes including their gene products and microRNAs that correlate with disease progression and might be used as biomarkers for OS. However, the genomic complexity hampers the identification of reliable biomarkers. Up to now, clinico-pathological factors are the key determinants to guide prognosis and therapeutic treatments. Each day, new studies about OS are published and complicate the acquisition of information to support biomarker discovery and therapeutic improvements. Thus, it is necessary to provide a structured and annotated view on the current OS knowledge that is quick and easily accessible to researchers of the field. Therefore, we developed a publicly available database and Web interface that serves as resource for OS-associated genes and microRNAs. Genes and microRNAs were collected using an automated dictionary-based gene recognition procedure followed by manual review and annotation by experts of the field. In total, 911 genes and 81 microRNAs related to 1331 PubMed abstracts were collected (last update: 29 October 2013). Users can evaluate genes and microRNAs according to their potential prognostic and therapeutic impact, the experimental procedures, the sample types, the biological contexts and microRNA target gene interactions. Additionally, a pathway enrichment analysis of the collected genes highlights different aspects of OS progression. OS requires pathways commonly deregulated in cancer but also features OS-specific alterations like deregulated osteoclast differentiation. To our knowledge, this is the first effort of an OS database containing manual reviewed and annotated up-to-date OS knowledge. It might be a useful resource especially for the bone tumor research community, as specific

  9. A Comparison of Computer-Based Bibliographic Database Searching vs. Manual Bibliographic Searching. Final CARE Grant Report #5964.

    ERIC Educational Resources Information Center

    Pritchard, Eileen E.; Rockman, Ilene F.

    The purpose of this study was to improve upon previous research investigations by analyzing the elements of cost effectiveness, precision, recall, citation overlap, and hours searching in a comparison between computerized database searching and manual searching, using the setting of an academic library environment with a diverse group of students…

  10. Scaling drug indication curation through crowdsourcing.

    PubMed

    Khare, Ritu; Burger, John D; Aberdeen, John S; Tresner-Kirsch, David W; Corrales, Theodore J; Hirchman, Lynette; Lu, Zhiyong

    2015-01-01

    Motivated by the high cost of human curation of biological databases, there is an increasing interest in using computational approaches to assist human curators and accelerate the manual curation process. Towards the goal of cataloging drug indications from FDA drug labels, we recently developed LabeledIn, a human-curated drug indication resource for 250 clinical drugs. Its development required over 40 h of human effort across 20 weeks, despite using well-defined annotation guidelines. In this study, we aim to investigate the feasibility of scaling drug indication annotation through a crowdsourcing technique where an unknown network of workers can be recruited through the technical environment of Amazon Mechanical Turk (MTurk). To translate the expert-curation task of cataloging indications into human intelligence tasks (HITs) suitable for the average workers on MTurk, we first simplify the complex task such that each HIT only involves a worker making a binary judgment of whether a highlighted disease, in context of a given drug label, is an indication. In addition, this study is novel in the crowdsourcing interface design where the annotation guidelines are encoded into user options. For evaluation, we assess the ability of our proposed method to achieve high-quality annotations in a time-efficient and cost-effective manner. We posted over 3000 HITs drawn from 706 drug labels on MTurk. Within 8 h of posting, we collected 18 775 judgments from 74 workers, and achieved an aggregated accuracy of 96% on 450 control HITs (where gold-standard answers are known), at a cost of $1.75 per drug label. On the basis of these results, we conclude that our crowdsourcing approach not only results in significant cost and time saving, but also leads to accuracy comparable to that of domain experts. PMID:25797061

  11. miRGate: a curated database of human, mouse and rat miRNA–mRNA targets

    PubMed Central

    Andrés-León, Eduardo; González Peña, Daniel; Gómez-López, Gonzalo; Pisano, David G.

    2015-01-01

    MicroRNAs (miRNAs) are small non-coding elements involved in the post-transcriptional down-regulation of gene expression through base pairing with messenger RNAs (mRNAs). Through this mechanism, several miRNA–mRNA pairs have been described as critical in the regulation of multiple cellular processes, including early embryonic development and pathological conditions. Many of these pairs (such as miR-15b/BCL2 in apoptosis or BART-6/BCL6 in diffuse large B-cell lymphomas) were experimentally discovered and/or computationally predicted. Available tools for target prediction are usually based on sequence matching, thermodynamics and conservation, among other approaches. Nevertheless, the main issue in miRNA–mRNA pair prediction is the small overlap among the results of different prediction methods, or even with lists of experimentally validated pairs, despite the fact that all rely on similar principles. To circumvent this problem, we have developed miRGate, a database containing novel computationally predicted miRNA–mRNA pairs that are calculated using well-established algorithms. In addition, it includes an updated and complete dataset of sequences for both miRNAs and mRNA 3′-untranslated regions from human (including human viruses), mouse and rat, as well as experimentally validated data from four well-known databases. The underlying methodology of miRGate has been successfully applied to independent datasets providing predictions that were convincingly validated by functional assays. miRGate is an open resource available at http://mirgate.bioinfo.cnio.es. For programmatic access, we have provided a representational state transfer web service application programming interface that allows accessing the database at http://mirgate.bioinfo.cnio.es/API/. Database URL: http://mirgate.bioinfo.cnio.es PMID:25858286
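    Programmatic access via the REST interface mentioned above might look like the following. Note that the route `human/miRNA/<id>` and the JSON field names are assumptions for illustration; the actual endpoints should be taken from the miRGate API documentation, and the response here is mocked rather than fetched:

    ```python
    import json
    from urllib.parse import urljoin, quote

    API_ROOT = "http://mirgate.bioinfo.cnio.es/API/"

    def target_query_url(organism, mirna):
        # Hypothetical route layout; consult the miRGate docs for real paths.
        return urljoin(API_ROOT, f"{quote(organism)}/miRNA/{quote(mirna)}")

    print(target_query_url("human", "hsa-miR-15b-5p"))

    # Parsing a (mocked) JSON response listing predicted miRNA-mRNA pairs:
    mock_response = '[{"mirna": "hsa-miR-15b-5p", "target": "BCL2", "method": "miRanda"}]'
    pairs = [(rec["mirna"], rec["target"]) for rec in json.loads(mock_response)]
    ```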

  12. How much does curation cost?

    PubMed Central

    2016-01-01

    NIH administrators have recently expressed concerns about the cost of curation for biological databases. However, they did not articulate the exact costs of curation. Here we calculate the cost of biocuration of articles for the EcoCyc database as $219 per article over a 5-year period. That cost is 6–15% of the open-access publication fees for publishing biomedical articles, and we estimate it is 0.088% of the cost of the overall research project that generated the experimental results. Thus, curation costs are small in an absolute sense, and represent a miniscule fraction of the cost of the research. PMID:27504008
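    The percentages above can be sanity-checked with a few lines of arithmetic. The implied fee and project-cost ranges below are back-of-the-envelope inferences from the stated figures, not numbers taken from the article:

    ```python
    cost_per_article = 219.0  # USD, EcoCyc curation cost per article over 5 years

    # If curation is 6-15% of open-access publication fees,
    # the implied per-article fee range is:
    fee_low = cost_per_article / 0.15   # 1460 USD
    fee_high = cost_per_article / 0.06  # 3650 USD
    print(f"implied OA fee range: ${fee_low:.0f}-${fee_high:.0f}")

    # If curation is 0.088% of the research project that generated the results,
    # the implied research cost behind one article is:
    research_cost = cost_per_article / 0.00088  # ~249,000 USD
    print(f"implied research cost per article: ${research_cost:,.0f}")
    ```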

  13. 76 FR 30997 - National Transit Database: Amendments to Urbanized Area Annual Reporting Manual

    Federal Register 2010, 2011, 2012, 2013, 2014

    2011-05-27

    ... Federal Register (73 FR 7361) inviting comments on proposed amendments to the 2011 Annual Manual. This... Federal Register (75 FR 192) inviting comments on proposed amendments to the 2011 Annual Manual. FTA... Manual AGENCY: Federal Transit Administration (FTA), DOT. ACTION: Notice of Amendments to 2011...

  14. A hybrid human and machine resource curation pipeline for the Neuroscience Information Framework.

    PubMed

    Bandrowski, A E; Cachat, J; Li, Y; Müller, H M; Sternberg, P W; Ciccarese, P; Clark, T; Marenco, L; Wang, R; Astakhov, V; Grethe, J S; Martone, M E

    2012-01-01

    The breadth of information resources available to researchers on the Internet continues to expand, particularly in light of recently implemented data-sharing policies required by funding agencies. However, the nature of dense, multifaceted neuroscience data and the design of contemporary search engine systems makes efficient, reliable and relevant discovery of such information a significant challenge. This challenge is specifically pertinent for online databases, whose dynamic content is 'hidden' from search engines. The Neuroscience Information Framework (NIF; http://www.neuinfo.org) was funded by the NIH Blueprint for Neuroscience Research to address the problem of finding and utilizing neuroscience-relevant resources such as software tools, data sets, experimental animals and antibodies across the Internet. From the outset, NIF sought to provide an accounting of available resources, while developing technical solutions to finding, accessing and utilizing them. The curators, therefore, are tasked with identifying and registering resources, examining data, writing configuration files to index and display data and keeping the contents current. In the initial phases of the project, all aspects of the registration and curation processes were manual. However, as the number of resources grew, manual curation became impractical. This report describes our experiences and successes with developing automated resource discovery and semiautomated type characterization with text-mining scripts that facilitate curation team efforts to discover, integrate and display new content. We also describe the DISCO framework, a suite of automated web services that significantly reduce manual curation efforts to periodically check for resource updates. Lastly, we discuss DOMEO, a semi-automated annotation tool that improves the discovery and curation of resources that are not necessarily website-based (i.e. reagents, software tools). Although the ultimate goal of automation was to

  15. Egas: a collaborative and interactive document curation platform

    PubMed Central

    Campos, David; Lourenço, Jóni; Matos, Sérgio; Oliveira, José Luís

    2014-01-01

    With the overwhelming amount of biomedical textual information being produced, several manual curation efforts have been set up to extract and store concepts and their relationships into structured resources. As manual annotation is a demanding and expensive task, computerized solutions were developed to perform such tasks automatically. However, high-end information extraction techniques are still not widely used by biomedical research communities, mainly because of the lack of standards and limitations in usability. Interactive annotation tools intend to fill this gap, taking advantage of automatic techniques and existing knowledge bases to assist expert curators in their daily tasks. This article presents Egas, a web-based platform for biomedical text mining and assisted curation with highly usable interfaces for manual and automatic in-line annotation of concepts and relations. A comprehensive set of de facto standard knowledge bases are integrated and indexed to provide straightforward concept normalization features. Real-time collaboration and conversation functionalities allow discussing details of the annotation task as well as providing instant feedback of curator’s interactions. Egas also provides interfaces for on-demand management of the annotation task settings and guidelines, and supports standard formats and literature services to import and export documents. By taking advantage of Egas, we participated in the BioCreative IV interactive annotation task, targeting the assisted identification of protein–protein interactions described in PubMed abstracts related to neuropathological disorders. When evaluated by expert curators, it obtained positive scores in terms of usability, reliability and performance. These results, together with the provided innovative features, place Egas as a state-of-the-art solution for fast and accurate curation of information, facilitating the task of creating and updating knowledge bases and annotated resources

  16. Solid waste projection model: Database version 1. 0 technical reference manual

    SciTech Connect

    Carr, F.; Bowman, A.

    1990-11-01

    The Solid Waste Projection Model (SWPM) system is an analytical tool developed by Pacific Northwest Laboratory (PNL) for Westinghouse Hanford Company (WHC). The SWPM system provides a modeling and analysis environment that supports decisions in the process of evaluating various solid waste management alternatives. This document, one of a series describing the SWPM system, contains detailed information regarding the software and data structures utilized in developing the SWPM Version 1.0 Database. This document is intended for use by experienced database specialists and supports database maintenance, utility development, and database enhancement. Those interested in using the SWPM database should refer to the SWPM Database User's Guide. 14 figs., 6 tabs.

  17. Curation of characterized glycoside hydrolases of fungal origin.

    PubMed

    Murphy, Caitlin; Powlowski, Justin; Wu, Min; Butler, Greg; Tsang, Adrian

    2011-01-01

    Fungi produce a wide range of extracellular enzymes to break down plant cell walls, which are composed mainly of cellulose, lignin and hemicellulose. Among them are the glycoside hydrolases (GH), the largest and most diverse family of enzymes active on these substrates. To facilitate research and development of enzymes for the conversion of cell-wall polysaccharides into fermentable sugars, we have manually curated a comprehensive set of characterized fungal glycoside hydrolases. Characterized glycoside hydrolases were retrieved from protein and enzyme databases, as well as literature repositories. A total of 453 characterized glycoside hydrolases have been cataloged. They come from 131 different fungal species, most of which belong to the phylum Ascomycota. These enzymes represent 46 different GH activities and cover 44 of the 115 CAZy GH families. In addition to enzyme source and enzyme family, available biochemical properties such as temperature and pH optima, specific activity, kinetic parameters and substrate specificities were recorded. To simplify comparative studies, enzyme and species abbreviations have been standardized, Gene Ontology terms assigned and reference to supporting evidence provided. The annotated genes have been organized in a searchable, online database called mycoCLAP (Characterized Lignocellulose-Active Proteins of fungal origin). It is anticipated that this manually curated collection of biochemically characterized fungal proteins will be used to enhance functional annotation of novel GH genes. Database URL: http://mycoCLAP.fungalgenomics.ca/. PMID:21622642
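    The biochemical properties recorded per enzyme (family, activity, pH and temperature optima, substrates) suggest a simple record layout. The field names and the example entry below are illustrative assumptions, not the actual mycoCLAP schema:

    ```python
    from dataclasses import dataclass, field
    from typing import Optional

    @dataclass
    class GHEntry:
        """Illustrative record for one curated glycoside hydrolase."""
        enzyme: str                  # standardized enzyme abbreviation
        species: str                 # standardized species name
        gh_family: int               # CAZy GH family number
        activity: str                # GH activity, e.g. "cellobiohydrolase"
        ph_optimum: Optional[float] = None
        temp_optimum_c: Optional[float] = None
        substrates: list = field(default_factory=list)

    # Example values chosen for illustration only.
    entry = GHEntry("Cel7A", "Trichoderma reesei", 7, "cellobiohydrolase",
                    ph_optimum=5.0, temp_optimum_c=60.0,
                    substrates=["cellulose"])
    print(entry.enzyme, entry.gh_family)
    ```

    Standardizing abbreviations and Gene Ontology terms, as the abstract describes, is what makes records like this comparable across the 131 species in the collection.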

  18. Curation of characterized glycoside hydrolases of Fungal origin

    PubMed Central

    Murphy, Caitlin; Powlowski, Justin; Wu, Min; Butler, Greg; Tsang, Adrian

    2011-01-01

    Fungi produce a wide range of extracellular enzymes to break down plant cell walls, which are composed mainly of cellulose, lignin and hemicellulose. Among them are the glycoside hydrolases (GH), the largest and most diverse family of enzymes active on these substrates. To facilitate research and development of enzymes for the conversion of cell-wall polysaccharides into fermentable sugars, we have manually curated a comprehensive set of characterized fungal glycoside hydrolases. Characterized glycoside hydrolases were retrieved from protein and enzyme databases, as well as literature repositories. A total of 453 characterized glycoside hydrolases have been cataloged. They come from 131 different fungal species, most of which belong to the phylum Ascomycota. These enzymes represent 46 different GH activities and cover 44 of the 115 CAZy GH families. In addition to enzyme source and enzyme family, available biochemical properties such as temperature and pH optima, specific activity, kinetic parameters and substrate specificities were recorded. To simplify comparative studies, enzyme and species abbreviations have been standardized, Gene Ontology terms assigned and reference to supporting evidence provided. The annotated genes have been organized in a searchable, online database called mycoCLAP (Characterized Lignocellulose-Active Proteins of fungal origin). It is anticipated that this manually curated collection of biochemically characterized fungal proteins will be used to enhance functional annotation of novel GH genes. Database URL: http://mycoCLAP.fungalgenomics.ca/ PMID:21622642

  19. Tracking and coordinating an international curation effort for the CCDS Project

    PubMed Central

    Harte, Rachel A.; Farrell, Catherine M.; Loveland, Jane E.; Suner, Marie-Marthe; Wilming, Laurens; Aken, Bronwen; Barrell, Daniel; Frankish, Adam; Wallin, Craig; Searle, Steve; Diekhans, Mark; Harrow, Jennifer; Pruitt, Kim D.

    2012-01-01

    The Consensus Coding Sequence (CCDS) collaboration involves curators at multiple centers with a goal of producing a conservative set of high quality, protein-coding region annotations for the human and mouse reference genome assemblies. The CCDS data set reflects a ‘gold standard’ definition of best supported protein annotations, and corresponding genes, which pass a standard series of quality assurance checks and are supported by manual curation. This data set supports use of genome annotation information by human and mouse researchers for effective experimental design, analysis and interpretation. The CCDS project consists of analysis of automated whole-genome annotation builds to identify identical CDS annotations, quality assurance testing and manual curation support. Identical CDS annotations are tracked with a CCDS identifier (ID) and any future change to the annotated CDS structure must be agreed upon by the collaborating members. CCDS curation guidelines were developed to address some aspects of curation in order to improve initial annotation consistency and to reduce time spent in discussing proposed annotation updates. Here, we present the current status of the CCDS database and details on our procedures to track and coordinate our efforts. We also present the relevant background and reasoning behind the curation standards that we have developed for CCDS database treatment of transcripts that are nonsense-mediated decay (NMD) candidates, for transcripts containing upstream open reading frames, for identifying the most likely translation start codons and for the annotation of readthrough transcripts. Examples are provided to illustrate the application of these guidelines. Database URL: http://www.ncbi.nlm.nih.gov/CCDS/CcdsBrowse.cgi PMID:22434842

  20. Canto: an online tool for community literature curation

    PubMed Central

    Rutherford, Kim M.; Harris, Midori A.; Lock, Antonia; Oliver, Stephen G.; Wood, Valerie

    2014-01-01

    Motivation: Detailed curation of published molecular data is essential for any model organism database. Community curation enables researchers to contribute data from their papers directly to databases, supplementing the activity of professional curators and improving coverage of a growing body of literature. We have developed Canto, a web-based tool that provides an intuitive curation interface for both curators and researchers, to support community curation in the fission yeast database, PomBase. Canto supports curation using OBO ontologies, and can be easily configured for use with any species. Availability: Canto code and documentation are available under an Open Source license from http://curation.pombase.org/. Canto is a component of the Generic Model Organism Database (GMOD) project (http://www.gmod.org/). Contact: helpdesk@pombase.org PMID:24574118

  1. Solid Waste Projection Model: Database (Version 1.4). Technical reference manual

    SciTech Connect

    Blackburn, C.; Cillan, T.

    1993-09-01

    The Solid Waste Projection Model (SWPM) system is an analytical tool developed by Pacific Northwest Laboratory (PNL) for Westinghouse Hanford Company (WHC). The SWPM system provides a modeling and analysis environment that supports decisions in the process of evaluating various solid waste management alternatives. This document, one of a series describing the SWPM system, contains detailed information regarding the software and data structures utilized in developing the SWPM Version 1.4 Database. This document is intended for use by experienced database specialists and supports database maintenance, utility development, and database enhancement. Those interested in using the SWPM database should refer to the SWPM Database User's Guide. This document is available from the PNL Task M Project Manager (D. L. Stiles, 509-372-4358), the PNL Task L Project Manager (L. L. Armacost, 509-372-4304), the WHC Restoration Projects Section Manager (509-372-1443), or the WHC Waste Characterization Manager (509-372-1193).

  2. NeuroTransDB: highly curated and structured transcriptomic metadata for neurodegenerative diseases.

    PubMed

    Bagewadi, Shweta; Adhikari, Subash; Dhrangadhariya, Anjani; Irin, Afroza Khanam; Ebeling, Christian; Namasivayam, Aishwarya Alex; Page, Matthew; Hofmann-Apitius, Martin; Senger, Philipp

    2015-01-01

    Neurodegenerative diseases are chronic, debilitating conditions characterized by progressive loss of neurons; they represent a significant health care burden as the global elderly population continues to grow. Over the past decade, high-throughput technologies such as the Affymetrix GeneChip microarrays have provided new perspectives into the pathomechanisms underlying neurodegeneration. Public transcriptomic data repositories, namely Gene Expression Omnibus and curated ArrayExpress, enable researchers to conduct integrative meta-analysis, increasing the power to detect differentially regulated genes in disease and explore patterns of gene dysregulation across biologically related studies. The reliability of retrospective, large-scale integrative analyses depends on an appropriate combination of related datasets, in turn requiring detailed meta-annotations capturing the experimental setup. In most cases, we observe huge variation in compliance with defined standards for submitted metadata in public databases. Much of the information needed to complete or refine meta-annotations is distributed in the associated publications. For example, tissue preparation or comorbidity information is frequently described in an article's supplementary tables. Several value-added databases have employed additional manual efforts to overcome this limitation. However, none of these databases explicate annotations that distinguish human and animal models in the context of neurodegeneration. Therefore, adopting a more specific disease focus, in combination with dedicated disease ontologies, will better empower the selection of comparable studies with refined annotations to address the research question at hand. In this article, we describe the detailed development of NeuroTransDB, a manually curated database containing metadata annotations for neurodegenerative studies. The database contains more than 20 dimensions of metadata annotations within 31 mouse, 5 rat and 45 human studies, defined in

  3. Preliminary evaluation of the CellFinder literature curation pipeline for gene expression in kidney cells and anatomical parts

    PubMed Central

    Neves, Mariana; Damaschun, Alexander; Mah, Nancy; Lekschas, Fritz; Seltmann, Stefanie; Stachelscheid, Harald; Fontaine, Jean-Fred; Kurtz, Andreas; Leser, Ulf

    2013-01-01

    Biomedical literature curation is the process of automatically and/or manually deriving knowledge from scientific publications and recording it into specialized databases for structured delivery to users. It is a slow, error-prone, complex, costly and yet highly important task. Previous experience has shown that text mining can assist in its many phases, especially in the triage of relevant documents and the extraction of named entities and biological events. Here, we present the curation pipeline of the CellFinder database, a repository of cell research, which includes data derived from literature curation and microarrays to identify cell types, cell lines, organs and so forth, and especially patterns in gene expression. The curation pipeline is based on freely available tools in all text mining steps, as well as the manual validation of extracted data. Preliminary results are presented for a data set of 2376 full texts from which >4500 gene expression events in cell or anatomical part have been extracted. Validation of half of these data resulted in a precision of ∼50%, which indicates that we are on the right track with our pipeline for the proposed task. However, evaluation of the methods shows that there is still room for improvement in the named-entity recognition and that a larger and more robust corpus is needed to achieve a better performance for event extraction. Database URL: http://www.cellfinder.org/ PMID:23599415

  4. SELCTV SYSTEM MANUAL FOR SELCTV AND REFER DATABASES AND THE SELCTV DATA MANAGEMENT PROGRAM

    EPA Science Inventory

    The SELCTV database is a compilation of the side effects of pesticides on arthropod predators and parasitoids that provide biological control of pest arthropods in the agricultural ecosystem. The primary source of side effects data is the published scientific literature; reference...

  5. A visual review of the interactome of LRRK2: Using deep-curated molecular interaction data to represent biology.

    PubMed

    Porras, Pablo; Duesbury, Margaret; Fabregat, Antonio; Ueffing, Marius; Orchard, Sandra; Gloeckner, Christian Johannes; Hermjakob, Henning

    2015-04-01

    Molecular interaction databases are essential resources that enable access to a wealth of information on associations between proteins and other biomolecules. Network graphs generated from these data provide an understanding of the relationships between different proteins in the cell, and network analysis has become a widespread tool supporting -omics analysis. Meaningfully representing this information remains far from trivial and different databases strive to provide users with detailed records capturing the experimental details behind each piece of interaction evidence. A targeted curation approach is necessary to transfer published data generated by primarily low-throughput techniques into interaction databases. In this review we present an example highlighting the value of both targeted curation and the subsequent effective visualization of detailed features of manually curated interaction information. We have curated interactions involving LRRK2, a protein of largely unknown function linked to familial forms of Parkinson's disease, and hosted the data in the IntAct database. This LRRK2-specific dataset was then used to produce different visualization examples highlighting different aspects of the data: the level of confidence in the interaction based on orthogonal evidence, those interactions found under close-to-native conditions, and the enzyme-substrate relationships in different in vitro enzymatic assays. Finally, pathway annotation taken from the Reactome database was overlaid on top of interaction networks to bring biological functional context to interaction maps. PMID:25648416

  7. GSOSTATS Database: USAF Synchronous Satellite Catalog Data Conversion Software. User's Guide and Software Maintenance Manual, Version 2.1

    NASA Technical Reports Server (NTRS)

    Mallasch, Paul G.; Babic, Slavoljub

    1994-01-01

    The United States Air Force (USAF) provides NASA Lewis Research Center with monthly reports containing the Synchronous Satellite Catalog and the associated Two Line Mean Element Sets. The USAF Synchronous Satellite Catalog supplies satellite orbital parameters collected by an automated monitoring system and provided to Lewis Research Center as text files on magnetic tape. Software was developed to facilitate automated formatting, data normalization, cross-referencing, and error correction of Synchronous Satellite Catalog files before loading into the NASA Geosynchronous Satellite Orbital Statistics Database System (GSOSTATS). This document contains the User's Guide and Software Maintenance Manual with information necessary for installation, initialization, start-up, operation, error recovery, and termination of the software application. It also contains implementation details, modification aids, and software source code adaptations for use in future revisions.

  8. Cataloging the biomedical world of pain through semi-automated curation of molecular interactions.

    PubMed

    Jamieson, Daniel G; Roberts, Phoebe M; Robertson, David L; Sidders, Ben; Nenadic, Goran

    2013-01-01

    The vast collection of biomedical literature and its continued expansion has presented a number of challenges to researchers who require structured findings to stay abreast of and analyze molecular mechanisms relevant to their domain of interest. By structuring literature content into topic-specific machine-readable databases, the aggregate data from multiple articles can be used to infer trends that can be compared and contrasted with similar findings from topic-independent resources. Our study presents a generalized procedure for semi-automatically creating a custom topic-specific molecular interaction database through the use of text mining to assist manual curation. We apply the procedure to capture molecular events that underlie 'pain', a complex phenomenon with a large societal burden and unmet medical need. We describe how existing text mining solutions are used to build a pain-specific corpus, extract molecular events from it, add context to the extracted events and assess their relevance. The pain-specific corpus contains 765 692 documents from Medline and PubMed Central, from which we extracted 356 499 unique normalized molecular events, with 261 438 single protein events and 93 271 molecular interactions supplied by BioContext. Event chains are annotated with negation, speculation, anatomy, Gene Ontology terms, mutations, pain and disease relevance, which collectively provide detailed insight into how that event chain is associated with pain. The extracted relations are visualized in a wiki platform (wiki-pain.org) that enables efficient manual curation and exploration of the molecular mechanisms that underlie pain. Curation of 1500 grouped event chains ranked by pain relevance revealed 613 accurately extracted unique molecular interactions that in the future can be used to study the underlying mechanisms involved in pain. Our approach demonstrates that combining existing text mining tools with domain-specific terms and wiki-based visualization can

  10. User manual for CSP_VANA: A check standards measurement and database program for microwave network analyzers

    SciTech Connect

    Duda, L.E.

    1997-10-01

    Vector network analyzers are a convenient way to measure scattering parameters of a variety of microwave devices. However, these instruments, unlike oscilloscopes for example, require a relatively high degree of user knowledge and expertise. Due to the complexity of the instrument and of the calibration process, there are many ways in which an incorrect measurement may be produced. The Microwave Project, which is part of SNL's Primary Standards laboratory, routinely uses check standards to verify that the network analyzer is operating properly. In the past, these measurements were recorded manually and, sometimes, interpretation of the results was problematic. To aid the measurement assurance process, a software program was developed to automatically measure a check standard and compare the new measurements with a historical database of measurements of the same device. The program acquires new measurement data from selected check standards, plots the new data against the mean and standard deviation of prior data for the same check standard, and updates the database files for the check standard. The program is entirely menu-driven, requiring little additional work by the user. This report describes the function of the software, including a discussion of its capabilities, and the way in which the software is used in the lab.

  11. Human Events Reference for ATHEANA (HERA) Database Description and Preliminary User's Manual

    SciTech Connect

    Auflick, J.L.

    1999-08-12

    The Technique for Human Error Analysis (ATHEANA) is a newly developed human reliability analysis (HRA) methodology that aims to facilitate better representation and integration of human performance into probabilistic risk assessment (PRA) modeling and quantification by analyzing risk-significant operating experience in the context of existing behavioral science models. The fundamental premise of ATHEANA is that error-forcing contexts (EFCs), which refer to combinations of equipment/material conditions and performance shaping factors (PSFs), set up or create the conditions under which unsafe actions (UAs) can occur. Because ATHEANA relies heavily on the analysis of operational events that have already occurred as a mechanism for generating creative thinking about possible EFCs, a database of analytical operational events, called the Human Events Reference for ATHEANA (HERA), has been developed to support the methodology. This report documents the initial development efforts for HERA.

  12. Human events reference for ATHEANA (HERA) database description and preliminary user's manual

    SciTech Connect

    Auflick, J.L.; Hahn, H.A.; Pond, D.J.

    1998-05-27

    The Technique for Human Error Analysis (ATHEANA) is a newly developed human reliability analysis (HRA) methodology that aims to facilitate better representation and integration of human performance into probabilistic risk assessment (PRA) modeling and quantification by analyzing risk-significant operating experience in the context of existing behavioral science models. The fundamental premise of ATHEANA is that error-forcing contexts (EFCs), which refer to combinations of equipment/material conditions and performance shaping factors (PSFs), set up or create the conditions under which unsafe actions (UAs) can occur. Because ATHEANA relies heavily on the analysis of operational events that have already occurred as a mechanism for generating creative thinking about possible EFCs, a database, called the Human Events Reference for ATHEANA (HERA), has been developed to support the methodology. This report documents the initial development efforts for HERA.

  13. The Transporter Classification Database

    PubMed Central

    Saier, Milton H.; Reddy, Vamsee S.; Tamang, Dorjee G.; Västermark, Åke

    2014-01-01

    The Transporter Classification Database (TCDB; http://www.tcdb.org) serves as a common reference point for transport protein research. The database contains more than 10 000 non-redundant proteins that represent all currently recognized families of transmembrane molecular transport systems. Proteins in TCDB are organized in a five level hierarchical system, where the first two levels are the class and subclass, the second two are the family and subfamily, and the last one is the transport system. Superfamilies that contain multiple families are included as hyperlinks to the five tier TC hierarchy. TCDB includes proteins from all types of living organisms and is the only transporter classification system that is both universal and recognized by the International Union of Biochemistry and Molecular Biology. It has been expanded by manual curation, contains extensive text descriptions providing structural, functional, mechanistic and evolutionary information, is supported by unique software and is interconnected to many other relevant databases. TCDB is of increasing usefulness to the international scientific community and can serve as a model for the expansion of database technologies. This manuscript describes an update of the database descriptions previously featured in NAR database issues. PMID:24225317
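The five-level hierarchy described above lends itself to a simple dotted-identifier scheme (TC numbers of the form 2.A.1.5.1). A minimal sketch of splitting such an identifier into its five tiers follows; the `parse_tc` helper and `TCNumber` type are illustrative only, not TCDB's own software:

```python
from typing import NamedTuple

class TCNumber(NamedTuple):
    tc_class: str   # level 1: class
    subclass: str   # level 2: subclass
    family: str     # level 3: family
    subfamily: str  # level 4: subfamily
    system: str     # level 5: individual transport system

def parse_tc(tc_id: str) -> TCNumber:
    """Split a dotted TC identifier such as '2.A.1.5.1' into its five tiers."""
    parts = tc_id.split(".")
    if len(parts) != 5:
        raise ValueError(f"expected five dot-separated levels, got {tc_id!r}")
    return TCNumber(*parts)

print(parse_tc("2.A.1.5.1"))
# TCNumber(tc_class='2', subclass='A', family='1', subfamily='5', system='1')
```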

  14. Implications of compositionality in the gene ontology for its curation and usage.

    PubMed

    Ogren, Philip V; Cohen, K Bretonnel; Hunter, Lawrence

    2005-01-01

    In this paper we argue that a richer underlying representational model for the Gene Ontology that captures the implicit compositional structure of GO terms could have a positive impact on two activities crucial to the success of GO: ontology curation and database annotation. We show that many of the new terms added to GO in a one-year span appear to be compositional variations of other terms. We found that 90.2% of the 3,652 new terms added between July 2003 and July 2004 exhibited characteristics of compositionality. We also examine annotations available from the GO Consortium website that are either manually curated or automatically generated. We found that 74.5% and 63.2% of GO terms are seldom, if ever, used in manual and automatic annotations, respectively. We show that there are features that tend to distinguish terms that are used from those that are not. In order to characterize the effect of compositionality on the combinatorial properties of GO, we employ finite state automata that represent sets of GO terms. This representational tool demonstrates how ontologies can grow very fast, and also shows that small conceptual changes can directly result in a large number of changes to the terminology. We argue that the curation and annotation findings we report are influenced by the combinatorial properties that present themselves in an ontology that does not have a model that properly captures the compositional structure of its terms. PMID:15759624
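The combinatorial effect described above — most new GO terms being compositional variants of existing ones — can be illustrated with a small cross-product. The term strings below are made-up GO-style examples, not actual ontology content:

```python
from itertools import product

core_terms = ["apoptotic process", "cell migration", "DNA repair"]
prefixes = ["", "regulation of ", "positive regulation of ",
            "negative regulation of "]

# Crossing 4 prefixes with 3 core terms yields 12 composite terms;
# adding one new core term adds 4 terms at once, which is why a small
# conceptual change can produce many terminology changes.
composites = [prefix + term for prefix, term in product(prefixes, core_terms)]
print(len(composites))  # 12
```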

  15. Curation of Frozen Samples

    NASA Technical Reports Server (NTRS)

    Fletcher, L. A.; Allen, C. C.; Bastien, R.

    2008-01-01

    NASA's Johnson Space Center (JSC) and the Astromaterials Curator are charged by NPD 7100.10D with the curation of all of NASA's extraterrestrial samples, including those from future missions. This responsibility includes the development of new sample handling and preparation techniques; therefore, the Astromaterials Curator must begin developing procedures to preserve, prepare and ship samples at sub-freezing temperatures in order to enable future sample return missions. Such missions might include the return of future frozen samples from permanently-shadowed lunar craters, the nuclei of comets, the surface of Mars, etc. We are demonstrating the ability to curate samples under cold conditions by designing, installing and testing a cold curation glovebox. This glovebox will allow us to store, document, manipulate and subdivide frozen samples while quantifying and minimizing contamination throughout the curation process.

  17. The Arabidopsis Information Resource (TAIR): a model organism database providing a centralized, curated gateway to Arabidopsis biology, research materials and community.

    PubMed

    Rhee, Seung Yon; Beavis, William; Berardini, Tanya Z; Chen, Guanghong; Dixon, David; Doyle, Aisling; Garcia-Hernandez, Margarita; Huala, Eva; Lander, Gabriel; Montoya, Mary; Miller, Neil; Mueller, Lukas A; Mundodi, Suparna; Reiser, Leonore; Tacklind, Julie; Weems, Dan C; Wu, Yihe; Xu, Iris; Yoo, Daniel; Yoon, Jungwon; Zhang, Peifen

    2003-01-01

    Arabidopsis thaliana is the most widely studied plant today. The concerted efforts of over 11 000 researchers and 4000 organizations around the world are generating a rich diversity and quantity of information and materials. This information is made available through a comprehensive on-line resource called the Arabidopsis Information Resource (TAIR) (http://arabidopsis.org), which is accessible via commonly used web browsers and can be searched and downloaded in a number of ways. In the last two years, efforts have been focused on increasing data content and diversity, functionally annotating genes and gene products with controlled vocabularies, and improving data retrieval, analysis and visualization tools. New information includes sequence polymorphisms including alleles, germplasms and phenotypes, Gene Ontology annotations, gene families, protein information, metabolic pathways, gene expression data from microarray experiments and seed and DNA stocks. New data visualization and analysis tools include SeqViewer, which interactively displays the genome from the whole chromosome down to 10 kb of nucleotide sequence, and AraCyc, a metabolic pathway database and map tool that allows overlaying expression data onto the pathway diagrams. Finally, we have recently incorporated seed and DNA stock information from the Arabidopsis Biological Resource Center (ABRC) and implemented a shopping-cart style on-line ordering system. PMID:12519987

  18. Curating the Shelves

    ERIC Educational Resources Information Center

    Schiano, Deborah

    2013-01-01

    Curation: to gather, organize, and present resources in a way that meets information needs and interests, makes sense for virtual as well as physical resources. A Northern New Jersey middle school library made the decision to curate its physical resources according to the needs of its users, and, in so doing, created a shelving system that is,…

  19. ZFIN: enhancements and updates to the zebrafish model organism database

    PubMed Central

    Bradford, Yvonne; Conlin, Tom; Dunn, Nathan; Fashena, David; Frazer, Ken; Howe, Douglas G.; Knight, Jonathan; Mani, Prita; Martin, Ryan; Moxon, Sierra A. T.; Paddock, Holly; Pich, Christian; Ramachandran, Sridhar; Ruef, Barbara J.; Ruzicka, Leyla; Bauer Schaper, Holle; Schaper, Kevin; Shao, Xiang; Singer, Amy; Sprague, Judy; Sprunger, Brock; Van Slyke, Ceri; Westerfield, Monte

    2011-01-01

    ZFIN, the Zebrafish Model Organism Database, http://zfin.org, serves as the central repository and web-based resource for zebrafish genetic, genomic, phenotypic and developmental data. ZFIN manually curates comprehensive data for zebrafish genes, phenotypes, genotypes, gene expression, antibodies, anatomical structures and publications. A wide-ranging collection of web-based search forms and tools facilitates access to integrated views of these data promoting analysis and scientific discovery. Data represented in ZFIN are derived from three primary sources: curation of zebrafish publications, individual research laboratories and collaborations with bioinformatics organizations. Data formats include text, images and graphical representations. ZFIN is a dynamic resource with data added daily as part of our ongoing curation process. Software updates are frequent. Here, we describe recent additions to ZFIN including (i) enhanced access to images, (ii) genomic features, (iii) genome browser, (iv) transcripts, (v) antibodies and (vi) a community wiki for protocols and antibodies. PMID:21036866

  20. DIP: The Database of Interacting Proteins

    DOE Data Explorer

    The DIP Database catalogs experimentally determined interactions between proteins. It combines information from a variety of sources to create a single, consistent set of protein-protein interactions. By interaction, the DIP Database creators mean that two amino acid chains were experimentally identified to bind to each other. The database lists such pairs to aid those studying a particular protein-protein interaction but also those investigating entire regulatory and signaling pathways as well as those studying the organisation and complexity of the protein interaction network at the cellular level. The data stored within the DIP database were curated both manually, by expert curators, and automatically, using computational approaches that utilize knowledge about the protein-protein interaction networks extracted from the most reliable, core subset of the DIP data. It is a relational database that can be searched by protein, sequence, motif, article information, and pathBLAST. The website also serves as an access point to a number of projects related to DIP, such as LiveDIP, The Database of Ligand-Receptor Partners (DLRP) and JDIP. Users have free and open access to DIP after login. [Taken from the DIP Guide and the DIP website] (Specialized Interface) (Registration Required)

  1. Reference sequence (RefSeq) database at NCBI: current status, taxonomic expansion, and functional annotation

    PubMed Central

    O'Leary, Nuala A.; Wright, Mathew W.; Brister, J. Rodney; Ciufo, Stacy; Haddad, Diana; McVeigh, Rich; Rajput, Bhanu; Robbertse, Barbara; Smith-White, Brian; Ako-Adjei, Danso; Astashyn, Alexander; Badretdin, Azat; Bao, Yiming; Blinkova, Olga; Brover, Vyacheslav; Chetvernin, Vyacheslav; Choi, Jinna; Cox, Eric; Ermolaeva, Olga; Farrell, Catherine M.; Goldfarb, Tamara; Gupta, Tripti; Haft, Daniel; Hatcher, Eneida; Hlavina, Wratko; Joardar, Vinita S.; Kodali, Vamsi K.; Li, Wenjun; Maglott, Donna; Masterson, Patrick; McGarvey, Kelly M.; Murphy, Michael R.; O'Neill, Kathleen; Pujar, Shashikant; Rangwala, Sanjida H.; Rausch, Daniel; Riddick, Lillian D.; Schoch, Conrad; Shkeda, Andrei; Storz, Susan S.; Sun, Hanzhen; Thibaud-Nissen, Francoise; Tolstoy, Igor; Tully, Raymond E.; Vatsan, Anjana R.; Wallin, Craig; Webb, David; Wu, Wendy; Landrum, Melissa J.; Kimchi, Avi; Tatusova, Tatiana; DiCuccio, Michael; Kitts, Paul; Murphy, Terence D.; Pruitt, Kim D.

    2016-01-01

    The RefSeq project at the National Center for Biotechnology Information (NCBI) maintains and curates a publicly available database of annotated genomic, transcript, and protein sequence records (http://www.ncbi.nlm.nih.gov/refseq/). The RefSeq project leverages the data submitted to the International Nucleotide Sequence Database Collaboration (INSDC) against a combination of computation, manual curation, and collaboration to produce a standard set of stable, non-redundant reference sequences. The RefSeq project augments these reference sequences with current knowledge including publications, functional features and informative nomenclature. The database currently represents sequences from more than 55 000 organisms (>4800 viruses, >40 000 prokaryotes and >10 000 eukaryotes; RefSeq release 71), ranging from a single record to complete genomes. This paper summarizes the current status of the viral, prokaryotic, and eukaryotic branches of the RefSeq project, reports on improvements to data access and details efforts to further expand the taxonomic representation of the collection. We also highlight diverse functional curation initiatives that support multiple uses of RefSeq data including taxonomic validation, genome annotation, comparative genomics, and clinical testing. We summarize our approach to utilizing available RNA-Seq and other data types in our manual curation process for vertebrate, plant, and other species, and describe a new direction for prokaryotic genomes and protein name management. PMID:26553804
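
RefSeq's stable identifiers encode molecule type and curation status in their accession prefixes; a minimal classifier for the commonly documented prefixes (NM_, XM_, WP_ and so on) might look like the sketch below.

```python
# Map of common RefSeq accession prefixes to (molecule type, curation status),
# following the documented NCBI prefix conventions.
REFSEQ_PREFIXES = {
    "NC_": ("genomic", "curated"),
    "NG_": ("genomic region", "curated"),
    "NM_": ("mRNA", "curated"),
    "NR_": ("non-coding RNA", "curated"),
    "NP_": ("protein", "curated"),
    "XM_": ("mRNA", "model/predicted"),
    "XR_": ("non-coding RNA", "model/predicted"),
    "XP_": ("protein", "model/predicted"),
    "WP_": ("protein", "autonomous (prokaryotic)"),
}

def classify_refseq(accession):
    """Return (molecule type, status) for a RefSeq accession, or None if unknown."""
    return REFSEQ_PREFIXES.get(accession[:3])

print(classify_refseq("NM_003304.2"))  # ('mRNA', 'curated')
```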

  2. Mining clinical attributes of genomic variants through assisted literature curation in Egas.

    PubMed

    Matos, Sérgio; Campos, David; Pinho, Renato; Silva, Raquel M; Mort, Matthew; Cooper, David N; Oliveira, José Luís

    2016-01-01

    The veritable deluge of biological data over recent years has led to the establishment of a considerable number of knowledge resources that compile curated information extracted from the literature and store it in structured form, facilitating its use and exploitation. In this article, we focus on the curation of inherited genetic variants and associated clinical attributes, such as zygosity, penetrance or inheritance mode, and describe the use of Egas for this task. Egas is a web-based platform for text-mining assisted literature curation that focuses on usability through modern design solutions and simple user interactions. Egas offers a flexible and customizable tool that allows defining the concept types and relations of interest for a given annotation task, as well as the ontologies used for normalizing each concept type. Further, annotations may be performed on raw documents or on the results of automated concept identification and relation extraction tools. Users can inspect, correct or remove automatic text-mining results, manually add new annotations, and export the results to standard formats. Egas is compatible with the most recent versions of Google Chrome, Mozilla Firefox, Internet Explorer and Safari and is available for use at https://demo.bmd-software.com/egas/. Database URL: https://demo.bmd-software.com/egas/. PMID:27278817

  3. Mining clinical attributes of genomic variants through assisted literature curation in Egas

    PubMed Central

    Matos, Sérgio; Campos, David; Pinho, Renato; Silva, Raquel M.; Mort, Matthew; Cooper, David N.; Oliveira, José Luís

    2016-01-01

    The veritable deluge of biological data over recent years has led to the establishment of a considerable number of knowledge resources that compile curated information extracted from the literature and store it in structured form, facilitating its use and exploitation. In this article, we focus on the curation of inherited genetic variants and associated clinical attributes, such as zygosity, penetrance or inheritance mode, and describe the use of Egas for this task. Egas is a web-based platform for text-mining assisted literature curation that focuses on usability through modern design solutions and simple user interactions. Egas offers a flexible and customizable tool that allows defining the concept types and relations of interest for a given annotation task, as well as the ontologies used for normalizing each concept type. Further, annotations may be performed on raw documents or on the results of automated concept identification and relation extraction tools. Users can inspect, correct or remove automatic text-mining results, manually add new annotations, and export the results to standard formats. Egas is compatible with the most recent versions of Google Chrome, Mozilla Firefox, Internet Explorer and Safari and is available for use at https://demo.bmd-software.com/egas/. Database URL: https://demo.bmd-software.com/egas/ PMID:27278817

  4. Rice Annotation Database (RAD): a contig-oriented database for map-based rice genomics.

    PubMed

    Ito, Yuichi; Arikawa, Kohji; Antonio, Baltazar A; Ohta, Isamu; Naito, Shinji; Mukai, Yoshiyuki; Shimano, Atsuko; Masukawa, Masatoshi; Shibata, Michie; Yamamoto, Mayu; Ito, Yukiyo; Yokoyama, Junri; Sakai, Yasumichi; Sakata, Katsumi; Nagamura, Yoshiaki; Namiki, Nobukazu; Matsumoto, Takashi; Higo, Kenichi; Sasaki, Takuji

    2005-01-01

    A contig-oriented database for annotation of the rice genome has been constructed to facilitate map-based rice genomics. The Rice Annotation Database has the following functional features: (i) extensive manual annotations of P1-derived artificial chromosome/bacterial artificial chromosome clones can be merged at the chromosome and contig level; (ii) concise visualization of annotation information such as the predicted genes, results of various prediction programs (RiceHMM, Genscan, Genscan+, Fgenesh, GeneMark, etc.), and homology to expressed sequence tags, full-length cDNAs and proteins; (iii) a user-friendly clone/gene query system; (iv) download functions for nucleotide, amino acid and coding sequences; (v) analysis of various features of the genome (GC-content, average value, etc.); and (vi) genome-wide homology search (BLAST) of contig- and chromosome-level genome sequence to allow comparative analysis with the genome sequences of other organisms. As of October 2004, the database contains a total of 215 Mb of sequence with relevant annotation results, including 30 000 manually curated genes. The database provides the latest information on manual annotation as well as a comprehensive structural analysis of various features of the rice genome. It can be accessed at http://rad.dna.affrc.go.jp/. PMID:15608281
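
The GC-content analysis mentioned in feature (v) reduces to a simple base-count fraction; a minimal sketch:

```python
def gc_content(seq):
    """Fraction of G and C bases in a nucleotide sequence (case-insensitive)."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return (seq.count("G") + seq.count("C")) / len(seq)

print(round(gc_content("ATGCGC"), 3))  # 0.667
```

Genome-scale pipelines usually compute this in sliding windows rather than over whole contigs, but the per-window arithmetic is the same.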

  5. HPMCD: the database of human microbial communities from metagenomic datasets and microbial reference genomes.

    PubMed

    Forster, Samuel C; Browne, Hilary P; Kumar, Nitin; Hunt, Martin; Denise, Hubert; Mitchell, Alex; Finn, Robert D; Lawley, Trevor D

    2016-01-01

    The Human Pan-Microbe Communities (HPMC) database (http://www.hpmcd.org/) provides a manually curated, searchable, metagenomic resource to facilitate investigation of human gastrointestinal microbiota. Over the past decade, the application of metagenome sequencing to elucidate the microbial composition and functional capacity present in the human microbiome has revolutionized many concepts in our basic biology. When sufficient high quality reference genomes are available, whole genome metagenomic sequencing can provide direct biological insights and high-resolution classification. The HPMC database provides species level, standardized phylogenetic classification of over 1800 human gastrointestinal metagenomic samples. This is achieved by combining a manually curated list of bacterial genomes from human faecal samples with over 21000 additional reference genomes representing bacteria, viruses, archaea and fungi with manually curated species classification and enhanced sample metadata annotation. A user-friendly, web-based interface provides the ability to search for (i) microbial groups associated with health or disease state, (ii) health or disease states and community structure associated with a microbial group, (iii) the enrichment of a microbial gene or sequence and (iv) enrichment of a functional annotation. The HPMC database enables detailed analysis of human microbial communities and supports research from basic microbiology and immunology to therapeutic development in human health and disease. PMID:26578596

  6. MGDB: a comprehensive database of genes involved in melanoma

    PubMed Central

    Zhang, Di; Zhu, Rongrong; Zhang, Hanqian; Zheng, Chun-Hou; Xia, Junfeng

    2015-01-01

    The Melanoma Gene Database (MGDB) is a manually curated catalog of molecular genetic data relating to genes involved in melanoma. The main purpose of this database is to establish a network of melanoma-related genes and to facilitate the mechanistic study of melanoma tumorigenesis. The entries describing the relationships between melanoma and genes in the current release were manually extracted from PubMed abstracts; cumulatively to date, the database contains 527 human melanoma genes (422 protein-coding and 105 non-coding genes). Each melanoma gene was annotated in seven different aspects (General Information, Expression, Methylation, Mutation, Interaction, Pathway and Drug). In addition, manually curated literature references have also been provided to support the inclusion of each gene in MGDB and establish its association with melanoma. MGDB has a user-friendly web interface with multiple browse and search functions. We hope MGDB will enrich our knowledge about melanoma genetics and serve as a useful complement to the existing public resources. Database URL: http://bioinfo.ahu.edu.cn:8080/Melanoma/index.jsp PMID:26424083

  7. MGDB: a comprehensive database of genes involved in melanoma.

    PubMed

    Zhang, Di; Zhu, Rongrong; Zhang, Hanqian; Zheng, Chun-Hou; Xia, Junfeng

    2015-01-01

    The Melanoma Gene Database (MGDB) is a manually curated catalog of molecular genetic data relating to genes involved in melanoma. The main purpose of this database is to establish a network of melanoma-related genes and to facilitate the mechanistic study of melanoma tumorigenesis. The entries describing the relationships between melanoma and genes in the current release were manually extracted from PubMed abstracts; cumulatively to date, the database contains 527 human melanoma genes (422 protein-coding and 105 non-coding genes). Each melanoma gene was annotated in seven different aspects (General Information, Expression, Methylation, Mutation, Interaction, Pathway and Drug). In addition, manually curated literature references have also been provided to support the inclusion of each gene in MGDB and establish its association with melanoma. MGDB has a user-friendly web interface with multiple browse and search functions. We hope MGDB will enrich our knowledge about melanoma genetics and serve as a useful complement to the existing public resources. Database URL: http://bioinfo.ahu.edu.cn:8080/Melanoma/index.jsp. PMID:26424083

  8. Southern African Treatment Resistance Network (SATuRN) RegaDB HIV drug resistance and clinical management database: supporting patient management, surveillance and research in southern Africa.

    PubMed

    Manasa, Justen; Lessells, Richard; Rossouw, Theresa; Naidu, Kevindra; Van Vuuren, Cloete; Goedhals, Dominique; van Zyl, Gert; Bester, Armand; Skingsley, Andrew; Stott, Katharine; Danaviah, Siva; Chetty, Terusha; Singh, Lavanya; Moodley, Pravi; Iwuji, Collins; McGrath, Nuala; Seebregts, Christopher J; de Oliveira, Tulio

    2014-01-01

    Substantial amounts of data have been generated from patient management and academic exercises designed to better understand the human immunodeficiency virus (HIV) epidemic and design interventions to control it. A number of specialized databases have been designed to manage huge data sets from HIV cohort, vaccine, host genomic and drug resistance studies. Besides databases from cohort studies, most of the online databases contain limited curated data and are thus sequence repositories. HIV drug resistance has been shown to have great potential to derail the progress made thus far through antiretroviral therapy. Thus, considerable resources have been invested in generating drug resistance data for patient management and surveillance purposes. Unfortunately, most of the data currently available relate to subtype B even though >60% of the epidemic is caused by HIV-1 subtype C. A consortium of clinicians, scientists, public health experts and policy makers working in southern Africa came together and formed a network, the Southern African Treatment and Resistance Network (SATuRN), with the aim of increasing curated HIV-1 subtype C and tuberculosis drug resistance data. This article describes the HIV-1 data curation process using the SATuRN Rega database. The data curation is a manual and time-consuming process done by clinical, laboratory and data curation specialists. Access to the highly curated data sets is through applications that are reviewed by the SATuRN executive committee. Examples of research outputs from the analysis of the curated data include trends in the level of transmitted drug resistance in South Africa, analysis of the levels of acquired resistance among patients failing therapy and factors associated with the absence of genotypic evidence of drug resistance among patients failing therapy. All these studies have been important for informing first- and second-line therapy.
This database is a free password-protected open source database available on

  9. Assisted curation of regulatory interactions and growth conditions of OxyR in E. coli K-12

    PubMed Central

    Gama-Castro, Socorro; López-Fuentes, Alejandra; Balderas-Martínez, Yalbi Itzel; Clematide, Simon; Ellendorff, Tilia Renate; Santos-Zavaleta, Alberto; Marques-Madeira, Hernani; Collado-Vides, Julio

    2014-01-01

    Given the current explosion of data within original publications generated in the field of genomics, a recognized bottleneck is the transfer of such knowledge into comprehensive databases. We have for years organized knowledge on transcriptional regulation reported in the original literature of Escherichia coli K-12 into RegulonDB (http://regulondb.ccg.unam.mx), our database that is currently supported by >5000 papers. Here, we report a first step towards the automatic biocuration of growth conditions in this corpus. Using the OntoGene text-mining system (http://www.ontogene.org), we extracted and manually validated regulatory interactions and growth conditions in a new approach based on filters that enable the curator to select informative sentences from preprocessed full papers. Based on a set of 48 papers dealing with oxidative stress by OxyR, we were able to retrieve 100% of the OxyR regulatory interactions present in RegulonDB, including the transcription factors and their effect on target genes. Our strategy was designed to extract, as we did, their growth conditions. This result provides a proof of concept for a more direct and efficient curation process, and enables us to define the strategy of the subsequent steps to be implemented for a semi-automatic curation of original literature dealing with regulation of gene expression in bacteria. This project will enhance the efficiency and quality of the curation of knowledge present in the literature of gene regulation, and contribute to a significant increase in the encoding of the regulatory network of E. coli. RegulonDB Database URL: http://regulondb.ccg.unam.mx OntoGene URL: http://www.ontogene.org PMID:24903516
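
The filter-based selection of informative sentences described above can be approximated, very crudely, by keyword matching: keep a sentence only if it mentions the transcription factor together with a regulation verb. The verb list and example sentences below are illustrative; the actual OntoGene filters are considerably more sophisticated.

```python
import re

# Hypothetical set of regulation verbs for the filter.
REGULATION_VERBS = {"activates", "represses", "induces", "regulates"}

def informative_sentences(text, factor="OxyR"):
    """Keep sentences that mention the transcription factor together with
    a regulation verb -- a crude stand-in for curator-oriented filters."""
    kept = []
    for sentence in re.split(r"(?<=[.!?])\s+", text):
        words = set(sentence.lower().replace(",", " ").split())
        if factor.lower() in words and words & REGULATION_VERBS:
            kept.append(sentence.strip())
    return kept

text = ("OxyR activates katG under oxidative stress. "
        "The cells were grown in LB medium. "
        "Under these conditions, OxyR represses its own promoter.")
print(informative_sentences(text))
```

Only the first and third sentences survive the filter; the growth-condition sentence alone would need a separate, condition-oriented filter, which is exactly the direction the abstract describes.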

  10. Qrator: A web-based curation tool for glycan structures

    PubMed Central

    Eavenson, Matthew; Kochut, Krys J; Miller, John A; Ranzinger, René; Tiemeyer, Michael; Aoki, Kazuhiro; York, William S

    2015-01-01

    Most currently available glycan structure databases use their own proprietary structure representation schema and contain numerous annotation errors. These cause problems when glycan databases are used for the annotation or mining of data generated in the laboratory. Due to the complexity of glycan structures, curating these databases is often a tedious and labor-intensive process. However, rigorously validating glycan structures can be made easier with a curation workflow that incorporates a structure-matching algorithm that compares candidate glycans to a canonical tree that embodies structural features consistent with established mechanisms for the biosynthesis of a particular class of glycans. To this end, we have implemented Qrator, a web-based application that uses a combination of external literature and database references, user annotations and canonical trees to assist and guide researchers in making informed decisions while curating glycans. Using this application, we have started the curation of large numbers of N-glycans, O-glycans and glycosphingolipids. Our curation workflow allows creating and extending canonical trees for these classes of glycans, which have subsequently been used to improve the curation workflow. PMID:25165068

  11. Qrator: a web-based curation tool for glycan structures.

    PubMed

    Eavenson, Matthew; Kochut, Krys J; Miller, John A; Ranzinger, René; Tiemeyer, Michael; Aoki, Kazuhiro; York, William S

    2015-01-01

    Most currently available glycan structure databases use their own proprietary structure representation schema and contain numerous annotation errors. These cause problems when glycan databases are used for the annotation or mining of data generated in the laboratory. Due to the complexity of glycan structures, curating these databases is often a tedious and labor-intensive process. However, rigorously validating glycan structures can be made easier with a curation workflow that incorporates a structure-matching algorithm that compares candidate glycans to a canonical tree that embodies structural features consistent with established mechanisms for the biosynthesis of a particular class of glycans. To this end, we have implemented Qrator, a web-based application that uses a combination of external literature and database references, user annotations and canonical trees to assist and guide researchers in making informed decisions while curating glycans. Using this application, we have started the curation of large numbers of N-glycans, O-glycans and glycosphingolipids. Our curation workflow allows creating and extending canonical trees for these classes of glycans, which have subsequently been used to improve the curation workflow. PMID:25165068
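
The structure-matching step, comparing a candidate glycan to a canonical tree, can be sketched as a recursive tree-embedding check: the candidate is consistent if its root matches and each of its branches maps onto a distinct canonical branch. The residue names below are illustrative assumptions; Qrator's actual algorithm is not detailed in the abstract.

```python
def embeds(candidate, canonical):
    """True if `candidate` (label, children) can be embedded at the root of
    `canonical`: same root label, and each candidate child maps onto a
    distinct canonical child (checked with simple backtracking)."""
    label_c, kids_c = candidate
    label_k, kids_k = canonical
    if label_c != label_k:
        return False

    def assign(remaining, available):
        if not remaining:
            return True
        head, tail = remaining[0], remaining[1:]
        for i, node in enumerate(available):
            if embeds(head, node) and assign(tail, available[:i] + available[i + 1:]):
                return True
        return False

    return assign(kids_c, kids_k)

# Hypothetical canonical core: Man branches on a GlcNAc-GlcNAc stem.
canonical = ("GlcNAc", [("GlcNAc", [("Man", [("Man", []), ("Man", [])])])])
candidate = ("GlcNAc", [("GlcNAc", [("Man", [("Man", [])])])])
print(embeds(candidate, canonical))  # True
```

A candidate carrying a residue absent from the canonical tree fails the check, which is the signal a curator would review.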

  12. The Immune Epitope Database 2.0

    PubMed Central

    Vita, Randi; Zarebski, Laura; Greenbaum, Jason A.; Emami, Hussein; Hoof, Ilka; Salimi, Nima; Damle, Rohini; Sette, Alessandro; Peters, Bjoern

    2010-01-01

    The Immune Epitope Database (IEDB, www.iedb.org) provides a catalog of experimentally characterized B and T cell epitopes, as well as data on Major Histocompatibility Complex (MHC) binding and MHC ligand elution experiments. The database represents the molecular structures recognized by adaptive immune receptors and the experimental contexts in which these molecules were determined to be immune epitopes. Epitopes recognized in humans, nonhuman primates, rodents, pigs, cats and all other tested species are included. Both positive and negative experimental results are captured. Over the course of 4 years, the data from 180 978 experiments were curated manually from the literature, which covers ∼99% of all publicly available information on peptide epitopes mapped in infectious agents (excluding HIV) and 93% of those mapped in allergens. In addition, data that would otherwise be unavailable to the public from 129 186 experiments were submitted directly by investigators. The curation of epitopes related to autoimmunity is expected to be completed by the end of 2010. The database can be queried by epitope structure, source organism, MHC restriction, assay type or host organism, among other criteria. The database structure, as well as its querying, browsing and reporting interfaces, was completely redesigned for the IEDB 2.0 release, which became publicly available in early 2009. PMID:19906713

  13. The BioGRID interaction database: 2013 update.

    PubMed

    Chatr-Aryamontri, Andrew; Breitkreutz, Bobby-Joe; Heinicke, Sven; Boucher, Lorrie; Winter, Andrew; Stark, Chris; Nixon, Julie; Ramage, Lindsay; Kolas, Nadine; O'Donnell, Lara; Reguly, Teresa; Breitkreutz, Ashton; Sellam, Adnane; Chen, Daici; Chang, Christie; Rust, Jennifer; Livstone, Michael; Oughtred, Rose; Dolinski, Kara; Tyers, Mike

    2013-01-01

    The Biological General Repository for Interaction Datasets (BioGRID: http://thebiogrid.org) is an open access archive of genetic and protein interactions that are curated from the primary biomedical literature for all major model organism species. As of September 2012, BioGRID houses more than 500 000 manually annotated interactions from more than 30 model organisms. BioGRID maintains complete curation coverage of the literature for the budding yeast Saccharomyces cerevisiae, the fission yeast Schizosaccharomyces pombe and the model plant Arabidopsis thaliana. A number of themed curation projects in areas of biomedical importance are also supported. BioGRID has established collaborations and/or shares data records for the annotation of interactions and phenotypes with most major model organism databases, including Saccharomyces Genome Database, PomBase, WormBase, FlyBase and The Arabidopsis Information Resource. BioGRID also actively engages with the text-mining community to benchmark and deploy automated tools to expedite curation workflows. BioGRID data are freely accessible through both a user-defined interactive interface and in batch downloads in a wide variety of formats, including PSI-MI 2.5 and tab-delimited files. BioGRID records can also be interrogated and analyzed with a series of new bioinformatics tools, which include a post-translational modification viewer, a graphical viewer, a REST service and a Cytoscape plugin. PMID:23203989
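
Tab-delimited batch downloads of the kind mentioned above can be consumed with the standard csv module. The three-column layout below is a hypothetical simplification for illustration; real BioGRID tab files carry many more columns, so code against the documented column list rather than positions like these.

```python
import csv
import io

def load_interactions(tab_text):
    """Parse tab-delimited interaction records into (gene_a, gene_b, system)
    tuples, skipping the header row. Assumes a simplified 3-column layout."""
    reader = csv.reader(io.StringIO(tab_text), delimiter="\t")
    next(reader)  # skip header row
    return [tuple(row[:3]) for row in reader if len(row) >= 3]

sample = ("GENE_A\tGENE_B\tEXPERIMENTAL_SYSTEM\n"
          "CDC28\tCLN2\tTwo-hybrid\n"
          "RAD53\tDBF4\tAffinity Capture-Western\n")
print(load_interactions(sample))
```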

  14. Reactome: a database of reactions, pathways and biological processes.

    PubMed

    Croft, David; O'Kelly, Gavin; Wu, Guanming; Haw, Robin; Gillespie, Marc; Matthews, Lisa; Caudy, Michael; Garapati, Phani; Gopinath, Gopal; Jassal, Bijay; Jupe, Steven; Kalatskaya, Irina; Mahajan, Shahana; May, Bruce; Ndegwa, Nelson; Schmidt, Esther; Shamovsky, Veronica; Yung, Christina; Birney, Ewan; Hermjakob, Henning; D'Eustachio, Peter; Stein, Lincoln

    2011-01-01

    Reactome (http://www.reactome.org) is a collaboration among groups at the Ontario Institute for Cancer Research, Cold Spring Harbor Laboratory, New York University School of Medicine and The European Bioinformatics Institute, to develop an open source curated bioinformatics database of human pathways and reactions. Recently, we developed a new web site with improved tools for pathway browsing and data analysis. The Pathway Browser is a Systems Biology Graphical Notation (SBGN)-based visualization system that supports zooming, scrolling and event highlighting. It exploits PSICQUIC web services to overlay our curated pathways with molecular interaction data from the Reactome Functional Interaction Network and external interaction databases such as IntAct, BioGRID, ChEMBL, iRefIndex, MINT and STRING. Our Pathway and Expression Analysis tools enable ID mapping, pathway assignment and overrepresentation analysis of user-supplied data sets. To support pathway annotation and analysis in other species, we continue to make orthology-based inferences of pathways in non-human species, applying Ensembl Compara to identify orthologs of curated human proteins in each of 20 other species. The resulting inferred pathway sets can be browsed and analyzed with our Species Comparison tool. Collaborations are also underway to create manually curated data sets on the Reactome framework for chicken, Drosophila and rice. PMID:21067998

  15. BacMet: antibacterial biocide and metal resistance genes database

    PubMed Central

    Pal, Chandan; Bengtsson-Palme, Johan; Rensing, Christopher; Kristiansson, Erik; Larsson, D. G. Joakim

    2014-01-01

    Antibiotic resistance has become a major human health concern due to widespread use, misuse and overuse of antibiotics. In addition to antibiotics, antibacterial biocides and metals can contribute to the development and maintenance of antibiotic resistance in bacterial communities through co-selection. Information on metal and biocide resistance genes, including their sequences and molecular functions, is, however, scattered. Here, we introduce BacMet (http://bacmet.biomedicine.gu.se)—a manually curated database of antibacterial biocide- and metal-resistance genes based on an in-depth review of the scientific literature. The BacMet database contains 470 experimentally verified resistance genes. In addition, the database also contains 25 477 potential resistance genes collected from public sequence repositories. All resistance genes in the BacMet database have been organized according to their molecular function and induced resistance phenotype. PMID:24304895

  16. Curating NASA's Past, Present, and Future Astromaterial Sample Collections

    NASA Technical Reports Server (NTRS)

    Zeigler, R. A.; Allton, J. H.; Evans, C. A.; Fries, M. D.; McCubbin, F. M.; Nakamura-Messenger, K.; Righter, K.; Zolensky, M.; Stansbery, E. K.

    2016-01-01

    The Astromaterials Acquisition and Curation Office at NASA Johnson Space Center (hereafter JSC curation) is responsible for curating all of NASA's extraterrestrial samples. JSC presently curates 9 different astromaterials collections in seven different clean-room suites: (1) Apollo Samples (ISO (International Standards Organization) class 6 + 7); (2) Antarctic Meteorites (ISO 6 + 7); (3) Cosmic Dust Particles (ISO 5); (4) Microparticle Impact Collection (ISO 7; formerly called Space-Exposed Hardware); (5) Genesis Solar Wind Atoms (ISO 4); (6) Stardust Comet Particles (ISO 5); (7) Stardust Interstellar Particles (ISO 5); (8) Hayabusa Asteroid Particles (ISO 5); (9) OSIRIS-REx Spacecraft Coupons and Witness Plates (ISO 7). Additional cleanrooms are currently being planned to house samples from two new collections, Hayabusa 2 (2021) and OSIRIS-REx (2023). In addition to the labs that house the samples, we maintain a wide variety of infrastructure facilities required to support the clean rooms: HEPA-filtered air-handling systems, ultrapure dry gaseous nitrogen systems, an ultrapure water system, and cleaning facilities to provide clean tools and equipment for the labs. We also have sample preparation facilities for making thin sections, microtome sections, and even focused ion-beam sections. We routinely monitor the cleanliness of our clean rooms and infrastructure systems, including measurements of inorganic or organic contamination, weekly airborne particle counts, compositional and isotopic monitoring of liquid N2 deliveries, and daily UPW system monitoring. In addition to the physical maintenance of the samples, we track within our databases the current and ever changing characteristics (weight, location, etc.) of more than 250,000 individually numbered samples across our various collections, as well as more than 100,000 images, and countless "analog" records that record the sample processing records of each individual sample.
JSC Curation is co-located with JSC

  17. MicroScope--an integrated microbial resource for the curation and comparative analysis of genomic and metabolic data.

    PubMed

    Vallenet, David; Belda, Eugeni; Calteau, Alexandra; Cruveiller, Stéphane; Engelen, Stefan; Lajus, Aurélie; Le Fèvre, François; Longin, Cyrille; Mornico, Damien; Roche, David; Rouy, Zoé; Salvignol, Gregory; Scarpelli, Claude; Thil Smith, Adam Alexander; Weiman, Marion; Médigue, Claudine

    2013-01-01

    MicroScope is an integrated platform dedicated to both the methodical updating of microbial genome annotation and to comparative analysis. The resource provides data from completed and ongoing genome projects (automatic and expert annotations), together with data sources from post-genomic experiments (i.e. transcriptomics, mutant collections) allowing users to perfect and improve the understanding of gene functions. MicroScope (http://www.genoscope.cns.fr/agc/microscope) combines tools and graphical interfaces to analyse genomes and to perform the manual curation of gene annotations in a comparative context. Since its first publication in January 2006, the system (previously named MaGe for Magnifying Genomes) has been continuously extended both in terms of data content and analysis tools. The last update of MicroScope was published in 2009 in the Database journal. Today, the resource contains data for >1600 microbial genomes, of which ∼300 are manually curated and maintained by biologists (1200 personal accounts today). Expert annotations are continuously gathered in the MicroScope database (∼50 000 a year), contributing to the improvement of the quality of microbial genomes annotations. Improved data browsing and searching tools have been added, original tools useful in the context of expert annotation have been developed and integrated and the website has been significantly redesigned to be more user-friendly. Furthermore, in the context of the European project Microme (Framework Program 7 Collaborative Project), MicroScope is becoming a resource providing for the curation and analysis of both genomic and metabolic data. An increasing number of projects are related to the study of environmental bacterial (meta)genomes that are able to metabolize a large variety of chemical compounds that may be of high industrial interest. PMID:23193269

  18. NeXO Web: the NeXO ontology database and visualization platform

    PubMed Central

    Dutkowski, Janusz; Ono, Keiichiro; Kramer, Michael; Yu, Michael; Pratt, Dexter; Demchak, Barry; Ideker, Trey

    2014-01-01

    The Network-extracted Ontology (NeXO) is a gene ontology inferred directly from large-scale molecular networks. While most ontologies are constructed through manual expert curation, NeXO uses a principled computational approach which integrates evidence from hundreds of thousands of individual gene and protein interactions to construct a global hierarchy of cellular components and processes. Here, we describe the development of the NeXO Web platform (http://www.nexontology.org)—an online database and graphical user interface for visualizing, browsing and performing term enrichment analysis using NeXO and the gene ontology. The platform applies state-of-the-art web technology and visualization techniques to provide an intuitive framework for investigating biological machinery captured by both data-driven and manually curated ontologies. PMID:24271398
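
Term enrichment analysis of the kind the platform offers is typically a hypergeometric test: given a gene list, how surprising is its overlap with a term's annotated genes? A self-contained sketch using math.comb (the numbers below are made up for illustration):

```python
from math import comb

def enrichment_p(population, term_members, sample, overlap):
    """Hypergeometric P(X >= overlap): probability that a random draw of
    `sample` genes from `population` hits at least `overlap` of the
    `term_members` genes annotated to an ontology term."""
    total = comb(population, sample)
    p = 0.0
    for k in range(overlap, min(term_members, sample) + 1):
        p += comb(term_members, k) * comb(population - term_members, sample - k) / total
    return p

# 5 of our 10 hits fall in a 50-gene term drawn from a 1000-gene background.
p = enrichment_p(population=1000, term_members=50, sample=10, overlap=5)
print(f"{p:.2e}")
```

A small tail probability here flags the term as enriched; real tools additionally correct for testing many terms at once.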

  19. PomBase 2015: updates to the fission yeast database

    PubMed Central

    McDowall, Mark D.; Harris, Midori A.; Lock, Antonia; Rutherford, Kim; Staines, Daniel M.; Bähler, Jürg; Kersey, Paul J.; Oliver, Stephen G.; Wood, Valerie

    2015-01-01

    PomBase (http://www.pombase.org) is the model organism database for the fission yeast Schizosaccharomyces pombe. PomBase provides a central hub for the fission yeast community, supporting both exploratory and hypothesis-driven research. It provides users easy access to data ranging from the sequence level, to molecular and phenotypic annotations, through to the display of genome-wide high-throughput studies. Recent improvements to the site extend annotation specificity, improve usability and allow for monthly data updates. Both in-house curators and community researchers provide manually curated data to PomBase. The genome browser provides access to published high-throughput data sets and the genomes of three additional Schizosaccharomyces species (Schizosaccharomyces cryophilus, Schizosaccharomyces japonicus and Schizosaccharomyces octosporus). PMID:25361970

  20. CAZymes Analysis Toolkit (CAT): web service for searching and analyzing carbohydrate-active enzymes in a newly sequenced organism using CAZy database

    SciTech Connect

    Karpinets, Tatiana V; Park, Byung; Syed, Mustafa H; Uberbacher, Edward C; Leuze, Michael Rex

    2010-01-01

    The Carbohydrate-Active Enzyme (CAZy) database provides a rich set of manually annotated enzymes that degrade, modify, or create glycosidic bonds. Despite rich and invaluable information stored in the database, software tools utilizing this information for annotation of newly sequenced genomes by CAZy families are limited. We have employed two annotation approaches to fill the gap between manually curated high-quality protein sequences collected in the CAZy database and the growing number of other protein sequences produced by genome or metagenome sequencing projects. The first approach is based on a similarity search against the entire non-redundant sequences of the CAZy database. The second approach performs annotation using links or correspondences between the CAZy families and protein family domains. The links were discovered using the association rule learning algorithm applied to sequences from the CAZy database. The approaches complement each other and in combination achieved high specificity and sensitivity when cross-evaluated with the manually curated genomes of Clostridium thermocellum ATCC 27405 and Saccharophagus degradans 2-40. The capability of the proposed framework to predict the function of domains of unknown function (DUFs) and of hypothetical proteins in the genome of Neurospora crassa is demonstrated. The framework is implemented as a Web service, the CAZymes Analysis Toolkit (CAT), and is available at http://cricket.ornl.gov/cgi-bin/cat.cgi.
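    The association-rule step described above links protein family domains to CAZy families via rules scored by support and confidence. A toy sketch of those two quantities (the domain/family identifiers and corpus are illustrative, not the paper's mined rules):

```python
def rule_stats(records, domain, family):
    """Support and confidence for the rule {domain} -> {family};
    each record pairs a set of protein domains with a set of CAZy
    families annotated on one sequence."""
    with_domain = [fams for doms, fams in records if domain in doms]
    with_both = [fams for fams in with_domain if family in fams]
    support = len(with_both) / len(records)
    confidence = len(with_both) / len(with_domain) if with_domain else 0.0
    return support, confidence

# Four toy annotated sequences: (domain set, family set).
records = [
    ({"PF00232"}, {"GH1"}),
    ({"PF00232", "PF01915"}, {"GH1", "GH3"}),
    ({"PF01915"}, {"GH3"}),
    ({"PF00704"}, {"GH18"}),
]
sup, conf = rule_stats(records, "PF00232", "GH1")
```

    A high-confidence rule then lets the annotation pipeline transfer the CAZy family label to any new sequence in which the corresponding domain is detected.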

  1. The National NeuroAIDS Tissue Consortium (NNTC) Database: an integrated database for HIV-related studies

    PubMed Central

    Cserhati, Matyas F.; Pandey, Sanjit; Beaudoin, James J.; Baccaglini, Lorena; Guda, Chittibabu; Fox, Howard S.

    2015-01-01

    We herein present the National NeuroAIDS Tissue Consortium-Data Coordinating Center (NNTC-DCC) database, which is the only available database for neuroAIDS studies that contains data in an integrated, standardized form. This database has been created in conjunction with the NNTC, which provides human tissue and biofluid samples to individual researchers to conduct studies focused on neuroAIDS. The database contains experimental datasets from 1206 subjects for the following categories (which are further broken down into subcategories): gene expression, genotype, proteins, endo-exo-chemicals, morphometrics and other (miscellaneous) data. The database also contains a wide variety of downloadable data and metadata for 95 HIV-related studies covering 170 assays from 61 principal investigators. The data represent 76 tissue types, 25 measurement types, and 38 technology types, and reach a total of 33 017 407 data points. We used the ISA platform to create the database and develop a searchable web interface for querying the data. A gene search tool is also available, which searches for NCBI GEO datasets associated with selected genes. The database is manually curated with many user-friendly features, and is cross-linked to the NCBI, HUGO and PubMed databases. A free registration is required for qualified users to access the database. Database URL: http://nntc-dcc.unmc.edu PMID:26228431

  2. [Proliferative vitreoretinopathy: curative treatment].

    PubMed

    Chiquet, C; Rouberol, F

    2014-10-01

    Proliferative vitreoretinopathy (PVR), which causes contractile fibrocellular membranes that may prevent retinal reattachment, remains one of the most severe complications of rhegmatogenous retinal detachment (RD), with an incidence of 5-11%, and one of the most frequent causes of surgical failure (50-75%). Its severity is due to the complexity of the surgery required to treat patients, and to its uncertain anatomic and functional prognosis. Curative treatment of PVR includes vitrectomy, sometimes associated with phacoemulsification or scleral buckling; systematic peeling of epiretinal membranes, occasionally retinectomy; and systematic retinopexy by endolaser photocoagulation. The current preferred internal tamponade is silicone oil. Silicone oils of various densities are undergoing comparative studies. PMID:24997865

  3. Supervised Learning for Detection of Duplicates in Genomic Sequence Databases

    PubMed Central

    Zobel, Justin; Zhang, Xiuzhen; Verspoor, Karin

    2016-01-01

    Motivation: First identified as an issue in 1996, duplication in biological databases introduces redundancy and even leads to inconsistency when contradictory information appears. The amount of data makes purely manual de-duplication impractical, and existing automatic systems cannot detect duplicates as precisely as can experts. Supervised learning has the potential to address such problems by building automatic systems that learn from expert curation to detect duplicates precisely and efficiently. While machine learning is a mature approach in other duplicate detection contexts, it has seen only preliminary application in genomic sequence databases. Results: We developed and evaluated a supervised duplicate detection method based on an expert curated dataset of duplicates, containing over one million pairs across five organisms derived from genomic sequence databases. We selected 22 features to represent distinct attributes of the database records, and developed a binary model and a multi-class model. Both models achieve promising performance; under cross-validation, the binary model had over 90% accuracy in each of the five organisms, while the multi-class model maintains high accuracy and is more robust in generalisation. We performed an ablation study to quantify the impact of different sequence record features, finding that features derived from meta-data, sequence identity, and alignment quality impact performance most strongly. The study demonstrates that machine learning can be an effective additional tool for de-duplication of genomic sequence databases. All data are available as described in the supplementary material. PMID:27489953
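    The pairwise-feature idea behind the method can be sketched with a few of the attribute types the authors mention (metadata similarity, sequence identity); the exact features, weights and threshold below are invented stand-ins for the trained models:

```python
def similarity_features(rec_a, rec_b):
    """Three illustrative pairwise features (the paper uses 22)."""
    words_a = set(rec_a["description"].lower().split())
    words_b = set(rec_b["description"].lower().split())
    union = words_a | words_b
    desc_jaccard = len(words_a & words_b) / len(union) if union else 0.0
    len_a, len_b = len(rec_a["sequence"]), len(rec_b["sequence"])
    length_ratio = min(len_a, len_b) / max(len_a, len_b)
    same_organism = 1.0 if rec_a["organism"] == rec_b["organism"] else 0.0
    return [desc_jaccard, length_ratio, same_organism]

def is_duplicate(features, weights=(0.5, 0.3, 0.2), threshold=0.8):
    """Stand-in linear scorer; the paper trains supervised models instead."""
    return sum(w * f for w, f in zip(weights, features)) >= threshold

# Invented example records.
a = {"description": "putative kinase protein", "sequence": "MKV" * 50,
     "organism": "Escherichia coli"}
b = {"description": "putative kinase protein", "sequence": "MKV" * 49,
     "organism": "Escherichia coli"}
c = {"description": "hypothetical membrane transporter", "sequence": "A" * 50,
     "organism": "Saccharomyces cerevisiae"}
```

    A trained classifier replaces the hand-set weights, which is exactly where the expert-curated pair labels come in.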

  4. The Protein-DNA Interface database

    PubMed Central

    2010-01-01

    The Protein-DNA Interface database (PDIdb) is a repository containing relevant structural information of Protein-DNA complexes solved by X-ray crystallography and available at the Protein Data Bank. The database includes a simple functional classification of the protein-DNA complexes that consists of three hierarchical levels: Class, Type and Subtype. This classification has been defined and manually curated by humans based on the information gathered from several sources that include PDB, PubMed, CATH, SCOP and COPS. The current version of the database contains only structures with resolution of 2.5 Å or higher, accounting for a total of 922 entries. The major aim of this database is to contribute to the understanding of the main rules that underlie the molecular recognition process between DNA and proteins. To this end, the database is focused on each specific atomic interface rather than on the separated binding partners. Therefore, each entry in this database consists of a single and independent protein-DNA interface. We hope that PDIdb will be useful to many researchers working in fields such as the prediction of transcription factor binding sites in DNA, the study of specificity determinants that mediate enzyme recognition events, engineering and design of new DNA binding proteins with distinct binding specificity and affinity, among others. Finally, due to its friendly and easy-to-use web interface, we hope that PDIdb will also serve educational and teaching purposes. PMID:20482798

  5. DoBISCUIT: a database of secondary metabolite biosynthetic gene clusters

    PubMed Central

    Ichikawa, Natsuko; Sasagawa, Machi; Yamamoto, Mika; Komaki, Hisayuki; Yoshida, Yumi; Yamazaki, Shuji; Fujita, Nobuyuki

    2013-01-01

    This article introduces DoBISCUIT (Database of BIoSynthesis clusters CUrated and InTegrated, http://www.bio.nite.go.jp/pks/), a literature-based, manually curated database of gene clusters for secondary metabolite biosynthesis. Bacterial secondary metabolites often show pharmacologically important activities and can serve as lead compounds and/or candidates for drug development. Biosynthesis of each secondary metabolite is catalyzed by a number of enzymes, usually encoded by a gene cluster. Although many scientific papers describe such gene clusters, the gene information is not always described in a comprehensive manner and the related information is rarely integrated. DoBISCUIT integrates the latest literature information and provides standardized gene/module/domain descriptions related to the gene clusters. PMID:23185043

  6. The Structure–Function Linkage Database

    PubMed Central

    Akiva, Eyal; Brown, Shoshana; Almonacid, Daniel E.; Barber, Alan E.; Custer, Ashley F.; Hicks, Michael A.; Huang, Conrad C.; Lauck, Florian; Mashiyama, Susan T.; Meng, Elaine C.; Mischel, David; Morris, John H.; Ojha, Sunil; Schnoes, Alexandra M.; Stryke, Doug; Yunes, Jeffrey M.; Ferrin, Thomas E.; Holliday, Gemma L.; Babbitt, Patricia C.

    2014-01-01

    The Structure–Function Linkage Database (SFLD, http://sfld.rbvi.ucsf.edu/) is a manually curated classification resource describing structure–function relationships for functionally diverse enzyme superfamilies. Members of such superfamilies are diverse in their overall reactions yet share a common ancestor and some conserved active site features associated with conserved functional attributes such as a partial reaction. Thus, despite their different functions, members of these superfamilies ‘look alike’, making them easy to misannotate. To address this complexity and enable rational transfer of functional features to unknowns only for those members for which we have sufficient functional information, we subdivide superfamily members into subgroups using sequence information, and lastly into families, sets of enzymes known to catalyze the same reaction using the same mechanistic strategy. Browsing and searching options in the SFLD provide access to all of these levels. The SFLD offers manually curated as well as automatically classified superfamily sets, both accompanied by search and download options for all hierarchical levels. Additional information includes multiple sequence alignments, tab-separated files of functional and other attributes, and sequence similarity networks. The latter provide a new and intuitively powerful way to visualize functional trends mapped to the context of sequence similarity. PMID:24271399

  7. Biocuration at the Saccharomyces genome database.

    PubMed

    Skrzypek, Marek S; Nash, Robert S

    2015-08-01

    Saccharomyces Genome Database is an online resource dedicated to managing information about the biology and genetics of the model organism, yeast (Saccharomyces cerevisiae). This information is derived primarily from scientific publications through a process of human curation that involves manual extraction of data and their organization into a comprehensive system of knowledge. This system provides a foundation for further analysis of experimental data coming from research on yeast as well as other organisms. In this review we will demonstrate how biocuration and biocurators add a key component, the biological context, to our understanding of how genes, proteins, genomes and cells function and interact. We will explain the role biocurators play in sifting through the wealth of biological data to incorporate and connect key information. We will also discuss the many ways we assist researchers with their various research needs. We hope to convince the reader that manual curation is vital in converting the flood of data into organized and interconnected knowledge, and that biocurators play an essential role in the integration of scientific information into a coherent model of the cell. PMID:25997651

  8. Biocuration at the Saccharomyces Genome Database

    PubMed Central

    Skrzypek, Marek S.; Nash, Robert S.

    2015-01-01

    Saccharomyces Genome Database is an online resource dedicated to managing information about the biology and genetics of the model organism, yeast (Saccharomyces cerevisiae). This information is derived primarily from scientific publications through a process of human curation that involves manual extraction of data and their organization into a comprehensive system of knowledge. This system provides a foundation for further analysis of experimental data coming from research on yeast as well as other organisms. In this review we will demonstrate how biocuration and biocurators add a key component, the biological context, to our understanding of how genes, proteins, genomes and cells function and interact. We will explain the role biocurators play in sifting through the wealth of biological data to incorporate and connect key information. We will also discuss the many ways we assist researchers with their various research needs. We hope to convince the reader that manual curation is vital in converting the flood of data into organized and interconnected knowledge, and that biocurators play an essential role in the integration of scientific information into a coherent model of the cell. PMID:25997651

  9. DDRprot: a database of DNA damage response-related proteins

    PubMed Central

    Andrés-León, Eduardo; Cases, Ildefonso; Arcas, Aida; Rojas, Ana M.

    2016-01-01

    The DNA Damage Response (DDR) signalling network is an essential system that protects the genome’s integrity. The DDRprot database presented here is a resource that integrates manually curated information on the human DDR network and its sub-pathways. For each particular DDR protein, we present detailed information about its function. If involved in post-translational modifications (PTMs) with each other, we depict the position of the modified residue/s in the three-dimensional structures, when resolved structures are available for the proteins. All this information is linked to the original publication from where it was obtained. Phylogenetic information is also shown, including time of emergence and conservation across 47 selected species, family trees and sequence alignments of homologues. The DDRprot database can be queried by different criteria: pathways, species, evolutionary age or involvement in PTMs. Sequence searches using hidden Markov models can also be used. Database URL: http://ddr.cbbio.es. PMID:27577567

  10. Building a genome database using an object-oriented approach.

    PubMed

    Barbasiewicz, Anna; Liu, Lin; Lang, B Franz; Burger, Gertraud

    2002-01-01

    GOBASE is a relational database that integrates data associated with mitochondria and chloroplasts. The most important data in GOBASE, i.e., molecular sequences and taxonomic information, are obtained from the public sequence data repository at the National Center for Biotechnology Information (NCBI), and are validated by our experts. Maintaining a curated genomic database comes with a towering labor cost, due to the sheer volume of available genomic sequences and the plethora of annotation errors and omissions in records retrieved from public repositories. Here we describe our approach to increase automation of the database population process, thereby reducing manual intervention. As a first step, we used Unified Modeling Language (UML) to construct a list of potential errors. Each case was evaluated independently, and an expert solution was devised, and represented as a diagram. Subsequently, the UML diagrams were used as templates for writing object-oriented automation programs in the Java programming language. PMID:12542407

  11. DDRprot: a database of DNA damage response-related proteins.

    PubMed

    Andrés-León, Eduardo; Cases, Ildefonso; Arcas, Aida; Rojas, Ana M

    2016-01-01

    The DNA Damage Response (DDR) signalling network is an essential system that protects the genome's integrity. The DDRprot database presented here is a resource that integrates manually curated information on the human DDR network and its sub-pathways. For each particular DDR protein, we present detailed information about its function. If involved in post-translational modifications (PTMs) with each other, we depict the position of the modified residue/s in the three-dimensional structures, when resolved structures are available for the proteins. All this information is linked to the original publication from where it was obtained. Phylogenetic information is also shown, including time of emergence and conservation across 47 selected species, family trees and sequence alignments of homologues. The DDRprot database can be queried by different criteria: pathways, species, evolutionary age or involvement in PTMs. Sequence searches using hidden Markov models can also be used. Database URL: http://ddr.cbbio.es. PMID:27577567

  12. The National NeuroAIDS Tissue Consortium (NNTC) Database: an integrated database for HIV-related studies.

    PubMed

    Cserhati, Matyas F; Pandey, Sanjit; Beaudoin, James J; Baccaglini, Lorena; Guda, Chittibabu; Fox, Howard S

    2015-01-01

    We herein present the National NeuroAIDS Tissue Consortium-Data Coordinating Center (NNTC-DCC) database, which is the only available database for neuroAIDS studies that contains data in an integrated, standardized form. This database has been created in conjunction with the NNTC, which provides human tissue and biofluid samples to individual researchers to conduct studies focused on neuroAIDS. The database contains experimental datasets from 1206 subjects for the following categories (which are further broken down into subcategories): gene expression, genotype, proteins, endo-exo-chemicals, morphometrics and other (miscellaneous) data. The database also contains a wide variety of downloadable data and metadata for 95 HIV-related studies covering 170 assays from 61 principal investigators. The data represent 76 tissue types, 25 measurement types, and 38 technology types, and reach a total of 33,017,407 data points. We used the ISA platform to create the database and develop a searchable web interface for querying the data. A gene search tool is also available, which searches for NCBI GEO datasets associated with selected genes. The database is manually curated with many user-friendly features, and is cross-linked to the NCBI, HUGO and PubMed databases. A free registration is required for qualified users to access the database. PMID:26228431

  13. The Comparative Toxicogenomics Database's 10th year anniversary: update 2015

    PubMed Central

    Davis, Allan Peter; Grondin, Cynthia J.; Lennon-Hopkins, Kelley; Saraceni-Richards, Cynthia; Sciaky, Daniela; King, Benjamin L.; Wiegers, Thomas C.; Mattingly, Carolyn J.

    2015-01-01

    Ten years ago, the Comparative Toxicogenomics Database (CTD; http://ctdbase.org/) was developed out of a need to formalize, harmonize and centralize the information on numerous genes and proteins responding to environmental toxic agents across diverse species. CTD's initial approach was to facilitate comparisons of nucleotide and protein sequences of toxicologically significant genes by curating these sequences and electronically annotating them with chemical terms from their associated references. Since then, however, CTD has vastly expanded its scope to robustly represent a triad of chemical–gene, chemical–disease and gene–disease interactions that are manually curated from the scientific literature by professional biocurators using controlled vocabularies, ontologies and structured notation. Today, CTD includes 24 million toxicogenomic connections relating chemicals/drugs, genes/proteins, diseases, taxa, phenotypes, Gene Ontology annotations, pathways and interaction modules. In this 10th year anniversary update, we outline the evolution of CTD, including our increased data content, new ‘Pathway View’ visualization tool, enhanced curation practices, pilot chemical–phenotype results and impending exposure data set. The prototype database originally described in our first report has transformed into a sophisticated resource used actively today to help scientists develop and test hypotheses about the etiologies of environmentally influenced diseases. PMID:25326323
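    The curated chemical–gene and gene–disease interactions described above also support transitive inference of candidate chemical–disease links through shared genes. A toy sketch of that join (the interaction data are invented; CTD's published inference scoring is more elaborate than a simple gene count):

```python
def inferred_chemical_disease(chem_gene, gene_disease):
    """Join chemical-gene and gene-disease links through shared genes;
    each inferred (chemical, disease) pair keeps the genes supporting it."""
    inferred = {}
    for chem, genes in chem_gene.items():
        for gene in genes:
            for disease in gene_disease.get(gene, set()):
                inferred.setdefault((chem, disease), set()).add(gene)
    return inferred

# Invented toy interactions, not CTD content.
chem_gene = {"arsenic": {"TP53", "CASP3"}, "benzene": {"TP53"}}
gene_disease = {"TP53": {"carcinoma"}, "CASP3": {"carcinoma", "neurotoxicity"}}
links = inferred_chemical_disease(chem_gene, gene_disease)
```

    Pairs supported by more shared genes make stronger hypotheses for follow-up, which is the spirit of using the triad to study environmentally influenced disease.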

  14. The Comparative Toxicogenomics Database's 10th year anniversary: update 2015.

    PubMed

    Davis, Allan Peter; Grondin, Cynthia J; Lennon-Hopkins, Kelley; Saraceni-Richards, Cynthia; Sciaky, Daniela; King, Benjamin L; Wiegers, Thomas C; Mattingly, Carolyn J

    2015-01-01

    Ten years ago, the Comparative Toxicogenomics Database (CTD; http://ctdbase.org/) was developed out of a need to formalize, harmonize and centralize the information on numerous genes and proteins responding to environmental toxic agents across diverse species. CTD's initial approach was to facilitate comparisons of nucleotide and protein sequences of toxicologically significant genes by curating these sequences and electronically annotating them with chemical terms from their associated references. Since then, however, CTD has vastly expanded its scope to robustly represent a triad of chemical-gene, chemical-disease and gene-disease interactions that are manually curated from the scientific literature by professional biocurators using controlled vocabularies, ontologies and structured notation. Today, CTD includes 24 million toxicogenomic connections relating chemicals/drugs, genes/proteins, diseases, taxa, phenotypes, Gene Ontology annotations, pathways and interaction modules. In this 10th year anniversary update, we outline the evolution of CTD, including our increased data content, new 'Pathway View' visualization tool, enhanced curation practices, pilot chemical-phenotype results and impending exposure data set. The prototype database originally described in our first report has transformed into a sophisticated resource used actively today to help scientists develop and test hypotheses about the etiologies of environmentally influenced diseases. PMID:25326323

  15. Pancreatic Cancer Database

    PubMed Central

    Thomas, Joji Kurian; Kim, Min-Sik; Balakrishnan, Lavanya; Nanjappa, Vishalakshi; Raju, Rajesh; Marimuthu, Arivusudar; Radhakrishnan, Aneesha; Muthusamy, Babylakshmi; Khan, Aafaque Ahmad; Sakamuri, Sruthi; Tankala, Shantal Gupta; Singal, Mukul; Nair, Bipin; Sirdeshmukh, Ravi; Chatterjee, Aditi; Prasad, T S Keshava; Maitra, Anirban; Gowda, Harsha; Hruban, Ralph H; Pandey, Akhilesh

    2014-01-01

    Pancreatic cancer is the fourth leading cause of cancer-related death in the world. The etiology of pancreatic cancer is heterogeneous with a wide range of alterations that have already been reported at the level of the genome, transcriptome, and proteome. The past decade has witnessed a large number of experimental studies using high-throughput technology platforms to identify genes whose expression at the transcript or protein levels is altered in pancreatic cancer. Based on expression studies, a number of molecules have also been proposed as potential biomarkers for diagnosis and prognosis of this deadly cancer. Currently, there are no repositories which provide an integrative view of multiple Omics data sets from published research on pancreatic cancer. Here, we describe the development of a web-based resource, Pancreatic Cancer Database (PCD; http://www.pancreaticcancerdatabase.org), as a unified platform for pancreatic cancer research. PCD contains manually curated information pertaining to quantitative alterations in miRNA, mRNA, and proteins obtained from small-scale as well as high-throughput studies of pancreatic cancer tissues and cell lines. We believe that PCD will serve as an integrative platform for scientific community involved in pancreatic cancer research. PMID:24839966

  16. The Protein Ensemble Database.

    PubMed

    Varadi, Mihaly; Tompa, Peter

    2015-01-01

    The scientific community's major conceptual notion of structural biology has recently shifted in emphasis from the classical structure-function paradigm due to the emergence of intrinsically disordered proteins (IDPs). As opposed to their folded cousins, these proteins are defined by the lack of a stable 3D fold and a high degree of inherent structural heterogeneity that is closely tied to their function. Due to their flexible nature, solution techniques such as small-angle X-ray scattering (SAXS), nuclear magnetic resonance (NMR) spectroscopy and fluorescence resonance energy transfer (FRET) are particularly well-suited for characterizing their biophysical properties. Computationally derived structural ensembles based on such experimental measurements provide models of the conformational sampling displayed by these proteins, and they may offer valuable insights into the functional consequences of inherent flexibility. The Protein Ensemble Database (http://pedb.vib.be) is the first openly accessible, manually curated online resource storing the ensemble models, protocols used during the calculation procedure, and underlying primary experimental data derived from SAXS and/or NMR measurements. By making this previously inaccessible data freely available to researchers, this novel resource is expected to promote the development of more advanced modelling methodologies, facilitate the design of standardized calculation protocols, and consequently lead to a better understanding of how function arises from the disordered state. PMID:26387108

  17. Genome databases

    SciTech Connect

    Courteau, J.

    1991-10-11

    Since the Genome Project began several years ago, a plethora of databases have been developed or are in the works. They range from the massive Genome Data Base at Johns Hopkins University, the central repository of all gene mapping information, to small databases focusing on single chromosomes or organisms. Some are publicly available, others are essentially private electronic lab notebooks. Still others limit access to a consortium of researchers working on, say, a single human chromosome. An increasing number incorporate sophisticated search and analytical software, while others operate as little more than data lists. In consultation with numerous experts in the field, a list has been compiled of some key genome-related databases. The list was not limited to map and sequence databases but also included the tools investigators use to interpret and elucidate genetic data, such as protein sequence and protein structure databases. Because a major goal of the Genome Project is to map and sequence the genomes of several experimental animals, including E. coli, yeast, fruit fly, nematode, and mouse, the available databases for those organisms are listed as well. The author also includes several databases that are still under development - including some ambitious efforts that go beyond data compilation to create what are being called electronic research communities, enabling many users, rather than just one or a few curators, to add or edit the data and tag it as raw or confirmed.

  18. CAZymes Analysis Toolkit (CAT): web service for searching and analyzing carbohydrate-active enzymes in a newly sequenced organism using CAZy database.

    PubMed

    Park, Byung H; Karpinets, Tatiana V; Syed, Mustafa H; Leuze, Michael R; Uberbacher, Edward C

    2010-12-01

    The Carbohydrate-Active Enzyme (CAZy) database provides a rich set of manually annotated enzymes that degrade, modify, or create glycosidic bonds. Despite rich and invaluable information stored in the database, software tools utilizing this information for annotation of newly sequenced genomes by CAZy families are limited. We have employed two annotation approaches to fill the gap between manually curated high-quality protein sequences collected in the CAZy database and the growing number of other protein sequences produced by genome or metagenome sequencing projects. The first approach is based on a similarity search against the entire nonredundant sequences of the CAZy database. The second approach performs annotation using links or correspondences between the CAZy families and protein family domains. The links were discovered using the association rule learning algorithm applied to sequences from the CAZy database. The approaches complement each other and in combination achieved high specificity and sensitivity when cross-evaluated with the manually curated genomes of Clostridium thermocellum ATCC 27405 and Saccharophagus degradans 2-40. The capability of the proposed framework to predict the function of unknown protein domains and of hypothetical proteins in the genome of Neurospora crassa is demonstrated. The framework is implemented as a Web service, the CAZymes Analysis Toolkit, and is available at http://cricket.ornl.gov/cgi-bin/cat.cgi. PMID:20696711

  19. VHLdb: A database of von Hippel-Lindau protein interactors and mutations.

    PubMed

    Tabaro, Francesco; Minervini, Giovanni; Sundus, Faiza; Quaglia, Federica; Leonardi, Emanuela; Piovesan, Damiano; Tosatto, Silvio C E

    2016-01-01

    Mutations in von Hippel-Lindau tumor suppressor protein (pVHL) predispose to develop tumors affecting specific target organs, such as the retina, epididymis, adrenal glands, pancreas and kidneys. Currently, more than 400 pVHL interacting proteins are either described in the literature or predicted in public databases. This data is scattered among several different sources, slowing down the comprehension of pVHL's biological role. Here we present VHLdb, a novel database collecting available interaction and mutation data on pVHL to provide novel integrated annotations. In VHLdb, pVHL interactors are organized according to two annotation levels, manual and automatic. Mutation data are easily accessible and a novel visualization tool has been implemented. A user-friendly feedback function to improve database content through community-driven curation is also provided. VHLdb presently contains 478 interactors, of which 117 have been manually curated, and 1,074 mutations. This makes it the largest available database for pVHL-related information. VHLdb is available from URL: http://vhldb.bio.unipd.it/. PMID:27511743

  20. MobiDB 2.0: an improved database of intrinsically disordered and mobile proteins

    PubMed Central

    Potenza, Emilio; Domenico, Tomás Di; Walsh, Ian; Tosatto, Silvio C.E.

    2015-01-01

    MobiDB (http://mobidb.bio.unipd.it/) is a database of intrinsically disordered and mobile proteins. Intrinsically disordered regions are key for the function of numerous proteins. Here we provide a new version of MobiDB, a centralized source aimed at providing the most complete picture on different flavors of disorder in protein structures covering all UniProt sequences (currently over 80 million). The database features three levels of annotation: manually curated, indirect and predicted. Manually curated data is extracted from the DisProt database. Indirect data is inferred from PDB structures that are considered an indication of intrinsic disorder. The 10 predictors currently included (three ESpritz flavors, two IUPred flavors, two DisEMBL flavors, GlobPlot, VSL2b and JRONN) enable MobiDB to provide disorder annotations for every protein in absence of more reliable data. The new version also features a consensus annotation and classification for long disordered regions. In order to complement the disorder annotations, MobiDB features additional annotations from external sources. Annotations from the UniProt database include post-translational modifications and linear motifs. Pfam annotations are displayed in graphical form and are link-enabled, allowing the user to visit the corresponding Pfam page for further information. Experimental protein–protein interactions from STRING are also classified for disorder content. PMID:25361972
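    The consensus annotation for long disordered regions can be illustrated as a per-residue majority vote over the predictors, keeping only sufficiently long runs; the vote threshold and minimum region length below are invented parameters, not MobiDB's published ones:

```python
def consensus_disorder(predictions, majority=0.5, min_len=3):
    """Per-residue majority vote over binary disorder predictors,
    reporting only runs of at least min_len residues as (start, end)
    pairs, 0-based inclusive."""
    n = len(predictions[0])
    flags = [sum(p[i] for p in predictions) / len(predictions) >= majority
             for i in range(n)]
    regions, start = [], None
    for i, flag in enumerate(flags + [False]):  # sentinel closes a final run
        if flag and start is None:
            start = i
        elif not flag and start is not None:
            if i - start >= min_len:
                regions.append((start, i - 1))
            start = None
    return regions

# Three toy predictors over a 10-residue sequence (1 = disordered).
preds = [
    [1, 1, 1, 1, 0, 0, 0, 1, 1, 1],
    [1, 1, 1, 0, 0, 0, 0, 1, 0, 1],
    [0, 1, 1, 1, 0, 0, 1, 1, 1, 1],
]
regions = consensus_disorder(preds)
```

    Residues 4–6 never reach a majority vote here, so the sequence splits into two consensus regions rather than one.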

  1. VHLdb: A database of von Hippel-Lindau protein interactors and mutations

    PubMed Central

    Tabaro, Francesco; Minervini, Giovanni; Sundus, Faiza; Quaglia, Federica; Leonardi, Emanuela; Piovesan, Damiano; Tosatto, Silvio C. E.

    2016-01-01

Mutations in the von Hippel-Lindau tumor suppressor protein (pVHL) predispose carriers to tumors affecting specific target organs, such as the retina, epididymis, adrenal glands, pancreas and kidneys. Currently, more than 400 pVHL-interacting proteins are either described in the literature or predicted in public databases. These data are scattered among several different sources, slowing comprehension of pVHL's biological role. Here we present VHLdb, a novel database collecting available interaction and mutation data on pVHL to provide novel integrated annotations. In VHLdb, pVHL interactors are organized according to two annotation levels, manual and automatic. Mutation data are easily accessible and a novel visualization tool has been implemented. A user-friendly feedback function to improve database content through community-driven curation is also provided. VHLdb presently contains 478 interactors, of which 117 have been manually curated, and 1,074 mutations. This makes it the largest available database for pVHL-related information. VHLdb is available at http://vhldb.bio.unipd.it/. PMID:27511743
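The two annotation levels described for VHLdb suggest a merge in which manual curation overrides automatic collection. The sketch below assumes a simple dictionary schema; the field names and gene symbols are illustrative, not VHLdb's actual data model.

```python
# Hypothetical sketch of merging two annotation levels, with manually
# curated entries taking precedence over automatic ones.
def merge_annotations(automatic, manual):
    """Each argument maps an interactor gene symbol to an annotation dict.
    Returns a merged view tagged with its annotation level."""
    merged = {}
    for gene, ann in automatic.items():
        merged[gene] = {**ann, "level": "automatic"}
    for gene, ann in manual.items():  # manual curation overrides automatic
        merged[gene] = {**ann, "level": "manual"}
    return merged

auto = {"HIF1A": {"source": "predicted"}, "EGLN1": {"source": "predicted"}}
curated = {"HIF1A": {"source": "literature"}}
```

Interactors present at both levels keep the manually curated record, while automatic-only interactors remain visible with their lower-confidence tag.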

  2. JSC Stardust Curation Team

    NASA Technical Reports Server (NTRS)

    Zolensky, Michael E.

    2000-01-01

STARDUST, a NASA Discovery-class mission, is the first to return samples from a comet. Grains from comet Wild 2's coma, the gas and dust envelope that surrounds the nucleus, will be collected, as well as interstellar dust. The mission, which launched on February 7, 1999, will encounter the comet on January 10, 2004. As the spacecraft passes through the coma, a tray of silica aerogel will be exposed, and coma grains will impact there and become captured. Following the collection, the aerogel tray is closed for return to Earth in 2006. A dust impact mass spectrometer on board the STARDUST spacecraft will be used to gather spectra of dust during the entire mission, including the coma passage. This instrument will be the best chance to obtain data on volatile grains, which will not be well collected in the aerogel. The dust impact mass spectrometer will also be used to study the composition of interstellar grains. In the past 5 years, analysis of data from dust detectors aboard the Ulysses and Galileo spacecraft has revealed that there is a stream of interstellar dust flowing through our solar system. These grains will be captured during the cruise phase of the STARDUST mission, as the spacecraft travels toward the comet. The sample return capsule will parachute to Earth in February 2006 and will land in western Utah. Once on the ground, the sample return capsule will be placed into a dry nitrogen environment and flown to the curation lab at JSC.

  3. FLOR-ID: an interactive database of flowering-time gene networks in Arabidopsis thaliana

    PubMed Central

    Bouché, Frédéric; Lobet, Guillaume; Tocquin, Pierre; Périlleux, Claire

    2016-01-01

Flowering is a hot topic in Plant Biology, and important progress has been made in Arabidopsis thaliana toward unraveling the genetic networks involved. The increasing complexity and the explosion of literature, however, require the development of new tools for information management and updating. We therefore created an evolutive and interactive database of flowering time genes, named FLOR-ID (Flowering-Interactive Database), which is freely accessible at http://www.flor-id.org. The hand-curated database contains information on 306 genes and links to 1595 publications gathering the work of >4500 authors. Gene/protein functions and interactions within the flowering pathways were inferred from the analysis of related publications, included in the database and translated into interactive manually drawn snapshots. PMID:26476447

  4. FLOR-ID: an interactive database of flowering-time gene networks in Arabidopsis thaliana.

    PubMed

    Bouché, Frédéric; Lobet, Guillaume; Tocquin, Pierre; Périlleux, Claire

    2016-01-01

Flowering is a hot topic in Plant Biology, and important progress has been made in Arabidopsis thaliana toward unraveling the genetic networks involved. The increasing complexity and the explosion of literature, however, require the development of new tools for information management and updating. We therefore created an evolutive and interactive database of flowering time genes, named FLOR-ID (Flowering-Interactive Database), which is freely accessible at http://www.flor-id.org. The hand-curated database contains information on 306 genes and links to 1595 publications gathering the work of >4500 authors. Gene/protein functions and interactions within the flowering pathways were inferred from the analysis of related publications, included in the database and translated into interactive manually drawn snapshots. PMID:26476447

  5. Curation and Computational Design of Bioenergy-Related Metabolic Pathways

    SciTech Connect

    Karp, Peter D.

    2014-09-12

Pathway Tools is a systems-biology software package written by SRI International (SRI) that produces Pathway/Genome Databases (PGDBs) for organisms with a sequenced genome. Pathway Tools also provides a wide range of capabilities for analyzing predicted metabolic networks and user-generated omics data. More than 5,000 academic, industrial, and government groups have licensed Pathway Tools. This user community includes researchers at all three DOE bioenergy centers, as well as academic and industrial metabolic engineering (ME) groups. An integral part of the Pathway Tools software is MetaCyc, a large, multiorganism database of metabolic pathways and enzymes that SRI and its academic collaborators manually curate. This project included two main goals: I. Enhance the MetaCyc content of bioenergy-related enzymes and pathways. II. Develop computational tools for engineering metabolic pathways that satisfy specified design goals, in particular for bioenergy-related pathways. In part I, SRI proposed to significantly expand the coverage of bioenergy-related metabolic information in MetaCyc, followed by the generation of organism-specific PGDBs for all energy-relevant organisms sequenced at the DOE Joint Genome Institute (JGI). Part I objectives included: 1: Expand the content of MetaCyc to include bioenergy-related enzymes and pathways. 2: Enhance the Pathway Tools software to enable display of complex polymer degradation processes. 3: Create new PGDBs for the energy-related organisms sequenced by JGI, update existing PGDBs with new MetaCyc content, and make these data available to JBEI via the BioCyc website. In part II, SRI proposed to develop an efficient computational tool for the engineering of metabolic pathways. Part II objectives included: 4: Develop computational tools for generating metabolic pathways that satisfy specified design goals, enabling users to specify parameters such as starting and ending compounds, and preferred or disallowed intermediate compounds.

  6. TimeLineCurator: Interactive Authoring of Visual Timelines from Unstructured Text.

    PubMed

Fulda, Johanna; Brehmer, Matthew; Munzner, Tamara

    2016-01-01

We present TimeLineCurator, a browser-based authoring tool that automatically extracts event data from temporal references in unstructured text documents using natural language processing and encodes them along a visual timeline. Our goal is to facilitate the timeline creation process for journalists and others who tell temporal stories online. Current solutions involve manually extracting and formatting event data from source documents, a process that tends to be tedious and error prone. With TimeLineCurator, a prospective timeline author can quickly identify the extent of time encompassed by a document, as well as the distribution of events occurring along this timeline. Authors can speculatively browse possible documents to quickly determine whether they are appropriate sources of timeline material. TimeLineCurator provides controls for curating and editing events on a timeline, the ability to combine timelines from multiple source documents, and the export of curated timelines for online deployment. We evaluate TimeLineCurator through a benchmark comparison of entity extraction error against a manual timeline curation process, a preliminary evaluation of the user experience of timeline authoring, a brief qualitative analysis of its visual output, and a discussion of prospective use cases suggested by members of the target author communities following its deployment. PMID:26529709
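The extraction step that TimeLineCurator automates can be sketched with a toy example: pull ISO-style dates out of free text with a regular expression and order them into timeline events. Real temporal tagging handles far more reference types; this minimal version, with an invented labeling heuristic, only illustrates the idea.

```python
# Minimal sketch of date extraction from unstructured text into a sorted
# timeline (regex and labeling heuristic are illustrative only).
import re
from datetime import date

def extract_timeline(text):
    events = []
    for m in re.finditer(r"(\d{4})-(\d{2})-(\d{2})", text):
        y, mo, d = map(int, m.groups())
        # keep a window of preceding words as a crude event label
        start = max(0, m.start() - 30)
        events.append((date(y, mo, d), text[start:m.start()].strip()))
    return sorted(events)

doc = "The probe launched on 1999-02-07 and returned samples on 2006-01-15."
```

Sorting by the parsed date gives the chronological ordering a timeline view needs, regardless of the order the references appear in the document.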

  7. Current Challenges in Development of a Database of Three-Dimensional Chemical Structures.

    PubMed

    Maeda, Miki H

    2015-01-01

We are developing a database named 3DMET, a three-dimensional structure database of natural metabolites. There are two major impediments to the creation of 3D chemical structures from a set of planar structure drawings: the limited accuracy of computer programs and insufficient human resources for manual curation. We have tested some 2D-3D converters to convert 2D structure files from external databases. These automatic conversion processes yielded an excessive number of improper conversions. To ascertain the quality of the conversions, we compared IUPAC International Chemical Identifier (InChI) and canonical SMILES notations before and after conversion. Structures whose notations correspond to each other were regarded as a correct conversion in our present work. We found that chiral inversion is the most serious factor during the improper conversion. In the current stage of our database construction, published books or articles have been resources for additions to our database. Chemicals are usually drawn as pictures on the paper. To save human resources, an optical structure reader was introduced. The program was quite useful but some particular errors were observed during our operation. We hope our trials for producing correct 3D structures will help other developers of chemical programs and curators of chemical databases. PMID:26075200

  8. Current Challenges in Development of a Database of Three-Dimensional Chemical Structures

    PubMed Central

    Maeda, Miki H.

    2015-01-01

We are developing a database named 3DMET, a three-dimensional structure database of natural metabolites. There are two major impediments to the creation of 3D chemical structures from a set of planar structure drawings: the limited accuracy of computer programs and insufficient human resources for manual curation. We have tested some 2D–3D converters to convert 2D structure files from external databases. These automatic conversion processes yielded an excessive number of improper conversions. To ascertain the quality of the conversions, we compared IUPAC International Chemical Identifier (InChI) and canonical SMILES notations before and after conversion. Structures whose notations correspond to each other were regarded as a correct conversion in our present work. We found that chiral inversion is the most serious factor during the improper conversion. In the current stage of our database construction, published books or articles have been resources for additions to our database. Chemicals are usually drawn as pictures on the paper. To save human resources, an optical structure reader was introduced. The program was quite useful but some particular errors were observed during our operation. We hope our trials for producing correct 3D structures will help other developers of chemical programs and curators of chemical databases. PMID:26075200
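The verification idea described above accepts a conversion only when the canonical notation is unchanged afterwards. A real pipeline would canonicalise with a cheminformatics toolkit; in this sketch the notations are treated as already-canonical strings, and the alanine SMILES pair illustrates the chiral-inversion failure mode the authors report.

```python
# Sketch of conversion checking: a 2D-to-3D conversion is accepted only if
# the canonical notation (e.g. canonical SMILES or InChI) matches afterwards.
# Strings here stand in for toolkit-generated canonical notations.
def conversion_ok(before, after):
    return before == after

# Chiral inversion, the most common failure mode reported: @ became @@
l_alanine = "C[C@H](N)C(=O)O"
d_alanine = "C[C@@H](N)C(=O)O"
```

Because stereodescriptors are part of the canonical string, a silent chiral inversion changes the notation and is caught by the comparison.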

  9. Lynx: a database and knowledge extraction engine for integrative medicine.

    PubMed

    Sulakhe, Dinanath; Balasubramanian, Sandhya; Xie, Bingqing; Feng, Bo; Taylor, Andrew; Wang, Sheng; Berrocal, Eduardo; Dave, Utpal; Xu, Jinbo; Börnigen, Daniela; Gilliam, T Conrad; Maltsev, Natalia

    2014-01-01

We have developed Lynx (http://lynx.ci.uchicago.edu), a web-based database and knowledge extraction engine supporting annotation and analysis of experimental data and generation of weighted hypotheses on molecular mechanisms contributing to human phenotypes and disorders of interest. Its underlying knowledge base (LynxKB) integrates various classes of information from >35 public databases and private collections, as well as manually curated data from our group and collaborators. Lynx provides advanced search capabilities and a variety of algorithms for enrichment analysis and network-based gene prioritization to assist the user in extracting meaningful knowledge from LynxKB and experimental data, whereas its service-oriented architecture provides public access to LynxKB and its analytical tools via user-friendly web services and interfaces. PMID:24270788
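Enrichment analysis of the kind mentioned above usually rests on a hypergeometric tail probability: how likely is it that a user's gene list overlaps an annotation category at least as much as observed? The helper below is a generic stdlib-only sketch of that statistic, not Lynx's implementation.

```python
# Hypergeometric enrichment p-value (generic sketch, stdlib only).
from math import comb

def hypergeom_pvalue(N, K, n, x):
    """N genes total, K in the category, n in the user's list, x overlapping.
    Returns P(overlap >= x) under sampling without replacement."""
    return sum(
        comb(K, k) * comb(N - K, n - k) for k in range(x, min(K, n) + 1)
    ) / comb(N, n)
```

With 10 genes, a category of 5, and a list of 5, a perfect overlap of 5 has probability 1/252, which is what tools report as the enrichment p-value for that category.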

  10. Comprehensive coverage of cardiovascular disease data in the disease portals at the Rat Genome Database.

    PubMed

    Wang, Shur-Jen; Laulederkind, Stanley J F; Hayman, G Thomas; Petri, Victoria; Smith, Jennifer R; Tutaj, Marek; Nigam, Rajni; Dwinell, Melinda R; Shimoyama, Mary

    2016-08-01

    Cardiovascular diseases are complex diseases caused by a combination of genetic and environmental factors. To facilitate progress in complex disease research, the Rat Genome Database (RGD) provides the community with a disease portal where genome objects and biological data related to cardiovascular diseases are systematically organized. The purpose of this study is to present biocuration at RGD, including disease, genetic, and pathway data. The RGD curation team uses controlled vocabularies/ontologies to organize data curated from the published literature or imported from disease and pathway databases. These organized annotations are associated with genes, strains, and quantitative trait loci (QTLs), thus linking functional annotations to genome objects. Screen shots from the web pages are used to demonstrate the organization of annotations at RGD. The human cardiovascular disease genes identified by annotations were grouped according to data sources and their annotation profiles were compared by in-house tools and other enrichment tools available to the public. The analysis results show that the imported cardiovascular disease genes from ClinVar and OMIM are functionally different from the RGD manually curated genes in terms of pathway and Gene Ontology annotations. The inclusion of disease genes from other databases enriches the collection of disease genes not only in quantity but also in quality. PMID:27287925
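The comparison of annotation profiles between imported (ClinVar/OMIM) and manually curated gene sets can be illustrated with a set-similarity measure. The Jaccard index below is a generic sketch; the annotation terms are invented stand-ins, not RGD's actual vocabularies.

```python
# Illustrative comparison of two annotation profiles via Jaccard similarity
# (term strings are invented for the example).
def jaccard(a, b):
    a, b = set(a), set(b)
    return len(a & b) / len(a | b) if a | b else 1.0

imported_terms = {"blood circulation", "lipid metabolism"}
curated_terms = {"blood circulation", "smooth muscle contraction", "lipid metabolism"}
```

A low Jaccard score between the two profiles would support the article's observation that imported and manually curated disease genes are functionally different.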

  11. Xenbase: expansion and updates of the Xenopus model organism database

    PubMed Central

    James-Zorn, Christina; Ponferrada, Virgilio G.; Jarabek, Chris J.; Burns, Kevin A.; Segerdell, Erik J.; Lee, Jacqueline; Snyder, Kevin; Bhattacharyya, Bishnu; Karpinka, J. Brad; Fortriede, Joshua; Bowes, Jeff B.; Zorn, Aaron M.; Vize, Peter D.

    2013-01-01

Xenbase (http://www.xenbase.org) is a model organism database that provides genomic, molecular, cellular and developmental biology content to biomedical researchers working with the frog, Xenopus, as well as Xenopus data to workers using other model organisms. As an amphibian, Xenopus serves as a useful evolutionary bridge between invertebrates and more complex vertebrates such as birds and mammals. Xenbase content is collated from a variety of external sources using automated and semi-automated pipelines, then processed via a combination of automated and manual annotation. A link-matching system allows the wide variety of synonyms used to describe biological data on unique features, such as a gene or an anatomical entity, to be used by the database in an equivalent manner. Recent updates to the database include the Xenopus laevis genome, a new Xenopus tropicalis genome build, epigenomic data, collections of RNA and protein sequences associated with genes, more powerful gene expression searches, a community and curated wiki, an extensive set of manually annotated gene expression patterns and a new database module that contains data on over 700 antibodies that are useful for exploring Xenopus cell and developmental biology. PMID:23125366
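The link-matching idea, mapping the many synonyms found in the literature onto one canonical identifier so queries behave equivalently, can be sketched with a small index. The synonym table here is invented for illustration and is not Xenbase's actual data.

```python
# Rough sketch of synonym link-matching: every registered alias resolves to
# the same canonical name, case-insensitively (toy data, not Xenbase's).
class SynonymResolver:
    def __init__(self):
        self._index = {}

    def register(self, canonical, synonyms):
        self._index[canonical.lower()] = canonical
        for s in synonyms:
            self._index[s.lower()] = canonical

    def resolve(self, name):
        return self._index.get(name.lower())

resolver = SynonymResolver()
resolver.register("sox2", ["SOX2", "sox-2"])
```

Any of the registered spellings then retrieves the same database record, which is what makes free-text synonyms usable "in an equivalent manner".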

  12. The ribosomal database project.

    PubMed Central

    Larsen, N; Olsen, G J; Maidak, B L; McCaughey, M J; Overbeek, R; Macke, T J; Marsh, T L; Woese, C R

    1993-01-01

    The Ribosomal Database Project (RDP) is a curated database that offers ribosome data along with related programs and services. The offerings include phylogenetically ordered alignments of ribosomal RNA (rRNA) sequences, derived phylogenetic trees, rRNA secondary structure diagrams and various software packages for handling, analyzing and displaying alignments and trees. The data are available via ftp and electronic mail. Certain analytic services are also provided by the electronic mail server. PMID:8332524

  13. Curation of the genome annotation of Pichia pastoris (Komagataella phaffii) CBS7435 from gene level to protein function.

    PubMed

    Valli, Minoska; Tatto, Nadine E; Peymann, Armin; Gruber, Clemens; Landes, Nils; Ekker, Heinz; Thallinger, Gerhard G; Mattanovich, Diethard; Gasser, Brigitte; Graf, Alexandra B

    2016-09-01

As manually curated and non-automated BLAST analysis of the published Pichia pastoris genome sequences revealed many differences between the gene annotations of the strains GS115 and CBS7435, RNA-Seq analysis, supported by proteomics, was performed to improve the genome annotation. Detailed analysis of sequence alignment and protein domain predictions were made to extend the functional genome annotation to all P. pastoris sequences. This allowed the identification of 492 new ORFs, 4916 hypothetical UTRs and the correction of 341 incorrect ORF predictions, which were mainly due to the presence of upstream ATG or erroneous intron predictions. Moreover, 175 previously erroneously annotated ORFs need to be removed from the annotation. In total, we have annotated 5325 ORFs. Regarding the functionality of those genes, we improved all gene and protein descriptions. Thereby, the percentage of ORFs with functional annotation was increased from 48% to 73%. Furthermore, we defined functional groups, covering 25 biological cellular processes of interest, by grouping all genes that are part of the defined process. All data are presented in the newly launched genome browser and database available at www.pichiagenome.org. In summary, we present a wide spectrum of curation of the P. pastoris genome annotation from gene level to protein function. PMID:27388471

  14. HistoneDB 2.0: a histone database with variants--an integrated resource to explore histones and their variants.

    PubMed

    Draizen, Eli J; Shaytan, Alexey K; Mariño-Ramírez, Leonardo; Talbert, Paul B; Landsman, David; Panchenko, Anna R

    2016-01-01

    Compaction of DNA into chromatin is a characteristic feature of eukaryotic organisms. The core (H2A, H2B, H3, H4) and linker (H1) histone proteins are responsible for this compaction through the formation of nucleosomes and higher order chromatin aggregates. Moreover, histones are intricately involved in chromatin functioning and provide a means for genome dynamic regulation through specific histone variants and histone post-translational modifications. 'HistoneDB 2.0--with variants' is a comprehensive database of histone protein sequences, classified by histone types and variants. All entries in the database are supplemented by rich sequence and structural annotations with many interactive tools to explore and compare sequences of different variants from various organisms. The core of the database is a manually curated set of histone sequences grouped into 30 different variant subsets with variant-specific annotations. The curated set is supplemented by an automatically extracted set of histone sequences from the non-redundant protein database using algorithms trained on the curated set. The interactive web site supports various searching strategies in both datasets: browsing of phylogenetic trees; on-demand generation of multiple sequence alignments with feature annotations; classification of histone-like sequences and browsing of the taxonomic diversity for every histone variant. HistoneDB 2.0 is a resource for the interactive comparative analysis of histone protein sequences and their implications for chromatin function. Database URL: http://www.ncbi.nlm.nih.gov/projects/HistoneDB2.0. PMID:26989147
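The classification of histone-like sequences against a curated set can be sketched as a nearest-exemplar search: assign the query to the variant whose curated sequence it resembles most. The stdlib `difflib` ratio below stands in for the database's trained algorithms, and the exemplars are short N-terminal fragments used only for illustration.

```python
# Hedged sketch of variant classification by similarity to curated exemplars
# (difflib ratio is a stand-in for HistoneDB's trained classifiers).
from difflib import SequenceMatcher

def classify(query, curated):
    """curated: dict mapping variant name -> exemplar sequence fragment."""
    return max(curated, key=lambda v: SequenceMatcher(None, query, curated[v]).ratio())

exemplars = {
    "H2A":   "SGRGKQGGKARAKAK",
    "H2A.Z": "AGGKAGKDSGKAKTK",
    "H4":    "SGRGKGGKGLGKGGA",
}
```

A query with a single substitution still scores highest against its own variant, which is the behavior a curated-set-trained classifier needs before a human curator reviews borderline cases.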

  15. Data Mining in the MetaCyc Family of Pathway Databases

    PubMed Central

    Karp, Peter D.; Paley, Suzanne; Altman, Tomer

    2013-01-01

    Pathway databases collect the bioreactions and molecular interactions that define the processes of life. The MetaCyc family of pathway databases consists of thousands of databases that were derived through computational inference of metabolic pathways from the MetaCyc Pathway/Genome Database (PGDB). In some cases these DBs underwent subsequent manual curation. Curated pathway DBs are now available for most of the major model organisms. Databases in the MetaCyc family are managed using the Pathway Tools software. This chapter presents methods for performing data mining on the MetaCyc family of pathway DBs. We discuss the major data access mechanisms for the family, which include data files in multiple formats; application programming interfaces (APIs) for the Lisp, Java, and Perl languages; and web services. We present an overview of the Pathway Tools schema, an understanding of which is needed to query the DBs. The chapter also presents several interactive data mining tools within Pathway Tools for performing omics data analysis. PMID:23192547

  16. Data mining in the MetaCyc family of pathway databases.

    PubMed

    Karp, Peter D; Paley, Suzanne; Altman, Tomer

    2013-01-01

    Pathway databases collect the bioreactions and molecular interactions that define the processes of life. The MetaCyc family of pathway databases consists of thousands of databases that were derived through computational inference of metabolic pathways from the MetaCyc pathway/genome database (PGDB). In some cases, these DBs underwent subsequent manual curation. Curated pathway DBs are now available for most of the major model organisms. Databases in the MetaCyc family are managed using the Pathway Tools software. This chapter presents methods for performing data mining on the MetaCyc family of pathway DBs. We discuss the major data access mechanisms for the family, which include data files in multiple formats; application programming interfaces (APIs) for the Lisp, Java, and Perl languages; and web services. We present an overview of the Pathway Tools schema, an understanding of which is needed to query the DBs. The chapter also presents several interactive data mining tools within Pathway Tools for performing omics data analysis. PMID:23192547
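One of the access mechanisms listed above is web services. The sketch below only composes a getxml-style request URL for a pathway object; the exact endpoint shape and the frame identifiers are assumptions and should be checked against the current Pathway Tools/BioCyc web-services documentation before use.

```python
# Assumed sketch of building a BioCyc/Pathway Tools web-service request URL
# for one database object; endpoint form and IDs are illustrative.
def pathway_query_url(orgid, frame_id, base="https://websvc.biocyc.org/getxml"):
    """Compose a getxml-style URL for an object identified by orgid:frame-id."""
    return f"{base}?id={orgid}:{frame_id}"
```

Fetching the resulting URL would return an XML description of the object, which client code can then parse under the Pathway Tools schema discussed in the chapter.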

  17. The Rice Annotation Project Database (RAP-DB): 2008 update.

    PubMed

Tanaka, Tsuyoshi; Antonio, Baltazar A; Kikuchi, Shoshi; Matsumoto, Takashi; Nagamura, Yoshiaki; Numa, Hisataka; Sakai, Hiroaki; Wu, Jianzhong; Itoh, Takeshi; Sasaki, Takuji; Aono, Ryo; Fujii, Yasuyuki; Habara, Takuya; Harada, Erimi; Kanno, Masako; Kawahara, Yoshihiro; Kawashima, Hiroaki; Kubooka, Hiromi; Matsuya, Akihiro; Nakaoka, Hajime; Saichi, Naomi; Sanbonmatsu, Ryoko; Sato, Yoshiharu; Shinso, Yuji; Suzuki, Mami; Takeda, Jun-ichi; Tanino, Motohiko; Todokoro, Fusano; Yamaguchi, Kaori; Yamamoto, Naoyuki; Yamasaki, Chisato; Imanishi, Tadashi; Okido, Toshihisa; Tada, Masahito; Ikeo, Kazuho; Tateno, Yoshio; Gojobori, Takashi; Lin, Yao-Cheng; Wei, Fu-Jin; Hsing, Yue-ie; Zhao, Qiang; Han, Bin; Kramer, Melissa R; McCombie, Richard W; Lonsdale, David; O'Donovan, Claire C; Whitfield, Eleanor J; Apweiler, Rolf; Koyanagi, Kanako O; Khurana, Jitendra P; Raghuvanshi, Saurabh; Singh, Nagendra K; Tyagi, Akhilesh K; Haberer, Georg; Fujisawa, Masaki; Hosokawa, Satomi; Ito, Yukiyo; Ikawa, Hiroshi; Shibata, Michie; Yamamoto, Mayu; Bruskiewich, Richard M; Hoen, Douglas R; Bureau, Thomas E; Namiki, Nobukazu; Ohyanagi, Hajime; Sakai, Yasumichi; Nobushima, Satoshi; Sakata, Katsumi; Barrero, Roberto A; Sato, Yutaka; Souvorov, Alexandre; Smith-White, Brian; Tatusova, Tatiana; An, Suyoung; An, Gynheung; OOta, Satoshi; Fuks, Galina; Messing, Joachim; Christie, Karen R; Lieberherr, Damien; Kim, HyeRan; Zuccolo, Andrea; Wing, Rod A; Nobuta, Kan; Green, Pamela J; Lu, Cheng; Meyers, Blake C; Chaparro, Cristian; Piegu, Benoit; Panaud, Olivier; Echeverria, Manuel

    2008-01-01

The Rice Annotation Project Database (RAP-DB) was created to provide the genome sequence assembly of the International Rice Genome Sequencing Project (IRGSP), manually curated annotation of the sequence, and other genomics information that could be useful for comprehensive understanding of rice biology. Since the last publication of the RAP-DB, the IRGSP genome has been revised and reassembled. In addition, a large number of rice-expressed sequence tags have been released, and functional genomics resources have been produced worldwide. Thus, we have thoroughly updated our genome annotation by manual curation of all the functional descriptions of rice genes. The latest version of the RAP-DB contains a variety of annotation data as follows: clone positions, structures and functions of 31 439 genes validated by cDNAs, RNA genes detected by massively parallel signature sequencing (MPSS) technology and sequence similarity, flanking sequences of mutant lines, transposable elements, etc. Other annotation data such as Gnomon can be displayed along with those of RAP for comparison. We have also developed a new keyword search system to allow the user to access useful information. The RAP-DB is available at: http://rapdb.dna.affrc.go.jp/ and http://rapdb.lab.nig.ac.jp/. PMID:18089549

  18. Managing biological networks by using text mining and computer-aided curation

    NASA Astrophysics Data System (ADS)

    Yu, Seok Jong; Cho, Yongseong; Lee, Min-Ho; Lim, Jongtae; Yoo, Jaesoo

    2015-11-01

In order to understand a biological mechanism in a cell, a researcher must collect a huge number of protein interactions, with experimental data, from experiments and the literature. Text mining systems that extract biological interactions from papers have been used to construct biological networks for a few decades. Even though text mining of the literature is necessary to construct a biological network, few systems with a text mining tool are available to biologists who want to construct their own biological networks. We have developed a biological network construction system called BioKnowledge Viewer that can generate a biological interaction network by using a text mining tool and biological taggers. It also provides Boolean simulation software, a biological modeling system for simulating the models built with the text mining tool. A user can download PubMed articles and construct a biological network by using the Multi-level Knowledge Emergence Model (KMEM), MetaMap, and A Biomedical Named Entity Recognizer (ABNER) as text mining tools. To evaluate the system, we constructed an aging-related biological network consisting of 9,415 nodes (genes) by using manual curation. With network analysis, we found that several genes, including JNK, AP-1, and BCL-2, were highly related in the aging network. We provide a semi-automatic curation environment so that users can obtain a graph database for managing text mining results generated in the server system and can navigate the network with BioKnowledge Viewer, which is freely available at http://bioknowledgeviewer.kisti.re.kr.
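A tiny rule-based version of the extraction step such systems perform can be shown with one pattern capturing "X interacts with Y" style statements. Real pipelines like the one described combine entity taggers, many patterns, and disambiguation; this regex is only a toy.

```python
# Toy rule-based interaction extraction: a single pattern over one sentence
# (real systems use taggers such as ABNER plus many more rules).
import re

PATTERN = re.compile(
    r"\b([A-Z][A-Za-z0-9-]+)\s+(?:interacts with|binds|activates)\s+([A-Z][A-Za-z0-9-]+)"
)

def extract_interactions(sentence):
    return [(a, b) for a, b in PATTERN.findall(sentence)]

text = "JNK activates AP-1, and BCL-2 interacts with BAX."
```

Each match yields a candidate edge for the network; in a curation workflow these candidates would be reviewed before being committed to the graph database.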

  19. Teacher Training in Curative Education.

    ERIC Educational Resources Information Center

    Juul, Kristen D.; Maier, Manfred

    1992-01-01

    This article considers the application of the philosophical and educational principles of Rudolf Steiner, called "anthroposophy," to the training of teachers and curative educators in the Waldorf schools. Special emphasis is on the Camphill movement which focuses on therapeutic schools and communities for children with special needs. (DB)

  20. Cognitive Curations of Collaborative Curricula

    ERIC Educational Resources Information Center

    Ackerman, Amy S.

    2015-01-01

    Assuming the role of learning curators, 22 graduate students (in-service teachers) addressed authentic problems (challenges) within their respective classrooms by selecting digital tools as part of implementation of interdisciplinary lesson plans. Students focused on formative assessment tools as a means to gather evidence to make improvements in…

  1. How should the completeness and quality of curated nanomaterial data be evaluated?

    PubMed

    Marchese Robinson, Richard L; Lynch, Iseult; Peijnenburg, Willie; Rumble, John; Klaessig, Fred; Marquardt, Clarissa; Rauscher, Hubert; Puzyn, Tomasz; Purian, Ronit; Åberg, Christoffer; Karcher, Sandra; Vriens, Hanne; Hoet, Peter; Hoover, Mark D; Hendren, Christine Ogilvie; Harper, Stacey L

    2016-05-21

    Nanotechnology is of increasing significance. Curation of nanomaterial data into electronic databases offers opportunities to better understand and predict nanomaterials' behaviour. This supports innovation in, and regulation of, nanotechnology. It is commonly understood that curated data need to be sufficiently complete and of sufficient quality to serve their intended purpose. However, assessing data completeness and quality is non-trivial in general and is arguably especially difficult in the nanoscience area, given its highly multidisciplinary nature. The current article, part of the Nanomaterial Data Curation Initiative series, addresses how to assess the completeness and quality of (curated) nanomaterial data. In order to address this key challenge, a variety of related issues are discussed: the meaning and importance of data completeness and quality, existing approaches to their assessment and the key challenges associated with evaluating the completeness and quality of curated nanomaterial data. Considerations which are specific to the nanoscience area and lessons which can be learned from other relevant scientific disciplines are considered. Hence, the scope of this discussion ranges from physicochemical characterisation requirements for nanomaterials and interference of nanomaterials with nanotoxicology assays to broader issues such as minimum information checklists, toxicology data quality schemes and computational approaches that facilitate evaluation of the completeness and quality of (curated) data. This discussion is informed by a literature review and a survey of key nanomaterial data curation stakeholders. Finally, drawing upon this discussion, recommendations are presented concerning the central question: how should the completeness and quality of curated nanomaterial data be evaluated? PMID:27143028
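One concrete reading of "completeness" from the discussion above is a score against a minimum-information checklist: what fraction of required fields does a curated record actually fill? The field names below are invented stand-ins for a real reporting standard.

```python
# Sketch of checklist-based completeness scoring for a curated nanomaterial
# record (checklist fields are illustrative, not a published standard).
CHECKLIST = ["core_composition", "size_distribution", "surface_charge", "assay_medium"]

def completeness(record, checklist=CHECKLIST):
    filled = [f for f in checklist if record.get(f) not in (None, "")]
    return len(filled) / len(checklist)

record = {
    "core_composition": "TiO2",
    "size_distribution": "21 nm (TEM)",
    "surface_charge": None,  # measured but not reported
}
```

Scores like this only capture presence, not quality; as the article stresses, assessing whether a filled field is trustworthy is the harder, discipline-specific part.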

  2. TOWARDS PATHWAY CURATION THROUGH LITERATURE MINING – A CASE STUDY USING PHARMGKB

    PubMed Central

    RAVIKUMAR, K.E.; WAGHOLIKAR, KAVISHWAR B.; LIU, HONGFANG

    2014-01-01

The creation of biological pathway knowledge bases is largely driven by manual effort to curate evidence from the scientific literature. It is highly challenging for curators to keep up with the literature. Text mining applications have been developed in the last decade to assist human curators in speeding up the curation pace; the majority of them aim to identify the most relevant papers for curation, with little attempt to directly extract pathway information from text. In this paper, we describe a rule-based literature mining system to extract pathway information from text. We evaluated the system using curated pharmacokinetic (PK) and pharmacodynamic (PD) pathways in PharmGKB. The system achieved an F-measure of 63.11% and 34.99% for entity extraction and event extraction, respectively, against all PubMed abstracts cited in PharmGKB. It may be possible to improve system performance by incorporating statistical machine learning approaches. This study also helped us gain insights into the barriers towards automated event extraction from text for pathway curation. PMID:24297561

  3. Towards pathway curation through literature mining--a case study using PharmGKB.

    PubMed

    Ravikumar, K E; Wagholikar, Kavishwar B; Liu, Hongfang

    2014-01-01

    The creation of biological pathway knowledge bases is largely driven by manual curation based on evidence from the scientific literature, and it is highly challenging for curators to keep up with that literature. Text mining applications have been developed over the last decade to help human curators speed up the curation pace; most of them aim to identify the most relevant papers for curation, with little attempt to directly extract pathway information from text. In this paper, we describe a rule-based literature mining system to extract pathway information from text. We evaluated the system using curated pharmacokinetic (PK) and pharmacodynamic (PD) pathways in PharmGKB. The system achieved F-measures of 63.11% and 34.99% for entity extraction and event extraction, respectively, against all PubMed abstracts cited in PharmGKB. It may be possible to improve system performance by incorporating statistical machine learning approaches. This study also helped us gain insights into the barriers to automated event extraction from text for pathway curation. PMID:24297561
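
F-measure figures like the 63.11% and 34.99% above are the standard harmonic mean of precision and recall over extracted items. A minimal sketch of that computation (illustrative only, with made-up counts; not the authors' evaluation code):

```python
def precision_recall_f1(tp, fp, fn):
    """Compute precision, recall and F-measure from raw true/false positive
    and false negative counts, guarding against division by zero."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)) if precision + recall else 0.0
    return precision, recall, f1

# Hypothetical counts for an entity-extraction run (not from the paper).
p, r, f = precision_recall_f1(tp=650, fp=320, fn=410)
```

With these hypothetical counts, precision is 650/970 and recall 650/1060, giving an F-measure of roughly 0.64.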

  4. NPInter v3.0: an upgraded database of noncoding RNA-associated interactions

    PubMed Central

    Hao, Yajing; Wu, Wei; Li, Hui; Yuan, Jiao; Luo, Jianjun; Zhao, Yi; Chen, Runsheng

    2016-01-01

    Although a large number of noncoding RNAs (ncRNAs) have been identified, their functions remain unclear. To give researchers a better understanding of ncRNA functions, we updated the NPInter database to version 3.0, which contains experimentally verified interactions between ncRNAs (excluding tRNAs and rRNAs), especially long noncoding RNAs (lncRNAs), and other biomolecules (proteins, mRNAs, miRNAs and genomic DNAs). In NPInter v3.0, interactions pertaining to ncRNAs are not only manually curated from the scientific literature but also curated from high-throughput technologies. In addition, we also curated lncRNA–miRNA interactions from in silico predictions supported by AGO CLIP-seq data. Compared with NPInter v2.0, the interactions are more informative (with additional information on tissues or cell lines, binding sites, conservation, co-expression values and other features) and more organized (with divisions of data sets by data sources, tissues or cell lines, experiments and other criteria). NPInter v3.0 expands the data set to 491,416 interactions in 188 tissues (or cell lines) from 68 experimental technologies. NPInter v3.0 also improves the user interface and adds new web services, including a local UCSC Genome Browser to visualize binding sites. Additionally, NPInter v3.0 defines a high-confidence set of interactions and predicts the functions of lncRNAs in human and mouse based on the interactions curated in the database. NPInter v3.0 is available at http://www.bioinfo.org/NPInter/. Database URL: http://www.bioinfo.org/NPInter/ PMID:27087310

  7. Kalium: a database of potassium channel toxins from scorpion venom

    PubMed Central

    Kuzmenkov, Alexey I.; Krylov, Nikolay A.; Chugunov, Anton O.; Grishin, Eugene V.; Vassilevski, Alexander A.

    2016-01-01

    Kalium (http://kaliumdb.org/) is a manually curated database that accumulates data on potassium channel toxins purified from scorpion venom (KTx). This open-access resource provides easy access to pages of other databases of interest, such as UniProt, PDB, NCBI Taxonomy Browser, and PubMed. Key achievements of Kalium include strict yet straightforward management of KTx classification based on the unified nomenclature supported by researchers in the field, removal of partial-sequence peptides and of entries supported only by transcriptomic information, classification of β-family toxins, and the addition of a novel λ-family. Molecules presented in the database can be processed by the Clustal Omega server using a one-click option. Molecular masses of mature peptides are calculated, and available activity data are compiled for all KTx. We believe that Kalium is not only of high interest to professional toxinologists, but also of general utility to the scientific community. Database URL: http://kaliumdb.org/ PMID:27087309
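
Calculating the molecular mass of a mature peptide, as Kalium does, amounts to summing the average residue masses and adding one water for the termini. A minimal sketch using standard average residue masses (not Kalium's actual code; it ignores modifications such as disulfide-bond formation and C-terminal amidation, which a real KTx calculation must account for):

```python
# Standard average masses of amino-acid residues (Da).
RESIDUE_MASS = {
    'G': 57.0519, 'A': 71.0788, 'S': 87.0782, 'P': 97.1167,
    'V': 99.1326, 'T': 101.1051, 'C': 103.1388, 'L': 113.1594,
    'I': 113.1594, 'N': 114.1038, 'D': 115.0886, 'Q': 128.1307,
    'K': 128.1741, 'E': 129.1155, 'M': 131.1926, 'H': 137.1411,
    'F': 147.1766, 'R': 156.1875, 'Y': 163.1760, 'W': 186.2132,
}
WATER = 18.0153  # mass of the H2O added at the peptide termini

def average_mass(sequence: str) -> float:
    """Average molecular mass of a linear, unmodified peptide."""
    return sum(RESIDUE_MASS[aa] for aa in sequence) + WATER

# Glycylglycine: two Gly residues plus water, about 132.12 Da.
mass_gg = average_mass("GG")
```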

  8. Clean and Cold Sample Curation

    NASA Technical Reports Server (NTRS)

    Allen, C. C.; Agee, C. B.; Beer, R.; Cooper, B. L.

    2000-01-01

    Curation of Mars samples includes both samples that are returned to Earth, and samples that are collected, examined, and archived on Mars. Both kinds of curation operations will require careful planning to ensure that the samples are not contaminated by the instruments that are used to collect and contain them. In both cases, sample examination and subdivision must take place in an environment that is organically, inorganically, and biologically clean. Some samples will need to be prepared for analysis under ultra-clean or cryogenic conditions. Inorganic and biological cleanliness are achievable separately by cleanroom and biosafety lab techniques. Organic cleanliness to the <50 ng/sq cm level requires material control and sorbent removal - techniques being applied in our Class 10 cleanrooms and sample processing gloveboxes.

  9. CTdatabase: a knowledge-base of high-throughput and curated data on cancer-testis antigens.

    PubMed

    Almeida, Luiz Gonzaga; Sakabe, Noboru J; deOliveira, Alice R; Silva, Maria Cristina C; Mundstein, Alex S; Cohen, Tzeela; Chen, Yao-Tseng; Chua, Ramon; Gurung, Sita; Gnjatic, Sacha; Jungbluth, Achim A; Caballero, Otávia L; Bairoch, Amos; Kiesler, Eva; White, Sarah L; Simpson, Andrew J G; Old, Lloyd J; Camargo, Anamaria A; Vasconcelos, Ana Tereza R

    2009-01-01

    The potency of the immune response has still to be harnessed effectively to combat human cancers. However, the discovery of T-cell targets in melanomas and other tumors has raised the possibility that cancer vaccines can be used to induce a therapeutically effective immune response against cancer. The targets, cancer-testis (CT) antigens, are immunogenic proteins preferentially expressed in normal gametogenic tissues and different histological types of tumors. Therapeutic cancer vaccines directed against CT antigens are currently in late-stage clinical trials testing whether they can delay or prevent recurrence of lung cancer and melanoma following surgical removal of primary tumors. CT antigens constitute a large, but ill-defined, family of proteins that exhibit a remarkably restricted expression. Currently, there is a considerable amount of information about these proteins, but the data are scattered through the literature and in several bioinformatic databases. The database presented here, CTdatabase (http://www.cta.lncc.br), unifies this knowledge to facilitate both the mining of the existing deluge of data, and the identification of proteins alleged to be CT antigens, but that do not have their characteristic restricted expression pattern. CTdatabase is more than a repository of CT antigen data, since all the available information was carefully curated and annotated with most data being specifically processed for CT antigens and stored locally. Starting from a compilation of known CT antigens, CTdatabase provides basic information including gene names and aliases, RefSeq accession numbers, genomic location, known splicing variants, gene duplications and additional family members. Gene expression at the mRNA level in normal and tumor tissues has been collated from publicly available data obtained by several different technologies. Manually curated data related to mRNA and protein expression, and antigen-specific immune responses in cancer patients are also

  11. ORegAnno 3.0: a community-driven resource for curated regulatory annotation

    PubMed Central

    Lesurf, Robert; Cotto, Kelsy C.; Wang, Grace; Griffith, Malachi; Kasaian, Katayoon; Jones, Steven J. M.; Montgomery, Stephen B.; Griffith, Obi L.

    2016-01-01

    The Open Regulatory Annotation database (ORegAnno) is a resource for curated regulatory annotation. It contains information about regulatory regions, transcription factor binding sites, RNA binding sites, regulatory variants, haplotypes, and other regulatory elements. ORegAnno differentiates itself from other regulatory resources by facilitating crowd-sourced interpretation and annotation of regulatory observations from the literature and highly curated resources. It contains a comprehensive annotation scheme that aims to describe both the elements and outcomes of regulatory events. Moreover, ORegAnno assembles these disparate data sources and annotations into a single, high quality catalogue of curated regulatory information. The current release is an update of the database previously featured in the NAR Database Issue, and now contains 1 948 307 records, across 18 species, with a combined coverage of 334 215 080 bp. Complete records, annotation, and other associated data are available for browsing and download at http://www.oreganno.org/. PMID:26578589
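
A "combined coverage" figure such as 334 215 080 bp is obtained by merging overlapping record intervals (per sequence) and summing the lengths of the merged runs. A minimal sketch of that interval-union computation (illustrative only, not ORegAnno's actual pipeline; coordinates are assumed half-open):

```python
def merged_coverage(intervals):
    """Total bases covered by a set of (start, end) half-open intervals.

    Overlapping or adjacent intervals are merged before summing lengths,
    so shared bases are counted only once."""
    total = 0
    cur_start = cur_end = None
    for start, end in sorted(intervals):
        if cur_end is None or start > cur_end:   # disjoint: close current run
            if cur_end is not None:
                total += cur_end - cur_start
            cur_start, cur_end = start, end
        else:                                    # overlap/adjacent: extend run
            cur_end = max(cur_end, end)
    if cur_end is not None:
        total += cur_end - cur_start
    return total

# Two overlapping sites and one disjoint site cover [0,15) and [20,25): 20 bp.
cov = merged_coverage([(0, 10), (5, 15), (20, 25)])
```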

  13. WormBase 2014: new views of curated biology

    PubMed Central

    Harris, Todd W.; Baran, Joachim; Bieri, Tamberlyn; Cabunoc, Abigail; Chan, Juancarlos; Chen, Wen J.; Davis, Paul; Done, James; Grove, Christian; Howe, Kevin; Kishore, Ranjana; Lee, Raymond; Li, Yuling; Muller, Hans-Michael; Nakamura, Cecilia; Ozersky, Philip; Paulini, Michael; Raciti, Daniela; Schindelman, Gary; Tuli, Mary Ann; Auken, Kimberly Van; Wang, Daniel; Wang, Xiaodong; Williams, Gary; Wong, J. D.; Yook, Karen; Schedl, Tim; Hodgkin, Jonathan; Berriman, Matthew; Kersey, Paul; Spieth, John; Stein, Lincoln; Sternberg, Paul W.

    2014-01-01

    WormBase (http://www.wormbase.org/) is a highly curated resource dedicated to supporting research using the model organism Caenorhabditis elegans. With an electronic history predating the World Wide Web, WormBase contains information ranging from the sequence and phenotype of individual alleles to genome-wide studies generated using next-generation sequencing technologies. In recent years, we have expanded the contents to include data on additional nematodes of agricultural and medical significance, bringing the knowledge of C. elegans to bear on these systems and providing support for underserved research communities. Manual curation of the primary literature remains a central focus of the WormBase project, providing users with reliable, up-to-date and highly cross-linked information. In this update, we describe efforts to organize the original atomized and highly contextualized curated data into integrated syntheses of discrete biological topics. Next, we discuss our experiences coping with the vast increase in available genome sequences made possible through next-generation sequencing platforms. Finally, we describe some of the features and tools of the new WormBase Web site that help users better find and explore data of interest. PMID:24194605

  14. Rfam 12.0: updates to the RNA families database.

    PubMed

    Nawrocki, Eric P; Burge, Sarah W; Bateman, Alex; Daub, Jennifer; Eberhardt, Ruth Y; Eddy, Sean R; Floden, Evan W; Gardner, Paul P; Jones, Thomas A; Tate, John; Finn, Robert D

    2015-01-01

    The Rfam database (available at http://rfam.xfam.org) is a collection of non-coding RNA families represented by manually curated sequence alignments, consensus secondary structures and annotation gathered from corresponding Wikipedia, taxonomy and ontology resources. In this article, we detail updates and improvements to the Rfam data and website for the Rfam 12.0 release. We describe the upgrade of our search pipeline to use Infernal 1.1 and demonstrate its improved homology detection ability by comparison with the previous version. The new pipeline is easier for users to apply to their own data sets, and we illustrate its ability to annotate RNAs in genomic and metagenomic data sets of various sizes. Rfam has been expanded to include 260 new families, including the well-studied large subunit ribosomal RNA family, and for the first time includes information on short sequence- and structure-based RNA motifs present within families. PMID:25392425
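
Rfam's consensus secondary structures are written in WUSS notation, of which plain dot-bracket strings are the common simplification: matching brackets denote base pairs. Pairing positions can be recovered with a stack; a minimal sketch (handling only round brackets, not full WUSS annotation):

```python
def base_pairs(structure: str):
    """Return sorted (i, j) index pairs for matching '(' and ')' in a
    dot-bracket secondary-structure string; other characters are unpaired."""
    stack, pairs = [], []
    for i, ch in enumerate(structure):
        if ch == '(':
            stack.append(i)
        elif ch == ')':
            if not stack:
                raise ValueError(f"unmatched ')' at position {i}")
            pairs.append((stack.pop(), i))
    if stack:
        raise ValueError(f"unmatched '(' at position {stack[-1]}")
    return sorted(pairs)

# A 4-bp stem closing a 4-nt hairpin loop.
pairs = base_pairs("((((....))))")
```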

  15. NESdb: a database of NES-containing CRM1 cargoes.

    PubMed

    Xu, Darui; Grishin, Nick V; Chook, Yuh Min

    2012-09-01

    The leucine-rich nuclear export signal (NES) is the only known class of targeting signal that directs macromolecules out of the cell nucleus. NESs are short stretches of 8-15 amino acids with regularly spaced hydrophobic residues that bind the export karyopherin CRM1. NES-containing proteins are involved in numerous cellular and disease processes. We compiled a database named NESdb that contains 221 NES-containing CRM1 cargoes that were manually curated from the published literature. Each NESdb entry is annotated with information about sequence and structure of both the NES and the cargo protein, as well as information about experimental evidence of NES-mapping and CRM1-mediated nuclear export. NESdb will be updated regularly and will serve as an important resource for nuclear export signals. NESdb is freely available to nonprofit organizations at http://prodata.swmed.edu/LRNes. PMID:22833564
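
The "regularly spaced hydrophobic residues" of a classical NES are often summarized in the literature by the consensus Φ-X(2,3)-Φ-X(2,3)-Φ-X-Φ, where Φ is Leu, Ile, Val, Phe or Met. A crude regex scan for candidate matches (a deliberately over-permissive heuristic for illustration; not NESdb's curation method) could look like:

```python
import re

# Classical NES consensus: Phi-X(2,3)-Phi-X(2,3)-Phi-X-Phi, Phi in {L,I,V,F,M}.
NES_CONSENSUS = re.compile(r"[LIVFM].{2,3}[LIVFM].{2,3}[LIVFM].[LIVFM]")

def find_nes_candidates(sequence: str):
    """Return (start, matched_substring) for each non-overlapping consensus hit."""
    return [(m.start(), m.group()) for m in NES_CONSENSUS.finditer(sequence)]

# A fragment resembling the well-studied PKI NES region contains one hit.
hits = find_nes_candidates("ELALKLAGLDIN")
```

Real curation, as the abstract notes, rests on experimental NES-mapping evidence; a consensus scan like this produces many false positives.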

  17. The MetaboLights repository: curation challenges in metabolomics

    PubMed Central

    Salek, Reza M.; Haug, Kenneth; Conesa, Pablo; Hastings, Janna; Williams, Mark; Mahendraker, Tejasvi; Maguire, Eamonn; González-Beltrán, Alejandra N.; Rocca-Serra, Philippe; Sansone, Susanna-Assunta; Steinbeck, Christoph

    2013-01-01

    MetaboLights is the first general-purpose open-access curated repository for metabolomic studies, their raw experimental data and associated metadata, maintained by one of the major open-access data providers in molecular biology. Increases in the number of depositions, number of samples per study and the file size of data submitted to MetaboLights present a challenge for the objective of ensuring high-quality and standardized data in the context of diverse metabolomic workflows and data representations. Here, we describe the MetaboLights curation pipeline, its challenges and its practical application in quality control of complex data depositions. Database URL: http://www.ebi.ac.uk/metabolights PMID:23630246

  19. LncRNAWiki: harnessing community knowledge in collaborative curation of human long non-coding RNAs.

    PubMed

    Ma, Lina; Li, Ang; Zou, Dong; Xu, Xingjian; Xia, Lin; Yu, Jun; Bajic, Vladimir B; Zhang, Zhang

    2015-01-01

    Long non-coding RNAs (lncRNAs) perform a diversity of functions in numerous important biological processes and are implicated in many human diseases. In this report we present lncRNAWiki (http://lncrna.big.ac.cn), a wiki-based platform that is open-content and publicly editable and aimed at community-based curation and collection of information on human lncRNAs. Current related databases depend primarily on curation by experts, making it laborious to annotate the exponentially accumulating information on lncRNAs, which inevitably requires collective efforts in community-based curation. Unlike existing databases, lncRNAWiki features comprehensive integration of information on human lncRNAs obtained from multiple different resources and allows not only existing lncRNAs to be edited, updated and curated by different users but also the addition of newly identified lncRNAs by any user. It harnesses the community's collective knowledge in collecting, editing and annotating human lncRNAs and rewards community-curated efforts by providing explicit authorship based on quantified contributions. LncRNAWiki relies on the underlying knowledge of the scientific community for collective and collaborative curation of human lncRNAs and thus has the potential to serve as an up-to-date and comprehensive knowledgebase for human lncRNAs. PMID:25399417

  20. MET network in PubMed: a text-mined network visualization and curation system

    PubMed Central

    Dai, Hong-Jie; Su, Chu-Hsien; Lai, Po-Ting; Huang, Ming-Siang; Jonnagaddala, Jitendra; Rose Jue, Toni; Rao, Shruti; Chou, Hui-Jou; Milacic, Marija; Singh, Onkar; Syed-Abdul, Shabbir; Hsu, Wen-Lian

    2016-01-01

    Metastasis is the dissemination of a cancer/tumor from one organ to another, and it is the most dangerous stage during cancer progression, causing more than 90% of cancer deaths. Improving the understanding of the complicated cellular mechanisms underlying metastasis requires investigations of the signaling pathways. To this end, we developed a METastasis (MET) network visualization and curation tool to assist metastasis researchers in retrieving network information of interest while browsing through the large volume of studies in PubMed. MET can recognize relations among genes, cancers, tissues and organs of metastasis mentioned in the literature through text-mining techniques, and then produce a visualization of all mined relations in a metastasis network. To facilitate the curation process, MET is developed as a browser extension that allows curators to review and edit concepts and relations related to metastasis directly in PubMed. PubMed users can also view the metastatic networks integrated from the large collection of research papers directly through MET. For the BioCreative 2015 interactive track (IAT), a curation task was proposed to curate metastatic networks among PubMed abstracts. Six curators participated in the proposed task and a post-IAT task, curating 963 unique metastatic relations from 174 PubMed abstracts using MET. Database URL: http://btm.tmu.edu.tw/metastasisway PMID:27242035
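
MET's relation recognition relies on full text-mining machinery; the simplest baseline for this kind of task is sentence-level co-occurrence of dictionary terms. A toy sketch of that baseline (illustrative only, with tiny hypothetical dictionaries; far weaker than MET's actual approach):

```python
import re
from itertools import product

# Toy term dictionaries for illustration only.
GENES = {"MET", "HGF"}
ORGANS = {"liver", "lung", "bone"}

def cooccurrence_relations(text):
    """Pair every known gene with every known organ mentioned in the same
    sentence; a naive stand-in for proper relation extraction."""
    relations = set()
    for sentence in re.split(r"(?<=[.!?])\s+", text):
        tokens = set(re.findall(r"[A-Za-z]+", sentence))
        genes = GENES & tokens
        organs = ORGANS & {t.lower() for t in tokens}
        relations.update(product(genes, organs))
    return relations

rels = cooccurrence_relations(
    "MET overexpression promoted metastasis to the liver. "
    "HGF levels were unchanged."
)
```

The second sentence yields no pair because no organ co-occurs with HGF, illustrating both the appeal and the brittleness of the baseline.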

  2. Methods and strategies for gene structure curation in WormBase

    PubMed Central

    Williams, G.W.; Davis, P.A.; Rogers, A.S.; Bieri, T.; Ozersky, P.; Spieth, J.

    2011-01-01

    The Caenorhabditis elegans genome sequence was published over a decade ago; this was the first published genome of a multi-cellular organism, and the WormBase project now has a decade of experience in curating this genome's sequence and gene structures. In one of its roles as a central repository for nematode biology, WormBase continues to refine the gene structure annotations using sequence similarity and other computational methods, as well as information from the literature and community-submitted annotations. We describe the various methods of gene structure curation that have been tried by WormBase and the problems associated with each of them. We also describe the current strategy for gene structure curation, and introduce the WormBase ‘curation tool’, which integrates different data sources in order to identify new and correct gene structures. Database URL: http://www.wormbase.org/ PMID:21543339

  3. MeioBase: a comprehensive database for meiosis.

    PubMed

    Li, Hao; Meng, Fanrui; Guo, Chunce; Wang, Yingxiang; Xie, Xiaojing; Zhu, Tiansheng; Zhou, Shuigeng; Ma, Hong; Shan, Hongyan; Kong, Hongzhi

    2014-01-01

    Meiosis is a special type of cell division necessary for the sexual reproduction of all eukaryotes. The ever-expanding body of meiosis research calls for an effective and specialized database that is not readily available yet. To fill this gap, we have developed the knowledge database MeioBase (http://meiosis.ibcas.ac.cn), which comprises two core parts, Resources and Tools. In the Resources part, a wealth of meiosis data collected by curation and manual review of the published literature and biological databases is integrated and organized into various sections, such as Cytology, Pathway, Species, Interaction, and Expression. In the Tools part, some useful tools have been integrated into MeioBase, such as Search, Download, Blast, Comparison, My Favorites, Submission, and Advice. With a simplified and efficient web interface, users are able to search against the database with gene model IDs or keywords, and batch download the data for local investigation. We believe that MeioBase can greatly facilitate research related to meiosis. PMID:25566299

  4. DBatVir: the database of bat-associated viruses.

    PubMed

    Chen, Lihong; Liu, Bo; Yang, Jian; Jin, Qi

    2014-01-01

    Emerging infectious diseases remain a significant threat to public health. Most emerging infectious disease agents in humans are of zoonotic origin. Bats are important reservoir hosts of many highly lethal zoonotic viruses and have been implicated in numerous emerging infectious disease events in recent years. It is essential to enhance our knowledge and understanding of the genetic diversity of the bat-associated viruses to prevent future outbreaks. To facilitate further research, we constructed the database of bat-associated viruses (DBatVir). Known viral sequences detected in bat samples were manually collected and curated, along with the related metadata, such as the sampling time, location, bat species and specimen type. Additional information concerning the bats, including common names, diet type, geographic distribution and phylogeny were integrated into the database to bridge the gap between virologists and zoologists. The database currently covers >4100 bat-associated animal viruses of 23 viral families detected from 196 bat species in 69 countries worldwide. It provides an overview and snapshot of the current research regarding bat-associated viruses, which is essential now that the field is rapidly expanding. With a user-friendly interface and integrated online bioinformatics tools, DBatVir provides a convenient and powerful platform for virologists and zoologists to analyze the virome diversity of bats, as well as for epidemiologists and public health researchers to monitor and track current and future bat-related infectious diseases. Database URL: http://www.mgc.ac.cn/DBatVir/. PMID:24647629

  5. Management Manual.

    ERIC Educational Resources Information Center

    San Joaquin Delta Community Coll. District, CA.

    This manual articulates the rights, responsibilities, entitlements, and conditions of employment of management personnel at San Joaquin Delta College (SJDC). The manual first presents SJDC's mission statement and then discusses the college's management goals and priorities. An examination of SJDC's administrative organization and a list of…

  6. Resource Manual

    ERIC Educational Resources Information Center

    Human Development Institute, 2008

    2008-01-01

    This manual was designed primarily for use by individuals with developmental disabilities and related conditions. The main focus of this manual is to provide easy-to-read information concerning available resources, and to provide immediate contact information for the purpose of applying for resources and/or locating additional information. The…

  7. Terminology Manual.

    ERIC Educational Resources Information Center

    Felber, Helmut

    A product of the International Information Center for Terminology (Infoterm), this manual is designed to serve as a reference tool for practitioners active in terminology work and documentation. The manual explores the basic ideas of the Vienna School of Terminology and explains developments in the area of applied computer-aided terminography…

  8. lncRNAdb v2.0: expanding the reference database for functional long noncoding RNAs.

    PubMed

    Quek, Xiu Cheng; Thomson, Daniel W; Maag, Jesper L V; Bartonicek, Nenad; Signal, Bethany; Clark, Michael B; Gloss, Brian S; Dinger, Marcel E

    2015-01-01

    Despite the prevalence of long noncoding RNA (lncRNA) genes in eukaryotic genomes, only a small proportion have been examined for biological function. lncRNAdb, available at http://lncrnadb.org, provides users with a comprehensive, manually curated reference database of 287 eukaryotic lncRNAs that have been described independently in the scientific literature. In addition to capturing a great proportion of the recent literature describing functions for individual lncRNAs, lncRNAdb now offers an improved user interface enabling greater accessibility to sequence information, expression data and the literature. The new features in lncRNAdb include the integration of Illumina Body Atlas expression profiles, nucleotide sequence information, a BLAST search tool and easy export of content via direct download or a REST API. lncRNAdb is now endorsed by RNAcentral and is in compliance with the International Nucleotide Sequence Database Collaboration. PMID:25332394
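    The direct-download and REST export described above can be scripted. A minimal sketch in Python, assuming a JSON response format; the endpoint path and field names here are illustrative guesses, not the documented lncRNAdb API:

```python
import json
from urllib.parse import quote

# Hypothetical base endpoint for the REST export; the real path may differ.
BASE = "http://lncrnadb.org/Rest/search"

def build_query_url(term, fmt="json"):
    """Build a query URL for a lncRNA name or keyword."""
    return f"{BASE}/{quote(term)}?format={fmt}"

def parse_record(payload):
    """Pull the fields a user typically wants out of a JSON response.
    The keys below are assumed, not documented."""
    record = json.loads(payload)
    return {
        "name": record.get("name"),
        "species": record.get("species"),
        "sequence": record.get("sequence", ""),
    }

print(build_query_url("HOTAIR"))
sample = '{"name": "HOTAIR", "species": "Homo sapiens", "sequence": "GGT"}'
print(parse_record(sample)["species"])
```

In practice the URL would be fetched with any HTTP client; the parsing step is shown offline on a canned payload.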

  9. lncRNAdb v2.0: expanding the reference database for functional long noncoding RNAs

    PubMed Central

    Quek, Xiu Cheng; Thomson, Daniel W.; Maag, Jesper L.V.; Bartonicek, Nenad; Signal, Bethany; Clark, Michael B.; Gloss, Brian S.; Dinger, Marcel E.

    2015-01-01

    Despite the prevalence of long noncoding RNA (lncRNA) genes in eukaryotic genomes, only a small proportion have been examined for biological function. lncRNAdb, available at http://lncrnadb.org, provides users with a comprehensive, manually curated reference database of 287 eukaryotic lncRNAs that have been described independently in the scientific literature. In addition to capturing a great proportion of the recent literature describing functions for individual lncRNAs, lncRNAdb now offers an improved user interface enabling greater accessibility to sequence information, expression data and the literature. The new features in lncRNAdb include the integration of Illumina Body Atlas expression profiles, nucleotide sequence information, a BLAST search tool and easy export of content via direct download or a REST API. lncRNAdb is now endorsed by RNAcentral and is in compliance with the International Nucleotide Sequence Database Collaboration. PMID:25332394

  10. VIEWCACHE: An incremental pointer-base access method for distributed databases. Part 1: The universal index system design document. Part 2: The universal index system low-level design document. Part 3: User's guide. Part 4: Reference manual. Part 5: UIMS test suite

    NASA Technical Reports Server (NTRS)

    Kelley, Steve; Roussopoulos, Nick; Sellis, Timos

    1992-01-01

    The goal of the Universal Index System (UIS) is to provide an easy-to-use and reliable interface to many different kinds of database systems. The impetus for this system was to simplify database index management for users, thus encouraging the use of indexes. As the idea grew into an actual system design, the concept of increasing database performance by facilitating the use of time-saving techniques at the user level became a theme for the project. This final report describes the design and implementation of UIS and its language interfaces. It also includes the User's Guide and the Reference Manual.
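    The performance benefit of the indexes UIS was built to manage can be seen in miniature with any SQL engine. A sketch using Python's built-in sqlite3 module; the table and data are invented for the example:

```python
import sqlite3

# Populate a throwaway table with repeated values in one column.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE observations (id INTEGER PRIMARY KEY, station TEXT, value REAL)"
)
conn.executemany(
    "INSERT INTO observations (station, value) VALUES (?, ?)",
    [(f"station-{i % 50}", float(i)) for i in range(1000)],
)

# Without an index, filtering on `station` requires a full table scan.
plan_before = conn.execute(
    "EXPLAIN QUERY PLAN SELECT * FROM observations WHERE station = 'station-7'"
).fetchall()

# With an index, the engine can jump straight to the matching rows.
conn.execute("CREATE INDEX idx_station ON observations (station)")
plan_after = conn.execute(
    "EXPLAIN QUERY PLAN SELECT * FROM observations WHERE station = 'station-7'"
).fetchall()

print(plan_before[-1][-1])  # a SCAN step
print(plan_after[-1][-1])   # a SEARCH step using idx_station
```

Hiding the `CREATE INDEX` step behind a uniform interface across heterogeneous database systems is exactly the kind of user-level time-saving technique the report describes.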

  11. Enhanced annotations and features for comparing thousands of Pseudomonas genomes in the Pseudomonas genome database.

    PubMed

    Winsor, Geoffrey L; Griffiths, Emma J; Lo, Raymond; Dhillon, Bhavjinder K; Shay, Julie A; Brinkman, Fiona S L

    2016-01-01

    The Pseudomonas Genome Database (http://www.pseudomonas.com) is well known for the application of community-based annotation approaches for producing a high-quality Pseudomonas aeruginosa PAO1 genome annotation, and facilitating whole-genome comparative analyses with other Pseudomonas strains. To aid analysis of potentially thousands of complete and draft genome assemblies, this database and analysis platform was upgraded to integrate curated genome annotations and isolate metadata with enhanced tools for larger scale comparative analysis and visualization. Manually curated gene annotations are supplemented with improved computational analyses that help identify putative drug targets and vaccine candidates or assist with evolutionary studies by identifying orthologs, pathogen-associated genes and genomic islands. The database schema has been updated to integrate isolate metadata that will facilitate more powerful analysis of genomes across datasets in the future. We continue to place an emphasis on providing high-quality updates to gene annotations through regular review of the scientific literature and using community-based approaches including a major new Pseudomonas community initiative for the assignment of high-quality gene ontology terms to genes. As we further expand from thousands of genomes, we plan to provide enhancements that will aid data visualization and analysis arising from whole-genome comparative studies including more pan-genome and population-based approaches. PMID:26578582

  12. Enhanced annotations and features for comparing thousands of Pseudomonas genomes in the Pseudomonas genome database

    PubMed Central

    Winsor, Geoffrey L.; Griffiths, Emma J.; Lo, Raymond; Dhillon, Bhavjinder K.; Shay, Julie A.; Brinkman, Fiona S. L.

    2016-01-01

    The Pseudomonas Genome Database (http://www.pseudomonas.com) is well known for the application of community-based annotation approaches for producing a high-quality Pseudomonas aeruginosa PAO1 genome annotation, and facilitating whole-genome comparative analyses with other Pseudomonas strains. To aid analysis of potentially thousands of complete and draft genome assemblies, this database and analysis platform was upgraded to integrate curated genome annotations and isolate metadata with enhanced tools for larger scale comparative analysis and visualization. Manually curated gene annotations are supplemented with improved computational analyses that help identify putative drug targets and vaccine candidates or assist with evolutionary studies by identifying orthologs, pathogen-associated genes and genomic islands. The database schema has been updated to integrate isolate metadata that will facilitate more powerful analysis of genomes across datasets in the future. We continue to place an emphasis on providing high-quality updates to gene annotations through regular review of the scientific literature and using community-based approaches including a major new Pseudomonas community initiative for the assignment of high-quality gene ontology terms to genes. As we further expand from thousands of genomes, we plan to provide enhancements that will aid data visualization and analysis arising from whole-genome comparative studies including more pan-genome and population-based approaches. PMID:26578582

  13. Rice Annotation Project Database (RAP-DB): an integrative and interactive database for rice genomics.

    PubMed

    Sakai, Hiroaki; Lee, Sung Shin; Tanaka, Tsuyoshi; Numa, Hisataka; Kim, Jungsok; Kawahara, Yoshihiro; Wakimoto, Hironobu; Yang, Ching-chia; Iwamoto, Masao; Abe, Takashi; Yamada, Yuko; Muto, Akira; Inokuchi, Hachiro; Ikemura, Toshimichi; Matsumoto, Takashi; Sasaki, Takuji; Itoh, Takeshi

    2013-02-01

    The Rice Annotation Project Database (RAP-DB, http://rapdb.dna.affrc.go.jp/) has been providing a comprehensive set of gene annotations for the genome sequence of rice, Oryza sativa (japonica group) cv. Nipponbare. Since the first release in 2005, RAP-DB has been updated several times along with the genome assembly updates. Here, we present our newest RAP-DB based on the latest genome assembly, Os-Nipponbare-Reference-IRGSP-1.0 (IRGSP-1.0), which was released in 2011. We detected 37,869 loci by mapping transcript and protein sequences of 150 monocot species. To provide plant researchers with highly reliable and up-to-date rice gene annotations, we have been incorporating literature-based manually curated data, and 1,626 loci currently incorporate literature-based annotation data, including commonly used gene names or gene symbols. Transcriptional activities are shown at the nucleotide level by mapping RNA-Seq reads derived from 27 samples. We also mapped the Illumina reads of a leading Japanese japonica cultivar, Koshihikari, and a Chinese indica cultivar, Guangluai-4, to the genome and show alignments together with the single nucleotide polymorphisms (SNPs) and gene functional annotations through a newly developed browser, Short-Read Assembly Browser (S-RAB). We have developed two satellite databases, Plant Gene Family Database (PGFD) and Integrative Database of Cereal Gene Phylogeny (IDCGP), which display gene family and homologous gene relationships among diverse plant species. RAP-DB and the satellite databases offer simple and user-friendly web interfaces, enabling plant and genome researchers to access the data easily and facilitating a broad range of plant research topics. PMID:23299411

  14. Manual de Carpinteria (Carpentry Manual).

    ERIC Educational Resources Information Center

    TomSing, Luisa B.

    This manual is part of a Mexican series of instructional materials designed for Spanish speaking adults who are in the process of becoming literate or have recently become literate in their native language. The manual describes a carpentry course that is structured to appeal to the student as a self-directing adult. The following units are…

  15. Data Curation Is for Everyone! The Case for Master's and Baccalaureate Institutional Engagement with Data Curation

    ERIC Educational Resources Information Center

    Shorish, Yasmeen

    2012-01-01

    This article describes the fundamental challenges to data curation, how these challenges may be compounded for smaller institutions, and how data management is an essential and manageable component of data curation. Data curation is often discussed within the confines of large research universities. As a result, master's and baccalaureate…

  16. The Ribosomal Database Project.

    PubMed Central

    Maidak, B L; Larsen, N; McCaughey, M J; Overbeek, R; Olsen, G J; Fogel, K; Blandy, J; Woese, C R

    1994-01-01

    The Ribosomal Database Project (RDP) is a curated database that offers ribosome-related data, analysis services, and associated computer programs. The offerings include phylogenetically ordered alignments of ribosomal RNA (rRNA) sequences, derived phylogenetic trees, rRNA secondary structure diagrams, and various software for handling, analyzing and displaying alignments and trees. The data are available via anonymous ftp (rdp.life.uiuc.edu), electronic mail (server@rdp.life.uiuc.edu) and gopher (rdpgopher.life.uiuc.edu). The electronic mail server also provides ribosomal probe checking, approximate phylogenetic placement of user-submitted sequences, screening for the chimeric nature of newly sequenced rRNAs, and automated alignment. PMID:7524021

  17. Maize Genetics and Genomics Database

    Technology Transfer Automated Retrieval System (TEKTRAN)

    The 2007 report for MaizeGDB lists the new hires who will focus on curation/outreach and the genome sequence, respectively. Currently, all sequence data in the database come from a PlantGDB pipeline and are presented with deep links to external resources such as PlantGDB, Dana Farber, GenBank, the Arizona...

  18. Pathway Interaction Database (PID) —

    Cancer.gov

    The National Cancer Institute (NCI) in collaboration with Nature Publishing Group has established the Pathway Interaction Database (PID) in order to provide a highly structured, curated collection of information about known biomolecular interactions and key cellular processes assembled into signaling pathways.

  19. Biological Databases for Behavioral Neurobiology

    PubMed Central

    Baker, Erich J.

    2014-01-01

    Databases are, at their core, abstractions of data and their intentionally derived relationships. They serve as a central organizing metaphor and repository, supporting or augmenting nearly all bioinformatics. Behavioral domains provide a unique stage for contemporary databases, as research in this area spans diverse data types, locations, and data relationships. This chapter provides foundational information on the diversity and prevalence of databases and on how data structures support the various needs of behavioral neuroscience analysis and interpretation. The focus is on the classes of databases, data curation, and advanced applications in bioinformatics, using examples largely drawn from research efforts in behavioral neuroscience. PMID:23195119

  20. ZFIN, the Zebrafish Model Organism Database: increased support for mutants and transgenics.

    PubMed

    Howe, Douglas G; Bradford, Yvonne M; Conlin, Tom; Eagle, Anne E; Fashena, David; Frazer, Ken; Knight, Jonathan; Mani, Prita; Martin, Ryan; Moxon, Sierra A Taylor; Paddock, Holly; Pich, Christian; Ramachandran, Sridhar; Ruef, Barbara J; Ruzicka, Leyla; Schaper, Kevin; Shao, Xiang; Singer, Amy; Sprunger, Brock; Van Slyke, Ceri E; Westerfield, Monte

    2013-01-01

    ZFIN, the Zebrafish Model Organism Database (http://zfin.org), is the central resource for zebrafish genetic, genomic, phenotypic and developmental data. ZFIN curators manually curate and integrate comprehensive data involving zebrafish genes, mutants, transgenics, phenotypes, genotypes, gene expression, morpholinos, antibodies, anatomical structures and publications. Integrated views of these data, as well as data gathered through collaborations and data exchanges, are provided through a wide selection of web-based search forms. Among the vertebrate model organisms, zebrafish are uniquely well suited for rapid and targeted generation of mutant lines. The recent rapid production of mutants and transgenic zebrafish is making management of data associated with these resources particularly important to the research community. Here, we describe recent enhancements to ZFIN aimed at improving our support for mutant and transgenic lines, including (i) enhanced mutant/transgenic search functionality; (ii) more expressive phenotype curation methods; (iii) new download files and archival data access; (iv) incorporation of new data loads from laboratories undertaking large-scale generation of mutant or transgenic lines and (v) new GBrowse tracks for transgenic insertions, genes with antibodies and morpholinos. PMID:23074187

  1. Text mining in the biocuration workflow: applications for literature curation at WormBase, dictyBase and TAIR.

    PubMed

    Van Auken, Kimberly; Fey, Petra; Berardini, Tanya Z; Dodson, Robert; Cooper, Laurel; Li, Donghui; Chan, Juancarlos; Li, Yuling; Basu, Siddhartha; Muller, Hans-Michael; Chisholm, Rex; Huala, Eva; Sternberg, Paul W

    2012-01-01

    WormBase, dictyBase and The Arabidopsis Information Resource (TAIR) are model organism databases containing information about Caenorhabditis elegans and other nematodes, the social amoeba Dictyostelium discoideum and related Dictyostelids and the flowering plant Arabidopsis thaliana, respectively. Each database curates multiple data types from the primary research literature. In this article, we describe the curation workflow at WormBase, with particular emphasis on our use of text-mining tools (BioCreative 2012, Workshop Track II). We then describe the application of a specific component of that workflow, Textpresso for Cellular Component Curation (CCC), to Gene Ontology (GO) curation at dictyBase and TAIR (BioCreative 2012, Workshop Track III). We find that, with organism-specific modifications, Textpresso can be used by dictyBase and TAIR to annotate gene products to GO's Cellular Component (CC) ontology. PMID:23160413

  2. Text mining in the biocuration workflow: applications for literature curation at WormBase, dictyBase and TAIR

    PubMed Central

    Van Auken, Kimberly; Fey, Petra; Berardini, Tanya Z.; Dodson, Robert; Cooper, Laurel; Li, Donghui; Chan, Juancarlos; Li, Yuling; Basu, Siddhartha; Muller, Hans-Michael; Chisholm, Rex; Huala, Eva; Sternberg, Paul W.

    2012-01-01

    WormBase, dictyBase and The Arabidopsis Information Resource (TAIR) are model organism databases containing information about Caenorhabditis elegans and other nematodes, the social amoeba Dictyostelium discoideum and related Dictyostelids and the flowering plant Arabidopsis thaliana, respectively. Each database curates multiple data types from the primary research literature. In this article, we describe the curation workflow at WormBase, with particular emphasis on our use of text-mining tools (BioCreative 2012, Workshop Track II). We then describe the application of a specific component of that workflow, Textpresso for Cellular Component Curation (CCC), to Gene Ontology (GO) curation at dictyBase and TAIR (BioCreative 2012, Workshop Track III). We find that, with organism-specific modifications, Textpresso can be used by dictyBase and TAIR to annotate gene products to GO's Cellular Component (CC) ontology. PMID:23160413

  3. Hayabusa Sample Curation in the JAXA's Planetary Material Curation Facility

    NASA Astrophysics Data System (ADS)

    Okada, T.; Abe, M.; Fujimura, A.; Yada, T.; Ishibashi, Y.; Uesugi, M.; Yuzuru, K.; Yakame, S.; Nakamura, T.; Noguchi, T.; Okazaki, R.; Zolensky, M.; Sandford, S.; Ueno, M.; Mukai, T.; Yoshikawa, M.; Kawaguchi, J.

    2011-12-01

    Hayabusa successfully returned its reentry capsule to Australia on June 13th, 2010. As detailed previously [1], a series of processes was carried out in the JAXA Planetary Material Curation Facility to introduce the sample container of the reentry capsule into a pure-nitrogen-filled clean chamber without exposure to water or oxygen, retrieve the fine particles found inside the container, characterize them with a scanning electron microscope (SEM) with energy dispersive X-ray spectroscopy (EDX), classify them into mineral or rock types, and store them for future analysis. Some of these particles have been delivered for initial analysis to catalogue them [2-10]. The facility has had to develop new methodologies and train staff in techniques to pick up the recovered samples, which are much finer than originally expected. One of these is an electrostatic micro-probe for pickups, and a trial has started to slice the fine samples for detailed analysis of extra-fine structures. An electrostatic nano-probe for use in the SEM is also being considered and developed. To maximize the scientific outputs, the analyses must proceed based on more advanced methodologies and sophisticated ideas. So far we have identified these samples as materials from the S-class asteroid 25143 Itokawa, owing to their consistency with results from remote near-infrared and X-ray spectroscopy: about 1500 ultra-fine particles (mostly smaller than 10 microns) caught by Teflon spatula scooping, and about 100 fine particles (mostly 20-200 microns) collected by compulsory fall onto silica glass plates. A future schedule for sample distribution must be planned. The initial analyses are still in progress, and we will distribute some more of the recovered particles. Then part of the particles will be distributed to NASA, based on the Memorandum of Understanding (MOU) between Japan and the U.S.A. for the Hayabusa mission. Finally, in the near future an international Announcement of Opportunity (AO) for sample analyses will be open to any interested researchers.

  4. T3DB: an integrated database for bacterial type III secretion system

    PubMed Central

    2012-01-01

    Background Type III Secretion System (T3SS), which plays important roles in pathogenesis or symbiosis, is widely expressed in a variety of Gram-negative bacteria. However, the lack of a unique nomenclature for T3SS genes has hindered T3SS-related research. It is necessary to set up a knowledgebase integrating T3SS-related research data to facilitate communication between different research groups interested in different bacteria. Description A T3SS-related Database (T3DB) was developed. T3DB serves as an integrated platform for sequence collection, function annotation, and ortholog classification for T3SS-related apparatus, effector, chaperone and regulatory genes. The collection of T3SS-containing bacteria, T3SS-related genes, function annotation, and ortholog information was all manually curated from the literature. BPBAac, a highly efficient T3SS effector prediction tool, was also implemented. Conclusions T3DB is the first systematic platform integrating well-annotated T3SS-related gene and protein information to facilitate research on T3SS and bacterial pathogenicity. The newly constructed T3 ortholog clusters may facilitate effective communication between different research groups and will promote de novo discoveries. Besides, the manually curated high-quality effector and chaperone data are useful for feature analysis and evolutionary studies of these important proteins. PMID:22545727

  5. MEDIC: a practical disease vocabulary used at the Comparative Toxicogenomics Database

    PubMed Central

    Davis, Allan Peter; Wiegers, Thomas C.; Rosenstein, Michael C.; Mattingly, Carolyn J.

    2012-01-01

    The Comparative Toxicogenomics Database (CTD) is a public resource that promotes understanding about the effects of environmental chemicals on human health. CTD biocurators manually curate a triad of chemical–gene, chemical–disease and gene–disease relationships from the scientific literature. The CTD curation paradigm uses controlled vocabularies for chemicals, genes and diseases. To curate disease information, CTD first had to identify a source of controlled terms. Two resources seemed to be good candidates: the Online Mendelian Inheritance in Man (OMIM) and the ‘Diseases’ branch of the National Library of Medicine's Medical Subject Headings (MeSH). To maximize the advantages of both, CTD biocurators undertook a novel initiative to map the flat list of OMIM disease terms into the hierarchical nature of the MeSH vocabulary. The result is CTD’s ‘merged disease vocabulary’ (MEDIC), a unique resource that integrates OMIM terms, synonyms and identifiers with MeSH terms, synonyms, definitions, identifiers and hierarchical relationships. MEDIC is both a deep and broad vocabulary, composed of 9700 unique diseases described by more than 67 000 terms (including synonyms). It is freely available to download in various formats from CTD. While neither a true ontology nor a perfect solution, this vocabulary has nonetheless proved to be extremely successful and practical for our biocurators in generating over 2.5 million disease-associated toxicogenomic relationships in CTD. Other external databases have also begun to adopt MEDIC for their disease vocabulary. Here, we describe the construction, implementation, maintenance and use of MEDIC to raise awareness of this resource and to offer it as a putative scaffold in the formal construction of an official disease ontology. Database URL: http://ctd.mdibl.org/voc.go?type=disease PMID:22434833
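    The core merge operation behind a vocabulary like MEDIC, grafting flat OMIM disease terms onto the MeSH hierarchy via curated mappings, can be sketched in a few lines. All identifiers, names, and mappings below are toy examples, not real MEDIC content:

```python
# Toy MeSH fragment: each node knows its parents in the hierarchy.
mesh = {
    "MESH:D001523": {"name": "Mental Disorders", "parents": []},
    "MESH:D003072": {"name": "Cognition Disorders", "parents": ["MESH:D001523"]},
}

# Toy flat OMIM list: names and synonyms, but no hierarchy of its own.
omim = {
    "OMIM:607485": {"name": "Alzheimer Disease 9", "synonyms": ["AD9"]},
}

# Curated mapping: each OMIM entry slots in under one or more MeSH parents.
omim_to_mesh_parent = {"OMIM:607485": ["MESH:D003072"]}

def build_medic(mesh, omim, mapping):
    """Merge OMIM terms into the MeSH tree, keeping both sets of IDs."""
    merged = {mid: dict(node) for mid, node in mesh.items()}
    for oid, node in omim.items():
        merged[oid] = {
            "name": node["name"],
            "synonyms": node.get("synonyms", []),
            "parents": mapping.get(oid, []),  # grafted hierarchy position
        }
    return merged

medic = build_medic(mesh, omim, omim_to_mesh_parent)
# The OMIM term is now reachable through the MeSH hierarchy.
print(medic["OMIM:607485"]["parents"])
```

The curated mapping table is where the manual biocuration effort lives; the merge itself is mechanical once those OMIM-to-MeSH assignments exist.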

  6. How should the completeness and quality of curated nanomaterial data be evaluated?

    NASA Astrophysics Data System (ADS)

    Marchese Robinson, Richard L.; Lynch, Iseult; Peijnenburg, Willie; Rumble, John; Klaessig, Fred; Marquardt, Clarissa; Rauscher, Hubert; Puzyn, Tomasz; Purian, Ronit; Åberg, Christoffer; Karcher, Sandra; Vriens, Hanne; Hoet, Peter; Hoover, Mark D.; Hendren, Christine Ogilvie; Harper, Stacey L.

    2016-05-01

    Nanotechnology is of increasing significance. Curation of nanomaterial data into electronic databases offers opportunities to better understand and predict nanomaterials' behaviour. This supports innovation in, and regulation of, nanotechnology. It is commonly understood that curated data need to be sufficiently complete and of sufficient quality to serve their intended purpose. However, assessing data completeness and quality is non-trivial in general and is arguably especially difficult in the nanoscience area, given its highly multidisciplinary nature. The current article, part of the Nanomaterial Data Curation Initiative series, addresses how to assess the completeness and quality of (curated) nanomaterial data. In order to address this key challenge, a variety of related issues are discussed: the meaning and importance of data completeness and quality, existing approaches to their assessment and the key challenges associated with evaluating the completeness and quality of curated nanomaterial data. Considerations which are specific to the nanoscience area and lessons which can be learned from other relevant scientific disciplines are considered. Hence, the scope of this discussion ranges from physicochemical characterisation requirements for nanomaterials and interference of nanomaterials with nanotoxicology assays to broader issues such as minimum information checklists, toxicology data quality schemes and computational approaches that facilitate evaluation of the completeness and quality of (curated) data. This discussion is informed by a literature review and a survey of key nanomaterial data curation stakeholders. Finally, drawing upon this discussion, recommendations are presented concerning the central question: how should the completeness and quality of curated nanomaterial data be evaluated?
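    One simple way to operationalise the completeness side of this question is a checklist-based score: the fraction of minimum-information fields a curated record actually reports. A sketch, with illustrative field names rather than any published checklist:

```python
# Hypothetical minimum-information checklist for a curated nanomaterial
# record; the field names are illustrative, not a published standard.
CHECKLIST = ["material", "size_nm", "surface_charge", "purity", "assay", "dose"]

def completeness(record: dict) -> float:
    """Fraction of checklist fields that are present and non-empty."""
    filled = sum(1 for f in CHECKLIST if record.get(f) not in (None, ""))
    return filled / len(CHECKLIST)

record = {"material": "TiO2", "size_nm": 21, "assay": "MTT", "dose": "10 ug/mL"}
print(round(completeness(record), 2))  # 4 of 6 checklist fields filled
```

Quality assessment is harder to mechanise, since it concerns whether the reported values are trustworthy rather than merely present; a score like this only flags gaps for a human curator to review.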

  7. LeishCyc: a biochemical pathways database for Leishmania major

    PubMed Central

    Doyle, Maria A; MacRae, James I; De Souza, David P; Saunders, Eleanor C; McConville, Malcolm J; Likić, Vladimir A

    2009-01-01

    Background Leishmania spp. are sandfly-transmitted protozoan parasites that cause a spectrum of diseases in more than 12 million people worldwide. Much research is now focusing on how these parasites adapt to the distinct nutrient environments they encounter in the digestive tract of the sandfly vector and the phagolysosome compartment of mammalian macrophages. While data mining and annotation of the genomes of three Leishmania species has provided an initial inventory of predicted metabolic components and associated pathways, resources for integrating this information into metabolic networks and incorporating data from transcript, protein, and metabolite profiling studies are currently lacking. The development of a reliable, expertly curated, and widely available model of Leishmania metabolic networks is required to facilitate systems analysis, as well as discovery and prioritization of new drug targets for this important human pathogen. Description The LeishCyc database was initially built from the genome sequence of Leishmania major (v5.2), based on the annotation published by the Wellcome Trust Sanger Institute. LeishCyc was manually curated to remove errors, correct automated predictions, and add information from the literature. The ongoing curation is based on public sources, literature searches, and our own experimental and bioinformatics studies. In a number of instances we have improved on the original genome annotation, and, in some ambiguous cases, collected relevant information from the literature in order to help clarify gene or protein annotation in the future. All genes in LeishCyc are linked to the corresponding entry in GeneDB (Wellcome Trust Sanger Institute). Conclusion The LeishCyc database describes Leishmania major genes, gene products, metabolites, their relationships and biochemical organization into metabolic pathways. LeishCyc provides a systematic approach to organizing the evolving information about Leishmania biochemical networks and is

  8. REFEREE: BIBLIOGRAPHIC DATABASE MANAGER, DOCUMENTATION

    EPA Science Inventory

    The publication is the user's manual for 3.xx releases of REFEREE, a general-purpose bibliographic database management program for IBM-compatible microcomputers. The REFEREE software also is available from NTIS. The manual has two main sections--Quick Tour and References Guide--a...

  9. Trust, but verify: On the importance of chemical structure curation in cheminformatics and QSAR modeling research

    PubMed Central

    Fourches, Denis; Muratov, Eugene; Tropsha, Alexander

    2010-01-01

    Molecular modelers and cheminformaticians typically analyze experimental data generated by other scientists. Consequently, when it comes to data accuracy, cheminformaticians are always at the mercy of data providers who may inadvertently publish (partially) erroneous data. Thus, dataset curation is crucial for any cheminformatics analysis such as similarity searching, clustering, QSAR modeling, virtual screening, etc., especially now that the availability of chemical datasets in the public domain has skyrocketed. Despite the obvious importance of this preliminary step in the computational analysis of any dataset, there appears to be no commonly accepted guidance or set of procedures for chemical data curation. The main objective of this paper is to emphasize the need for a standardized chemical data curation strategy that should be followed at the onset of any molecular modeling investigation. Herein, we discuss several simple but important steps for cleaning chemical records in a database, including the removal of a fraction of the data that cannot be appropriately handled by conventional cheminformatics techniques. Such steps include the removal of inorganic and organometallic compounds, counterions, salts and mixtures; structure validation; ring aromatization; normalization of specific chemotypes; curation of tautomeric forms; and the deletion of duplicates. To emphasize the importance of data curation as a mandatory step in data analysis, we discuss several case studies where chemical curation of the original “raw” database enabled a successful modeling study (specifically, QSAR analysis) or resulted in a significant improvement of the model's prediction accuracy. We also demonstrate that in some cases rigorously developed QSAR models could even be used to correct erroneous biological data associated with chemical compounds. We believe that the good practices for curation of chemical records outlined in this paper will be of value to all
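    Two of the cleaning steps listed above, stripping counterions from salts and deleting duplicates, can be sketched in pure Python on SMILES strings. This is a deliberately crude illustration; a real pipeline would use a cheminformatics toolkit such as RDKit for proper standardization and canonicalization:

```python
def strip_counterions(smiles: str) -> str:
    """Keep the largest dot-separated fragment (crude desalting).
    Real desalting would compare fragments chemically, not by string length."""
    fragments = smiles.split(".")
    return max(fragments, key=len)

def deduplicate(records):
    """Drop records whose desalted SMILES has already been seen."""
    seen, kept = set(), []
    for name, smiles in records:
        parent = strip_counterions(smiles)
        if parent not in seen:
            seen.add(parent)
            kept.append((name, parent))
    return kept

raw = [
    ("aspirin sodium salt", "CC(=O)Oc1ccccc1C(=O)[O-].[Na+]"),
    ("aspirin anion", "CC(=O)Oc1ccccc1C(=O)[O-]"),  # same parent after desalting
    ("benzene", "c1ccccc1"),
]
print(deduplicate(raw))
```

String-level comparison misses tautomers, aromatization variants and stereochemistry, which is exactly why the paper argues for a standardized, chemically aware curation workflow rather than ad hoc cleanup.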

  10. IPD: the Immuno Polymorphism Database.

    PubMed

    Robinson, James; Marsh, Steven G E

    2007-01-01

    The Immuno Polymorphism Database (IPD) (http://www.ebi.ac.uk/ipd/) is a set of specialist databases related to the study of polymorphic genes in the immune system. IPD currently consists of four databases: IPD-KIR, which contains the allelic sequences of killer cell immunoglobulin-like receptors (KIRs); IPD-MHC, a database of sequences of the major histocompatibility complex (MHC) of different species; IPD-HPA, alloantigens expressed only on platelets; and IPD-ESTAB, which provides access to the European Searchable Tumour Cell Line Database, a cell bank of immunologically characterized melanoma cell lines. The IPD project works with specialist groups or nomenclature committees who provide and curate individual sections before they are submitted to IPD for online publication. The IPD project stores all the data in a set of related databases. Those sections with similar data, such as IPD-KIR and IPD-MHC, share the same database structure. PMID:18449992

  11. Curatable Named-Entity Recognition Using Semantic Relations.

    PubMed

    Hsu, Yi-Yu; Kao, Hung-Yu

    2015-01-01

    Named-entity recognition (NER) plays an important role in the development of biomedical databases. However, the existing NER tools produce multifarious named-entities which may result in both curatable and non-curatable markers. To facilitate biocuration with a straightforward approach, classifying curatable named-entities is helpful with regard to accelerating the biocuration workflow. Co-occurrence Interaction Nexus with Named-entity Recognition (CoINNER) is a web-based tool that allows users to identify genes, chemicals, diseases, and action term mentions in the Comparative Toxicogenomic Database (CTD). To further discover interactions, CoINNER uses multiple advanced algorithms to recognize the mentions in the BioCreative IV CTD Track. CoINNER was developed based on a prototype system that annotated gene, chemical, and disease mentions in PubMed abstracts at BioCreative 2012 Track I (literature triage). We extended our previous system in developing CoINNER. The pre-tagging results of CoINNER were developed based on the state-of-the-art named entity recognition tools in BioCreative III. Next, a method based on conditional random fields (CRFs) is proposed to predict chemical and disease mentions in the articles. Finally, action term mentions were collected by latent Dirichlet allocation (LDA). At the BioCreative IV CTD Track, the best F-measures reached for gene/protein, chemical/drug and disease NER were 54 percent while CoINNER achieved a 61.5 percent F-measure. System URL: http://ikmbio.csie.ncku.edu.tw/coinner/introduction.htm. PMID:26357317
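    The pre-tagging idea described above can be illustrated with a minimal dictionary-based tagger. The real system uses CRFs and LDA on PubMed abstracts; everything here (the lexicons, the entity types, the example sentence) is invented for illustration.

```python
# A toy dictionary-based pre-tagger: label tokens that match small,
# hand-made lexicons of curatable entity types. A production NER system
# would use statistical models (e.g. CRFs) rather than exact lookup.

LEXICON = {
    "chemical": {"aspirin", "nickel"},
    "gene": {"tp53", "brca1"},
    "disease": {"asthma"},
}

def pre_tag(sentence: str):
    """Return (token, entity_type) pairs for tokens found in the lexicons."""
    hits = []
    for token in sentence.lower().replace(",", " ").split():
        for etype, terms in LEXICON.items():
            if token in terms:
                hits.append((token, etype))
    return hits

mentions = pre_tag("Nickel exposure alters TP53 expression in asthma patients")
```

The tagger returns one mention per lexicon hit, in sentence order, which a downstream classifier could then sort into curatable and non-curatable markers.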

  12. EVOG: a database for evolutionary analysis of overlapping genes.

    PubMed

    Kim, Dae-Soo; Cho, Chi-Young; Huh, Jae-Won; Kim, Heui-Soo; Cho, Hwan-Gue

    2009-01-01

    Overlapping genes are defined as a pair of genes whose transcripts are overlapped. Recently, many cases of overlapped genes have been investigated in various eukaryotic organisms; however, their origin and transcriptional control mechanisms have not yet been clearly determined. In this study, we implemented evolutionary visualizer for overlapping genes (EVOG), a Web-based DB with a novel visualization interface, to investigate the evolutionary relationship between overlapping genes. Using this technique, we collected and analyzed all overlapping genes in human, chimpanzee, orangutan, marmoset, rhesus, cow, dog, mouse, rat, chicken, Xenopus, zebrafish and Drosophila. This integrated database provides a manually curated database that displays the evolutionary features of overlapping genes. The EVOG DB components included a number of overlapping genes (10,074 in human, 10,009 in chimpanzee, 67,039 in orangutan, 51,001 in marmoset, 219 in rhesus, 3627 in cow, 209 in dog, 10,700 in mouse, 7987 in rat, 1439 in chicken, 597 in Xenopus, 2457 in zebrafish and 4115 in Drosophila). The EVOG database is very effective and easy to use for the analysis of the evolutionary process of overlapping genes when comparing different species. Therefore, EVOG could potentially be used as the main tool to investigate the evolution of the human genome in relation to disease by comparing the expression profiles of overlapping genes. EVOG is available at http://neobio.cs.pusan.ac.kr/evog/. PMID:18986995

  13. Use of Semantic Technology to Create Curated Data Albums

    NASA Technical Reports Server (NTRS)

    Ramachandran, Rahul; Kulkarni, Ajinkya; Li, Xiang; Sainju, Roshan; Bakare, Rohan; Basyal, Sabin

    2014-01-01

    One of the continuing challenges in any Earth science investigation is the discovery and access of useful science content from the increasingly large volumes of Earth science data and related information available online. Current Earth science data systems are designed with the assumption that researchers access data primarily by instrument or geophysical parameter. Those who know exactly the data sets they need can obtain the specific files using these systems. However, in cases where researchers are interested in studying an event of research interest, they must manually assemble a variety of relevant data sets by searching the different distributed data systems. Consequently, there is a need to design and build specialized search and discovery tools in Earth science that can filter through large volumes of distributed online data and information and only aggregate the relevant resources needed to support climatology and case studies. This paper presents a specialized search and discovery tool that automatically creates curated Data Albums. The tool was designed to enable key elements of the search process such as dynamic interaction and sense-making. The tool supports dynamic interaction via different modes of interactivity and visual presentation of information. The compilation of information and data into a Data Album is analogous to a shoebox within the sense-making framework. This tool automates most of the tedious information/data gathering tasks for researchers. Data curation by the tool is achieved via an ontology-based, relevancy ranking algorithm that filters out non-relevant information and data. The curation enables better search results as compared to the simple keyword searches provided by existing data systems in Earth science.
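    The ontology-based relevancy ranking described above can be sketched as a weighted term-matching score with a cutoff. The event terms, weights, threshold, and documents below are all invented; the actual tool's ontology and ranking algorithm are more elaborate.

```python
# Hedged sketch of ontology-weighted relevancy ranking: documents matching
# more (and more specific) event-ontology terms score higher; documents
# below a threshold are filtered out as non-relevant.

EVENT_TERMS = {"hurricane": 3.0, "precipitation": 2.0, "wind": 1.0, "ocean": 0.5}

def relevancy(text: str) -> float:
    """Sum the ontology weights of the event terms present in the text."""
    words = set(text.lower().split())
    return sum(w for term, w in EVENT_TERMS.items() if term in words)

def curate(docs, threshold=2.0):
    """Keep only documents scoring above the threshold, best first."""
    scored = [(relevancy(d), d) for d in docs]
    return [d for s, d in sorted(scored, reverse=True) if s > threshold]

docs = [
    "hurricane wind and precipitation measurements",
    "ocean color imagery",
    "agricultural survey notes",
]
albums = curate(docs)
```

Only the first document clears the threshold (score 6.0), so the curated "album" excludes the two off-topic resources.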

  14. Use of Semantic Technology to Create Curated Data Albums

    NASA Technical Reports Server (NTRS)

    Ramachandran, Rahul; Kulkarni, Ajinkya; Li, Xiang; Sainju, Roshan; Bakare, Rohan; Basyal, Sabin; Fox, Peter (Editor); Norack, Tom (Editor)

    2014-01-01

    One of the continuing challenges in any Earth science investigation is the discovery and access of useful science content from the increasingly large volumes of Earth science data and related information available online. Current Earth science data systems are designed with the assumption that researchers access data primarily by instrument or geophysical parameter. Those who know exactly the data sets they need can obtain the specific files using these systems. However, in cases where researchers are interested in studying an event of research interest, they must manually assemble a variety of relevant data sets by searching the different distributed data systems. Consequently, there is a need to design and build specialized search and discovery tools in Earth science that can filter through large volumes of distributed online data and information and only aggregate the relevant resources needed to support climatology and case studies. This paper presents a specialized search and discovery tool that automatically creates curated Data Albums. The tool was designed to enable key elements of the search process such as dynamic interaction and sense-making. The tool supports dynamic interaction via different modes of interactivity and visual presentation of information. The compilation of information and data into a Data Album is analogous to a shoebox within the sense-making framework. This tool automates most of the tedious information/data gathering tasks for researchers. Data curation by the tool is achieved via an ontology-based, relevancy ranking algorithm that filters out non-relevant information and data. The curation enables better search results as compared to the simple keyword searches provided by existing data systems in Earth science.

  15. IMAT graphics manual

    NASA Technical Reports Server (NTRS)

    Stockwell, Alan E.; Cooper, Paul A.

    1991-01-01

    The Integrated Multidisciplinary Analysis Tool (IMAT) consists of a menu driven executive system coupled with a relational database which links commercial structures, structural dynamics and control codes. The IMAT graphics system, a key element of the software, provides a common interface for storing, retrieving, and displaying graphical information. The IMAT Graphics Manual shows users of commercial analysis codes (MATRIXx, MSC/NASTRAN and I-DEAS) how to use the IMAT graphics system to obtain high quality graphical output using familiar plotting procedures. The manual explains the key features of the IMAT graphics system, illustrates their use with simple step-by-step examples, and provides a reference for users who wish to take advantage of the flexibility of the software to customize their own applications.

  16. Biosafety Manual

    SciTech Connect

    King, Bruce W.

    2010-05-18

    Work with or potential exposure to biological materials in the course of performing research or other work activities at Lawrence Berkeley National Laboratory (LBNL) must be conducted in a safe, ethical, environmentally sound, and compliant manner. Work must be conducted in accordance with established biosafety standards, the principles and functions of Integrated Safety Management (ISM), this Biosafety Manual, Chapter 26 (Biosafety) of the Health and Safety Manual (PUB-3000), and applicable standards and LBNL policies. The purpose of the Biosafety Program is to protect workers, the public, agriculture, and the environment from exposure to biological agents or materials that may cause disease or other detrimental effects in humans, animals, or plants. This manual provides workers; line management; Environment, Health, and Safety (EH&S) Division staff; Institutional Biosafety Committee (IBC) members; and others with a comprehensive overview of biosafety principles, requirements from biosafety standards, and measures needed to control biological risks in work activities and facilities at LBNL.

  17. Manual for ERIC Awareness Workshop.

    ERIC Educational Resources Information Center

    Strohmenger, C. Todd; Lanham, Berma A.

    This manual, designed to be used with a video tape, provides information for conducting a workshop to familiarize educators with the Educational Resources Information Center (ERIC). Objectives of the workshop include: (1) to develop an understanding of the contents and structure of the ERIC database; (2) to develop an understanding of ERIC as a…

  18. Carbohydrate structure database merged from bacterial, archaeal, plant and fungal parts.

    PubMed

    Toukach, Philip V; Egorova, Ksenia S

    2016-01-01

    The Carbohydrate Structure Databases (CSDBs, http://csdb.glycoscience.ru) store structural, bibliographic, taxonomic, NMR spectroscopic, and other data on natural carbohydrates and their derivatives published in the scientific literature. The CSDB project was launched in 2005 for bacterial saccharides (as BCSDB). Currently, it includes two parts, the Bacterial CSDB and the Plant&Fungal CSDB. In March 2015, these databases were merged to the single CSDB. The combined CSDB includes information on bacterial and archaeal glycans and derivatives (the coverage is close to complete), as well as on plant and fungal glycans and glycoconjugates (almost all structures published up to 1998). CSDB is regularly updated via manual expert annotation of original publications. Both newly annotated data and data imported from other databases are manually curated. The CSDB data are exportable in a number of modern formats, such as GlycoRDF. CSDB provides additional services for simulation of (1)H, (13)C and 2D NMR spectra of saccharides, NMR-based structure prediction, glycan-based taxon clustering and others. PMID:26286194

  19. Carbohydrate structure database merged from bacterial, archaeal, plant and fungal parts

    PubMed Central

    Toukach, Philip V.; Egorova, Ksenia S.

    2016-01-01

    The Carbohydrate Structure Databases (CSDBs, http://csdb.glycoscience.ru) store structural, bibliographic, taxonomic, NMR spectroscopic, and other data on natural carbohydrates and their derivatives published in the scientific literature. The CSDB project was launched in 2005 for bacterial saccharides (as BCSDB). Currently, it includes two parts, the Bacterial CSDB and the Plant&Fungal CSDB. In March 2015, these databases were merged to the single CSDB. The combined CSDB includes information on bacterial and archaeal glycans and derivatives (the coverage is close to complete), as well as on plant and fungal glycans and glycoconjugates (almost all structures published up to 1998). CSDB is regularly updated via manual expert annotation of original publications. Both newly annotated data and data imported from other databases are manually curated. The CSDB data are exportable in a number of modern formats, such as GlycoRDF. CSDB provides additional services for simulation of 1H, 13C and 2D NMR spectra of saccharides, NMR-based structure prediction, glycan-based taxon clustering and others. PMID:26286194

  20. Plant Omics Data Center: An Integrated Web Repository for Interspecies Gene Expression Networks with NLP-Based Curation

    PubMed Central

    Ohyanagi, Hajime; Takano, Tomoyuki; Terashima, Shin; Kobayashi, Masaaki; Kanno, Maasa; Morimoto, Kyoko; Kanegae, Hiromi; Sasaki, Yohei; Saito, Misa; Asano, Satomi; Ozaki, Soichi; Kudo, Toru; Yokoyama, Koji; Aya, Koichiro; Suwabe, Keita; Suzuki, Go; Aoki, Koh; Kubo, Yasutaka; Watanabe, Masao; Matsuoka, Makoto; Yano, Kentaro

    2015-01-01

    Comprehensive integration of large-scale omics resources such as genomes, transcriptomes and metabolomes will provide deeper insights into broader aspects of molecular biology. For better understanding of plant biology, we aim to construct a next-generation sequencing (NGS)-derived gene expression network (GEN) repository for a broad range of plant species. So far we have incorporated information about 745 high-quality mRNA sequencing (mRNA-Seq) samples from eight plant species (Arabidopsis thaliana, Oryza sativa, Solanum lycopersicum, Sorghum bicolor, Vitis vinifera, Solanum tuberosum, Medicago truncatula and Glycine max) from the public short read archive, digitally profiled the entire set of gene expression profiles, and drew GENs by using correspondence analysis (CA) to take advantage of gene expression similarities. In order to understand the evolutionary significance of the GENs from multiple species, they were linked according to the orthology of each node (gene) among species. In addition to other gene expression information, functional annotation of the genes will facilitate biological comprehension. Currently we are improving the given gene annotations with natural language processing (NLP) techniques and manual curation. Here we introduce the current status of our analyses and the web database, PODC (Plant Omics Data Center; http://bioinf.mind.meiji.ac.jp/podc/), now open to the public, providing GENs, functional annotations and additional comprehensive omics resources. PMID:25505034

  1. Reflections on curative health care in Nicaragua.

    PubMed Central

    Slater, R G

    1989-01-01

    Improved health care in Nicaragua is a major priority of the Sandinista revolution; it has been pursued by major reforms of the national health care system, something few developing countries have attempted. In addition to its internationally recognized advances in public health, considerable progress has been made in health care delivery by expanding curative medical services through training more personnel and building more facilities to fulfill a commitment to free universal health coverage. The very uneven quality of medical care is the leading problem facing curative medicine now. Underlying factors include the difficulty of adequately training the greatly increased number of new physicians. Misdiagnosis and mismanagement continue to be major problems. The curative medical system is not well coordinated with the preventive sector. Recent innovations include initiation of a "medicina integral" residency, similar to family practice. Despite its inadequacies and the handicaps of war and poverty, the Nicaraguan curative medical system has made important progress. PMID:2705603

  2. EpiFactors: a comprehensive database of human epigenetic factors and complexes

    PubMed Central

    Medvedeva, Yulia A.; Lennartsson, Andreas; Ehsani, Rezvan; Kulakovskiy, Ivan V.; Vorontsov, Ilya E.; Panahandeh, Pouda; Khimulya, Grigory; Kasukawa, Takeya; Drabløs, Finn

    2015-01-01

    Epigenetics refers to stable and long-term alterations of cellular traits that are not caused by changes in the DNA sequence per se. Rather, covalent modifications of DNA and histones affect gene expression and genome stability via proteins that recognize and act upon such modifications. Many enzymes that catalyse epigenetic modifications or are critical for enzymatic complexes have been discovered, and this is encouraging investigators to study the role of these proteins in diverse normal and pathological processes. Rapidly growing knowledge in the area has resulted in the need for a resource that compiles, organizes and presents curated information to the researchers in an easily accessible and user-friendly form. Here we present EpiFactors, a manually curated database providing information about epigenetic regulators, their complexes, targets and products. EpiFactors contains information on 815 proteins, including 95 histones and protamines. For 789 of these genes, we include expression values across several samples, in particular a collection of 458 human primary cell samples (for approximately 200 cell types, in many cases from three individual donors), covering most mammalian cell steady states, 255 different cancer cell lines (representing approximately 150 cancer subtypes) and 134 human postmortem tissues. Expression values were obtained by the FANTOM5 consortium using the Cap Analysis of Gene Expression (CAGE) technique. EpiFactors also contains information on 69 protein complexes that are involved in epigenetic regulation. The resource is practical for a wide range of users, including biologists, pharmacologists and clinicians. Database URL: http://epifactors.autosome.ru PMID:26153137

  3. The Comparative Toxicogenomics Database: update 2011

    PubMed Central

    Davis, Allan Peter; King, Benjamin L.; Mockus, Susan; Murphy, Cynthia G.; Saraceni-Richards, Cynthia; Rosenstein, Michael; Wiegers, Thomas; Mattingly, Carolyn J.

    2011-01-01

    The Comparative Toxicogenomics Database (CTD) is a public resource that promotes understanding about the interaction of environmental chemicals with gene products, and their effects on human health. Biocurators at CTD manually curate a triad of chemical–gene, chemical–disease and gene–disease relationships from the literature. These core data are then integrated to construct chemical–gene–disease networks and to predict many novel relationships using different types of associated data. Since 2009, we dramatically increased the content of CTD to 1.4 million chemical–gene–disease data points and added many features, statistical analyses and analytical tools, including GeneComps and ChemComps (to find comparable genes and chemicals that share toxicogenomic profiles), enriched Gene Ontology terms associated with chemicals, statistically ranked chemical–disease inferences, Venn diagram tools to discover overlapping and unique attributes of any set of chemicals, genes or disease, and enhanced gene pathway data content, among other features. Together, this wealth of expanded chemical–gene–disease data continues to help users generate testable hypotheses about the molecular mechanisms of environmental diseases. CTD is freely available at http://ctd.mdibl.org. PMID:20864448
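    The core inference step described above (integrating curated chemical–gene and gene–disease pairs into chemical–gene–disease networks) amounts to a join on the shared gene. The sketch below illustrates that join; all triples are invented examples, not CTD data.

```python
# Sketch of shared-gene inference: a chemical-disease relationship is
# predicted whenever a curated chemical-gene interaction and a curated
# gene-disease association involve the same gene.

chem_gene = {("arsenic", "TP53"), ("benzene", "MYC")}
gene_disease = {
    ("TP53", "lung cancer"),
    ("MYC", "leukemia"),
    ("TP53", "leukemia"),
}

def infer_chem_disease(cg, gd):
    """Join curated pairs on the shared gene to predict novel relationships."""
    inferred = set()
    for chem, gene in cg:
        for g, disease in gd:
            if g == gene:
                inferred.add((chem, disease, gene))
    return inferred

links = infer_chem_disease(chem_gene, gene_disease)
```

Here arsenic inherits both TP53-linked diseases and benzene inherits the MYC-linked one, giving three inferred chemical–disease–gene triples; CTD additionally ranks such inferences statistically.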

  4. The Comparative Toxicogenomics Database: update 2011.

    PubMed

    Davis, Allan Peter; King, Benjamin L; Mockus, Susan; Murphy, Cynthia G; Saraceni-Richards, Cynthia; Rosenstein, Michael; Wiegers, Thomas; Mattingly, Carolyn J

    2011-01-01

    The Comparative Toxicogenomics Database (CTD) is a public resource that promotes understanding about the interaction of environmental chemicals with gene products, and their effects on human health. Biocurators at CTD manually curate a triad of chemical-gene, chemical-disease and gene-disease relationships from the literature. These core data are then integrated to construct chemical-gene-disease networks and to predict many novel relationships using different types of associated data. Since 2009, we dramatically increased the content of CTD to 1.4 million chemical-gene-disease data points and added many features, statistical analyses and analytical tools, including GeneComps and ChemComps (to find comparable genes and chemicals that share toxicogenomic profiles), enriched Gene Ontology terms associated with chemicals, statistically ranked chemical-disease inferences, Venn diagram tools to discover overlapping and unique attributes of any set of chemicals, genes or disease, and enhanced gene pathway data content, among other features. Together, this wealth of expanded chemical-gene-disease data continues to help users generate testable hypotheses about the molecular mechanisms of environmental diseases. CTD is freely available at http://ctd.mdibl.org. PMID:20864448

  5. Uniform curation protocol of metazoan signaling pathways to predict novel signaling components.

    PubMed

    Pálfy, Máté; Farkas, Illés J; Vellai, Tibor; Korcsmáros, Tamás

    2013-01-01

    A relatively large number of signaling databases available today have strongly contributed to our understanding of signaling pathway properties. However, pathway comparisons both within and across databases are currently severely hampered by the large variety of data sources and the different levels of detail of their information content (on proteins and interactions). In this chapter, we present a protocol for a uniform curation method of signaling pathways, which aims to overcome this shortcoming. This uniformly curated database, called SignaLink (http://signalink.org), allows us to systematically transfer pathway annotations between different species, based on orthology, and thereby to predict novel signaling pathway components. Thus, this method enables the compilation of a comprehensive signaling map of a given species and identification of new potential drug targets in humans. We strongly believe that the strict curation protocol we have established to compile a signaling pathway database can also be applied for the compilation of other (e.g., metabolic) databases. Similarly, the detailed guide to the orthology-based prediction of novel signaling components across species may also be utilized for predicting components of other biological processes. PMID:23715991
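    The orthology-based transfer described above can be sketched as projecting curated pathway membership through an ortholog map. The gene names and the human-to-worm mapping below are hypothetical examples chosen for illustration, not SignaLink annotations.

```python
# Minimal sketch of orthology-based annotation transfer: pathway membership
# curated in a source species is projected onto a target species via an
# ortholog map; genes without a known ortholog are simply dropped.

human_pathways = {"WNT": {"CTNNB1", "LRP6"}, "NOTCH": {"NOTCH1"}}

# human gene -> C. elegans ortholog (hypothetical mapping for illustration)
orthologs = {"CTNNB1": "bar-1", "LRP6": "mig-5", "NOTCH1": "lin-12"}

def transfer_annotations(pathways, ortho_map):
    """Predict pathway components in the target species via orthology."""
    return {
        pw: {ortho_map[g] for g in genes if g in ortho_map}
        for pw, genes in pathways.items()
    }

worm_pathways = transfer_annotations(human_pathways, orthologs)
```

Every mapped gene becomes a predicted pathway component in the target species; in practice such predictions would then be validated against the literature before entering the curated database.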

  6. Coaches' Manual.

    ERIC Educational Resources Information Center

    National Council of Secondary School Athletic Directors, Washington, DC.

    This manual focuses on the coach's relationships and interactions with students, school personnel, civic groups, and community agencies. The first chapter examines how athletics, as an integral part of education, can make a significant contribution (a) to the development of the individual, (b) in meeting society's needs, and (c) in transmitting…

  7. Computer Manual.

    ERIC Educational Resources Information Center

    Illinois State Office of Education, Springfield.

    This manual designed to provide the teacher with methods of understanding the computer and its potential in the classroom includes four units with exercises and an answer sheet. Unit 1 covers computer fundamentals, the mini computer, programming languages, an introduction to BASIC, and control instructions. Variable names and constants described…

  8. Boilermaking Manual.

    ERIC Educational Resources Information Center

    British Columbia Dept. of Education, Victoria.

    This manual is intended (1) to provide an information resource to supplement the formal training program for boilermaker apprentices; (2) to assist the journeyworker to build on present knowledge to increase expertise and qualify for formal accreditation in the boilermaking trade; and (3) to serve as an on-the-job reference with sound, up-to-date…

  9. Student Manual.

    ERIC Educational Resources Information Center

    Stapleton, Diana L., Comp.

    This manual for student assistants employed in the government document section of the Eastern Kentucky University Library covers policy and procedures and use of the major reference tools in this area. General policies and procedures relating to working hours and conditions, and general responsibilities are discussed, as well as shelving rules and…

  10. Biological databases for human research.

    PubMed

    Zou, Dong; Ma, Lina; Yu, Jun; Zhang, Zhang

    2015-02-01

    The completion of the Human Genome Project lays a foundation for systematically studying the human genome from evolutionary history to precision medicine against diseases. With the explosive growth of biological data, there is an increasing number of biological databases that have been developed in aid of human-related research. Here we present a collection of human-related biological databases and provide a mini-review by classifying them into different categories according to their data types. As human-related databases continue to grow not only in count but also in volume, challenges are ahead in big data storage, processing, exchange and curation. PMID:25712261

  11. Biological Databases for Human Research

    PubMed Central

    Zou, Dong; Ma, Lina; Yu, Jun; Zhang, Zhang

    2015-01-01

    The completion of the Human Genome Project lays a foundation for systematically studying the human genome from evolutionary history to precision medicine against diseases. With the explosive growth of biological data, there is an increasing number of biological databases that have been developed in aid of human-related research. Here we present a collection of human-related biological databases and provide a mini-review by classifying them into different categories according to their data types. As human-related databases continue to grow not only in count but also in volume, challenges are ahead in big data storage, processing, exchange and curation. PMID:25712261

  12. Tools and databases of the KOMICS web portal for preprocessing, mining, and dissemination of metabolomics data.

    PubMed

    Sakurai, Nozomu; Ara, Takeshi; Enomoto, Mitsuo; Motegi, Takeshi; Morishita, Yoshihiko; Kurabayashi, Atsushi; Iijima, Yoko; Ogata, Yoshiyuki; Nakajima, Daisuke; Suzuki, Hideyuki; Shibata, Daisuke

    2014-01-01

    A metabolome, the collection of comprehensive quantitative data on metabolites in an organism, has been increasingly utilized for applications such as data-intensive systems biology, disease diagnostics, biomarker discovery, and assessment of food quality. A considerable number of tools and databases have been developed to date for the analysis of data generated by various combinations of chromatography and mass spectrometry. We report here a web portal named KOMICS (The Kazusa Metabolomics Portal), where the tools and databases that we developed are available for free to academic users. KOMICS includes the tools and databases for preprocessing, mining, visualization, and publication of metabolomics data. Improvements in the annotation of unknown metabolites and dissemination of comprehensive metabolomic data are the primary aims behind the development of this portal. For this purpose, PowerGet and FragmentAlign include a manual curation function for the results of metabolite feature alignments. A metadata-specific wiki-based database, Metabolonote, functions as a hub of web resources related to the submitters' work. This feature is expected to increase citation of the submitters' work, thereby promoting data publication. As an example of the practical use of KOMICS, a workflow for a study on Jatropha curcas is presented. The tools and databases available at KOMICS should contribute to enhanced production, interpretation, and utilization of metabolomic Big Data. PMID:24949426

  13. MVsCarta: A protein database of matrix vesicles to aid understanding of biomineralization.

    PubMed

    Cui, Yazhou; Xu, Quan; Luan, Jing; Hu, Shichang; Pan, Jianbo; Han, Jinxiang; Ji, Zhiliang

    2015-06-01

    Matrix vesicles (MVs) are membranous nanovesicles released by chondrocytes, osteoblasts, and odontoblasts. They play a critical role in modulating mineralization. Here, we present a manually curated database of MV proteins, named MVsCarta, to provide comprehensive information on the protein components of MVs. In the current version, the database contains 2,713 proteins of six organisms identified in bone, cartilage, tooth tissues, and cells capable of producing a mineralized bone matrix. The MVsCarta database is now freely accessible at http://bioinf.xmu.edu.cn/MVsCarta. Search and browse methods were developed for better retrieval of data. In addition, bioinformatic tools like Gene Ontology (GO) analysis, network visualization and protein-protein interaction analysis were implemented for a functional understanding of MV components. No similar database has been reported to date. We believe that this free web-based database might serve as a useful repository to elucidate the novel function and regulation of MVs during mineralization, and to stimulate the advancement of MV studies. PMID:26166372

  14. FR database 1.0: a resource focused on fruit development and ripening

    PubMed Central

    Yue, Junyang; Ma, Xiaojing; Ban, Rongjun; Huang, Qianli; Wang, Wenjie; Liu, Jia; Liu, Yongsheng

    2015-01-01

    Fruits represent a unique growth phase in the life cycle of higher plants. They provide essential nutrients and have beneficial effects on human health. Characterizing the genes involved in fruit development and ripening is fundamental to understanding the biological process and improving horticultural crops. Although numerous genes have been characterized as participating in the regulation of fruit development and ripening at different stages, no dedicated bioinformatic resource for fruit development and ripening has been available. In this study, we have developed such a database, FR database 1.0, using manual curation from 38 423 articles published before 1 April 2014, and integrating protein interactomes and several transcriptome datasets. It provides detailed information for 904 genes derived from 53 organisms reported to participate in fleshy fruit development and ripening. Genes from climacteric and non-climacteric fruits are also annotated, with several interesting Gene Ontology (GO) terms being enriched for these two gene sets and seven ethylene-related GO terms found only in the climacteric fruit group. Furthermore, protein–protein interaction analysis integrating information from FR database suggests a possible functional network affecting fleshy fruit size formation. Collectively, FR database will be a valuable platform for comprehensive understanding and future experiments in fruit biology. Database URL: http://www.fruitech.org/ PMID:25725058

  15. The InterPro Database, 2003 brings increased coverage and new features.

    PubMed

    Mulder, Nicola J; Apweiler, Rolf; Attwood, Teresa K; Bairoch, Amos; Barrell, Daniel; Bateman, Alex; Binns, David; Biswas, Margaret; Bradley, Paul; Bork, Peer; Bucher, Phillip; Copley, Richard R; Courcelle, Emmanuel; Das, Ujjwal; Durbin, Richard; Falquet, Laurent; Fleischmann, Wolfgang; Griffiths-Jones, Sam; Haft, Daniel; Harte, Nicola; Hulo, Nicolas; Kahn, Daniel; Kanapin, Alexander; Krestyaninova, Maria; Lopez, Rodrigo; Letunic, Ivica; Lonsdale, David; Silventoinen, Ville; Orchard, Sandra E; Pagni, Marco; Peyruc, David; Ponting, Chris P; Selengut, Jeremy D; Servant, Florence; Sigrist, Christian J A; Vaughan, Robert; Zdobnov, Evgueni M

    2003-01-01

    InterPro, an integrated documentation resource of protein families, domains and functional sites, was created in 1999 as a means of amalgamating the major protein signature databases into one comprehensive resource. PROSITE, Pfam, PRINTS, ProDom, SMART and TIGRFAMs have been manually integrated and curated and are available in InterPro for text- and sequence-based searching. The results are provided in a single format that rationalises the results that would be obtained by searching the member databases individually. The latest release of InterPro contains 5629 entries describing 4280 families, 1239 domains, 95 repeats and 15 post-translational modifications. Currently, the combined signatures in InterPro cover more than 74% of all proteins in SWISS-PROT and TrEMBL, an increase of nearly 15% since the inception of InterPro. New features of the database include improved searching capabilities and enhanced graphical user interfaces for visualisation of the data. The database is available via a webserver (http://www.ebi.ac.uk/interpro) and anonymous FTP (ftp://ftp.ebi.ac.uk/pub/databases/interpro). PMID:12520011

  17. Tools and Databases of the KOMICS Web Portal for Preprocessing, Mining, and Dissemination of Metabolomics Data

    PubMed Central

    Enomoto, Mitsuo; Morishita, Yoshihiko; Kurabayashi, Atsushi; Iijima, Yoko; Ogata, Yoshiyuki; Nakajima, Daisuke; Suzuki, Hideyuki; Shibata, Daisuke

    2014-01-01

    A metabolome—the collection of comprehensive quantitative data on metabolites in an organism—has been increasingly utilized for applications such as data-intensive systems biology, disease diagnostics, biomarker discovery, and assessment of food quality. A considerable number of tools and databases have been developed to date for the analysis of data generated by various combinations of chromatography and mass spectrometry. We report here a web portal named KOMICS (The Kazusa Metabolomics Portal), where the tools and databases that we developed are available for free to academic users. KOMICS includes the tools and databases for preprocessing, mining, visualization, and publication of metabolomics data. Improvements in the annotation of unknown metabolites and dissemination of comprehensive metabolomic data are the primary aims behind the development of this portal. For this purpose, PowerGet and FragmentAlign include a manual curation function for the results of metabolite feature alignments. A metadata-specific wiki-based database, Metabolonote, functions as a hub of web resources related to the submitters' work. This feature is expected to increase citation of the submitters' work, thereby promoting data publication. As an example of the practical use of KOMICS, a workflow for a study on Jatropha curcas is presented. The tools and databases available at KOMICS should contribute to enhanced production, interpretation, and utilization of metabolomic Big Data. PMID:24949426
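
    Feature alignment of the kind PowerGet and FragmentAlign let users curate amounts to matching peaks across samples within m/z and retention-time tolerances. A naive greedy sketch, with invented peaks and tolerances (not the portal's actual algorithm):

    ```python
    def align(reference, sample, mz_tol=0.01, rt_tol=0.2):
        """Greedily pair each reference (mz, rt) feature with the first
        unmatched sample feature inside both tolerances."""
        matched, used = [], set()
        for mz, rt in reference:
            for j, (smz, srt) in enumerate(sample):
                if j not in used and abs(mz - smz) <= mz_tol and abs(rt - srt) <= rt_tol:
                    matched.append(((mz, rt), (smz, srt)))
                    used.add(j)
                    break
        return matched

    ref = [(180.063, 5.1), (342.116, 7.8)]
    smp = [(180.065, 5.2), (201.000, 3.0)]
    pairs = align(ref, smp)  # only the 180.06 feature pairs up
    ```

    Ambiguous cases produced by such automatic matching are exactly what a manual curation function then resolves.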

  18. The Genomes OnLine Database (GOLD) v.5: a metadata management system based on a four level (meta)genome project classification

    SciTech Connect

    Reddy, Tatiparthi B. K.; Thomas, Alex D.; Stamatis, Dimitri; Bertsch, Jon; Isbandi, Michelle; Jansson, Jakob; Mallajosyula, Jyothi; Pagani, Ioanna; Lobos, Elizabeth A.; Kyrpides, Nikos C.

    2014-10-27

    The Genomes OnLine Database (GOLD; http://www.genomesonline.org) is a comprehensive online resource to catalog and monitor genetic studies worldwide. GOLD provides up-to-date status on complete and ongoing sequencing projects along with a broad array of curated metadata. Within this paper, we report version 5 (v.5) of the database. The newly designed database schema and web user interface support several new features including the implementation of a four level (meta)genome project classification system and a simplified intuitive web interface to access reports and launch search tools. The database currently hosts information for about 19 200 studies, 56 000 Biosamples, 56 000 sequencing projects and 39 400 analysis projects. More than just a catalog of worldwide genome projects, GOLD is a manually curated, quality-controlled metadata warehouse. The problems encountered in integrating disparate and varying quality data into GOLD are briefly highlighted. Lastly, GOLD fully supports and follows the Genomic Standards Consortium (GSC) Minimum Information standards.
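
    The four-level classification places each analysis project under a sequencing project, which belongs to a biosample, which belongs to a study. A minimal sketch of that containment hierarchy (field and class names are illustrative, not GOLD's actual schema):

    ```python
    from dataclasses import dataclass, field
    from typing import List

    @dataclass
    class AnalysisProject:
        name: str

    @dataclass
    class SequencingProject:
        name: str
        analyses: List[AnalysisProject] = field(default_factory=list)

    @dataclass
    class Biosample:
        name: str
        sequencing_projects: List[SequencingProject] = field(default_factory=list)

    @dataclass
    class Study:
        name: str
        biosamples: List[Biosample] = field(default_factory=list)

        def count_analyses(self) -> int:
            # Walk the full four-level tree.
            return sum(len(sp.analyses)
                       for bs in self.biosamples
                       for sp in bs.sequencing_projects)

    study = Study("soil metagenome survey", [
        Biosample("site A", [SequencingProject("shotgun run 1",
                                               [AnalysisProject("assembly")])]),
    ])
    ```

    Rolling counts up the tree in this way is how a catalog can report totals per study, biosample, or project level.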

  19. The Genomes OnLine Database (GOLD) v.5: a metadata management system based on a four level (meta)genome project classification

    PubMed Central

    Reddy, T.B.K.; Thomas, Alex D.; Stamatis, Dimitri; Bertsch, Jon; Isbandi, Michelle; Jansson, Jakob; Mallajosyula, Jyothi; Pagani, Ioanna; Lobos, Elizabeth A.; Kyrpides, Nikos C.

    2015-01-01

    The Genomes OnLine Database (GOLD; http://www.genomesonline.org) is a comprehensive online resource to catalog and monitor genetic studies worldwide. GOLD provides up-to-date status on complete and ongoing sequencing projects along with a broad array of curated metadata. Here we report version 5 (v.5) of the database. The newly designed database schema and web user interface support several new features including the implementation of a four level (meta)genome project classification system and a simplified intuitive web interface to access reports and launch search tools. The database currently hosts information for about 19 200 studies, 56 000 Biosamples, 56 000 sequencing projects and 39 400 analysis projects. More than just a catalog of worldwide genome projects, GOLD is a manually curated, quality-controlled metadata warehouse. The problems encountered in integrating disparate and varying quality data into GOLD are briefly highlighted. GOLD fully supports and follows the Genomic Standards Consortium (GSC) Minimum Information standards. PMID:25348402

  1. HistoneDB 2.0: a histone database with variants—an integrated resource to explore histones and their variants

    PubMed Central

    Draizen, Eli J.; Shaytan, Alexey K.; Mariño-Ramírez, Leonardo; Talbert, Paul B.; Landsman, David; Panchenko, Anna R.

    2016-01-01

    Compaction of DNA into chromatin is a characteristic feature of eukaryotic organisms. The core (H2A, H2B, H3, H4) and linker (H1) histone proteins are responsible for this compaction through the formation of nucleosomes and higher order chromatin aggregates. Moreover, histones are intricately involved in chromatin functioning and provide a means for genome dynamic regulation through specific histone variants and histone post-translational modifications. ‘HistoneDB 2.0 – with variants’ is a comprehensive database of histone protein sequences, classified by histone types and variants. All entries in the database are supplemented by rich sequence and structural annotations with many interactive tools to explore and compare sequences of different variants from various organisms. The core of the database is a manually curated set of histone sequences grouped into 30 different variant subsets with variant-specific annotations. The curated set is supplemented by an automatically extracted set of histone sequences from the non-redundant protein database using algorithms trained on the curated set. The interactive web site supports various searching strategies in both datasets: browsing of phylogenetic trees; on-demand generation of multiple sequence alignments with feature annotations; classification of histone-like sequences and browsing of the taxonomic diversity for every histone variant. HistoneDB 2.0 is a resource for the interactive comparative analysis of histone protein sequences and their implications for chromatin function. Database URL: http://www.ncbi.nlm.nih.gov/projects/HistoneDB2.0 PMID:26989147

  2. Curating NASA's Past, Present, and Future Extraterrestrial Sample Collections

    NASA Technical Reports Server (NTRS)

    McCubbin, F. M.; Allton, J. H.; Evans, C. A.; Fries, M. D.; Nakamura-Messenger, K.; Righter, K.; Zeigler, R. A.; Zolensky, M.; Stansbery, E. K.

    2016-01-01

    The Astromaterials Acquisition and Curation Office (henceforth the NASA Curation Office) at NASA Johnson Space Center (JSC) is responsible for curating all of NASA's extraterrestrial samples. Under the governing document, NASA Policy Directive (NPD) 7100.10E "Curation of Extraterrestrial Materials", JSC is charged with "...curation of all extra-terrestrial material under NASA control, including future NASA missions." The Directive goes on to define Curation as including "...documentation, preservation, preparation, and distribution of samples for research, education, and public outreach." Here we describe some of the past, present, and future activities of the NASA Curation Office.

  3. Curcumin Resource Database

    PubMed Central

    Kumar, Anil; Chetia, Hasnahana; Sharma, Swagata; Kabiraj, Debajyoti; Talukdar, Narayan Chandra; Bora, Utpal

    2015-01-01

    Curcumin is one of the most intensively studied diarylheptanoids, with Curcuma longa being its principal producer. In addition, a class of promising curcumin analogs, aptly named curcuminoids, has been generated in laboratories and is showing considerable potential in the fields of medicine, food technology, etc. The lack of a universal source of data on curcumin as well as curcuminoids has long been felt by the curcumin research community. Hence, in an attempt to address this stumbling block, we have developed the Curcumin Resource Database (CRDB), which aims to serve as a gateway-cum-repository to access all relevant data and related information on curcumin and its analogs. Currently, this database encompasses 1186 curcumin analogs, 195 molecular targets, 9075 peer-reviewed publications, 489 patents and 176 varieties of C. longa obtained by extensive data mining and careful curation from numerous sources. Each data entry is identified by a unique CRDB ID (identifier). Furnished with a user-friendly web interface and an in-built search engine, CRDB provides well-curated and cross-referenced information hyperlinked with external sources. CRDB is expected to be highly useful to researchers working on structure- as well as ligand-based molecular design of curcumin analogs. Database URL: http://www.crdb.in PMID:26220923

  4. The Listeria monocytogenes strain 10403S BioCyc database.

    PubMed

    Orsi, Renato H; Bergholz, Teresa M; Wiedmann, Martin; Boor, Kathryn J

    2015-01-01

    Listeria monocytogenes is a food-borne pathogen of humans and other animals. The striking ability to survive several stresses usually used for food preservation makes L. monocytogenes one of the biggest concerns to the food industry, while the high mortality of listeriosis in specific groups of humans makes it a great concern for public health. Previous studies have shown that a regulatory network involving alternative sigma (σ) factors and transcription factors is pivotal to stress survival. However, few studies have evaluated the metabolic networks controlled by these regulatory mechanisms. The L. monocytogenes BioCyc database uses the strain 10403S as a model. Computer-generated initial annotation for all genes also allowed for identification, annotation and display of predicted reactions and pathways carried out by a single cell. Further ongoing manual curation based on published data as well as database mining for selected genes allowed the more refined annotation of functions, which, in turn, allowed for annotation of new pathways and fine-tuning of previously defined pathways to more L. monocytogenes-specific pathways. Using RNA-Seq data, several transcription start sites and promoter regions were mapped to the 10403S genome and annotated within the database. Additionally, the identification of promoter regions and a comprehensive review of available literature allowed the annotation of several regulatory interactions involving σ factors and transcription factors. The L. monocytogenes 10403S BioCyc database is a new resource for researchers studying Listeria and related organisms. It allows users to (i) have a comprehensive view of all reactions and pathways predicted to take place within the cell in the cellular overview, as well as to (ii) upload their own data, such as differential expression data, to visualize the data in the scope of predicted pathways and regulatory networks and to carry on enrichment analyses using several different annotations
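
    Pathway prediction of the kind seeded by computer-generated gene annotation can be sketched very simply: a pathway is called present when every one of its reactions is covered by at least one annotated gene product. The reaction and locus names below are toy data, not the 10403S annotation:

    ```python
    def predict_pathways(pathways, gene_annotations):
        """Return pathways whose every reaction is covered by >=1 annotated gene."""
        covered = set(gene_annotations.values())
        return [name for name, reactions in pathways.items()
                if all(rxn in covered for rxn in reactions)]

    pathways = {
        "glycolysis (toy)": ["hexokinase", "pfk"],
        "toy pathway B": ["enzymeX"],          # enzymeX unannotated -> absent
    }
    annotations = {"lmo0001": "hexokinase", "lmo0002": "pfk"}
    present = predict_pathways(pathways, annotations)
    ```

    Manual curation then refines exactly these calls, e.g. removing pathways predicted from promiscuous annotations or adding organism-specific variants.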

  5. National Radiobiology Archives Distributed Access user's manual

    SciTech Connect

    Watson, C.; Smith, S. ); Prather, J. )

    1991-11-01

    This User's Manual describes installation and use of the National Radiobiology Archives (NRA) Distributed Access package. The package consists of a distributed subset of information representative of the NRA databases and database access software which provide an introduction to the scope and style of the NRA Information Systems.

  6. Construction of biological networks from unstructured information based on a semi-automated curation workflow.

    PubMed

    Szostak, Justyna; Ansari, Sam; Madan, Sumit; Fluck, Juliane; Talikka, Marja; Iskandar, Anita; De Leon, Hector; Hofmann-Apitius, Martin; Peitsch, Manuel C; Hoeng, Julia

    2015-01-01

    Capture and representation of scientific knowledge in a structured format are essential to improve the understanding of biological mechanisms involved in complex diseases. Biological knowledge and knowledge about standardized terminologies are difficult to capture from literature in a usable form. A semi-automated knowledge extraction workflow is presented that was developed to allow users to extract causal and correlative relationships from scientific literature and to transcribe them into the computable and human readable Biological Expression Language (BEL). The workflow combines state-of-the-art linguistic tools for recognition of various entities and extraction of knowledge from literature sources. Unlike most other approaches, the workflow outputs the results to a curation interface for manual curation and converts them into BEL documents that can be compiled to form biological networks. We developed a new semi-automated knowledge extraction workflow that was designed to capture and organize scientific knowledge and reduce the required curation skills and effort for this task. The workflow was used to build a network that represents the cellular and molecular mechanisms implicated in atherosclerotic plaque destabilization in an apolipoprotein-E-deficient (ApoE(-/-)) mouse model. The network was generated using knowledge extracted from the primary literature. The resultant atherosclerotic plaque destabilization network contains 304 nodes and 743 edges supported by 33 PubMed referenced articles. A comparison between the semi-automated and conventional curation processes showed similar results, but significantly reduced curation effort for the semi-automated process. Creating structured knowledge from unstructured text is an important step for the mechanistic interpretation and reusability of knowledge. Our new semi-automated knowledge extraction workflow reduced the curation skills and effort required to capture and organize scientific knowledge.
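
    In spirit, compiling BEL documents into a network means collecting subject/object entities as nodes and causal statements as edges. A toy sketch with invented statements (not the ApoE(-/-) network, whose 304 nodes and 743 edges come from real curation):

    ```python
    def build_network(triples):
        """Collect subject/object terms as nodes and causal triples as edges."""
        nodes, edges = set(), []
        for subj, rel, obj in triples:
            nodes.update((subj, obj))
            edges.append((subj, rel, obj))
        return nodes, edges

    # BEL-flavored toy statements: protein p(), process bp(), abundance a().
    curated = [
        ("p(TNF)", "increases", "bp(inflammation)"),
        ("p(IL6)", "increases", "bp(inflammation)"),
        ("a(statin)", "decreases", "p(IL6)"),
    ]
    nodes, edges = build_network(curated)  # 4 nodes, 3 edges
    ```

    Keeping the relation on each edge is what later lets such networks support mechanistic (causal) rather than merely associative queries.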

  7. LiverCancerMarkerRIF: a liver cancer biomarker interactive curation system combining text mining and expert annotations

    PubMed Central

    Dai, Hong-Jie; Wu, Johnny Chi-Yang; Lin, Wei-San; Reyes, Aaron James F.; dela Rosa, Mira Anne C.; Syed-Abdul, Shabbir; Tsai, Richard Tzong-Han; Hsu, Wen-Lian

    2014-01-01

    Biomarkers are biomolecules in the human body that can indicate disease states and abnormal biological processes. Biomarkers are often used during clinical trials to identify patients with cancers. Although biomedical research related to biomarkers has increased over the years and substantial effort has been expended to obtain results in these studies, the specific results obtained often contain ambiguities, and the results might contradict each other. Therefore, the information gathered from these studies must be appropriately integrated and organized to facilitate experimentation on biomarkers. In this study, we used liver cancer as the target and developed a text-mining–based curation system named LiverCancerMarkerRIF, which allows users to retrieve biomarker-related narrations and curators to curate supporting evidence on liver cancer biomarkers directly while browsing PubMed. In contrast to most of the other curation tools that require curators to navigate away from PubMed and accommodate distinct user interfaces or Web sites to complete the curation process, our system provides a user-friendly method for accessing text-mining–aided information and a concise interface to assist curators while they remain at the PubMed Web site. Biomedical text-mining techniques are applied to automatically recognize biomedical concepts such as genes, microRNA, diseases and investigative technologies, which can be used to evaluate the potential of a certain gene as a biomarker. Through the participation in the BioCreative IV user-interactive task, we examined the feasibility of using this novel type of augmented browsing-based curation method, and collaborated with curators to curate biomarker evidential sentences related to liver cancer. The positive feedback received from curators indicates that the proposed method can be effectively used for curation. A publicly available online database containing all the aforementioned information has been constructed at http
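
    The concept recognition underlying such text-mining aids can be approximated by dictionary matching over a sentence. A minimal sketch with a toy lexicon (the terms and types below are illustrative, not the system's actual dictionaries):

    ```python
    # Toy lexicon: lowercase surface form -> (canonical name, concept type).
    TOY_LEXICON = {
        "afp": ("AFP", "gene"),
        "hepatocellular carcinoma": ("hepatocellular carcinoma", "disease"),
        "mir-21": ("miR-21", "microRNA"),
    }

    def recognize(sentence):
        """Return (concept, type) for every lexicon term found, longest term first."""
        text, hits = sentence.lower(), []
        for term in sorted(TOY_LEXICON, key=len, reverse=True):
            if term in text:
                hits.append(TOY_LEXICON[term])
        return hits

    hits = recognize("Serum AFP is elevated in hepatocellular carcinoma.")
    ```

    Real systems add tokenization, abbreviation resolution and disambiguation on top of this, but longest-match dictionary lookup is the usual baseline.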

  8. DESTAF: a database of text-mined associations for reproductive toxins potentially affecting human fertility.

    PubMed

    Dawe, Adam S; Radovanovic, Aleksandar; Kaur, Mandeep; Sagar, Sunil; Seshadri, Sundararajan V; Schaefer, Ulf; Kamau, Allan A; Christoffels, Alan; Bajic, Vladimir B

    2012-01-01

    The Dragon Exploration System for Toxicants and Fertility (DESTAF) is a publicly available resource which enables researchers to efficiently explore both known and potentially novel information and associations in the field of reproductive toxicology. To create DESTAF we used data from the literature (including over 10500 PubMed abstracts), several publicly available biomedical repositories, and specialized, curated dictionaries. DESTAF has an interface designed to facilitate rapid assessment of the key associations between relevant concepts, allowing for a more in-depth exploration of information based on different gene/protein-, enzyme/metabolite-, toxin/chemical-, disease- or anatomically centric perspectives. As a special feature, DESTAF allows for the creation and initial testing of potentially new association hypotheses that suggest links between biological entities identified through the database. DESTAF, along with a PDF manual, can be found at http://cbrc.kaust.edu.sa/destaf. It is free to academic and non-commercial users and will be updated quarterly. PMID:22198179
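
    Hypothesis generation of this kind is often implemented as Swanson-style ABC linking: if toxin A is associated with concept B, and B with gene C, then an indirect A–C association is proposed wherever no direct link is already known. A minimal sketch with invented associations:

    ```python
    from collections import defaultdict

    def candidate_links(associations):
        """Propose A-C pairs sharing an intermediate concept B but lacking
        a direct association of their own (Swanson ABC model)."""
        neighbors = defaultdict(set)
        for a, b in associations:
            neighbors[a].add(b)
            neighbors[b].add(a)
        direct = {frozenset(pair) for pair in associations}
        candidates = set()
        for linked in neighbors.values():
            for a in linked:
                for c in linked:
                    if a < c and frozenset((a, c)) not in direct:
                        candidates.add((a, c))
        return candidates

    assoc = [("toxinX", "oxidative stress"), ("oxidative stress", "GeneY")]
    found = candidate_links(assoc)
    ```

    The `a < c` ordering just deduplicates each unordered pair; real systems additionally rank candidates by the number and quality of shared intermediates.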

  9. AtomPy: an open atomic-data curation environment

    NASA Astrophysics Data System (ADS)

    Bautista, Manuel; Mendoza, Claudio; Boswell, Josiah S; Ajoku, Chukwuemeka

    2014-06-01

    We present a cloud-computing environment for atomic data curation, networking among atomic data providers and users, teaching-and-learning, and interfacing with spectral modeling software. The system is based on Google-Drive Sheets, Pandas (Python Data Analysis Library) DataFrames, and IPython Notebooks for open community-driven curation of atomic data for scientific and technological applications. The atomic model for each ionic species is contained in a multi-sheet Google-Drive workbook, where the atomic parameters from all known public sources are progressively stored. Metadata (provenance, community discussion, etc.) accompanying every entry in the database are stored through Notebooks. Education tools on the physics of atomic processes as well as their relevance to plasma and spectral modeling are based on IPython Notebooks that integrate written material, images, videos, and active computer-tool workflows. Data processing workflows and collaborative software developments are encouraged and managed through the GitHub social network. Relevant issues this platform intends to address are: (i) data quality by allowing open access to both data producers and users in order to attain completeness, accuracy, consistency, provenance and currentness; (ii) comparisons of different datasets to facilitate accuracy assessment; (iii) downloading to local data structures (i.e. Pandas DataFrames) for further manipulation and analysis by prospective users; and (iv) data preservation by avoiding the discard of outdated sets.
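
    Point (ii) above, comparing datasets for accuracy assessment, maps naturally onto DataFrame operations once each source's sheet is loaded. A sketch with invented level energies (the column names and values are assumptions, not AtomPy's actual workbook layout):

    ```python
    import pandas as pd

    # Two hypothetical sources reporting level energies (Ry) for the same ion.
    src_a = pd.DataFrame({"level": ["2s", "2p"], "energy": [0.000, 0.588]})
    src_b = pd.DataFrame({"level": ["2s", "2p"], "energy": [0.000, 0.591]})

    # Join on the level label and measure per-level disagreement.
    merged = src_a.merge(src_b, on="level", suffixes=("_a", "_b"))
    merged["abs_diff"] = (merged["energy_a"] - merged["energy_b"]).abs()
    worst = merged["abs_diff"].max()
    ```

    In the platform described, the same DataFrames would be populated from the Google-Drive workbooks rather than literals, which is what makes local re-analysis by prospective users straightforward.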

  10. RefSeq curation and annotation of antizyme and antizyme inhibitor genes in vertebrates

    PubMed Central

    Rajput, Bhanu; Murphy, Terence D.; Pruitt, Kim D.

    2015-01-01

    Polyamines are ubiquitous cations that are involved in regulating fundamental cellular processes such as cell growth and proliferation; hence, their intracellular concentration is tightly regulated. Antizyme and antizyme inhibitor have a central role in maintaining cellular polyamine levels. Antizyme is unique in that it is expressed via a novel programmed ribosomal frameshifting mechanism. Conventional computational tools are unable to predict a programmed frameshift, resulting in misannotation of antizyme transcripts and proteins on transcript and genomic sequences. Correct annotation of a programmed frameshifting event requires manual evaluation. Our goal was to provide an accurately curated and annotated Reference Sequence (RefSeq) data set of antizyme transcript and protein records across a broad taxonomic scope that would serve as standards for accurate representation of these gene products. As antizyme and antizyme inhibitor proteins are functionally connected, we also curated antizyme inhibitor genes to more fully represent the elegant biology of polyamine regulation. Manual review of genes for three members of the antizyme family and two members of the antizyme inhibitor family in 91 vertebrate organisms resulted in a total of 461 curated RefSeq records. PMID:26170238
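
    Why conventional ORF callers misannotate antizyme: translation terminates at an internal stop codon unless the ribosome slips +1 at that position. A toy sketch with a miniature codon table and an invented sequence (not the real antizyme mRNA or shift site):

    ```python
    # Miniature codon table, sufficient only for the toy mRNA below.
    CODONS = {"ATG": "M", "GGA": "G", "TCC": "S", "TGA": "*",
              "GAA": "E", "AAA": "K", "TAA": "*"}

    def translate(seq, frameshift_at=None):
        """Translate from position 0; optionally slip +1 at `frameshift_at`,
        modeling programmed +1 ribosomal frameshifting."""
        protein, i = "", 0
        while i + 3 <= len(seq):
            if i == frameshift_at:
                i += 1          # +1 slip: skip one nucleotide
                continue
            aa = CODONS[seq[i:i + 3]]
            if aa == "*":
                break           # a naive caller stops here
            protein += aa
            i += 3
        return protein

    mrna = "ATGGGATCCTGAAAAATAA"
    naive = translate(mrna)                    # halts at the internal TGA
    shifted = translate(mrna, frameshift_at=9) # reads through in the +1 frame
    ```

    The full-length product exists only in the shifted frame, which is why correct annotation of such transcripts requires the manual evaluation the abstract describes.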

  12. TarNet: An Evidence-Based Database for Natural Medicine Research

    PubMed Central

    Ren, Guomin; Sun, Guibo; Sun, Xiaobo

    2016-01-01

    Background Complex diseases seriously threaten human health. Drug discovery approaches based on “single genes, single drugs, and single targets” are limited in targeting complex diseases. The development of new multicomponent drugs for complex diseases is imperative, and the establishment of a suitable solution for drug group-target protein network analysis is a key scientific problem that must be addressed. Herbal medicines have formed the basis of sophisticated systems of traditional medicine and have given rise to some key drugs that remain in use today. The search for new molecules is currently taking a different route, whereby scientific principles of ethnobotany and ethnopharmacognosy are being used by chemists in the discovery of different sources and classes of compounds. Results In this study, we developed TarNet, a manually curated database and platform of traditional medicinal plants with natural compounds that includes potential bio-target information. We gathered information on proteins that are related to or affected by medicinal plant ingredients and data on protein–protein interactions (PPIs). TarNet includes in-depth information on both plant–compound–protein relationships and PPIs. Additionally, TarNet can provide researchers with network construction analyses of biological pathways and PPIs associated with specific diseases. Researchers can upload a gene or protein list mapped to our PPI database that has been manually curated to generate relevant networks. Multiple functions are accessible for network topological calculations, subnetwork analyses, pathway analyses, and compound–protein relationships. Conclusions TarNet will serve as a useful analytical tool that will provide information on medicinal plant compound-affected proteins (potential targets) and system-level analyses for systems biology and network pharmacology researchers. TarNet is freely available at http://www.herbbol.org:8001/tarnet

  13. The Pfam protein families database: towards a more sustainable future

    PubMed Central

    Finn, Robert D.; Coggill, Penelope; Eberhardt, Ruth Y.; Eddy, Sean R.; Mistry, Jaina; Mitchell, Alex L.; Potter, Simon C.; Punta, Marco; Qureshi, Matloob; Sangrador-Vegas, Amaia; Salazar, Gustavo A.; Tate, John; Bateman, Alex

    2016-01-01

    In the last two years the Pfam database (http://pfam.xfam.org) has undergone a substantial reorganisation to reduce the effort involved in making a release, thereby permitting more frequent releases. Arguably the most significant of these changes is that Pfam is now primarily based on the UniProtKB reference proteomes, with the counts of matched sequences and species reported on the website restricted to this smaller set. Building families on reference proteome sequences brings greater stability, which decreases the amount of manual curation required to maintain them. It also reduces the number of sequences displayed on the website, whilst still providing access to many important model organisms. Matches to the full UniProtKB database are, however, still available and Pfam annotations for individual UniProtKB sequences can still be retrieved. Some Pfam entries (1.6%) which have no matches to reference proteomes remain; we are working with UniProt to see if sequences from them can be incorporated into reference proteomes. Pfam-B, the automatically-generated supplement to Pfam, has been removed. The current release (Pfam 29.0) includes 16 295 entries and 559 clans. The facility to view the relationship between families within a clan has been improved by the introduction of a new tool. PMID:26673716

  14. The Saccharomyces Genome Database: Exploring Biochemical Pathways and Mutant Phenotypes.

    PubMed

    Cherry, J Michael

    2015-12-01

    Many biochemical processes, and the proteins and cofactors involved, have been defined for the eukaryote Saccharomyces cerevisiae. This understanding has been largely derived through the awesome power of yeast genetics. The proteins responsible for the reactions that build complex molecules and generate energy for the cell have been integrated into web-based tools that provide classical views of pathways. The Yeast Pathways in the Saccharomyces Genome Database (SGD) is, however, the only database created from manually curated literature annotations. In this protocol, gene function is explored using phenotype annotations to enable hypotheses to be formulated about a gene's action. A common use of the SGD is to understand more about a gene that was identified via a phenotypic screen or found to interact with a gene/protein of interest. There are still many genes that do not yet have an experimentally defined function and so the information currently available can be used to speculate about their potential function. Typically, computational annotations based on sequence similarity are used to predict gene function. In addition, annotations are sometimes available for phenotypes of mutations in the gene of interest. Integrated results for a few example genes will be explored in this protocol. This will be instructive for the exploration of details that aid the analysis of experimental results and the establishment of connections within the yeast literature. PMID:26631123

  15. The Pfam protein families database: towards a more sustainable future.

    PubMed

    Finn, Robert D; Coggill, Penelope; Eberhardt, Ruth Y; Eddy, Sean R; Mistry, Jaina; Mitchell, Alex L; Potter, Simon C; Punta, Marco; Qureshi, Matloob; Sangrador-Vegas, Amaia; Salazar, Gustavo A; Tate, John; Bateman, Alex

    2016-01-01

    In the last two years the Pfam database (http://pfam.xfam.org) has undergone a substantial reorganisation to reduce the effort involved in making a release, thereby permitting more frequent releases. Arguably the most significant of these changes is that Pfam is now primarily based on the UniProtKB reference proteomes, with the counts of matched sequences and species reported on the website restricted to this smaller set. Building families on reference proteomes sequences brings greater stability, which decreases the amount of manual curation required to maintain them. It also reduces the number of sequences displayed on the website, whilst still providing access to many important model organisms. Matches to the full UniProtKB database are, however, still available and Pfam annotations for individual UniProtKB sequences can still be retrieved. Some Pfam entries (1.6%) which have no matches to reference proteomes remain; we are working with UniProt to see if sequences from them can be incorporated into reference proteomes. Pfam-B, the automatically-generated supplement to Pfam, has been removed. The current release (Pfam 29.0) includes 16 295 entries and 559 clans. The facility to view the relationship between families within a clan has been improved by the introduction of a new tool. PMID:26673716

  16. Value, but high costs in post-deposition data curation.

    PubMed

    ten Hoopen, Petra; Amid, Clara; Buttigieg, Pier Luigi; Pafilis, Evangelos; Bravakos, Panos; Cerdeño-Tárraga, Ana M; Gibson, Richard; Kahlke, Tim; Legaki, Aglaia; Narayana Murthy, Kada; Papastefanou, Gabriella; Pereira, Emiliano; Rossello, Marc; Luisa Toribio, Ana; Cochrane, Guy

    2016-01-01

    Discoverability of sequence data in primary data archives is proportional to the richness of contextual information associated with the data. Here, we describe an exercise in the improvement of contextual information surrounding sample records associated with metagenomics sequence reads available in the European Nucleotide Archive. We outline the annotation process and summarize findings of this effort aimed at increasing usability of publicly available environmental data. Furthermore, we emphasize the benefits of such an exercise and detail its costs. We conclude that such a third party annotation approach is expensive and has value as an element of curation, but should form only part of a more sustainable submitter-driven approach. Database URL: http://www.ebi.ac.uk/ena. PMID:26861660
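    A minimal sketch of the post-deposition enrichment exercise: a curator recovers missing contextual fields for archived sample records, and the gain is counted. The accessions, field names, and values are hypothetical, loosely modelled on common sample-metadata conventions.

```python
# Fill missing environmental-context fields on sample records from a
# curator-supplied table, and count how many fields were recovered.
# All records and curated values here are invented for illustration.
samples = [
    {"accession": "SAMEA001", "lat_lon": None, "environment": "seawater"},
    {"accession": "SAMEA002", "lat_lon": "36.1 N 5.2 W", "environment": None},
]

curated_values = {  # values a curator recovered from the source publication
    ("SAMEA001", "lat_lon"): "40.4 N 3.7 W",
    ("SAMEA002", "environment"): "sediment",
}

def enrich(records, curated):
    """Apply curated values to missing fields; return the number filled."""
    filled = 0
    for rec in records:
        for field, value in list(rec.items()):
            if value is None and (rec["accession"], field) in curated:
                rec[field] = curated[(rec["accession"], field)]
                filled += 1
    return filled

filled = enrich(samples, curated_values)
```

    Counting the filled fields against the curator time spent is essentially the cost-benefit accounting the exercise reports.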

  17. Value, but high costs in post-deposition data curation

    PubMed Central

    ten Hoopen, Petra; Amid, Clara; Luigi Buttigieg, Pier; Pafilis, Evangelos; Bravakos, Panos; Cerdeño-Tárraga, Ana M.; Gibson, Richard; Kahlke, Tim; Legaki, Aglaia; Narayana Murthy, Kada; Papastefanou, Gabriella; Pereira, Emiliano; Rossello, Marc; Luisa Toribio, Ana; Cochrane, Guy

    2016-01-01

    Discoverability of sequence data in primary data archives is proportional to the richness of contextual information associated with the data. Here, we describe an exercise in the improvement of contextual information surrounding sample records associated with metagenomics sequence reads available in the European Nucleotide Archive. We outline the annotation process and summarize findings of this effort aimed at increasing usability of publicly available environmental data. Furthermore, we emphasize the benefits of such an exercise and detail its costs. We conclude that such a third party annotation approach is expensive and has value as an element of curation, but should form only part of a more sustainable submitter-driven approach. Database URL: http://www.ebi.ac.uk/ena PMID:26861660

  18. Astromaterials Acquisition and Curation Office (KT) Overview

    NASA Technical Reports Server (NTRS)

    Allen, Carlton

    2014-01-01

    The Astromaterials Acquisition and Curation Office has the unique responsibility to curate NASA's extraterrestrial samples - from past and forthcoming missions - into the indefinite future. Currently, curation includes documentation, preservation, physical security, preparation, and distribution of samples from the Moon, asteroids, comets, the solar wind, and the planet Mars. Each of these sample sets has a unique history and comes from a unique environment. The curation laboratories and procedures developed over 40 years have proven both necessary and sufficient to serve the evolving needs of a worldwide research community. A new generation of sample return missions to destinations across the solar system is being planned and proposed. The curators are developing the tools and techniques to meet the challenges of these new samples. Extraterrestrial samples pose unique curation requirements. These samples were formed and exist under conditions strikingly different from those on the Earth's surface. Terrestrial contamination would destroy much of the scientific significance of extraterrestrial materials. To preserve the research value of these precious samples, contamination must be minimized, understood, and documented. In addition, the samples must be preserved - as far as possible - from physical and chemical alteration. The elaborate curation facilities at JSC were designed and constructed, and have been operated for many years, to keep sample contamination and alteration to a minimum. Currently, JSC curates seven collections of extraterrestrial samples: (a) Lunar rocks and soils collected by the Apollo astronauts, (b) Meteorites collected on dedicated expeditions to Antarctica, (c) Cosmic dust collected by high-altitude NASA aircraft, (d) Solar wind atoms collected by the Genesis spacecraft, (e) Comet particles collected by the Stardust spacecraft, (f) Interstellar dust particles collected by the Stardust spacecraft, and (g) Asteroid soil particles collected

  19. BioModels Database: a repository of mathematical models of biological processes.

    PubMed

    Chelliah, Vijayalakshmi; Laibe, Camille; Le Novère, Nicolas

    2013-01-01

    BioModels Database is a public online resource for storing and sharing published, peer-reviewed, quantitative, dynamic models of biological processes. The model components and behaviour are thoroughly checked to correspond to the original publication and manually curated to ensure reliability. Furthermore, the model elements are annotated with terms from controlled vocabularies as well as linked to relevant external data resources. This greatly helps in model interpretation and reuse. Models are accepted in SBML and CellML formats, stored in SBML format, and available for download in various other common formats such as BioPAX, Octave, SciLab, VCML, XPP and PDF, in addition to SBML. The reaction network diagram of each model is also available in several formats. BioModels Database features a search engine, which provides simple and more advanced searches. Features such as online simulation and the creation of smaller models (submodels) from selected model elements of a larger one are provided. BioModels Database can be accessed both via a web interface and programmatically via web services. New models are added to BioModels Database at regular releases, about every 4 months. PMID:23715986
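    Because models are distributed in SBML, a retrieved entry can be inspected with ordinary XML tooling. A minimal sketch follows; real BioModels entries are far richer and are normally handled with a dedicated SBML library, and the tiny model below is invented.

```python
# Parse a minimal SBML document with the standard library and list its
# species, the kind of quick inspection a downloaded model allows.
import xml.etree.ElementTree as ET

SBML = """<sbml xmlns="http://www.sbml.org/sbml/level2/version4" level="2" version="4">
  <model id="example">
    <listOfSpecies>
      <species id="glucose" compartment="cell"/>
      <species id="atp" compartment="cell"/>
    </listOfSpecies>
  </model>
</sbml>"""

NS = {"sbml": "http://www.sbml.org/sbml/level2/version4"}
root = ET.fromstring(SBML)
species_ids = [s.get("id") for s in root.findall(".//sbml:species", NS)]
```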

  20. HNdb: an integrated database of gene and protein information on head and neck squamous cell carcinoma

    PubMed Central

    Henrique, Tiago; José Freitas da Silveira, Nelson; Henrique Cunha Volpato, Arthur; Mioto, Mayra Mataruco; Carolina Buzzo Stefanini, Ana; Bachir Fares, Adil; Gustavo da Silva Castro Andrade, João; Masson, Carolina; Verónica Mendoza López, Rossana; Daumas Nunes, Fabio; Paulo Kowalski, Luis; Severino, Patricia; Tajara, Eloiza Helena

    2016-01-01

    The total amount of scientific literature has grown rapidly in recent years. Specifically, there are several million citations in the field of cancer. This makes it difficult, if not impossible, to manually retrieve relevant information on the mechanisms that govern tumor behavior or the neoplastic process. Furthermore, cancer is a complex disease or, more accurately, a set of diseases. The heterogeneity that permeates many tumors is particularly evident in head and neck (HN) cancer, one of the most common types of cancer worldwide. In this study, we present HNdb, a free database that aims to provide a unified and comprehensive resource of information on genes and proteins involved in HN squamous cell carcinoma, covering data on genomics, transcriptomics, proteomics, literature citations and cross-references to external databases. Different literature searches of MEDLINE abstracts were performed using specific Medical Subject Headings (MeSH terms) for oral, oropharyngeal, hypopharyngeal and laryngeal squamous cell carcinomas. A curated gene-to-publication assignment yielded a total of 1370 genes related to HN cancer. The diversity of results made it possible to identify novel and mostly unexplored gene associations, revealing, for example, that processes linked to response to steroid hormone stimulus are significantly enriched in genes related to HN carcinomas. Thus, our database expands the possibilities for gene network investigation, providing potential hypotheses to be tested. Database URL: http://www.gencapo.famerp.br/hndb PMID:27013077
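    The curated gene-to-publication assignment amounts to collapsing evidence pairs into per-gene publication sets. A sketch with invented pairs (the gene symbols are real human genes used only as examples; the PMIDs are placeholders, not HNdb content):

```python
# Collapse (gene, PMID) evidence pairs into per-gene publication sets,
# the structure behind a gene-to-publication assignment.
from collections import defaultdict

evidence = [
    ("TP53", "11111111"),
    ("TP53", "22222222"),
    ("EGFR", "11111111"),
]

gene_to_pubs = defaultdict(set)
for gene, pmid in evidence:
    gene_to_pubs[gene].add(pmid)

n_genes = len(gene_to_pubs)  # distinct genes with literature support
```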

  1. Kin-Driver: a database of driver mutations in protein kinases.

    PubMed

    Simonetti, Franco L; Tornador, Cristian; Nabau-Moretó, Nuria; Molina-Vila, Miguel A; Marino-Buslje, Cristina

    2014-01-01

    Somatic mutations in protein kinases (PKs) are frequent driver events in many human tumors, while germ-line mutations are associated with hereditary diseases. Here we present Kin-driver, the first database that compiles driver mutations in PKs with experimental evidence demonstrating their functional role. Kin-driver is a manually curated, expert-reviewed database that pays special attention to activating mutations (AMs) and can serve as a validation set for developing next-generation tools focused on the prediction of gain-of-function driver mutations. It also offers an easy and intuitive environment to facilitate the visualization and analysis of mutations in PKs. Because all mutations are mapped onto a multiple sequence alignment, analogous positions between kinases can be identified and tentative new mutations can be proposed for study by transferring annotations. Finally, our database can also be of use to clinical and translational laboratories, helping them to identify uncommon AMs that may correlate with response to new antitumor drugs. The website was developed using PHP and JavaScript, which are supported by all major browsers; the database was built using MySQL server. Kin-driver is available at: http://kin-driver.leloir.org.ar/ PMID:25414382
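    The alignment-based annotation transfer described above reduces to a column-index mapping between aligned sequences. A sketch with two invented toy alignments:

```python
# Map an ungapped residue number in one aligned sequence to the analogous
# residue in another via their shared alignment column, the operation
# behind transferring a mutation annotation between kinases.

def ungapped_to_column(aligned, pos):
    """Return the 0-based alignment column of ungapped residue `pos` (1-based)."""
    count = 0
    for col, ch in enumerate(aligned):
        if ch != "-":
            count += 1
            if count == pos:
                return col
    raise ValueError("position beyond sequence length")

def column_to_ungapped(aligned, col):
    """Return the 1-based ungapped position at a column, or None if gapped."""
    if aligned[col] == "-":
        return None
    return sum(1 for ch in aligned[:col + 1] if ch != "-")

kinase_a = "MKV-LT"  # invented aligned sequences
kinase_b = "MK-ALT"
# Residue 4 of kinase A (the L) sits in column 4, which is residue 4 of kinase B.
analogous = column_to_ungapped(kinase_b, ungapped_to_column(kinase_a, 4))
```

    A `None` result signals that the second kinase is gapped at that column, so no analogous position exists.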

  2. CyanoLyase: a database of phycobilin lyase sequences, motifs and functions

    PubMed Central

    Bretaudeau, Anthony; Coste, François; Humily, Florian; Garczarek, Laurence; Le Corguillé, Gildas; Six, Christophe; Ratin, Morgane; Collin, Olivier; Schluchter, Wendy M.; Partensky, Frédéric

    2013-01-01

    CyanoLyase (http://cyanolyase.genouest.org/) is a manually curated sequence and motif database of phycobilin lyases and related proteins. These enzymes catalyze the covalent ligation of chromophores (phycobilins) to specific binding sites of phycobiliproteins (PBPs). The latter constitute the building blocks of phycobilisomes, the major light-harvesting systems of cyanobacteria and red algae. Phycobilin lyase sequences are poorly annotated in public databases. Sequences included in CyanoLyase were retrieved from all available genomes of these organisms, and a few others, by similarity searches using biochemically characterized enzyme sequences, and were then classified into 3 clans and 32 families. Amino acid motifs were computed for each family using the Protomata learner. CyanoLyase also includes BLAST and a novel pattern-matching tool (Protomatch) that allow users to rapidly retrieve and annotate lyases from any new genome. In addition, it provides phylogenetic analyses of all phycobilin lyase families, describes their function and their presence/absence in all genomes of the database (phyletic profiles), and predicts the chromophorylation of PBPs in each strain. The site also includes a thorough bibliography on phycobilin lyases and the genomes included in the database. This resource should be useful to scientists and companies interested in natural or artificial PBPs, which have a number of biotechnological applications, notably as fluorescent markers. PMID:23175607
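    CyanoLyase's family motifs are learned as automata by Protomata; as a simplified stand-in, a regular-expression motif scan illustrates the same retrieve-and-annotate idea. The motif and the sequences below are invented, not CyanoLyase content.

```python
# Scan candidate sequences with a family motif and collect matches,
# a simplified analogue of motif-based family assignment.
import re

MOTIF = re.compile(r"W[DE].{2}G[LIV]")  # hypothetical family signature

sequences = {
    "seqA": "MAWDKTGLPQR",
    "seqB": "MKTAYIAKQR",
}

family_members = sorted(
    name for name, seq in sequences.items() if MOTIF.search(seq)
)
```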

  3. Curating NASA's Extraterrestrial Samples - Past, Present, and Future

    NASA Technical Reports Server (NTRS)

    Allen, Carlton; Allton, Judith; Lofgren, Gary; Righter, Kevin; Zolensky, Michael

    2010-01-01

    Curation of extraterrestrial samples is the critical interface between sample return missions and the international research community. The Astromaterials Acquisition and Curation Office at the NASA Johnson Space Center (JSC) is responsible for curating NASA's extraterrestrial samples. Under the governing document, NASA Policy Directive (NPD) 7100.10E "Curation of Extraterrestrial Materials," JSC is charged with ". . . curation of all extraterrestrial material under NASA control, including future NASA missions." The Directive goes on to define Curation as including documentation, preservation, preparation, and distribution of samples for research, education, and public outreach.

  4. Curating NASA's Extraterrestrial Samples - Past, Present, and Future

    NASA Technical Reports Server (NTRS)

    Allen, Carlton; Allton, Judith; Lofgren, Gary; Righter, Kevin; Zolensky, Michael

    2011-01-01

    Curation of extraterrestrial samples is the critical interface between sample return missions and the international research community. The Astromaterials Acquisition and Curation Office at the NASA Johnson Space Center (JSC) is responsible for curating NASA's extraterrestrial samples. Under the governing document, NASA Policy Directive (NPD) 7100.10E "Curation of Extraterrestrial Materials", JSC is charged with ". . . curation of all extraterrestrial material under NASA control, including future NASA missions." The Directive goes on to define Curation as including "documentation, preservation, preparation, and distribution of samples for research, education, and public outreach."

  5. Curcumin Resource Database.

    PubMed

    Kumar, Anil; Chetia, Hasnahana; Sharma, Swagata; Kabiraj, Debajyoti; Talukdar, Narayan Chandra; Bora, Utpal

    2015-01-01

    Curcumin is one of the most intensively studied diarylheptanoids, with Curcuma longa being its principal producer. In addition, a class of promising curcumin analogs, aptly named curcuminoids, has been generated in laboratories and is showing great potential in fields such as medicine and food technology. The curcumin research community has long felt the lack of a universal source of data on curcumin and curcuminoids. Hence, in an attempt to address this stumbling block, we have developed the Curcumin Resource Database (CRDB), which aims to serve as a gateway-cum-repository for all relevant data and related information on curcumin and its analogs. Currently, this database encompasses 1186 curcumin analogs, 195 molecular targets, 9075 peer-reviewed publications, 489 patents and 176 varieties of C. longa, obtained by extensive data mining and careful curation from numerous sources. Each data entry is identified by a unique CRDB ID (identifier). Furnished with a user-friendly web interface and an in-built search engine, CRDB provides well-curated and cross-referenced information hyperlinked with external sources. CRDB is expected to be highly useful to researchers working on structure- as well as ligand-based molecular design of curcumin analogs. PMID:26220923

  6. Using the Reactome Database

    PubMed Central

    Haw, Robin

    2012-01-01

    There is considerable interest in the bioinformatics community in creating pathway databases. The Reactome project (a collaboration between the Ontario Institute for Cancer Research, Cold Spring Harbor Laboratory, New York University Medical Center and the European Bioinformatics Institute) is one such pathway database, collecting structured information on all the biological pathways and processes in the human. It is an expert-authored, peer-reviewed and curated collection of well-documented molecular reactions that span the gamut from simple intermediate metabolism to signaling pathways and complex cellular events. This information is supplemented with likely orthologous molecular reactions in mouse, rat, zebrafish, worm and other model organisms. This unit describes how to use the Reactome database to learn the steps of a biological pathway; navigate and browse through the Reactome database; identify the pathways in which a molecule of interest is involved; use the Pathway and Expression analysis tools to search the database for, and visualize, possible connections between a user-supplied experimental data set and Reactome pathways; and use the Species Comparison tool to compare human and model organism pathways. PMID:22700314
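    The "which pathways involve my molecule?" task described in the unit reduces to an index lookup over pathway membership. A toy sketch (the pathway names are real metabolic pathway labels, but the membership sets are illustrative, not Reactome data):

```python
# Look up which pathways contain a molecule of interest by scanning
# a pathway -> member-set table.
pathways = {
    "Glycolysis": {"glucose", "ATP", "pyruvate"},
    "TCA cycle": {"pyruvate", "citrate", "NADH"},
}

def pathways_for(molecule, table):
    """Return the sorted names of pathways whose members include `molecule`."""
    return sorted(name for name, members in table.items() if molecule in members)

hits = pathways_for("pyruvate", pathways)
```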

  7. The RadAssessor manual

    SciTech Connect

    Seitz, Sharon L.

    2007-02-01

    This manual describes the functions and capabilities available from the RadAssessor database and demonstrates how to retrieve and view its information. You’ll learn how to start the database application, how to log in, how to use the common commands, and how to use the online help if you have a question or need extra guidance. RadAssessor can be viewed from any standard web browser, so you will not need to install any special software before using it.

  8. The IPD and IMGT/HLA database: allele variant databases

    PubMed Central

    Robinson, James; Halliwell, Jason A.; Hayhurst, James D.; Flicek, Paul; Parham, Peter; Marsh, Steven G. E.

    2015-01-01

    The Immuno Polymorphism Database (IPD) was developed to provide a centralized system for the study of polymorphism in genes of the immune system. Through the IPD project we have established a central platform for the curation and publication of locus-specific databases involved either directly or related to the function of the Major Histocompatibility Complex in a number of different species. We have collaborated with specialist groups or nomenclature committees that curate the individual sections before they are submitted to IPD for online publication. IPD consists of five core databases, with the IMGT/HLA Database as the primary database. Through the work of the various nomenclature committees, the HLA Informatics Group and in collaboration with the European Bioinformatics Institute we are able to provide public access to this data through the website http://www.ebi.ac.uk/ipd/. The IPD project continues to develop with new tools being added to address scientific developments, such as Next Generation Sequencing, and to address user feedback and requests. Regular updates to the website ensure that new and confirmatory sequences are dispersed to the immunogenetics community, and the wider research and clinical communities. PMID:25414341

  9. Curating Media Learning: Towards a Porous Expertise

    ERIC Educational Resources Information Center

    McDougall, Julian; Potter, John

    2015-01-01

    This article combines research results from a range of projects with two consistent themes. Firstly, we explore the potential for curation to offer a productive metaphor for the convergence of digital media learning across and between home/lifeworld and formal educational/system-world spaces--or between the public and private spheres. Secondly, we…

  10. Curating and Nudging in Virtual CLIL Environments

    ERIC Educational Resources Information Center

    Nielsen, Helle Lykke

    2014-01-01

    Foreign language teachers can benefit substantially from the notions of curation and nudging when scaffolding CLIL activities on the internet. This article shows how these principles can be integrated into CLILstore, a free multimedia-rich learning tool with seamless access to online dictionaries, and presents feedback from first and second year…

  11. The Curative Fantasy and Psychic Recovery

    PubMed Central

    ORNSTEIN, ANNA

    1992-01-01

    The discovery of selfobject transferences and the interpretation of symptomatic behavior from within the patient’s perspective have altered the conduct of psychoanalytic psychotherapy in fundamental ways. A review of the treatment process from a self psychological perspective serves as a background for pointing to the significance of curative fantasies in the process of recovery. PMID:22700052

  12. Correcting Inconsistencies and Errors in Bacterial Genome Metadata Using an Automated Curation Tool in Excel (AutoCurE).

    PubMed

    Schmedes, Sarah E; King, Jonathan L; Budowle, Bruce

    2015-01-01

    Whole-genome data are invaluable for large-scale comparative genomic studies. Current sequencing technologies have made it feasible to sequence entire bacterial genomes with relative ease, in less time and at a substantially reduced cost per nucleotide, hence cost per genome. More than 3,000 bacterial genomes have been sequenced and are available at the finished status. Publicly available genomes can be readily downloaded; however, there are challenges to verify the specific supporting data contained within the download and to identify errors and inconsistencies that may be present within the organizational data content and metadata. AutoCurE, an automated tool for bacterial genome database curation in Excel, was developed to facilitate local curation of the supporting data that accompany genomes downloaded from the National Center for Biotechnology Information. AutoCurE provides an automated approach to curating local genomic databases, flagging inconsistencies or errors by comparing the downloaded supporting data to the genome reports to verify genome names, RefSeq accession numbers, the presence of archaea, BioProject/UIDs, and sequence file descriptions. Flags are generated for nine metadata fields if there are inconsistencies between the downloaded genomes and the genome reports, or if erroneous or missing data are evident. AutoCurE is an easy-to-use tool for local curation of large-scale genome data prior to downstream analyses. PMID:26442252
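    The comparison AutoCurE automates can be sketched as a field-by-field check that emits a flag per mismatch. The records below are invented, and only two of the nine metadata fields are shown.

```python
# Compare downloaded supporting data against the genome report and
# flag any field that disagrees, in the spirit of AutoCurE's checks.
genome_report = {"name": "Escherichia coli K-12", "refseq": "NC_000913"}
downloaded    = {"name": "Escherichia coli K-12", "refseq": "NC_000000"}

def flag_inconsistencies(report, local, fields=("name", "refseq")):
    """Return a list of (field, report_value, local_value) flags."""
    flags = []
    for field in fields:
        expected, found = report.get(field), local.get(field)
        if expected != found:
            flags.append((field, expected, found))
    return flags

flags = flag_inconsistencies(genome_report, downloaded)
```

    An empty flag list means the local copy is consistent with the report for the checked fields.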

  13. Manual compactor

    NASA Technical Reports Server (NTRS)

    Stevenson, Grant E. (Inventor)

    1979-01-01

    A manual compactor having two handles each pivoted at one end for movement through adjacent arcs toward and away from each other, such reciprocating activation motion being translated into rotary motion in a single direction by means of ratchet and pawl arrangements about the pivot shaft of each handle, and thenceforth to rotary motion of opposing screws one each of which is driven by each handle, which in turn act through ball nut structures to forcibly draw together plates with force sufficient for compacting, the handles also having provisions for actuating push rod within the handles for the purpose of disengaging the pawls from the ratchets thereby allowing retraction through spring loading of the plates and repositioning of the apparatus for subsequent compacting.

  14. How Workflow Documentation Facilitates Curation Planning

    NASA Astrophysics Data System (ADS)

    Wickett, K.; Thomer, A. K.; Baker, K. S.; DiLauro, T.; Asangba, A. E.

    2013-12-01

    The description of the specific processes and artifacts that led to the creation of a data product provides a detailed picture of data provenance in the form of a workflow. The Site-Based Data Curation project, hosted by the Center for Informatics Research in Science and Scholarship at the University of Illinois, has been investigating how workflows can be used in developing curation processes and policies that move curation "upstream" in the research process. The team has documented an individual workflow for geobiology data collected during a single field trip to Yellowstone National Park. This specific workflow suggests a generalized process for field data collection comprising three distinct stages: a Planning Stage, a Fieldwork Stage, and a Processing and Analysis Stage. Beyond supplying an account of data provenance, the workflow has allowed the team to identify 1) points of intervention for curation processes and 2) data products that are likely candidates for sharing or deposit. Although these objects may be viewed by individual researchers as 'intermediate' data products, discussions with geobiology researchers have suggested that with appropriate packaging and description they may serve as valuable observational data for other researchers. Curation interventions may include the introduction of regularized data formats during the planning process, data description procedures, the identification and use of established controlled vocabularies, and data quality and validation procedures. We propose a poster that shows the individual workflow and our generalization into a three-stage process. We plan to discuss with attendees how well the three-stage view applies to other types of field-based research, likely points of intervention, and what kinds of interventions are appropriate and feasible in the example workflow.

  15. NASA's Astromaterials Curation Digital Repository: Enabling Research Through Increased Access to Sample Data, Metadata and Imagery

    NASA Astrophysics Data System (ADS)

    Evans, C. A.; Todd, N. S.

    2014-12-01

    The Astromaterials Acquisition & Curation Office at NASA's Johnson Space Center (JSC) is the designated facility for curating all of NASA's extraterrestrial samples. Today, the suite of collections includes the lunar samples from the Apollo missions, cosmic dust particles falling into the Earth's atmosphere, meteorites collected in Antarctica, comet and interstellar dust particles from the Stardust mission, asteroid particles from Japan's Hayabusa mission, solar wind atoms collected during the Genesis mission, and space-exposed hardware from several missions. To support planetary science research on these samples, JSC's Astromaterials Curation Office hosts NASA's Astromaterials Curation digital repository and data access portal [http://curator.jsc.nasa.gov/], providing descriptions of the missions and collections, and critical information about each individual sample. Our office is designing and implementing several informatics initiatives to better serve the planetary research community. First, we are re-hosting the basic database framework by consolidating legacy databases for individual collections and providing a uniform access point for information (descriptions, imagery, classification) on all of our samples. Second, we continue to upgrade and host digital compendia that summarize and highlight published findings on the samples (e.g., lunar samples, meteorites from Mars). We host high resolution imagery of samples as it becomes available, including newly scanned images of historical prints from the Apollo missions. Finally we are creating plans to collect and provide new data, including 3D imagery, point cloud data, micro CT data, and external links to other data sets on selected samples. Together, these individual efforts will provide unprecedented digital access to NASA's Astromaterials, enabling preservation of the samples through more specific and targeted requests, and supporting new planetary science research and collaborations on the samples.

  16. Curation of Microscopic Astromaterials by NASA: "Gathering Dust Since 1981"

    NASA Astrophysics Data System (ADS)

    Frank, D. R.; Bastien, R. K.; Rodriguez, M.; Gonzalez, C.; Todd, N.; Zolensky, M. E.

    2013-09-01

    Applying the philosophy that “Small is Beautiful”, NASA has been collecting and curating microscopic astromaterials since 1981 (“Gathering dust since 1981”). We describe recent curation developments and efforts in these programs.

  17. The Role of Community-Driven Data Curation for Enterprises

    NASA Astrophysics Data System (ADS)

    Curry, Edward; Freitas, Andre; O'Riáin, Sean

    With increased utilization of data within their operational and strategic processes, enterprises need to ensure data quality and accuracy. Data curation is a process that can ensure the quality of data and its fitness for use. Traditional approaches to curation are struggling with increased data volumes and near real-time demands for curated data. In response, curation teams have turned to community crowd-sourcing and semi-automated metadata tools for assistance. This chapter provides an overview of data curation, discusses the business motivations for curating data and investigates the role of community-based data curation, focusing on internal communities and pre-competitive data collaborations. The chapter is supported by case studies from Wikipedia, The New York Times, Thomson Reuters, Protein Data Bank and ChemSpider, upon which best practices for both social and technical aspects of community-driven data curation are described.

  18. If we build it, will they come? Curation and use of the ESO telescope bibliography

    NASA Astrophysics Data System (ADS)

    Grothkopf, Uta; Meakins, Silvia; Bordelon, Dominic

    2015-12-01

    The ESO Telescope Bibliography (telbib) is a database of refereed papers published by the ESO users community. It links data in the ESO Science Archive with the published literature, and vice versa. Developed and maintained by the ESO library, telbib also provides insights into the organization's research output and impact as measured through bibliometric studies. Curating telbib is a multi-step process that involves extensive tagging of the database records. Based on selected use cases, this talk will explain how the rich metadata provide parameters for reports and statistics in order to investigate the performance of ESO's facilities and to understand trends and developments in the publishing behaviour of the user community.
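    A tagged record set supports exactly the kind of per-facility report described here. A minimal sketch with invented records and tag values (telescope names are real ESO/partner facilities used only as examples):

```python
# Count refereed papers per facility from tagged bibliographic records,
# the simplest of the reports rich telbib-style tagging enables.
from collections import Counter

records = [
    {"bibcode": "2015A&A...001", "telescopes": ["VLT", "ALMA"]},
    {"bibcode": "2015A&A...002", "telescopes": ["VLT"]},
]

papers_per_facility = Counter(
    tel for rec in records for tel in rec["telescopes"]
)
```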

  19. The Pathogen-Host Interactions database (PHI-base): additions and future developments

    PubMed Central

    Urban, Martin; Pant, Rashmi; Raghunath, Arathi; Irvine, Alistair G.; Pedro, Helder; Hammond-Kosack, Kim E.

    2015-01-01

    Rapidly evolving pathogens cause a diverse array of diseases and epidemics that threaten crop yield, food security as well as human, animal and ecosystem health. To combat infection greater comparative knowledge is required on the pathogenic process in multiple species. The Pathogen-Host Interactions database (PHI-base) catalogues experimentally verified pathogenicity, virulence and effector genes from bacterial, fungal and protist pathogens. Mutant phenotypes are associated with gene information. The included pathogens infect a wide range of hosts including humans, animals, plants, insects, fish and other fungi. The current version, PHI-base 3.6, available at http://www.phi-base.org, stores information on 2875 genes, 4102 interactions, 110 host species, 160 pathogenic species (103 plant, 3 fungal and 54 animal infecting species) and 181 diseases drawn from 1243 references. Phenotypic and gene function information has been obtained by manual curation of the peer-reviewed literature. A controlled vocabulary consisting of nine high-level phenotype terms permits comparisons and data analysis across the taxonomic space. PHI-base phenotypes were mapped via their associated gene information to reference genomes available in Ensembl Genomes. Virulence genes and hotspots can be visualized directly in genome browsers. Future plans for PHI-base include development of tools facilitating community-led curation and inclusion of the corresponding host target(s). PMID:25414340
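    A controlled vocabulary makes phenotype annotations mechanically checkable. A sketch follows; the abstract does not enumerate the nine high-level terms, so the partial term list below is an assumption for illustration.

```python
# Validate phenotype annotations against a controlled vocabulary and
# collect any free-text terms that need curator attention.
CONTROLLED_TERMS = {  # assumed partial list, not the full PHI-base vocabulary
    "loss of pathogenicity",
    "reduced virulence",
    "unaffected pathogenicity",
    "increased virulence (hypervirulence)",
}

annotations = [
    ("geneA", "reduced virulence"),
    ("geneB", "slightly less nasty"),  # free text: should be flagged
]

invalid = [(g, t) for g, t in annotations if t not in CONTROLLED_TERMS]
```

    Restricting annotations to a small fixed term set is what permits the cross-taxon comparisons the abstract describes.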

  20. The Pathogen-Host Interactions database (PHI-base): additions and future developments.

    PubMed

    Urban, Martin; Pant, Rashmi; Raghunath, Arathi; Irvine, Alistair G; Pedro, Helder; Hammond-Kosack, Kim E

    2015-01-01

    Rapidly evolving pathogens cause a diverse array of diseases and epidemics that threaten crop yield, food security, as well as human, animal and ecosystem health. To combat infection, greater comparative knowledge is required on the pathogenic process in multiple species. The Pathogen-Host Interactions database (PHI-base) catalogues experimentally verified pathogenicity, virulence and effector genes from bacterial, fungal and protist pathogens. Mutant phenotypes are associated with gene information. The included pathogens infect a wide range of hosts including humans, animals, plants, insects, fish and other fungi. The current version, PHI-base 3.6, available at http://www.phi-base.org, stores information on 2875 genes, 4102 interactions, 110 host species, 160 pathogenic species (103 plant, 3 fungal and 54 animal infecting species) and 181 diseases drawn from 1243 references. Phenotypic and gene function information has been obtained by manual curation of the peer-reviewed literature. A controlled vocabulary consisting of nine high-level phenotype terms permits comparisons and data analysis across the taxonomic space. PHI-base phenotypes were mapped via their associated gene information to reference genomes available in Ensembl Genomes. Virulence genes and hotspots can be visualized directly in genome browsers. Future plans for PHI-base include development of tools facilitating community-led curation and inclusion of the corresponding host target(s). PMID:25414340

  1. Human Transporter Database: Comprehensive Knowledge and Discovery Tools in the Human Transporter Genes

    PubMed Central

    Ye, Adam Y.; Liu, Qing-Rong; Li, Chuan-Yun; Zhao, Min; Qu, Hong

    2014-01-01

    Transporters are essential in homeostatic exchange of endogenous and exogenous substances at the systematic, organic, cellular, and subcellular levels. Gene mutations of transporters are often related to pharmacogenetics traits. Recent developments in high throughput technologies on genomics, transcriptomics and proteomics allow in-depth studies of transporter genes in normal cellular processes and diverse disease conditions. The flood of high throughput data has resulted in an urgent need for an updated knowledgebase with curated, organized, and annotated human transporters in an easily accessible way. Using a pipeline with the combination of automated keyword query, sequence similarity search and manual curation on transporters, we collected 1,555 human non-redundant transporter genes to develop the Human Transporter Database (HTD) (http://htd.cbi.pku.edu.cn). Based on the extensive annotations, global properties of the transporter genes were illustrated, such as expression patterns and polymorphisms in relationships with their ligands. We noted that the human transporters were enriched in many fundamental biological processes such as oxidative phosphorylation and cardiac muscle contraction, and significantly associated with Mendelian and complex diseases such as epilepsy and sudden infant death syndrome. Overall, HTD provides a well-organized interface to facilitate research communities to search detailed molecular and genetic information of transporters for development of personalized medicine. PMID:24558441
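
    Pipelines like the one above typically start with an automated keyword screen that narrows thousands of candidate annotations down to a set small enough for manual curation. A minimal sketch of that first stage follows; the keywords and gene records are illustrative assumptions, not HTD's actual pipeline or content.

```python
# Hypothetical transporter-related keywords for the automated screening step.
KEYWORDS = ("transporter", "symporter", "antiporter", "channel")

# Invented candidate records; real pipelines would pull these from
# annotation databases before sequence-similarity search and manual review.
records = [
    {"gene": "SLC6A4", "annotation": "sodium-dependent serotonin transporter"},
    {"gene": "ACTB", "annotation": "beta-actin, cytoskeletal protein"},
    {"gene": "ABCB1", "annotation": "ATP-binding cassette antiporter-like pump"},
]

def keyword_screen(records, keywords=KEYWORDS):
    """Flag records whose annotation mentions any transporter keyword."""
    return [r["gene"] for r in records
            if any(k in r["annotation"].lower() for k in keywords)]

print(keyword_screen(records))  # ['SLC6A4', 'ABCB1']
```

    The screen is deliberately permissive: false positives are acceptable at this stage because manual curation downstream removes them, whereas false negatives are lost for good.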

  2. Electronic Commerce user manual

    SciTech Connect

    Not Available

    1992-04-10

    This User Manual supports the Electronic Commerce Standard System. The Electronic Commerce Standard System is being developed for the Department of Defense by the Technology Information Systems Program at the Lawrence Livermore National Laboratory, operated by the University of California for the Department of Energy. The Electronic Commerce Standard System, or EC as it is known, provides the capability for organizations to conduct business electronically instead of through paper transactions. Electronic Commerce and Computer Aided Acquisition and Logistics Support are two major projects under the DoD's Corporate Information Management program, whose objective is to make DoD business transactions faster and less costly by using computer networks instead of paper forms and postage. EC runs on computers that use the UNIX operating system and provides a standard set of applications and tools that are bound together by a common command and menu system. These applications and tools may vary according to the requirements of the customer or location and may be customized to meet the specific needs of an organization. Local applications can be integrated into the menu system under the Special Databases & Applications option on the EC main menu. These local applications will be documented in the appendices of this manual. This integration capability provides users with a common environment of standard and customized applications.

  3. Electronic Commerce user manual

    SciTech Connect

    Not Available

    1992-04-10

    This User Manual supports the Electronic Commerce Standard System. The Electronic Commerce Standard System is being developed for the Department of Defense by the Technology Information Systems Program at the Lawrence Livermore National Laboratory, operated by the University of California for the Department of Energy. The Electronic Commerce Standard System, or EC as it is known, provides the capability for organizations to conduct business electronically instead of through paper transactions. Electronic Commerce and Computer Aided Acquisition and Logistics Support are two major projects under the DoD's Corporate Information Management program, whose objective is to make DoD business transactions faster and less costly by using computer networks instead of paper forms and postage. EC runs on computers that use the UNIX operating system and provides a standard set of applications and tools that are bound together by a common command and menu system. These applications and tools may vary according to the requirements of the customer or location and may be customized to meet the specific needs of an organization. Local applications can be integrated into the menu system under the Special Databases & Applications option on the EC main menu. These local applications will be documented in the appendices of this manual. This integration capability provides users with a common environment of standard and customized applications.

  4. OpenTrials: towards a collaborative open database of all available information on all clinical trials.

    PubMed

    Goldacre, Ben; Gray, Jonathan

    2016-01-01

    OpenTrials is a collaborative and open database for all available structured data and documents on all clinical trials, threaded together by individual trial. With a versatile and expandable data schema, it is initially designed to host and match the following documents and data for each trial: registry entries; links, abstracts, or texts of academic journal papers; portions of regulatory documents describing individual trials; structured data on methods and results extracted by systematic reviewers or other researchers; clinical study reports; and additional documents such as blank consent forms, blank case report forms, and protocols. The intention is to create an open, freely re-usable index of all such information and to increase discoverability, facilitate research, identify inconsistent data, enable audits on the availability and completeness of this information, support advocacy for better data and drive up standards around open data in evidence-based medicine. The project has phase I funding. This will allow us to create a practical data schema and populate the database initially through web-scraping, basic record linkage techniques, crowd-sourced curation around selected drug areas, and import of existing sources of structured data and documents. It will also allow us to create user-friendly web interfaces onto the data and conduct user engagement workshops to optimise the database and interface designs. Where other projects have set out to manually and perfectly curate a narrow range of information on a smaller number of trials, we aim to use a broader range of techniques and attempt to match a very large quantity of information on all trials. We are currently seeking feedback and additional sources of structured data. PMID:27056367

  5. Asphalt Raking. Instructor Manual. Trainee Manual.

    ERIC Educational Resources Information Center

    Laborers-AGC Education and Training Fund, Pomfret Center, CT.

    This packet consists of the instructor and trainee manuals for an asphalt raking course. The instructor manual contains a course schedule for 4 days of instruction, content outline, and instructor outline. The trainee manual is divided into five sections: safety, asphalt basics, placing methods, repair and patching, and clean-up and maintenance.…

  6. An emerging role: the nurse content curator.

    PubMed

    Brooks, Beth A

    2015-01-01

    A new phenomenon, the inverted or "flipped" classroom, assumes that students are no longer acquiring knowledge exclusively through textbooks or lectures. Instead, they are seeking out the vast amount of free information available to them online (the very essence of open source) to supplement learning gleaned in textbooks and lectures. With so much open-source content available to nursing faculty, it benefits the faculty to use readily available, technologically advanced content. The nurse content curator supports nursing faculty in its use of such content. Even more importantly, the highly paid, time-strapped faculty is not spending an inordinate amount of effort surfing for and evaluating content. The nurse content curator does that work, while the faculty uses its time more effectively to help students vet the truth, make meaning of the content, and learn to problem-solve. PMID:24935444

  7. Data Curation Education in Research Centers (DCERC)

    NASA Astrophysics Data System (ADS)

    Marlino, M. R.; Mayernik, M. S.; Kelly, K.; Allard, S.; Tenopir, C.; Palmer, C.; Varvel, V. E., Jr.

    2012-12-01

    Digital data both enable and constrain scientific research. Scientists are enabled by digital data to develop new research methods, utilize new data sources, and investigate new topics, but they also face new data collection, management, and preservation burdens. The current data workforce consists primarily of scientists who receive little formal training in data management and data managers who are typically educated through on-the-job training. The Data Curation Education in Research Centers (DCERC) program is investigating a new model for educating data professionals to contribute to scientific research. DCERC is a collaboration between the University of Illinois at Urbana-Champaign Graduate School of Library and Information Science, the University of Tennessee School of Information Sciences, and the National Center for Atmospheric Research. The program is organized around a foundations course in data curation and provides field experiences in research and data centers for both master's and doctoral students. This presentation will outline the aims and the structure of the DCERC program and discuss results and lessons learned from the first set of summer internships in 2012. Four masters students participated and worked with both data mentors and science mentors, gaining first hand experiences in the issues, methods, and challenges of scientific data curation. They engaged in a diverse set of topics, including climate model metadata, observational data management workflows, and data cleaning, documentation, and ingest processes within a data archive. The students learned current data management practices and challenges while developing expertise and conducting research. They also made important contributions to NCAR data and science teams by evaluating data management workflows and processes, preparing data sets to be archived, and developing recommendations for particular data management activities. The master's student interns will return in summer of 2013

  8. A Reflection on a Data Curation Journey

    PubMed Central

    van Zyl, Christa

    2015-01-01

    This commentary is a reflection on experience of data preservation and sharing (i.e., data curation) practices developed in a South African research organization. The lessons learned from this journey have echoes in the findings and recommendations emerging from the present study in Low and Middle-Income Countries (LMIC) and may usefully contribute to more general reflection on the management of change in data practice. PMID:26297756

  9. dbPEC: a comprehensive literature-based database for preeclampsia related genes and phenotypes

    PubMed Central

    Uzun, Alper; Triche, Elizabeth W.; Schuster, Jessica; Dewan, Andrew T.; Padbury, James F.

    2016-01-01

    Preeclampsia is one of the most common causes of fetal and maternal morbidity and mortality in the world. We built a Database for Preeclampsia (dbPEC) consisting of the clinical features, concurrent conditions, published literature and genes associated with Preeclampsia. We included gene sets associated with severity, concurrent conditions, tissue sources and networks. The published scientific literature is the primary repository for all information documenting human disease. We used semantic data mining to retrieve and extract the articles pertaining to preeclampsia-associated genes and performed manual curation. We deposited the articles, genes, preeclampsia phenotypes and other supporting information into the dbPEC. It is publicly available and freely accessible. Previously, we developed a database for preterm birth (dbPTB) using a similar approach. Using the gene sets in dbPTB, we were able to successfully analyze a genome-wide study of preterm birth including 4000 women and children. We identified important genes and pathways associated with preterm birth that were not otherwise demonstrable using genome-wide approaches. dbPEC serves not only as a resource for genes and articles associated with preeclampsia, but also as a robust source of gene sets to analyze a wide range of high-throughput data for gene set enrichment analysis. Database URL: http://ptbdb.cs.brown.edu/dbpec/ PMID:26946289

  10. IUPHAR-DB: the IUPHAR database of G protein-coupled receptors and ion channels.

    PubMed

    Harmar, Anthony J; Hills, Rebecca A; Rosser, Edward M; Jones, Martin; Buneman, O Peter; Dunbar, Donald R; Greenhill, Stuart D; Hale, Valerie A; Sharman, Joanna L; Bonner, Tom I; Catterall, William A; Davenport, Anthony P; Delagrange, Philippe; Dollery, Colin T; Foord, Steven M; Gutman, George A; Laudet, Vincent; Neubig, Richard R; Ohlstein, Eliot H; Olsen, Richard W; Peters, John; Pin, Jean-Philippe; Ruffolo, Robert R; Searls, David B; Wright, Mathew W; Spedding, Michael

    2009-01-01

    The IUPHAR database (IUPHAR-DB) integrates peer-reviewed pharmacological, chemical, genetic, functional and anatomical information on the 354 nonsensory G protein-coupled receptors (GPCRs), 71 ligand-gated ion channel subunits and 141 voltage-gated-like ion channel subunits encoded by the human, rat and mouse genomes. These genes represent the targets of approximately one-third of currently approved drugs and are a major focus of drug discovery and development programs in the pharmaceutical industry. IUPHAR-DB provides a comprehensive description of the genes and their functions, with information on protein structure and interactions, ligands, expression patterns, signaling mechanisms, functional assays and biologically important receptor variants (e.g. single nucleotide polymorphisms and splice variants). In addition, the phenotypes resulting from altered gene expression (e.g. in genetically altered animals or in human genetic disorders) are described. The content of the database is peer reviewed by members of the International Union of Basic and Clinical Pharmacology Committee on Receptor Nomenclature and Drug Classification (NC-IUPHAR); the data are provided through manual curation of the primary literature by a network of over 60 subcommittees of NC-IUPHAR. Links to other bioinformatics resources, such as NCBI, Uniprot, HGNC and the rat and mouse genome databases are provided. IUPHAR-DB is freely available at http://www.iuphar-db.org. PMID:18948278

  11. PhenoMiner: from text to a database of phenotypes associated with OMIM diseases

    PubMed Central

    Collier, Nigel; Groza, Tudor; Smedley, Damian; Robinson, Peter N.; Oellrich, Anika; Rebholz-Schuhmann, Dietrich

    2015-01-01

    Analysis of scientific and clinical phenotypes reported in the experimental literature has been curated manually to build high-quality databases such as the Online Mendelian Inheritance in Man (OMIM). However, the identification and harmonization of phenotype descriptions struggles with the diversity of human expressivity. We introduce a novel automated extraction approach called PhenoMiner that exploits full parsing and conceptual analysis. Apriori association mining is then used to identify relationships to human diseases. We applied PhenoMiner to the BMC open access collection and identified 13 636 phenotype candidates. We identified 28 155 phenotype-disorder hypotheses covering 4898 phenotypes and 1659 Mendelian disorders. Analysis showed: (i) the semantic distribution of the extracted terms against linked ontologies; (ii) a comparison of term overlap with the Human Phenotype Ontology (HP); (iii) moderate support for phenotype-disorder pairs in both OMIM and the literature; (iv) strong associations of phenotype-disorder pairs to known disease-genes pairs using PhenoDigm. The full list of PhenoMiner phenotypes (S1), phenotype-disorder associations (S2), association-filtered linked data (S3) and user database documentation (S5) is available as supplementary data and can be downloaded at http://github.com/nhcollier/PhenoMiner under a Creative Commons Attribution 4.0 license. Database URL: phenominer.mml.cam.ac.uk PMID:26507285
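
    At its core, the Apriori-style association mining described above counts how often a phenotype term and a disorder co-occur across documents and keeps the pairs whose support clears a threshold. A minimal sketch follows; the phenotype and disorder names are hypothetical illustrations, not PhenoMiner output.

```python
from collections import Counter

# Each "document" lists the phenotype terms and disorder mentions found in it
# (invented examples for illustration).
docs = [
    {"phenotypes": {"seizure", "ataxia"}, "disorders": {"Dravet syndrome"}},
    {"phenotypes": {"seizure"}, "disorders": {"Dravet syndrome"}},
    {"phenotypes": {"ataxia"}, "disorders": {"Friedreich ataxia"}},
]

def mine_pairs(docs, min_support=2):
    """Count phenotype-disorder co-occurrences and keep frequent pairs."""
    counts = Counter()
    for d in docs:
        for p in d["phenotypes"]:
            for dis in d["disorders"]:
                counts[(p, dis)] += 1
    return {pair: n for pair, n in counts.items() if n >= min_support}

print(mine_pairs(docs))  # {('seizure', 'Dravet syndrome'): 2}
```

    Real association mining would also compute confidence and lift and prune candidates level by level, but the support-threshold filter shown here is the step that turns raw text-mined mentions into ranked phenotype-disorder hypotheses.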

  12. SynDB: a Synapse protein DataBase based on synapse ontology.

    PubMed

    Zhang, Wuxue; Zhang, Yong; Zheng, Hui; Zhang, Chen; Xiong, Wei; Olyarchuk, John G; Walker, Michael; Xu, Weifeng; Zhao, Min; Zhao, Shuqi; Zhou, Zhuan; Wei, Liping

    2007-01-01

    A synapse is the junction across which a nerve impulse passes from an axon terminal to a neuron, muscle cell or gland cell. The functions and building molecules of the synapse are essential to almost all neurobiological processes. To describe synaptic structures and functions, we have developed Synapse Ontology (SynO), a hierarchical representation that includes 177 terms with hundreds of synonyms and branches up to eight levels deep. We associated 125 additional protein keywords and 109 InterPro domains with these SynO terms. Using a combination of automated keyword searches, domain searches and manual curation, we collected 14,000 non-redundant synapse-related proteins, including 3000 in human. We extensively annotated the proteins with information about sequence, structure, function, expression, pathways, interactions and disease associations and with hyperlinks to external databases. The data are stored and presented in the Synapse protein DataBase (SynDB, http://syndb.cbi.pku.edu.cn). SynDB can be interactively browsed by SynO, Gene Ontology (GO), domain families, species, chromosomal locations or Tribe-MCL clusters. It can also be searched by text (including Boolean operators) or by sequence similarity. SynDB is the most comprehensive database to date for synaptic proteins. PMID:17098931

  13. PhenoMiner: from text to a database of phenotypes associated with OMIM diseases.

    PubMed

    Collier, Nigel; Groza, Tudor; Smedley, Damian; Robinson, Peter N; Oellrich, Anika; Rebholz-Schuhmann, Dietrich

    2015-10-01

    Analysis of scientific and clinical phenotypes reported in the experimental literature has been curated manually to build high-quality databases such as the Online Mendelian Inheritance in Man (OMIM). However, the identification and harmonization of phenotype descriptions struggles with the diversity of human expressivity. We introduce a novel automated extraction approach called PhenoMiner that exploits full parsing and conceptual analysis. Apriori association mining is then used to identify relationships to human diseases. We applied PhenoMiner to the BMC open access collection and identified 13,636 phenotype candidates. We identified 28,155 phenotype-disorder hypotheses covering 4898 phenotypes and 1659 Mendelian disorders. Analysis showed: (i) the semantic distribution of the extracted terms against linked ontologies; (ii) a comparison of term overlap with the Human Phenotype Ontology (HP); (iii) moderate support for phenotype-disorder pairs in both OMIM and the literature; (iv) strong associations of phenotype-disorder pairs to known disease-genes pairs using PhenoDigm. The full list of PhenoMiner phenotypes (S1), phenotype-disorder associations (S2), association-filtered linked data (S3) and user database documentation (S5) is available as supplementary data and can be downloaded at http://github.com/nhcollier/PhenoMiner under a Creative Commons Attribution 4.0 license. Database URL: phenominer.mml.cam.ac.uk. PMID:26507285

  14. UTAB Users Manual. Microcomputer version. First edition, 1991

    SciTech Connect

    Nellessen, J.E.; Fletcher, J.S.

    1991-07-01

    The UTAB database is a computerized information resource that permits the rapid retrieval and comparison of data pertaining to the uptake, accumulation, translocation, adhesion, and biotransformation of both organic chemicals and heavy metals by vascular plants. UTAB can be used to estimate the accumulation of applied chemicals and their biotransformed products in vegetation. Data pertaining to a specific chemical, species, or process (uptake, translocation, etc.) can be obtained from the database. UTAB serves as a rapid means of accessing numerical data without reading the published papers from which the database was derived. The manual provides instructions for use of the microcomputer database UTAB. The UTAB database is designed for use with dBASE 4, commercial software available from the Ashton-Tate Corporation, Torrance, CA. The manual describes the contents of UTAB, dBASE 4 commands required for use of UTAB, and the search procedures in sufficient detail to allow users of the manual to conduct database searches without additional references.

  15. Follicle Online: an integrated database of follicle assembly, development and ovulation.

    PubMed

    Hua, Juan; Xu, Bo; Yang, Yifan; Ban, Rongjun; Iqbal, Furhan; Cooke, Howard J; Zhang, Yuanwei; Shi, Qinghua

    2015-01-01

    Folliculogenesis is an important part of ovarian function as it provides the oocytes for female reproductive life. Characterizing genes/proteins involved in folliculogenesis is fundamental for understanding the mechanisms associated with this biological function and to cure the diseases associated with folliculogenesis. A large number of genes/proteins associated with folliculogenesis have been identified from different species. However, no dedicated public resource is currently available for folliculogenesis-related genes/proteins that are validated by experiments. Here, we are reporting a database 'Follicle Online' that provides the experimentally validated gene/protein map of folliculogenesis in a number of species. Follicle Online is a web-based database system for storing and retrieving folliculogenesis-related experimental data. It provides detailed information for 580 genes/proteins (from 23 model organisms, including Homo sapiens, Mus musculus, Rattus norvegicus, Mesocricetus auratus, Bos taurus, Drosophila and Xenopus laevis) that have been reported to be involved in folliculogenesis, POF (premature ovarian failure) and PCOS (polycystic ovary syndrome). The literature was manually curated from more than 43,000 published articles (till 1 March 2014). The Follicle Online database is implemented in PHP + MySQL + JavaScript and this user-friendly web application provides access to the stored data. In summary, we have developed a centralized database that provides users with comprehensive information about genes/proteins involved in folliculogenesis. This database can be accessed freely and all the stored data can be viewed without any registration. Database URL: http://mcg.ustc.edu.cn/sdap1/follicle/index.php PMID:25931457

  16. Follicle Online: an integrated database of follicle assembly, development and ovulation

    PubMed Central

    Hua, Juan; Xu, Bo; Yang, Yifan; Ban, Rongjun; Iqbal, Furhan; Zhang, Yuanwei; Shi, Qinghua

    2015-01-01

    Folliculogenesis is an important part of ovarian function as it provides the oocytes for female reproductive life. Characterizing genes/proteins involved in folliculogenesis is fundamental for understanding the mechanisms associated with this biological function and to cure the diseases associated with folliculogenesis. A large number of genes/proteins associated with folliculogenesis have been identified from different species. However, no dedicated public resource is currently available for folliculogenesis-related genes/proteins that are validated by experiments. Here, we are reporting a database ‘Follicle Online’ that provides the experimentally validated gene/protein map of folliculogenesis in a number of species. Follicle Online is a web-based database system for storing and retrieving folliculogenesis-related experimental data. It provides detailed information for 580 genes/proteins (from 23 model organisms, including Homo sapiens, Mus musculus, Rattus norvegicus, Mesocricetus auratus, Bos taurus, Drosophila and Xenopus laevis) that have been reported to be involved in folliculogenesis, POF (premature ovarian failure) and PCOS (polycystic ovary syndrome). The literature was manually curated from more than 43 000 published articles (till 1 March 2014). The Follicle Online database is implemented in PHP + MySQL + JavaScript and this user-friendly web application provides access to the stored data. In summary, we have developed a centralized database that provides users with comprehensive information about genes/proteins involved in folliculogenesis. This database can be accessed freely and all the stored data can be viewed without any registration. Database URL: http://mcg.ustc.edu.cn/sdap1/follicle/index.php PMID:25931457

  17. The BioGRID interaction database: 2015 update.

    PubMed

    Chatr-Aryamontri, Andrew; Breitkreutz, Bobby-Joe; Oughtred, Rose; Boucher, Lorrie; Heinicke, Sven; Chen, Daici; Stark, Chris; Breitkreutz, Ashton; Kolas, Nadine; O'Donnell, Lara; Reguly, Teresa; Nixon, Julie; Ramage, Lindsay; Winter, Andrew; Sellam, Adnane; Chang, Christie; Hirschman, Jodi; Theesfeld, Chandra; Rust, Jennifer; Livstone, Michael S; Dolinski, Kara; Tyers, Mike

    2015-01-01

    The Biological General Repository for Interaction Datasets (BioGRID: http://thebiogrid.org) is an open access database that houses genetic and protein interactions curated from the primary biomedical literature for all major model organism species and humans. As of September 2014, the BioGRID contains 749,912 interactions as drawn from 43,149 publications that represent 30 model organisms. This interaction count represents a 50% increase compared to our previous 2013 BioGRID update. BioGRID data are freely distributed through partner model organism databases and meta-databases and are directly downloadable in a variety of formats. In addition to general curation of the published literature for the major model species, BioGRID undertakes themed curation projects in areas of particular relevance for biomedical sciences, such as the ubiquitin-proteasome system and various human disease-associated interaction networks. BioGRID curation is coordinated through an Interaction Management System (IMS) that facilitates the compilation of interaction records through structured evidence codes, phenotype ontologies, and gene annotation. The BioGRID architecture has been improved in order to support a broader range of interaction and post-translational modification types, to allow the representation of more complex multi-gene/protein interactions, to account for cellular phenotypes through structured ontologies, to expedite curation through semi-automated text-mining approaches, and to enhance curation quality control. PMID:25428363

  18. NASA's Earth Observatory Natural Event Tracker: Curating Metadata for Linking Data and Images to Natural Events

    NASA Astrophysics Data System (ADS)

    Ward, K.

    2015-12-01

    On any given date, there are multiple natural events occurring on our planet. Storms, wildfires, volcanoes and algal blooms can be analyzed and represented using multiple dataset parameters. These parameters, in turn, may be visualized in multiple ways and disseminated via multiple web services. Given these multiple-to-multiple relationships, we already have the makings of a microverse of linked data. In an attempt to begin putting this microverse to practical use, NASA's Earth Observatory Group has developed a prototype system called the Earth Observatory Natural Event Tracker (EONET). EONET is a metadata-driven service that is exploring digital curation as a means of adding value to the intersection of natural event-related data and existing web service-enabled visualization systems. A curated natural events database maps specific events to topical groups (e.g., storms, fires, volcanoes), from those groups to related web service visualization systems and, eventually, to the source data products themselves. I will discuss the complexities that arise from attempting to map event types to dataset parameters, and the issues of granularity that come from trying to define exactly what is, and what constrains, a single natural event, particularly in a system where one of the end goals is to provide a group-curated database.
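
    The event-to-group-to-service mapping described above is essentially a small linked-data structure. A minimal sketch follows; the field names, category names and layer identifiers are illustrative assumptions, not the actual EONET schema or API.

```python
# A hypothetical curated event record: an event mapped to a topical group.
event = {
    "id": "EONET_demo_001",
    "title": "Example Wildfire",
    "category": "wildfires",
    "sources": ["https://example.org/incident/1"],
}

# Hypothetical curated mapping from topical groups to visualization layers
# exposed by web services (layer names here are placeholders).
categories = {
    "wildfires": {"layers": ["thermal_anomaly_layer"]},
    "volcanoes": {"layers": ["ash_plume_layer"]},
}

def layers_for(event, categories):
    """Resolve an event, via its topical group, to curated visualization layers."""
    return categories.get(event["category"], {}).get("layers", [])

print(layers_for(event, categories))  # ['thermal_anomaly_layer']
```

    The interesting curation work is precisely what this sketch glosses over: deciding where one event ends and another begins, and which dataset parameters genuinely belong to a given event type.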

  19. Data Albums: An Event Driven Search, Aggregation and Curation Tool for Earth Science

    NASA Technical Reports Server (NTRS)

    Ramachandran, Rahul; Kulkarni, Ajinkya; Maskey, Manil; Bakare, Rohan; Basyal, Sabin; Li, Xiang; Flynn, Shannon

    2014-01-01

    Approaches used in Earth science research such as case study analysis and climatology studies involve discovering and gathering diverse data sets and information to support the research goals. To gather relevant data and information for case studies and climatology analysis is both tedious and time-consuming. Current Earth science data systems are designed with the assumption that researchers access data primarily by instrument or geophysical parameter. In cases where researchers are interested in studying a significant event, they have to manually assemble a variety of datasets relevant to it by searching the different distributed data systems. This paper presents a specialized search, aggregation and curation tool for Earth science to address these challenges. The search tool automatically creates curated 'Data Albums', aggregated collections of information related to a specific event, containing links to relevant data files [granules] from different instruments, tools and services for visualization and analysis, and information about the event contained in news reports, images or videos to supplement research analysis. Curation in the tool is driven via an ontology-based relevancy ranking algorithm to filter out non-relevant information and data.
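
    A relevancy ranking filter of the general kind described above can be reduced to scoring each candidate item against weighted terms drawn from an event ontology and dropping items below a threshold. The sketch below is a toy illustration under that assumption; the weights, threshold and items are invented, not the paper's algorithm.

```python
# Hypothetical term weights derived from an event ontology for "hurricane".
weights = {"hurricane": 3.0, "precipitation": 2.0, "wind": 1.5}

def score(text, weights):
    """Sum the weights of ontology terms that appear in the item's text."""
    return sum(w for term, w in weights.items() if term in text.lower())

def curate(items, weights, threshold=2.0):
    """Keep only items whose relevancy score meets the threshold."""
    return [it for it in items if score(it, weights) >= threshold]

items = [
    "Hurricane wind speed measurements",   # score 4.5 -> kept
    "Precipitation radar granule",         # score 2.0 -> kept
    "Unrelated recipe blog post",          # score 0.0 -> filtered out
]
print(curate(items, weights))
```

    In a real system the terms and weights would come from the ontology's concept hierarchy rather than a hand-written dictionary, but the filtering logic is the same: rank, threshold, and keep only event-relevant granules and documents.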

  20. Sharing and community curation of mass spectrometry data with Global Natural Products Social Molecular Networking.

    PubMed

    Wang, Mingxun; Carver, Jeremy J; Phelan, Vanessa V; Sanchez, Laura M; Garg, Neha; Peng, Yao; Nguyen, Don Duy; Watrous, Jeramie; Kapono, Clifford A; Luzzatto-Knaan, Tal; Porto, Carla; Bouslimani, Amina; Melnik, Alexey V; Meehan, Michael J; Liu, Wei-Ting; Crüsemann, Max; Boudreau, Paul D; Esquenazi, Eduardo; Sandoval-Calderón, Mario; Kersten, Roland D; Pace, Laura A; Quinn, Robert A; Duncan, Katherine R; Hsu, Cheng-Chih; Floros, Dimitrios J; Gavilan, Ronnie G; Kleigrewe, Karin; Northen, Trent; Dutton, Rachel J; Parrot, Delphine; Carlson, Erin E; Aigle, Bertrand; Michelsen, Charlotte F; Jelsbak, Lars; Sohlenkamp, Christian; Pevzner, Pavel; Edlund, Anna; McLean, Jeffrey; Piel, Jörn; Murphy, Brian T; Gerwick, Lena; Liaw, Chih-Chuang; Yang, Yu-Liang; Humpf, Hans-Ulrich; Maansson, Maria; Keyzers, Robert A; Sims, Amy C; Johnson, Andrew R; Sidebottom, Ashley M; Sedio, Brian E; Klitgaard, Andreas; Larson, Charles B; Boya P, Cristopher A; Torres-Mendoza, Daniel; Gonzalez, David J; Silva, Denise B; Marques, Lucas M; Demarque, Daniel P; Pociute, Egle; O'Neill, Ellis C; Briand, Enora; Helfrich, Eric J N; Granatosky, Eve A; Glukhov, Evgenia; Ryffel, Florian; Houson, Hailey; Mohimani, Hosein; Kharbush, Jenan J; Zeng, Yi; Vorholt, Julia A; Kurita, Kenji L; Charusanti, Pep; McPhail, Kerry L; Nielsen, Kristian Fog; Vuong, Lisa; Elfeki, Maryam; Traxler, Matthew F; Engene, Niclas; Koyama, Nobuhiro; Vining, Oliver B; Baric, Ralph; Silva, Ricardo R; Mascuch, Samantha J; Tomasi, Sophie; Jenkins, Stefan; Macherla, Venkat; Hoffman, Thomas; Agarwal, Vinayak; Williams, Philip G; Dai, Jingqui; Neupane, Ram; Gurr, Joshua; Rodríguez, Andrés M C; Lamsa, Anne; Zhang, Chen; Dorrestein, Kathleen; Duggan, Brendan M; Almaliti, Jehad; Allard, Pierre-Marie; Phapale, Prasad; Nothias, Louis-Felix; Alexandrov, Theodore; Litaudon, Marc; Wolfender, Jean-Luc; Kyle, Jennifer E; Metz, Thomas O; Peryea, Tyler; Nguyen, Dac-Trung; VanLeer, Danielle; Shinn, Paul; Jadhav, Ajit; Müller, Rolf; Waters, Katrina M; 
Shi, Wenyuan; Liu, Xueting; Zhang, Lixin; Knight, Rob; Jensen, Paul R; Palsson, Bernhard Ø; Pogliano, Kit; Linington, Roger G; Gutiérrez, Marcelino; Lopes, Norberto P; Gerwick, William H; Moore, Bradley S; Dorrestein, Pieter C; Bandeira, Nuno

    2016-08-01

The potential of the diverse chemistries present in natural products (NP) for biotechnology and medicine remains untapped because NP databases are not searchable with raw data and the NP community has no way to share data other than in published papers. Although mass spectrometry (MS) techniques are well-suited to high-throughput characterization of NP, there is a pressing need for an infrastructure to enable sharing and curation of data. We present Global Natural Products Social Molecular Networking (GNPS; http://gnps.ucsd.edu), an open-access knowledge base for community-wide organization and sharing of raw, processed or identified tandem mass spectrometry (MS/MS) data. In GNPS, crowdsourced curation of freely available community-wide reference MS libraries will underpin improved annotations. Data-driven social networking should facilitate identification of spectra and foster collaborations. We also introduce the concept of 'living data' through continuous reanalysis of deposited data. PMID:27504778
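The reference-library annotation that such MS/MS knowledge bases rely on can be illustrated with a simplified cosine similarity between peak lists. The peak lists, the 0.02 Da tolerance, and the greedy pairing below are invented for illustration; GNPS's production scoring is more elaborate.

```python
# Simplified MS/MS spectral matching by cosine similarity. Peaks and
# tolerance are made-up illustrative values, not GNPS's actual scoring.
import math

def cosine_score(spec_a, spec_b, tol=0.02):
    """Greedily pair peaks within `tol` m/z, then take the cosine of
    square-root-transformed intensity vectors."""
    a = sorted(spec_a)  # list of (m/z, intensity) pairs
    b = sorted(spec_b)
    used = set()
    products = []
    for mz_a, int_a in a:
        for j, (mz_b, int_b) in enumerate(b):
            if j not in used and abs(mz_a - mz_b) <= tol:
                products.append(math.sqrt(int_a) * math.sqrt(int_b))
                used.add(j)
                break
    norm_a = math.sqrt(sum(i for _, i in a))  # norm of sqrt-intensity vector
    norm_b = math.sqrt(sum(i for _, i in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return sum(products) / (norm_a * norm_b)

query = [(101.07, 40.0), (229.15, 100.0), (303.20, 25.0)]
library_spec = [(101.08, 35.0), (229.15, 90.0), (310.11, 10.0)]
score = cosine_score(query, library_spec)  # high but below 1: two of three peaks pair
```

An identical pair of spectra scores 1.0, unrelated spectra score 0.0, which is why a single cutoff on this score can drive both library matching and the edges of a molecular network.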

  1. Research Problems in Data Curation: Outcomes from the Data Curation Education in Research Centers Program

    NASA Astrophysics Data System (ADS)

    Palmer, C. L.; Mayernik, M. S.; Weber, N.; Baker, K. S.; Kelly, K.; Marlino, M. R.; Thompson, C. A.

    2013-12-01

The need for data curation is being recognized in numerous institutional settings as national research funding agencies extend data archiving mandates to cover more types of research grants. Data curation, however, is not only a practical challenge. It presents many conceptual and theoretical challenges that must be investigated to design appropriate technical systems, social practices and institutions, policies, and services. This presentation reports on outcomes from an investigation of research problems in data curation conducted as part of the Data Curation Education in Research Centers (DCERC) program. DCERC is developing a new model for educating data professionals to contribute to scientific research. The program is organized around foundational courses and field experiences in research and data centers for both master's and doctoral students. The initiative is led by the Graduate School of Library and Information Science at the University of Illinois at Urbana-Champaign, in collaboration with the School of Information Sciences at the University of Tennessee, and library and data professionals at the National Center for Atmospheric Research (NCAR). At the doctoral level, DCERC is educating future faculty and researchers in data curation and establishing a research agenda to advance the field. The doctoral seminar, Research Problems in Data Curation, was developed and taught in 2012 by the DCERC principal investigator and two doctoral fellows at the University of Illinois. It was designed to define the problem space of data curation, examine relevant concepts and theories related to both technical and social perspectives, and articulate research questions that are either unexplored or undertheorized in the current literature. There was a particular emphasis on the Earth and environmental sciences, with guest speakers brought in from NCAR, the National Snow and Ice Data Center (NSIDC), and Rensselaer Polytechnic Institute. Through the assignments, students

  2. dbMAE: the database of autosomal monoallelic expression

    PubMed Central

    Savova, Virginia; Patsenker, Jon; Vigneau, Sébastien; Gimelbrant, Alexander A.

    2016-01-01

Recently, data on ‘random’ autosomal monoallelic expression have become available for the entire genome in multiple human and mouse tissues and cell types, creating a need for better access and dissemination. The database of autosomal monoallelic expression (dbMAE; https://mae.hms.harvard.edu) incorporates data from multiple recent reports of genome-wide analyses. These include transcriptome-wide analyses of allelic imbalance in clonal cell populations based on sequence polymorphisms, as well as indirect identification based on a specific chromatin signature present in MAE gene bodies. Currently, dbMAE contains transcriptome-wide chromatin identification calls for 8 human and 21 mouse tissues, and describes over 16 000 murine and ∼700 human cases of directly measured biased expression, compiled from allele-specific RNA-seq and genotyping array data. All data are manually curated. To ensure cross-publication uniformity, we re-analyzed transcriptome-wide RNA-seq data using the same pipeline. Data are accessed through an interface that allows for basic and advanced searches; all source references, including raw data, are clearly described and hyperlinked. This ensures the utility of the resource as an initial screening tool for those interested in investigating the role of monoallelic expression in their specific genes and tissues of interest. PMID:26503248
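The kind of allelic-imbalance call compiled in such resources can be sketched as an exact binomial test on allele-specific read counts at a heterozygous site: under biallelic expression the reference:alternate ratio should be near 50:50. The counts and the significance threshold below are invented for illustration, not dbMAE's actual pipeline.

```python
# Toy allele-specific expression call from ref/alt read counts.
# Counts and alpha are illustrative, not dbMAE's pipeline parameters.
from math import comb

def binom_two_sided_p(k, n, p=0.5):
    """Exact two-sided binomial p-value: sum outcomes as extreme as k."""
    def pmf(i):
        return comb(n, i) * p**i * (1 - p)**(n - i)
    observed = pmf(k)
    return min(1.0, sum(pmf(i) for i in range(n + 1)
                        if pmf(i) <= observed + 1e-12))

def call_biased(ref_reads, alt_reads, alpha=0.01):
    """Flag a site as showing biased (potentially monoallelic) expression."""
    return binom_two_sided_p(ref_reads, ref_reads + alt_reads) < alpha

balanced = call_biased(48, 52)  # near 50:50, consistent with biallelic expression
skewed = call_biased(95, 5)     # strong imbalance, flagged as biased
```

Real pipelines additionally control for genotyping error, mapping bias toward the reference allele, and multiple testing across the transcriptome.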

  3. CancerHSP: anticancer herbs database of systems pharmacology.

    PubMed

    Tao, Weiyang; Li, Bohui; Gao, Shuo; Bai, Yaofei; Shar, Piar Ali; Zhang, Wenjuan; Guo, Zihu; Sun, Ke; Fu, Yingxue; Huang, Chao; Zheng, Chunli; Mu, Jiexin; Pei, Tianli; Wang, Yuan; Li, Yan; Wang, Yonghua

    2015-01-01

The numerous natural products and their bioactivities potentially afford an extraordinary resource for new drug discovery and have been employed in cancer treatment. However, the underlying pharmacological mechanisms of most natural anticancer compounds remain elusive, which has become one of the major obstacles in developing novel effective anticancer agents. Here, to address these unmet needs, we developed an anticancer herbs database of systems pharmacology (CancerHSP), which records anticancer herb-related information through manual curation. Currently, CancerHSP contains 2439 anticancer herbal medicines with 3575 anticancer ingredients. For each ingredient, the molecular structure and nine key ADME parameters are provided. Moreover, we also provide the anticancer activities of these compounds based on 492 different cancer cell lines. Further, the protein targets of the compounds are predicted by state-of-the-art methods or collected from the literature. CancerHSP will help reveal the molecular mechanisms of natural anticancer products and accelerate anticancer drug development, in particular by facilitating future investigations on drug repositioning and drug discovery. CancerHSP is freely available on the web at http://lsp.nwsuaf.edu.cn/CancerHSP.php. PMID:26074488

  4. CancerHSP: anticancer herbs database of systems pharmacology

    NASA Astrophysics Data System (ADS)

    Tao, Weiyang; Li, Bohui; Gao, Shuo; Bai, Yaofei; Shar, Piar Ali; Zhang, Wenjuan; Guo, Zihu; Sun, Ke; Fu, Yingxue; Huang, Chao; Zheng, Chunli; Mu, Jiexin; Pei, Tianli; Wang, Yuan; Li, Yan; Wang, Yonghua

    2015-06-01

The numerous natural products and their bioactivities potentially afford an extraordinary resource for new drug discovery and have been employed in cancer treatment. However, the underlying pharmacological mechanisms of most natural anticancer compounds remain elusive, which has become one of the major obstacles in developing novel effective anticancer agents. Here, to address these unmet needs, we developed an anticancer herbs database of systems pharmacology (CancerHSP), which records anticancer herb-related information through manual curation. Currently, CancerHSP contains 2439 anticancer herbal medicines with 3575 anticancer ingredients. For each ingredient, the molecular structure and nine key ADME parameters are provided. Moreover, we also provide the anticancer activities of these compounds based on 492 different cancer cell lines. Further, the protein targets of the compounds are predicted by state-of-the-art methods or collected from the literature. CancerHSP will help reveal the molecular mechanisms of natural anticancer products and accelerate anticancer drug development, in particular by facilitating future investigations on drug repositioning and drug discovery. CancerHSP is freely available on the web at http://lsp.nwsuaf.edu.cn/CancerHSP.php.
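The per-ingredient ADME parameters such a database provides are typically used to pre-filter compounds for drug-likeness before target prediction. The sketch below shows that kind of filter; the property names, values, and cutoffs are invented stand-ins, not CancerHSP's actual fields.

```python
# Illustrative drug-likeness screen over ingredient records carrying
# ADME-style parameters. Names, values and cutoffs are invented.
ingredients = [
    {"name": "compound_a", "oral_bioavailability": 42.0, "drug_likeness": 0.31},
    {"name": "compound_b", "oral_bioavailability": 12.0, "drug_likeness": 0.05},
]

def passes_screen(ing, ob_min=30.0, dl_min=0.18):
    """Keep only ingredients clearing both ADME thresholds."""
    return (ing["oral_bioavailability"] >= ob_min
            and ing["drug_likeness"] >= dl_min)

hits = [i["name"] for i in ingredients if passes_screen(i)]
```

Screening first keeps the expensive downstream steps (cell-line activity lookup, target prediction) focused on plausibly bioavailable compounds.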

  5. RepeatsDB: a database of tandem repeat protein structures

    PubMed Central

    Di Domenico, Tomás; Potenza, Emilio; Walsh, Ian; Gonzalo Parra, R.; Giollo, Manuel; Minervini, Giovanni; Piovesan, Damiano; Ihsan, Awais; Ferrari, Carlo; Kajava, Andrey V.; Tosatto, Silvio C.E.

    2014-01-01

RepeatsDB (http://repeatsdb.bio.unipd.it/) is a database of annotated tandem repeat protein structures. Tandem repeats pose a difficult problem for the analysis of protein structures, as the underlying sequence can be highly degenerate. Several repeat types have been studied over the years, but their annotation was done on a case-by-case basis, making large-scale analysis difficult. We developed RepeatsDB to fill this gap. Using state-of-the-art repeat detection methods and manual curation, we systematically annotated the Protein Data Bank, predicting 10 745 repeat structures. In all, 2797 structures were classified according to a recently proposed classification schema, which was expanded to accommodate new findings. In addition, detailed annotations were performed on a subset of 321 proteins. These annotations feature information on start and end positions for the repeat regions and units. RepeatsDB is an ongoing effort to systematically classify and annotate structural protein repeats in a consistent way. It provides users with the possibility to access and download high-quality datasets either interactively or programmatically through web services. PMID:24311564
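The region/unit annotations described above (start and end residue positions of each repeat unit within a region) map naturally onto a small data structure. The field names and example values below are illustrative, not RepeatsDB's actual schema.

```python
# Minimal data model for a tandem-repeat annotation: a region on one
# protein chain, composed of consecutive units with start/end residues.
# Field names and the example entry are illustrative only.
from dataclasses import dataclass, field
from typing import List, Tuple

@dataclass
class RepeatRegion:
    pdb_id: str
    chain: str
    classification: str                       # class/subclass code
    units: List[Tuple[int, int]] = field(default_factory=list)

    @property
    def start(self) -> int:
        return min(s for s, _ in self.units)  # region start = first unit start

    @property
    def end(self) -> int:
        return max(e for _, e in self.units)  # region end = last unit end

    def unit_lengths(self) -> List[int]:
        return [e - s + 1 for s, e in self.units]

region = RepeatRegion("1xyz", "A", "III.1",
                      units=[(10, 42), (43, 75), (76, 108)])
```

Storing units rather than just region boundaries is what makes large-scale analyses possible, e.g. comparing unit-length distributions across repeat classes.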

  6. Project ADVANCE. Evaluation Manual. Mt. Diablo Unified School District.

    ERIC Educational Resources Information Center

    Tenenbaum, Bonnie

    The organizational structure of this manual parallels the evolutionary stages in school improvement--needs assessment, planning, implementation, and outcomes. The manual provides procedures for data-based management and includes within each section, sample instruments, data collection and analyses procedures, and questions for decision-making. The…

  7. FINE PARTICLE EMISSIONS INFORMATION SYSTEM REFERENCE MANUAL

    EPA Science Inventory

    The report is a basic reference manual on the Fine Particle Emissions Information System (FPEIS), a computerized database on primary fine particle emissions to the atmosphere from stationary point sources. The FPEIS is a component of the Environmental Assessment Data Systems (EAD...

  8. Science Orders Systems and Operations Manual.

    ERIC Educational Resources Information Center

    Kriz, Harry M.

    This manual describes the implementation and operation of SCIENCE ORDERS, an online orders management system used by the Science and Technology Department of Newman Library at Virginia Polytechnic Institute and State University. Operational since January 1985, the system is implemented using the SPIRES database management system and is used to (1)…

  9. IPD—the Immuno Polymorphism Database

    PubMed Central

    Robinson, James; Mistry, Kavita; McWilliam, Hamish; Lopez, Rodrigo; Marsh, Steven G. E.

    2010-01-01

The Immuno Polymorphism Database (IPD) (http://www.ebi.ac.uk/ipd/) is a set of specialist databases related to the study of polymorphic genes in the immune system. The IPD project works with specialist groups or nomenclature committees who provide and curate individual sections before they are submitted to IPD for online publication. The IPD project stores all the data in a set of related databases. IPD currently consists of four databases: IPD-KIR, which contains the allelic sequences of Killer-cell Immunoglobulin-like Receptors; IPD-MHC, a database of sequences of the Major Histocompatibility Complex of different species; IPD-human platelet antigens, covering alloantigens expressed only on platelets; and IPD-ESTDAB, which provides access to the European Searchable Tumour cell-line database, a cell bank of immunologically characterised melanoma cell lines. The data are currently available online from the website and ftp directory. PMID:19875415

  10. IPD--the Immuno Polymorphism Database.

    PubMed

    Robinson, James; Mistry, Kavita; McWilliam, Hamish; Lopez, Rodrigo; Marsh, Steven G E

    2010-01-01

The Immuno Polymorphism Database (IPD) (http://www.ebi.ac.uk/ipd/) is a set of specialist databases related to the study of polymorphic genes in the immune system. The IPD project works with specialist groups or nomenclature committees who provide and curate individual sections before they are submitted to IPD for online publication. The IPD project stores all the data in a set of related databases. IPD currently consists of four databases: IPD-KIR, which contains the allelic sequences of Killer-cell Immunoglobulin-like Receptors; IPD-MHC, a database of sequences of the Major Histocompatibility Complex of different species; IPD-human platelet antigens, covering alloantigens expressed only on platelets; and IPD-ESTDAB, which provides access to the European Searchable Tumour cell-line database, a cell bank of immunologically characterised melanoma cell lines. The data are currently available online from the website and ftp directory. PMID:19875415

  11. SymbioGenomesDB: a database for the integration and access to knowledge on host–symbiont relationships

    PubMed Central

    Reyes-Prieto, Mariana; Vargas-Chávez, Carlos; Latorre, Amparo; Moya, Andrés

    2015-01-01

Symbiotic relationships occur naturally throughout the tree of life, whether in a commensal, mutualistic or pathogenic manner. The genomes of multiple organisms involved in symbiosis are rapidly being sequenced and becoming available, especially those from the microbial world. Currently, there are numerous databases that offer information on specific organisms or models, but none offer a global understanding of relationships between organisms, their interactions and capabilities within their niche, as well as their role as part of a system, in this case, their role in symbiosis. We have developed SymbioGenomesDB as a community database resource for laboratories that intend to investigate and use information on the genetics and the genomics of organisms involved in these relationships. The ultimate goal of SymbioGenomesDB is to host and support the growing and vast body of symbiotic–host relationship information, to uncover the genetic basis of such associations. SymbioGenomesDB maintains a comprehensive organization of information on genomes of symbionts from diverse hosts throughout the tree of life, including their sequences, their metadata and their genomic features. This catalog of relationships was generated using computational tools, custom R scripts and manual integration of data available in the public literature. As a highly curated and comprehensive systems database, SymbioGenomesDB provides web access to all the information on symbiotic organisms, their features and links to the central database NCBI. Three different tools can be found within the database to explore symbiosis-related organisms, their genes and their genomes. We also offer an orthology search for one or multiple genes in one or multiple organisms within symbiotic relationships, and every table, graph and output file is downloadable and easy to parse for further analysis. The robust SymbioGenomesDB will be constantly updated to cope with all the data being generated and included in major

  12. SymbioGenomesDB: a database for the integration and access to knowledge on host-symbiont relationships.

    PubMed

    Reyes-Prieto, Mariana; Vargas-Chávez, Carlos; Latorre, Amparo; Moya, Andrés

    2015-01-01

Symbiotic relationships occur naturally throughout the tree of life, whether in a commensal, mutualistic or pathogenic manner. The genomes of multiple organisms involved in symbiosis are rapidly being sequenced and becoming available, especially those from the microbial world. Currently, there are numerous databases that offer information on specific organisms or models, but none offer a global understanding of relationships between organisms, their interactions and capabilities within their niche, as well as their role as part of a system, in this case, their role in symbiosis. We have developed SymbioGenomesDB as a community database resource for laboratories that intend to investigate and use information on the genetics and the genomics of organisms involved in these relationships. The ultimate goal of SymbioGenomesDB is to host and support the growing and vast body of symbiotic-host relationship information, to uncover the genetic basis of such associations. SymbioGenomesDB maintains a comprehensive organization of information on genomes of symbionts from diverse hosts throughout the tree of life, including their sequences, their metadata and their genomic features. This catalog of relationships was generated using computational tools, custom R scripts and manual integration of data available in the public literature. As a highly curated and comprehensive systems database, SymbioGenomesDB provides web access to all the information on symbiotic organisms, their features and links to the central database NCBI. Three different tools can be found within the database to explore symbiosis-related organisms, their genes and their genomes. We also offer an orthology search for one or multiple genes in one or multiple organisms within symbiotic relationships, and every table, graph and output file is downloadable and easy to parse for further analysis. The robust SymbioGenomesDB will be constantly updated to cope with all the data being generated and included in major
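An orthology search of the kind mentioned above is classically built on the reciprocal-best-hit (RBH) heuristic: two genes are called orthologs when each is the other's best similarity hit across genomes. The gene names and precomputed best-hit tables below are invented for illustration; this is not SymbioGenomesDB's implementation.

```python
# Toy reciprocal-best-hit orthology call. The best-hit tables stand in
# for precomputed similarity-search results (e.g. BLAST); all names
# are invented for illustration.
best_hit_a_to_b = {"geneA1": "geneB3", "geneA2": "geneB7"}  # organism A -> best hit in B
best_hit_b_to_a = {"geneB3": "geneA1", "geneB7": "geneA9"}  # organism B -> best hit in A

def reciprocal_best_hits(a_to_b, b_to_a):
    """Pairs (a, b) where each gene is the other's best hit."""
    return [(a, b) for a, b in a_to_b.items() if b_to_a.get(b) == a]

orthologs = reciprocal_best_hits(best_hit_a_to_b, best_hit_b_to_a)
# geneA1/geneB3 are mutual best hits; geneA2/geneB7 are not reciprocal.
```

RBH is conservative (it misses many-to-many orthology after gene duplication) but is cheap and easy to apply across the many host-symbiont genome pairs such a database covers.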

  13. A framework for organizing cancer-related variations from existing databases, publications and NGS data using a High-performance Integrated Virtual Environment (HIVE).

    PubMed

    Wu, Tsung-Jung; Shamsaddini, Amirhossein; Pan, Yang; Smith, Krista; Crichton, Daniel J; Simonyan, Vahan; Mazumder, Raja

    2014-01-01

Years of sequence feature curation by UniProtKB/Swiss-Prot, PIR-PSD, NCBI-CDD, RefSeq and other database biocurators have led to a rich repository of information on functional sites of genes and proteins. This information, along with variation-related annotation, can be used to scan human short sequence reads from next-generation sequencing (NGS) pipelines for the presence of non-synonymous single-nucleotide variations (nsSNVs) that affect functional sites. This and similar workflows are becoming more important because thousands of NGS data sets are being made available through projects such as The Cancer Genome Atlas (TCGA), and researchers want to evaluate their biomarkers in genomic data. BioMuta, an integrated sequence feature database, provides a framework for automated and manual curation and integration of cancer-related sequence features so that they can be used in NGS analysis pipelines. Sequence feature information in BioMuta is collected from the Catalogue of Somatic Mutations in Cancer (COSMIC), ClinVar, UniProtKB and through biocuration of information available from publications. Additionally, nsSNVs identified through automated analysis of NGS data from TCGA are also included in the database. Because of the petabytes of data and information present in NGS primary repositories, a platform, HIVE (High-performance Integrated Virtual Environment), for storing, analyzing, computing and curating NGS data and associated metadata has been developed. Using HIVE, 31 979 nsSNVs were identified in TCGA-derived NGS data from breast cancer patients. All variations identified through this process are stored in a Curated Short Read archive, and the nsSNVs from the tumor samples are included in BioMuta. Currently, BioMuta has 26 cancer types with 13 896 small-scale and 308 986 large-scale study-derived variations. Integration of variation data allows identification of novel or common nsSNVs that can be prioritized in validation studies. Database URL: BioMuta: http

  14. A framework for organizing cancer-related variations from existing databases, publications and NGS data using a High-performance Integrated Virtual Environment (HIVE)

    PubMed Central

    Wu, Tsung-Jung; Shamsaddini, Amirhossein; Pan, Yang; Smith, Krista; Crichton, Daniel J.; Simonyan, Vahan; Mazumder, Raja

    2014-01-01

Years of sequence feature curation by UniProtKB/Swiss-Prot, PIR-PSD, NCBI-CDD, RefSeq and other database biocurators have led to a rich repository of information on functional sites of genes and proteins. This information, along with variation-related annotation, can be used to scan human short sequence reads from next-generation sequencing (NGS) pipelines for the presence of non-synonymous single-nucleotide variations (nsSNVs) that affect functional sites. This and similar workflows are becoming more important because thousands of NGS data sets are being made available through projects such as The Cancer Genome Atlas (TCGA), and researchers want to evaluate their biomarkers in genomic data. BioMuta, an integrated sequence feature database, provides a framework for automated and manual curation and integration of cancer-related sequence features so that they can be used in NGS analysis pipelines. Sequence feature information in BioMuta is collected from the Catalogue of Somatic Mutations in Cancer (COSMIC), ClinVar, UniProtKB and through biocuration of information available from publications. Additionally, nsSNVs identified through automated analysis of NGS data from TCGA are also included in the database. Because of the petabytes of data and information present in NGS primary repositories, a platform, HIVE (High-performance Integrated Virtual Environment), for storing, analyzing, computing and curating NGS data and associated metadata has been developed. Using HIVE, 31 979 nsSNVs were identified in TCGA-derived NGS data from breast cancer patients. All variations identified through this process are stored in a Curated Short Read archive, and the nsSNVs from the tumor samples are included in BioMuta. Currently, BioMuta has 26 cancer types with 13 896 small-scale and 308 986 large-scale study-derived variations. Integration of variation data allows identification of novel or common nsSNVs that can be prioritized in validation studies. Database URL: BioMuta: http
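The core scan such a framework enables, flagging nsSNVs that land inside curated functional-site annotations, reduces to an interval lookup per variant. The gene, the domain names, and the coordinates below are invented for illustration, not BioMuta's data.

```python
# Sketch of flagging variants that fall inside annotated functional
# sites. Gene, domain names and coordinates are illustrative only.
functional_sites = {
    "GENE_X": [(1649, 1736, "domain_1"), (24, 64, "domain_2")],
}

def sites_hit(gene, protein_pos):
    """Names of annotated sites overlapping a variant's protein position."""
    return [name for start, end, name in functional_sites.get(gene, [])
            if start <= protein_pos <= end]

hits = sites_hit("GENE_X", 1699)    # inside domain_1 -> prioritized
misses = sites_hit("GENE_X", 1000)  # between domains -> not flagged
```

At genome scale the linear scan per gene would be replaced by an interval tree or sorted-array bisection, but the prioritization logic is the same.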

  15. Digital Management and Curation of the National Rock and Ore Collections at NMNH, Smithsonian

    NASA Astrophysics Data System (ADS)

    Cottrell, E.; Andrews, B.; Sorensen, S. S.; Hale, L. J.

    2011-12-01

The National Museum of Natural History, Smithsonian Institution, is home to the world's largest curated rock collection. The collection houses 160,680 physical rock and ore specimen lots ("samples"), all of which already have a digital record that can be accessed by the public through a searchable web interface (http://collections.mnh.si.edu/search/ms/). In addition, there are 66 accessions pending that, when catalogued, will add approximately 60,000 specimen lots. NMNH's collections are digitally managed on the KE EMu platform, which has emerged as the premier system for managing collections in natural history museums worldwide. In 2010 the Smithsonian released an ambitious 5-year Digitization Strategic Plan. In Mineral Sciences, new digitization efforts in the next five years will focus on integrating various digital resources for volcanic specimens. EMu sample records will link to the corresponding records for physical eruption information housed within the database of Smithsonian's Global Volcanism Program (GVP). Linkages are also planned between our digital records and geochemical databases (like EarthChem or PetDB) maintained by third parties. We anticipate that these linkages will increase the use of NMNH collections as well as engender new scholarly directions for research. Another large project the museum is currently undertaking involves the integration of the functionality of in-house designed Transaction Management software with the EMu database. This will allow access to the details (borrower, quantity, date, and purpose) of all loans of a given specimen through its catalogue record. We hope this will enable cross-referencing and fertilization of research ideas while avoiding duplicate efforts. While these digitization efforts are critical, we propose that the greatest challenge to sample curation is not posed by digitization and that a global sample registry alone will not ensure that samples are available for reuse. We suggest instead that the ability

  16. Improving the Acquisition and Management of Sample Curation Data

    NASA Technical Reports Server (NTRS)

    Todd, Nancy S.; Evans, Cindy A.; Labasse, Dan

    2011-01-01

    This paper discusses the current sample documentation processes used during and after a mission, examines the challenges and special considerations needed for designing effective sample curation data systems, and looks at the results of a simulated sample result mission and the lessons learned from this simulation. In addition, it introduces a new data architecture for an integrated sample Curation data system being implemented at the NASA Astromaterials Acquisition and Curation department and discusses how it improves on existing data management systems.

  17. Concrete Practices & Procedures. Instructor Manual. Trainee Manual.

    ERIC Educational Resources Information Center

    Laborers-AGC Education and Training Fund, Pomfret Center, CT.

    This packet consists of the instructor and trainee manuals for a concrete practices and procedures course. The instructor manual contains a schedule for an 80-hour, 10-day course and instructor outline. The outline provides a step-by-step description of the instructor's activities and includes answer sheets to accompany questions on information…

  18. Biofuel Database

    National Institute of Standards and Technology Data Gateway

    Biofuel Database (Web, free access)   This database brings together structural, biological, and thermodynamic data for enzymes that are either in current use or are being considered for use in the production of biofuels.

  19. Crowdsourcing and curation: perspectives from biology and natural language processing

    PubMed Central

    Hirschman, Lynette; Fort, Karën; Boué, Stéphanie; Kyrpides, Nikos; Islamaj Doğan, Rezarta; Cohen, Kevin Bretonnel

    2016-01-01

    Crowdsourcing is increasingly utilized for performing tasks in both natural language processing and biocuration. Although there have been many applications of crowdsourcing in these fields, there have been fewer high-level discussions of the methodology and its applicability to biocuration. This paper explores crowdsourcing for biocuration through several case studies that highlight different ways of leveraging ‘the crowd’; these raise issues about the kind(s) of expertise needed, the motivations of participants, and questions related to feasibility, cost and quality. The paper is an outgrowth of a panel session held at BioCreative V (Seville, September 9–11, 2015). The session consisted of four short talks, followed by a discussion. In their talks, the panelists explored the role of expertise and the potential to improve crowd performance by training; the challenge of decomposing tasks to make them amenable to crowdsourcing; and the capture of biological data and metadata through community editing. Database URL: http://www.mitre.org/publications/technical-papers/crowdsourcing-and-curation-perspectives PMID:27504010

  20. Crowdsourcing and curation: perspectives from biology and natural language processing.

    PubMed

    Hirschman, Lynette; Fort, Karën; Boué, Stéphanie; Kyrpides, Nikos; Islamaj Doğan, Rezarta; Cohen, Kevin Bretonnel

    2016-01-01

Crowdsourcing is increasingly utilized for performing tasks in both natural language processing and biocuration. Although there have been many applications of crowdsourcing in these fields, there have been fewer high-level discussions of the methodology and its applicability to biocuration. This paper explores crowdsourcing for biocuration through several case studies that highlight different ways of leveraging 'the crowd'; these raise issues about the kind(s) of expertise needed, the motivations of participants, and questions related to feasibility, cost and quality. The paper is an outgrowth of a panel session held at BioCreative V (Seville, September 9-11, 2015). The session consisted of four short talks, followed by a discussion. In their talks, the panelists explored the role of expertise and the potential to improve crowd performance by training; the challenge of decomposing tasks to make them amenable to crowdsourcing; and the capture of biological data and metadata through community editing. Database URL: http://www.mitre.org/publications/technical-papers/crowdsourcing-and-curation-perspectives. PMID:27504010

  1. Guidelines for establishing locus specific databases.

    PubMed

    Vihinen, Mauno; den Dunnen, Johan T; Dalgleish, Raymond; Cotton, Richard G H

    2012-02-01

Information about genetic variation has been collected for some 20 years into registries, known as locus specific databases (LSDBs), which nowadays often contain information in addition to the actual genetic variation. Several issues have to be taken into account when establishing and maintaining LSDBs, and these have been discussed previously in a number of articles describing guidelines and recommendations. This information is widely scattered and, for a newcomer, it would be difficult to obtain the latest information and guidance. Here, a sequence of steps essential for establishing an LSDB is discussed together with guidelines for each step. Curators need to collect information from various sources, code it in a systematic way, and distribute it to the research and clinical communities. In doing this, ethical issues have to be taken into account. To facilitate the integration of information, for example to analyze genotype-phenotype correlations, systematic data representation using established nomenclatures, data models, and ontologies is essential. LSDB curation and maintenance comprise a number of tasks that can be managed by following logical steps. These resources are becoming ever more important, and new curators are essential to ensure that we will have expertly curated databases for all disease-related genes in the near future. PMID:22052659
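The systematic representation the guidelines call for, stable identifiers, established nomenclature, ontology terms instead of free text, and explicit provenance, can be sketched as a minimal variant record. Every field name and value below is invented for illustration; this is not a prescribed LSDB schema.

```python
# Minimal, illustrative LSDB variant record. All identifiers and
# values are invented placeholders, not real accessions.
from dataclasses import dataclass

@dataclass(frozen=True)
class VariantRecord:
    variant_id: str   # stable accession within the LSDB
    gene: str         # official gene symbol
    hgvs_cdna: str    # HGVS-style description at the cDNA level
    phenotype: str    # ideally an ontology term code, not free text
    reference: str    # provenance: publication ID or submitter

rec = VariantRecord(
    variant_id="LSDB0001",
    gene="EXAMPLE1",
    hgvs_cdna="NM_000000.0:c.123A>G",
    phenotype="HP:0000118",
    reference="PMID:00000000",
)
```

Making the record immutable (`frozen=True`) mirrors the curation principle that published entries get versioned or superseded rather than silently edited, which preserves the audit trail the guidelines emphasize.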

  2. Electronic Databases.

    ERIC Educational Resources Information Center

    Williams, Martha E.

    1985-01-01

    Presents examples of bibliographic, full-text, and numeric databases. Also discusses how to access these databases online, aids to online retrieval, and several issues and trends (including copyright and downloading, transborder data flow, use of optical disc/videodisc technology, and changing roles in database generation and processing). (JN)

  3. Database Administrator

    ERIC Educational Resources Information Center

    Moore, Pam

    2010-01-01

    The Internet and electronic commerce (e-commerce) generate lots of data. Data must be stored, organized, and managed. Database administrators, or DBAs, work with database software to find ways to do this. They identify user needs, set up computer databases, and test systems. They ensure that systems perform as they should and add people to the…

  4. Nutrient Control Design Manual

    EPA Science Inventory

The Nutrient Control Design Manual will present an extensive state-of-the-technology review of the engineering design and operation of nitrogen and phosphorus control technologies and techniques applied at municipal wastewater treatment plants (WWTPs). This manual will present ...

  5. DESIGN MANUAL: PHOSPHORUS REMOVAL

    EPA Science Inventory

    This manual summarizes process design information for the best developed methods for removing phosphorus from wastewater. This manual discusses several proven phosphorus removal methods, including phosphorus removal obtainable through biological activity as well as chemical precip...

  6. KDYNA user's manual

    SciTech Connect

    Levatin, J.A.L.; Attia, A.V.; Hallquist, J.O.

    1990-09-28

    This report is a complete user's manual for KDYNA, the Earth Sciences version of DYNA2D. Because most features of DYNA2D have been retained in KDYNA, much of this manual is identical to the DYNA2D user's manual.

  7. Nutrient Control Design Manual

    EPA Science Inventory

    The purpose of this EPA design manual is to provide updated, state‐of‐the‐technology design guidance on nitrogen and phosphorus control at municipal Wastewater Treatment Plants (WWTPs). Similar to previous EPA manuals, this manual contains extensive information on the principles ...

  8. New York State Library Data Base Users Manual.

    ERIC Educational Resources Information Center

    New York State Library, Albany.

    This manual is intended to provide users with a description of the information required by the New York State Library to adequately process a database search. Sections cover computer searching, what it and its advantages are; topic suitability; turnaround time for receipt of a database printout; reference interview to determine search topic;…

  9. New York State Library Data Base Users' Manual.

    ERIC Educational Resources Information Center

    New York State Library, Albany.

    This updated manual is intended to provide users with complete descriptions of the databases available for database searches, as well as full details of procedures for submitting search requests to the New York State Library. Two major changes in services scheduled to begin on January 1, 1985, are noted in a brief introduction. Sections cover…

  10. Unlocking the potential of survival data for model organisms through a new database and online analysis platform: SurvCurv

    PubMed Central

    Ziehm, Matthias; Thornton, Janet M

    2013-01-01

    Lifespan measurements, also called survival records, are a key phenotype in research on aging. If external hazards are excluded, aging alone determines the mortality in a population of model organisms. Understanding the biology of aging is highly desirable because of the benefits for the wide range of aging-related diseases. However, it is also extremely challenging because of the underlying complexity. Here, we describe SurvCurv, a new database and online resource focused on model organisms collating survival data for storage and analysis. All data in SurvCurv are manually curated and annotated. The database, available at http://www.ebi.ac.uk/thornton-srv/databases/SurvCurv/, offers various functions including plotting, Cox proportional hazards analysis, mathematical mortality models and statistical tests. It facilitates reanalysis and allows users to analyse their own data and compare it with the largest repository of model-organism data from published experiments, thus unlocking the potential of survival data and demographics in model organisms. PMID:23826631
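
    The analyses SurvCurv offers start from a nonparametric estimate of the survival function computed from lifespan records. A minimal sketch of the standard Kaplan-Meier product-limit estimator (illustrative only, not SurvCurv's own code):

```python
def kaplan_meier(durations, events):
    """Kaplan-Meier product-limit estimate of the survival function.

    durations: observed lifespans; events: 1 = death observed, 0 = censored.
    Returns [(time, S(time))] at each distinct event time.
    """
    data = sorted(zip(durations, events))
    n_at_risk = len(data)
    survival, curve = 1.0, []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = at_t = 0
        # Count deaths and total exits at this time point.
        while i < len(data) and data[i][0] == t:
            at_t += 1
            deaths += data[i][1]
            i += 1
        if deaths:
            survival *= 1.0 - deaths / n_at_risk
            curve.append((t, survival))
        n_at_risk -= at_t
    return curve
```

    Censored individuals (event = 0) leave the risk set without forcing a drop in the curve, which is what distinguishes this estimator from a naive survival fraction.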

  11. MediaDB: A Database of Microbial Growth Conditions in Defined Media

    PubMed Central

    Richards, Matthew A.; Cassen, Victor; Heavner, Benjamin D.; Ajami, Nassim E.; Herrmann, Andrea; Simeonidis, Evangelos; Price, Nathan D.

    2014-01-01

    Isolating pure microbial cultures and cultivating them in the laboratory on defined media is used to more fully characterize the metabolism and physiology of organisms. However, identifying an appropriate growth medium for a novel isolate remains a challenging task. Even organisms with sequenced and annotated genomes can be difficult to grow, despite our ability to build genome-scale metabolic networks that connect genomic data with metabolic function. The scientific literature is scattered with information about defined growth media used successfully for cultivating a wide variety of organisms, but to date there exists no centralized repository to inform efforts to cultivate less characterized organisms by bridging the gap between genomic data and compound composition for growth media. Here we present MediaDB, a manually curated database of defined media that have been used for cultivating organisms with sequenced genomes, with an emphasis on organisms with metabolic network models. The database is accessible online, can be queried by keyword searches or downloaded in its entirety, and can generate exportable individual media formulation files. The data assembled in MediaDB facilitate comparative studies of organism growth media, serve as a starting point for formulating novel growth media, and contribute to formulating media for in silico investigation of metabolic networks. MediaDB is freely available for public use at https://mediadb.systemsbiology.net. PMID:25098325

  12. ComPPI: a cellular compartment-specific database for protein–protein interaction network analysis

    PubMed Central

    Veres, Daniel V.; Gyurkó, Dávid M.; Thaler, Benedek; Szalay, Kristóf Z.; Fazekas, Dávid; Korcsmáros, Tamás; Csermely, Peter

    2015-01-01

    Here we present ComPPI, a cellular compartment-specific database of proteins and their interactions enabling an extensive, compartmentalized protein–protein interaction network analysis (URL: http://ComPPI.LinkGroup.hu). ComPPI enables the user to filter biologically unlikely interactions, where the two interacting proteins have no common subcellular localizations and to predict novel properties, such as compartment-specific biological functions. ComPPI is an integrated database covering four species (S. cerevisiae, C. elegans, D. melanogaster and H. sapiens). The compilation of nine protein–protein interaction and eight subcellular localization data sets had four curation steps including a manually built, comprehensive hierarchical structure of >1600 subcellular localizations. ComPPI provides confidence scores for protein subcellular localizations and protein–protein interactions. ComPPI has user-friendly search options for individual proteins giving their subcellular localization, their interactions and the likelihood of their interactions considering the subcellular localization of their interacting partners. Download options of search results, whole-proteomes, organelle-specific interactomes and subcellular localization data are available on its website. Due to its novel features, ComPPI is useful for the analysis of experimental results in biochemistry and molecular biology, as well as for proteome-wide studies in bioinformatics and network science helping cellular biology, medicine and drug design. PMID:25348397

  13. SubtiWiki 2.0--an integrated database for the model organism Bacillus subtilis.

    PubMed

    Michna, Raphael H; Zhu, Bingyao; Mäder, Ulrike; Stülke, Jörg

    2016-01-01

    To understand living cells, we need knowledge of each of their parts as well as of the interactions between these parts. To gain rapid and comprehensive access to this information, annotation databases are required. Here, we present SubtiWiki 2.0, the integrated database for the model bacterium Bacillus subtilis (http://subtiwiki.uni-goettingen.de/). SubtiWiki provides text-based access to published information about the genes and proteins of B. subtilis as well as presentations of metabolic and regulatory pathways. Moreover, manually curated protein-protein interaction diagrams are linked to the protein pages. Finally, expression data are shown for gene expression under 104 different conditions, as well as absolute protein quantification for cytoplasmic proteins. To facilitate mobile use of SubtiWiki, we have now expanded it with apps available for iOS and Android devices. Importantly, the apps allow users to link private notes and pictures to the gene/protein pages. Today, SubtiWiki has become one of the most complete collections of knowledge on a living organism in a single resource. PMID:26433225

  14. DDMGD: the database of text-mined associations between genes methylated in diseases from different species.

    PubMed

    Bin Raies, Arwa; Mansour, Hicham; Incitti, Roberto; Bajic, Vladimir B

    2015-01-01

    Gathering information about associations between methylated genes and diseases is important for disease diagnosis and treatment decisions. Recent advancements in epigenetics research allow for large-scale discovery of associations of genes methylated in diseases across different species. Searching manually for such information is not easy, as it is scattered across a large number of electronic publications and repositories. Therefore, we developed the DDMGD database (http://www.cbrc.kaust.edu.sa/ddmgd/) to provide a comprehensive repository of information related to genes methylated in diseases that can be found through text mining. DDMGD's scope is not limited to a particular group of genes, diseases or species. Using the text mining system DEMGD we developed earlier, together with additional post-processing, we extracted associations of genes methylated in different diseases from PubMed Central articles and PubMed abstracts. The accuracy of the extracted associations is 82%, as estimated on 2500 hand-curated entries. DDMGD provides a user-friendly interface facilitating retrieval of these associations ranked according to confidence scores. Submission of new associations to DDMGD is also supported. A comparison of DDMGD with several other databases focused on genes methylated in diseases shows that DDMGD is comprehensive and includes most of the recent information on genes methylated in diseases. PMID:25398897

  15. EK3D: an E. coli K antigen 3-dimensional structure database

    PubMed Central

    Kunduru, Bharathi Reddy; Nair, Sanjana Anilkumar; Rathinavelan, Thenmalarchelvi

    2016-01-01

    A very high rate of multidrug resistance (MDR) among Gram-negative bacteria such as Escherichia, Klebsiella, Salmonella and Shigella is a major threat to public health and safety. One of the major virulence determinants of Gram-negative bacteria is the capsular polysaccharide, or K antigen, located on the bacterial outer membrane surface, which is a potential drug and vaccine target. It plays a key role in host–pathogen interactions as well as host immune evasion and thus mandates detailed structural information. Nonetheless, acquiring structural information on K antigens is not straightforward due to their innate, enormous conformational flexibility. Here, we have developed a manually curated database of K antigens corresponding to various E. coli serotypes, which differ from each other in their monosaccharide composition, the linkages between the monosaccharides and their stereoisomeric forms. Subsequently, we have modeled their 3D structures and developed an organized repository, namely EK3D, which can be accessed through www.iith.ac.in/EK3D/. Such a database should facilitate the development of antibacterial drugs to combat E. coli infections, as E. coli has evolved resistance against two major drug classes, namely third-generation cephalosporins and fluoroquinolones. EK3D also enables the generation of polymeric K antigens of varying lengths and thus provides comprehensive information about E. coli K antigens. PMID:26615200

  16. MetaboLights: An Open-Access Database Repository for Metabolomics Data.

    PubMed

    Kale, Namrata S; Haug, Kenneth; Conesa, Pablo; Jayseelan, Kalaivani; Moreno, Pablo; Rocca-Serra, Philippe; Nainala, Venkata Chandrasekhar; Spicer, Rachel A; Williams, Mark; Li, Xuefei; Salek, Reza M; Griffin, Julian L; Steinbeck, Christoph

    2016-01-01

    MetaboLights is the first general purpose, open-access database repository for cross-platform and cross-species metabolomics research at the European Bioinformatics Institute (EMBL-EBI). Based upon the open-source ISA framework, MetaboLights provides Metabolomics Standard Initiative (MSI) compliant metadata and raw experimental data associated with metabolomics experiments. Users can upload their study datasets into the MetaboLights Repository. These studies are then automatically assigned a stable and unique identifier (e.g., MTBLS1) that can be used for publication reference. The MetaboLights Reference Layer associates metabolites with metabolomics studies in the archive and is extensively annotated with data fields such as structural and chemical information, NMR and MS spectra, target species, metabolic pathways, and reactions. The database is manually curated with no specific release schedules. MetaboLights is also recommended by journals for metabolomics data deposition. This unit provides a guide to using MetaboLights, downloading experimental data, and depositing metabolomics datasets using user-friendly submission tools. PMID:27010336

  17. Pathway databases and tools for their exploitation: benefits, current limitations and challenges

    PubMed Central

    Bauer-Mehren, Anna; Furlong, Laura I; Sanz, Ferran

    2009-01-01

    In past years, comprehensive representations of cell signalling pathways have been developed by manual curation from literature, which requires huge effort and would benefit from information stored in databases and from automatic retrieval and integration methods. Once a reconstruction of the network of interactions is achieved, analysis of its structural features and its dynamic behaviour can take place. Mathematical modelling techniques are used to simulate the complex behaviour of cell signalling networks, which ultimately sheds light on the mechanisms leading to complex diseases or helps in the identification of drug targets. A variety of databases containing information on cell signalling pathways have been developed in conjunction with methodologies to access and analyse the data. In principle, the scenario is prepared to make the most of this information for the analysis of the dynamics of signalling pathways. However, are the knowledge repositories of signalling pathways ready to realize the systems biology promise? In this article we aim to initiate this discussion and to provide some insights on this issue. PMID:19638971

  18. The BioGRID Interaction Database: 2011 update.

    PubMed

    Stark, Chris; Breitkreutz, Bobby-Joe; Chatr-Aryamontri, Andrew; Boucher, Lorrie; Oughtred, Rose; Livstone, Michael S; Nixon, Julie; Van Auken, Kimberly; Wang, Xiaodong; Shi, Xiaoqi; Reguly, Teresa; Rust, Jennifer M; Winter, Andrew; Dolinski, Kara; Tyers, Mike

    2011-01-01

    The Biological General Repository for Interaction Datasets (BioGRID) is a public database that archives and disseminates genetic and protein interaction data from model organisms and humans (http://www.thebiogrid.org). BioGRID currently holds 347,966 interactions (170,162 genetic, 177,804 protein) curated from both high-throughput data sets and individual focused studies, as derived from over 23,000 publications in the primary literature. Complete coverage of the entire literature is maintained for budding yeast (Saccharomyces cerevisiae), fission yeast (Schizosaccharomyces pombe) and thale cress (Arabidopsis thaliana), and efforts to expand curation across multiple metazoan species are underway. The BioGRID houses 48,831 human protein interactions that have been curated from 10,247 publications. Current curation drives are focused on particular areas of biology to enable insights into conserved networks and pathways that are relevant to human health. The BioGRID 3.0 web interface contains new search and display features that enable rapid queries across multiple data types and sources. An automated Interaction Management System (IMS) is used to prioritize, coordinate and track curation across international sites and projects. BioGRID provides interaction data to several model organism databases, resources such as Entrez-Gene and other interaction meta-databases. The entire BioGRID 3.0 data collection may be downloaded in multiple file formats, including PSI MI XML. Source code for BioGRID 3.0 is freely available without any restrictions. PMID:21071413

  19. The Role of Preoperative TIPSS to Facilitate Curative Gastric Surgery

    SciTech Connect

    Norton, S.A.; Vickers, J.; Callaway, M.P. Alderson, D.

    2003-08-15

    The use of TIPSS to facilitate radical curative upper gastrointestinal surgery has not been reported. We describe a case in which curative gastric resection was performed for carcinoma of the stomach after a preoperative TIPSS and embolization of a large gastric varix in a patient with portal hypertension.

  20. Plant Genome DataBase Japan (PGDBj): A Portal Website for the Integration of Plant Genome-Related Databases

    PubMed Central

    Asamizu, Erika; Ichihara, Hisako; Nakaya, Akihiro; Nakamura, Yasukazu; Hirakawa, Hideki; Ishii, Takahiro; Tamura, Takuro; Fukami-Kobayashi, Kaoru; Nakajima, Yukari; Tabata, Satoshi

    2014-01-01

    The Plant Genome DataBase Japan (PGDBj, http://pgdbj.jp/?ln=en) is a portal website that aims to integrate plant genome-related information from databases (DBs) and the literature. The PGDBj is comprised of three component DBs and a cross-search engine, which provides a seamless search over the contents of the DBs. The three DBs are as follows. (i) The Ortholog DB, providing gene cluster information based on the amino acid sequence similarity. Over 500,000 amino acid sequences of 20 Viridiplantae species were subjected to reciprocal BLAST searches and clustered. Sequences from plant genome DBs (e.g. TAIR10 and RAP-DB) were also included in the cluster with a direct link to the original DB. (ii) The Plant Resource DB, integrating the SABRE DB, which provides cDNA and genome sequence resources accumulated and maintained in the RIKEN BioResource Center and National BioResource Projects. (iii) The DNA Marker DB, providing manually or automatically curated information of DNA markers, quantitative trait loci and related linkage maps, from the literature and external DBs. As the PGDBj targets various plant species, including model plants, algae, and crops important as food, fodder and biofuel, researchers in the field of basic biology as well as a wide range of agronomic fields are encouraged to perform searches using DNA sequences, gene names, traits and phenotypes of interest. The PGDBj will return the search results from the component DBs and various types of linked external DBs. PMID:24363285
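
    The Ortholog DB's clustering rests on reciprocal BLAST searches. The core pairing rule, reciprocal best hits, can be sketched as follows (identifiers and scores are invented, and the actual PGDBj pipeline clusters beyond simple pairs):

```python
def reciprocal_best_hits(hits_ab, hits_ba):
    """Return pairs (a, b) where a's best hit in species B is b and vice versa.

    hits_ab / hits_ba map each query id to a list of (subject_id, bit_score),
    as would be parsed from BLAST tabular output.
    """
    def best(hits):
        # Keep only the top-scoring subject for each query.
        return {q: max(subj, key=lambda s: s[1])[0]
                for q, subj in hits.items() if subj}
    best_ab, best_ba = best(hits_ab), best(hits_ba)
    return sorted((a, b) for a, b in best_ab.items() if best_ba.get(b) == a)
```

    Reciprocal-best-hit pairs are a conservative starting point; single-linkage or Markov clustering over such pairs then yields multi-species ortholog groups.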

  1. Automated Database Mediation Using Ontological Metadata Mappings

    PubMed Central

    Marenco, Luis; Wang, Rixin; Nadkarni, Prakash

    2009-01-01

    Objective To devise an automated approach for integrating federated database information using database ontologies constructed from their extended metadata. Background One challenge of database federation is that the granularity of representation of equivalent data varies across systems. Dealing effectively with this problem is analogous to dealing with precoordinated vs. postcoordinated concepts in biomedical ontologies. Model Description The authors describe an approach based on ontological metadata mapping rules defined with elements of a global vocabulary, which allows a query specified at one granularity level to fetch data, where possible, from databases within the federation that use different granularities. This is implemented in OntoMediator, a newly developed production component of our previously described Query Integrator System. OntoMediator's operation is illustrated with a query that accesses three geographically separate, interoperating databases. An example based on SNOMED also illustrates the applicability of high-level rules to support the enforcement of constraints that can prevent inappropriate curator or power-user actions. Summary A rule-based framework simplifies the design and maintenance of systems where categories of data must be mapped to each other, for the purpose of either cross-database query or for curation of the contents of compositional controlled vocabularies. PMID:19567801

  2. Hayabusa-returned sample curation in the Planetary Material Sample Curation Facility of JAXA

    NASA Astrophysics Data System (ADS)

    Yada, Toru; Fujimura, Akio; Abe, Masanao; Nakamura, Tomoki; Noguchi, Takaaki; Okazaki, Ryuji; Nagao, Keisuke; Ishibashi, Yukihiro; Shirai, Kei; Zolensky, Michael E.; Sandford, Scott; Okada, Tatsuaki; Uesugi, Masayuki; Karouji, Yuzuru; Ogawa, Maho; Yakame, Shogo; Ueno, Munetaka; Mukai, Toshifumi; Yoshikawa, Makoto; Kawaguchi, Junichiro

    2014-02-01

    The Planetary Material Sample Curation Facility of JAXA (PMSCF/JAXA) was established in Sagamihara, Kanagawa, Japan, to curate planetary material samples returned from space under conditions of minimal terrestrial contamination. Its performance for the curation of Hayabusa-returned samples was verified with a series of comprehensive tests and rehearsals. After the Hayabusa spacecraft had accomplished a round-trip flight to asteroid 25143 Itokawa and returned its reentry capsule to the Earth in June 2010, the reentry capsule was brought to the PMSCF/JAXA and subjected to a series of processes to extract the samples recovered from Itokawa. The particles recovered from the sample catcher were analyzed by electron microscope, assigned IDs, grouped into four categories, and preserved in dimples on quartz slide glasses. Some fraction of them has been distributed for initial analyses at NASA and will be distributed under an international announcement of opportunity (AO), but a certain fraction will be preserved in vacuum for future analyses.

  3. The Ribosomal Database Project (RDP).

    PubMed Central

    Maidak, B L; Olsen, G J; Larsen, N; Overbeek, R; McCaughey, M J; Woese, C R

    1996-01-01

    The Ribosomal Database Project (RDP) is a curated database that offers ribosome-related data, analysis services and associated computer programs. The offerings include phylogenetically ordered alignments of ribosomal RNA (rRNA) sequences, derived phylogenetic trees, rRNA secondary structure diagrams and various software for handling, analyzing and displaying alignments and trees. The data are available via anonymous ftp (rdp.life.uiuc.edu), electronic mail (server@rdp.life.uiuc.edu), gopher (rdpgopher.life.uiuc.edu) and World Wide Web (WWW)(http://rdpwww.life.uiuc.edu/). The electronic mail and WWW servers provide ribosomal probe checking, screening for possible chimeric rRNA sequences, automated alignment and approximate phylogenetic placement of user-submitted sequences on an existing phylogenetic tree. PMID:8594608

  4. Statistical databases

    SciTech Connect

    Kogalovskii, M.R.

    1995-03-01

    This paper presents a review of problems related to statistical database systems, which are widespread in various fields of activity. Statistical databases (SDBs) are databases used primarily for statistical analysis. Topics under consideration are: SDB peculiarities, properties of data models adequate for SDB requirements, metadata functions, null-value problems, SDB compromise protection problems, stored data compression techniques, and statistical data representation means. Also examined is whether present Database Management Systems (DBMSs) satisfy SDB requirements. Some current research directions in SDB systems are considered.

  5. The LIFEdb database in 2006

    PubMed Central

    Mehrle, Alexander; Rosenfelder, Heiko; Schupp, Ingo; del Val, Coral; Arlt, Dorit; Hahne, Florian; Bechtel, Stephanie; Simpson, Jeremy; Hofmann, Oliver; Hide, Winston; Glatting, Karl-Heinz; Huber, Wolfgang; Pepperkok, Rainer; Poustka, Annemarie; Wiemann, Stefan

    2006-01-01

    LIFEdb integrates data from large-scale functional genomics assays and manual cDNA annotation with bioinformatics gene expression and protein analysis. New features of LIFEdb include (i) an updated user interface with enhanced query capabilities, (ii) a configurable output table and the option to download search results in XML, (iii) the integration of data from cell-based screening assays addressing the influence of protein-overexpression on cell proliferation and (iv) the display of the relative expression (‘Electronic Northern’) of the genes under investigation using curated gene expression ontology information. LIFEdb enables researchers to systematically select and characterize genes and proteins of interest, and presents data and information via its user-friendly web-based interface. PMID:16381901

  6. Literature mining of genetic variants for curation: quantifying the importance of supplementary material.

    PubMed

    Jimeno Yepes, Antonio; Verspoor, Karin

    2014-01-01

    A major focus of modern biological research is the understanding of how genomic variation relates to disease. Although there are significant ongoing efforts to capture this understanding in curated resources, much of the information remains locked in unstructured sources, in particular, the scientific literature. Thus, there have been several text mining systems developed to target extraction of mutations and other genetic variation from the literature. We have performed the first study of the use of text mining for the recovery of genetic variants curated directly from the literature. We consider two curated databases, COSMIC (Catalogue Of Somatic Mutations In Cancer) and InSiGHT (International Society for Gastro-intestinal Hereditary Tumours), that contain explicit links to the source literature for each included mutation. Our analysis shows that the recall of the mutations catalogued in the databases using a text mining tool is very low, despite the well-established good performance of the tool and even when the full text of the associated article is available for processing. We demonstrate that this discrepancy can be explained by considering the supplementary material linked to the published articles, not previously considered by text mining tools. Although it is anecdotally known that supplementary material contains 'all of the information', and some researchers have speculated about the role of supplementary material (Schenck et al. Extraction of genetic mutations associated with cancer from public literature. J Health Med Inform 2012;S2:2.), our analysis substantiates the significant extent to which this material is critical. Our results highlight the need for literature mining tools to consider not only the narrative content of a publication but also the full set of material related to a publication. PMID:24520105
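
    The recall measured in this study is simply the fraction of database-curated mutations that the text mining tool recovers from a given article and its supplementary material. A toy illustration (the mutation labels are invented):

```python
def recall(curated, mined):
    """Fraction of curated variants that the text mining output recovers."""
    curated, mined = set(curated), set(mined)
    return len(curated & mined) / len(curated) if curated else 0.0
```

    Evaluating this per article against COSMIC- or InSiGHT-linked publications is what exposes how much of the curated content lives only in supplementary files.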

  7. Database Manager

    ERIC Educational Resources Information Center

    Martin, Andrew

    2010-01-01

    It is normal practice today for organizations to store large quantities of records of related information as computer-based files or databases. Purposeful information is retrieved by performing queries on the data sets. The purpose of DATABASE MANAGER is to communicate to students the method by which the computer performs these queries. This…

  8. Maize databases

    Technology Transfer Automated Retrieval System (TEKTRAN)

    This chapter is a succinct overview of maize data held in the species-specific database MaizeGDB (the Maize Genomics and Genetics Database), and selected multi-species data repositories, such as Gramene/Ensembl Plants, Phytozome, UniProt and the National Center for Biotechnology Information (NCBI), ...

  9. The Genesis Mission: Contamination Control and Curation

    NASA Technical Reports Server (NTRS)

    Stansbery, E. K.

    2002-01-01

    The Genesis mission, launched in August 2001, is collecting samples of the solar wind and will return to Earth in 2004. Genesis can be viewed as the most fundamental of NASA's sample return missions because it is expected to provide insight into the initial elemental and isotopic composition of the solar nebula from which all other planetary objects formed. The data from this mission will have a large impact on understanding the origins and diversity of planetary materials. The collectors consist of clean, pure materials into which the solar wind will imbed. Science and engineering issues such as bulk purity, cleanliness, retention of solar wind, and ability to withstand launch and entry drove material choices. Most of the collector materials are installed on array frames that are deployed from a clean science canister. Two of the arrays are continuously exposed for collecting the bulk solar wind; the other three are only exposed during specific solar wind regimes as measured by ion and electron monitors. Other materials are housed as targets at the focal point of an electrostatic mirror, or "concentrator", designed to enhance the flux of specific solar wind species. Johnson Space Center (JSC) has two principal responsibilities for the Genesis mission: contamination control and curation. Precise and accurate measurements of the composition of the solar atoms require that the collector materials be extremely clean and well characterized before launch and during the mission. Early involvement of JSC curation personnel in concept development resulted in a mission designed to minimize contaminants from the spacecraft and operations. A major goal of the Genesis mission is to provide a reservoir of materials for the 21st century. When the collector materials are returned to Earth, they must be handled in a clean manner and their condition well documented. Information gained in preliminary examination of the arrays and detailed surveys of each collector will be used to…

  10. CSTEM User Manual

    NASA Technical Reports Server (NTRS)

    Hartle, M.; McKnight, R. L.

    2000-01-01

    This manual is a combination of a user manual, theory manual, and programmer manual. The reader is assumed to have some previous exposure to the finite element method. This manual is written with the idea that the CSTEM (Coupled Structural Thermal Electromagnetic-Computer Code) user needs to have a basic understanding of what the code is actually doing in order to properly use the code. For that reason, the underlying theory and methods used in the code are described to a basic level of detail. The manual gives an overview of the CSTEM code: how the code came into existence, a basic description of what the code does, and the order in which it happens (a flowchart). Appendices provide a listing and very brief description of every file used by the CSTEM code, including the type of file it is, what routine regularly accesses the file, and what routine opens the file, as well as special features included in CSTEM.

  11. Association between Perioperative Blood Transfusion and Oncologic Outcomes after Curative Surgery for Renal Cell Carcinoma

    PubMed Central

    Park, Yong Hyun; Kim, Yong-June; Kang, Seok Ho; Kim, Hyeon Hoe; Byun, Seok-Soo; Lee, Ji Youl; Hong, Sung-Hoo

    2016-01-01

    Purpose: We aimed to elucidate the association between perioperative blood transfusion (PBT) and the prognosis of patients undergoing curative surgery for renal cell carcinoma (RCC). Methods: In all, 3,832 patients with RCC who had undergone curative surgery were included in this study from a multicenter database. PBT was defined as the transfusion of packed red blood cells within seven days before surgery, during surgery, or within the postoperative hospitalization period. The association of PBT with oncologic outcomes was evaluated using univariate and multivariate Cox regression analyses, and regression adjustment with propensity score matching. Results: Overall, 11.7% (447/3,832) of patients received PBT. Patients receiving PBT were significantly older at diagnosis, and had lower BMI, higher comorbidities, worse ECOG performance status, and more initial symptoms. Moreover, higher pathologic TNM stage, larger mass size, higher nuclear grade, more sarcomatoid differentiation, and more tumor necrosis were all observed more frequently in patients who received PBT. In univariate analysis, relapse-free survival, cancer-specific survival, and overall survival rates were worse in patients who received PBT; however, these factors became insignificant in the matched pairs after propensity score matching. On multivariate Cox regression analysis and regression adjustment with propensity score matching, significant prognostic effects of PBT on disease relapse, cancer-specific mortality, and all-cause mortality were not observed. Conclusions: This multicenter database analysis demonstrates no significant prognostic association between PBT and oncologic outcomes in patients with RCC. PMID:27313787
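
    Propensity score matching of the kind used here pairs each transfused patient with an untransfused patient having a similar estimated probability of receiving PBT. A minimal greedy 1:1 matcher over precomputed scores (patient IDs and the caliper value are illustrative; the study's actual matching procedure may differ):

```python
def match_pairs(treated, control, caliper=0.05):
    """Greedy 1:1 nearest-neighbour matching on precomputed propensity scores.

    treated / control map patient id -> estimated propensity score.
    Each control is used at most once; matches outside the caliper are dropped.
    """
    pairs = []
    available = dict(control)
    for tid, tscore in sorted(treated.items(), key=lambda kv: kv[1]):
        best = min(available.items(),
                   key=lambda kv: abs(kv[1] - tscore), default=None)
        if best is not None and abs(best[1] - tscore) <= caliper:
            pairs.append((tid, best[0]))
            del available[best[0]]
    return pairs
```

    Comparing outcomes only within such matched pairs is what removes the baseline imbalances (age, stage, grade) that made the unadjusted univariate comparison misleading.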

  12. Industrial labor relations manual

    NASA Technical Reports Server (NTRS)

    1992-01-01

The NASA Industrial Labor Relations Manual provides internal guidelines and procedures to assist NASA Field Installations in handling contractor labor-management disputes and Service Contract Act variance hearings, and in providing Labor Union Representatives with access to NASA for the purpose of maintaining schedules and goals on vital NASA programs. This manual will be revised by page changes as revisions become necessary. Initial distribution of this manual has been made to NASA Headquarters and Field Installations.

  13. FMiR: A Curated Resource of Mitochondrial DNA Information for Fish

    PubMed Central

    Nagpure, Naresh Sahebrao; Rashid, Iliyas; Pathak, Ajey Kumar; Singh, Mahender; Pati, Rameshwar; Singh, Shri Prakash; Sarkar, Uttam Kumar

    2015-01-01

Mitochondrial genome sequences have been widely used for evolutionary and phylogenetic studies. Among vertebrates, fish are an important, diverse group, and their mitogenome sequences are accumulating rapidly in public repositories. To facilitate mitochondrial genome analysis and to explore this valuable genetic information, we developed the Fish Mitogenome Resource (FMiR) database, which provides a workbench for mitogenome annotation, species identification and microsatellite marker mining. Microsatellites, also known as simple sequence repeats (SSRs), are used as molecular markers in studies of population genetics, gene duplication and marker-assisted selection. Easy-to-use tools have been implemented for mining SSRs and for designing primers to identify species- or habitat-specific markers. In addition, FMiR can analyze a complete or partial mitochondrial genome sequence to identify species and to deduce relational distances among sequences across species. The database presently contains curated mitochondrial genomes from 1302 fish species belonging to 297 families and 47 orders, reported from saltwater and freshwater ecosystems. It also covers information on each fish species, such as conservation status, ecosystem, family, distribution and occurrence, downloaded from the FishBase and IUCN Red List databases; this information can be used to browse mitogenome records for species belonging to a particular category. The database is scalable in both content and the inclusion of other analytical modules. FMiR runs on a Linux platform on a high-performance server accessible at URL http://mail.nbfgr.res.in/fmir. PMID:26317619
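The SSR mining the abstract describes can be illustrated with a toy tandem-repeat scanner. This is a minimal sketch, not FMiR's actual algorithm; the motif lengths and minimum repeat count are assumed illustrative thresholds.

```python
import re

# Toy SSR (microsatellite) finder: scan a DNA sequence for short motifs
# repeated in tandem. Parameters are illustrative, not FMiR's real settings.
def find_ssrs(seq, motif_lens=(2, 3), min_repeats=4):
    """Return (start, motif, repeat_count) for perfect tandem repeats."""
    hits = []
    for k in motif_lens:
        # A k-base motif followed by at least (min_repeats - 1) copies of itself.
        pattern = re.compile(r"(([ACGT]{%d})\2{%d,})" % (k, min_repeats - 1))
        for m in pattern.finditer(seq):
            run, motif = m.group(1), m.group(2)
            hits.append((m.start(), motif, len(run) // k))
    return hits

print(find_ssrs("AATGTGTGTGTGCC"))  # -> [(2, 'TG', 5)]
```

Primer design around such a hit would then target the flanking sequence on either side of the reported run.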

  14. Radiological Control Manual

    SciTech Connect

    Not Available

    1993-04-01

    This manual has been prepared by Lawrence Berkeley Laboratory to provide guidance for site-specific additions, supplements, and clarifications to the DOE Radiological Control Manual. The guidance provided in this manual is based on the requirements given in Title 10 Code of Federal Regulations Part 835, Radiation Protection for Occupational Workers, DOE Order 5480.11, Radiation Protection for Occupational Workers, and the DOE Radiological Control Manual. The topics covered are (1) excellence in radiological control, (2) radiological standards, (3) conduct of radiological work, (4) radioactive materials, (5) radiological health support operations, (6) training and qualification, and (7) radiological records.

  15. Instruct coders' manual

    NASA Technical Reports Server (NTRS)

    Friend, J.

    1971-01-01

A manual is presented that serves both as an instructional text for beginning coders and as a reference for the coding language INSTRUCT. The manual includes the major programs necessary to implement the teaching system and lists the limitations of the current implementation. A detailed description is given of how to code a lesson, which buttons to push, and which utility programs to use. Suggestions are given for debugging coded lessons, along with the error messages that may be received during assembly or while running a lesson.

  16. EMSL Operations Manual

    SciTech Connect

    Foster, Nancy S.

    2009-03-25

This manual is a general resource tool to assist EMSL users and Laboratory staff within EMSL in locating official policy, practices, and subject matter experts. It is not intended to replace or amend any formal Battelle policy or practice. Users of this manual should rely only on Battelle’s Standard Based Management System (SBMS) for official policy. No contractual commitment or right of any kind is created by this manual. Battelle management reserves the right to alter, change, or delete any information contained within this manual without prior notice.

  17. EMSL Operations Manual

    SciTech Connect

    Foster, Nancy S.

    2009-06-18

This manual is a general resource tool to assist EMSL users and Laboratory staff within EMSL in locating official policy, practices, and subject matter experts. It is not intended to replace or amend any formal Battelle policy or practice. Users of this manual should rely only on Battelle’s Standard Based Management System (SBMS) for official policy. No contractual commitment or right of any kind is created by this manual. Battelle management reserves the right to alter, change, or delete any information contained within this manual without prior notice.

  18. Enriching Earthdata by Improving Content Curation

    NASA Astrophysics Data System (ADS)

    Bagwell, R.; Wong, M. M.; Murphy, K. J.

    2014-12-01

Since the launch of Earthdata in late 2011, there has been an emphasis on improving the user experience and providing more enriched content, with the ultimate goal of bringing the "pixels to the people": ensuring that a user clicks the fewest number of times to reach the data, tools, or information they seek. Earthdata was founded to be a single source of information for Earth Observing System Data and Information System (EOSDIS) components and services, consolidating more than 15 different websites. With an increased focus on access to Earth science data, the emphasis is now on transforming Earthdata from a static website into a dynamic, data-driven site full of enriched content. In the near future, Earthdata will have a number of components that will drive access to the data, such as Earthdata Search, the Common Metadata Repository (CMR), and a redesign of the Earthdata website. Content curation will focus on leveraging these components to provide an enriched content environment and a better overall user experience, with an emphasis on Earthdata being "powered by EOSDIS" components and services.

  19. A curated census of autophagy-modulating proteins and small molecules: candidate targets for cancer therapy.

    PubMed

    Lorenzi, Philip L; Claerhout, Sofie; Mills, Gordon B; Weinstein, John N

    2014-07-01

    Autophagy, a programmed process in which cell contents are delivered to lysosomes for degradation, appears to have both tumor-suppressive and tumor-promoting functions; both stimulation and inhibition of autophagy have been reported to induce cancer cell death, and particular genes and proteins have been associated both positively and negatively with autophagy. To provide a basis for incisive analysis of those complexities and ambiguities and to guide development of new autophagy-targeted treatments for cancer, we have compiled a comprehensive, curated inventory of autophagy modulators by integrating information from published siRNA screens, multiple pathway analysis algorithms, and extensive, manually curated text-mining of the literature. The resulting inventory includes 739 proteins and 385 chemicals (including drugs, small molecules, and metabolites). Because autophagy is still at an early stage of investigation, we provide extensive analysis of our sources of information and their complex relationships with each other. We conclude with a discussion of novel strategies that could potentially be used to target autophagy for cancer therapy. PMID:24906121

  20. MaizeGDB Community Curation Tools

    Technology Transfer Automated Retrieval System (TEKTRAN)

    MaizeGDB (http://www.maizegdb.org) is the community database for maize genetics and genomics. The success of the MaizeGDB project largely can be attributed to the involvement of the community of maize geneticists. Members of the community have (1) made their data available by contributing to MaizeGD...

  1. Demonstration and Validation Assets: User Manual Development

    SciTech Connect

    2008-06-30

This report documents the development of a database-supported user manual for DEMVAL assets in the NSTI area of operations, focusing on comprehensive user information on DEMVAL assets serving businesses with national security technology applications in southern New Mexico. The DEMVAL asset program is being developed as part of the NSPP, funded by both the Department of Energy (DOE) and the NNSA. This report describes the development of a comprehensive user manual system for delivering indexed DEMVAL asset information, to be used in marketing and visibility materials, to NSTI clients, prospective clients, stakeholders, and any person or organization seeking it. The data about area DEMVAL asset providers are organized in an SQL database with an updatable application structure that optimizes ease of access and customizable searching for the user.
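The indexed, searchable SQL design described above can be sketched in a few lines with SQLite. The table, column names, and sample rows are hypothetical illustrations, not the report's actual schema.

```python
import sqlite3

# Hypothetical asset table with an index supporting keyword search;
# names and rows are invented for illustration, not taken from the report.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE asset (
        id INTEGER PRIMARY KEY,
        provider TEXT NOT NULL,
        capability TEXT NOT NULL,   -- e.g. a testing or fabrication service
        location TEXT
    )""")
conn.execute("CREATE INDEX idx_asset_capability ON asset(capability)")
conn.executemany(
    "INSERT INTO asset (provider, capability, location) VALUES (?, ?, ?)",
    [("Lab A", "EMI testing", "Las Cruces"),
     ("Lab B", "optics calibration", "Alamogordo")])

def find_assets(keyword):
    """Keyword search over asset capabilities (LIKE is case-insensitive
    for ASCII in SQLite)."""
    rows = conn.execute(
        "SELECT provider, capability FROM asset WHERE capability LIKE ?",
        (f"%{keyword}%",))
    return rows.fetchall()

print(find_assets("EMI"))  # -> [('Lab A', 'EMI testing')]
```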

  2. Agile data management for curation of genomes to watershed datasets

    NASA Astrophysics Data System (ADS)

    Varadharajan, C.; Agarwal, D.; Faybishenko, B.; Versteeg, R.

    2015-12-01

A software platform is being developed for data management and assimilation (DMA) as part of the U.S. Department of Energy's Genomes to Watershed Sustainable Systems Science Focus Area 2.0. The DMA components and capabilities are driven by the project science priorities, and development follows agile techniques. The goal of the DMA software platform is to enable users to integrate and synthesize diverse and disparate field, laboratory, and simulation datasets, including geological, geochemical, geophysical, microbiological, hydrological, and meteorological data, across a range of spatial and temporal scales. The DMA objectives are (a) developing an integrated interface to the datasets, (b) storing field monitoring data and laboratory analytical results of water and sediment samples in a database, (c) providing automated QA/QC analysis of data, and (d) working with data providers to modify high-priority field and laboratory data collection and reporting procedures as needed. The first three objectives are driven by user needs, while the last is driven by data management needs. The project needs and priorities are reassessed regularly with the users; after each user session we identify development priorities to match the identified user priorities. For instance, data QA/QC and collection activities have focused on the data and products needed for ongoing scientific analyses (e.g. water level and geochemistry). We have also developed, tested and released a broker and portal that integrates diverse datasets from two different databases used for curation of project data. The development of the user interface was based on a user-centered design process involving several user interviews and constant interaction with data providers. The initial version focuses on the most requested feature: finding the data needed for analyses through an intuitive interface. Once the data are found, the user can immediately plot and download them.
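The automated QA/QC analysis mentioned above typically amounts to simple, repeatable checks run over each incoming series. This is a hedged sketch of one such check on a water-level series; the function name, bounds, and spike threshold are invented for illustration.

```python
# Illustrative automated QA/QC pass: flag each value as 'ok', 'range'
# (outside plausible physical bounds), or 'spike' (implausible jump from
# the previous value). Thresholds are invented, not the platform's.
def qaqc_flags(series, lo=0.0, hi=50.0, max_jump=2.0):
    """Return one flag per value in the input series."""
    flags = []
    prev = None
    for v in series:
        if not (lo <= v <= hi):
            flags.append("range")
        elif prev is not None and abs(v - prev) > max_jump:
            flags.append("spike")
        else:
            flags.append("ok")
        prev = v
    return flags

print(qaqc_flags([10.1, 10.3, 55.0]))  # -> ['ok', 'ok', 'range']
print(qaqc_flags([10.0, 10.5, 14.0]))  # -> ['ok', 'ok', 'spike']
```

In practice such flags would be stored alongside the raw values so that curators can review, rather than silently drop, suspect data.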

  3. CottonGen: a genomics, genetics and breeding database for cotton research

    Technology Transfer Automated Retrieval System (TEKTRAN)

    CottonGen (http://www.cottongen.org) is a curated and integrated web-based relational database providing access to publicly available genomic, genetic and breeding data for cotton. CottonGen supercedes CottonDB and the Cotton Marker Database, with enhanced tools for easier data sharing, mining, vis...

  4. MaizeGDB update: New tools, data, and interface for the maize model organism database

    Technology Transfer Automated Retrieval System (TEKTRAN)

    MaizeGDB is a highly curated, community-oriented database and informatics service to researchers focused on the crop plant and model organism Zea mays ssp. mays. Although some form of the maize community database has existed over the last 25 years, there have only been two major releases. In 1991, ...

  5. The plant phenological online database (PPODB): an online database for long-term phenological data.

    PubMed

    Dierenbach, Jonas; Badeck, Franz-W; Schaber, Jörg

    2013-09-01

    We present an online database that provides unrestricted and free access to over 16 million plant phenological observations from over 8,000 stations in Central Europe between the years 1880 and 2009. Unique features are (1) a flexible and unrestricted access to a full-fledged database, allowing for a wide range of individual queries and data retrieval, (2) historical data for Germany before 1951 ranging back to 1880, and (3) more than 480 curated long-term time series covering more than 100 years for individual phenological phases and plants combined over Natural Regions in Germany. Time series for single stations or Natural Regions can be accessed through a user-friendly graphical geo-referenced interface. The joint databases made available with the plant phenological database PPODB render accessible an important data source for further analyses of long-term changes in phenology. The database can be accessed via www.ppodb.de . PMID:23512285

  6. Phylesystem: a git-based data store for community-curated phylogenetic estimates

    PubMed Central

    McTavish, Emily Jane; Hinchliff, Cody E.; Allman, James F.; Brown, Joseph W.; Cranston, Karen A.; Rees, Jonathan A.; Smith, Stephen A.

    2015-01-01

    Motivation: Phylogenetic estimates from published studies can be archived using general platforms like Dryad (Vision, 2010) or TreeBASE (Sanderson et al., 1994). Such services fulfill a crucial role in ensuring transparency and reproducibility in phylogenetic research. However, digital tree data files often require some editing (e.g. rerooting) to improve the accuracy and reusability of the phylogenetic statements. Furthermore, establishing the mapping between tip labels used in a tree and taxa in a single common taxonomy dramatically improves the ability of other researchers to reuse phylogenetic estimates. As the process of curating a published phylogenetic estimate is not error-free, retaining a full record of the provenance of edits to a tree is crucial for openness, allowing editors to receive credit for their work and making errors introduced during curation easier to correct. Results: Here, we report the development of software infrastructure to support the open curation of phylogenetic data by the community of biologists. The backend of the system provides an interface for the standard database operations of creating, reading, updating and deleting records by making commits to a git repository. The record of the history of edits to a tree is preserved by git’s version control features. Hosting this data store on GitHub (http://github.com/) provides open access to the data store using tools familiar to many developers. We have deployed a server running the ‘phylesystem-api’, which wraps the interactions with git and GitHub. The Open Tree of Life project has also developed and deployed a JavaScript application that uses the phylesystem-api and other web services to enable input and curation of published phylogenetic statements. Availability and implementation: Source code for the web service layer is available at https://github.com/OpenTreeOfLife/phylesystem-api. The data store can be cloned from: https://github.com/OpenTreeOfLife/phylesystem. A web
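The core idea of the abstract, create/read/update/delete implemented as commits to a git repository so that every curation edit is preserved, can be sketched with plain git commands. The file layout, commit messages, and example tree are illustrative assumptions, not Phylesystem's actual conventions.

```python
import json
import os
import subprocess
import tempfile

# Minimal sketch of "database operations as git commits": each study is a
# JSON file, and every create/update is a commit, so the full edit history
# (provenance) is retained. Layout and messages are invented for illustration.
repo = tempfile.mkdtemp()

def git(*args):
    subprocess.run(["git", "-C", repo, *args], check=True,
                   capture_output=True)

git("init")
git("config", "user.email", "curator@example.org")
git("config", "user.name", "Curator")

def write_study(study_id, data, message):
    """Create or update a record: write JSON, then commit it."""
    path = os.path.join(repo, f"{study_id}.json")
    with open(path, "w") as f:
        json.dump(data, f, indent=2)
    git("add", path)
    git("commit", "-m", message)

write_study("study_1", {"tree": "((A,B),C);"}, "create study_1")
write_study("study_1", {"tree": "((A,C),B);"}, "reroot study_1")  # edit kept in history

# Reading is just reading the file at HEAD; git log preserves both edits.
with open(os.path.join(repo, "study_1.json")) as f:
    print(json.load(f)["tree"])  # -> ((A,C),B);
```

Hosting such a repository on GitHub then gives collaborators the familiar pull-request and blame tooling for reviewing and crediting curation edits.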

  7. Nuts and Bolts — Techniques for Genesis Sample Curation

    NASA Astrophysics Data System (ADS)

    Burkett, P. J.; Rodriguez, M. C.; Allton, J. H.

    2011-03-01

    The Genesis curation staff at NASA JSC provides samples and data for analysis. We are showing: 1) techniques for characterization and measurement of shards; 2) allocation methods; and 3) status of the catalog by collector material, regime, and size.

  8. BIOMARKERS DATABASE

    EPA Science Inventory

    This database was developed by assembling and evaluating the literature relevant to human biomarkers. It catalogues and evaluates the usefulness of biomarkers of exposure, susceptibility and effect which may be relevant for a longitudinal cohort study. In addition to describing ...

  9. Boating Safety Training Manual.

    ERIC Educational Resources Information Center

    Coast Guard, Washington, DC.

The training manual serves as the text for the Coast Guard's 32-hour boating safety course and for the D-8 Qualification Code Recertification Course. The manual is designed for self-study or for use with an instructor-led course. Each chapter concludes with a quiz to be used as a review of chapter content. Opening chapters review the use of the…

  10. Home Maintenance Manual.

    ERIC Educational Resources Information Center

    Richmond, Jim; And Others

    This manual, written especially for the Navajo and Hopi Indian Relocation Commission, is a simply worded, step-by-step guide to home maintenance for new homeowners. It can be used for self-study or it can serve as instructional material for a training class on home ownership. The manual is organized in nine sections that cover the following…

  11. Manual for Refugee Sponsorship.

    ERIC Educational Resources Information Center

    Brewer, Kathleen; Taran, Patrick A.

    This manual provides guidelines for religious congregations interested in setting up programs to sponsor refugees. The manual describes the psychological implications of the refugee experience and discusses initial steps in organizing for sponsorship. Detailed information is provided for sponsors regarding: finances and mobilization of resources;…

  12. School Fire Safety Manual.

    ERIC Educational Resources Information Center

    Arkansas State Dept. of Education, Little Rock. General Education Div.

    This manual provides the background information necessary for the planning of school fire safety programs by local school officials, particularly in Arkansas. The manual first discusses the need for such programs and cites the Arkansas state law regarding them. Policies established by the Arkansas State Board of Education to implement the legal…

  13. School District Energy Manual.

    ERIC Educational Resources Information Center

    Association of School Business Officials International, Reston, VA.

    This manual serves as an energy conservation reference and management guide for school districts. The School District Energy Program (SDEP) is designed to provide information and/or assistance to school administrators planning to implement a comprehensive energy management program. The manual consists of 15 parts. Part 1 describes the SDEP; Parts…

  14. Functional Handwriting Manual.

    ERIC Educational Resources Information Center

    Metzger, Louise; Lehotsky, Rutheda R.

    An inservice project to review the functional handwriting being taught in the Williamsport, Pennsylvania, school district produced a handwriting manual that provides teachers and students with models of letter forms and instructional exercises leading to the development of an individualized style of handwriting. The manual describes student…

  15. NYS Foster Parent Manual

    ERIC Educational Resources Information Center

    McBride, Rebecca

    2007-01-01

    This manual was developed for use in foster parents' day-to-day life with the children in their care. It gives them practical information on topics like medical care, payments, and the role of the court, and also provides guidance on areas like welcoming a child, discipline, and parent visits. The manual emphasizes the role of foster parents in…

  16. Indoor Air Quality Manual.

    ERIC Educational Resources Information Center

    Baldwin Union Free School District, NY.

    This manual identifies ways to improve a school's indoor air quality (IAQ) and discusses practical actions that can be carried out by school staff in managing air quality. The manual includes discussions of the many sources contributing to school indoor air pollution and the preventive planning for each including renovation and repair work,…

  17. Learning Resources Evaluations Manual.

    ERIC Educational Resources Information Center

    Nunes, Evelyn H., Ed.

    This manual contains evaluations of 196 instructional products listed in Virginia's Adult Basic Education Curricula Resource Catalog. It is intended as a convenient reference manual for making informed decisions concerning materials for adult learners in adult basic education, English-as-a-Second-Language instruction, and general educational…

  18. Materials inventory management manual

    NASA Technical Reports Server (NTRS)

    1992-01-01

    This NASA Materials Inventory Management Manual (NHB 4100.1) is issued pursuant to Section 203(c)(1) of the National Aeronautics and Space Act of 1958 (42 USC 2473). It sets forth policy, performance standards, and procedures governing the acquisition, management and use of materials. This Manual is effective upon receipt.

  19. Circulation Aide Training Manual.

    ERIC Educational Resources Information Center

    Bergeson, Alan O.

    This training manual provides instruction on shelving and other duties for student assistants in the learning resources center at the College of Dupage, located in Illinois. It is noted that prospective student circulation aides are required to read the manual and pass a written test on policies and procedures before they are allowed to shelve…

  20. EPA VAN OPERATIONAL MANUAL

    EPA Science Inventory

    The manual generally describes the EPA Van, and discusses both its energy control system and Van operation. The manual includes instructions for the Van's transportation, setup, safety, troubleshooting, and maintenance. The Van is a mobile research unit, designed for testing in v...

  1. Biology Laboratory Safety Manual.

    ERIC Educational Resources Information Center

    Case, Christine L.

    The Centers for Disease Control (CDC) recommends that schools prepare or adapt a biosafety manual, and that instructors develop a list of safety procedures applicable to their own lab and distribute it to each student. In this way, safety issues will be brought to each student's attention. This document is an example of such a manual. It contains…

  2. Technical Manual. The ACT®

    ERIC Educational Resources Information Center

    ACT, Inc., 2014

    2014-01-01

    This manual contains technical information about the ACT® college readiness assessment. The principal purpose of this manual is to document the technical characteristics of the ACT in light of its intended purposes. ACT regularly conducts research as part of the ongoing formative evaluation of its programs. The research is intended to ensure that…

  3. Miami University Information Manual.

    ERIC Educational Resources Information Center

    Miami Univ., Oxford, OH.

    The 1975 information manual is designed to provide current data on policies, procedures, services, facilities, organization and governance of Miami University and, through the extensive index, quick access to this information. The manual is complementary to the university catalog and directory. Information relating to students is in the Student…

  4. Dental Charting. Student's Manual.

    ERIC Educational Resources Information Center

    Weaver, Trudy Karlene; Apfel, Maura

    This manual is part of a series dealing with skills and information needed by students in dental assisting. The individualized student materials are suitable for classroom, laboratory, or cooperative training programs. This student manual contains four units covering the following topics: dental anatomical terminology; tooth numbering systems;…

  5. The IUPHAR/BPS Guide to PHARMACOLOGY in 2016: towards curated quantitative interactions between 1300 protein targets and 6000 ligands.

    PubMed

    Southan, Christopher; Sharman, Joanna L; Benson, Helen E; Faccenda, Elena; Pawson, Adam J; Alexander, Stephen P H; Buneman, O Peter; Davenport, Anthony P; McGrath, John C; Peters, John A; Spedding, Michael; Catterall, William A; Fabbro, Doriano; Davies, Jamie A

    2016-01-01

    The IUPHAR/BPS Guide to PHARMACOLOGY (GtoPdb, http://www.guidetopharmacology.org) provides expert-curated molecular interactions between successful and potential drugs and their targets in the human genome. Developed by the International Union of Basic and Clinical Pharmacology (IUPHAR) and the British Pharmacological Society (BPS), this resource, and its earlier incarnation as IUPHAR-DB, is described in our 2014 publication. This update incorporates changes over the intervening seven database releases. The unique model of content capture is based on established and new target class subcommittees collaborating with in-house curators. Most information comes from journal articles, but we now also index kinase cross-screening panels. Targets are specified by UniProtKB IDs. Small molecules are defined by PubChem Compound Identifiers (CIDs); ligand capture also includes peptides and clinical antibodies. We have extended the capture of ligands and targets linked via published quantitative binding data (e.g. Ki, IC50 or Kd). The resulting pharmacological relationship network now defines a data-supported druggable genome encompassing 7% of human proteins. The database also provides an expanded substrate for the biennially published compendium, the Concise Guide to PHARMACOLOGY. This article covers content increase, entity analysis, revised curation strategies, new website features and expanded download options. PMID:26464438
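The kind of record GtoPdb curates, a target identified by a UniProtKB ID, a ligand identified by a PubChem CID, and a quantitative affinity such as Ki, IC50 or Kd, can be modeled as a small data structure. The field names, example identifiers, and affinity values below are illustrative assumptions, not GtoPdb's actual schema or data.

```python
from dataclasses import dataclass

# Illustrative model of a curated quantitative interaction; field names and
# example values are assumptions, not GtoPdb's schema or curated content.
@dataclass(frozen=True)
class Interaction:
    target_uniprot: str   # UniProtKB accession of the target protein
    ligand_cid: int       # PubChem Compound Identifier of the ligand
    affinity_type: str    # 'Ki', 'IC50' or 'Kd'
    p_affinity: float     # -log10(molar affinity); 9.0 corresponds to 1 nM

interactions = [
    Interaction("P00001", 1001, "Ki", 9.1),    # hypothetical potent ligand
    Interaction("P00001", 1002, "IC50", 6.2),  # hypothetical weaker ligand
]

def potent(records, threshold=7.0):
    """Keep interactions with sub-100 nM affinity (pAffinity >= 7)."""
    return [r for r in records if r.p_affinity >= threshold]

print([r.ligand_cid for r in potent(interactions)])  # -> [1001]
```

Storing affinities on the negative-log scale makes "stronger binding" simply "larger number", which is why a single threshold suffices for the filter.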

  6. The IUPHAR/BPS Guide to PHARMACOLOGY in 2016: towards curated quantitative interactions between 1300 protein targets and 6000 ligands

    PubMed Central

    Southan, Christopher; Sharman, Joanna L.; Benson, Helen E.; Faccenda, Elena; Pawson, Adam J.; Alexander, Stephen P. H.; Buneman, O. Peter; Davenport, Anthony P.; McGrath, John C.; Peters, John A.; Spedding, Michael; Catterall, William A.; Fabbro, Doriano; Davies, Jamie A.

    2016-01-01

    The IUPHAR/BPS Guide to PHARMACOLOGY (GtoPdb, http://www.guidetopharmacology.org) provides expert-curated molecular interactions between successful and potential drugs and their targets in the human genome. Developed by the International Union of Basic and Clinical Pharmacology (IUPHAR) and the British Pharmacological Society (BPS), this resource, and its earlier incarnation as IUPHAR-DB, is described in our 2014 publication. This update incorporates changes over the intervening seven database releases. The unique model of content capture is based on established and new target class subcommittees collaborating with in-house curators. Most information comes from journal articles, but we now also index kinase cross-screening panels. Targets are specified by UniProtKB IDs. Small molecules are defined by PubChem Compound Identifiers (CIDs); ligand capture also includes peptides and clinical antibodies. We have extended the capture of ligands and targets linked via published quantitative binding data (e.g. Ki, IC50 or Kd). The resulting pharmacological relationship network now defines a data-supported druggable genome encompassing 7% of human proteins. The database also provides an expanded substrate for the biennially published compendium, the Concise Guide to PHARMACOLOGY. This article covers content increase, entity analysis, revised curation strategies, new website features and expanded download options. PMID:26464438

  7. Semantic mediation in the national geologic map database (US)

    USGS Publications Warehouse

    Percy, D.; Richard, S.; Soller, D.

    2008-01-01

    Controlled language is the primary challenge in merging heterogeneous databases of geologic information. Each agency or organization produces databases with different schema, and different terminology for describing the objects within. In order to make some progress toward merging these databases using current technology, we have developed software and a workflow that allows for the "manual semantic mediation" of these geologic map databases. Enthusiastic support from many state agencies (stakeholders and data stewards) has shown that the community supports this approach. Future implementations will move toward a more Artificial Intelligence-based approach, using expert-systems or knowledge-bases to process data based on the training sets we have developed manually.
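The "manual semantic mediation" described above reduces, at its simplest, to a hand-built crosswalk from each agency's local terms to one shared vocabulary. The agency tables, terms, and shared concept names below are invented for illustration.

```python
# Toy sketch of manual semantic mediation: each agency's local geologic terms
# are mapped by hand to a shared vocabulary before the databases are merged.
AGENCY_A = {"qal": "alluvium", "kgr": "granite"}          # local code -> term
AGENCY_B = {"Qa": "alluvial deposits", "gr": "Granite"}

# Hand-curated crosswalk: every local term points at one shared concept.
CROSSWALK = {
    "alluvium": "Alluvium",
    "alluvial deposits": "Alluvium",
    "granite": "Granite",
}

def mediate(local_units):
    """Translate a unit table into the shared vocabulary, flagging gaps
    so a human curator can extend the crosswalk."""
    merged = {}
    for code, term in local_units.items():
        merged[code] = CROSSWALK.get(term.lower().strip(), "UNMAPPED:" + term)
    return merged

print(mediate(AGENCY_A))  # -> {'qal': 'Alluvium', 'kgr': 'Granite'}
print(mediate(AGENCY_B))  # -> {'Qa': 'Alluvium', 'gr': 'Granite'}
```

The "UNMAPPED" flag is the manual step: unmapped terms accumulate into the training sets that the abstract suggests could later drive a knowledge-based approach.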

  8. Supervision: Trainer's Manual and Resource Manual.

    ERIC Educational Resources Information Center

    Eliasoph, Beverly; And Others

This manual is designed to train those helping professionals who carry supervisory responsibility for the work of counselors in drug or other substance abuse programs. The purposes of this training course include identification and description of supervision forms, processes, and skills, and development of supervisory competencies useful in the work…

  9. Petty Cash. Student's Manual and Instructor's Manual.

    ERIC Educational Resources Information Center

    McElveen, Peggy C.

Both a set of student materials and an instructor's manual on maintaining a petty cash fund are included in this packet, which is one of a series. The student materials include a pretest, five learning activities which contain the information and forms needed to complete the activities, a student self-check with each activity, and a posttest. The…

  10. Taking Inventory. Student's Manual and Instructor's Manual.

    ERIC Educational Resources Information Center

    Hamer, Jean

    Supporting performance objective 56 of the V-TECS (Vocational-Technical Education Consortium of States) Secretarial Catalog, both a set of student materials and an instructor's manual on taking inventory are included in this packet. (The packet is the first in a set of nine on performing computational clerical activities--CE 016 951-959.) The…

  11. Writing Checks. Student's Manual and Instructor's Manual.

    ERIC Educational Resources Information Center

    Hamer, Jean

    Supporting performance objective 54 of the V-TECS (Vocational-Technical Education Consortium of States) Secretarial Catalog, both a set of student materials and an instructor's manual on writing checks are included in this packet. (The packet is the sixth in a set of nine on performing computational clerical activities--CE 016 951-959.) The…

  12. Filing Geographically. Student's Manual and Instructor's Manual.

    ERIC Educational Resources Information Center

    McLeod, Sadie

    Supporting performance objective 23 of the V-TECS (Vocational-Technical Education Consortium of States) Secretarial Catalog, both a set of student materials and an instructor's manual on filing materials geographically are included in this packet. (The packet is the fifth in a set of nine on maintaining files and a library--CE 016 939-947.) The…

  13. Filing Numerically. Student's Manual and Instructor's Manual.

    ERIC Educational Resources Information Center

    McLeod, Sadie

    Supporting performance objective 22 of the V-TECS (Vocational-Technical Education Consortium of States) Secretarial Catalog, both a set of student materials and an instructor's manual on filing materials numerically are included in this packet. (The packet is the fourth in a set of nine on maintaining files and a library--CE 016 939-947.) The…

  14. Chain Feeding. Student's Manual and Instructor's Manual.

    ERIC Educational Resources Information Center

    Snapp, Jane

    Supporting performance objectives 82, 76, and 85 of the V-TECS (Vocational-Technical Education Consortium of States) Secretarial Catalog, both a set of student materials and an instructor's manual on chain feeding techniques and correction methods are included in this packet. (The packet is the sixth in a set of fifteen on typewriting--CE 016…

  15. Completing Invoices. Student's Manual and Instructor's Manual.

    ERIC Educational Resources Information Center

    Hamer, Jean

    Supporting performance objective 46 of the V-TECS (Vocational-Technical Education Consortium of States) Secretarial Catalog, both a set of student materials and an instructor's manual on completing invoices are included in this packet. (The packet is the fifth in a set of nine on performing computational clerical activities--CE 016 951-959.) The…

  16. DESIGN MANUAL: MUNICIPAL WASTEWATER DISINFECTION

    EPA Science Inventory

    This manual provides a comprehensive source of information to be used in the design of disinfection facilities for municipal wastewater treatment plants. The manual includes design information on halogenation/dehalogenation, ozonation, and ultraviolet radiation. The manual presents...

  17. OCDB: a database collecting genes, miRNAs and drugs for obsessive-compulsive disorder

    PubMed Central

    Privitera, Anna P.; Distefano, Rosario; Wefer, Hugo A.; Ferro, Alfredo; Pulvirenti, Alfredo; Giugno, Rosalba

    2015-01-01

    Obsessive-compulsive disorder (OCD) is a psychiatric condition characterized by intrusive and unwanted thoughts (obsessions) giving rise to anxiety. Patients feel obliged to perform behaviors (compulsions) induced by the obsessions. The World Health Organization ranks OCD as one of the 10 most disabling medical conditions. Within the class of anxiety disorders, OCD is a pathology that shows a hereditary component. Consequently, an online resource collecting and integrating scientific discoveries and genetic evidence about OCD would be helpful to improve the current knowledge of this disorder. We have developed a manually curated database, the OCD Database (OCDB), collecting the relations between candidate genes in OCD, microRNAs (miRNAs) involved in the pathophysiology of OCD and drugs used in its treatment. We have screened articles from PubMed and MEDLINE. For each gene, the bibliographic references are shown with a brief description of the gene and the experimental conditions. The database also lists the polymorphisms within genes and their chromosomal regions. OCDB data is enriched with both validated and predicted miRNA-target and drug-target information. Transcription factor regulations, which are also included, are taken from DAVID and TransmiR. Moreover, a scoring function ranks the relevance of data in the OCDB context. The database is also integrated with the main online resources (PubMed, Entrez Gene, HGNC, dbSNP, DrugBank, miRBase, PubChem, KEGG, Disease Ontology and ChEBI). The web interface has been developed using phpMyAdmin and Bootstrap. This allows users (i) to browse data by category and (ii) to navigate the database by searching genes, miRNAs, drugs, SNPs, regions, drug targets and articles. The data can be exported in textual format, as can the whole database in .sql or tabular format. OCDB is an essential resource to support genome-wide analyses as well as genetic and pharmacological studies. It also facilitates the evaluation of genetic data…
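The tabular export mentioned above lends itself to simple downstream scripting. A minimal sketch in Python of consuming such an export; the column names, genes and scores here are hypothetical stand-ins, not OCDB's actual layout:

```python
import csv
import io

# A made-up fragment standing in for a tab-separated OCDB export;
# the real column layout and values may differ.
export = (
    "gene\tsnp\tscore\n"
    "SLC1A1\trs301430\t0.82\n"
    "HTR2A\trs6311\t0.67\n"
    "COMT\trs4680\t0.41\n"
)

reader = csv.DictReader(io.StringIO(export), delimiter="\t")
# Rank rows by the (hypothetical) relevance score, highest first.
ranked = sorted(reader, key=lambda row: float(row["score"]), reverse=True)
print([r["gene"] for r in ranked])  # → ['SLC1A1', 'HTR2A', 'COMT']
```

The same pattern applies to any of the database's tabular downloads: parse with `csv.DictReader`, then sort or filter on whichever column the analysis needs.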

  18. Curating NASA's Future Extraterrestrial Sample Collections: How Do We Achieve Maximum Proficiency?

    NASA Technical Reports Server (NTRS)

    McCubbin, Francis; Evans, Cynthia; Zeigler, Ryan; Allton, Judith; Fries, Marc; Righter, Kevin; Zolensky, Michael

    2016-01-01

    The Astromaterials Acquisition and Curation Office (henceforth the NASA Curation Office) at NASA Johnson Space Center (JSC) is responsible for curating all of NASA's extraterrestrial samples. Under the governing document, NASA Policy Directive (NPD) 7100.10E "Curation of Extraterrestrial Materials", JSC is charged with "The curation of all extraterrestrial material under NASA control, including future NASA missions." The Directive goes on to define Curation as including "... documentation, preservation, preparation, and distribution of samples for research, education, and public outreach." Here we describe some of the ongoing efforts to ensure that the future activities of the NASA Curation Office are working towards a state of maximum proficiency.

  19. Causal biological network database: a comprehensive platform of causal biological network models focused on the pulmonary and vascular systems

    PubMed Central

    Boué, Stéphanie; Talikka, Marja; Westra, Jurjen Willem; Hayes, William; Di Fabio, Anselmo; Park, Jennifer; Schlage, Walter K.; Sewer, Alain; Fields, Brett; Ansari, Sam; Martin, Florian; Veljkovic, Emilija; Kenney, Renee; Peitsch, Manuel C.; Hoeng, Julia

    2015-01-01

    With the wealth of publications and data available, powerful and transparent computational approaches are required to represent measured data and scientific knowledge in a computable and searchable format. We developed a set of biological network models, scripted in the Biological Expression Language, that reflect causal signaling pathways across a wide range of biological processes, including cell fate, cell stress, cell proliferation, inflammation, tissue repair and angiogenesis in the pulmonary and cardiovascular context. This comprehensive collection of networks is now freely available to the scientific community in a centralized web-based repository, the Causal Biological Network database, which is composed of over 120 manually curated and well annotated biological network models and can be accessed at http://causalbionet.com. The website accesses a MongoDB, which stores all versions of the networks as JSON objects and allows users to search for genes, proteins, biological processes, small molecules and keywords in the network descriptions to retrieve biological networks of interest. The content of the networks can be visualized and browsed. Nodes and edges can be filtered and all supporting evidence for the edges can be browsed and is linked to the original articles in PubMed. Moreover, networks may be downloaded for further visualization and evaluation. Database URL: http://causalbionet.com PMID:25887162
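The search behavior described above (networks stored as JSON documents, retrieved by keywords in their descriptions) can be sketched in Python. The document structure and field names below are assumptions for illustration, not the database's actual schema:

```python
# Networks stored as JSON-like documents; names and descriptions here
# are invented for illustration, not the database's actual content.
networks = [
    {"name": "Cell Stress",   "description": "oxidative stress signaling in lung tissue"},
    {"name": "Angiogenesis",  "description": "vascular growth factor pathways"},
    {"name": "Tissue Repair", "description": "wound healing and lung remodeling"},
]

def search_networks(term, docs):
    """Return documents whose name or description mentions the term."""
    term = term.lower()
    return [d for d in docs
            if term in d["name"].lower() or term in d["description"].lower()]

hits = search_networks("lung", networks)
print([d["name"] for d in hits])  # → ['Cell Stress', 'Tissue Repair']
```

A production system would of course delegate this filtering to MongoDB's own query engine rather than scanning documents in application code; the sketch only shows the shape of the lookup.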

  20. SubtiWiki 2.0—an integrated database for the model organism Bacillus subtilis

    PubMed Central

    Michna, Raphael H.; Zhu, Bingyao; Mäder, Ulrike; Stülke, Jörg

    2016-01-01

    To understand living cells, we need knowledge of each of their parts as well as of the interactions between these parts. To gain rapid and comprehensive access to this information, annotation databases are required. Here, we present SubtiWiki 2.0, the integrated database for the model bacterium Bacillus subtilis (http://subtiwiki.uni-goettingen.de/). SubtiWiki provides text-based access to published information about the genes and proteins of B. subtilis as well as presentations of metabolic and regulatory pathways. Moreover, manually curated protein-protein interaction diagrams are linked to the protein pages. Finally, expression data are shown for gene expression under 104 different conditions as well as absolute protein quantification for cytoplasmic proteins. To facilitate mobile use of SubtiWiki, we have now expanded it with apps available for iOS and Android devices. Importantly, the app allows users to link private notes and pictures to the gene/protein pages. Today, SubtiWiki has become one of the most complete collections of knowledge on a living organism in a single resource. PMID:26433225

  1. hERGAPDbase: a database documenting hERG channel inhibitory potentials and APD-prolongation activities of chemical compounds.

    PubMed

    Hishigaki, Haretsugu; Kuhara, Satoru

    2011-01-01

    Drug-induced QT interval prolongation is one of the most common reasons for the withdrawal of drugs from the market. In the past decade, at least nine drugs, i.e. terfenadine, astemizole, grepafloxacin, terodiline, droperidol, lidoflazine, sertindole, levomethadyl and cisapride, have been removed from the market or their use has been severely restricted because of drug-induced QT interval prolongation. Therefore, this irregularity is a major safety concern in the case of drugs submitted for regulatory approval. The most common mechanism of drug-induced QT interval prolongation may be drug-related inhibition of the human ether-à-go-go-related gene (hERG) channel, which subsequently results in prolongation of the cardiac action potential duration (APD). hERGAPDbase is a database of electrophysiological experimental data documenting potential hERG channel inhibitory actions and the APD-prolongation activities of chemical compounds. All data entries are manually collected from scientific papers and curated by hand. With hERGAPDbase, we aim to provide useful information for chemical and pharmacological scientists and enable easy access to electrophysiological experimental data on chemical compounds. Database URL: http://www.grt.kyushu-u.ac.jp/hergapdbase/. PMID:21586548

  2. ARACNe-based inference, using curated microarray data, of Arabidopsis thaliana root transcriptional regulatory networks

    PubMed Central

    2014-01-01

    Background Uncovering the complex transcriptional regulatory networks (TRNs) that underlie plant and animal development remains a challenge. However, a vast amount of data from public microarray experiments is available, which can be subjected to inference algorithms in order to recover reliable TRN architectures. Results In this study we present a simple bioinformatics methodology that uses public, carefully curated microarray data and the mutual information algorithm ARACNe in order to obtain a database of transcriptional interactions. We used data from Arabidopsis thaliana root samples to show that the transcriptional regulatory networks derived from this database successfully recover previously identified root transcriptional modules and to propose new transcription factors for the SHORT ROOT/SCARECROW and PLETHORA pathways. We further show that these networks are a powerful tool to integrate and analyze high-throughput expression data, as exemplified by our analysis of a SHORT ROOT induction time-course microarray dataset, and are a reliable source for the prediction of novel root gene functions. In particular, we used our database to predict novel genes involved in root secondary cell-wall synthesis and identified the MADS-box TF XAL1/AGL12 as an unexpected participant in this process. Conclusions This study demonstrates that network inference using carefully curated microarray data yields reliable TRN architectures. In contrast to previous efforts to obtain root TRNs, which have focused on particular functional modules or tissues, our root transcriptional interactions provide an overview of the transcriptional pathways present in Arabidopsis thaliana roots and will likely yield a plethora of novel hypotheses to be tested experimentally. PMID:24739361
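The mutual-information score at the heart of ARACNe can be illustrated with a minimal sketch on discretized expression levels. The full algorithm adds kernel density estimation and data-processing-inequality pruning, both omitted here:

```python
from collections import Counter
from math import log2

def mutual_information(xs, ys):
    """I(X;Y) = sum over (x, y) of p(x,y) * log2(p(x,y) / (p(x) * p(y)))."""
    n = len(xs)
    px, py, pxy = Counter(xs), Counter(ys), Counter(zip(xs, ys))
    return sum((c / n) * log2((c / n) / ((px[x] / n) * (py[y] / n)))
               for (x, y), c in pxy.items())

# Perfectly coupled expression profiles carry maximal information...
a = ["low", "low", "high", "high"]
assert abs(mutual_information(a, a) - 1.0) < 1e-9

# ...while statistically independent profiles carry none.
b = ["low", "high", "low", "high"]
assert abs(mutual_information(a, b)) < 1e-9
```

ARACNe computes this score for every regulator-target pair and keeps only the strongest, statistically significant interactions, which is what makes a curated expression compendium such a good substrate for it.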

  3. Experiment Databases

    NASA Astrophysics Data System (ADS)

    Vanschoren, Joaquin; Blockeel, Hendrik

    Beyond running machine learning algorithms based on inductive queries, much can be learned by immediately querying the combined results of many prior studies. Indeed, all around the globe, thousands of machine learning experiments are being executed on a daily basis, generating a constant stream of empirical information on machine learning techniques. While the information contained in these experiments might have many uses beyond their original intent, results are typically described very concisely in papers and discarded afterwards. If we properly store and organize these results in central databases, they can be immediately reused for further analysis, thus boosting future research. In this chapter, we propose the use of experiment databases: databases designed to collect all the necessary details of these experiments, and to intelligently organize them in online repositories to enable fast and thorough analysis of a myriad of collected results. They constitute an additional, queryable source of empirical meta-data based on principled descriptions of algorithm executions, without reimplementing the algorithms in an inductive database. As such, they engender a very dynamic, collaborative approach to experimentation, in which experiments can be freely shared, linked together, and immediately reused by researchers all over the world. They can be set up for personal use, to share results within a lab or to create open, community-wide repositories. Here, we provide a high-level overview of their design, and use an existing experiment database to answer various interesting research questions about machine learning algorithms and to verify a number of recent studies.
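The kind of meta-level question such a repository makes trivial can be sketched as follows; the record fields, algorithm names and accuracies are hypothetical, not data from any real experiment database:

```python
from collections import defaultdict

# Stored run records; the field names and values are hypothetical.
runs = [
    {"algorithm": "C4.5", "dataset": "iris",     "accuracy": 0.94},
    {"algorithm": "C4.5", "dataset": "credit-g", "accuracy": 0.71},
    {"algorithm": "SVM",  "dataset": "iris",     "accuracy": 0.96},
    {"algorithm": "SVM",  "dataset": "credit-g", "accuracy": 0.75},
]

# Group accuracies by algorithm across all shared experiments.
by_alg = defaultdict(list)
for run in runs:
    by_alg[run["algorithm"]].append(run["accuracy"])

# Mean accuracy per algorithm: the kind of cross-study answer that a
# well-organized experiment database reduces to a single query.
means = {alg: sum(accs) / len(accs) for alg, accs in by_alg.items()}
assert abs(means["C4.5"] - 0.825) < 1e-9
assert abs(means["SVM"] - 0.855) < 1e-9
```

In a real experiment database the grouping and averaging would be expressed declaratively against the repository's query interface rather than computed client-side.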

  4. AromaDeg, a novel database for phylogenomics of aerobic bacterial degradation of aromatics

    PubMed Central

    Duarte, Márcia; Jauregui, Ruy; Vilchez-Vargas, Ramiro; Junca, Howard; Pieper, Dietmar H.

    2014-01-01

    Understanding prokaryotic transformation of recalcitrant pollutants and in-situ metabolic networks requires the integration of massive amounts of biological data. Decades of biochemical studies together with novel next-generation sequencing data have exponentially increased the information on aerobic aromatic degradation pathways. However, the majority of protein sequences in public databases have not been experimentally characterized, and homology-based methods are still the most routinely used approach to assign protein function, allowing the propagation of misannotations. AromaDeg is a web-based resource targeting aerobic degradation of aromatics that comprises recently updated (September 2013) and manually curated databases constructed based on a phylogenomic approach. Grounded in phylogenetic analyses of protein sequences of key catabolic protein families and of proteins of documented function, AromaDeg allows query and data mining of novel genomic, metagenomic or metatranscriptomic data sets. Essentially, each query sequence that matches a given protein family of AromaDeg is associated with a specific cluster of a given phylogenetic tree, and further function annotation and/or substrate specificity may be inferred from the neighboring cluster members with experimentally validated function. This allows a detailed characterization of individual protein superfamilies as well as high-throughput functional classifications. Thus, AromaDeg addresses the deficiencies of homology-based protein function prediction, combining phylogenetic tree construction and integration of experimental data to obtain more accurate annotations of new biological data related to aerobic aromatic biodegradation pathways. We plan to expand AromaDeg to other enzyme families involved in aromatic degradation and to update it regularly. Database URL: http://aromadeg.siona.helmholtz-hzi.de PMID:25468931

  5. Animal model integration to AutDB, a genetic database for autism

    PubMed Central

    2011-01-01

    Background In the post-genomic era, multi-faceted research on complex disorders such as autism has generated diverse types of molecular information related to its pathogenesis. The rapid accumulation of putative candidate genes/loci for Autism Spectrum Disorders (ASD) and ASD-related animal models poses a major challenge for systematic analysis of their content. We previously created the Autism Database (AutDB) to provide a publicly available web portal for ongoing collection, manual annotation, and visualization of genes linked to ASD. Here, we describe the design, development, and integration of a new module within AutDB for ongoing collection and comprehensive cataloguing of ASD-related animal models. Description As with the original AutDB, all data is extracted from published, peer-reviewed scientific literature. Animal models are annotated with a new standardized vocabulary of phenotypic terms developed by our researchers, which is designed to reflect the diverse clinical manifestations of ASD. The new Animal Model module is seamlessly integrated into AutDB for dissemination of diverse information related to ASD. Animal model entries within the new module are linked to corresponding candidate genes in the original "Human Gene" module of the resource, thereby allowing for cross-modal navigation between gene models and human gene studies. Although the current release of the Animal Model module is restricted to mouse models, it was designed with an expandable framework which can easily incorporate additional species and non-genetic etiological models of autism in the future. Conclusions Importantly, this modular ASD database provides a platform from which data mining, bioinformatics, and/or computational biology strategies may be adopted to develop predictive disease models that may offer further insights into the molecular underpinnings of this disorder. It also serves as a general model for disease-driven databases curating phenotypic characteristics of…

  6. Creating a VAPEPS database: A VAPEPS tutorial

    NASA Technical Reports Server (NTRS)

    Graves, George

    1989-01-01

    A procedural method is outlined for creating a Vibroacoustic Payload Environment Prediction System (VAPEPS) Database. The method of presentation employs flowcharts of sequential VAPEPS Commands used to create a VAPEPS Database. The commands are accompanied by explanatory text to the right of the command in order to minimize the need for repetitive reference to the VAPEPS user's manual. The method is demonstrated by examples of varying complexity. It is assumed that the reader has acquired a basic knowledge of the VAPEPS software program.

  7. IPD—the Immuno Polymorphism Database

    PubMed Central

    Robinson, James; Waller, Matthew J.; Stoehr, Peter; Marsh, Steven G. E.

    2005-01-01

    The Immuno Polymorphism Database (IPD) (http://www.ebi.ac.uk/ipd/) is a set of specialist databases related to the study of polymorphic genes in the immune system. IPD currently consists of four databases: IPD-KIR, which contains the allelic sequences of Killer-cell Immunoglobulin-like Receptors; IPD-MHC, a database of sequences of the Major Histocompatibility Complex of different species; IPD-HPA, alloantigens expressed only on platelets; and IPD-ESTAB, which provides access to the European Searchable Tumour Cell-Line Database, a cell bank of immunologically characterized melanoma cell lines. The IPD project works with specialist groups or nomenclature committees who provide and curate individual sections before they are submitted to IPD for online publication. The IPD project stores all the data in a set of related databases. Sections with similar data, such as IPD-KIR and IPD-MHC, share the same database structure. The sharing of a common database structure makes it easier to implement common tools for data submission and retrieval. The data are currently available online from the website and ftp directory; files will also be made available in different formats to download from the website and ftp server. The data will also be included in the SRS, BLAST and FASTA search engines at the European Bioinformatics Institute. PMID:15608253

  8. IPD--the Immuno Polymorphism Database.

    PubMed

    Robinson, James; Waller, Matthew J; Stoehr, Peter; Marsh, Steven G E

    2005-01-01

    The Immuno Polymorphism Database (IPD) (http://www.ebi.ac.uk/ipd/) is a set of specialist databases related to the study of polymorphic genes in the immune system. IPD currently consists of four databases: IPD-KIR, which contains the allelic sequences of Killer-cell Immunoglobulin-like Receptors; IPD-MHC, a database of sequences of the Major Histocompatibility Complex of different species; IPD-HPA, alloantigens expressed only on platelets; and IPD-ESTAB, which provides access to the European Searchable Tumour Cell-Line Database, a cell bank of immunologically characterized melanoma cell lines. The IPD project works with specialist groups or nomenclature committees who provide and curate individual sections before they are submitted to IPD for online publication. The IPD project stores all the data in a set of related databases. Sections with similar data, such as IPD-KIR and IPD-MHC, share the same database structure. The sharing of a common database structure makes it easier to implement common tools for data submission and retrieval. The data are currently available online from the website and ftp directory; files will also be made available in different formats to download from the website and ftp server. The data will also be included in the SRS, BLAST and FASTA search engines at the European Bioinformatics Institute. PMID:15608253

  9. Hayabusa Recovery, Curation and Preliminary Sample Analysis: Lessons Learned from Recent Sample Return Mission

    NASA Technical Reports Server (NTRS)

    Zolensky, Michael E.

    2011-01-01

    I describe lessons learned from my participation in the Hayabusa mission, which returned regolith grains from asteroid Itokawa in 2010 [1], comparing it with the Stardust spacecraft, which sampled the Jupiter-family comet Wild 2. Spacecraft Recovery Operations: The mission Science and Curation teams must actively participate in planning, testing and implementing spacecraft recovery operations. The crash of the Genesis spacecraft underscored the importance of thinking through multiple contingency scenarios and practicing field recovery for these potential circumstances. Having the contingency supplies on hand was critical, and at least one full year of planning for Stardust and Hayabusa recovery operations was necessary. Care must be taken to coordinate recovery operations with local organizations and to inform relevant government bodies well in advance. Recovery plans for both Stardust and Hayabusa had to be adjusted for unexpectedly wet landing site conditions. Documentation of every step of spacecraft recovery and deintegration was necessary, and collection and analysis of launch and landing site soils was critical. We found the operation of the Woomera Test Range (South Australia) to be excellent in the case of Hayabusa, and in many respects this site is superior to the Utah Test and Training Range (used for Stardust) in the USA. Recovery operations for all recovered spacecraft suffered from the lack of a hermetic seal for the samples. Mission engineers should be pushed to provide hermetic seals for returned samples. Sample Curation Issues: More than two full years were required to prepare curation facilities for Stardust and Hayabusa. Despite this seemingly adequate lead time, major changes to curation procedures were required once the actual state of the returned samples became apparent. Sample databases must be fully implemented before sample return; for Stardust we did not adequately think through all of the possible subsampling…

  10. Salinas : theory manual.

    SciTech Connect

    Walsh, Timothy Francis; Reese, Garth M.; Bhardwaj, Manoj Kumar

    2004-08-01

    This manual describes the theory behind many of the constructs in Salinas. For a more detailed description of how to use Salinas, we refer the reader to the Salinas User's Notes. Many of the constructs in Salinas are pulled directly from published material. Where possible, these materials are referenced herein. However, certain functions in Salinas are specific to our implementation. We try to be far more complete in those areas. The theory manual was developed from several sources, including general notes, a programmer's notes manual, the user's notes and, of course, the material in the open literature.

  11. Fire Protection Program Manual

    SciTech Connect

    Sharry, J A

    2012-05-18

    This manual documents the Lawrence Livermore National Laboratory (LLNL) Fire Protection Program. Department of Energy (DOE) Order 420.1B, Facility Safety, requires LLNL to have a comprehensive and effective fire protection program that protects LLNL personnel and property, the public and the environment. The manual provides LLNL and its facilities with general information and guidance for meeting DOE 420.1B requirements. The intended readers of this manual are: fire protection officers, fire protection engineers, fire fighters, facility managers, directorate assurance managers, facility coordinators, and ES&H team members.

  12. The MIntAct Project and Molecular Interaction Databases.

    PubMed

    Licata, Luana; Orchard, Sandra

    2016-01-01

    Molecular interaction databases collect, organize, and enable the analysis of the increasing amounts of molecular interaction data being produced and published as we move towards a more complete understanding of the interactomes of key model organisms. The organization of these data in a structured format supports analyses such as the modeling of pairwise relationships between interactors into interaction networks and is a powerful tool for understanding the complex molecular machinery of the cell. This chapter gives an overview of the principal molecular interaction databases, in particular the IMEx databases, and their curation policies, use of standardized data formats and quality control rules. Special attention is given to the MIntAct project, in which IntAct and MINT joined forces to create a single resource to improve curation and software development efforts. This is exemplified as a model for the future of molecular interaction data collation and dissemination. PMID:27115627

  13. VAPEPS user's reference manual, version 5.0

    NASA Technical Reports Server (NTRS)

    Park, D. M.

    1988-01-01

    This is the reference manual for the VibroAcoustic Payload Environment Prediction System (VAPEPS). The system consists of a computer program and a vibroacoustic database. The purpose of the system is to collect measurements of vibroacoustic data taken from flight events and ground tests, and to retrieve this data and provide a means of using the data to predict future payload environments. This manual describes the operating language of the program. Topics covered include database commands, Statistical Energy Analysis (SEA) prediction commands, stress prediction command, and general computational commands.

  14. Solubility Database

    National Institute of Standards and Technology Data Gateway

    SRD 106 IUPAC-NIST Solubility Database (Web, free access)   These solubilities are compiled from 18 volumes of the International Union of Pure and Applied Chemistry (IUPAC)-NIST Solubility Data Series. The database includes liquid-liquid, solid-liquid, and gas-liquid systems. Typical solvents and solutes include water, seawater, heavy water, inorganic compounds, and a variety of organic compounds such as hydrocarbons, halogenated hydrocarbons, alcohols, acids, esters and nitrogen compounds. There are over 67,500 solubility measurements and over 1800 references.

  15. The Earth System Modeling Framework and Earth System Curator: Software Components as Building Blocks of Community

    NASA Astrophysics Data System (ADS)

    Deluca, C.; Balaji, V.; da Silva, A.; Dunlap, R.; Hill, C.; Mark, L.; Mechoso, C. R.; Middleton, D.; Nikonov, S.; Rugaber, S.; Suarez, M.

    2006-05-01

    The Earth System Modeling Framework (ESMF) is an established U.S. initiative to develop high performance common modeling infrastructure for climate and weather models. ESMF is the technical foundation for the NASA Modeling, Analysis, and Prediction (MAP) Climate Variability and Change program and the DoD Battlespace Environments Institute (BEI). It has been incorporated into the Community Climate System Model (CCSM), the Weather Research and Forecast (WRF) Model, NOAA NCEP and GFDL models, Army, Navy, and Air Force models, and many others. The new, NSF-funded Earth System Curator is a related database and toolkit that will store information about model configurations, prepare models for execution, and run them locally or in a distributed fashion. The key concept that underlies both ESMF and the Earth System Curator is that of software components. Components are software units that are "composable", meaning they can be combined to form coupled applications. These components may be representations of physical domains, such as atmospheres or oceans; processes within particular domains such as atmospheric radiation or chemistry; or computational functions, such as data assimilation or I/O. ESMF provides interfaces, an architecture, and tools for structuring components hierarchically to form complex, coupled modeling applications. The Earth System Curator will enable modelers to describe, archive, search, compose, and run ESMF and similar components. Together these projects encourage a new paradigm for modeling: one in which the community can draw from a federation of many interoperable components in order to create and deploy applications. The goal is to enable a network of collaborations and new scientific opportunities for the Earth modeling community.

  16. Peace Corps Aquaculture Training Manual. Training Manual T0057.

    ERIC Educational Resources Information Center

    Peace Corps, Washington, DC. Information Collection and Exchange Div.

    This Peace Corps training manual was developed from two existing manuals to provide a comprehensive training program in fish production for Peace Corps volunteers. The manual encompasses the essential elements of the University of Oklahoma program that has been training volunteers in aquaculture for 25 years. The 22 chapters of the manual are…

  17. Interactive Office user's manual

    NASA Technical Reports Server (NTRS)

    Montgomery, Edward E.; Lowers, Benjamin; Nabors, Terri L.

    1990-01-01

    Given here is a user's manual for Interactive Office (IO), an executive office tool for organization and planning, written specifically for Macintosh. IO is a paperless management tool to automate a related group of individuals into one productive system.

  18. Geochemical engineering reference manual

    SciTech Connect

    Owen, L.B.; Michels, D.E.

    1984-01-01

    The following topics are included in this manual: physical and chemical properties of geothermal brine and steam, scale and solids control, processing spent brine for reinjection, control of noncondensable gas emissions, and geothermal mineral recovery. (MHR)

  19. Curation of Antarctic Meteorites at NASA Johnson Space Center

    NASA Technical Reports Server (NTRS)

    McBride, K. M.; Satterwhite, C. E.; Righter, Kevin

    2010-01-01

    The U.S. Antarctic meteorite program began in the 1970s and has provided more than 18,000 samples in over three decades. The program is based on a three-agency agreement between NASA, the National Science Foundation, and the Smithsonian Institution. The collection, stored at the Johnson Space Center and the Smithsonian, is one of the largest collections of meteorites in the world and features samples from the Moon and Mars, asteroids, and material from the early solar system. A brief consideration of the collection shows that it contains 92.2% ordinary chondrites, 3.2% carbonaceous chondrites, 3.7% achondrites (1.7% HED), as well as many puzzling ungrouped meteorites. JSC has sent splits of nearly 20,000 meteorite samples to more than 500 scientists around the world since 1977. After the meteorites are collected in Antarctica, they are shipped frozen to JSC in Houston, usually arriving in April following the field season. The Astromaterials Curation Office at JSC is responsible for: - receiving the frozen meteorites. - staging: repackaging the samples and replacing their field identification numbers with official names. - submitting the names to the Nomenclature Committee of the Meteoritical Society for approval as new meteorites. - initial processing: weighing, measuring, describing and photographing the sample and providing a chip for classification to the Smithsonian Institution staff. - issuing two newsletters per year, announcing hundreds of new meteorites. - handling requests from the scientific community and allocating those requests that are approved. - providing supplies and tools for the field team, such as Teflon bags and tape, aluminum foil, and clean tweezers and tongs. - maintaining the meteorite database with more than 76,000 sample splits. - making petrographic thin and thick sections for the JSC library and scientific investigators. - providing storage and handling of the meteorites in a class 10,000 clean room. Samples that have not been…

  20. InvFEST, a database integrating information of polymorphic inversions in the human genome

    PubMed Central

    Martínez-Fundichely, Alexander; Casillas, Sònia; Egea, Raquel; Ràmia, Miquel; Barbadilla, Antonio; Pantano, Lorena; Puig, Marta; Cáceres, Mario

    2014-01-01

The newest genomic advances have uncovered an unprecedented degree of structural variation throughout genomes, with great amounts of data accumulating rapidly. Here we introduce InvFEST (http://invfestdb.uab.cat), a database combining multiple sources of information to generate a complete catalogue of non-redundant human polymorphic inversions. Due to the complexity of this type of change and the underlying high false-positive discovery rate, it is necessary to integrate all the available data to obtain a reliable estimate of the real number of inversions. InvFEST automatically merges predictions into different inversions, refines the breakpoint locations, and finds associations with genes and segmental duplications. In addition, it includes data on experimental validation, population frequency, functional effects and evolutionary history. All this information is readily accessible through a complete and user-friendly web report for each inversion. In its current version, InvFEST combines information from 34 different studies and contains 1092 candidate inversions, which are categorized based on internal scores and manual curation. InvFEST therefore aims to represent the most reliable set of human inversions and to become a central repository to share information, guide future studies and contribute to the analysis of the functional and evolutionary impact of inversions on the human genome. PMID:24253300
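The merging step described above (collapsing overlapping predictions from different studies into non-redundant candidate inversions) can be sketched as a simple interval merge. The function and data below are illustrative only, not InvFEST's actual algorithm:

```python
def merge_predictions(preds):
    """Merge overlapping (chrom, start, end, study) predictions into
    non-redundant candidate inversions, recording which studies
    support each merged call (a simplification of what a curated
    structural-variation database must do)."""
    merged = []
    for chrom, start, end, study in sorted(preds):
        if merged and merged[-1][0] == chrom and start <= merged[-1][2]:
            # Overlaps the previous call on the same chromosome: extend it.
            c, s, e, studies = merged[-1]
            merged[-1] = (c, s, max(e, end), studies | {study})
        else:
            merged.append((chrom, start, end, {study}))
    return merged

calls = [("chr1", 100, 500, "A"), ("chr1", 450, 900, "B"),
         ("chr1", 2000, 2500, "C"), ("chr2", 10, 80, "A")]
print(merge_predictions(calls))
# chr1:100-900 is supported by two studies; the other calls stay separate
```

A real pipeline would additionally refine breakpoints and score each merged inversion, as the abstract notes.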

  1. EHFPI: a database and analysis resource of essential host factors for pathogenic infection.

    PubMed

    Liu, Yang; Xie, Dafei; Han, Lu; Bai, Hui; Li, Fei; Wang, Shengqi; Bo, Xiaochen

    2015-01-01

High-throughput screening and computational technologies have greatly changed the face of microbiology by improving our understanding of pathogen-host interactions. Genome-wide RNA interference (RNAi) screens have given rise to a new class of host genes designated as Essential Host Factors (EHFs), whose knockdown effects significantly influence pathogenic infections. Therefore, we present the first release of a manually curated bioinformatics database and analysis resource EHFPI (Essential Host Factors for Pathogenic Infection, http://biotech.bmi.ac.cn/ehfpi). EHFPI captures detailed article, screen, pathogen and phenotype annotation information for a total of 4634 EHF genes of 25 clinically important pathogenic species. Notably, EHFPI also provides six powerful and data-integrative analysis tools, i.e. EHF Overlap Analysis, EHF-pathogen Network Analysis, Gene Enrichment Analysis, Pathogen Interacting Proteins (PIPs) Analysis, Drug Target Analysis and GWAS Candidate Gene Analysis, which advance the comprehensive understanding of the biological roles of EHF genes from the diverse perspectives of protein-protein interaction networks, drug targets and diseases/traits. The EHFPI web interface provides appropriate tools that allow efficient query of EHF data and visualization of custom-made analysis results. EHFPI data and tools will remain freely available and serve the microbiology, biomedicine and pharmaceutics research communities, ultimately facilitating the development of diagnostics, prophylactics and therapeutics for human pathogens. PMID:25414353

  4. Re-inventing Data Libraries: Ensuring Continuing Access To Curated (Value-added) Data

    NASA Astrophysics Data System (ADS)

    Burnhill, P.; Medyckyj-Scott, D.

    2008-12-01

How many years of inexperience do we need in using, and in particular sharing, digital data generated by others? That history pre-dates, but must also gain leverage from, the emergence of the digital library. Much of this sharing was done within research groups, but recent attention to spatial data infrastructure highlights the importance of achieving several 'right mixes': * between Internet standards, geo-specific referencing, and domain-specific vocabulary (cf ontology); * between attention to user-focused services and machine-to-machine interoperability; * between the demands of current high-quality services, the practice of data curation, and the need for long-term preservation. This presentation will draw upon ideas and experience from data library services in research universities, a national (UK) academic data centre, and developments in digital curation. It will be argued that the 1980s term 'data library' has some polemic value in that we have yet to learn what it means to 'do library' for data: more than "a bit like inter-galactic library loan", perhaps. Illustration will be drawn from a multi-faceted database of digitized boundaries (UKBORDERS), through the first Internet map delivery of national mapping agency data (Digimap), to strategic positioning to help geo-enable academic and scientific data and so enhance research (in the UK, in Europe, and beyond).

  5. The IST-LISBON database on LXCat

    NASA Astrophysics Data System (ADS)

    Alves, L. L.

    2014-12-01

LXCat is a web-based, community-wide project on the curation of data needed in the modelling of low-temperature plasmas. LXCat is organized in databases, contributed by members of the community around the world and indicated by the contributor's chosen title. This paper presents the status of the data available in the IST-LISBON database on LXCat. IST-LISBON contains up-to-date electron-neutral collisional data (together with the measured swarm parameters used to validate these data) resulting from the research effort of the Group of Gas Discharges and Gaseous Electronics at the Instituto de Plasmas e Fusão Nuclear, Instituto Superior Técnico, Lisbon, Portugal. Presently, the IST-LISBON database includes complete and consistent sets of electron scattering cross sections for argon, helium, nitrogen, oxygen, hydrogen and methane.

  6. Human protein reference database--2006 update.

    PubMed

    Mishra, Gopa R; Suresh, M; Kumaran, K; Kannabiran, N; Suresh, Shubha; Bala, P; Shivakumar, K; Anuradha, N; Reddy, Raghunath; Raghavan, T Madhan; Menon, Shalini; Hanumanthu, G; Gupta, Malvika; Upendran, Sapna; Gupta, Shweta; Mahesh, M; Jacob, Bincy; Mathew, Pinky; Chatterjee, Pritam; Arun, K S; Sharma, Salil; Chandrika, K N; Deshpande, Nandan; Palvankar, Kshitish; Raghavnath, R; Krishnakanth, R; Karathia, Hiren; Rekha, B; Nayak, Rashmi; Vishnupriya, G; Kumar, H G Mohan; Nagini, M; Kumar, G S Sameer; Jose, Rojan; Deepthi, P; Mohan, S Sujatha; Gandhi, T K B; Harsha, H C; Deshpande, Krishna S; Sarker, Malabika; Prasad, T S Keshava; Pandey, Akhilesh

    2006-01-01

Human Protein Reference Database (HPRD) (http://www.hprd.org) was developed to serve as a comprehensive collection of protein features, post-translational modifications (PTMs) and protein-protein interactions. Since the original report, this database has grown to >20 000 protein entries and has become the largest database for literature-derived protein-protein interactions (>30 000) and PTMs (>8000) for human proteins. We have also introduced several new features in HPRD including: (i) protein isoforms, (ii) enhanced search options, (iii) linking of pathway annotations and (iv) integration of a novel browser, GenProt Viewer (http://www.genprot.org), developed by us, that allows integration of genomic and proteomic information. With the continued support and active participation by the biomedical community, we expect HPRD to become a unique source of curated information for the human proteome and spur biomedical discoveries based on integration of genomic, transcriptomic and proteomic data. PMID:16381900

  7. OCDB: a database collecting genes, miRNAs and drugs for obsessive-compulsive disorder.

    PubMed

    Privitera, Anna P; Distefano, Rosario; Wefer, Hugo A; Ferro, Alfredo; Pulvirenti, Alfredo; Giugno, Rosalba

    2015-01-01

Obsessive-compulsive disorder (OCD) is a psychiatric condition characterized by intrusive and unwilling thoughts (obsessions) giving rise to anxiety. Patients feel obliged to perform behaviors (compulsions) induced by the obsessions. The World Health Organization ranks OCD as one of the 10 most disabling medical conditions. In the class of anxiety disorders, OCD is a pathology that shows a hereditary component. Consequently, an online resource collecting and integrating scientific discoveries and genetic evidence about OCD would be helpful to improve the current knowledge on this disorder. We have developed a manually curated database, OCD Database (OCDB), collecting the relations between candidate genes in OCD, microRNAs (miRNAs) involved in the pathophysiology of OCD and drugs used in its treatments. We have screened articles from PubMed and MEDLINE. For each gene, the bibliographic references with a brief description of the gene and the experimental conditions are shown. The database also lists the polymorphisms within genes and their chromosomal regions. OCDB data is enriched with both validated and predicted miRNA-target and drug-target information. The transcription factor regulations, also included, are taken from DAVID and TransmiR. Moreover, a scoring function ranks the relevance of data in the OCDB context. The database is also integrated with the main online resources (PubMed, Entrez-gene, HGNC, dbSNP, DrugBank, miRBase, PubChem, Kegg, Disease-ontology and ChEBI). The web interface has been developed using phpMyAdmin and Bootstrap software. This allows users (i) to browse data by category and (ii) to navigate the database by searching genes, miRNAs, drugs, SNPs, regions, drug targets and articles. Individual records can be exported in textual format, and the whole database in .sql or tabular format. OCDB is an essential resource to support genome-wide analysis, genetic and pharmacological studies.
It also facilitates the evaluation of genetic data

  8. Italian Rett database and biobank.

    PubMed

    Sampieri, Katia; Meloni, Ilaria; Scala, Elisa; Ariani, Francesca; Caselli, Rossella; Pescucci, Chiara; Longo, Ilaria; Artuso, Rosangela; Bruttini, Mirella; Mencarelli, Maria Antonietta; Speciale, Caterina; Causarano, Vincenza; Hayek, Giuseppe; Zappella, Michele; Renieri, Alessandra; Mari, Francesca

    2007-04-01

    Rett syndrome is the second most common cause of severe mental retardation in females, with an incidence of approximately 1 out of 10,000 live female births. In addition to the classic form, a number of Rett variants have been described. MECP2 gene mutations are responsible for about 90% of classic cases and for a lower percentage of variant cases. Recently, CDKL5 mutations have been identified in the early onset seizures variant and other atypical Rett patients. While the high percentage of MECP2 mutations in classic patients supports the hypothesis of a single disease gene, the low frequency of mutated variant cases suggests genetic heterogeneity. Since 1998, we have performed clinical evaluation and molecular analysis of a large number of Italian Rett patients. The Italian Rett Syndrome (RTT) database has been developed to share data and samples of our RTT collection with the scientific community (http://www.biobank.unisi.it). This is the first RTT database that has been connected with a biobank. It allows the user to immediately visualize the list of available RTT samples and, using the "Search by" tool, to rapidly select those with specific clinical and molecular features. By contacting bank curators, users can request the samples of interest for their studies. This database encourages collaboration projects with clinicians and researchers from around the world and provides important resources that will help to better define the pathogenic mechanisms underlying Rett syndrome. PMID:17186495

  9. HMDB: the Human Metabolome Database

    PubMed Central

    Wishart, David S.; Tzur, Dan; Knox, Craig; Eisner, Roman; Guo, An Chi; Young, Nelson; Cheng, Dean; Jewell, Kevin; Arndt, David; Sawhney, Summit; Fung, Chris; Nikolai, Lisa; Lewis, Mike; Coutouly, Marie-Aude; Forsythe, Ian; Tang, Peter; Shrivastava, Savita; Jeroncic, Kevin; Stothard, Paul; Amegbey, Godwin; Block, David; Hau, David. D.; Wagner, James; Miniaci, Jessica; Clements, Melisa; Gebremedhin, Mulu; Guo, Natalie; Zhang, Ying; Duggan, Gavin E.; MacInnis, Glen D.; Weljie, Alim M.; Dowlatabadi, Reza; Bamforth, Fiona; Clive, Derrick; Greiner, Russ; Li, Liang; Marrie, Tom; Sykes, Brian D.; Vogel, Hans J.; Querengesser, Lori

    2007-01-01

The Human Metabolome Database (HMDB) is currently the most complete and comprehensive curated collection of human metabolite and human metabolism data in the world. It contains records for more than 2180 endogenous metabolites with information gathered from thousands of books, journal articles and electronic databases. In addition to its comprehensive literature-derived data, the HMDB also contains an extensive collection of experimental metabolite concentration data compiled from hundreds of mass spectrometry (MS) and nuclear magnetic resonance (NMR) metabolomic analyses performed on urine, blood and cerebrospinal fluid samples. This is further supplemented with thousands of NMR and MS spectra collected on purified, reference metabolites. Each metabolite entry in the HMDB contains an average of 90 separate data fields including a comprehensive compound description, names and synonyms, structural information, physico-chemical data, reference NMR and MS spectra, biofluid concentrations, disease associations, pathway information, enzyme data, gene sequence data, SNP and mutation data as well as extensive links to images, references and other public databases. Extensive searching, relational querying and data browsing tools are also provided. The HMDB is designed to address the broad needs of biochemists, clinical chemists, physicians, medical geneticists, nutritionists and members of the metabolomics community. The HMDB is available at: www.hmdb.ca. PMID:17202168

  11. Public Archiving and Curation of Spacewatch Data

    NASA Astrophysics Data System (ADS)

    Larsen, Jeffrey A.; McMillan, Robert S.; Bressi, Terrence H.; Mastaler, Ronald A.; Scotti, James V.; Tubbiolo, Andrew F.

    2015-11-01

Image data from Spacewatch's astrometry of asteroids date back to 1985. At this meeting we introduce data from the most voluminous mode of operation of Spacewatch to the web for public access. The survey with the Spacewatch 0.9-meter telescope has good astrometric and photometric accuracy and revisits the same cohorts of main belt asteroids at 4-day intervals by migrating the telescope pointings appropriately. This pattern has made possible multi-night prediscovery detections ("precoveries") of Near Earth Objects (NEOs) when they were distant, slowly moving, and therefore originally unnoticed, and is a similarly unique asset for other research in the temporal domain. Limiting V magnitude is 20-21.5 and sky coverage is 1400 square degrees per lunation, three times per position. This survey has been in operation uniformly with the same equipment and procedure from 2003 to the present (2015), producing some 17 TB of imaging data. Processing includes documentation of instrumental parameters, bias subtraction, flat-fielding, defringing, positional registration, astrometric mapping, and indexing relevant image parameters to a searchable database. Tools for finding images that contain moving objects will be demonstrated at the meeting. Examples of applications of these data are prediscovery observations of NEOs and comets to improve knowledge of the objects' orbits. Asteroids whose orbits and albedos suggest that they might be dormant comets can also be checked for cometary features. Beyond the solar system, the cadence of the Spacewatch mosaic data will provide photometric sampling of variable stars and galaxies on time scales from tens of minutes to 12 years, a range rarely available from databases of this type. Support of Spacewatch was/is from a JPL subcontract (2010-2011), NASA/NEOO grants, the Lunar and Planetary Laboratory, Steward Observatory, Kitt Peak National Observatory, the Brinson Foundation of Chicago, IL, the estates of R. S. Vail and R. L. Waland, and
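Two of the processing steps listed above, bias subtraction and flat-fielding, amount to simple array arithmetic. A minimal sketch with invented frame values (defringing and astrometric mapping are omitted):

```python
import numpy as np

def calibrate(raw, bias, flat):
    """Minimal CCD calibration: subtract the bias frame, then divide
    by the flat field normalized to a median of 1.0, so calibrated
    pixel values stay on the raw counts scale."""
    flat_norm = flat / np.median(flat)
    return (raw - bias) / flat_norm

# Toy frames: uniform raw exposure, bias level and detector response.
raw  = np.full((4, 4), 1100.0)
bias = np.full((4, 4), 100.0)
flat = np.full((4, 4), 2.0)
print(calibrate(raw, bias, flat))  # -> every pixel is 1000.0
```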

  12. Data mining in forensic image databases

    NASA Astrophysics Data System (ADS)

    Geradts, Zeno J.; Bijhold, Jurrien

    2002-07-01

Forensic image databases appear in a wide variety. The oldest computer database is of fingerprints. Other examples of databases are shoeprints, handwriting, cartridge cases, toolmarks, drug tablets and faces. In these databases, searches are conducted on shape, color and other forensic features. A wide variety of methods exists for searching the images in these databases. The result is a list of candidates that must be compared manually. The challenge in forensic science is to combine the information acquired. Combining the shape of a partial shoe print with information on a cartridge case can result in stronger evidence. It is expected that by searching across combinations of these databases and other sources (e.g. network traffic information), more crimes will be solved. Searching in image databases is still difficult, as we can see in databases of faces. Due to lighting conditions and the alteration of faces by aging, it is nearly impossible for an image-searching method to rank the right face from a database of one million faces in top position without using other information. The methods for data mining images in databases (e.g. the MPEG-7 framework) are discussed, and expectations for future developments are presented in this study.
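The search-then-compare workflow described above (rank database images by feature similarity, return a candidate list for manual comparison) can be sketched with hypothetical color-histogram feature vectors; this is a generic nearest-neighbour illustration, not a real forensic system:

```python
import numpy as np

def candidates(query_hist, db_hists, k=3):
    """Rank database images by Euclidean distance between feature
    histograms and return the top-k candidate indices, which an
    examiner would then compare manually."""
    d = np.linalg.norm(db_hists - query_hist, axis=1)
    return list(np.argsort(d)[:k])

rng = np.random.default_rng(0)
db = rng.random((100, 16))   # 100 images, 16-bin color histograms (invented)
q = db[42] + 0.01            # a query very close to image 42
print(candidates(q, db))     # image 42 should rank first
```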

  13. Drinking Water Treatability Database (Database)

    EPA Science Inventory

    The drinking Water Treatability Database (TDB) will provide data taken from the literature on the control of contaminants in drinking water, and will be housed on an interactive, publicly-available USEPA web site. It can be used for identifying effective treatment processes, rec...

  14. KID - an algorithm for fast and efficient text mining used to automatically generate a database containing kinetic information of enzymes

    PubMed Central

    2010-01-01

Background The amount of available biological information is rapidly increasing and the focus of biological research has moved from single components to networks and even larger projects aiming at the analysis, modelling and simulation of biological networks as well as large-scale comparison of cellular properties. It is therefore essential that biological knowledge is easily accessible. However, most information is contained in the written literature in an unstructured way, so that methods for the systematic extraction of knowledge directly from the primary literature have to be deployed. Description Here we present a text mining algorithm for the extraction of kinetic information such as KM, Ki, kcat etc. as well as associated information such as enzyme names, EC numbers, ligands, organisms, localisations, pH and temperatures. Using this rule- and dictionary-based approach, it was possible to extract 514,394 kinetic parameters of 13 categories (KM, Ki, kcat, kcat/KM, Vmax, IC50, S0.5, Kd, Ka, t1/2, pI, nH, specific activity, Vmax/KM) from about 17 million PubMed abstracts and combine them with other data in the abstract. A manual verification of approx. 1,000 randomly chosen results yielded a recall between 51% and 84% and a precision ranging from 55% to 96%, depending on the category searched. The results were stored in a database and are available as "KID the KInetic Database" via the internet. Conclusions The presented algorithm delivers a considerable amount of information and may therefore help accelerate the research and the automated analysis required for today's systems biology approaches. The database obtained by analysing PubMed abstracts may be a valuable help in the field of chemical and biological kinetics. It is completely based upon text mining and therefore complements manually curated databases. The database is available at http://kid.tu-bs.de. The source code of the algorithm is provided under the GNU General Public Licence and available on
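A toy version of the rule- and dictionary-based extraction described above: match a kinetic-parameter keyword followed by a value and unit. The pattern and example sentence are purely illustrative, not KID's actual rules:

```python
import re

# Dictionary of parameter names and units, compiled into one rule.
# Real systems use far richer dictionaries and context rules.
PARAM = re.compile(
    r"(Km|Ki|kcat|Vmax|IC50)\s*(?:=|of|was)\s*([\d.]+)\s*(mM|uM|nM|s-1)",
    re.IGNORECASE)

abstract = ("Km was 0.25 mM for glucose, kcat was 12 s-1, "
            "and IC50 = 40 uM.")
print(PARAM.findall(abstract))
# -> [('Km', '0.25', 'mM'), ('kcat', '12', 's-1'), ('IC50', '40', 'uM')]
```

Each match would then be stored alongside the enzyme name, organism and conditions extracted from the same abstract.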

  15. Librarians Prefer Italian Food: An Alternative Approach to Introducing Database.

    ERIC Educational Resources Information Center

    Trickey, Keith V.

    1990-01-01

    Describes the development of a manual participative game that was designed to demonstrate the functions of a database. Use of the game in a variety of academic contexts is discussed, including technical college students and degree students; and use of the data with software to create a database is described. (LRW)

  16. Organic Contamination Baseline Study on NASA JSC Astromaterial Curation Gloveboxes

    NASA Technical Reports Server (NTRS)

    Calaway, Michael J.; Allton, J. H.; Allen, C. C.; Burkett, P. J.

    2013-01-01

    Future planned sample return missions to carbon-rich asteroids and Mars in the next two decades will require strict handling and curation protocols as well as new procedures for reducing organic contamination. After the Apollo program, astromaterial collections have mainly been concerned with inorganic contamination [1-4]. However, future isolation containment systems for astromaterials, possibly nitrogen enriched gloveboxes, must be able to reduce organic and inorganic cross-contamination. In 2012, a baseline study was orchestrated to establish the current state of organic cleanliness in gloveboxes used by NASA JSC astromaterials curation labs that could be used as a benchmark for future mission designs.

  17. Linking Geobiology Fieldwork and Data Curation Through Workflow Documentation

    NASA Astrophysics Data System (ADS)

    Thomer, A.; Baker, K. S.; Jett, J. G.; Gordon, S.; Palmer, C. L.

    2014-12-01

Describing the specific processes and artifacts that lead to the creation of data products provides a detailed picture of data provenance in the form of a high-level workflow. The resulting diagram identifies: 1) "points of intervention" at which curation processes can be moved upstream, and 2) data products that may be important for sharing and preservation. The Site-Based Data Curation project, an Institute of Museum and Library Services-funded project hosted by the Center for Informatics Research in Science and Scholarship at the University of Illinois, previously inferred a geobiologist's planning, field and laboratory workflows through close study of the data products produced during a single field trip to Yellowstone National Park (Wickett et al., 2013). We have since built on this work by documenting post hoc curation processes, and integrating them with the existing workflow. By holistically considering both data collection and curation, we are able to identify concrete steps that scientists can take to begin curating data in the field. This field-to-repository workflow represents a first step toward a more comprehensive and nuanced model of the research data lifecycle. Using our initial three-phase workflow, we identify key data products to prioritize for curation, and the points at which data curation best practices integrate with research processes with minimal interruption. We then document the processes that make key data products sharable and ready for preservation. We append the resulting curatorial phases to the field data collection workflow: Data Staging, Data Standardizing and Data Packaging. These refinements demonstrate: 1) the interdependence of research and curatorial phases; 2) the links between specific research products, research phases and curatorial processes; 3) the interdependence of laboratory-specific standards and community-wide best practices. We propose a poster that shows the six-phase workflow described above. We plan to discuss

  18. Discovering biomedical semantic relations in PubMed queries for information retrieval and database curation

    PubMed Central

    Huang, Chung-Chi; Lu, Zhiyong

    2016-01-01

    Identifying relevant papers from the literature is a common task in biocuration. Most current biomedical literature search systems primarily rely on matching user keywords. Semantic search, on the other hand, seeks to improve search accuracy by understanding the entities and contextual relations in user keywords. However, past research has mostly focused on semantically identifying biological entities (e.g. chemicals, diseases and genes) with little effort on discovering semantic relations. In this work, we aim to discover biomedical semantic relations in PubMed queries in an automated and unsupervised fashion. Specifically, we focus on extracting and understanding the contextual information (or context patterns) that is used by PubMed users to represent semantic relations between entities such as ‘CHEMICAL-1 compared to CHEMICAL-2.’ With the advances in automatic named entity recognition, we first tag entities in PubMed queries and then use tagged entities as knowledge to recognize pattern semantics. More specifically, we transform PubMed queries into context patterns involving participating entities, which are subsequently projected to latent topics via latent semantic analysis (LSA) to avoid the data sparseness and specificity issues. Finally, we mine semantically similar contextual patterns or semantic relations based on LSA topic distributions. Our two separate evaluation experiments of chemical-chemical (CC) and chemical–disease (CD) relations show that the proposed approach significantly outperforms a baseline method, which simply measures pattern semantics by similarity in participating entities. The highest performance achieved by our approach is nearly 0.9 and 0.85 respectively for the CC and CD task when compared against the ground truth in terms of normalized discounted cumulative gain (nDCG), a standard measure of ranking quality. 
These results suggest that our approach can effectively identify and return related semantic patterns in a ranked order covering diverse bio-entity relations. To assess the potential utility of our automated top-ranked patterns of a given relation in semantic search, we performed a pilot study on frequently sought semantic relations in PubMed and observed improved literature retrieval effectiveness based on post-hoc human relevance evaluation. Further investigation in larger tests and in real-world scenarios is warranted. PMID:27016698
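The pattern-to-topic projection described above can be sketched with a tiny count matrix and a truncated SVD, which is the essence of latent semantic analysis; the patterns and topic count here are invented for illustration:

```python
import numpy as np

# Context patterns with entity mentions replaced by type placeholders.
patterns = ["CHEMICAL compared to CHEMICAL",
            "CHEMICAL versus CHEMICAL",
            "CHEMICAL induced DISEASE"]

# Build a pattern-by-word count matrix.
vocab = sorted({w for p in patterns for w in p.split()})
X = np.array([[p.split().count(w) for w in vocab] for p in patterns],
             dtype=float)

# Truncated SVD: keep 2 latent topics; rows of `topics` represent patterns.
U, S, Vt = np.linalg.svd(X, full_matrices=False)
topics = U[:, :2] * S[:2]

def cos(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# The two chemical-comparison patterns should be closer to each other
# in topic space than either is to the chemical-disease pattern.
print(cos(topics[0], topics[1]), cos(topics[0], topics[2]))
```

The paper's approach works at PubMed scale with named-entity tagging feeding the pattern construction; this sketch only shows the LSA step.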

  19. A multilocus sequence typing method and curated database for Mycoplasma bovis

    Technology Transfer Automated Retrieval System (TEKTRAN)

    Mycoplasma bovis is a primary agent of mastitis, pneumonia and arthritis in cattle and is the bacterium isolated most frequently from the polymicrobial syndrome known as bovine respiratory disease complex (BRDC). Recently, M. bovis has emerged as a significant problem in bison, causing necrotic pha...
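Although the abstract is truncated, the core of any MLST scheme is a curated lookup from the allele numbers observed at a fixed set of housekeeping loci to a sequence type (ST). A minimal sketch of that lookup, using hypothetical locus names and profiles rather than the actual M. bovis scheme:

```python
# A minimal sketch of the MLST idea: a curated table maps the allelic profile
# at a fixed set of housekeeping loci to a sequence type (ST).
# Locus names and profiles are illustrative, not the M. bovis scheme.
LOCI = ("locus1", "locus2", "locus3")  # hypothetical loci, in scheme order

st_table = {  # allelic profile -> sequence type (hypothetical)
    (1, 1, 2): "ST-1",
    (1, 3, 2): "ST-2",
}

def sequence_type(alleles):
    """Look up the ST for an isolate's allelic profile; None if the profile is novel."""
    profile = tuple(alleles[locus] for locus in LOCI)
    return st_table.get(profile)

st = sequence_type({"locus1": 1, "locus2": 3, "locus3": 2})
```

A novel profile (one not yet in the curated table) simply returns `None`; in a real scheme it would be submitted to the database curators for assignment of a new ST.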

  20. A curated multi-locus sequence typing (MLST) database for Haemophilus parasuis

    Technology Transfer Automated Retrieval System (TEKTRAN)

    Background. Haemophilus parasuis is the etiologic agent of Glasser's disease and pneumonia in swine. Serotyping has traditionally been used for classification of strains but results are subjective and not highly reproducible and the required reagents are expensive to produce, not widely available, a...

  1. Discovering biomedical semantic relations in PubMed queries for information retrieval and database curation.

    PubMed

    Huang, Chung-Chi; Lu, Zhiyong

    2016-01-01

    Identifying relevant papers from the literature is a common task in biocuration. Most current biomedical literature search systems primarily rely on matching user keywords. Semantic search, on the other hand, seeks to improve search accuracy by understanding the entities and contextual relations in user keywords. However, past research has mostly focused on semantically identifying biological entities (e.g. chemicals, diseases and genes) with little effort on discovering semantic relations. In this work, we aim to discover biomedical semantic relations in PubMed queries in an automated and unsupervised fashion. Specifically, we focus on extracting and understanding the contextual information (or context patterns) that is used by PubMed users to represent semantic relations between entities, such as 'CHEMICAL-1 compared to CHEMICAL-2'. With the advances in automatic named entity recognition, we first tag entities in PubMed queries and then use tagged entities as knowledge to recognize pattern semantics. More specifically, we transform PubMed queries into context patterns involving participating entities, which are subsequently projected to latent topics via latent semantic analysis (LSA) to avoid the data sparseness and specificity issues. Finally, we mine semantically similar contextual patterns or semantic relations based on LSA topic distributions. Our two separate evaluation experiments on chemical-chemical (CC) and chemical-disease (CD) relations show that the proposed approach significantly outperforms a baseline method, which simply measures pattern semantics by similarity in participating entities. The highest performance achieved by our approach is nearly 0.9 and 0.85 for the CC and CD tasks, respectively, when compared against the ground truth in terms of normalized discounted cumulative gain (nDCG), a standard measure of ranking quality. These results suggest that our approach can effectively identify and return related semantic patterns in a ranked order covering diverse bio-entity relations. To assess the potential utility of our automated top-ranked patterns of a given relation in semantic search, we performed a pilot study on frequently sought semantic relations in PubMed and observed improved literature retrieval effectiveness based on post-hoc human relevance evaluation. Further investigation in larger tests and in real-world scenarios is warranted. PMID:27016698
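The pipeline described in the abstract (reduce tagged queries to context patterns, project the patterns into latent topics via a truncated SVD, then rank patterns by topic-space similarity) can be sketched in a few lines. The patterns, the token-count weighting, and the two-topic projection below are toy choices for illustration, not the paper's data or parameters:

```python
# A toy sketch of LSA-based context-pattern similarity: patterns that express
# the same relation should land close together in latent-topic space.
import numpy as np

patterns = [  # illustrative context patterns with entity placeholders
    "CHEMICAL-1 compared to CHEMICAL-2",
    "CHEMICAL-1 versus CHEMICAL-2",
    "CHEMICAL-1 compared with CHEMICAL-2",
    "CHEMICAL induced DISEASE",
    "DISEASE caused by CHEMICAL",
]

# Build a pattern-term count matrix over whitespace tokens.
vocab = sorted({tok for p in patterns for tok in p.lower().split()})
index = {t: i for i, t in enumerate(vocab)}
X = np.zeros((len(patterns), len(vocab)))
for r, p in enumerate(patterns):
    for tok in p.lower().split():
        X[r, index[tok]] += 1.0

# Project patterns onto k latent topics via truncated SVD (the LSA step).
k = 2
U, s, Vt = np.linalg.svd(X, full_matrices=False)
topics = U[:, :k] * s[:k]  # pattern-topic coordinates

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Two phrasings of the comparison relation vs. a causation pattern.
sim_same = cosine(topics[0], topics[1])  # "compared to" vs "versus": high
sim_diff = cosine(topics[0], topics[3])  # comparison vs causation: low
```

Ranking all patterns by cosine similarity to a seed pattern in this topic space gives the kind of ranked list the paper evaluates with nDCG.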

  2. IUPHAR-DB: updated database content and new features

    PubMed Central

    Sharman, Joanna L.; Benson, Helen E.; Pawson, Adam J.; Lukito, Veny; Mpamhanga, Chidochangu P.; Bombail, Vincent; Davenport, Anthony P.; Peters, John A.; Spedding, Michael; Harmar, Anthony J.; NC-IUPHAR

    2013-01-01

    The International Union of Basic and Clinical Pharmacology (IUPHAR) database, IUPHAR-DB (http://www.iuphar-db.org) is an open access, online database providing detailed, expert-driven annotation of the primary literature on human and rodent receptors and other drug targets, together with the substances that act on them. The present release includes information on the products of 646 genes from four major protein classes (G protein-coupled receptors, nuclear hormone receptors, voltage- and ligand-gated ion channels) and ∼3180 bioactive molecules (endogenous ligands, licensed drugs and key pharmacological tools) that interact with them. We have described previously the classification and curation of data for small molecule ligands in the database; in this update we have annotated 366 endogenous peptide ligands with their amino acid sequences, post-translational modifications, links to precursor genes, species differences and relationships with other molecules in the database (e.g. those derived from the same precursor). We have also matched targets with their endogenous ligands (peptides and small molecules), with particular attention paid to identifying bioactive peptide ligands generated by post-translational modification of precursor proteins. Other improvements to the database include enhanced information on the clinical relevance of targets and ligands in the database, more extensive links to other databases and a pilot project for the curation of enzymes as drug targets. PMID:23087376

  3. IUPHAR-DB: updated database content and new features.

    PubMed

    Sharman, Joanna L; Benson, Helen E; Pawson, Adam J; Lukito, Veny; Mpamhanga, Chidochangu P; Bombail, Vincent; Davenport, Anthony P; Peters, John A; Spedding, Michael; Harmar, Anthony J

    2013-01-01

    The International Union of Basic and Clinical Pharmacology (IUPHAR) database, IUPHAR-DB (http://www.iuphar-db.org) is an open access, online database providing detailed, expert-driven annotation of the primary literature on human and rodent receptors and other drug targets, together with the substances that act on them. The present release includes information on the products of 646 genes from four major protein classes (G protein-coupled receptors, nuclear hormone receptors, voltage- and ligand-gated ion channels) and ∼3180 bioactive molecules (endogenous ligands, licensed drugs and key pharmacological tools) that interact with them. We have described previously the classification and curation of data for small molecule ligands in the database; in this update we have annotated 366 endogenous peptide ligands with their amino acid sequences, post-translational modifications, links to precursor genes, species differences and relationships with other molecules in the database (e.g. those derived from the same precursor). We have also matched targets with their endogenous ligands (peptides and small molecules), with particular attention paid to identifying bioactive peptide ligands generated by post-translational modification of precursor proteins. Other improvements to the database include enhanced information on the clinical relevance of targets and ligands in the database, more extensive links to other databases and a pilot project for the curation of enzymes as drug targets. PMID:23087376

  4. Generating Gene Ontology-Disease Inferences to Explore Mechanisms of Human Disease at the Comparative Toxicogenomics Database.

    PubMed

    Davis, Allan Peter; Wiegers, Thomas C; King, Benjamin L; Wiegers, Jolene; Grondin, Cynthia J; Sciaky, Daniela; Johnson, Robin J; Mattingly, Carolyn J

    2016-01-01

    Strategies for discovering common molecular events among disparate diseases hold promise for improving understanding of disease etiology and expanding treatment options. One technique is to leverage curated datasets found in the public domain. The Comparative Toxicogenomics Database (CTD; http://ctdbase.org/) manually curates chemical-gene, chemical-disease, and gene-disease interactions from the scientific literature. The use of official gene symbols in CTD interactions enables this information to be combined with the Gene Ontology (GO) file from NCBI Gene. By integrating these GO-gene annotations with CTD's gene-disease dataset, we produce 753,000 inferences between 15,700 GO terms and 4,200 diseases, providing opportunities to explore presumptive molecular underpinnings of diseases and identify biological similarities. Through a variety of applications, we demonstrate the utility of this novel resource. As a proof-of-concept, we first analyze known repositioned drugs (e.g., raloxifene and sildenafil) and see that their target diseases have a greater degree of similarity when comparing GO terms vs. genes. Next, a computational analysis predicts seemingly non-intuitive diseases (e.g., stomach ulcers and atherosclerosis) as being similar to bipolar disorder, and these are validated in the literature as reported co-diseases. Additionally, we leverage other CTD content to develop testable hypotheses about thalidomide-gene networks to treat seemingly disparate diseases. Finally, we illustrate how CTD tools can rank a series of drugs as potential candidates for repositioning against B-cell chronic lymphocytic leukemia and predict cisplatin and the small molecule inhibitor JQ1 as lead compounds. The CTD dataset is freely available for users to navigate pathologies within the context of extensive biological processes, molecular functions, and cellular components conferred by GO. This inference set should aid researchers, bioinformaticists, and pharmaceutical drug makers
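The inference step the abstract describes is essentially a join: GO-gene annotations are combined with CTD's gene-disease interactions through shared official gene symbols, and the linking genes serve as evidence for each inferred GO-disease pair. A minimal sketch of that join, using invented GO terms, gene symbols, and diseases rather than CTD data:

```python
# A toy sketch of the GO-disease inference join. All GO terms, gene symbols,
# and diseases below are illustrative placeholders, not CTD content.
from collections import defaultdict

go_gene = {  # GO term -> annotated genes (hypothetical)
    "GO:0006954 inflammatory response": {"IL6", "TNF"},
    "GO:0008285 negative regulation of cell proliferation": {"TP53"},
}
gene_disease = {  # gene -> curated disease associations (hypothetical)
    "IL6": {"rheumatoid arthritis"},
    "TNF": {"rheumatoid arthritis", "psoriasis"},
    "TP53": {"neoplasms"},
}

# Infer (GO term, disease) whenever at least one gene links the two,
# keeping the linking genes as the evidence for the inference.
inferences = defaultdict(set)
for go_term, genes in go_gene.items():
    for gene in genes:
        for disease in gene_disease.get(gene, ()):
            inferences[(go_term, disease)].add(gene)
```

Applied to CTD's full datasets, the same join over official gene symbols yields the large GO-disease inference network the abstract reports.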

  5. Aircraft operations management manual

    NASA Technical Reports Server (NTRS)

    1992-01-01

    The NASA aircraft operations program is a multifaceted, highly diverse entity that directly supports the agency mission in aeronautical research and development, space science and applications, space flight, astronaut readiness training, and related activities through research and development, program support, and mission management aircraft operations flights. Users of the program include interagency, intergovernmental, and international partners, as well as the business community. This manual provides guidelines to establish policy for the management of NASA aircraft resources, aircraft operations, and related matters. This policy is an integral part of and must be followed when establishing field installation policy and procedures covering the management of NASA aircraft operations. Each operating location will develop appropriate local procedures that conform with the requirements of this handbook. This manual should be used in conjunction with other governing instructions, handbooks, and manuals.

  6. Nuclear material operations manual

    SciTech Connect

    Tyler, R.P.

    1981-02-01

    This manual provides a concise and comprehensive documentation of the operating procedures currently practiced at Sandia National Laboratories with regard to the management, control, and accountability of nuclear materials. The manual is divided into chapters which are devoted to the separate functions performed in nuclear material operations: management, control, accountability, and safeguards. The final two chapters comprise a document which is also issued separately to provide a summary of the information and operating procedures relevant to custodians and users of radioactive and nuclear materials. The manual also contains samples of the forms utilized in carrying out nuclear material activities. To enhance the clarity of presentation, operating procedures are presented in the form of playscripts in which the responsible organizations and necessary actions are clearly delineated in a chronological fashion from the initiation of a transaction to its completion.

  7. Ethics manual: fifth edition.

    PubMed

    Snyder, Lois; Leffler, Cathy

    2005-04-01

    Medicine, law, and social values are not static. Reexamining the ethical tenets of medical practice and their application in new circumstances is a necessary exercise. The fifth edition of the College's Ethics Manual covers emerging issues in medical ethics and revisits old ones. It reflects on many of the ethical tensions faced by internists and their patients and attempts to shed light on how existing principles extend to emerging concerns. In addition, by reiterating ethical principles that have provided guidance in resolving past ethical problems, the Manual may help physicians avert future problems. The Manual is not a substitute for the experience and integrity of individual physicians, but it may serve as a reminder of the shared obligations and duties of the medical profession. PMID:15809467

  8. Salinas : theory manual.

    SciTech Connect

    Walsh, Timothy Francis; Reese, Garth M.; Bhardwaj, Manoj Kumar

    2011-11-01

    Salinas provides a massively parallel implementation of structural dynamics finite element analysis, required for high fidelity, validated models used in modal, vibration, static and shock analysis of structural systems. This manual describes the theory behind many of the constructs in Salinas. For a more detailed description of how to use Salinas, we refer the reader to Salinas, User's Notes. Many of the constructs in Salinas are pulled directly from published material. Where possible, these materials are referenced herein. However, certain functions in Salinas are specific to our implementation. We try to be far more complete in those areas. The theory manual was developed from several sources including general notes, a programmer notes manual, the user's notes and of course the material in the open literature.

  9. Developing a policy manual.

    PubMed

    Hotta, Tracey A

    2013-01-01

    Do you really need to have a policy and procedure in the office? Frequently they are seen sitting on the shelf, collecting dust. The answer is yes for a number of very important reasons. A policy and procedure manual is a tool to set guidelines and expectations on the basis of the mission and vision of the office. A well-written manual is a powerful training tool for new staff so they can get a feel for the office culture. Furthermore, it is a provincial or state legislative requirement that can reduce management's concern about potential legal issues or problems. If an office does not have a manual to set guidelines, the employees may be forced to make their own decisions to solve problems, which can often result in confusion, inconsistencies, and mistakes. PMID:23446507

  10. CARFMAP: A Curated Pathway Map of Cardiac Fibroblasts

    PubMed Central

    Nim, Hieu T.; Furtado, Milena B.; Costa, Mauro W.; Kitano, Hiroaki; Rosenthal, Nadia A.; Boyd, Sarah E.

    2015-01-01

    The adult mammalian heart contains multiple cell types that work in unison under tightly regulated conditions to maintain homeostasis. Cardiac fibroblasts are a significant and unique population of non-muscle cells in the heart that have recently gained substantial interest in the cardiac biology community. To better understand this renaissance cell, it is essential to systematically survey what has been known in the literature about the cellular and molecular processes involved. We have built CARFMAP (http://visionet.erc.monash.edu.au/CARFMAP), an interactive cardiac fibroblast pathway map derived from the biomedical literature using a software-assisted manual data collection approach. CARFMAP is an information-rich interactive tool that enables cardiac biologists to explore the large body of literature in various creative ways. There is surprisingly little overlap between the cardiac fibroblast pathway map, a foreskin fibroblast pathway map, and a whole mouse organism signalling pathway map from the REACTOME database. Among the use cases of CARFMAP is a common task in our cardiac biology laboratory of identifying new genes that are (1) relevant to cardiac literature, and (2) differentially regulated in high-throughput assays. From the expression profiles of mouse cardiac and tail fibroblasts, we employed CARFMAP to characterise cardiac fibroblast pathways. Using CARFMAP in conjunction with transcriptomic data, we generated a stringent list of six genes that would not have been singled out using bioinformatics analyses alone. Experimental validation showed that five genes (Mmp3, Il6, Edn1, Pdgfc and Fgf10) are differentially regulated in the cardiac fibroblast. CARFMAP is a powerful tool for systems analyses of cardiac fibroblasts, facilitating systems-level cardiovascular research. PMID:26673252
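The cross-referencing step described here, intersecting the curated pathway map's gene set with differentially regulated genes from a transcriptomic assay, reduces to a filtered set intersection. The candidate gene symbols below come from the abstract, but the fold-change values and the cutoff are invented for illustration:

```python
# A toy sketch of using a curated pathway map with transcriptomic data:
# keep genes that are both in the map and strongly regulated in the assay.
# Fold-change values and the threshold are illustrative, not CARFMAP data.
carfmap_genes = {"Mmp3", "Il6", "Edn1", "Pdgfc", "Fgf10", "Actb"}

fold_change = {  # gene -> log2 fold change, cardiac vs tail fibroblasts (invented)
    "Mmp3": 2.1, "Il6": 1.7, "Edn1": -1.9, "Pdgfc": 1.2, "Fgf10": -2.4,
    "Actb": 0.1, "Gapdh": -0.2,
}

LOG2FC_CUTOFF = 1.0  # arbitrary threshold for "differentially regulated"

candidates = sorted(
    g for g in carfmap_genes
    if abs(fold_change.get(g, 0.0)) >= LOG2FC_CUTOFF
)
```

The point the abstract makes is that the curated map supplies the biological filter (membership in `carfmap_genes`) that expression statistics alone cannot, shortening the list of genes worth experimental follow-up.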

  11. Data Albums: An Event Driven Search, Aggregation and Curation Tool for Earth Science

    NASA Technical Reports Server (NTRS)

    Ramachandran, Rahul; Kulkarni, Ajinkya; Maskey, Manil; Bakare, Rohan; Basyal, Sabin; Li, Xiang; Flynn, Shannon

    2014-01-01

    One of the largest continuing challenges in any Earth science investigation is the discovery and access of useful science content from the increasingly large volumes of Earth science data and related information available. Approaches used in Earth science research, such as case study analysis and climatology studies, involve discovering and gathering diverse data sets and information to support the research goals. Research based on case studies involves a detailed description of specific weather events, using data from different sources to characterize the physical processes in play. Climatology-based research tends to focus on the representativeness of a given event, by studying the characteristics and distribution of a large number of events. This allows researchers to generalize characteristics such as spatio-temporal distribution, intensity, annual cycle, duration, etc. Gathering relevant data and information for case studies and climatology analysis is both tedious and time consuming. Current Earth science data systems are designed with the assumption that researchers access data primarily by instrument or geophysical parameter. Those who know exactly the datasets of interest can obtain the specific files they need using these systems. However, in cases where researchers are interested in studying a significant event, they have to manually assemble a variety of datasets relevant to it by searching the different distributed data systems. In these cases, a search process needs to be organized around the event rather than the observing instruments. In addition, the existing data systems assume users have sufficient knowledge regarding the domain vocabulary to be able to effectively utilize their catalogs. These systems do not support new or interdisciplinary researchers who may be unfamiliar with the domain terminology. This paper presents a specialized search, aggregation and curation tool for Earth science to address these existing

  12. The immune epitope database (IEDB) 3.0.

    PubMed

    Vita, Randi; Overton, James A; Greenbaum, Jason A; Ponomarenko, Julia; Clark, Jason D; Cantrell, Jason R; Wheeler, Daniel K; Gabbard, Joseph L; Hix, Deborah; Sette, Alessandro; Peters, Bjoern

    2015-01-01

    The IEDB, www.iedb.org, contains information on immune epitopes--the molecular targets of adaptive immune responses--curated from the published literature and submitted by National Institutes of Health funded epitope discovery efforts. From 2004 to 2012 the IEDB curation of journal articles published since 1960 has caught up to the present day, with >95% of relevant published literature manually curated amounting to more than 15,000 journal articles and more than 704,000 experiments to date. The revised curation target since 2012 has been to make recent research findings quickly available in the IEDB and thereby ensure that it continues to be an up-to-date resource. Having gathered a comprehensive dataset in the IEDB, a complete redesign of the query and reporting interface has been performed in the IEDB 3.0 release to improve how end users can access this information in an intuitive and biologically accurate manner. We here present this most recent release of the IEDB and describe the user testing procedures as well as the use of external ontologies that have enabled it. PMID:25300482

  13. The immune epitope database (IEDB) 3.0

    PubMed Central

    Vita, Randi; Overton, James A.; Greenbaum, Jason A.; Ponomarenko, Julia; Clark, Jason D.; Cantrell, Jason R.; Wheeler, Daniel K.; Gabbard, Joseph L.; Hix, Deborah; Sette, Alessandro; Peters, Bjoern

    2015-01-01

    The IEDB, www.iedb.org, contains information on immune epitopes—the molecular targets of adaptive immune responses—curated from the published literature and submitted by National Institutes of Health funded epitope discovery efforts. From 2004 to 2012 the IEDB curation of journal articles published since 1960 has caught up to the present day, with >95% of relevant published literature manually curated amounting to more than 15 000 journal articles and more than 704 000 experiments to date. The revised curation target since 2012 has been to make recent research findings quickly available in the IEDB and thereby ensure that it continues to be an up-to-date resource. Having gathered a comprehensive dataset in the IEDB, a complete redesign of the query and reporting interface has been performed in the IEDB 3.0 release to improve how end users can access this information in an intuitive and biologically accurate manner. We here present this most recent release of the IEDB and describe the user testing procedures as well as the use of external ontologies that have enabled it. PMID:25300482

  14. Quality Assurance Manual

    SciTech Connect

    McGarrah, J.E.

    1995-05-01

    In order to provide clients with quality products and services, Pacific Northwest Laboratory (PNL) has established and implemented a formal quality assurance program. These management controls are documented in this manual (PNL-MA-70) and its accompanying standards and procedures. The QA Program meets the basic requirements and supplements of ANSI/ASME NQA-1, Quality Assurance Program Requirements for Nuclear Facilities, as interpreted for PNL activities. Additionally, the quality requirements are augmented to include the Total Quality approach defined in Department of Energy Order 5700.6C, Quality Assurance. This manual provides requirements and an overview of the administrative procedures that apply to projects and activities.

  15. Fastener Design Manual

    NASA Technical Reports Server (NTRS)

    Barrett, Richard T.

    1990-01-01

    This manual was written for design engineers to enable them to choose appropriate fasteners for their designs. Subject matter includes fastener material selection, platings, lubricants, corrosion, locking methods, washers, inserts, thread types and classes, fatigue loading, and fastener torque. A section on design criteria covers the derivation of torque formulas, loads on a fastener group, combining simultaneous shear and tension loads, pullout load for tapped holes, grip length, head styles, and fastener strengths. The second half of this manual presents general guidelines and selection criteria for rivets and lockbolts.

  16. Manuals of Cultural Systems

    NASA Astrophysics Data System (ADS)

    Ballonoff, Paul

    2014-10-01

    Ethnography often studies social networks including empirical descriptions of marriages and families. We initially concentrate on a special subset of networks which we call configurations. We show that descriptions of the possible outcomes of viable histories form a manual, and an orthoalgebra. We then study cases where family sizes vary, and show that this also forms a manual. In fact, it demonstrates adiabatic invariance, a property often associated with physical system conservation laws, and which here expresses conservation of the viability of a cultural system.

  17. Equipment Management Manual

    NASA Technical Reports Server (NTRS)

    1992-01-01

    The NASA Equipment Management Manual (NHB 4200.1) is issued pursuant to Section 203(c)(1) of the National Aeronautics and Space Act of 1958, as amended (42 USC 2473), and sets forth policy, uniform performance standards, and procedural guidance to NASA personnel for the acquisition, management, and use of NASA-owned equipment. This revision is effective upon receipt. This is a controlled manual, issued in loose-leaf form, and revised through page changes. Additional copies for internal use may be obtained through normal distribution.

  18. New Roles for New Times: Digital Curation for Preservation

    ERIC Educational Resources Information Center

    Walters, Tyler; Skinner, Katherine

    2011-01-01

    Digital curation refers to the actions people take to maintain and add value to digital information over its lifecycle, including the processes used when creating digital content. Digital preservation focuses on the "series of managed activities necessary to ensure continued access to digital materials for as long as necessary." In this report,…

  19. Geospatial Data Curation at the University of Idaho

    ERIC Educational Resources Information Center

    Kenyon, Jeremy; Godfrey, Bruce; Eckwright, Gail Z.

    2012-01-01

    The management and curation of digital geospatial data has become a central concern for many academic libraries. Geospatial data is a complex type of data critical to many different disciplines, and its use has become more expansive in the past decade. The University of Idaho Library maintains a geospatial data repository called the Interactive…

  20. Meteorites - The Significance of Collection and Curation and Future Developments

    NASA Astrophysics Data System (ADS)

    Smith, Caroline

    2015-03-01

    Meteorites are some of the most important and valuable rocks available for scientific study. Approximately 43,000 meteorites are known on Earth and are 'geological' samples of extraterrestrial bodies: meteorites are known to originate from asteroids, the Moon, Mars and possibly comets. With expanding exploration of our Solar System, meteorites provide the 'ground truth' to compare data collected by robotic missions with results gained from a variety of more accurate and precise techniques using laboratories on Earth. This talk will give an introduction to the history of meteorite science and the importance of meteorite collections to the field of meteoritics, planetary and solar system science. Curation of extraterrestrial samples is a particularly pertinent issue, especially with regard to particularly rare samples such as those from Mars, like the recent Tissint meteorite. Future sample return missions to asteroids and Mars also pose significant challenges around the curation of these precious materials. Issues surrounding the curation of samples, and how curation and curatorial actions can influence scientific studies, will also be discussed.