Science.gov

Sample records for manually curated database

  1. TRIP database 2.0: a manually curated information hub for accessing TRP channel interaction network.

    PubMed

    Shin, Young-Cheul; Shin, Soo-Yong; Chun, Jung Nyeo; Cho, Hyeon Sung; Lim, Jin Muk; Kim, Hong-Gee; So, Insuk; Kwon, Dongseop; Jeon, Ju-Hong

    2012-01-01

    Transient receptor potential (TRP) channels are a family of Ca(2+)-permeable cation channels that play a crucial role in biological and disease processes. To advance TRP channel research, we previously created the TRIP (TRansient receptor potential channel-Interacting Protein) Database, a manually curated database that compiles scattered information on TRP channel protein-protein interactions (PPIs). However, the database needs to be improved for information accessibility and data utilization. Here, we present the TRIP Database 2.0 (http://www.trpchannel.org) in which many helpful, user-friendly web interfaces have been developed to facilitate knowledge acquisition and inspire new approaches to studying TRP channel functions: 1) the PPI information found in the supplementary data of referenced articles was curated; 2) the PPI summary matrix enables users to intuitively grasp overall PPI information; 3) the search capability has been expanded to retrieve information from 'PubMed' and 'PIE the search' (a specialized search engine for PPI-related articles); and 4) the PPI data are available as SIF files for network visualization and analysis using 'Cytoscape'. Therefore, our TRIP Database 2.0 is an information hub that works toward advancing data-driven TRP channel research.
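The SIF export mentioned above is Cytoscape's simple interaction format: a plain-text edge list with one `source relation target...` record per line. A minimal sketch of reading such a file into edge tuples, using hypothetical TRP interactions as placeholder data (not actual TRIP records):

```python
# Sketch: parsing a Cytoscape SIF (simple interaction format) file such as
# those exported by TRIP Database 2.0. The interactions below are
# hypothetical placeholders, not actual TRIP data.
sif_text = """\
TRPC1 pp TRPC4
TRPC1 pp STIM1
TRPV4 pp PACSIN3
"""

edges = []
for line in sif_text.splitlines():
    parts = line.split()
    if len(parts) >= 3:
        source, relation = parts[0], parts[1]
        # SIF allows several targets per record: source relation t1 t2 ...
        for target in parts[2:]:
            edges.append((source, relation, target))

print(edges)
```

The resulting tuples can be loaded directly into a graph library or into Cytoscape itself for network analysis.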

  2. NRLiSt BDB, the manually curated nuclear receptors ligands and structures benchmarking database.

    PubMed

    Lagarde, Nathalie; Ben Nasr, Nesrine; Jérémie, Aurore; Guillemain, Hélène; Laville, Vincent; Labib, Taoufik; Zagury, Jean-François; Montes, Matthieu

    2014-04-10

    Nuclear receptors (NRs) constitute an important class of drug targets. We created the most exhaustive NR-focused benchmarking database to date, the NRLiSt BDB (NRs ligands and structures benchmarking database). The 9905 compounds and 339 structures of the NRLiSt BDB are ready for structure-based and ligand-based virtual screening. In the present study, we detail the protocol used to generate the NRLiSt BDB and its features. We also give some examples of the errors that we found in ChEMBL that convinced us to manually review all original papers. Since extensive and manually curated experimental data about NR ligands and structures are provided in the NRLiSt BDB, it should become a powerful tool to assess the performance of virtual screening methods on NRs, to assist the understanding of NR function and modulation, and to support the discovery of new drugs targeting NRs. NRLiSt BDB is freely available online at http://nrlist.drugdesign.fr.

  3. miRSponge: a manually curated database for experimentally supported miRNA sponges and ceRNAs

    PubMed Central

    Wang, Peng; Zhi, Hui; Zhang, Yunpeng; Liu, Yue; Zhang, Jizhou; Gao, Yue; Guo, Maoni; Ning, Shangwei; Li, Xia

    2015-01-01

    In this study, we describe miRSponge, a manually curated database, which aims at providing an experimentally supported resource for microRNA (miRNA) sponges. Recent evidence suggests that miRNAs are themselves regulated by competing endogenous RNAs (ceRNAs) or ‘miRNA sponges’ that contain miRNA binding sites. These competitive molecules can sequester miRNAs to prevent them from interacting with their natural targets, thereby playing critical roles in various biological and pathological processes. It has become increasingly important to develop a high-quality database to record and store ceRNA data to support future studies. To this end, we have established the experimentally supported miRSponge database that contains data on 599 miRNA-sponge interactions and 463 ceRNA relationships from 11 species following manual curation of nearly 1200 published articles. Database classes include endogenously generated molecules including coding genes, pseudogenes, long non-coding RNAs and circular RNAs, along with exogenously introduced molecules including viral RNAs and artificially engineered sponges. Approximately 70% of the interactions were identified experimentally in disease states. miRSponge provides a user-friendly interface for convenient browsing, retrieval and downloading of datasets. A submission page is also included to allow researchers to submit newly validated miRNA sponge data. Database URL: http://www.bio-bigdata.net/miRSponge. PMID:26424084

  4. miRSponge: a manually curated database for experimentally supported miRNA sponges and ceRNAs.

    PubMed

    Wang, Peng; Zhi, Hui; Zhang, Yunpeng; Liu, Yue; Zhang, Jizhou; Gao, Yue; Guo, Maoni; Ning, Shangwei; Li, Xia

    2015-01-01

    In this study, we describe miRSponge, a manually curated database, which aims at providing an experimentally supported resource for microRNA (miRNA) sponges. Recent evidence suggests that miRNAs are themselves regulated by competing endogenous RNAs (ceRNAs) or 'miRNA sponges' that contain miRNA binding sites. These competitive molecules can sequester miRNAs to prevent them from interacting with their natural targets, thereby playing critical roles in various biological and pathological processes. It has become increasingly important to develop a high-quality database to record and store ceRNA data to support future studies. To this end, we have established the experimentally supported miRSponge database that contains data on 599 miRNA-sponge interactions and 463 ceRNA relationships from 11 species following manual curation of nearly 1200 published articles. Database classes include endogenously generated molecules including coding genes, pseudogenes, long non-coding RNAs and circular RNAs, along with exogenously introduced molecules including viral RNAs and artificially engineered sponges. Approximately 70% of the interactions were identified experimentally in disease states. miRSponge provides a user-friendly interface for convenient browsing, retrieval and downloading of datasets. A submission page is also included to allow researchers to submit newly validated miRNA sponge data. Database URL: http://www.bio-bigdata.net/miRSponge.

  6. Lnc2Cancer: a manually curated database of experimentally supported lncRNAs associated with various human cancers

    PubMed Central

    Ning, Shangwei; Zhang, Jizhou; Wang, Peng; Zhi, Hui; Wang, Jianjian; Liu, Yue; Gao, Yue; Guo, Maoni; Yue, Ming; Wang, Lihua; Li, Xia

    2016-01-01

    Lnc2Cancer (http://www.bio-bigdata.net/lnc2cancer) is a manually curated database of cancer-associated long non-coding RNAs (lncRNAs) with experimental support that aims to provide a high-quality and integrated resource for exploring lncRNA deregulation in various human cancers. LncRNAs represent a large category of functional RNA molecules that play a significant role in human cancers. A curated collection and summary of deregulated lncRNAs in cancer is essential to thoroughly understand the mechanisms and functions of lncRNAs. Here, we developed the Lnc2Cancer database, which contains 1057 manually curated associations between 531 lncRNAs and 86 human cancers. Each association includes the lncRNA and cancer names, the lncRNA expression pattern, experimental techniques, a brief functional description, the original reference and additional annotation information. Lnc2Cancer provides a user-friendly interface to conveniently browse, retrieve and download data. Lnc2Cancer also offers a submission page for researchers to submit newly validated lncRNA-cancer associations. With the rapidly increasing interest in lncRNAs, Lnc2Cancer will significantly improve our understanding of lncRNA deregulation in cancer and has the potential to be a timely and valuable resource. PMID:26481356

  7. Lnc2Cancer: a manually curated database of experimentally supported lncRNAs associated with various human cancers.

    PubMed

    Ning, Shangwei; Zhang, Jizhou; Wang, Peng; Zhi, Hui; Wang, Jianjian; Liu, Yue; Gao, Yue; Guo, Maoni; Yue, Ming; Wang, Lihua; Li, Xia

    2016-01-01

    Lnc2Cancer (http://www.bio-bigdata.net/lnc2cancer) is a manually curated database of cancer-associated long non-coding RNAs (lncRNAs) with experimental support that aims to provide a high-quality and integrated resource for exploring lncRNA deregulation in various human cancers. LncRNAs represent a large category of functional RNA molecules that play a significant role in human cancers. A curated collection and summary of deregulated lncRNAs in cancer is essential to thoroughly understand the mechanisms and functions of lncRNAs. Here, we developed the Lnc2Cancer database, which contains 1057 manually curated associations between 531 lncRNAs and 86 human cancers. Each association includes the lncRNA and cancer names, the lncRNA expression pattern, experimental techniques, a brief functional description, the original reference and additional annotation information. Lnc2Cancer provides a user-friendly interface to conveniently browse, retrieve and download data. Lnc2Cancer also offers a submission page for researchers to submit newly validated lncRNA-cancer associations. With the rapidly increasing interest in lncRNAs, Lnc2Cancer will significantly improve our understanding of lncRNA deregulation in cancer and has the potential to be a timely and valuable resource.

  9. CPAD, Curated Protein Aggregation Database: A Repository of Manually Curated Experimental Data on Protein and Peptide Aggregation.

    PubMed

    Thangakani, A Mary; Nagarajan, R; Kumar, Sandeep; Sakthivel, R; Velmurugan, D; Gromiha, M Michael

    2016-01-01

    Accurate distinction between peptide sequences that can form amyloid-fibrils or amorphous β-aggregates, identification of potential aggregation-prone regions in proteins, and prediction of change in aggregation rate of a protein upon mutation(s) are critical to research on protein misfolding diseases, such as Alzheimer's and Parkinson's, as well as biotechnological production of protein-based therapeutics. We have developed a Curated Protein Aggregation Database (CPAD), which has collected results from experimental studies performed by the scientific community aimed at understanding protein/peptide aggregation. CPAD contains more than 2300 experimentally observed aggregation rates upon mutations in known amyloidogenic proteins. Each entry includes numerical values for the following parameters: change in rate of aggregation as measured by fluorescence intensity or turbidity, name and source of the protein, Uniprot and Protein Data Bank codes, single point as well as multiple mutations, and literature citation. The data in CPAD have been supplemented with five different types of additional information: (i) Amyloid fibril forming hexa-peptides, (ii) Amorphous β-aggregating hexa-peptides, (iii) Amyloid fibril forming peptides of different lengths, (iv) Amyloid fibril forming hexa-peptides whose crystal structures are available in the Protein Data Bank (PDB) and (v) Experimentally validated aggregation-prone regions found in amyloidogenic proteins. Furthermore, CPAD is linked to other related databases and resources, such as Uniprot, Protein Data Bank, PubMed, GAP, TANGO and WALTZ. We have set up a web interface with different search and display options so that users can retrieve the data in multiple ways. CPAD is freely available at http://www.iitm.ac.in/bioinfo/CPAD/. The potential applications of CPAD have also been discussed. PMID:27043825

  10. EpiDBase: a manually curated database for small molecule modulators of epigenetic landscape

    PubMed Central

    Loharch, Saurabh; Bhutani, Isha; Jain, Kamal; Gupta, Pawan; Sahoo, Debendra K.; Parkesh, Raman

    2015-01-01

    We have developed EpiDBase (www.epidbase.org), an interactive database of small molecule ligands of epigenetic protein families by bringing together experimental, structural and chemoinformatic data in one place. Currently, EpiDBase encompasses 5784 unique ligands (11 422 entries) of various epigenetic markers such as writers, erasers and readers. The EpiDBase includes experimental IC50 values, ligand molecular weight, hydrogen bond donor and acceptor count, XlogP, number of rotatable bonds, number of aromatic rings, InChIKey, two-dimensional and three-dimensional (3D) chemical structures. A catalog of all EpiDBase ligands, ordered by molecular weight, is also provided. A structure editor is provided for 3D visualization of ligands. EpiDBase is integrated with tools such as text search, disease-specific search, advanced search, substructure, and similarity analysis. Advanced analysis can be performed using substructure and OpenBabel-based chemical similarity fingerprints. The EpiDBase is curated to identify unique molecular scaffolds. Initially, molecules were selected by removing peptides, macrocycles and other complex structures and then processed for conformational sampling by generating 3D conformers. Subsequent filtering through Zinc Is Not Commercial (ZINC: a free database of commercially available compounds for virtual screening) and Lilly MedChem regular rules retained many distinctive drug-like molecules. These molecules were then analyzed for physicochemical properties using OpenBabel descriptors and clustered using various methods such as hierarchical clustering, binning partition and multidimensional scaling. EpiDBase provides comprehensive resources for further design, development and refinement of small molecule modulators of epigenetic markers. Database URL: www.epidbase.org PMID:25776023

  11. ONRLDB--manually curated database of experimentally validated ligands for orphan nuclear receptors: insights into new drug discovery.

    PubMed

    Nanduri, Ravikanth; Bhutani, Isha; Somavarapu, Arun Kumar; Mahajan, Sahil; Parkesh, Raman; Gupta, Pawan

    2015-01-01

    Orphan nuclear receptors are potential therapeutic targets. The Orphan Nuclear Receptor Ligand Binding Database (ONRLDB) is an interactive, comprehensive and manually curated database of small molecule ligands targeting orphan nuclear receptors. Currently, ONRLDB consists of ∼11,000 ligands, of which ∼6500 are unique. All entries include information for the ligand, such as EC50 and IC50, number of aromatic rings and rotatable bonds, XlogP, hydrogen donor and acceptor count, molecular weight (MW) and structure. ONRLDB is a cross-platform database, where either the cognate small molecule modulators of a receptor or the cognate receptors of a ligand can be searched. The database can be searched using three methods: text search, advanced search or similarity search. Substructure search, cataloguing tools, and clustering tools can be used to perform advanced analysis of ligands based on chemical similarity fingerprints, hierarchical clustering, binning partition and multidimensional scaling. These tools, together with the Tree function provided, deliver an interactive platform and a comprehensive resource for identification of common and unique scaffolds. As demonstrated, ONRLDB is designed to allow selection of ligands based on various properties and to support the design of novel ligands or the improvement of existing ones. Database URL: http://www.onrldb.org/.

  12. ONRLDB—manually curated database of experimentally validated ligands for orphan nuclear receptors: insights into new drug discovery

    PubMed Central

    Nanduri, Ravikanth; Bhutani, Isha; Somavarapu, Arun Kumar; Mahajan, Sahil; Parkesh, Raman; Gupta, Pawan

    2015-01-01

    Orphan nuclear receptors are potential therapeutic targets. The Orphan Nuclear Receptor Ligand Binding Database (ONRLDB) is an interactive, comprehensive and manually curated database of small molecule ligands targeting orphan nuclear receptors. Currently, ONRLDB consists of ∼11 000 ligands, of which ∼6500 are unique. All entries include information for the ligand, such as EC50 and IC50, number of aromatic rings and rotatable bonds, XlogP, hydrogen donor and acceptor count, molecular weight (MW) and structure. ONRLDB is a cross-platform database, where either the cognate small molecule modulators of a receptor or the cognate receptors of a ligand can be searched. The database can be searched using three methods: text search, advanced search or similarity search. Substructure search, cataloguing tools, and clustering tools can be used to perform advanced analysis of ligands based on chemical similarity fingerprints, hierarchical clustering, binning partition and multidimensional scaling. These tools, together with the Tree function provided, deliver an interactive platform and a comprehensive resource for identification of common and unique scaffolds. As demonstrated, ONRLDB is designed to allow selection of ligands based on various properties and to support the design of novel ligands or the improvement of existing ones. Database URL: http://www.onrldb.org/ PMID:26637529

  13. EpiDBase: a manually curated database for small molecule modulators of epigenetic landscape.

    PubMed

    Loharch, Saurabh; Bhutani, Isha; Jain, Kamal; Gupta, Pawan; Sahoo, Debendra K; Parkesh, Raman

    2015-01-01

    We have developed EpiDBase (www.epidbase.org), an interactive database of small molecule ligands of epigenetic protein families by bringing together experimental, structural and chemoinformatic data in one place. Currently, EpiDBase encompasses 5784 unique ligands (11 422 entries) of various epigenetic markers such as writers, erasers and readers. The EpiDBase includes experimental IC50 values, ligand molecular weight, hydrogen bond donor and acceptor count, XlogP, number of rotatable bonds, number of aromatic rings, InChIKey, two-dimensional and three-dimensional (3D) chemical structures. A catalog of all EpiDBase ligands, ordered by molecular weight, is also provided. A structure editor is provided for 3D visualization of ligands. EpiDBase is integrated with tools such as text search, disease-specific search, advanced search, substructure, and similarity analysis. Advanced analysis can be performed using substructure and OpenBabel-based chemical similarity fingerprints. The EpiDBase is curated to identify unique molecular scaffolds. Initially, molecules were selected by removing peptides, macrocycles and other complex structures and then processed for conformational sampling by generating 3D conformers. Subsequent filtering through Zinc Is Not Commercial (ZINC: a free database of commercially available compounds for virtual screening) and Lilly MedChem regular rules retained many distinctive drug-like molecules. These molecules were then analyzed for physicochemical properties using OpenBabel descriptors and clustered using various methods such as hierarchical clustering, binning partition and multidimensional scaling. EpiDBase provides comprehensive resources for further design, development and refinement of small molecule modulators of epigenetic markers.

  14. 3CDB: a manually curated database of chromosome conformation capture data.

    PubMed

    Yun, Xiaoxiao; Xia, Lili; Tang, Bixia; Zhang, Hui; Li, Feifei; Zhang, Zhihua

    2016-01-01

    Chromosome conformation capture (3C) is a biochemical technology to analyse contact frequencies between selected genomic sites in a cell population. Its recent genomic variants, e.g. Hi-C/chromatin interaction analysis by paired-end tag (ChIA-PET), have enabled the study of nuclear organization at an unprecedented level. However, due to the inherent low resolution and ultrahigh cost of Hi-C/ChIA-PET, 3C is still the gold standard for determining interactions between given regulatory DNA elements, such as enhancers and promoters. Therefore, we developed a database of 3C determined functional chromatin interactions (3CDB; http://3cdb.big.ac.cn). To construct 3CDB, we searched PubMed and Google Scholar with carefully designed keyword combinations and retrieved more than 5000 articles from which we manually extracted 3319 interactions in 17 species. Moreover, we proposed a systematic evaluation scheme for data reliability and classified the interactions into four categories. Contact frequencies are not directly comparable as a result of various modified 3C protocols employed among laboratories. Our evaluation scheme provides a plausible solution to this long-standing problem in the field. A user-friendly web interface was designed to assist quick searches in 3CDB. We believe that 3CDB will provide fundamental information for experimental design and phylogenetic analysis, as well as bridge the gap between molecular and systems biologists who must now contend with noisy high-throughput data. Database URL: http://3cdb.big.ac.cn.

  15. 3CDB: a manually curated database of chromosome conformation capture data

    PubMed Central

    Yun, Xiaoxiao; Xia, Lili; Tang, Bixia; Zhang, Hui; Li, Feifei; Zhang, Zhihua

    2016-01-01

    Chromosome conformation capture (3C) is a biochemical technology to analyse contact frequencies between selected genomic sites in a cell population. Its recent genomic variants, e.g. Hi-C/chromatin interaction analysis by paired-end tag (ChIA-PET), have enabled the study of nuclear organization at an unprecedented level. However, due to the inherent low resolution and ultrahigh cost of Hi-C/ChIA-PET, 3C is still the gold standard for determining interactions between given regulatory DNA elements, such as enhancers and promoters. Therefore, we developed a database of 3C determined functional chromatin interactions (3CDB; http://3cdb.big.ac.cn). To construct 3CDB, we searched PubMed and Google Scholar with carefully designed keyword combinations and retrieved more than 5000 articles from which we manually extracted 3319 interactions in 17 species. Moreover, we proposed a systematic evaluation scheme for data reliability and classified the interactions into four categories. Contact frequencies are not directly comparable as a result of various modified 3C protocols employed among laboratories. Our evaluation scheme provides a plausible solution to this long-standing problem in the field. A user-friendly web interface was designed to assist quick searches in 3CDB. We believe that 3CDB will provide fundamental information for experimental design and phylogenetic analysis, as well as bridge the gap between molecular and systems biologists who must now contend with noisy high-throughput data. Database URL: http://3cdb.big.ac.cn PMID:27081154

  16. Curation accuracy of model organism databases.

    PubMed

    Keseler, Ingrid M; Skrzypek, Marek; Weerasinghe, Deepika; Chen, Albert Y; Fulcher, Carol; Li, Gene-Wei; Lemmer, Kimberly C; Mladinich, Katherine M; Chow, Edmond D; Sherlock, Gavin; Karp, Peter D

    2014-01-01

    Manual extraction of information from the biomedical literature, or biocuration, is the central methodology used to construct many biological databases. For example, the UniProt protein database, the EcoCyc Escherichia coli database and the Candida Genome Database (CGD) are all based on biocuration. Biological databases are used extensively by life science researchers, as online encyclopedias, as aids in the interpretation of new experimental data and as gold standards for the development of new bioinformatics algorithms. Although manual curation has been assumed to be highly accurate, we are aware of only one previous study of biocuration accuracy. We assessed the accuracy of EcoCyc and CGD by manually selecting curated assertions within randomly chosen EcoCyc and CGD gene pages and by then validating that the data found in the referenced publications supported those assertions. A database assertion is considered to be in error if that assertion could not be found in the publication cited for that assertion. We identified 10 errors in the 633 facts that we validated across the two databases, for an overall error rate of 1.58%, and individual error rates of 1.82% for CGD and 1.40% for EcoCyc. These data suggest that manual curation of the experimental literature by Ph.D.-level scientists is highly accurate. Database URL: http://ecocyc.org/, http://www.candidagenome.org/
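The overall error rate reported in this record follows directly from the counts given, which can be checked in one line:

```python
# Sanity check of the overall biocuration error rate reported above:
# 10 erroneous assertions out of 633 validated facts across EcoCyc and CGD.
errors, facts = 10, 633
error_rate_pct = round(errors / facts * 100, 2)
print(error_rate_pct)  # 1.58
```

The per-database rates (1.82% for CGD, 1.40% for EcoCyc) cannot be reproduced from this abstract alone, since it does not give the per-database fact counts.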

  17. VaDE: a manually curated database of reproducible associations between various traits and human genomic polymorphisms

    PubMed Central

    Nagai, Yoko; Takahashi, Yasuko; Imanishi, Tadashi

    2015-01-01

    Genome-wide association studies (GWASs) have identified numerous single nucleotide polymorphisms (SNPs) associated with the development of common diseases. However, it is clear that genetic risk factors of common diseases are heterogeneous among human populations. Therefore, we developed a database of genomic polymorphisms that are reproducibly associated with disease susceptibilities, drug responses and other traits for each human population: ‘VarySysDB Disease Edition’ (VaDE; http://bmi-tokai.jp/VaDE/). SNP-trait association data were obtained from the National Human Genome Research Institute GWAS (NHGRI GWAS) catalog and RAvariome, and we added detailed information on sample populations by curating the original papers. In addition, we collected and curated original papers, and registered the detailed information of SNP-trait associations in VaDE. Then, we evaluated the reproducibility of associations in each population by counting the number of significantly associated studies. VaDE provides literature-based SNP-trait association data and functional genomic region annotation for SNP functional research. SNP functional annotation data included experimental data of the ENCODE project, H-InvDB transcripts and the 1000 Genomes Project. A user-friendly web interface was developed to assist quick search, easy download and fast swapping among viewers. We believe that our database will contribute to the future establishment of personalized medicine and increase our understanding of genetic factors underlying diseases. PMID:25361969

  18. Supporting the curation of biological databases with reusable text mining.

    PubMed

    Miotto, Olivo; Tan, Tin Wee; Brusic, Vladimir

    2005-01-01

    Curators of biological databases transfer knowledge from scientific publications, a laborious and expensive manual process. Machine learning algorithms can reduce the workload of curators by filtering relevant biomedical literature, though their widespread adoption will depend on the availability of intuitive tools that can be configured for a variety of tasks. We propose a new method for supporting curators by means of document categorization, and describe the architecture of a curator-oriented tool implementing this method using techniques that require no computational linguistic or programming expertise. To demonstrate the feasibility of this approach, we prototyped an application of this method to support a real curation task: identifying PubMed abstracts that contain allergen cross-reactivity information. We tested the performance of two different classifier algorithms (CART and ANN), applied to both composite and single-word features, using several feature scoring functions. Both classifiers exceeded our performance targets, the ANN classifier yielding the best results. These results show that the method we propose can deliver the level of performance needed to assist database curation.
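The document-categorization idea in this record — score each abstract on word features and keep only those likely to be relevant — can be sketched with a trivial keyword scorer. This is an illustration of the filtering concept only, not the authors' classifiers (they used CART and ANN models over scored composite and single-word features); the keyword list and abstracts below are hypothetical:

```python
# Illustrative sketch of curation-support document filtering: rank
# PubMed-style abstracts by overlap with task-relevant terms and keep
# those above a threshold. Terms and abstracts are made-up examples.
RELEVANT_TERMS = {"allergen", "cross-reactivity", "ige", "epitope"}

def relevance_score(abstract: str) -> int:
    # Crude bag-of-words feature: count distinct relevant terms present.
    words = {w.strip(".,;()").lower() for w in abstract.split()}
    return len(words & RELEVANT_TERMS)

abstracts = [
    "IgE cross-reactivity between birch pollen allergen and apple proteins.",
    "A study of soil bacteria population dynamics.",
]
selected = [a for a in abstracts if relevance_score(a) >= 2]
print(len(selected))  # 1
```

A real system would replace the fixed keyword set with learned feature weights, which is exactly what the trained classifiers in the study provide.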

  19. Manual classification strategies in the ECOD database

    PubMed Central

    Cheng, Hua; Liao, Yuxing; Schaeffer, R. Dustin; Grishin, Nick V.

    2015-01-01

    ECOD (Evolutionary Classification Of protein Domains) is a comprehensive and up-to-date protein structure classification database. The majority of new structures released from the PDB (Protein Data Bank) every week already have close homologs in the ECOD hierarchy and thus can be reliably partitioned into domains and classified by software without manual intervention. However, those proteins that lack confidently detectable homologs require careful analysis by experts. Although many bioinformatics resources rely on expert curation to some degree, specific examples of how this curation occurs and in what cases it is necessary are not always described. Here, we illustrate the manual classification strategy in ECOD by example, focusing on two major issues in protein classification: domain partitioning and the relationship between homology and similarity scores. Most examples show recently released and manually classified PDB structures. We discuss multi-domain proteins, discordance between sequence and structural similarities, difficulties with assessing homology with scores, and integral membrane proteins homologous to soluble proteins. By timely assimilation of newly available structures into its hierarchy, ECOD strives to provide a most accurate and updated view of the protein structure world as a result of combined computational and expert-driven analysis. PMID:25917548

  20. DIDA: A curated and annotated digenic diseases database.

    PubMed

    Gazzo, Andrea M; Daneels, Dorien; Cilia, Elisa; Bonduelle, Maryse; Abramowicz, Marc; Van Dooren, Sonia; Smits, Guillaume; Lenaerts, Tom

    2016-01-01

    DIDA (DIgenic diseases DAtabase) is a novel database that provides for the first time detailed information on genes and associated genetic variants involved in digenic diseases, the simplest form of oligogenic inheritance. The database is accessible via http://dida.ibsquare.be and currently includes 213 digenic combinations involved in 44 different digenic diseases. These combinations are composed of 364 distinct variants, which are distributed over 136 distinct genes. The web interface provides browsing and search functionalities, as well as documentation and help pages, general database statistics and references to the original publications from which the data have been collected. The possibility to submit novel digenic data to DIDA is also provided. Creating this new repository was essential as current databases do not allow one to retrieve detailed records regarding digenic combinations. Genes, variants, diseases and digenic combinations in DIDA are annotated with manually curated information and information mined from other online resources. Next to providing a unique resource for the development of new analysis methods, DIDA gives clinical and molecular geneticists a tool to find the most comprehensive information on the digenic nature of their diseases of interest. PMID:26481352

  1. A Curated Database of Rodent Uterotrophic Bioactivity

    PubMed Central

    Kleinstreuer, Nicole C.; Ceger, Patricia C.; Allen, David G.; Strickland, Judy; Chang, Xiaoqing; Hamm, Jonathan T.; Casey, Warren M.

    2015-01-01

    Background: Novel in vitro methods are being developed to identify chemicals that may interfere with estrogen receptor (ER) signaling, but the results are difficult to put into biological context because of reliance on reference chemicals established using results from other in vitro assays and because of the lack of high-quality in vivo reference data. The Organisation for Economic Co-operation and Development (OECD)-validated rodent uterotrophic bioassay is considered the “gold standard” for identifying potential ER agonists. Objectives: We performed a comprehensive literature review to identify and evaluate data from uterotrophic studies and to analyze study variability. Methods: We reviewed 670 articles with results from 2,615 uterotrophic bioassays using 235 unique chemicals. Study descriptors, such as species/strain, route of administration, dosing regimen, lowest effect level, and test outcome, were captured in a database of uterotrophic results. Studies were assessed for adherence to six criteria that were based on uterotrophic regulatory test guidelines. Studies meeting all six criteria (458 bioassays on 118 unique chemicals) were considered guideline-like (GL) and were subsequently analyzed. Results: The immature rat model was used for 76% of the GL studies. Active outcomes were more prevalent across rat models (74% active) than across mouse models (36% active). Of the 70 chemicals with at least two GL studies, 18 (26%) had discordant outcomes and were classified as both active and inactive. Many discordant results were attributable to differences in study design (e.g., injection vs. oral dosing). Conclusions: This uterotrophic database provides a valuable resource for understanding in vivo outcome variability and for evaluating the performance of in vitro assays that measure estrogenic activity. Citation: Kleinstreuer NC, Ceger PC, Allen DG, Strickland J, Chang X, Hamm JT, Casey WM. 2016. A curated database of rodent uterotrophic bioactivity. Environ

  2. From manual curation to visualization of gene families and networks across Solanaceae plant species.

    PubMed

    Pujar, Anuradha; Menda, Naama; Bombarely, Aureliano; Edwards, Jeremy D; Strickler, Susan R; Mueller, Lukas A

    2013-01-01

    High-quality manual annotation methods and practices need to be scaled to the increased rate of genomic data production. Curation based on gene families and gene networks is one approach that can significantly increase both curation efficiency and quality. The Sol Genomics Network (SGN; http://solgenomics.net) is a comparative genomics platform, with genetic, genomic and phenotypic information of the Solanaceae family and its closely related species that incorporates a community-based gene and phenotype curation system. In this article, we describe a manual curation system for gene families aimed at facilitating curation, querying and visualization of gene interaction patterns underlying complex biological processes, including an interface for efficiently capturing information from experiments with large data sets reported in the literature. Well-annotated multigene families are useful for further exploration of genome organization and gene evolution across species. As an example, we illustrate the system with the multigene transcription factor families, WRKY and Small Auxin Up-regulated RNA (SAUR), which both play important roles in responding to abiotic stresses in plants. Database URL: http://solgenomics.net/

  3. A computational platform to maintain and migrate manual functional annotations for BioCyc databases

    Technology Transfer Automated Retrieval System (TEKTRAN)

    Model organism databases are an important resource for information on biological pathways and genomic data. Such databases represent the accumulation of biological data, some of which has been manually curated from literature. An essential feature of these databases is the continuing data integratio...

  4. PhenoMiner: quantitative phenotype curation at the rat genome database.

    PubMed

    Laulederkind, Stanley J F; Liu, Weisong; Smith, Jennifer R; Hayman, G Thomas; Wang, Shur-Jen; Nigam, Rajni; Petri, Victoria; Lowry, Timothy F; de Pons, Jeff; Dwinell, Melinda R; Shimoyama, Mary

    2013-01-01

    The Rat Genome Database (RGD) is the premier repository of rat genomic and genetic data and currently houses >40 000 rat gene records as well as human and mouse orthologs, >2000 rat and 1900 human quantitative trait loci (QTLs) records and >2900 rat strain records. Biological information curated for these data objects includes disease associations, phenotypes, pathways, molecular functions, biological processes and cellular components. Recently, a project was initiated at RGD to incorporate quantitative phenotype data for rat strains, in addition to the currently existing qualitative phenotype data for rat strains, QTLs and genes. A specialized curation tool was designed to generate manual annotations with up to six different ontologies/vocabularies used simultaneously to describe a single experimental value from the literature. Concurrently, three of those ontologies needed extensive addition of new terms to move the curation forward. The curation interface development, as well as ontology development, was an ongoing process during the early stages of the PhenoMiner curation project. Database URL: http://rgd.mcw.edu.
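A single experimental value annotated with several ontologies at once, as described above, can be modeled as a small record. This is an illustrative sketch, not RGD's actual schema; the ontology names match those RGD uses (strain, clinical measurement, measurement method, experimental condition), but the term IDs are placeholders.

```python
# Sketch of a PhenoMiner-style annotation: one quantitative value
# described simultaneously by terms from multiple ontologies.
from dataclasses import dataclass, field

@dataclass
class PhenotypeAnnotation:
    value: float
    unit: str
    terms: dict = field(default_factory=dict)  # ontology name -> term ID

record = PhenotypeAnnotation(
    value=128.0,
    unit="mmHg",
    terms={
        # Term IDs below are placeholders, not verified mappings
        "Rat Strain Ontology": "RS:0000001",
        "Clinical Measurement Ontology": "CMO:0000001",
        "Measurement Method Ontology": "MMO:0000001",
        "Experimental Condition Ontology": "XCO:0000001",
    },
)
print(len(record.terms), record.unit)
```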

  5. OntoMate: a text-mining tool aiding curation at the Rat Genome Database.

    PubMed

    Liu, Weisong; Laulederkind, Stanley J F; Hayman, G Thomas; Wang, Shur-Jen; Nigam, Rajni; Smith, Jennifer R; De Pons, Jeff; Dwinell, Melinda R; Shimoyama, Mary

    2015-01-01

    The Rat Genome Database (RGD) is the premier repository of rat genomic, genetic and physiologic data. Converting data from free text in the scientific literature to a structured format is one of the main tasks of all model organism databases. RGD spends considerable effort manually curating gene, Quantitative Trait Locus (QTL) and strain information. The rapidly growing volume of biomedical literature and the active research in the biological natural language processing (bioNLP) community have given RGD the impetus to adopt text-mining tools to improve curation efficiency. Recently, RGD has initiated a project to use OntoMate, an ontology-driven, concept-based literature search engine developed at RGD, as a replacement for the PubMed (http://www.ncbi.nlm.nih.gov/pubmed) search engine in the gene curation workflow. OntoMate tags abstracts with gene names, gene mutations, organism name and most of the 16 ontologies/vocabularies used at RGD. All terms/ entities tagged to an abstract are listed with the abstract in the search results. All listed terms are linked both to data entry boxes and a term browser in the curation tool. OntoMate also provides user-activated filters for species, date and other parameters relevant to the literature search. Using the system for literature search and import has streamlined the process compared to using PubMed. The system was built with a scalable and open architecture, including features specifically designed to accelerate the RGD gene curation process. With the use of bioNLP tools, RGD has added more automation to its curation workflow. Database URL: http://rgd.mcw.edu.
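The tagging step described above can be illustrated with a minimal dictionary-based tagger: scan an abstract for known terms and report each match with its source vocabulary. The tiny lexicon is invented for illustration (the gene ID is a placeholder); the real system tags gene names, mutations, organism names and most of RGD's 16 ontologies with NLP tooling, not a hand-built dictionary.

```python
# Minimal dictionary-based entity tagger in the spirit of OntoMate's
# abstract tagging: whole-word matches against a term lexicon.
import re

lexicon = {
    "hypertension": ("Disease Ontology", "DOID:10763"),
    "Ace": ("Gene", "RGD:placeholder"),
    "Rattus norvegicus": ("Organism", "NCBITaxon:10116"),
}

def tag_abstract(text):
    """Return (term, source, term_id) for each lexicon term found."""
    hits = []
    for term, (source, term_id) in lexicon.items():
        if re.search(r"\b" + re.escape(term) + r"\b", text):
            hits.append((term, source, term_id))
    return hits

abstract = ("Ace variants are associated with hypertension "
            "in Rattus norvegicus.")
for term, source, term_id in tag_abstract(abstract):
    print(f"{term} -> {source} ({term_id})")
```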


  7. Chemical biology databases: from aggregation, curation to representation.

    PubMed

    Audouze, Karine; Taboureau, Olivier

    2015-07-01

Systems chemical biology offers a novel way of approaching drug discovery by developing models that consider the global physiological environment of protein targets and their perturbations by drugs. However, the integration of all these data needs curation and standardization, with an appropriate representation, in order to yield relevant interpretations. In this mini review, we present some databases and services that, integrated with computational tools and data standardization, could assist scientists in decision making during the different stages of the drug development process.

  8. Chemical databases: curation or integration by user-defined equivalence?

    PubMed

    Hersey, Anne; Chambers, Jon; Bellis, Louisa; Patrícia Bento, A; Gaulton, Anna; Overington, John P

    2015-07-01

There is a wealth of valuable chemical information in publicly available databases for use by scientists undertaking drug discovery. However, finite curation resources, limitations of chemical structure software and differences in individual database applications mean that exact chemical structure equivalence between databases is unlikely to ever be a reality. The ability to identify compound equivalence has been made significantly easier by the use of the International Chemical Identifier (InChI), a non-proprietary line notation for describing a chemical structure. More importantly, advances in methods to identify compounds that are the same at various levels of similarity, such as those containing the same parent component or having the same connectivity, are now enabling related compounds to be linked between databases where the structure matches are not exact. PMID:26194583
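Matching compounds "at various levels of similarity" via InChI can be sketched by comparing only selected layers of the identifier. This is a simplified illustration, not ChEMBL's actual matching pipeline: comparing just the formula and connectivity (`/c`) layers links acetic acid to its conjugate base even though their full standard InChIs differ in the protonation (`/p`) layer.

```python
# Layer-level InChI comparison: two records "match" if their formula
# and connectivity layers agree, even when the full strings differ.
def inchi_layers(inchi):
    """Split an InChI string into a dict of its layers."""
    body = inchi.split("=", 1)[1]           # drop the "InChI" prefix
    parts = body.split("/")
    layers = {"version": parts[0], "formula": parts[1]}
    for part in parts[2:]:
        layers[part[0]] = part[1:]          # layer letter -> content
    return layers

def same_connectivity(a, b):
    la, lb = inchi_layers(a), inchi_layers(b)
    return (la["formula"] == lb["formula"]
            and la.get("c") == lb.get("c"))

acetic_acid = "InChI=1S/C2H4O2/c1-2(3)4/h1H3,(H,3,4)"
acetate     = "InChI=1S/C2H4O2/c1-2(3)4/h1H3,(H,3,4)/p-1"
print(acetic_acid == acetate, same_connectivity(acetic_acid, acetate))
```

In practice one would use an InChI/cheminformatics toolkit rather than string slicing, but the principle — choose which layers define "the same compound" — is the one the abstract describes.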

  9. Imitating Manual Curation of Text-Mined Facts in Biomedicine

    PubMed Central

    Rodriguez-Esteban, Raul; Iossifov, Ivan; Rzhetsky, Andrey

    2006-01-01

    Text-mining algorithms make mistakes in extracting facts from natural-language texts. In biomedical applications, which rely on use of text-mined data, it is critical to assess the quality (the probability that the message is correctly extracted) of individual facts—to resolve data conflicts and inconsistencies. Using a large set of almost 100,000 manually produced evaluations (most facts were independently reviewed more than once, producing independent evaluations), we implemented and tested a collection of algorithms that mimic human evaluation of facts provided by an automated information-extraction system. The performance of our best automated classifiers closely approached that of our human evaluators (ROC score close to 0.95). Our hypothesis is that, were we to use a larger number of human experts to evaluate any given sentence, we could implement an artificial-intelligence curator that would perform the classification job at least as accurately as an average individual human evaluator. We illustrated our analysis by visualizing the predicted accuracy of the text-mined relations involving the term cocaine. PMID:16965176

  11. SPIKE: a database of highly curated human signaling pathways.

    PubMed

    Paz, Arnon; Brownstein, Zippora; Ber, Yaara; Bialik, Shani; David, Eyal; Sagir, Dorit; Ulitsky, Igor; Elkon, Ran; Kimchi, Adi; Avraham, Karen B; Shiloh, Yosef; Shamir, Ron

    2011-01-01

The rapid accumulation of knowledge on biological signaling pathways and their regulatory mechanisms has highlighted the need for specific repositories that can store, organize and allow retrieval of pathway information in a way that will be useful for the research community. SPIKE (Signaling Pathways Integrated Knowledge Engine; http://www.cs.tau.ac.il/~spike/) is a database for achieving this goal, containing highly curated interactions for particular human pathways, along with literature-referenced information on the nature of each interaction. To make database population and pathway comprehension straightforward, a simple yet informative data model is used, and pathways are laid out as maps that reflect the curator’s understanding and make the utilization of the pathways easy. The database currently focuses primarily on pathways describing DNA damage response, cell cycle, programmed cell death and hearing related pathways. Pathways are regularly updated, and additional pathways are gradually added. The complete database and the individual maps are freely exportable in several formats. The database is accompanied by a stand-alone software tool for analysis and dynamic visualization of pathways.

  12. The Aspergillus Genome Database, a curated comparative genomics resource for gene, protein and sequence information for the Aspergillus research community.

    PubMed

    Arnaud, Martha B; Chibucos, Marcus C; Costanzo, Maria C; Crabtree, Jonathan; Inglis, Diane O; Lotia, Adil; Orvis, Joshua; Shah, Prachi; Skrzypek, Marek S; Binkley, Gail; Miyasato, Stuart R; Wortman, Jennifer R; Sherlock, Gavin

    2010-01-01

    The Aspergillus Genome Database (AspGD) is an online genomics resource for researchers studying the genetics and molecular biology of the Aspergilli. AspGD combines high-quality manual curation of the experimental scientific literature examining the genetics and molecular biology of Aspergilli, cutting-edge comparative genomics approaches to iteratively refine and improve structural gene annotations across multiple Aspergillus species, and web-based research tools for accessing and exploring the data. All of these data are freely available at http://www.aspgd.org. We welcome feedback from users and the research community at aspergillus-curator@genome.stanford.edu.

  13. Curating and Preserving the Big Canopy Database System: an Active Curation Approach using SEAD

    NASA Astrophysics Data System (ADS)

    Myers, J.; Cushing, J. B.; Lynn, P.; Weiner, N.; Ovchinnikova, A.; Nadkarni, N.; McIntosh, A.

    2015-12-01

    Modern research is increasingly dependent upon highly heterogeneous data and on the associated cyberinfrastructure developed to organize, analyze, and visualize that data. However, due to the complexity and custom nature of such combined data-software systems, it can be very challenging to curate and preserve them for the long term at reasonable cost and in a way that retains their scientific value. In this presentation, we describe how this challenge was met in preserving the Big Canopy Database (CanopyDB) system using an agile approach and leveraging the Sustainable Environment - Actionable Data (SEAD) DataNet project's hosted data services. The CanopyDB system was developed over more than a decade at Evergreen State College to address the needs of forest canopy researchers. It is an early yet sophisticated exemplar of the type of system that has become common in biological research and science in general, including multiple relational databases for different experiments, a custom database generation tool used to create them, an image repository, and desktop and web tools to access, analyze, and visualize this data. SEAD provides secure project spaces with a semantic content abstraction (typed content with arbitrary RDF metadata statements and relationships to other content), combined with a standards-based curation and publication pipeline resulting in packaged research objects with Digital Object Identifiers. Using SEAD, our cross-project team was able to incrementally ingest CanopyDB components (images, datasets, software source code, documentation, executables, and virtualized services) and to iteratively define and extend the metadata and relationships needed to document them. We believe that both the process, and the richness of the resultant standards-based (OAI-ORE) preservation object, hold lessons for the development of best-practice solutions for preserving scientific data in association with the tools and services needed to derive value from it.
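The OAI-ORE packaging step described above can be illustrated with a minimal aggregation: a research object is a set of RDF-style triples linking one aggregation resource to the components it packages. This is a hedged sketch of the idea only; the identifiers below are invented, and a real SEAD publication would serialize a much richer OAI-ORE map with DOIs and curation metadata.

```python
# Minimal ORE-style aggregation as (subject, predicate, object) triples.
# All URIs except the ORE and DCTERMS namespaces are hypothetical.
AGG = "https://example.org/canopydb/aggregation"
ORE = "http://www.openarchives.org/ore/terms/"
DCTERMS = "http://purl.org/dc/terms/"

triples = [
    (AGG, ORE + "aggregates", "https://example.org/canopydb/images.zip"),
    (AGG, ORE + "aggregates", "https://example.org/canopydb/schema.sql"),
    (AGG, ORE + "aggregates", "https://example.org/canopydb/source.tar.gz"),
    (AGG, DCTERMS + "title", "Big Canopy Database preservation package"),
]

# Enumerate the aggregated resources
aggregated = [o for s, p, o in triples if p == ORE + "aggregates"]
print(len(aggregated))
```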

  14. HPIDB 2.0: a curated database for host-pathogen interactions.

    PubMed

    Ammari, Mais G; Gresham, Cathy R; McCarthy, Fiona M; Nanduri, Bindu

    2016-01-01

Identification and analysis of host-pathogen interactions (HPI) is essential to study infectious diseases. However, HPI data are sparse in existing molecular interaction databases, especially for agricultural host-pathogen systems. Therefore, resources that annotate, predict and display the HPI that underpin infectious diseases are critical for developing novel intervention strategies. HPIDB 2.0 (http://www.agbase.msstate.edu/hpi/main.html) is a resource for HPI data, and contains 45,238 manually curated entries in the current release. Since the first description of the database in 2010, multiple enhancements to HPIDB data and interface services were made that are described here. Notably, HPIDB 2.0 now provides targeted biocuration of molecular interaction data. As a member of the International Molecular Exchange consortium, annotations provided by HPIDB 2.0 curators meet community standards to provide detailed contextual experimental information and facilitate data sharing. Moreover, HPIDB 2.0 provides access to rapidly available community annotations that capture minimum molecular interaction information to address immediate researcher needs for HPI network analysis. In addition to curation, HPIDB 2.0 integrates HPI from existing external sources and contains tools to infer additional HPI where annotated data are scarce. Compared to other interaction databases, our data collection approach ensures HPIDB 2.0 users access the most comprehensive HPI data from a wide range of pathogens and their hosts (594 pathogen and 70 host species, as of February 2016). Improvements also include enhanced search capacity, addition of Gene Ontology functional information, and implementation of network visualization. The changes made to HPIDB 2.0 content and interface ensure that users, especially agricultural researchers, are able to easily access and analyse high quality, comprehensive HPI data. All HPIDB 2.0 data are updated regularly, are publicly available for direct

  16. Advancing Exposure Science through Chemical Data Curation and Integration in the Comparative Toxicogenomics Database

    PubMed Central

    Grondin, Cynthia J.; Davis, Allan Peter; Wiegers, Thomas C.; King, Benjamin L.; Wiegers, Jolene A.; Reif, David M.; Hoppin, Jane A.; Mattingly, Carolyn J.

    2016-01-01

    Background: Exposure science studies the interactions and outcomes between environmental stressors and human or ecological receptors. To augment its role in understanding human health and the exposome, we aimed to centralize and integrate exposure science data into the broader biological framework of the Comparative Toxicogenomics Database (CTD), a public resource that promotes understanding of environmental chemicals and their effects on human health. Objectives: We integrated exposure data within the CTD to provide a centralized, freely available resource that facilitates identification of connections between real-world exposures, chemicals, genes/proteins, diseases, biological processes, and molecular pathways. Methods: We developed a manual curation paradigm that captures exposure data from the scientific literature using controlled vocabularies and free text within the context of four primary exposure concepts: stressor, receptor, exposure event, and exposure outcome. Using data from the Agricultural Health Study, we have illustrated the benefits of both centralization and integration of exposure information with CTD core data. Results: We have described our curation process, demonstrated how exposure data can be accessed and analyzed in the CTD, and shown how this integration provides a broad biological context for exposure data to promote mechanistic understanding of environmental influences on human health. Conclusions: Curation and integration of exposure data within the CTD provides researchers with new opportunities to correlate exposures with human health outcomes, to identify underlying potential molecular mechanisms, and to improve understanding about the exposome. Citation: Grondin CJ, Davis AP, Wiegers TC, King BL, Wiegers JA, Reif DM, Hoppin JA, Mattingly CJ. 2016. Advancing exposure science through chemical data curation and integration in the Comparative Toxicogenomics Database. Environ Health Perspect 124:1592–1599; http://dx.doi.org/10
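The four primary exposure concepts named above (stressor, receptor, exposure event, exposure outcome) suggest a simple flat record with a completeness check. This is a hedged sketch, not CTD's actual curation paradigm or data model, and the field values are invented examples rather than real CTD annotations.

```python
# Sketch of a curation record built on CTD's four exposure concepts,
# plus a helper that flags which concepts a record leaves empty.
REQUIRED_CONCEPTS = ("stressor", "receptor",
                     "exposure_event", "exposure_outcome")

def missing_concepts(record):
    """Return the exposure concepts not filled in by the curator."""
    return [k for k in REQUIRED_CONCEPTS if not record.get(k)]

record = {
    "stressor": "atrazine",
    "receptor": "farm workers",
    "exposure_event": "occupational application; urinary biomarker",
    "exposure_outcome": "altered serum hormone levels",
}
print(missing_concepts(record))                    # complete record
print(missing_concepts({"stressor": "atrazine"}))  # incomplete record
```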

  17. SORGOdb: Superoxide Reductase Gene Ontology curated DataBase

    PubMed Central

    2011-01-01

    Background Superoxide reductases (SOR) catalyse the reduction of superoxide anions to hydrogen peroxide and are involved in the oxidative stress defences of anaerobic and facultative anaerobic organisms. Genes encoding SOR were discovered recently and suffer from annotation problems. These genes, named sor, are short and the transfer of annotations from previously characterized neelaredoxin, desulfoferrodoxin, superoxide reductase and rubredoxin oxidase has been heterogeneous. Consequently, many sor remain anonymous or mis-annotated. Description SORGOdb is an exhaustive database of SOR that proposes a new classification based on domain architecture. SORGOdb supplies a simple user-friendly web-based database for retrieving and exploring relevant information about the proposed SOR families. The database can be queried using an organism name, a locus tag or phylogenetic criteria, and also offers sequence similarity searches using BlastP. Genes encoding SOR have been re-annotated in all available genome sequences (prokaryotic and eukaryotic (complete and in draft) genomes, updated in May 2010). Conclusions SORGOdb contains 325 non-redundant and curated SOR, from 274 organisms. It proposes a new classification of SOR into seven different classes and allows biologists to explore and analyze sor in order to establish correlations between the class of SOR and organism phenotypes. SORGOdb is freely available at http://sorgo.genouest.org/index.php. PMID:21575179

  18. Instruction manual for the Wahoo computerized database

    SciTech Connect

    Lasota, D.; Watts, K.

    1995-05-01

As part of our research on the Lisburne Group, we have developed a powerful relational computerized database to accommodate the huge amounts of data generated by our multi-disciplinary research project. The Wahoo database has data files on petrographic data, conodont analyses, locality and sample data, well logs and diagenetic (cement) studies. Chapter 5 is essentially an instruction manual that summarizes some of the unique attributes and operating procedures of the Wahoo database. The main purpose of a database is to allow users to manipulate their data and produce reports and graphs for presentation. We present a variety of data tables in appendices at the end of this report, each encapsulating a small part of the data contained in the Wahoo database. All the data are sorted and listed by map index number and stratigraphic position (depth). The Locality data table (Appendix A) lists the stratigraphic sections examined in our study. It gives names of study areas, stratigraphic units studied, locality information, and researchers. Most localities are keyed to a geologic map that shows the distribution of the Lisburne Group and location of our sections in ANWR. Petrographic reports (Appendix B) are detailed summaries of the composition and texture of the Lisburne Group carbonates. The relative abundance of different carbonate grains (allochems) and carbonate texture are listed using symbols that portray data in a format similar to stratigraphic columns. This enables researchers to recognize trends in the evolution of the Lisburne carbonate platform and to check their paleoenvironmental interpretations in a stratigraphic context. Some of the figures in Chapter 1 were made using the Wahoo database.
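A relational layout of the kind described, with a locality table and sample tables keyed by map index number and sorted by stratigraphic depth, can be sketched in a few lines of SQL. The table and column names below are invented for illustration; the actual Wahoo schema is documented in the report's Chapter 5 and appendices.

```python
# Hypothetical mini-schema in the spirit of the Wahoo database:
# localities plus petrographic samples, reported sorted by map index
# and stratigraphic position (depth), as in the report's appendices.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE locality (
    map_index   INTEGER PRIMARY KEY,
    section     TEXT,
    study_area  TEXT
);
CREATE TABLE petro_sample (
    sample_id   INTEGER PRIMARY KEY,
    map_index   INTEGER REFERENCES locality(map_index),
    depth_m     REAL,   -- stratigraphic position
    texture     TEXT
);
""")
conn.execute("INSERT INTO locality VALUES (1, 'Wahoo Lake', 'ANWR')")
conn.executemany(
    "INSERT INTO petro_sample VALUES (?, ?, ?, ?)",
    [(10, 1, 12.5, 'grainstone'), (11, 1, 30.0, 'packstone')],
)
rows = conn.execute("""
    SELECT l.section, p.depth_m, p.texture
    FROM petro_sample p JOIN locality l USING (map_index)
    ORDER BY l.map_index, p.depth_m
""").fetchall()
print(rows)
```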

  19. MicRhoDE: a curated database for the analysis of microbial rhodopsin diversity and evolution

    PubMed Central

    Boeuf, Dominique; Audic, Stéphane; Brillet-Guéguen, Loraine; Caron, Christophe; Jeanthon, Christian

    2015-01-01

Microbial rhodopsins are a diverse group of photoactive transmembrane proteins found in all three domains of life and in viruses. Today, microbial rhodopsin research is a flourishing research field in which new understandings of rhodopsin diversity, function and evolution are contributing to broader microbiological and molecular knowledge. Here, we describe MicRhoDE, a comprehensive, high-quality and freely accessible database that facilitates analysis of the diversity and evolution of microbial rhodopsins. Rhodopsin sequences isolated from a vast array of marine and terrestrial environments were manually collected and curated. Each rhodopsin sequence is associated with related metadata, including predicted spectral tuning of the protein, putative activity and function, taxonomy for sequences that can be linked to a 16S rRNA gene, sampling date and location, and supporting literature. The database currently covers 7857 aligned sequences from more than 450 environmental samples or organisms. Based on a robust phylogenetic analysis, we introduce an operational classification system with multiple phylogenetic levels ranging from superclusters to species-level operational taxonomic units. An integrated pipeline for online sequence alignment and phylogenetic tree construction is also provided. With a user-friendly interface and integrated online bioinformatics tools, this unique resource should be highly valuable for upcoming studies of the biogeography, diversity, distribution and evolution of microbial rhodopsins. Database URL: http://micrhode.sb-roscoff.fr. PMID:26286928

  1. A survey of locus-specific database curation. Human Genome Variation Society.

    PubMed

    Cotton, Richard G H; Phillips, Kate; Horaitis, Ourania

    2007-04-01

    It is widely accepted that curation of variation in genes is best performed by experts in those genes and their variation. However, obtaining funding for such curation is difficult, even though up-to-date lists of variations in genes are essential for optimum delivery of genetic healthcare and for medical research. This study was undertaken to gather information on gene-specific databases (locus-specific databases) in an effort to understand their functioning, funding and needs. A questionnaire was sent to 125 curators and we received 47 responses. Individuals performed curation of up to 69 genes. The time curators spent curating was extremely variable, ranging from 0 h per week up to, for five curators, more than 4 h per week. The funding required ranged from US$600 to US$45,000 per year. Most databases were stimulated by the Human Genome Organization-Mutation Database Initiative and used its guidelines. Many databases reported unpublished mutations, and all but one respondent reported errors in the literature. Of the 13 who reported hit rates, 9 reported over 52,000 hits per year. On this basis, five recommendations were made to improve the curation of variation information, particularly that of mutations causing single-gene disorders: 1. A curator for each gene, who is an expert in it, should be identified or nominated. 2. Curation at a minimum of 2 h per week at US$2000 per gene per year should be encouraged. 3. Guidelines and custom software use should be encouraged to facilitate easy setup and curation. 4. Hits per week on the website should be recorded to allow the importance of the site to be illustrated for grant-giving purposes. 5. Published protocols should be followed in the establishment of locus-specific databases.

  2. RiceWiki: a wiki-based database for community curation of rice genes.

    PubMed

    Zhang, Zhang; Sang, Jian; Ma, Lina; Wu, Gang; Wu, Hao; Huang, Dawei; Zou, Dong; Liu, Siqi; Li, Ang; Hao, Lili; Tian, Ming; Xu, Chao; Wang, Xumin; Wu, Jiayan; Xiao, Jingfa; Dai, Lin; Chen, Ling-Ling; Hu, Songnian; Yu, Jun

    2014-01-01

    Rice is the most important staple food for a large part of the world's human population and also a key model organism for biological studies of crops as well as other related plants. Here we present RiceWiki (http://ricewiki.big.ac.cn), a wiki-based, publicly editable and open-content platform for community curation of rice genes. Most existing related biological databases rely on expert curation; with the exponentially growing volume of rice knowledge and other relevant data, however, keeping knowledge up-to-date, accurate and comprehensive through expert curation alone becomes increasingly laborious and time-consuming, and requires a large number of people to get involved in rice knowledge curation. Unlike existing databases, RiceWiki harnesses collective intelligence for community curation of rice genes, quantifying users' contributions to each curated gene and providing explicit authorship for each contributor to any given gene, with the aim of exploiting the full potential of the scientific community for rice knowledge curation. Based on community curation, RiceWiki has the potential to become a rice encyclopedia built by and for the scientific community, one that harnesses community intelligence for collaborative knowledge curation, covers all aspects of biological knowledge and keeps evolving with new knowledge. PMID:24136999

  3. BioSharing: curated and crowd-sourced metadata standards, databases and data policies in the life sciences

    PubMed Central

    McQuilton, Peter; Gonzalez-Beltran, Alejandra; Rocca-Serra, Philippe; Thurston, Milo; Lister, Allyson; Maguire, Eamonn; Sansone, Susanna-Assunta

    2016-01-01

    BioSharing (http://www.biosharing.org) is a manually curated, searchable portal of three linked registries. These resources cover standards (terminologies, formats and models, and reporting guidelines), databases, and data policies in the life sciences, broadly encompassing the biological, environmental and biomedical sciences. Launched in 2011 and built by the same core team as the successful MIBBI portal, BioSharing harnesses community curation to collate and cross-reference resources across the life sciences from around the world. BioSharing makes these resources findable and accessible (the core of the FAIR principle). Every record is designed to be interlinked, providing a detailed description not only on the resource itself, but also on its relations with other life science infrastructures. Serving a variety of stakeholders, BioSharing cultivates a growing community, to which it offers diverse benefits. It is a resource for funding bodies and journal publishers to navigate the metadata landscape of the biological sciences; an educational resource for librarians and information advisors; a publicising platform for standard and database developers/curators; and a research tool for bench and computer scientists to plan their work. BioSharing is working with an increasing number of journals and other registries, for example linking standards and databases to training material and tools. Driven by an international Advisory Board, the BioSharing user-base has grown by over 40% (by unique IP address), in the last year thanks to successful engagement with researchers, publishers, librarians, developers and other stakeholders via several routes, including a joint RDA/Force11 working group and a collaboration with the International Society for Biocuration. In this article, we describe BioSharing, with a particular focus on community-led curation. Database URL: https://www.biosharing.org PMID:27189610

  6. National Solar Radiation Database 1991-2005 Update: User's Manual

    SciTech Connect

    Wilcox, S.

    2007-04-01

    This manual describes how to obtain and interpret the data products from the updated 1991-2005 National Solar Radiation Database (NSRDB). This is an update of the original 1961-1990 NSRDB released in 1992.

  7. National Solar Radiation Database 1991-2010 Update: User's Manual

    SciTech Connect

    Wilcox, S. M.

    2012-08-01

    This user's manual provides information on the updated 1991-2010 National Solar Radiation Database. Included are data format descriptions, data sources, production processes, and information about data uncertainty.

  8. AtlasT4SS: A curated database for type IV secretion systems

    PubMed Central

    2012-01-01

    Background The type IV secretion system (T4SS) is a large family of macromolecule transporter systems, divided into three recognized sub-families according to their functions. The major sub-family is the conjugation system, which allows transfer of genetic material, such as a nucleoprotein, via cell contact among bacteria. The conjugation system can also transfer genetic material from bacteria to eukaryotic cells, as in the T-DNA transfer of Agrobacterium tumefaciens to host plant cells. The system of effector protein transport constitutes the second sub-family, and the third corresponds to the DNA uptake/release system. Genome analyses have revealed numerous T4SS in Bacteria and Archaea. The purpose of this work was to organize, classify, and integrate the T4SS data into a single database, called AtlasT4SS - the first public database devoted exclusively to this prokaryotic secretion system. Description AtlasT4SS is a manually curated database that describes a large number of proteins related to the type IV secretion system reported so far in Gram-negative and Gram-positive bacteria, as well as in Archaea. The database was created using the RDBMS MySQL and the Perl-based Catalyst framework, following the Model-View-Controller (MVC) design pattern for the Web. The current version holds a comprehensive collection of 1,617 T4SS proteins from 58 Bacteria (49 Gram-negative and 9 Gram-positive), one archaeon and 11 plasmids. By applying the bi-directional best hit (BBH) relationship in pairwise genome comparison, it was possible to obtain a core set of 134 clusters of orthologous genes encoding T4SS proteins. Conclusions In our database we present one way of classifying orthologous groups of T4SSs in a hierarchical classification scheme with three levels. The first level comprises four classes that are based on the organization of genetic determinants, shared homologies, and evolutionary
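
    The bi-directional best hit (BBH) criterion mentioned above is simple to state in code. The sketch below is a minimal illustration, not the AtlasT4SS pipeline itself; the gene names and best-hit tables (e.g. as parsed from pairwise BLAST output) are hypothetical.

```python
# Sketch of the bi-directional best hit (BBH) criterion for calling
# orthologs between two genomes A and B. Inputs are hypothetical best-hit
# tables: best_ab maps each gene in A to its top-scoring hit in B, and
# best_ba maps each gene in B to its top-scoring hit in A.

def bidirectional_best_hits(best_ab, best_ba):
    """Return pairs (a, b) where a's best hit in genome B is b,
    and b's best hit in genome A is a (i.e. mutual best hits)."""
    pairs = []
    for a, b in best_ab.items():
        if best_ba.get(b) == a:
            pairs.append((a, b))
    return pairs

# Toy example: the virB4 homologs are mutual best hits; virB6 is not,
# because its best hit in B points back to a different gene in A.
best_ab = {"A_virB4": "B_virB4", "A_virB6": "B_virB5"}
best_ba = {"B_virB4": "A_virB4", "B_virB5": "A_virB5"}

print(bidirectional_best_hits(best_ab, best_ba))  # [('A_virB4', 'B_virB4')]
```

    Clustering the resulting pairs across many genome comparisons then yields groups of orthologous genes like the 134 core clusters reported above.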

  9. Hydrologic database user's manual

    SciTech Connect

    Champman, J.B.; Gray, K.J.; Thompson, C.B.

    1993-09-01

    The Hydrologic Database is an electronic filing cabinet containing water-related data for the Nevada Test Site (NTS). The purpose of the database is to enhance research on hydrologic issues at the NTS by providing efficient access to information gathered by a variety of scientists. Data are often generated for specific projects and are reported to DOE in the context of specific project goals. The originators of the database recognized that much of this information has a general value that transcends project-specific requirements. Allowing researchers access to information generated by a wide variety of projects can prevent needless duplication of data-gathering efforts and can augment new data collection and interpretation. In addition, collecting this information in the database ensures that the results are not lost at the end of discrete projects as long as the database is actively maintained. This document is a guide to using the database.

  10. DEPOT database: Reference manual and user's guide

    SciTech Connect

    Clancey, P.; Logg, C.

    1991-03-01

    DEPOT has been developed to provide tracking for the Stanford Linear Collider (SLC) control system equipment. For each piece of equipment entered into the database, complete location, service, maintenance, modification, certification, and radiation exposure histories can be maintained. To facilitate data entry accuracy, efficiency, and consistency, barcoding technology has been used extensively. DEPOT has been an important tool in improving the reliability of the microsystems controlling SLC. This document describes the components of the DEPOT database, the elements in the database records, and the use of the supporting programs for entering data, searching the database, and producing reports from the information.

  11. Geroprotectors.org: a new, structured and curated database of current therapeutic interventions in aging and age-related disease.

    PubMed

    Moskalev, Alexey; Chernyagina, Elizaveta; de Magalhães, João Pedro; Barardo, Diogo; Thoppil, Harikrishnan; Shaposhnikov, Mikhail; Budovsky, Arie; Fraifeld, Vadim E; Garazha, Andrew; Tsvetkov, Vasily; Bronovitsky, Evgeny; Bogomolov, Vladislav; Scerbacov, Alexei; Kuryan, Oleg; Gurinovich, Roman; Jellen, Leslie C; Kennedy, Brian; Mamoshina, Polina; Dobrovolskaya, Evgeniya; Aliper, Alex; Kaminsky, Dmitry; Zhavoronkov, Alex

    2015-09-01

    As the level of interest in aging research increases, there is a growing number of geroprotectors, or therapeutic interventions that aim to extend the healthy lifespan and repair or reduce aging-related damage in model organisms and, eventually, in humans. There is a clear need for a manually-curated database of geroprotectors to compile and index their effects on aging and age-related diseases and link these effects to relevant studies and multiple biochemical and drug databases. Here, we introduce the first such resource, Geroprotectors (http://geroprotectors.org). Geroprotectors is a public, rapidly explorable database that catalogs over 250 experiments involving over 200 known or candidate geroprotectors that extend lifespan in model organisms. Each compound has a comprehensive profile complete with biochemistry, mechanisms, and lifespan effects in various model organisms, along with information ranging from chemical structure, side effects, and toxicity to FDA drug status. These are presented in a visually intuitive, efficient framework fit for casual browsing or in-depth research alike. Data are linked to the source studies or databases, providing quick and convenient access to original data. The Geroprotectors database facilitates cross-study, cross-organism, and cross-discipline analysis and saves countless hours of inefficient literature and web searching. Geroprotectors is a one-stop, knowledge-sharing, time-saving resource for researchers seeking healthy aging solutions. PMID:26342919

  14. Manual Gene Ontology annotation workflow at the Mouse Genome Informatics Database.

    PubMed

    Drabkin, Harold J; Blake, Judith A

    2012-01-01

    The Mouse Genome Database, the Gene Expression Database and the Mouse Tumor Biology database are integrated components of the Mouse Genome Informatics (MGI) resource (http://www.informatics.jax.org). The MGI system presents both a consensus view and an experimental view of the knowledge concerning the genetics and genomics of the laboratory mouse. From genotype to phenotype, this information resource integrates information about genes, sequences, maps, expression analyses, alleles, strains and mutant phenotypes. Comparative mammalian data are also presented particularly in regards to the use of the mouse as a model for the investigation of molecular and genetic components of human diseases. These data are collected from literature curation as well as downloads of large datasets (SwissProt, LocusLink, etc.). MGI is one of the founding members of the Gene Ontology (GO) and uses the GO for functional annotation of genes. Here, we discuss the workflow associated with manual GO annotation at MGI, from literature collection to display of the annotations. Peer-reviewed literature is collected mostly from a set of journals available electronically. Selected articles are entered into a master bibliography and indexed to one of eight areas of interest such as 'GO' or 'homology' or 'phenotype'. Each article is then either indexed to a gene already contained in the database or funneled through a separate nomenclature database to add genes. The master bibliography and associated indexing provide information for various curator-reports such as 'papers selected for GO that refer to genes with NO GO annotation'. Once indexed, curators who have expertise in appropriate disciplines enter pertinent information. MGI makes use of several controlled vocabularies that ensure uniform data encoding, enable robust analysis and support the construction of complex queries. These vocabularies range from pick-lists to structured vocabularies such as the GO. All data associations are supported
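
    One of the curator reports quoted above ("papers selected for GO that refer to genes with NO GO annotation") can be sketched as a simple query over three tables. The data structures and the paper/gene identifiers below are illustrative stand-ins, not MGI's actual schema.

```python
# Hedged sketch of an MGI-style curator report: find papers indexed for
# 'GO' whose associated genes all lack any GO annotation. All identifiers
# here are made-up examples.

# Master bibliography indexing: paper id -> set of areas of interest.
indexing = {
    "PMID:1": {"GO", "phenotype"},
    "PMID:2": {"homology"},
    "PMID:3": {"GO"},
}
# Paper id -> genes the paper was indexed to.
paper_genes = {"PMID:1": {"Pax6"}, "PMID:2": {"Kit"}, "PMID:3": {"Trp53"}}
# Genes that already carry at least one GO annotation.
go_annotated = {"Pax6"}

def go_papers_without_annotation(indexing, paper_genes, go_annotated):
    """Papers indexed for 'GO' whose genes have no GO annotation yet."""
    return sorted(
        pid for pid, areas in indexing.items()
        if "GO" in areas and not (paper_genes[pid] & go_annotated)
    )

print(go_papers_without_annotation(indexing, paper_genes, go_annotated))
# ['PMID:3']
```

    Reports like this tell curators where annotation effort is still needed, which is the point of maintaining the master bibliography and its indexing.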

  15. A CTD-Pfizer collaboration: manual curation of 88,000 scientific articles text mined for drug-disease and drug-phenotype interactions.

    PubMed

    Davis, Allan Peter; Wiegers, Thomas C; Roberts, Phoebe M; King, Benjamin L; Lay, Jean M; Lennon-Hopkins, Kelley; Sciaky, Daniela; Johnson, Robin; Keating, Heather; Greene, Nigel; Hernandez, Robert; McConnell, Kevin J; Enayetallah, Ahmed E; Mattingly, Carolyn J

    2013-01-01

    Improving the prediction of chemical toxicity is a goal common to both environmental health research and pharmaceutical drug development. To improve safety detection assays, it is critical to have a reference set of molecules with well-defined toxicity annotations for training and validation purposes. Here, we describe a collaboration between safety researchers at Pfizer and the research team at the Comparative Toxicogenomics Database (CTD) to text mine and manually review a collection of 88,629 articles relating over 1,200 pharmaceutical drugs to their potential involvement in cardiovascular, neurological, renal and hepatic toxicity. In 1 year, CTD biocurators curated 254,173 toxicogenomic interactions (152,173 chemical-disease, 58,572 chemical-gene, 5,345 gene-disease and 38,083 phenotype interactions). All chemical-gene-disease interactions are fully integrated with public CTD, and phenotype interactions can be downloaded. We describe Pfizer's text-mining process to collate the articles, and CTD's curation strategy, performance metrics, enhanced data content and new module to curate phenotype information. As well, we show how data integration can connect phenotypes to diseases. This curation can be leveraged for information about toxic endpoints important to drug safety and help develop testable hypotheses for drug-disease events. The availability of these detailed, contextualized, high-quality annotations curated from seven decades' worth of the scientific literature should help facilitate new mechanistic screening assays for pharmaceutical compound survival. This unique partnership demonstrates the importance of resource sharing and collaboration between public and private entities and underscores the complementary needs of the environmental health science and pharmaceutical communities. Database URL: http://ctdbase.org/

  16. Assessment System for Aircraft Noise (ASAN) citation database. Volume 2: Database update manual

    NASA Astrophysics Data System (ADS)

    Reddingius, Nicolaas

    1989-12-01

    The Assessment System for Aircraft Noise (ASAN) includes a database of several thousand references to the literature on the impact of noise and sonic booms on humans, animals and structures. Bibliographic data, abstracts and critical reviews of key documents can be retrieved. The manual for the database maintenance module is presented. It is intended only for use by the maintenance organization to prepare new releases of the database. Several programs used to add, delete and update the database are discussed; these are needed together with Vol. 2 to properly maintain the database.

  17. Human and chicken TLR pathways: manual curation and computer-based orthology analysis

    PubMed Central

    Gillespie, Marc; Shamovsky, Veronica; D’Eustachio, Peter

    2011-01-01

    The innate immune responses mediated by Toll-like receptors (TLR) provide an evolutionarily well-conserved first line of defense against microbial pathogens. In the Reactome Knowledgebase we previously integrated annotations of human TLR molecular functions with those of over 4000 other human proteins involved in processes such as adaptive immunity, DNA replication, signaling, and intermediary metabolism, and have linked these annotations to external resources, including PubMed, UniProt, EntrezGene, Ensembl, and the Gene Ontology to generate a resource suitable for data mining, pathway analysis, and other systems biology approaches. We have now used a combination of manual expert curation and computer-based orthology analysis to generate a set of annotations for TLR molecular function in the chicken (Gallus gallus). Mammalian and avian lineages diverged approximately 300 million years ago, and the avian TLR repertoire consists of both orthologs and distinct new genes. The work described here centers on the molecular biology of TLR3, the host receptor that mediates responses to viral and other doubled-stranded polynucleotides, as a paradigm for our approach to integrated manual and computationally based annotation and data analysis. It tests the quality of computationally generated annotations projected from human onto other species and supports a systems biology approach to analysis of virus-activated signaling pathways and identification of clinically useful antiviral measures. PMID:21052677
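
    The projection of curated human annotations onto chicken via orthology can be sketched as follows. This is a deliberately simplified illustration of the approach described above, not Reactome's actual procedure; the annotations and the ortholog mapping are hypothetical, and a real orthology analysis must also handle the lineage-specific gains and losses noted in the abstract.

```python
# Sketch of projecting manually curated human annotations onto chicken
# orthologs. All names and mappings below are hypothetical placeholders.

human_annotations = {
    "TLR3": {"dsRNA binding", "TRIF-dependent signaling"},
    "TLR9": {"CpG DNA binding"},
}
# Human gene -> chicken ortholog (one-to-one here for simplicity).
# Genes without an entry in the map, like TLR9 here, are simply skipped,
# standing in for genes with no avian ortholog.
orthologs = {"TLR3": "tlr3"}

def project_annotations(human_annotations, orthologs):
    """Copy each human gene's annotations to its chicken ortholog, if any."""
    return {
        chick: sorted(human_annotations[human])
        for human, chick in orthologs.items()
        if human in human_annotations
    }

print(project_annotations(human_annotations, orthologs))
# {'tlr3': ['TRIF-dependent signaling', 'dsRNA binding']}
```

    Projected annotations of this kind are exactly what the manual expert curation step then reviews and corrects.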

  19. The SIB Swiss Institute of Bioinformatics’ resources: focus on curated databases

    PubMed Central

    2016-01-01

    The SIB Swiss Institute of Bioinformatics (www.isb-sib.ch) provides world-class bioinformatics databases, software tools, services and training to the international life science community in academia and industry. These solutions allow life scientists to turn the exponentially growing amount of data into knowledge. Here, we provide an overview of SIB's resources and competence areas, with a strong focus on curated databases and SIB's most popular and widely used resources. In particular, SIB's Bioinformatics resource portal ExPASy features over 150 resources, including UniProtKB/Swiss-Prot, ENZYME, PROSITE, neXtProt, STRING, UniCarbKB, SugarBindDB, SwissRegulon, EPD, arrayMap, Bgee, SWISS-MODEL Repository, OMA, OrthoDB and other databases, which are briefly described in this article. PMID:26615188

  20. The SIB Swiss Institute of Bioinformatics' resources: focus on curated databases.

    PubMed

    2016-01-01

    The SIB Swiss Institute of Bioinformatics (www.isb-sib.ch) provides world-class bioinformatics databases, software tools, services and training to the international life science community in academia and industry. These solutions allow life scientists to turn the exponentially growing amount of data into knowledge. Here, we provide an overview of SIB's resources and competence areas, with a strong focus on curated databases and SIB's most popular and widely used resources. In particular, SIB's Bioinformatics resource portal ExPASy features over 150 resources, including UniProtKB/Swiss-Prot, ENZYME, PROSITE, neXtProt, STRING, UniCarbKB, SugarBindDB, SwissRegulon, EPD, arrayMap, Bgee, SWISS-MODEL Repository, OMA, OrthoDB and other databases, which are briefly described in this article. PMID:26615188

  2. Improving the Discoverability and Availability of Sample Data and Imagery in NASA's Astromaterials Curation Digital Repository Using a New Common Architecture for Sample Databases

    NASA Astrophysics Data System (ADS)

    Todd, N. S.; Evans, C.

    2015-06-01

    NASA’s Astromaterials Curation Office is leading an initiative to create a common framework for its databases to improve discoverability and access to data and imagery from NASA-curated extraterrestrial samples for planetary science researchers.

  3. CeCaFDB: a curated database for the documentation, visualization and comparative analysis of central carbon metabolic flux distributions explored by 13C-fluxomics

    PubMed Central

    Zhang, Zhengdong; Shen, Tie; Rui, Bin; Zhou, Wenwei; Zhou, Xiangfei; Shang, Chuanyu; Xin, Chenwei; Liu, Xiaoguang; Li, Gang; Jiang, Jiansi; Li, Chao; Li, Ruiyuan; Han, Mengshu; You, Shanping; Yu, Guojun; Yi, Yin; Wen, Han; Liu, Zhijie; Xie, Xiaoyao

    2015-01-01

    The Central Carbon Metabolic Flux Database (CeCaFDB, available at http://www.cecafdb.org) is a manually curated, multipurpose and open-access database for the documentation, visualization and comparative analysis of the quantitative flux results of central carbon metabolism among microbes and animal cells. It encompasses records for more than 500 flux distributions among 36 organisms and includes information regarding the genotype, culture medium, growth conditions and other specific information gathered from hundreds of journal articles. In addition to its comprehensive literature-derived data, the CeCaFDB supports a common text search function among the data and interactive visualization of the curated flux distributions with compartmentation information based on the Cytoscape Web API, which facilitates data interpretation. The CeCaFDB offers four modules to calculate a similarity score or to perform an alignment between the flux distributions. One of the modules was built using an integer programming algorithm for flux distribution alignment that was specifically designed for this study. Based on these modules, the CeCaFDB also supports an extensive flux distribution comparison function among the curated data. The CeCaFDB is carefully designed to address the broad demands of biochemists, metabolic engineers, systems biologists and members of the -omics community. PMID:25392417
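
The similarity modules above are described only at a high level; as a rough, hypothetical illustration of scoring the similarity of two flux distributions, one can compare them as vectors over a shared reaction set. The reaction names and flux values below are invented, and the CeCaFDB's actual alignment and scoring algorithms are more sophisticated than this cosine-similarity sketch:

```python
import math

def flux_similarity(flux_a, flux_b):
    """Cosine similarity between two flux distributions given as
    {reaction_id: flux_value} maps; reactions absent from one map
    are treated as carrying zero flux."""
    reactions = set(flux_a) | set(flux_b)
    dot = sum(flux_a.get(r, 0.0) * flux_b.get(r, 0.0) for r in reactions)
    norm_a = math.sqrt(sum(v * v for v in flux_a.values()))
    norm_b = math.sqrt(sum(v * v for v in flux_b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)

# Two toy central-carbon flux maps (values in mmol/gDW/h, invented)
ecoli = {"PGI": 85.0, "PFK": 90.0, "PYK": 70.0}
yeast = {"PGI": 60.0, "PFK": 75.0, "ZWF": 25.0}
score = flux_similarity(ecoli, yeast)
```

A score of 1.0 indicates proportionally identical flux patterns, while distributions sharing no active reactions score 0.0.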

  4. Laminin database: a tool to retrieve high-throughput and curated data for studies on laminins

    PubMed Central

    Golbert, Daiane C. F.; Linhares-Lacerda, Leandra; Almeida, Luiz G.; Correa-de-Santana, Eliane; de Oliveira, Alice R.; Mundstein, Alex S.; Savino, Wilson; de Vasconcelos, Ana T. R.

    2011-01-01

    The Laminin(LM)-database, hosted at http://www.lm.lncc.br, is the first database focusing on a non-collagenous extracellular matrix protein family, the LMs. Part of the knowledge available in this website is automatically retrieved, whereas a significant amount of information is curated and annotated, thus placing LM-database beyond a simple repository of data. In its home page, an overview of the rationale for the database is seen and readers can access a tutorial to facilitate navigation in the website, which in turn is presented with tabs subdivided into LMs, receptors, extracellular binding and other related proteins. Each tab opens into a given LM or LM-related molecule, where the reader finds a series of further tabs for ‘protein’, ‘gene structure’, ‘gene expression’, ‘tissue distribution’ and ‘therapy’. Data are separated as a function of species, comprising Homo sapiens, Mus musculus and Rattus norvegicus. Furthermore, there is a specific tab displaying the LM nomenclatures. Another tab provides a direct link to PubMed, which can then be consulted in a targeted way, in terms of the biological functions of each molecule, knockout animals and genetic diseases, immune response and lymphomas/leukemias. LM-database will hopefully be a relevant tool for retrieving information concerning LMs in health and disease, particularly regarding the hemopoietic system. PMID:21087995

  5. The Developmental Brain Disorders Database (DBDB): a curated neurogenetics knowledge base with clinical and research applications.

    PubMed

    Mirzaa, Ghayda M; Millen, Kathleen J; Barkovich, A James; Dobyns, William B; Paciorkowski, Alex R

    2014-06-01

    The number of single genes associated with neurodevelopmental disorders has increased dramatically over the past decade. The identification of causative genes for these disorders is important to clinical outcome as it allows for accurate assessment of prognosis, genetic counseling, delineation of natural history, inclusion in clinical trials, and in some cases determines therapy. Clinicians face the challenge of correctly identifying neurodevelopmental phenotypes, recognizing syndromes, and prioritizing the best candidate genes for testing. However, there is no central repository of definitions for many phenotypes, leading to errors of diagnosis. Additionally, there is no system of levels of evidence linking genes to phenotypes, making it difficult for clinicians to know which genes are most strongly associated with a given condition. We have developed the Developmental Brain Disorders Database (DBDB: https://www.dbdb.urmc.rochester.edu/home), a publicly available, online-curated repository of genes, phenotypes, and syndromes associated with neurodevelopmental disorders. DBDB contains the first referenced ontology of developmental brain phenotypes, and uses a novel system of levels of evidence for gene-phenotype associations. It is intended to assist clinicians in arriving at the correct diagnosis, selecting the most appropriate genetic test for that phenotype, and improving the care of patients with developmental brain disorders. For researchers interested in the discovery of novel genes for developmental brain disorders, DBDB provides a well-curated source of important genes against which research sequencing results can be compared. Finally, DBDB enables novel observations about the landscape of the neurogenetics knowledge base.

  6. Strategies for annotation and curation of translational databases: the eTUMOUR project

    PubMed Central

    Julià-Sapé, Margarida; Lurgi, Miguel; Mier, Mariola; Estanyol, Francesc; Rafael, Xavier; Candiota, Ana Paula; Barceló, Anna; García, Alina; Martínez-Bisbal, M. Carmen; Ferrer-Luna, Rubén; Moreno-Torres, Àngel; Celda, Bernardo; Arús, Carles

    2012-01-01

    The eTUMOUR (eT) multi-centre project gathered in vivo and ex vivo magnetic resonance (MR) data, as well as transcriptomic and clinical information from brain tumour patients, with the purpose of improving the diagnostic and prognostic evaluation of future patients. In order to carry this out, among other work, a database—the eTDB—was developed. In addition to complex permission rules and software and management quality control (QC), it was necessary to develop anonymization, processing and data visualization tools for the data uploaded. It was also necessary to develop sophisticated curation strategies that involved on one hand, dedicated fields for QC-generated meta-data and specialized queries and global permissions for senior curators and on the other, to establish a set of metrics to quantify its contents. The indispensable dataset (ID), completeness and pairedness indices were set. The database contains 1317 cases created as a result of the eT project and 304 from a previous project, INTERPRET. The number of cases fulfilling the ID was 656. Completeness and pairedness were heterogeneous, depending on the data type involved. PMID:23180768

  7. ProPepper: a curated database for identification and analysis of peptide and immune-responsive epitope composition of cereal grain protein families

    PubMed Central

    Juhász, Angéla; Haraszi, Réka; Maulis, Csaba

    2015-01-01

    ProPepper is a database that contains prolamin proteins identified from true grasses (Poaceae), their peptides obtained with single- and multi-enzyme in silico digestions as well as linear T- and B-cell-specific epitopes that are responsible for wheat-related food disorders. The integrated database and analysis platform contains datasets that are collected from multiple public databases (UniprotKB, IEDB, NCBI GenBank), manually curated and annotated, and interpreted in three main data tables: Protein-, Peptide- and Epitope list views that are cross-connected by unique identifications. Altogether 21 genera and 80 different species are represented. Currently, the database contains 2146 unique and complete protein sequences related to 2618 GenBank entries and 35 657 unique peptide sequences that are a result of 575 110 unique digestion events obtained by in silico digestion methods involving six proteolytic enzymes and their combinations. The interface allows advanced global and parametric search functions along with a download option, with direct connections to the relevant public databases. Database URL: https://propepper.net PMID:26450949

  8. ProPepper: a curated database for identification and analysis of peptide and immune-responsive epitope composition of cereal grain protein families.

    PubMed

    Juhász, Angéla; Haraszi, Réka; Maulis, Csaba

    2015-01-01

    ProPepper is a database that contains prolamin proteins identified from true grasses (Poaceae), their peptides obtained with single- and multi-enzyme in silico digestions as well as linear T- and B-cell-specific epitopes that are responsible for wheat-related food disorders. The integrated database and analysis platform contains datasets that are collected from multiple public databases (UniprotKB, IEDB, NCBI GenBank), manually curated and annotated, and interpreted in three main data tables: Protein-, Peptide- and Epitope list views that are cross-connected by unique identifications. Altogether 21 genera and 80 different species are represented. Currently, the database contains 2146 unique and complete protein sequences related to 2618 GenBank entries and 35 657 unique peptide sequences that are a result of 575 110 unique digestion events obtained by in silico digestion methods involving six proteolytic enzymes and their combinations. The interface allows advanced global and parametric search functions along with a download option, with direct connections to the relevant public databases. Database URL: https://propepper.net.

  9. The Candida genome database incorporates multiple Candida species: multispecies search and analysis tools with curated gene and protein information for Candida albicans and Candida glabrata.

    PubMed

    Inglis, Diane O; Arnaud, Martha B; Binkley, Jonathan; Shah, Prachi; Skrzypek, Marek S; Wymore, Farrell; Binkley, Gail; Miyasato, Stuart R; Simison, Matt; Sherlock, Gavin

    2012-01-01

    The Candida Genome Database (CGD, http://www.candidagenome.org/) is an internet-based resource that provides centralized access to genomic sequence data and manually curated functional information about genes and proteins of the fungal pathogen Candida albicans and other Candida species. As the scope of Candida research, and the number of sequenced strains and related species, has grown in recent years, the need for expanded genomic resources has also grown. To answer this need, CGD has expanded beyond storing data solely for C. albicans, now integrating data from multiple species. Herein we describe the incorporation of this multispecies information, which includes curated gene information and the reference sequence for C. glabrata, as well as orthology relationships that interconnect Locus Summary pages, allowing easy navigation between genes of C. albicans and C. glabrata. These orthology relationships are also used to predict GO annotations of their products. We have also added protein information pages that display domains, structural information and physicochemical properties; bibliographic pages highlighting important topic areas in Candida biology; and a laboratory strain lineage page that describes the lineage of commonly used laboratory strains. All of these data are freely available at http://www.candidagenome.org/. We welcome feedback from the research community at candida-curator@lists.stanford.edu.

  10. Using binary classification to prioritize and curate articles for the Comparative Toxicogenomics Database.

    PubMed

    Vishnyakova, Dina; Pasche, Emilie; Ruch, Patrick

    2012-01-01

    We report on the original integration of an automatic text categorization pipeline, so-called ToxiCat (Toxicogenomic Categorizer), that we developed to perform biomedical documents classification and prioritization in order to speed up the curation of the Comparative Toxicogenomics Database (CTD). The task can be basically described as a binary classification task, where a scoring function is used to rank a selected set of articles. Then components of a question-answering system are used to extract CTD-specific annotations from the ranked list of articles. The ranking function is generated using a Support Vector Machine, which combines three main modules: an information retrieval engine for MEDLINE (EAGLi), a gene normalization service (NormaGene) developed for a previous BioCreative campaign and finally, a set of answering components and entity recognizer for diseases and chemicals. The main components of the pipeline are publicly available both as web application and web services. The specific integration performed for the BioCreative competition is available via a web user interface at http://pingu.unige.ch:8080/Toxicat.
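
The ranking step above can be pictured as a linear decision function applied to each article, with positive scores flagging likely curatable documents. The keyword weights and bias below are invented for illustration only; the actual ToxiCat pipeline learns its scoring function with a Support Vector Machine over far richer features (retrieval scores, normalized gene mentions, recognized chemicals and diseases):

```python
# Hypothetical weights for a linear decision function; a real system
# such as ToxiCat would learn these from labeled training articles.
WEIGHTS = {"chemical": 1.4, "gene": 1.1, "toxicity": 1.8, "weather": -0.9}
BIAS = -1.0

def score(text):
    """Decision score for one article; positive means 'prioritize for curation'."""
    return BIAS + sum(WEIGHTS.get(token, 0.0) for token in text.lower().split())

def prioritize(articles):
    """Rank candidate articles for curators, highest decision score first."""
    return sorted(articles, key=score, reverse=True)

articles = [
    "weather report for geneva",
    "gene expression changes after chemical toxicity exposure",
    "chemical safety guidelines",
]
ranked = prioritize(articles)
```

Ranking by a continuous score, rather than emitting only a binary label, lets curators work down the list from the most promising articles and stop when precision drops.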

  11. UNSODA UNSATURATED SOIL HYDRAULIC DATABASE USER'S MANUAL VERSION 1.0

    EPA Science Inventory

    This report contains general documentation and serves as a user manual of the UNSODA program. UNSODA is a database of unsaturated soil hydraulic properties (water retention, hydraulic conductivity, and soil water diffusivity), basic soil properties (particle-size distribution, b...

  12. MALIDUP: a database of manually constructed structure alignments for duplicated domain pairs.

    PubMed

    Cheng, Hua; Kim, Bong-Hyun; Grishin, Nick V

    2008-03-01

    We describe MALIDUP (manual alignments of duplicated domains), a database of 241 pairwise structure alignments for homologous domains originated by internal duplication within the same polypeptide chain. Since duplicated domains within a protein frequently diverge in function and thus in sequence, this would be the first database of structurally similar homologs that is not strongly biased by sequence or functional similarity. Our manual alignments in most cases agree with the automatic structural alignments generated by several commonly used programs. This carefully constructed database could be used in studies on protein evolution and as a reference for testing structure alignment programs. The database is available at http://prodata.swmed.edu/malidup. PMID:17932926

  13. Procedures for Conducting a Job Analysis: A Manual for the COMTASK Database.

    ERIC Educational Resources Information Center

    Alvic, Fadia M.; Newkirk-Moore, Susan

    This manual provides the information needed to conduct a job analysis and to enter or update job analysis information in the Computerized Task Inventory (COMTASK) database. Chapter I presents the purpose and organization of the manual. The second chapter provides a brief background on the purpose and design of COMTASK, a definition of terms used…

  14. Selective databases distributed on the basis of Frascati manual

    PubMed Central

    Kujundzic, Enes; Masic, Izet

    2013-01-01

    Introduction The answer to the question of what a database is, and of its relevance to scientific research, is not a simple one. We would not be wrong to say that it is, basically, a kind of information resource, often incomparably richer than, for example, a single book or magazine. Discussion and conclusion Databases appeared as a form of knowledge storage and retrieval in the information age, in which we now participate. Thanks to the technical possibilities of information networks, they can be searched for a wealth of more or less relevant information, including profound scientific content. Databases are divided into: bibliographic databases, citation databases and databases containing full text. The paper briefly presents the most important online databases with links to their websites. Thanks to these online databases, scientific knowledge spreads much more easily and usefully. PMID:23572867

  15. PANTHER: a browsable database of gene products organized by biological function, using curated protein family and subfamily classification.

    PubMed

    Thomas, Paul D; Kejariwal, Anish; Campbell, Michael J; Mi, Huaiyu; Diemer, Karen; Guo, Nan; Ladunga, Istvan; Ulitsky-Lazareva, Betty; Muruganujan, Anushya; Rabkin, Steven; Vandergriff, Jody A; Doremieux, Olivier

    2003-01-01

    The PANTHER database was designed for high-throughput analysis of protein sequences. One of the key features is a simplified ontology of protein function, which allows browsing of the database by biological functions. Biologist curators have associated the ontology terms with groups of protein sequences rather than individual sequences. Statistical models (Hidden Markov Models, or HMMs) are built from each of these groups. The advantage of this approach is that new sequences can be automatically classified as they become available. To ensure accurate functional classification, HMMs are constructed not only for families, but also for functionally distinct subfamilies. Multiple sequence alignments and phylogenetic trees, including curator-assigned information, are available for each family. The current version of the PANTHER database includes training sequences from all organisms in the GenBank non-redundant protein database, and the HMMs have been used to classify gene products across the entire genomes of human and Drosophila melanogaster. The ontology terms and protein families and subfamilies, as well as Drosophila gene classifications, can be browsed and searched for free. Due to outstanding contractual obligations, access to human gene classifications and to protein family trees and multiple sequence alignments will temporarily require a nominal registration fee. PANTHER is publicly available on the web at http://panther.celera.com.

  16. The Aspergillus Genome Database (AspGD): recent developments in comprehensive multispecies curation, comparative genomics and community resources.

    PubMed

    Arnaud, Martha B; Cerqueira, Gustavo C; Inglis, Diane O; Skrzypek, Marek S; Binkley, Jonathan; Chibucos, Marcus C; Crabtree, Jonathan; Howarth, Clinton; Orvis, Joshua; Shah, Prachi; Wymore, Farrell; Binkley, Gail; Miyasato, Stuart R; Simison, Matt; Sherlock, Gavin; Wortman, Jennifer R

    2012-01-01

    The Aspergillus Genome Database (AspGD; http://www.aspgd.org) is a freely available, web-based resource for researchers studying fungi of the genus Aspergillus, which includes organisms of clinical, agricultural and industrial importance. AspGD curators have now completed comprehensive review of the entire published literature about Aspergillus nidulans and Aspergillus fumigatus, and this annotation is provided with streamlined, ortholog-based navigation of the multispecies information. AspGD facilitates comparative genomics by providing a full-featured genomics viewer, as well as matched and standardized sets of genomic information for the sequenced aspergilli. AspGD also provides resources to foster interaction and dissemination of community information and resources. We welcome and encourage feedback at aspergillus-curator@lists.stanford.edu.

  17. TWRS information locator database system administrator's manual

    SciTech Connect

    Knutson, B.J., Westinghouse Hanford

    1996-09-13

    This document is a guide for use by the Tank Waste Remediation System (TWRS) Information Locator Database (ILD) System Administrator. The TWRS ILD System is an inventory of information used in the TWRS Systems Engineering process to represent the TWRS Technical Baseline. The inventory is maintained in the form of a relational database developed in Paradox 4.5.

  18. Nuclear Energy Infrastructure Database Description and User’s Manual

    SciTech Connect

    Heidrich, Brenden

    2015-11-01

    In 2014, the Deputy Assistant Secretary for Science and Technology Innovation initiated the Nuclear Energy (NE)–Infrastructure Management Project by tasking the Nuclear Science User Facilities, formerly the Advanced Test Reactor National Scientific User Facility, to create a searchable and interactive database of all pertinent NE-supported and -related infrastructure. This database, known as the Nuclear Energy Infrastructure Database (NEID), is used for analyses to establish needs, redundancies, efficiencies, distributions, etc., to best understand the utility of NE’s infrastructure and inform the content of infrastructure calls. The Nuclear Science User Facilities developed the database by utilizing data and policy direction from a variety of reports from the U.S. Department of Energy, the National Research Council, the International Atomic Energy Agency, and various other federal and civilian resources. The NEID currently contains data on 802 research and development instruments housed in 377 facilities at 84 institutions in the United States and abroad. The effort to maintain and expand the database is ongoing. Detailed information on many facilities must be gathered from associated institutions and added to complete the database. The data must be validated and kept current to capture facility and instrumentation status as well as to cover new acquisitions and retirements. This document provides a short tutorial on the navigation of the NEID web portal at NSUF-Infrastructure.INL.gov.

  19. Improving the Discoverability and Availability of Sample Data and Imagery in NASA's Astromaterials Curation Digital Repository Using a New Common Architecture for Sample Databases

    NASA Technical Reports Server (NTRS)

    Todd, N. S.; Evans, C.

    2015-01-01

    The Astromaterials Acquisition and Curation Office at NASA's Johnson Space Center (JSC) is the designated facility for curating all of NASA's extraterrestrial samples. The suite of collections includes the lunar samples from the Apollo missions, cosmic dust particles falling into the Earth's atmosphere, meteorites collected in Antarctica, comet and interstellar dust particles from the Stardust mission, asteroid particles from the Japanese Hayabusa mission, and solar wind atoms collected during the Genesis mission. To support planetary science research on these samples, NASA's Astromaterials Curation Office hosts the Astromaterials Curation Digital Repository, which provides descriptions of the missions and collections, and critical information about each individual sample. Our office is implementing several informatics initiatives with the goal of better serving the planetary research community. One of these initiatives aims to increase the availability and discoverability of sample data and images through the use of a newly designed common architecture for Astromaterials Curation databases.

  20. ODIN. Online Database Information Network: ODIN Policy & Procedure Manual.

    ERIC Educational Resources Information Center

    Townley, Charles T.; And Others

    Policies and procedures are outlined for the Online Database Information Network (ODIN), a cooperative of libraries in south-central Pennsylvania, which was organized to improve library services through technology. The first section covers organization and goals, members, and responsibilities of the administrative council and libraries. Patrons…

  1. TRENDS: A flight test relational database user's guide and reference manual

    NASA Technical Reports Server (NTRS)

    Bondi, M. J.; Bjorkman, W. S.; Cross, J. L.

    1994-01-01

    This report is designed to be a user's guide and reference manual for users intending to access rotorcraft test data via TRENDS, the relational database system which was developed as a tool for the aeronautical engineer with no programming background. This report has been written to assist novice and experienced TRENDS users. TRENDS is a complete system for retrieving, searching, and analyzing both numerical and narrative data, and for displaying time history and statistical data in graphical and numerical formats. This manual provides a 'guided tour' and a 'user's guide' for new and intermediate-skilled users. Examples of the use of each menu item within TRENDS are provided in the Menu Reference section of the manual, including full coverage of TIMEHIST, one of the key tools. This manual is written around the XV-15 Tilt Rotor database, but does include an appendix on the UH-60 Blackhawk database. This user's guide and reference manual establishes a referable source for the research community and augments NASA TM-101025, TRENDS: The Aeronautical Post-Test Database Management System, Jan. 1990, written by the same authors.

  2. Practical guidelines addressing ethical issues pertaining to the curation of human locus-specific variation databases (LSDBs).

    PubMed

    Povey, Sue; Al Aqeel, Aida I; Cambon-Thomsen, Anne; Dalgleish, Raymond; den Dunnen, Johan T; Firth, Helen V; Greenblatt, Marc S; Barash, Carol Isaacson; Parker, Michael; Patrinos, George P; Savige, Judith; Sobrido, Maria-Jesus; Winship, Ingrid; Cotton, Richard G H

    2010-11-01

    More than 1,000 Web-based locus-specific variation databases (LSDBs) are listed on the Website of the Human Genetic Variation Society (HGVS). These individual efforts, which often relate phenotype to genotype, are a valuable source of information for clinicians, patients, and their families, as well as for basic research. The initiators of the Human Variome Project recently recognized that having access to some of the immense resources of unpublished information already present in diagnostic laboratories would provide critical data to help manage genetic disorders. However, there are significant ethical issues involved in sharing these data worldwide. An international working group presents second-generation guidelines addressing ethical issues relating to the curation of human LSDBs that provide information via a Web-based interface. It is intended that these should help current and future curators and may also inform the future decisions of ethics committees and legislators. These guidelines have been reviewed by the Ethics Committee of the Human Genome Organization (HUGO).

  3. R-Syst::diatom: an open-access and curated barcode database for diatoms and freshwater monitoring.

    PubMed

    Rimet, Frédéric; Chaumeil, Philippe; Keck, François; Kermarrec, Lenaïg; Vasselon, Valentin; Kahlert, Maria; Franc, Alain; Bouchez, Agnès

    2016-01-01

    Diatoms are micro-algal indicators of freshwater pollution. Current standardized methodologies are based on microscopic determinations, which are time-consuming and prone to identification uncertainties. The use of DNA-barcoding has been proposed as a way to avoid these flaws. Combining barcoding with next-generation sequencing enables collection of a large quantity of barcodes from natural samples. These barcodes are identified as certain diatom taxa by comparing the sequences to a reference barcoding library using algorithms. Proof of concept was recently demonstrated for synthetic and natural communities and underlined the importance of the quality of this reference library. We present an open-access and curated reference barcoding database for diatoms, called R-Syst::diatom, developed in the framework of R-Syst, the network of systematics supported by INRA (French National Institute for Agricultural Research), see http://www.rsyst.inra.fr/en. R-Syst::diatom links DNA-barcodes to their taxonomical identifications, and is dedicated to identifying barcodes from natural samples. The data come from two sources: a culture collection of freshwater algae maintained in INRA, in which new strains are regularly deposited and barcoded, and the NCBI (National Center for Biotechnology Information) nucleotide database. Two kinds of barcodes were chosen to support the database, 18S (18S ribosomal RNA) and rbcL (Ribulose-1,5-bisphosphate carboxylase/oxygenase), because of their efficiency. Data are curated using innovative (Declic) and classical bioinformatic tools (Blast, classical phylogenies) and up-to-date taxonomy (catalogues and peer-reviewed papers). Every 6 months R-Syst::diatom is updated. The database is available through the R-Syst microalgae website (http://www.rsyst.inra.fr/) and a platform dedicated to next-generation sequencing data analysis, virtual_BiodiversityL@b (https://galaxy-pgtp.pierroton.inra.fr/). 
We present here the content of the library regarding the

  4. R-Syst::diatom: an open-access and curated barcode database for diatoms and freshwater monitoring

    PubMed Central

    Rimet, Frédéric; Chaumeil, Philippe; Keck, François; Kermarrec, Lenaïg; Vasselon, Valentin; Kahlert, Maria; Franc, Alain; Bouchez, Agnès

    2016-01-01

    Diatoms are micro-algal indicators of freshwater pollution. Current standardized methodologies are based on microscopic determinations, which are time-consuming and prone to identification uncertainties. The use of DNA barcoding has been proposed as a way to avoid these flaws. Combining barcoding with next-generation sequencing enables collection of a large quantity of barcodes from natural samples. These barcodes are identified as certain diatom taxa by comparing the sequences to a reference barcoding library using algorithms. Proof of concept was recently demonstrated for synthetic and natural communities and underlined the importance of the quality of this reference library. We present an open-access and curated reference barcoding database for diatoms, called R-Syst::diatom, developed in the framework of R-Syst, the network of systematics supported by INRA (French National Institute for Agricultural Research); see http://www.rsyst.inra.fr/en. R-Syst::diatom links DNA barcodes to their taxonomic identifications and is dedicated to identifying barcodes from natural samples. The data come from two sources: a culture collection of freshwater algae maintained at INRA, in which new strains are regularly deposited and barcoded, and the NCBI (National Center for Biotechnology Information) nucleotide database. Two kinds of barcodes were chosen to support the database, 18S (18S ribosomal RNA) and rbcL (ribulose-1,5-bisphosphate carboxylase/oxygenase), because of their efficiency. Data are curated using innovative (Declic) and classical bioinformatic tools (BLAST, classical phylogenies) and up-to-date taxonomy (catalogues and peer-reviewed papers). R-Syst::diatom is updated every 6 months. The database is available through the R-Syst microalgae website (http://www.rsyst.inra.fr/) and a platform dedicated to next-generation sequencing data analysis, virtual_BiodiversityL@b (https://galaxy-pgtp.pierroton.inra.fr/).
We present here the content of the library regarding the
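The identification step described above, matching query barcodes against a reference library, can be sketched as a toy similarity search. This is illustrative only, not the algorithm used by R-Syst::diatom: the sequences are invented and the k-mer/Jaccard scoring is a hypothetical stand-in for a real alignment-based comparison.

```python
def kmer_set(seq, k=5):
    """Return the set of overlapping k-mers in a DNA sequence."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}

def identify(query, reference, k=5, min_score=0.5):
    """Assign a query barcode to the reference taxon with the highest
    k-mer Jaccard similarity; fall back to 'unclassified' below min_score."""
    best_taxon, best_score = "unclassified", min_score
    q = kmer_set(query, k)
    for taxon, ref_seq in reference.items():
        r = kmer_set(ref_seq, k)
        union = q | r
        score = len(q & r) / len(union) if union else 0.0
        if score > best_score:
            best_taxon, best_score = taxon, score
    return best_taxon

# Hypothetical reference library: taxon name -> barcode fragment.
REFERENCE = {
    "Nitzschia palea": "ATGGCTTACGGATCCAAGT",
    "Fragilaria crotonensis": "CCGTTAACGGTTAAGCCGT",
}
```

A query identical to a reference barcode is assigned to that taxon; a query sharing no k-mers with any reference stays "unclassified", which mirrors why reference-library completeness matters so much for environmental samples.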

  5. Negatome 2.0: a database of non-interacting proteins derived by literature mining, manual annotation and protein structure analysis.

    PubMed

    Blohm, Philipp; Frishman, Goar; Smialowski, Pawel; Goebels, Florian; Wachinger, Benedikt; Ruepp, Andreas; Frishman, Dmitrij

    2014-01-01

    Knowledge about non-interacting proteins (NIPs) is important for training algorithms to predict protein-protein interactions (PPIs) and for assessing the false positive rates of PPI detection efforts. We present the second version of Negatome, a database of proteins and protein domains that are unlikely to engage in physical interactions (available online at http://mips.helmholtz-muenchen.de/proj/ppi/negatome). Negatome is derived by manual curation of the literature and by analyzing three-dimensional structures of protein complexes. The main methodological innovation in Negatome 2.0 is the utilization of an advanced text-mining procedure to guide the manual annotation process. Potential non-interactions were identified by a modified version of Excerbt, a text-mining tool based on semantic sentence analysis. Manual verification shows that nearly half of the text-mining results with the highest confidence values correspond to NIP pairs. Compared to the first version, the contents of the database have grown by over 300%.
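The curation funnel described above, ranking text-mined candidate pairs by confidence and manually verifying the top of the list, can be sketched as follows. The record fields and example pairs are hypothetical, not Excerbt output.

```python
def top_candidates(pairs, n):
    """Return the n candidate non-interactions with the highest
    text-mining confidence, for prioritized manual review."""
    return sorted(pairs, key=lambda p: p["confidence"], reverse=True)[:n]

def precision(candidates):
    """Fraction of reviewed candidates confirmed as non-interacting (NIP) pairs."""
    if not candidates:
        return 0.0
    return sum(c["is_nip"] for c in candidates) / len(candidates)

# Hypothetical text-mined candidates with manual verdicts attached.
CANDIDATES = [
    {"pair": ("P53", "ACT1"), "confidence": 0.9, "is_nip": True},
    {"pair": ("RAS1", "TUB2"), "confidence": 0.8, "is_nip": False},
    {"pair": ("CDC2", "HIS3"), "confidence": 0.4, "is_nip": True},
    {"pair": ("SEC1", "GAL4"), "confidence": 0.2, "is_nip": False},
]
```

With this toy data, reviewing the two highest-confidence candidates confirms one of two as a true NIP pair, the "nearly half" pattern the abstract reports at scale.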

  6. Classifying the bacterial gut microbiota of termites and cockroaches: A curated phylogenetic reference database (DictDb).

    PubMed

    Mikaelyan, Aram; Köhler, Tim; Lampert, Niclas; Rohland, Jeffrey; Boga, Hamadi; Meuser, Katja; Brune, Andreas

    2015-10-01

    Recent developments in sequencing technology have given rise to a large number of studies that assess bacterial diversity and community structure in termite and cockroach guts based on large amplicon libraries of 16S rRNA genes. Although these studies have revealed important ecological and evolutionary patterns in the gut microbiota, classification of the short sequence reads is limited by the taxonomic depth and resolution of the reference databases used in the respective studies. Here, we present a curated reference database for accurate taxonomic analysis of the bacterial gut microbiota of dictyopteran insects. The Dictyopteran gut microbiota reference Database (DictDb) is based on the Silva database but was significantly expanded by the addition of clones from 11 mostly unexplored termite and cockroach groups, which increased the inventory of bacterial sequences from dictyopteran guts by 26%. The taxonomic depth and resolution of DictDb were significantly improved by a general revision of the taxonomic guide tree for all important lineages, including a detailed phylogenetic analysis of the Treponema and Alistipes complexes, the Fibrobacteres, and the TG3 phylum. In classifying short-read libraries obtained from termites and cockroaches, this first documented version of DictDb (v. 3.0), which uses the revised taxonomic guide tree, performed far better than the current Silva and RDP databases. DictDb uses an informative nomenclature that is consistent with the literature, including for clades of uncultured bacteria, and provides an invaluable tool for anyone exploring the gut community structure of termites and cockroaches.
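Classifying a short read against a taxonomic guide tree often falls back on a lowest-common-ancestor step when the best database hits disagree below some rank. A minimal sketch, with hypothetical lineages not taken from DictDb:

```python
def lowest_common_ancestor(lineages):
    """Return the longest shared rank prefix of several taxonomic lineages,
    each ordered from the highest rank (e.g. domain) down to genus."""
    shared = []
    for ranks in zip(*lineages):
        if all(r == ranks[0] for r in ranks):
            shared.append(ranks[0])
        else:
            break
    return shared

# Hypothetical lineages of the top database hits for one short read.
HITS = [
    ["Bacteria", "Spirochaetes", "Spirochaetia", "Treponema"],
    ["Bacteria", "Spirochaetes", "Spirochaetia", "Treponema"],
    ["Bacteria", "Spirochaetes", "Spirochaetia", "Borrelia"],
]
```

Because the hits disagree at genus level, the read is assigned only down to class; a deeper, better-resolved guide tree lets more reads be classified to lower ranks, which is exactly the improvement the abstract claims.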

  7. The Coral Trait Database, a curated database of trait information for coral species from the global oceans.

    PubMed

    Madin, Joshua S; Anderson, Kristen D; Andreasen, Magnus Heide; Bridge, Tom C L; Cairns, Stephen D; Connolly, Sean R; Darling, Emily S; Diaz, Marcela; Falster, Daniel S; Franklin, Erik C; Gates, Ruth D; Hoogenboom, Mia O; Huang, Danwei; Keith, Sally A; Kosnik, Matthew A; Kuo, Chao-Yang; Lough, Janice M; Lovelock, Catherine E; Luiz, Osmar; Martinelli, Julieta; Mizerek, Toni; Pandolfi, John M; Pochon, Xavier; Pratchett, Morgan S; Putnam, Hollie M; Roberts, T Edward; Stat, Michael; Wallace, Carden C; Widman, Elizabeth; Baird, Andrew H

    2016-01-01

    Trait-based approaches advance ecological and evolutionary research because traits provide a strong link to an organism's function and fitness. Trait-based research might lead to a deeper understanding of the functions of, and services provided by, ecosystems, thereby improving management, which is vital in the current era of rapid environmental change. Coral reef scientists have long collected trait data for corals; however, these are difficult to access and often under-utilized in addressing large-scale questions. We present the Coral Trait Database initiative that aims to bring together physiological, morphological, ecological, phylogenetic and biogeographic trait information into a single repository. The database houses species- and individual-level data from published field and experimental studies alongside contextual data that provide important framing for analyses. In this data descriptor, we release data for 56 traits for 1547 species, and present a collaborative platform on which other trait data are being actively federated. Our overall goal is for the Coral Trait Database to become an open-source, community-led data clearinghouse that accelerates coral reef research. PMID:27023900
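A species-by-trait repository like the one described can be queried for trait coverage, i.e. how many species have data for each trait. A minimal sketch with invented observation records (the species names are real coral taxa, but the values are not data from the database):

```python
def trait_coverage(observations):
    """Count how many distinct species have at least one value per trait."""
    species_by_trait = {}
    for obs in observations:
        species_by_trait.setdefault(obs["trait"], set()).add(obs["species"])
    return {trait: len(sp) for trait, sp in species_by_trait.items()}

# Hypothetical observation records (values are invented, units omitted).
OBSERVATIONS = [
    {"species": "Acropora millepora", "trait": "growth rate", "value": 12.3},
    {"species": "Porites lobata", "trait": "growth rate", "value": 4.1},
    {"species": "Acropora millepora", "trait": "corallite size", "value": 0.8},
]
```

Coverage summaries like this are how a trait database exposes gaps, e.g. traits measured for only a handful of the 1547 species, to guide future data federation.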

  8. The Coral Trait Database, a curated database of trait information for coral species from the global oceans

    PubMed Central

    Madin, Joshua S.; Anderson, Kristen D.; Andreasen, Magnus Heide; Bridge, Tom C.L.; Cairns, Stephen D.; Connolly, Sean R.; Darling, Emily S.; Diaz, Marcela; Falster, Daniel S.; Franklin, Erik C.; Gates, Ruth D.; Hoogenboom, Mia O.; Huang, Danwei; Keith, Sally A.; Kosnik, Matthew A.; Kuo, Chao-Yang; Lough, Janice M.; Lovelock, Catherine E.; Luiz, Osmar; Martinelli, Julieta; Mizerek, Toni; Pandolfi, John M.; Pochon, Xavier; Pratchett, Morgan S.; Putnam, Hollie M.; Roberts, T. Edward; Stat, Michael; Wallace, Carden C.; Widman, Elizabeth; Baird, Andrew H.

    2016-01-01

    Trait-based approaches advance ecological and evolutionary research because traits provide a strong link to an organism’s function and fitness. Trait-based research might lead to a deeper understanding of the functions of, and services provided by, ecosystems, thereby improving management, which is vital in the current era of rapid environmental change. Coral reef scientists have long collected trait data for corals; however, these are difficult to access and often under-utilized in addressing large-scale questions. We present the Coral Trait Database initiative that aims to bring together physiological, morphological, ecological, phylogenetic and biogeographic trait information into a single repository. The database houses species- and individual-level data from published field and experimental studies alongside contextual data that provide important framing for analyses. In this data descriptor, we release data for 56 traits for 1547 species, and present a collaborative platform on which other trait data are being actively federated. Our overall goal is for the Coral Trait Database to become an open-source, community-led data clearinghouse that accelerates coral reef research. PMID:27023900

  9. The Coral Trait Database, a curated database of trait information for coral species from the global oceans.

    PubMed

    Madin, Joshua S; Anderson, Kristen D; Andreasen, Magnus Heide; Bridge, Tom C L; Cairns, Stephen D; Connolly, Sean R; Darling, Emily S; Diaz, Marcela; Falster, Daniel S; Franklin, Erik C; Gates, Ruth D; Hoogenboom, Mia O; Huang, Danwei; Keith, Sally A; Kosnik, Matthew A; Kuo, Chao-Yang; Lough, Janice M; Lovelock, Catherine E; Luiz, Osmar; Martinelli, Julieta; Mizerek, Toni; Pandolfi, John M; Pochon, Xavier; Pratchett, Morgan S; Putnam, Hollie M; Roberts, T Edward; Stat, Michael; Wallace, Carden C; Widman, Elizabeth; Baird, Andrew H

    2016-03-29

    Trait-based approaches advance ecological and evolutionary research because traits provide a strong link to an organism's function and fitness. Trait-based research might lead to a deeper understanding of the functions of, and services provided by, ecosystems, thereby improving management, which is vital in the current era of rapid environmental change. Coral reef scientists have long collected trait data for corals; however, these are difficult to access and often under-utilized in addressing large-scale questions. We present the Coral Trait Database initiative that aims to bring together physiological, morphological, ecological, phylogenetic and biogeographic trait information into a single repository. The database houses species- and individual-level data from published field and experimental studies alongside contextual data that provide important framing for analyses. In this data descriptor, we release data for 56 traits for 1547 species, and present a collaborative platform on which other trait data are being actively federated. Our overall goal is for the Coral Trait Database to become an open-source, community-led data clearinghouse that accelerates coral reef research.

  10. The development of an Ada programming support environment database: SEAD (Software Engineering and Ada Database), user's manual

    NASA Technical Reports Server (NTRS)

    Liaw, Morris; Evesson, Donna

    1988-01-01

    This is a manual for users of the Software Engineering and Ada Database (SEAD). SEAD was developed to provide an information resource to NASA and NASA contractors with respect to Ada-based resources and activities that are available or underway either in NASA or elsewhere in the worldwide Ada community. The sharing of such information will reduce the duplication of effort while improving quality in the development of future software systems. The manual describes the organization of the data in SEAD, the user interface from logging in to logging out, and concludes with a ten-chapter tutorial on how to use the information in SEAD. Two appendices provide quick reference for logging into SEAD and using the keyboard of an IBM 3270 or VT100 computer terminal.

  11. DREMECELS: A Curated Database for Base Excision and Mismatch Repair Mechanisms Associated Human Malignancies.

    PubMed

    Shukla, Ankita; Moussa, Ahmed; Singh, Tiratha Raj

    2016-01-01

    DNA repair mechanisms act as warriors combating the various damaging processes that give rise to critical malignancies. DREMECELS was designed with a focus on malignancies showing frequent alterations in DNA repair pathways, that is, colorectal and endometrial cancers, associated with Lynch syndrome (also known as HNPCC). Since Lynch syndrome carries a high risk (~40-60%) for both cancers, we decided to cover all three diseases in this portal. Although a large population is presently affected by these malignancies, and many resources are available for various cancer types, no database archives information on the genes specific to these cancers and disorders alone. The database contains 156 genes and two repair mechanisms, base excision repair (BER) and mismatch repair (MMR). Other parameters include some of the regulatory processes that contribute to the progression of these diseases through incompetent repair mechanisms, specifically BER and MMR. Our database uniquely provides qualitative and quantitative information on these cancer types along with methylation, drug sensitivity, miRNA, copy number variation (CNV) and somatic mutation data. This database serves the scientific community by providing integrated information on these disease types, thus supporting diagnostic and therapeutic processes. This repository should be an excellent companion for researchers and biomedical professionals and facilitate the understanding of such critical diseases. DREMECELS is publicly available at http://www.bioinfoindia.org/dremecels.

  12. dbEM: A database of epigenetic modifiers curated from cancerous and normal genomes

    PubMed Central

    Singh Nanda, Jagpreet; Kumar, Rahul; Raghava, Gajendra P. S.

    2016-01-01

    We have developed a database called dbEM (database of Epigenetic Modifiers) to maintain the genomic information of about 167 epigenetic modifiers/proteins, which are considered potential cancer targets. In dbEM, modifiers are classified on a functional basis and comprise 48 histone methyltransferases, 33 chromatin remodelers and 31 histone demethylases. dbEM maintains genomic information such as mutations, copy number variation and gene expression in thousands of tumor samples, cancer cell lines and healthy samples. This information is obtained from public resources, viz. COSMIC, CCLE and the 1000 Genomes Project. Gene essentiality data retrieved from the COLT database further highlight the importance of various epigenetic proteins for cancer survival. We have also reported the sequence profiles, tertiary structures and post-translational modifications of these epigenetic proteins in cancer. dbEM also contains information on 54 drug molecules against different epigenetic proteins. A wide range of tools has been integrated in dbEM, e.g. search, BLAST, alignment and profile-based prediction. In our analysis, we found that the epigenetic proteins DNMT3A, HDAC2, KDM6A and TET2 are highly mutated in a variety of cancers. We are confident that dbEM will be very useful in cancer research, particularly in the field of epigenetic protein-based cancer therapeutics. This database is publicly available at http://crdd.osdd.net/raghava/dbem. PMID:26777304
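The kind of analysis mentioned at the end, finding which modifiers are highly mutated across tumor samples, reduces to counting distinct mutated samples per gene. A minimal sketch with invented sample identifiers (the gene symbols are the ones named in the abstract; the counts are not dbEM data):

```python
def mutation_frequencies(mutations, n_samples):
    """Fraction of samples in which each gene carries at least one mutation.

    mutations: iterable of (sample_id, gene_symbol) pairs; a sample is
    counted once per gene even if it carries several mutations in it.
    """
    mutated = {}
    for sample, gene in mutations:
        mutated.setdefault(gene, set()).add(sample)
    return {gene: len(samples) / n_samples for gene, samples in mutated.items()}
```

Ranking genes by this frequency across cohorts is how recurrently altered modifiers such as DNMT3A or TET2 surface from raw mutation tables.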

  13. DREMECELS: A Curated Database for Base Excision and Mismatch Repair Mechanisms Associated Human Malignancies.

    PubMed

    Shukla, Ankita; Moussa, Ahmed; Singh, Tiratha Raj

    2016-01-01

    DNA repair mechanisms act as warriors combating the various damaging processes that give rise to critical malignancies. DREMECELS was designed with a focus on malignancies showing frequent alterations in DNA repair pathways, that is, colorectal and endometrial cancers, associated with Lynch syndrome (also known as HNPCC). Since Lynch syndrome carries a high risk (~40-60%) for both cancers, we decided to cover all three diseases in this portal. Although a large population is presently affected by these malignancies, and many resources are available for various cancer types, no database archives information on the genes specific to these cancers and disorders alone. The database contains 156 genes and two repair mechanisms, base excision repair (BER) and mismatch repair (MMR). Other parameters include some of the regulatory processes that contribute to the progression of these diseases through incompetent repair mechanisms, specifically BER and MMR. Our database uniquely provides qualitative and quantitative information on these cancer types along with methylation, drug sensitivity, miRNA, copy number variation (CNV) and somatic mutation data. This database serves the scientific community by providing integrated information on these disease types, thus supporting diagnostic and therapeutic processes. This repository should be an excellent companion for researchers and biomedical professionals and facilitate the understanding of such critical diseases. DREMECELS is publicly available at http://www.bioinfoindia.org/dremecels. PMID:27276067

  14. DREMECELS: A Curated Database for Base Excision and Mismatch Repair Mechanisms Associated Human Malignancies

    PubMed Central

    Shukla, Ankita; Singh, Tiratha Raj

    2016-01-01

    DNA repair mechanisms act as warriors combating the various damaging processes that give rise to critical malignancies. DREMECELS was designed with a focus on malignancies showing frequent alterations in DNA repair pathways, that is, colorectal and endometrial cancers, associated with Lynch syndrome (also known as HNPCC). Since Lynch syndrome carries a high risk (~40–60%) for both cancers, we decided to cover all three diseases in this portal. Although a large population is presently affected by these malignancies, and many resources are available for various cancer types, no database archives information on the genes specific to these cancers and disorders alone. The database contains 156 genes and two repair mechanisms, base excision repair (BER) and mismatch repair (MMR). Other parameters include some of the regulatory processes that contribute to the progression of these diseases through incompetent repair mechanisms, specifically BER and MMR. Our database uniquely provides qualitative and quantitative information on these cancer types along with methylation, drug sensitivity, miRNA, copy number variation (CNV) and somatic mutation data. This database serves the scientific community by providing integrated information on these disease types, thus supporting diagnostic and therapeutic processes. This repository should be an excellent companion for researchers and biomedical professionals and facilitate the understanding of such critical diseases. DREMECELS is publicly available at http://www.bioinfoindia.org/dremecels. PMID:27276067

  15. BioModels Database: An enhanced, curated and annotated resource for published quantitative kinetic models

    PubMed Central

    2010-01-01

    Background Quantitative models of biochemical and cellular systems are used to answer a variety of questions in the biological sciences. The number of published quantitative models is growing steadily thanks to increasing interest in the use of models as well as the development of improved software systems and the availability of better, cheaper computer hardware. To maximise the benefits of this growing body of models, the field needs centralised model repositories that will encourage, facilitate and promote model dissemination and reuse. Ideally, the models stored in these repositories should be extensively tested and encoded in community-supported and standardised formats. In addition, the models and their components should be cross-referenced with other resources in order to allow their unambiguous identification. Description BioModels Database http://www.ebi.ac.uk/biomodels/ is aimed at addressing exactly these needs. It is a freely accessible online resource for storing, viewing, retrieving, and analysing published, peer-reviewed quantitative models of biochemical and cellular systems. The structure and behaviour of each simulation model distributed by BioModels Database are thoroughly checked; in addition, model elements are annotated with terms from controlled vocabularies as well as linked to relevant data resources. Models can be examined online or downloaded in various formats. Reaction network diagrams generated from the models are also available in several formats. BioModels Database also provides features such as online simulation and the extraction of components from large-scale models into smaller submodels. Finally, the system provides a range of web services that external software systems can use to access up-to-date data from the database. Conclusions BioModels Database has become a recognised reference resource for systems biology. 
It is being used by the community in a variety of ways; for example, it is used to benchmark different simulation
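The submodel-extraction feature mentioned above can be illustrated with a naive sketch: keep only the reactions that touch a chosen set of species. The reaction records below are hypothetical, not BioModels data or its actual extraction algorithm (which must also carry over kinetics, parameters and annotations):

```python
def extract_submodel(reactions, species_of_interest):
    """Keep the ids of reactions involving at least one species of interest,
    a naive stand-in for extracting a submodel from a larger kinetic model."""
    kept = []
    for rxn in reactions:
        if (set(rxn["reactants"]) | set(rxn["products"])) & species_of_interest:
            kept.append(rxn["id"])
    return kept

# Hypothetical reaction network (not an SBML model from the database).
REACTIONS = [
    {"id": "R1", "reactants": ["A"], "products": ["B"]},
    {"id": "R2", "reactants": ["C"], "products": ["D"]},
    {"id": "R3", "reactants": ["B"], "products": ["C"]},
]
```

Selecting around species "B" pulls in only the reactions that produce or consume it, leaving the rest of the model behind.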

  16. Assessment System for Aircraft Noise (ASAN) citation database. Volume 1: User's manual

    NASA Astrophysics Data System (ADS)

    Reddingius, Nicolaas

    1989-12-01

    The Assessment System for Aircraft Noise (ASAN) includes a database of several thousand references to the literature on the impact of noise and sonic booms on humans, animals and structures. Bibliographic data, abstracts and critical reviews of key documents can be retrieved. A user's manual for the retrieval module is presented. It describes the types of searches that can be conducted and the options for each, and explains all user screens. Installation instructions for the stand-alone MS-DOS version are included.

  17. aglgenes, A curated and searchable database of archaeal N-glycosylation pathway components.

    PubMed

    Godin, Noa; Eichler, Jerry

    2014-01-01

    Whereas N-glycosylation is a posttranslational modification performed across evolution, the archaeal version of this protein-processing event presents a degree of diversity not seen in either bacteria or eukarya. Accordingly, archaeal N-glycosylation relies on a large number of enzymes that are often species-specific or restricted to a select group of species. As such, there is a need for an organized platform upon which amassing information about archaeal glycosylation (agl) genes can rest. To this end, the aglgenes database provides detailed descriptions of experimentally characterized archaeal N-glycosylation pathway components. For each agl gene, genomic information, supporting literature and relevant external links are provided via a functional, intuitive web interface designed for data browsing. Routine updates ensure that novel experimental information on genes and proteins contributing to archaeal N-glycosylation is incorporated into aglgenes in a timely manner. As such, aglgenes represents a specialized resource for sharing validated experimental information online, providing support for workers in the field of archaeal protein glycosylation. Database URL: www.bgu.ac.il/aglgenes.

  18. Sno/scaRNAbase: a curated database for small nucleolar RNAs and cajal body-specific RNAs.

    PubMed

    Xie, Jun; Zhang, Ming; Zhou, Tao; Hua, Xia; Tang, Lisha; Wu, Weilin

    2007-01-01

    Small nucleolar RNAs (snoRNAs) and Cajal body-specific RNAs (scaRNAs) are named for their subcellular localization within nucleoli and Cajal bodies (conserved subnuclear organelles present in the nucleoplasm), respectively. They have been found to play important roles in rRNA, tRNA, snRNA, and even mRNA modification and processing. All snoRNAs fall into two categories, box C/D snoRNAs and box H/ACA snoRNAs, according to their distinct sequence and secondary-structure features. Box C/D snoRNAs and box H/ACA snoRNAs mainly function in guiding 2'-O-ribose methylation and pseudouridylation, respectively. ScaRNAs possess the sequence motif features of both box C/D and box H/ACA snoRNAs, but guide modifications of snRNAs transcribed by RNA polymerase II. Here we present a Web-based sno/scaRNA database, called sno/scaRNAbase, to facilitate sno/scaRNA research by providing a more comprehensive knowledge base. Covering 1979 records derived from 85 organisms for the first time, sno/scaRNAbase is not only dedicated to filling gaps between existing organism-specific sno/scaRNA databases that are focused on different sno/scaRNA aspects, but also provides sno/scaRNA scientists with an opportunity to adopt a unified nomenclature for sno/scaRNAs. Derived from a systematic literature curation and annotation effort, sno/scaRNAbase provides an easy-to-use gateway to important sno/scaRNA features such as sequence motifs, possible functions, homologues, secondary structures, genomic organization, chromosomal locations of sno/scaRNA genes, and more. Approximate searches, in addition to accurate and straightforward searches, make database searching more flexible. A BLAST search engine is implemented to enable BLAST of query sequences against all sno/scaRNAbase sequences. Thus our sno/scaRNAbase serves as a more uniform and friendly platform for sno/scaRNA research. The database is freely available at http://gene.fudan.sh.cn/snoRNAbase.nsf. PMID:17099227

  19. Development of an Ada programming support environment database SEAD (Software Engineering and Ada Database) administration manual

    NASA Technical Reports Server (NTRS)

    Liaw, Morris; Evesson, Donna

    1988-01-01

    Software Engineering and Ada Database (SEAD) was developed to provide an information resource to NASA and NASA contractors with respect to Ada-based resources and activities which are available or underway either in NASA or elsewhere in the worldwide Ada community. The sharing of such information will reduce duplication of effort while improving quality in the development of future software systems. SEAD data is organized into five major areas: information regarding education and training resources which are relevant to the life cycle of Ada-based software engineering projects such as those in the Space Station program; research publications relevant to NASA projects such as the Space Station Program and conferences relating to Ada technology; the latest progress reports on Ada projects completed or in progress both within NASA and throughout the free world; Ada compilers and other commercial products that support Ada software development; and reusable Ada components generated both within NASA and from elsewhere in the free world. This classified listing of reusable components shall include descriptions of tools, libraries, and other components of interest to NASA. Sources for the data include technical newsletters and periodicals, conference proceedings, the Ada Information Clearinghouse, product vendors, and project sponsors and contractors.

  20. The Protist Ribosomal Reference database (PR2): a catalog of unicellular eukaryote small sub-unit rRNA sequences with curated taxonomy.

    PubMed

    Guillou, Laure; Bachar, Dipankar; Audic, Stéphane; Bass, David; Berney, Cédric; Bittner, Lucie; Boutte, Christophe; Burgaud, Gaétan; de Vargas, Colomban; Decelle, Johan; Del Campo, Javier; Dolan, John R; Dunthorn, Micah; Edvardsen, Bente; Holzmann, Maria; Kooistra, Wiebe H C F; Lara, Enrique; Le Bescot, Noan; Logares, Ramiro; Mahé, Frédéric; Massana, Ramon; Montresor, Marina; Morard, Raphael; Not, Fabrice; Pawlowski, Jan; Probert, Ian; Sauvadet, Anne-Laure; Siano, Raffaele; Stoeck, Thorsten; Vaulot, Daniel; Zimmermann, Pascal; Christen, Richard

    2013-01-01

    The interrogation of genetic markers in environmental meta-barcoding studies is currently seriously hindered by the lack of taxonomically curated reference data sets for the targeted genes. The Protist Ribosomal Reference database (PR2, http://ssu-rrna.org/) provides unique access to eukaryotic small sub-unit (SSU) ribosomal RNA and DNA sequences, with curated taxonomy. The database mainly consists of nuclear-encoded protistan sequences. However, metazoans, land plants, macrosporic fungi and eukaryotic organelles (mitochondrion, plastid and others) are also included because they are useful for the analysis of high-throughput sequencing data sets. Introns and putative chimeric sequences have also been carefully checked. Taxonomic assignment of sequences consists of eight unique taxonomic fields. In total, 136 866 sequences are nuclear encoded, 45 708 (36 501 mitochondrial and 9657 chloroplastic) are from organelles, the remainder being putative chimeric sequences. The website allows users to download sequences from the entire or partial database (including representative sequences after clustering at a given level of similarity). Different web tools also allow searches by sequence similarity. The presence of both rRNA and rDNA sequences, the consideration of introns (crucial for eukaryotic sequences), a normalized eight-term ranked taxonomy and updates with new GenBank releases were made possible by a long-term collaboration between experts in taxonomy and computer scientists.
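The eight-field ranked taxonomy can be handled as a simple semicolon-delimited string. A sketch assuming the rank names commonly associated with PR2 (kingdom through species); the example lineage is illustrative rather than an exact database record:

```python
RANKS = ["kingdom", "supergroup", "division", "class",
         "order", "family", "genus", "species"]

def parse_taxonomy(path):
    """Split a semicolon-delimited taxonomy string into the eight
    ranked fields, rejecting malformed strings."""
    fields = path.strip(";").split(";")
    if len(fields) != len(RANKS):
        raise ValueError("expected %d fields, got %d" % (len(RANKS), len(fields)))
    return dict(zip(RANKS, fields))
```

Enforcing a fixed number of ranks at parse time is what makes a "normalized ranked taxonomy" usable downstream: every sequence annotation is guaranteed to answer the same eight questions.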

  1. User's manual (UM) for the enhanced logistics intratheater support tool (ELIST) database utility segment version 8.1.0.0 for solaris 7.

    SciTech Connect

    Dritz, K.

    2002-03-06

    This document is the User's Manual (UM) for the Enhanced Logistics Intratheater Support Tool (ELIST) Database Utility Segment. It tells how to use its features to administer ELIST database user accounts.

  2. Building an efficient curation workflow for the Arabidopsis literature corpus.

    PubMed

    Li, Donghui; Berardini, Tanya Z; Muller, Robert J; Huala, Eva

    2012-01-01

    TAIR (The Arabidopsis Information Resource) is the model organism database (MOD) for Arabidopsis thaliana, a model plant with a literature corpus of about 39 000 articles in PubMed, with over 4300 new articles added in 2011. We have developed a literature curation workflow incorporating both automated and manual elements to cope with this flood of new research articles. The current workflow can be divided into two phases: article selection and curation. Structured controlled vocabularies, such as the Gene Ontology and Plant Ontology, are used to capture free-text information in the literature as succinct ontology-based annotations suitable for the application of computational analysis methods. We also describe our curation platform and the use of text mining tools in our workflow. Database URL: www.arabidopsis.org PMID:23221298
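Capturing free text as succinct ontology-based annotations, as described above, amounts to recording structured tuples whose terms are validated against a controlled vocabulary. A minimal sketch; the vocabulary excerpt below is illustrative rather than taken from TAIR's pipeline (GO:0009507 is the Gene Ontology term for "chloroplast", GO:0008150 for "biological_process"):

```python
# Illustrative excerpt of a controlled vocabulary: term id -> label.
ONTOLOGY = {
    "GO:0009507": "chloroplast",
    "GO:0008150": "biological_process",
}

def annotate(gene, term_id, evidence):
    """Record an observation as a succinct ontology-based annotation,
    rejecting terms absent from the controlled vocabulary."""
    if term_id not in ONTOLOGY:
        raise KeyError("unknown ontology term: " + term_id)
    return {"gene": gene, "term": term_id,
            "label": ONTOLOGY[term_id], "evidence": evidence}
```

Validating term ids at annotation time is what keeps the resulting records machine-analyzable: every annotation resolves to a known node in the ontology rather than to free text.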

  3. Structuring osteosarcoma knowledge: an osteosarcoma-gene association database based on literature mining and manual annotation.

    PubMed

    Poos, Kathrin; Smida, Jan; Nathrath, Michaela; Maugg, Doris; Baumhoer, Daniel; Neumann, Anna; Korsching, Eberhard

    2014-01-01

    Osteosarcoma (OS) is the most common primary bone cancer and exhibits high genomic instability. This genomic instability affects multiple genes and microRNAs to a varying extent depending on patient and tumor subtype. Massive research is ongoing to identify genes, including their gene products and microRNAs, that correlate with disease progression and might be used as biomarkers for OS. However, the genomic complexity hampers the identification of reliable biomarkers. Up to now, clinico-pathological factors have been the key determinants guiding prognosis and therapeutic treatment. Each day, new studies about OS are published and complicate the acquisition of the information needed to support biomarker discovery and therapeutic improvements. Thus, it is necessary to provide a structured and annotated view of current OS knowledge that is quickly and easily accessible to researchers in the field. We therefore developed a publicly available database and Web interface that serve as a resource for OS-associated genes and microRNAs. Genes and microRNAs were collected using an automated dictionary-based gene recognition procedure followed by manual review and annotation by experts in the field. In total, 911 genes and 81 microRNAs related to 1331 PubMed abstracts were collected (last update: 29 October 2013). Users can evaluate genes and microRNAs according to their potential prognostic and therapeutic impact, the experimental procedures, the sample types, the biological contexts and microRNA target gene interactions. Additionally, a pathway enrichment analysis of the collected genes highlights different aspects of OS progression. OS requires pathways commonly deregulated in cancer but also features OS-specific alterations like deregulated osteoclast differentiation. To our knowledge, this is the first OS database containing manually reviewed and annotated, up-to-date OS knowledge. It might be a useful resource especially for the bone tumor research community, as specific
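The automated dictionary-based gene recognition step that precedes manual review can be sketched as a whole-word match of abstract text against a symbol dictionary. The dictionary below is an illustrative subset, not the database's actual 911-gene list:

```python
import re

# Illustrative subset of a gene-symbol dictionary.
GENE_DICTIONARY = {"TP53", "RB1", "RECQL4", "MYC"}

def recognize_genes(text):
    """Return dictionary gene symbols occurring as whole uppercase tokens
    in an abstract; matches would then be queued for manual review."""
    tokens = set(re.findall(r"\b[A-Z0-9]{2,}\b", text))
    return sorted(tokens & GENE_DICTIONARY)
```

Restricting matches to whole uppercase tokens avoids flagging ordinary words, but real pipelines still need the manual-review step the abstract describes, since gene symbols collide with abbreviations and with each other.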

  5. Genetic variations and diseases in UniProtKB/Swiss-Prot: the ins and outs of expert manual curation.

    PubMed

    Famiglietti, Maria Livia; Estreicher, Anne; Gos, Arnaud; Bolleman, Jerven; Géhant, Sébastien; Breuza, Lionel; Bridge, Alan; Poux, Sylvain; Redaschi, Nicole; Bougueleret, Lydie; Xenarios, Ioannis

    2014-08-01

    During the last few years, next-generation sequencing (NGS) technologies have accelerated the detection of genetic variants resulting in the rapid discovery of new disease-associated genes. However, the wealth of variation data made available by NGS alone is not sufficient to understand the mechanisms underlying disease pathogenesis and manifestation. Multidisciplinary approaches combining sequence and clinical data with prior biological knowledge are needed to unravel the role of genetic variants in human health and disease. In this context, it is crucial that these data are linked, organized, and made readily available through reliable online resources. The Swiss-Prot section of the Universal Protein Knowledgebase (UniProtKB/Swiss-Prot) provides the scientific community with a collection of information on protein functions, interactions, biological pathways, as well as human genetic diseases and variants, all manually reviewed by experts. In this article, we present an overview of the information content of UniProtKB/Swiss-Prot to show how this knowledgebase can support researchers in the elucidation of the mechanisms leading from a molecular defect to a disease phenotype.

  6. Making the procedure manual come alive: A prototype relational database and dynamic website model for the management of nursing information.

    PubMed

    Peace, Jane; Brennan, Patricia Flatley

    2006-01-01

    The nursing procedure manual is an essential resource for clinical practice, yet ensuring its currency and availability at the point of care remains an unresolved information-management challenge for nurses. While standard HTML-based web pages offer significant advantages over paper compilations, employing emerging computer science tools offers even greater promise. This paper reports on the creation of a prototype dynamic web-based nursing procedure manual driven by a relational database. We created a relational database in MySQL to manage, store, and link the procedure information, and developed PHP files to guide content retrieval, content management, and display on demand in browser-viewable format. This database-driven dynamic website model is an important innovation for meeting the challenge of content management and dissemination of nursing information.
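    The retrieval pattern described above (a relational store queried by a thin display script) can be sketched in a few lines. This is an illustrative analogue only: the paper used MySQL and PHP, whereas the sketch below uses Python's built-in sqlite3, and the table layout and sample procedure are invented.

```python
import sqlite3

# Minimal stand-in for the paper's MySQL schema; column names are guessed.
conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE procedure_manual (
                    id INTEGER PRIMARY KEY,
                    title TEXT NOT NULL,
                    last_reviewed TEXT,
                    content TEXT)""")
conn.execute(
    "INSERT INTO procedure_manual (title, last_reviewed, content) VALUES (?, ?, ?)",
    ("Central line dressing change", "2006-01-01",
     "1. Perform hand hygiene. 2. Don sterile gloves. ..."))

# A display script (PHP in the original) fetches one procedure on demand,
# so readers always see the currently stored version rather than a stale
# paper copy.
row = conn.execute(
    "SELECT title, content FROM procedure_manual WHERE id = ?", (1,)).fetchone()
print(row[0])  # -> Central line dressing change
```

    Because every page view re-queries the database, updating a procedure in one place immediately updates what every reader sees, which is the core advantage over static HTML pages the abstract points to.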

  7. The NCBI Taxonomy database.

    PubMed

    Federhen, Scott

    2012-01-01

    The NCBI Taxonomy database (http://www.ncbi.nlm.nih.gov/taxonomy) is the standard nomenclature and classification repository for the International Nucleotide Sequence Database Collaboration (INSDC), comprising the GenBank, ENA (EMBL) and DDBJ databases. It includes organism names and taxonomic lineages for each of the sequences represented in the INSDC's nucleotide and protein sequence databases. The taxonomy database is manually curated by a small group of scientists at the NCBI who use the current taxonomic literature to maintain a phylogenetic taxonomy for the source organisms represented in the sequence databases. The taxonomy database is a central organizing hub for many of the resources at the NCBI: it provides a means for clustering elements within other domains of the NCBI web site, for internal linking between domains of the Entrez system and for linking out to taxon-specific external resources on the web. Our primary purpose is to index the domain of sequences as conveniently as possible for our user community.

  8. PFR²: a curated database of planktonic foraminifera 18S ribosomal DNA as a resource for studies of plankton ecology, biogeography and evolution.

    PubMed

    Morard, Raphaël; Darling, Kate F; Mahé, Frédéric; Audic, Stéphane; Ujiié, Yurika; Weiner, Agnes K M; André, Aurore; Seears, Heidi A; Wade, Christopher M; Quillévéré, Frédéric; Douady, Christophe J; Escarguel, Gilles; de Garidel-Thoron, Thibault; Siccha, Michael; Kucera, Michal; de Vargas, Colomban

    2015-11-01

    Planktonic foraminifera (Rhizaria) are ubiquitous marine pelagic protists producing calcareous shells with conspicuous morphology. They play an important role in the marine carbon cycle, and their exceptional fossil record serves as the basis for biochronostratigraphy and past climate reconstructions. A major worldwide sampling effort over the last two decades has resulted in the establishment of multiple large collections of cryopreserved individual planktonic foraminifera samples. Thousands of 18S rDNA partial sequences have been generated, representing all major known morphological taxa across their worldwide oceanic range. This comprehensive data coverage provides an opportunity to assess patterns of molecular ecology and evolution in a holistic way for an entire group of planktonic protists. We combined all available published and unpublished genetic data to build PFR², the Planktonic foraminifera Ribosomal Reference database. The first version of the database includes 3322 reference 18S rDNA sequences belonging to 32 of the 47 known morphospecies of extant planktonic foraminifera, collected from 460 oceanic stations. All sequences have been rigorously taxonomically curated using a six-rank annotation system fully resolved to the morphological species level and linked to a series of metadata. The PFR² website, available at http://pfr2.sb-roscoff.fr, allows downloading the entire database or specific sections, as well as the identification of new planktonic foraminiferal sequences. Its novel, fully documented curation process integrates advances in morphological and molecular taxonomy, allows its taxonomic resolution to increase, and ensures integrity by maintaining complete tracking of annotations so that they remain internally consistent. PMID:25828689

  10. Scaling drug indication curation through crowdsourcing

    PubMed Central

    Khare, Ritu; Burger, John D.; Aberdeen, John S.; Tresner-Kirsch, David W.; Corrales, Theodore J.; Hirschman, Lynette; Lu, Zhiyong

    2015-01-01

    Motivated by the high cost of human curation of biological databases, there is an increasing interest in using computational approaches to assist human curators and accelerate the manual curation process. Towards the goal of cataloging drug indications from FDA drug labels, we recently developed LabeledIn, a human-curated drug indication resource for 250 clinical drugs. Its development required over 40 h of human effort across 20 weeks, despite using well-defined annotation guidelines. In this study, we aim to investigate the feasibility of scaling drug indication annotation through a crowdsourcing technique where an unknown network of workers can be recruited through the technical environment of Amazon Mechanical Turk (MTurk). To translate the expert-curation task of cataloging indications into human intelligence tasks (HITs) suitable for the average workers on MTurk, we first simplify the complex task such that each HIT only involves a worker making a binary judgment of whether a highlighted disease, in context of a given drug label, is an indication. In addition, this study is novel in the crowdsourcing interface design where the annotation guidelines are encoded into user options. For evaluation, we assess the ability of our proposed method to achieve high-quality annotations in a time-efficient and cost-effective manner. We posted over 3000 HITs drawn from 706 drug labels on MTurk. Within 8 h of posting, we collected 18 775 judgments from 74 workers, and achieved an aggregated accuracy of 96% on 450 control HITs (where gold-standard answers are known), at a cost of $1.75 per drug label. On the basis of these results, we conclude that our crowdsourcing approach not only results in significant cost and time saving, but also leads to accuracy comparable to that of domain experts. Database URL: ftp://ftp.ncbi.nlm.nih.gov/pub/lu/LabeledIn/Crowdsourcing/. PMID:25797061
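    The control-HIT evaluation described above can be mimicked in a few lines of code. This is a hedged sketch, not the study's actual pipeline: the aggregation rule (simple majority vote over redundant judgments) and the toy judgments below are assumptions.

```python
from collections import Counter

def majority_vote(judgments):
    """Aggregate redundant binary worker judgments ('yes'/'no')."""
    return Counter(judgments).most_common(1)[0][0]

# Toy control HITs: worker judgments paired with known gold answers.
control_hits = {
    "hit-1": (["yes", "yes", "no"], "yes"),
    "hit-2": (["no", "no", "no"], "no"),
    "hit-3": (["yes", "no", "no"], "yes"),
}

correct = sum(majority_vote(votes) == gold
              for votes, gold in control_hits.values())
accuracy = correct / len(control_hits)
print(f"aggregated accuracy on controls: {accuracy:.0%}")  # -> 67%
```

    Scoring aggregated answers against a held-out set of gold-standard controls, as in the study, is what allows crowd accuracy to be compared directly against expert annotation.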

  11. The needs for chemistry standards, database tools and data curation at the chemical-biology interface (SLAS meeting)

    EPA Science Inventory

    This presentation will highlight known challenges with the production of high quality chemical databases and outline recent efforts made to address these challenges. Specific examples will be provided illustrating these challenges within the U.S. Environmental Protection Agency ...

  12. PhytoREF: a reference database of the plastidial 16S rRNA gene of photosynthetic eukaryotes with curated taxonomy.

    PubMed

    Decelle, Johan; Romac, Sarah; Stern, Rowena F; Bendif, El Mahdi; Zingone, Adriana; Audic, Stéphane; Guiry, Michael D; Guillou, Laure; Tessier, Désiré; Le Gall, Florence; Gourvil, Priscillia; Dos Santos, Adriana L; Probert, Ian; Vaulot, Daniel; de Vargas, Colomban; Christen, Richard

    2015-11-01

    Photosynthetic eukaryotes have a critical role as the main producers in most ecosystems of the biosphere. The ongoing environmental metabarcoding revolution opens the prospect of holistic, ecosystem-wide biological studies of these organisms, in particular the unicellular microalgae, which often lack distinctive morphological characters and have complex life cycles. To interpret environmental sequences, metabarcoding necessarily relies on taxonomically curated databases containing reference sequences of the targeted gene (or barcode) from identified organisms. To date, no such reference framework exists for photosynthetic eukaryotes. In this study, we built the PhytoREF database, which contains 6490 plastidial 16S rDNA reference sequences originating from a large diversity of eukaryotes representing all known major photosynthetic lineages. We compiled 3333 amplicon sequences available from public databases and 879 sequences extracted from plastidial genomes, and generated 411 novel sequences from cultured marine microalgal strains belonging to different eukaryotic lineages. A total of 1867 environmental Sanger 16S rDNA sequences were also included in the database. Stringent quality filtering and a phylogeny-based taxonomic classification were applied to each 16S rDNA sequence. The database mainly focuses on marine microalgae, but sequences from land plants (representing half of the PhytoREF sequences) and freshwater taxa were also included to broaden the applicability of PhytoREF to different aquatic and terrestrial habitats. PhytoREF, accessible via a web interface (http://phytoref.fr), is a new resource in molecular ecology to foster the discovery, assessment and monitoring of the diversity of photosynthetic eukaryotes using high-throughput sequencing.
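    The composition figures in this abstract are internally consistent: the four stated sequence sources sum exactly to the 6490 sequences in the database, as the quick check below confirms (source labels are paraphrased from the abstract).

```python
# Sequence sources quoted in the PhytoREF abstract.
sources = {
    "public-database amplicons": 3333,
    "plastid genome extractions": 879,
    "new cultured-strain sequences": 411,
    "environmental Sanger sequences": 1867,
}

total = sum(sources.values())
print(total)  # -> 6490, the stated PhytoREF total
```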

  13. Tox-Database.net: a curated resource for data describing chemical triggered in vitro cardiac ion channels inhibition

    PubMed Central

    2012-01-01

    Background Drug safety issues are now recognized as one of the main reasons for drug withdrawal at various stages of development and at the post-approval stage. Among them, cardiotoxicity remains the leading cause, despite the substantial effort put into in vitro and in vivo testing, with the main focus on hERG channel inhibition as the hypothesized surrogate of a drug's proarrhythmic potency. The strong interest in the IKr current has resulted in the development of predictive tools and informative databases describing a drug's susceptibility to interactions with the hERG channel, but there are no similar, publicly available data sets describing the other ionic currents driven by human cardiomyocyte ion channels, which are recognized as an overlooked drug safety target. Discussion The aim of developing and publishing this database was to provide a scientifically useful, easily usable and clearly verifiable set of information describing not only IKr (hERG) inhibition but also inhibition of other human cardiomyocyte-specific ion channels (IKs, INa, ICa). Summary The broad range of data (chemical space and in vitro settings) and the easy-to-use interface make tox-database.net a useful tool for interested scientists. Database URL http://tox-database.net. PMID:22947121

  14. 76 FR 30997 - National Transit Database: Amendments to Urbanized Area Annual Reporting Manual

    Federal Register 2010, 2011, 2012, 2013, 2014

    2011-05-27

    ... Federal Register (73 FR 7361) inviting comments on proposed amendments to the 2011 Annual Manual. This... Federal Register (75 FR 192) inviting comments on proposed amendments to the 2011 Annual Manual. FTA... vehicles or level-platform boarding, and separate branding of the service. High-frequency service...

  15. miRGate: a curated database of human, mouse and rat miRNA-mRNA targets.

    PubMed

    Andrés-León, Eduardo; González Peña, Daniel; Gómez-López, Gonzalo; Pisano, David G

    2015-01-01

    MicroRNAs (miRNAs) are small non-coding elements involved in the post-transcriptional down-regulation of gene expression through base pairing with messenger RNAs (mRNAs). Through this mechanism, several miRNA-mRNA pairs have been described as critical in the regulation of multiple cellular processes, including early embryonic development and pathological conditions. Many of these pairs (such as miR-15b/BCL2 in apoptosis or BART-6/BCL6 in diffuse large B-cell lymphomas) were experimentally discovered and/or computationally predicted. Available tools for target prediction are usually based on sequence matching, thermodynamics and conservation, among other approaches. Nevertheless, the main issue in miRNA-mRNA pair prediction is the limited overlap among the results of different prediction methods, or even with lists of experimentally validated pairs, despite the fact that all rely on similar principles. To circumvent this problem, we have developed miRGate, a database containing novel computationally predicted miRNA-mRNA pairs calculated using well-established algorithms. In addition, it includes an updated and complete dataset of sequences for both miRNAs and mRNA 3'-untranslated regions from human (including human viruses), mouse and rat, as well as experimentally validated data from four well-known databases. The underlying methodology of miRGate has been successfully applied to independent datasets, providing predictions that were convincingly validated by functional assays. miRGate is an open resource available at http://mirgate.bioinfo.cnio.es. For programmatic access, we have provided a representational state transfer (REST) web service application programming interface at http://mirgate.bioinfo.cnio.es/API/. Database URL: http://mirgate.bioinfo.cnio.es
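    The limited-overlap problem this abstract refers to is easy to picture with sets of predicted pairs. The pairs below are toy values (only miR-15b/BCL2 is taken from the abstract); real prediction sets are orders of magnitude larger.

```python
# Hypothetical predictions from three target-prediction methods.
method_a = {("miR-15b", "BCL2"), ("miR-21", "PTEN"), ("miR-1", "HAND2")}
method_b = {("miR-15b", "BCL2"), ("miR-155", "SOCS1")}
method_c = {("miR-15b", "BCL2"), ("miR-21", "PTEN")}

consensus = method_a & method_b & method_c   # pairs all methods agree on
union = method_a | method_b | method_c       # pairs any method proposes
print(f"{len(consensus)} of {len(union)} pairs predicted by all methods")
# -> 1 of 4 pairs predicted by all methods
```

    Collecting the per-method predictions side by side, as miRGate does, lets users see at a glance which pairs sit in the small consensus core and which are method-specific.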

  17. AT_CHLORO, a Comprehensive Chloroplast Proteome Database with Subplastidial Localization and Curated Information on Envelope Proteins*

    PubMed Central

    Ferro, Myriam; Brugière, Sabine; Salvi, Daniel; Seigneurin-Berny, Daphné; Court, Magali; Moyet, Lucas; Ramus, Claire; Miras, Stéphane; Mellal, Mourad; Le Gall, Sophie; Kieffer-Jaquinod, Sylvie; Bruley, Christophe; Garin, Jérôme; Joyard, Jacques; Masselon, Christophe; Rolland, Norbert

    2010-01-01

    Recent advances in the proteomics field have allowed a series of high throughput experiments to be conducted on chloroplast samples, and the data are available in several public databases. However, the accurate localization of many chloroplast proteins often remains hypothetical. This is especially true for envelope proteins. We went a step further into the knowledge of the chloroplast proteome by focusing, in the same set of experiments, on the localization of proteins in the stroma, the thylakoids, and envelope membranes. LC-MS/MS-based analyses first allowed us to build the AT_CHLORO database (http://www.grenoble.prabi.fr/protehome/grenoble-plant-proteomics/), a comprehensive repertoire of the 1323 proteins, identified by 10,654 unique peptide sequences, present in highly purified chloroplasts and their subfractions prepared from Arabidopsis thaliana leaves. This database also provides extensive proteomics information (peptide sequences and molecular weight, chromatographic retention times, MS/MS spectra, and spectral count) for a unique chloroplast protein accurate mass and time tag database gathering identified peptides with their respective and precise analytical coordinates, molecular weight, and retention time. We assessed the partitioning of each protein in the three chloroplast compartments by using a semiquantitative proteomics approach (spectral count). These data together with an in-depth investigation of the literature were compiled to provide accurate subplastidial localization of previously known and newly identified proteins. A unique knowledge base containing extensive information on the proteins identified in envelope fractions was thus obtained, allowing new insights into this membrane system to be revealed. Altogether, the data we obtained provide unexpected information about the plastidial or subplastidial localization of some proteins that were not suspected to be associated with this membrane system. 
The spectral counting-based strategy was further
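    The semiquantitative spectral-count partitioning described above amounts to normalizing a protein's spectral counts across the three subfractions. A minimal sketch, with invented counts for a single hypothetical protein:

```python
# Spectral counts for one protein across the three chloroplast
# subcompartments analyzed in AT_CHLORO (values invented).
counts = {"envelope": 42, "stroma": 3, "thylakoid": 5}

total = sum(counts.values())
shares = {comp: n / total for comp, n in counts.items()}
best = max(shares, key=shares.get)
print(f"assigned compartment: {best} ({shares[best]:.0%} of spectra)")
# -> assigned compartment: envelope (84% of spectra)
```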

  19. Data Curation

    ERIC Educational Resources Information Center

    Mallon, Melissa, Ed.

    2012-01-01

    In their Top Trends of 2012, the Association of College and Research Libraries (ACRL) named data curation as one of the issues to watch in academic libraries in the near future (ACRL, 2012, p. 312). Data curation can be summarized as "the active and ongoing management of data through its life cycle of interest and usefulness to scholarship,…

  20. Solid Waste Projection Model: Database (Version 1.4). Technical reference manual

    SciTech Connect

    Blackburn, C.; Cillan, T.

    1993-09-01

    The Solid Waste Projection Model (SWPM) system is an analytical tool developed by Pacific Northwest Laboratory (PNL) for Westinghouse Hanford Company (WHC). The SWPM system provides a modeling and analysis environment that supports decisions in the process of evaluating various solid waste management alternatives. This document, one of a series describing the SWPM system, contains detailed information regarding the software and data structures utilized in developing the SWPM Version 1.4 Database. This document is intended for use by experienced database specialists and supports database maintenance, utility development, and database enhancement. Those interested in using the SWPM database should refer to the SWPM Database User's Guide. This document is available from the PNL Task M Project Manager (D. L. Stiles, 509-372-4358), the PNL Task L Project Manager (L. L. Armacost, 509-372-4304), the WHC Restoration Projects Section Manager (509-372-1443), or the WHC Waste Characterization Manager (509-372-1193).

  1. Solid Waste Projection Model: Database (Version 1.3). Technical reference manual

    SciTech Connect

    Blackburn, C.L.

    1991-11-01

    The Solid Waste Projection Model (SWPM) system is an analytical tool developed by Pacific Northwest Laboratory (PNL) for Westinghouse Hanford Company (WHC). The SWPM system provides a modeling and analysis environment that supports decisions in the process of evaluating various solid waste management alternatives. This document, one of a series describing the SWPM system, contains detailed information regarding the software and data structures utilized in developing the SWPM Version 1.3 Database. This document is intended for use by experienced database specialists and supports database maintenance, utility development, and database enhancement.

  2. A hybrid human and machine resource curation pipeline for the Neuroscience Information Framework.

    PubMed

    Bandrowski, A E; Cachat, J; Li, Y; Müller, H M; Sternberg, P W; Ciccarese, P; Clark, T; Marenco, L; Wang, R; Astakhov, V; Grethe, J S; Martone, M E

    2012-01-01

    The breadth of information resources available to researchers on the Internet continues to expand, particularly in light of recently implemented data-sharing policies required by funding agencies. However, the nature of dense, multifaceted neuroscience data and the design of contemporary search engine systems make efficient, reliable and relevant discovery of such information a significant challenge. This challenge is particularly pertinent for online databases, whose dynamic content is 'hidden' from search engines. The Neuroscience Information Framework (NIF; http://www.neuinfo.org) was funded by the NIH Blueprint for Neuroscience Research to address the problem of finding and utilizing neuroscience-relevant resources such as software tools, data sets, experimental animals and antibodies across the Internet. From the outset, NIF sought to provide an accounting of available resources while developing technical solutions for finding, accessing and utilizing them. The curators, therefore, are tasked with identifying and registering resources, examining data, writing configuration files to index and display data, and keeping the contents current. In the initial phases of the project, all aspects of the registration and curation processes were manual. However, as the number of resources grew, manual curation became impractical. This report describes our experiences and successes with developing automated resource discovery and semiautomated type characterization with text-mining scripts that facilitate curation team efforts to discover, integrate and display new content. We also describe the DISCO framework, a suite of automated web services that significantly reduces manual curation effort by periodically checking for resource updates. Lastly, we discuss DOMEO, a semi-automated annotation tool that improves the discovery and curation of resources that are not necessarily website-based (i.e. reagents, software tools). Although the ultimate goal of automation was to

  3. How much does curation cost?

    PubMed Central

    2016-01-01

    NIH administrators have recently expressed concerns about the cost of curation for biological databases. However, they did not articulate the exact costs of curation. Here we calculate the cost of biocuration of articles for the EcoCyc database as $219 per article over a 5-year period. That cost is 6–15% of the cost of open-access publication fees for publishing biomedical articles, and we estimate that cost is 0.088% of the cost of the overall research project that generated the experimental results. Thus, curation costs are small in an absolute sense, and represent a miniscule fraction of the cost of the research. PMID:27504008
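    The percentages quoted above can be sanity-checked by inverting them. The arithmetic below simply back-computes the implied open-access fee range and project cost from the stated $219 per article; the input figures are the abstract's, not independent estimates.

```python
cost_per_article = 219.0

# $219 is stated to be 6-15% of an open-access publication fee:
fee_high = cost_per_article / 0.06   # implied upper fee
fee_low = cost_per_article / 0.15    # implied lower fee

# $219 is stated to be 0.088% of the originating research project:
project_cost = cost_per_article / 0.00088

print(f"implied OA fee range: ${fee_low:.0f}-${fee_high:.0f}")   # $1460-$3650
print(f"implied project cost: ${project_cost:,.0f}")             # $248,864
```

    An implied project cost of roughly a quarter of a million dollars per article is consistent with the paper's conclusion that curation is a miniscule fraction of research cost.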

  5. Egas: a collaborative and interactive document curation platform

    PubMed Central

    Campos, David; Lourenço, Jóni; Matos, Sérgio; Oliveira, José Luís

    2014-01-01

    With the overwhelming amount of biomedical textual information being produced, several manual curation efforts have been set up to extract and store concepts and their relationships into structured resources. As manual annotation is a demanding and expensive task, computerized solutions were developed to perform such tasks automatically. However, high-end information extraction techniques are still not widely used by biomedical research communities, mainly because of the lack of standards and limitations in usability. Interactive annotation tools intend to fill this gap, taking advantage of automatic techniques and existing knowledge bases to assist expert curators in their daily tasks. This article presents Egas, a web-based platform for biomedical text mining and assisted curation with highly usable interfaces for manual and automatic in-line annotation of concepts and relations. A comprehensive set of de facto standard knowledge bases are integrated and indexed to provide straightforward concept normalization features. Real-time collaboration and conversation functionalities allow discussing details of the annotation task as well as providing instant feedback of curator’s interactions. Egas also provides interfaces for on-demand management of the annotation task settings and guidelines, and supports standard formats and literature services to import and export documents. By taking advantage of Egas, we participated in the BioCreative IV interactive annotation task, targeting the assisted identification of protein–protein interactions described in PubMed abstracts related to neuropathological disorders. When evaluated by expert curators, it obtained positive scores in terms of usability, reliability and performance. These results, together with the provided innovative features, place Egas as a state-of-the-art solution for fast and accurate curation of information, facilitating the task of creating and updating knowledge bases and annotated resources

  6. GSOSTATS Database: USAF Synchronous Satellite Catalog Data Conversion Software. User's Guide and Software Maintenance Manual, Version 2.1

    NASA Technical Reports Server (NTRS)

    Mallasch, Paul G.; Babic, Slavoljub

    1994-01-01

    The United States Air Force (USAF) provides NASA Lewis Research Center with monthly reports containing the Synchronous Satellite Catalog and the associated Two Line Mean Element Sets. The USAF Synchronous Satellite Catalog supplies satellite orbital parameters collected by an automated monitoring system and provided to Lewis Research Center as text files on magnetic tape. Software was developed to facilitate automated formatting, data normalization, cross-referencing, and error correction of Synchronous Satellite Catalog files before loading into the NASA Geosynchronous Satellite Orbital Statistics Database System (GSOSTATS). This document contains the User's Guide and Software Maintenance Manual with information necessary for installation, initialization, start-up, operation, error recovery, and termination of the software application. It also contains implementation details, modification aids, and software source code adaptations for use in future revisions.
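
The Two Line Element (TLE) sets mentioned above follow a fixed 69-column format with a trailing checksum digit, which is what makes the automated error checking described here tractable. A minimal sketch of the standard NORAD checksum rule (the helper names are ours; the GSOSTATS software itself is not shown):

```python
def tle_checksum(line: str) -> int:
    """Standard NORAD TLE checksum: sum all digits in columns 1-68,
    count each '-' as 1, ignore everything else, take modulo 10."""
    total = 0
    for ch in line[:68]:
        if ch.isdigit():
            total += int(ch)
        elif ch == "-":
            total += 1
    return total % 10

def tle_line_valid(line: str) -> bool:
    """A TLE line is 69 characters; column 69 holds the checksum digit."""
    return len(line) == 69 and line[68].isdigit() and int(line[68]) == tle_checksum(line)

# Widely published ISS example (line 1 of a two-line element set):
iss = "1 25544U 98067A   08264.51782528 -.00002182  00000-0 -11606-4 0  2927"
print(tle_line_valid(iss))             # True
print(tle_line_valid(iss[:-1] + "3"))  # False: corrupted checksum digit
```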

  7. Argo: an integrative, interactive, text mining-based workbench supporting curation.

    PubMed

    Rak, Rafal; Rowley, Andrew; Black, William; Ananiadou, Sophia

    2012-01-01

    Curation of biomedical literature is often supported by the automatic analysis of textual content that generally involves a sequence of individual processing components. Text mining (TM) has been used to enhance the process of manual biocuration, but has been focused on specific databases and tasks rather than an environment integrating TM tools into the curation pipeline, catering for a variety of tasks, types of information and applications. Processing components usually come from different sources and often lack interoperability. The well established Unstructured Information Management Architecture is a framework that addresses interoperability by defining common data structures and interfaces. However, most of the efforts are targeted towards software developers and are not suitable for curators, or are otherwise inconvenient to use on a higher level of abstraction. To overcome these issues we introduce Argo, an interoperable, integrative, interactive and collaborative system for text analysis with a convenient graphic user interface to ease the development of processing workflows and boost productivity in labour-intensive manual curation. Robust, scalable text analytics follow a modular approach, adopting component modules for distinct levels of text analysis. The user interface is available entirely through a web browser that saves the user from going through often complicated and platform-dependent installation procedures. Argo comes with a predefined set of processing components commonly used in text analysis, while giving the users the ability to deposit their own components. The system accommodates various areas and levels of user expertise, from TM and computational linguistics to ontology-based curation. One of the key functionalities of Argo is its ability to seamlessly incorporate user-interactive components, such as manual annotation editors, into otherwise completely automatic pipelines. As a use case, we demonstrate the functionality of an in

  8. Tracking and coordinating an international curation effort for the CCDS Project

    PubMed Central

    Harte, Rachel A.; Farrell, Catherine M.; Loveland, Jane E.; Suner, Marie-Marthe; Wilming, Laurens; Aken, Bronwen; Barrell, Daniel; Frankish, Adam; Wallin, Craig; Searle, Steve; Diekhans, Mark; Harrow, Jennifer; Pruitt, Kim D.

    2012-01-01

    The Consensus Coding Sequence (CCDS) collaboration involves curators at multiple centers with a goal of producing a conservative set of high quality, protein-coding region annotations for the human and mouse reference genome assemblies. The CCDS data set reflects a ‘gold standard’ definition of best supported protein annotations, and corresponding genes, which pass a standard series of quality assurance checks and are supported by manual curation. This data set supports use of genome annotation information by human and mouse researchers for effective experimental design, analysis and interpretation. The CCDS project consists of analysis of automated whole-genome annotation builds to identify identical CDS annotations, quality assurance testing and manual curation support. Identical CDS annotations are tracked with a CCDS identifier (ID) and any future change to the annotated CDS structure must be agreed upon by the collaborating members. CCDS curation guidelines were developed to address some aspects of curation in order to improve initial annotation consistency and to reduce time spent in discussing proposed annotation updates. Here, we present the current status of the CCDS database and details on our procedures to track and coordinate our efforts. We also present the relevant background and reasoning behind the curation standards that we have developed for CCDS database treatment of transcripts that are nonsense-mediated decay (NMD) candidates, for transcripts containing upstream open reading frames, for identifying the most likely translation start codons and for the annotation of readthrough transcripts. Examples are provided to illustrate the application of these guidelines. Database URL: http://www.ncbi.nlm.nih.gov/CCDS/CcdsBrowse.cgi PMID:22434842
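
The core matching step, deciding that two annotation pipelines produced the same CDS structure, can be sketched with a canonical, hashable key per CDS. A toy illustration (coordinates and set contents are invented; the real CCDS pipeline tracks far more state per annotation):

```python
# Toy illustration of the CCDS consensus idea: a CDS annotation becomes
# eligible for a stable identifier only when independent pipelines
# produce an identical structure. All data below are invented.

def cds_key(chrom, strand, exons):
    """Canonical, hashable representation of one CDS structure."""
    return (chrom, strand, tuple(sorted(exons)))

ncbi = {
    cds_key("chr1", "+", [(100, 200), (300, 450)]),
    cds_key("chr2", "-", [(50, 120)]),
}
ensembl = {
    cds_key("chr1", "+", [(300, 450), (100, 200)]),  # same CDS, exons listed in another order
    cds_key("chr2", "-", [(55, 120)]),               # 5' end differs: no consensus
}

consensus = ncbi & ensembl
ccds_ids = {cds: f"CCDS{i}.1" for i, cds in enumerate(sorted(consensus), start=1)}
print(f"{len(consensus)} of {len(ncbi)} CDS structures agree")  # 1 of 2
```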

  9. Human Events Reference for ATHEANA (HERA) Database Description and Preliminary User's Manual

    SciTech Connect

    Auflick, J.L.

    1999-08-12

    The Technique for Human Error Analysis (ATHEANA) is a newly developed human reliability analysis (HRA) methodology that aims to facilitate better representation and integration of human performance into probabilistic risk assessment (PRA) modeling and quantification by analyzing risk-significant operating experience in the context of existing behavioral science models. The fundamental premise of ATHEANA is that error forcing contexts (EFCs), which refer to combinations of equipment/material conditions and performance shaping factors (PSFs), set up or create the conditions under which unsafe actions (UAs) can occur. Because ATHEANA relies heavily on the analysis of operational events that have already occurred as a mechanism for generating creative thinking about possible EFCs, a database (db) of analytical operational events, called the Human Events Reference for ATHEANA (HERA), has been developed to support the methodology. This report documents the initial development efforts for HERA.

  10. Human events reference for ATHEANA (HERA) database description and preliminary user's manual

    SciTech Connect

    Auflick, J.L.; Hahn, H.A.; Pond, D.J.

    1998-05-27

    The Technique for Human Error Analysis (ATHEANA) is a newly developed human reliability analysis (HRA) methodology that aims to facilitate better representation and integration of human performance into probabilistic risk assessment (PRA) modeling and quantification by analyzing risk-significant operating experience in the context of existing behavioral science models. The fundamental premise of ATHEANA is that error-forcing contexts (EFCs), which refer to combinations of equipment/material conditions and performance shaping factors (PSFs), set up or create the conditions under which unsafe actions (UAs) can occur. Because ATHEANA relies heavily on the analysis of operational events that have already occurred as a mechanism for generating creative thinking about possible EFCs, a database, called the Human Events Reference for ATHEANA (HERA), has been developed to support the methodology. This report documents the initial development efforts for HERA.

  11. Canto: an online tool for community literature curation

    PubMed Central

    Rutherford, Kim M.; Harris, Midori A.; Lock, Antonia; Oliver, Stephen G.; Wood, Valerie

    2014-01-01

    Motivation: Detailed curation of published molecular data is essential for any model organism database. Community curation enables researchers to contribute data from their papers directly to databases, supplementing the activity of professional curators and improving coverage of a growing body of literature. We have developed Canto, a web-based tool that provides an intuitive curation interface for both curators and researchers, to support community curation in the fission yeast database, PomBase. Canto supports curation using OBO ontologies, and can be easily configured for use with any species. Availability: Canto code and documentation are available under an Open Source license from http://curation.pombase.org/. Canto is a component of the Generic Model Organism Database (GMOD) project (http://www.gmod.org/). Contact: helpdesk@pombase.org PMID:24574118

  12. NeuroTransDB: highly curated and structured transcriptomic metadata for neurodegenerative diseases.

    PubMed

    Bagewadi, Shweta; Adhikari, Subash; Dhrangadhariya, Anjani; Irin, Afroza Khanam; Ebeling, Christian; Namasivayam, Aishwarya Alex; Page, Matthew; Hofmann-Apitius, Martin; Senger, Philipp

    2015-01-01

    Neurodegenerative diseases are chronic debilitating conditions, characterized by progressive loss of neurons that represent a significant health care burden as the global elderly population continues to grow. Over the past decade, high-throughput technologies such as the Affymetrix GeneChip microarrays have provided new perspectives into the pathomechanisms underlying neurodegeneration. Public transcriptomic data repositories, namely Gene Expression Omnibus and curated ArrayExpress, enable researchers to conduct integrative meta-analysis, increasing the power to detect differentially regulated genes in disease and explore patterns of gene dysregulation across biologically related studies. The reliability of retrospective, large-scale integrative analyses depends on an appropriate combination of related datasets, in turn requiring detailed meta-annotations capturing the experimental setup. In most cases, we observe huge variation in compliance with defined standards for submitted metadata in public databases. Much of the information needed to complete or refine meta-annotations is distributed in the associated publications. For example, tissue preparation or comorbidity information is frequently described in an article's supplementary tables. Several value-added databases have employed additional manual efforts to overcome this limitation. However, none of these databases explicate annotations that distinguish human and animal models in a neurodegeneration context. Therefore, adopting a more specific disease focus, in combination with dedicated disease ontologies, will better empower the selection of comparable studies with refined annotations to address the research question at hand. In this article, we describe the detailed development of NeuroTransDB, a manually curated database containing metadata annotations for neurodegenerative studies. The database contains more than 20 dimensions of metadata annotations within 31 mouse, 5 rat and 45 human studies, defined in

  14. Preliminary evaluation of the CellFinder literature curation pipeline for gene expression in kidney cells and anatomical parts

    PubMed Central

    Neves, Mariana; Damaschun, Alexander; Mah, Nancy; Lekschas, Fritz; Seltmann, Stefanie; Stachelscheid, Harald; Fontaine, Jean-Fred; Kurtz, Andreas; Leser, Ulf

    2013-01-01

    Biomedical literature curation is the process of automatically and/or manually deriving knowledge from scientific publications and recording it into specialized databases for structured delivery to users. It is a slow, error-prone, complex, costly and, yet, highly important task. Previous experiences have proven that text mining can assist in its many phases, especially, in triage of relevant documents and extraction of named entities and biological events. Here, we present the curation pipeline of the CellFinder database, a repository of cell research, which includes data derived from literature curation and microarrays to identify cell types, cell lines, organs and so forth, and especially patterns in gene expression. The curation pipeline is based on freely available tools in all text mining steps, as well as the manual validation of extracted data. Preliminary results are presented for a data set of 2376 full texts from which >4500 gene expression events in cell or anatomical part have been extracted. Validation of half of this data resulted in a precision of ∼50% of the extracted data, which indicates that we are on the right track with our pipeline for the proposed task. However, evaluation of the methods shows that there is still room for improvement in the named-entity recognition and that a larger and more robust corpus is needed to achieve a better performance for event extraction. Database URL: http://www.cellfinder.org/ PMID:23599415

  15. A visual review of the interactome of LRRK2: Using deep-curated molecular interaction data to represent biology

    PubMed Central

    Porras, Pablo; Duesbury, Margaret; Fabregat, Antonio; Ueffing, Marius; Orchard, Sandra; Gloeckner, Christian Johannes; Hermjakob, Henning

    2015-01-01

    Molecular interaction databases are essential resources that enable access to a wealth of information on associations between proteins and other biomolecules. Network graphs generated from these data provide an understanding of the relationships between different proteins in the cell, and network analysis has become a widespread tool supporting -omics analysis. Meaningfully representing this information remains far from trivial and different databases strive to provide users with detailed records capturing the experimental details behind each piece of interaction evidence. A targeted curation approach is necessary to transfer published data generated by primarily low-throughput techniques into interaction databases. In this review we present an example highlighting the value of both targeted curation and the subsequent effective visualization of detailed features of manually curated interaction information. We have curated interactions involving LRRK2, a protein of largely unknown function linked to familial forms of Parkinson's disease, and hosted the data in the IntAct database. This LRRK2-specific dataset was then used to produce different visualization examples highlighting different aspects of the data: the level of confidence in the interaction based on orthogonal evidence, those interactions found under close-to-native conditions, and the enzyme–substrate relationships in different in vitro enzymatic assays. Finally, pathway annotation taken from the Reactome database was overlaid on top of interaction networks to bring biological functional context to interaction maps. PMID:25648416

  16. A visual review of the interactome of LRRK2: Using deep-curated molecular interaction data to represent biology.

    PubMed

    Porras, Pablo; Duesbury, Margaret; Fabregat, Antonio; Ueffing, Marius; Orchard, Sandra; Gloeckner, Christian Johannes; Hermjakob, Henning

    2015-04-01

    Molecular interaction databases are essential resources that enable access to a wealth of information on associations between proteins and other biomolecules. Network graphs generated from these data provide an understanding of the relationships between different proteins in the cell, and network analysis has become a widespread tool supporting -omics analysis. Meaningfully representing this information remains far from trivial and different databases strive to provide users with detailed records capturing the experimental details behind each piece of interaction evidence. A targeted curation approach is necessary to transfer published data generated by primarily low-throughput techniques into interaction databases. In this review we present an example highlighting the value of both targeted curation and the subsequent effective visualization of detailed features of manually curated interaction information. We have curated interactions involving LRRK2, a protein of largely unknown function linked to familial forms of Parkinson's disease, and hosted the data in the IntAct database. This LRRK2-specific dataset was then used to produce different visualization examples highlighting different aspects of the data: the level of confidence in the interaction based on orthogonal evidence, those interactions found under close-to-native conditions, and the enzyme-substrate relationships in different in vitro enzymatic assays. Finally, pathway annotation taken from the Reactome database was overlaid on top of interaction networks to bring biological functional context to interaction maps.
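
One of the visualizations described, confidence based on orthogonal evidence, reduces to counting distinct detection methods per interacting pair. A hypothetical sketch (interactions and method names are invented, not taken from the IntAct LRRK2 dataset):

```python
from collections import defaultdict

# Hypothetical sketch of confidence scoring by orthogonal evidence.
# Interactions and method names are invented, not IntAct records; real
# entries carry much richer metadata (PSI-MI terms, host organism, ...).

evidence = [
    ("LRRK2", "14-3-3", "coimmunoprecipitation"),
    ("LRRK2", "14-3-3", "two hybrid"),
    ("LRRK2", "14-3-3", "pull down"),
    ("LRRK2", "RAB10", "coimmunoprecipitation"),
]

# Count *distinct* detection methods per pair: evidence from independent
# techniques ("orthogonal" evidence) raises confidence in an interaction.
methods = defaultdict(set)
for a, b, method in evidence:
    methods[frozenset((a, b))].add(method)

high_confidence = {pair for pair, m in methods.items() if len(m) >= 2}
for pair in sorted(high_confidence, key=sorted):
    print(" - ".join(sorted(pair)), "|", len(methods[pair]), "independent methods")
```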

  17. Cataloging the biomedical world of pain through semi-automated curation of molecular interactions.

    PubMed

    Jamieson, Daniel G; Roberts, Phoebe M; Robertson, David L; Sidders, Ben; Nenadic, Goran

    2013-01-01

    The vast collection of biomedical literature and its continued expansion has presented a number of challenges to researchers who require structured findings to stay abreast of and analyze molecular mechanisms relevant to their domain of interest. By structuring literature content into topic-specific machine-readable databases, the aggregate data from multiple articles can be used to infer trends that can be compared and contrasted with similar findings from topic-independent resources. Our study presents a generalized procedure for semi-automatically creating a custom topic-specific molecular interaction database through the use of text mining to assist manual curation. We apply the procedure to capture molecular events that underlie 'pain', a complex phenomenon with a large societal burden and unmet medical need. We describe how existing text mining solutions are used to build a pain-specific corpus, extract molecular events from it, add context to the extracted events and assess their relevance. The pain-specific corpus contains 765 692 documents from Medline and PubMed Central, from which we extracted 356 499 unique normalized molecular events, with 261 438 single protein events and 93 271 molecular interactions supplied by BioContext. Event chains are annotated with negation, speculation, anatomy, Gene Ontology terms, mutations, pain and disease relevance, which collectively provide detailed insight into how that event chain is associated with pain. The extracted relations are visualized in a wiki platform (wiki-pain.org) that enables efficient manual curation and exploration of the molecular mechanisms that underlie pain. Curation of 1500 grouped event chains ranked by pain relevance revealed 613 accurately extracted unique molecular interactions that in the future can be used to study the underlying mechanisms involved in pain. Our approach demonstrates that combining existing text mining tools with domain-specific terms and wiki-based visualization can

  19. Cataloging the biomedical world of pain through semi-automated curation of molecular interactions

    PubMed Central

    Jamieson, Daniel G.; Roberts, Phoebe M.; Robertson, David L.; Sidders, Ben; Nenadic, Goran

    2013-01-01

    The vast collection of biomedical literature and its continued expansion has presented a number of challenges to researchers who require structured findings to stay abreast of and analyze molecular mechanisms relevant to their domain of interest. By structuring literature content into topic-specific machine-readable databases, the aggregate data from multiple articles can be used to infer trends that can be compared and contrasted with similar findings from topic-independent resources. Our study presents a generalized procedure for semi-automatically creating a custom topic-specific molecular interaction database through the use of text mining to assist manual curation. We apply the procedure to capture molecular events that underlie ‘pain’, a complex phenomenon with a large societal burden and unmet medical need. We describe how existing text mining solutions are used to build a pain-specific corpus, extract molecular events from it, add context to the extracted events and assess their relevance. The pain-specific corpus contains 765 692 documents from Medline and PubMed Central, from which we extracted 356 499 unique normalized molecular events, with 261 438 single protein events and 93 271 molecular interactions supplied by BioContext. Event chains are annotated with negation, speculation, anatomy, Gene Ontology terms, mutations, pain and disease relevance, which collectively provide detailed insight into how that event chain is associated with pain. The extracted relations are visualized in a wiki platform (wiki-pain.org) that enables efficient manual curation and exploration of the molecular mechanisms that underlie pain. Curation of 1500 grouped event chains ranked by pain relevance revealed 613 accurately extracted unique molecular interactions that in the future can be used to study the underlying mechanisms involved in pain. Our approach demonstrates that combining existing text mining tools with domain-specific terms and wiki-based visualization can

  20. The Halophile protein database.

    PubMed

    Sharma, Naveen; Farooqi, Mohammad Samir; Chaturvedi, Krishna Kumar; Lal, Shashi Bhushan; Grover, Monendra; Rai, Anil; Pandey, Pankaj

    2014-01-01

    Halophilic archaea/bacteria adapt to different salt concentrations, namely extreme, moderate and low. These types of adaptation may occur as a result of modification of protein structure and other changes in different cell organelles. Thus, proteins may play an important role in the adaptation of halophilic archaea/bacteria to saline conditions. The Halophile protein database (HProtDB) is a systematic attempt to document the biochemical and biophysical properties of proteins from halophilic archaea/bacteria which may be involved in the adaptation of these organisms to saline conditions. In this database, various physicochemical properties such as molecular weight, theoretical pI, amino acid composition, atomic composition, estimated half-life, instability index, aliphatic index and grand average of hydropathicity (GRAVY) have been listed. These physicochemical properties play an important role in identifying the protein structure, bonding pattern and function of the specific proteins. This database is a comprehensive, manually curated, non-redundant catalogue of proteins. The database currently contains the properties of 59 897 proteins extracted from 21 different strains of halophilic archaea/bacteria. The database can be accessed through the URL below. Database URL: http://webapp.cabgrid.res.in/protein/
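
Of the listed properties, GRAVY is the simplest to reproduce: the mean Kyte-Doolittle hydropathy over all residues. A minimal sketch (the peptide is a made-up acidic example; HProtDB's own computation details are not specified in the abstract):

```python
# Minimal sketch of one listed property: GRAVY, the mean Kyte-Doolittle
# hydropathy over all residues. The peptide below is a made-up acidic
# example, not an HProtDB entry.

KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}

def gravy(seq: str) -> float:
    """Negative GRAVY suggests a hydrophilic protein surface (typical of
    the acidic proteomes of extreme halophiles); positive, hydrophobic."""
    return sum(KYTE_DOOLITTLE[aa] for aa in seq.upper()) / len(seq)

print(round(gravy("DEEDKKAV"), 3))  # -1.975
```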

  1. Curation, integration and visualization of bacterial virulence factors in PATRIC

    PubMed Central

    Mao, Chunhong; Abraham, David; Wattam, Alice R.; Wilson, Meredith J.C.; Shukla, Maulik; Yoo, Hyun Seung; Sobral, Bruno W.

    2015-01-01

    Motivation: We’ve developed a highly curated bacterial virulence factor (VF) library in PATRIC (Pathosystems Resource Integration Center, www.patricbrc.org) to support infectious disease research. Although several VF databases are available, there is still a need to incorporate new knowledge found in published experimental evidence and integrate these data with other information known for these specific VF genes, including genomic and other omics data. This integration supports the identification of VFs, comparative studies and hypothesis generation, which facilitates the understanding of virulence and pathogenicity. Results: We have manually curated VFs from six prioritized NIAID (National Institute of Allergy and Infectious Diseases) category A–C bacterial pathogen genera, Mycobacterium, Salmonella, Escherichia, Shigella, Listeria and Bartonella, using published literature. This curated information on virulence has been integrated with data from genomic functional annotations, transcriptomic experiments, protein–protein interactions and disease information already present in PATRIC. Such integration gives researchers access to a broad array of information about these individual genes, and also to a suite of tools to perform comparative genomic and transcriptomics analysis that are available at PATRIC. Availability and implementation: All tools and data are freely available at PATRIC (http://patricbrc.org). Contact: cmao@vbi.vt.edu. Supplementary information: Supplementary data are available at Bioinformatics online. PMID:25273106

  2. INOH: ontology-based highly structured database of signal transduction pathways

    PubMed Central

    Sakai, Noriko; Nakamura, Hiromi; Fukagawa, Hiroshi; Fukuda, Ken; Takagi, Toshihisa

    2011-01-01

    The Integrating Network Objects with Hierarchies (INOH) database is a highly structured, manually curated database of signal transduction pathways including Mammalia, Xenopus laevis, Drosophila melanogaster, Caenorhabditis elegans and canonical. Since most pathway knowledge resides in scientific articles, the database focuses on curating and encoding textual knowledge into a machine-processable form. We use a hierarchical pathway representation model with a compound graph, and every pathway component in the INOH database is annotated by a set of uniquely developed ontologies. Finally, we developed the Similarity Search using the combination of a compound graph and hierarchical ontologies. The INOH database is intended to be a good resource for many users who want to analyze a large protein network. INOH ontologies and 73 signal transduction and 29 metabolic pathway diagrams (including over 6155 interactions and 3395 protein entities) are freely available in INOH XML and BioPAX formats. Database URL: http://www.inoh.org/ PMID:22120663
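
The compound-graph idea, interaction edges plus a containment hierarchy over pathway entities, can be sketched with two plain mappings. All names below are illustrative, not INOH records:

```python
# Illustrative compound-graph pathway representation: interaction edges
# between entities plus a containment hierarchy that groups entities
# into larger processes. All names are invented, not INOH records.

edges = [("EGF", "EGFR"), ("EGFR", "GRB2"), ("GRB2", "SOS")]

# Containment hierarchy: child node -> enclosing compound node.
parent = {
    "EGFR": "EGF signalling",
    "GRB2": "EGF signalling",
    "SOS": "EGF signalling",
    "EGF signalling": "Signal transduction",
}

def ancestors(node: str) -> list:
    """Walk upward through the hierarchy, e.g. so a similarity search
    can match pathways at any level of abstraction."""
    chain = []
    while node in parent:
        node = parent[node]
        chain.append(node)
    return chain

print(ancestors("GRB2"))  # ['EGF signalling', 'Signal transduction']
```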

  3. How well are protein structures annotated in secondary databases?

    PubMed

    Rother, Kristian; Michalsky, Elke; Leser, Ulf

    2005-09-01

    We investigated to what extent Protein Data Bank (PDB) entries are annotated with second-party information based on existing cross-references between PDB and 15 other databases. We report 2 interesting findings. First, there is a clear "annotation gap" for structures less than 7 years old for secondary databases that are manually curated. Second, the examined databases overlap with each other quite well, dividing the PDB into 2 well-annotated thirds and one poorly annotated third. Both observations should be taken into account in any study depending on the selection of protein structures by their annotation.

  4. Creation of a genome-wide metabolic pathway database for Populus trichocarpa using a new approach for reconstruction and curation of metabolic pathways for plants.

    PubMed

    Zhang, Peifen; Dreher, Kate; Karthikeyan, A; Chi, Anjo; Pujar, Anuradha; Caspi, Ron; Karp, Peter; Kirkup, Vanessa; Latendresse, Mario; Lee, Cynthia; Mueller, Lukas A; Muller, Robert; Rhee, Seung Yon

    2010-08-01

    Metabolic networks reconstructed from sequenced genomes or transcriptomes can help visualize and analyze large-scale experimental data, predict metabolic phenotypes, discover enzymes, engineer metabolic pathways, and study metabolic pathway evolution. We developed a general approach for reconstructing metabolic pathway complements of plant genomes. Two new reference databases were created and added to the core of the infrastructure: a comprehensive, all-plant reference pathway database, PlantCyc, and a reference enzyme sequence database, RESD, for annotating metabolic functions of protein sequences. PlantCyc (version 3.0) includes 714 metabolic pathways and 2,619 reactions from over 300 species. RESD (version 1.0) contains 14,187 literature-supported enzyme sequences from across all kingdoms. We used RESD, PlantCyc, and MetaCyc (an all-species reference metabolic pathway database), in conjunction with the pathway prediction software Pathway Tools, to reconstruct a metabolic pathway database, PoplarCyc, from the recently sequenced genome of Populus trichocarpa. PoplarCyc (version 1.0) contains 321 pathways with 1,807 assigned enzymes. Comparing PoplarCyc (version 1.0) with AraCyc (version 6.0, Arabidopsis [Arabidopsis thaliana]) showed comparable numbers of pathways distributed across all domains of metabolism in both databases, except for a higher number of AraCyc pathways in secondary metabolism and a 1.5-fold increase in carbohydrate metabolic enzymes in PoplarCyc. Here, we introduce these new resources and demonstrate the feasibility of using them to identify candidate enzymes for specific pathways and to analyze metabolite profiling data through concrete examples. These resources can be searched by text or BLAST, browsed, and downloaded from our project Web site (http://plantcyc.org).

  5. Quality Control of Biomedicinal Allergen Products – Highly Complex Isoallergen Composition Challenges Standard MS Database Search and Requires Manual Data Analyses

    PubMed Central

    Spiric, Jelena; Engin, Anna M.; Karas, Michael; Reuter, Andreas

    2015-01-01

    Allergy against birch pollen is among the most common causes of spring pollinosis in Europe and is diagnosed and treated using extracts from natural sources. Quality control is crucial for safe and effective diagnosis and treatment. However, current methods are very difficult to standardize and do not address individual allergen or isoallergen composition. MS provides information regarding selected proteins or the entire proteome and could overcome the aforementioned limitations. We studied the proteome of birch pollen, focusing on allergens and isoallergens, to clarify which of the 93 published sequence variants of the major allergen, Bet v 1, are expressed as proteins within one source material in parallel. The unexpectedly complex Bet v 1 isoallergen composition required manual data interpretation and a specific design of databases, as current database search engines fail to unambiguously assign spectra to highly homologous, partially identical proteins. We identified 47 non-allergenic proteins and all 5 known birch pollen allergens, and unambiguously proved the existence of 18 Bet v 1 isoallergens and variants by manual data analysis. This highly complex isoallergen composition raises questions whether isoallergens can be ignored or must be included for the quality control of allergen products, and which data analysis strategies are to be applied. PMID:26561299

  6. DIP: The Database of Interacting Proteins

    DOE Data Explorer

    The DIP Database catalogs experimentally determined interactions between proteins. It combines information from a variety of sources to create a single, consistent set of protein-protein interactions. By interaction, the DIP Database creators mean that two amino acid chains were experimentally identified to bind to each other. The database lists such pairs to aid those studying a particular protein-protein interaction but also those investigating entire regulatory and signaling pathways as well as those studying the organisation and complexity of the protein interaction network at the cellular level. The data stored within the DIP database were curated both manually, by expert curators, and automatically, using computational approaches that utilize the knowledge about protein-protein interaction networks extracted from the most reliable, core subset of the DIP data. It is a relational database that can be searched by protein, sequence, motif, article information, and pathBLAST. The website also serves as an access point to a number of projects related to DIP, such as LiveDIP, The Database of Ligand-Receptor Partners (DLRP) and JDIP. Users have free and open access to DIP after login. [Taken from the DIP Guide and the DIP website] (Specialized Interface) (Registration Required)
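
    The core normalization step behind a "single, consistent set" of interactions is treating A-B and B-A as the same undirected pair when merging sources. A minimal sketch (protein names and source lists are invented; this is not DIP's actual pipeline):

```python
def merge_interactions(*sources):
    """Collapse protein pairs from several sources into one non-redundant,
    undirected interaction set (A-B and B-A count as the same pair)."""
    merged = set()
    for pairs in sources:
        for a, b in pairs:
            merged.add(tuple(sorted((a, b))))
    return merged

# Hypothetical input sets from two experimental methods.
yeast_two_hybrid = [("YFG1", "YFG2"), ("YFG2", "YFG1")]
copurification = [("YFG2", "YFG3")]
print(sorted(merge_interactions(yeast_two_hybrid, copurification)))
# [('YFG1', 'YFG2'), ('YFG2', 'YFG3')]
```

    Sorting each pair before adding it to the set is what makes the merged collection order-independent and duplicate-free.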

  7. NeuroTransDB: highly curated and structured transcriptomic metadata for neurodegenerative diseases

    PubMed Central

    Bagewadi, Shweta; Adhikari, Subash; Dhrangadhariya, Anjani; Irin, Afroza Khanam; Ebeling, Christian; Namasivayam, Aishwarya Alex; Page, Matthew; Hofmann-Apitius, Martin

    2015-01-01

    Neurodegenerative diseases are chronic debilitating conditions, characterized by progressive loss of neurons, that represent a significant health care burden as the global elderly population continues to grow. Over the past decade, high-throughput technologies such as the Affymetrix GeneChip microarrays have provided new perspectives into the pathomechanisms underlying neurodegeneration. Public transcriptomic data repositories, namely Gene Expression Omnibus and curated ArrayExpress, enable researchers to conduct integrative meta-analyses, increasing the power to detect differentially regulated genes in disease and to explore patterns of gene dysregulation across biologically related studies. The reliability of retrospective, large-scale integrative analyses depends on an appropriate combination of related datasets, which in turn requires detailed meta-annotations capturing the experimental setup. In most cases, we observe huge variation in compliance with defined standards for submitted metadata in public databases. Much of the information needed to complete or refine meta-annotations is distributed in the associated publications. For example, tissue preparation or comorbidity information is frequently described in an article’s supplementary tables. Several value-added databases have employed additional manual efforts to overcome this limitation. However, none of these databases explicate annotations that distinguish human and animal models in the context of neurodegeneration. Therefore, adopting a more specific disease focus, in combination with dedicated disease ontologies, will better empower the selection of comparable studies with refined annotations to address the research question at hand. In this article, we describe the detailed development of NeuroTransDB, a manually curated database containing metadata annotations for neurodegenerative studies. The database contains more than 20 dimensions of metadata annotations within 31 mouse, 5 rat and 45 human studies, defined in

  8. Reference sequence (RefSeq) database at NCBI: current status, taxonomic expansion, and functional annotation

    PubMed Central

    O'Leary, Nuala A.; Wright, Mathew W.; Brister, J. Rodney; Ciufo, Stacy; Haddad, Diana; McVeigh, Rich; Rajput, Bhanu; Robbertse, Barbara; Smith-White, Brian; Ako-Adjei, Danso; Astashyn, Alexander; Badretdin, Azat; Bao, Yiming; Blinkova, Olga; Brover, Vyacheslav; Chetvernin, Vyacheslav; Choi, Jinna; Cox, Eric; Ermolaeva, Olga; Farrell, Catherine M.; Goldfarb, Tamara; Gupta, Tripti; Haft, Daniel; Hatcher, Eneida; Hlavina, Wratko; Joardar, Vinita S.; Kodali, Vamsi K.; Li, Wenjun; Maglott, Donna; Masterson, Patrick; McGarvey, Kelly M.; Murphy, Michael R.; O'Neill, Kathleen; Pujar, Shashikant; Rangwala, Sanjida H.; Rausch, Daniel; Riddick, Lillian D.; Schoch, Conrad; Shkeda, Andrei; Storz, Susan S.; Sun, Hanzhen; Thibaud-Nissen, Francoise; Tolstoy, Igor; Tully, Raymond E.; Vatsan, Anjana R.; Wallin, Craig; Webb, David; Wu, Wendy; Landrum, Melissa J.; Kimchi, Avi; Tatusova, Tatiana; DiCuccio, Michael; Kitts, Paul; Murphy, Terence D.; Pruitt, Kim D.

    2016-01-01

    The RefSeq project at the National Center for Biotechnology Information (NCBI) maintains and curates a publicly available database of annotated genomic, transcript, and protein sequence records (http://www.ncbi.nlm.nih.gov/refseq/). The RefSeq project leverages the data submitted to the International Nucleotide Sequence Database Collaboration (INSDC) against a combination of computation, manual curation, and collaboration to produce a standard set of stable, non-redundant reference sequences. The RefSeq project augments these reference sequences with current knowledge including publications, functional features and informative nomenclature. The database currently represents sequences from more than 55 000 organisms (>4800 viruses, >40 000 prokaryotes and >10 000 eukaryotes; RefSeq release 71), ranging from a single record to complete genomes. This paper summarizes the current status of the viral, prokaryotic, and eukaryotic branches of the RefSeq project, reports on improvements to data access and details efforts to further expand the taxonomic representation of the collection. We also highlight diverse functional curation initiatives that support multiple uses of RefSeq data including taxonomic validation, genome annotation, comparative genomics, and clinical testing. We summarize our approach to utilizing available RNA-Seq and other data types in our manual curation process for vertebrate, plant, and other species, and describe a new direction for prokaryotic genomes and protein name management. PMID:26553804
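
    RefSeq records can be retrieved programmatically through NCBI's E-utilities service. A minimal sketch that only builds an efetch request URL (E-utilities and the `db`, `id`, `rettype` and `retmode` parameters are real; the accession NM_000518 is just an example, and current NCBI guidelines may also ask for an API key and rate limiting):

```python
from urllib.parse import urlencode

EUTILS = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"

def efetch_url(db, accession, rettype="fasta", retmode="text"):
    """Build an NCBI E-utilities efetch URL for a single record."""
    params = urlencode({"db": db, "id": accession,
                        "rettype": rettype, "retmode": retmode})
    return f"{EUTILS}?{params}"

url = efetch_url("nuccore", "NM_000518")
print(url)
# https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=nuccore&id=NM_000518&rettype=fasta&retmode=text
```

    Passing the resulting URL to any HTTP client returns the record in the requested format (here, FASTA as plain text).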

  9. Curation of Frozen Samples

    NASA Technical Reports Server (NTRS)

    Fletcher, L. A.; Allen, C. C.; Bastien, R.

    2008-01-01

    NASA's Johnson Space Center (JSC) and the Astromaterials Curator are charged by NPD 7100.10D with the curation of all of NASA's extraterrestrial samples, including those from future missions. This responsibility includes the development of new sample handling and preparation techniques; therefore, the Astromaterials Curator must begin developing procedures to preserve, prepare and ship samples at sub-freezing temperatures in order to enable future sample return missions. Such missions might include the return of future frozen samples from permanently-shadowed lunar craters, the nuclei of comets, the surface of Mars, etc. We are demonstrating the ability to curate samples under cold conditions by designing, installing and testing a cold curation glovebox. This glovebox will allow us to store, document, manipulate and subdivide frozen samples while quantifying and minimizing contamination throughout the curation process.

  10. HPMCD: the database of human microbial communities from metagenomic datasets and microbial reference genomes.

    PubMed

    Forster, Samuel C; Browne, Hilary P; Kumar, Nitin; Hunt, Martin; Denise, Hubert; Mitchell, Alex; Finn, Robert D; Lawley, Trevor D

    2016-01-01

    The Human Pan-Microbe Communities (HPMC) database (http://www.hpmcd.org/) provides a manually curated, searchable, metagenomic resource to facilitate investigation of human gastrointestinal microbiota. Over the past decade, the application of metagenome sequencing to elucidate the microbial composition and functional capacity present in the human microbiome has revolutionized many concepts in our basic biology. When sufficient high quality reference genomes are available, whole genome metagenomic sequencing can provide direct biological insights and high-resolution classification. The HPMC database provides species level, standardized phylogenetic classification of over 1800 human gastrointestinal metagenomic samples. This is achieved by combining a manually curated list of bacterial genomes from human faecal samples with over 21000 additional reference genomes representing bacteria, viruses, archaea and fungi with manually curated species classification and enhanced sample metadata annotation. A user-friendly, web-based interface provides the ability to search for (i) microbial groups associated with health or disease state, (ii) health or disease states and community structure associated with a microbial group, (iii) the enrichment of a microbial gene or sequence and (iv) enrichment of a functional annotation. The HPMC database enables detailed analysis of human microbial communities and supports research from basic microbiology and immunology to therapeutic development in human health and disease. PMID:26578596

  11. Sentra : a database of signal transduction proteins for comparative genome analysis.

    SciTech Connect

    D'Souza, M.; Glass, E. M.; Syed, M. H.; Zhang, Y.; Rodriguez, A.; Maltsev, N.; Galperin, M. Y.; Mathematics and Computer Science; Univ. of Chicago; NIH

    2007-01-01

    Sentra (http://compbio.mcs.anl.gov/sentra), a database of signal transduction proteins encoded in completely sequenced prokaryotic genomes, has been updated to reflect recent advances in understanding signal transduction events on a whole-genome scale. Sentra consists of two principal components, a manually curated list of signal transduction proteins in 202 completely sequenced prokaryotic genomes and an automatically generated listing of predicted signaling proteins in 235 sequenced genomes that are awaiting manual curation. In addition to two-component histidine kinases and response regulators, the database now lists manually curated Ser/Thr/Tyr protein kinases and protein phosphatases, as well as adenylate and diguanylate cyclases and c-di-GMP phosphodiesterases, as defined in several recent reviews. All entries in Sentra are extensively annotated with relevant information from public databases (e.g. UniProt, KEGG, PDB and NCBI). Sentra's infrastructure was redesigned to support interactive cross-genome comparisons of signal transduction capabilities of prokaryotic organisms from a taxonomic and phenotypic perspective and in the framework of signal transduction pathways from KEGG. Sentra leverages the PUMA2 system to support interactive analysis and annotation of signal transduction proteins by the users.

  12. HPMCD: the database of human microbial communities from metagenomic datasets and microbial reference genomes.

    PubMed

    Forster, Samuel C; Browne, Hilary P; Kumar, Nitin; Hunt, Martin; Denise, Hubert; Mitchell, Alex; Finn, Robert D; Lawley, Trevor D

    2016-01-01

    The Human Pan-Microbe Communities (HPMC) database (http://www.hpmcd.org/) provides a manually curated, searchable, metagenomic resource to facilitate investigation of human gastrointestinal microbiota. Over the past decade, the application of metagenome sequencing to elucidate the microbial composition and functional capacity present in the human microbiome has revolutionized many concepts in our basic biology. When sufficient high quality reference genomes are available, whole genome metagenomic sequencing can provide direct biological insights and high-resolution classification. The HPMC database provides species level, standardized phylogenetic classification of over 1800 human gastrointestinal metagenomic samples. This is achieved by combining a manually curated list of bacterial genomes from human faecal samples with over 21000 additional reference genomes representing bacteria, viruses, archaea and fungi with manually curated species classification and enhanced sample metadata annotation. A user-friendly, web-based interface provides the ability to search for (i) microbial groups associated with health or disease state, (ii) health or disease states and community structure associated with a microbial group, (iii) the enrichment of a microbial gene or sequence and (iv) enrichment of a functional annotation. The HPMC database enables detailed analysis of human microbial communities and supports research from basic microbiology and immunology to therapeutic development in human health and disease.

  13. Southern African Treatment Resistance Network (SATuRN) RegaDB HIV drug resistance and clinical management database: supporting patient management, surveillance and research in southern Africa.

    PubMed

    Manasa, Justen; Lessells, Richard; Rossouw, Theresa; Naidu, Kevindra; Van Vuuren, Cloete; Goedhals, Dominique; van Zyl, Gert; Bester, Armand; Skingsley, Andrew; Stott, Katharine; Danaviah, Siva; Chetty, Terusha; Singh, Lavanya; Moodley, Pravi; Iwuji, Collins; McGrath, Nuala; Seebregts, Christopher J; de Oliveira, Tulio

    2014-01-01

    Substantial amounts of data have been generated from patient management and academic exercises designed to better understand the human immunodeficiency virus (HIV) epidemic and design interventions to control it. A number of specialized databases have been designed to manage huge data sets from HIV cohort, vaccine, host genomic and drug resistance studies. Besides databases from cohort studies, most of the online databases contain limited curated data and are thus sequence repositories. HIV drug resistance has been shown to have a great potential to derail the progress made thus far through antiretroviral therapy. Thus, a lot of resources have been invested in generating drug resistance data for patient management and surveillance purposes. Unfortunately, most of the data currently available relate to subtype B even though >60% of the epidemic is caused by HIV-1 subtype C. A consortium of clinicians, scientists, public health experts and policy makers working in southern Africa came together and formed a network, the Southern African Treatment and Resistance Network (SATuRN), with the aim of increasing curated HIV-1 subtype C and tuberculosis drug resistance data. This article describes the HIV-1 data curation process using the SATuRN Rega database. The data curation is a manual and time-consuming process done by clinical, laboratory and data curation specialists. Access to the highly curated data sets is through applications that are reviewed by the SATuRN executive committee. Examples of research outputs from the analysis of the curated data include trends in the level of transmitted drug resistance in South Africa, analysis of the levels of acquired resistance among patients failing therapy and factors associated with the absence of genotypic evidence of drug resistance among patients failing therapy. All these studies have been important for informing first- and second-line therapy.
This database is a free password-protected open source database available on

  14. Curating the Shelves

    ERIC Educational Resources Information Center

    Schiano, Deborah

    2013-01-01

    Curation: to gather, organize, and present resources in a way that meets information needs and interests, makes sense for virtual as well as physical resources. A Northern New Jersey middle school library made the decision to curate its physical resources according to the needs of its users, and, in so doing, created a shelving system that is,…

  15. Ontology searching and browsing at the Rat Genome Database.

    PubMed

    Laulederkind, Stanley J F; Tutaj, Marek; Shimoyama, Mary; Hayman, G Thomas; Lowry, Timothy F; Nigam, Rajni; Petri, Victoria; Smith, Jennifer R; Wang, Shur-Jen; de Pons, Jeff; Dwinell, Melinda R; Jacob, Howard J

    2012-01-01

    The Rat Genome Database (RGD) is the premier repository of rat genomic and genetic data and currently houses over 40 000 rat gene records, as well as human and mouse orthologs, 1857 rat and 1912 human quantitative trait loci (QTLs) and 2347 rat strains. Biological information curated for these data objects includes disease associations, phenotypes, pathways, molecular functions, biological processes and cellular components. RGD uses more than a dozen different ontologies to standardize annotation information for genes, QTLs and strains. That means a lot of time can be spent searching and browsing ontologies for the appropriate terms needed both for curating and mining the data. RGD has upgraded its ontology term search to make it more versatile and more robust. A term search result is connected to a term browser so the user can fine-tune the search by viewing parent and children terms. Most publicly available term browsers display a hierarchical organization of terms in an expandable tree format. RGD has replaced its old tree browser format with a 'driller' type of browser that allows quicker drilling up and down through the term branches, which has been confirmed by testing. The RGD ontology report pages have also been upgraded. Expanded functionality allows more choice in how annotations are displayed and what subsets of annotations are displayed. The new ontology search, browser and report features have been designed to enhance both manual data curation and manual data extraction. DATABASE URL: http://rgd.mcw.edu/rgdweb/ontology/search.html.
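
    A "driller" browser of this kind essentially walks the parent/child links of an ontology term hierarchy. A toy sketch with a dict-based tree (the term IDs and structure below are illustrative, not taken from RGD's ontologies):

```python
# Toy ontology: each term maps to its direct children (IDs are illustrative).
CHILDREN = {
    "DOID:4":    ["DOID:7", "DOID:150"],   # root term with two children
    "DOID:7":    ["DOID:1287"],
    "DOID:150":  [],
    "DOID:1287": [],
}
# Invert the child lists to get a parent lookup for drilling up.
PARENTS = {c: p for p, kids in CHILDREN.items() for c in kids}

def drill_down(term):
    """Direct children of a term (one step down the hierarchy)."""
    return CHILDREN.get(term, [])

def drill_up(term):
    """Path from a term back to the root, following parent links."""
    path = [term]
    while path[-1] in PARENTS:
        path.append(PARENTS[path[-1]])
    return path

print(drill_up("DOID:1287"))  # ['DOID:1287', 'DOID:7', 'DOID:4']
```

    Real disease and function ontologies are directed acyclic graphs rather than strict trees (a term can have several parents), so a production browser would keep a list of parents per term instead of a single one.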

  16. Mining clinical attributes of genomic variants through assisted literature curation in Egas.

    PubMed

    Matos, Sérgio; Campos, David; Pinho, Renato; Silva, Raquel M; Mort, Matthew; Cooper, David N; Oliveira, José Luís

    2016-01-01

    The veritable deluge of biological data over recent years has led to the establishment of a considerable number of knowledge resources that compile curated information extracted from the literature and store it in structured form, facilitating its use and exploitation. In this article, we focus on the curation of inherited genetic variants and associated clinical attributes, such as zygosity, penetrance or inheritance mode, and describe the use of Egas for this task. Egas is a web-based platform for text-mining assisted literature curation that focuses on usability through modern design solutions and simple user interactions. Egas offers a flexible and customizable tool that allows defining the concept types and relations of interest for a given annotation task, as well as the ontologies used for normalizing each concept type. Further, annotations may be performed on raw documents or on the results of automated concept identification and relation extraction tools. Users can inspect, correct or remove automatic text-mining results, manually add new annotations, and export the results to standard formats. Egas is compatible with the most recent versions of Google Chrome, Mozilla Firefox, Internet Explorer and Safari and is available for use at https://demo.bmd-software.com/egas/. Database URL: https://demo.bmd-software.com/egas/

  17. Mining clinical attributes of genomic variants through assisted literature curation in Egas.

    PubMed

    Matos, Sérgio; Campos, David; Pinho, Renato; Silva, Raquel M; Mort, Matthew; Cooper, David N; Oliveira, José Luís

    2016-01-01

    The veritable deluge of biological data over recent years has led to the establishment of a considerable number of knowledge resources that compile curated information extracted from the literature and store it in structured form, facilitating its use and exploitation. In this article, we focus on the curation of inherited genetic variants and associated clinical attributes, such as zygosity, penetrance or inheritance mode, and describe the use of Egas for this task. Egas is a web-based platform for text-mining assisted literature curation that focuses on usability through modern design solutions and simple user interactions. Egas offers a flexible and customizable tool that allows defining the concept types and relations of interest for a given annotation task, as well as the ontologies used for normalizing each concept type. Further, annotations may be performed on raw documents or on the results of automated concept identification and relation extraction tools. Users can inspect, correct or remove automatic text-mining results, manually add new annotations, and export the results to standard formats. Egas is compatible with the most recent versions of Google Chrome, Mozilla Firefox, Internet Explorer and Safari and is available for use at https://demo.bmd-software.com/egas/. Database URL: https://demo.bmd-software.com/egas/ PMID:27278817

  18. Mining clinical attributes of genomic variants through assisted literature curation in Egas

    PubMed Central

    Matos, Sérgio; Campos, David; Pinho, Renato; Silva, Raquel M.; Mort, Matthew; Cooper, David N.; Oliveira, José Luís

    2016-01-01

    The veritable deluge of biological data over recent years has led to the establishment of a considerable number of knowledge resources that compile curated information extracted from the literature and store it in structured form, facilitating its use and exploitation. In this article, we focus on the curation of inherited genetic variants and associated clinical attributes, such as zygosity, penetrance or inheritance mode, and describe the use of Egas for this task. Egas is a web-based platform for text-mining assisted literature curation that focuses on usability through modern design solutions and simple user interactions. Egas offers a flexible and customizable tool that allows defining the concept types and relations of interest for a given annotation task, as well as the ontologies used for normalizing each concept type. Further, annotations may be performed on raw documents or on the results of automated concept identification and relation extraction tools. Users can inspect, correct or remove automatic text-mining results, manually add new annotations, and export the results to standard formats. Egas is compatible with the most recent versions of Google Chrome, Mozilla Firefox, Internet Explorer and Safari and is available for use at https://demo.bmd-software.com/egas/. Database URL: https://demo.bmd-software.com/egas/ PMID:27278817

  19. SABIO-RK--database for biochemical reaction kinetics.

    PubMed

    Wittig, Ulrike; Kania, Renate; Golebiewski, Martin; Rey, Maja; Shi, Lei; Jong, Lenneke; Algaa, Enkhjargal; Weidemann, Andreas; Sauer-Danzwith, Heidrun; Mir, Saqib; Krebs, Olga; Bittkowski, Meik; Wetsch, Elina; Rojas, Isabel; Müller, Wolfgang

    2012-01-01

    SABIO-RK (http://sabio.h-its.org/) is a web-accessible database storing comprehensive information about biochemical reactions and their kinetic properties. SABIO-RK offers standardized data manually extracted from the literature and data directly submitted from lab experiments. The database content includes kinetic parameters in relation to biochemical reactions and their biological sources with no restriction on any particular set of organisms. Additionally, kinetic rate laws and corresponding equations as well as experimental conditions are represented. All the data are manually curated and annotated by biological experts, supported by automated consistency checks. SABIO-RK can be accessed via web-based user interfaces or automatically via web services that allow direct data access by other tools. Both interfaces support the export of the data together with its annotations in SBML (Systems Biology Markup Language), e.g. for import in modelling tools.
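
    Kinetic data exported in SBML can be read with a plain XML parser: in SBML Level 2, each reaction's kineticLaw carries a listOfParameters. A minimal sketch (the fragment below is illustrative, with invented parameter values, not real SABIO-RK data):

```python
import xml.etree.ElementTree as ET

# Illustrative SBML Level 2 fragment (values invented for this sketch).
SBML = """<sbml xmlns="http://www.sbml.org/sbml/level2/version4" level="2" version="4">
  <model id="example">
    <listOfReactions>
      <reaction id="R1">
        <kineticLaw>
          <listOfParameters>
            <parameter id="Km" value="0.25"/>
            <parameter id="Vmax" value="1.8"/>
          </listOfParameters>
        </kineticLaw>
      </reaction>
    </listOfReactions>
  </model>
</sbml>"""

def kinetic_parameters(sbml_text):
    """Map reaction id -> {parameter id: value} from kineticLaw blocks."""
    root = ET.fromstring(sbml_text)
    out = {}
    for reaction in root.iter():
        if reaction.tag.rsplit('}', 1)[-1] != 'reaction':
            continue
        params = {}
        for elem in reaction.iter():
            if elem.tag.rsplit('}', 1)[-1] == 'parameter':
                params[elem.get('id')] = float(elem.get('value'))
        out[reaction.get('id')] = params
    return out

print(kinetic_parameters(SBML))  # {'R1': {'Km': 0.25, 'Vmax': 1.8}}
```

    For anything beyond a quick inspection, a dedicated SBML library (such as libSBML) is the safer choice, since it validates units and handles the differences between SBML levels.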

  20. HypoxiaDB: a database of hypoxia-regulated proteins.

    PubMed

    Khurana, Pankaj; Sugadev, Ragumani; Jain, Jaspreet; Singh, Shashi Bala

    2013-01-01

    There has been intense interest in the cellular response to hypoxia, and a large number of differentially expressed proteins have been identified through various high-throughput experiments. These valuable data are scattered, and there have been no systematic attempts to document the various proteins regulated by hypoxia. Compilation, curation and annotation of these data are important in deciphering their role in hypoxia and hypoxia-related disorders. Therefore, we have compiled HypoxiaDB, a database of hypoxia-regulated proteins. It is a comprehensive, manually-curated, non-redundant catalog of proteins whose expressions are shown experimentally to be altered at different levels and durations of hypoxia. The database currently contains 72 000 manually curated entries on 3500 proteins extracted from 73 peer-reviewed publications selected from PubMed. HypoxiaDB is distinctive from other generalized databases: (i) it compiles tissue-specific protein expression changes under different levels and duration of hypoxia. Also, it provides manually curated literature references to support the inclusion of the protein in the database and establish its association with hypoxia. (ii) For each protein, HypoxiaDB integrates data on gene ontology, KEGG (Kyoto Encyclopedia of Genes and Genomes) pathway, protein-protein interactions, protein family (Pfam), OMIM (Online Mendelian Inheritance in Man), PDB (Protein Data Bank) structures and homology to other sequenced genomes. (iii) It also provides pre-compiled information on hypoxia-proteins, which otherwise requires tedious computational analysis. This includes information like chromosomal location, identifiers like Entrez, HGNC, Unigene, Uniprot, Ensembl, Vega, GI numbers and Genbank accession numbers associated with the protein. These are further cross-linked to respective public databases, linking HypoxiaDB to these external repositories.
(iv) In addition, HypoxiaDB provides an online sequence-similarity search tool for

  1. Assisted curation of regulatory interactions and growth conditions of OxyR in E. coli K-12.

    PubMed

    Gama-Castro, Socorro; Rinaldi, Fabio; López-Fuentes, Alejandra; Balderas-Martínez, Yalbi Itzel; Clematide, Simon; Ellendorff, Tilia Renate; Santos-Zavaleta, Alberto; Marques-Madeira, Hernani; Collado-Vides, Julio

    2014-01-01

    Given the current explosion of data within original publications generated in the field of genomics, a recognized bottleneck is the transfer of such knowledge into comprehensive databases. We have for years organized knowledge on transcriptional regulation reported in the original literature of Escherichia coli K-12 into RegulonDB (http://regulondb.ccg.unam.mx), our database that is currently supported by >5000 papers. Here, we report a first step towards the automatic biocuration of growth conditions in this corpus. Using the OntoGene text-mining system (http://www.ontogene.org), we extracted and manually validated regulatory interactions and growth conditions in a new approach based on filters that enable the curator to select informative sentences from preprocessed full papers. Based on a set of 48 papers dealing with oxidative stress by OxyR, we were able to retrieve 100% of the OxyR regulatory interactions present in RegulonDB, including the transcription factors and their effect on target genes. Our strategy was designed to extract, as we did, their growth conditions. This result provides a proof of concept for a more direct and efficient curation process, and enables us to define the strategy of the subsequent steps to be implemented for a semi-automatic curation of original literature dealing with regulation of gene expression in bacteria. This project will enhance the efficiency and quality of the curation of knowledge present in the literature of gene regulation, and contribute to a significant increase in the encoding of the regulatory network of E. coli. RegulonDB Database URL: http://regulondb.ccg.unam.mx OntoGene URL: http://www.ontogene.org.
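
    The filter-based sentence selection described above can be caricatured as a keyword filter: keep only sentences that mention the transcription factor together with a regulation verb. A toy sketch (the keyword list and the sample abstract text are invented for illustration; OntoGene's actual filters are far more sophisticated):

```python
import re

# Hypothetical regulation keywords (illustrative, not OntoGene's real lexicon).
REGULATION = re.compile(r"\b(activat\w*|repress\w*|induc\w*|regulat\w*)\b", re.I)

def candidate_sentences(text, factor="OxyR"):
    """Return sentences mentioning the factor plus a regulation keyword."""
    sentences = re.split(r"(?<=[.!?])\s+", text)
    return [s for s in sentences
            if factor in s and REGULATION.search(s)]

abstract = ("OxyR activates katG under oxidative stress. "
            "Cells were grown in LB medium. "
            "Expression of oxyS is induced by OxyR.")
print(candidate_sentences(abstract))
# ['OxyR activates katG under oxidative stress.',
#  'Expression of oxyS is induced by OxyR.']
```

    Even this crude filter illustrates the workflow's point: a curator validates a short ranked list of candidate sentences instead of reading each full paper.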

  2. NeXO Web: the NeXO ontology database and visualization platform

    PubMed Central

    Dutkowski, Janusz; Ono, Keiichiro; Kramer, Michael; Yu, Michael; Pratt, Dexter; Demchak, Barry; Ideker, Trey

    2014-01-01

    The Network-extracted Ontology (NeXO) is a gene ontology inferred directly from large-scale molecular networks. While most ontologies are constructed through manual expert curation, NeXO uses a principled computational approach which integrates evidence from hundreds of thousands of individual gene and protein interactions to construct a global hierarchy of cellular components and processes. Here, we describe the development of the NeXO Web platform (http://www.nexontology.org)—an online database and graphical user interface for visualizing, browsing and performing term enrichment analysis using NeXO and the gene ontology. The platform applies state-of-the-art web technology and visualization techniques to provide an intuitive framework for investigating biological machinery captured by both data-driven and manually curated ontologies. PMID:24271398
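Term enrichment analysis of the kind offered by NeXO Web is conventionally an upper-tail hypergeometric test; a minimal sketch under that standard assumption (this is not NeXO's implementation):

```python
from math import comb

def enrichment_pvalue(N, K, n, k):
    """Upper-tail hypergeometric probability P(X >= k): the chance that a
    random n-gene set drawn from N background genes, K of which carry the
    ontology term, contains k or more term-annotated genes."""
    return sum(comb(K, i) * comb(N - K, n - i)
               for i in range(k, min(K, n) + 1)) / comb(N, n)

# Toy numbers: 3 of 5 query genes carry a term annotated to 5 of 20 background genes.
p = enrichment_pvalue(N=20, K=5, n=5, k=3)
```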

  3. HoPaCI-DB: host-Pseudomonas and Coxiella interaction database

    PubMed Central

    Bleves, Sophie; Dunger, Irmtraud; Walter, Mathias C.; Frangoulidis, Dimitrios; Kastenmüller, Gabi; Voulhoux, Romé; Ruepp, Andreas

    2014-01-01

    Bacterial infectious diseases are the result of multifactorial processes affected by the interplay between virulence factors and host targets. The host-Pseudomonas and Coxiella interaction database (HoPaCI-DB) is a publicly available manually curated integrative database (http://mips.helmholtz-muenchen.de/HoPaCI/) of host–pathogen interaction data from Pseudomonas aeruginosa and Coxiella burnetii. The resource provides structured information on 3585 experimentally validated interactions between molecules, bioprocesses and cellular structures extracted from the scientific literature. Systematic annotation and interactive graphical representation of disease networks make HoPaCI-DB a versatile knowledge base for biologists and network biology approaches. PMID:24137008

  4. CAZymes Analysis Toolkit (CAT): Web service for searching and analyzing carbohydrate-active enzymes in a newly sequenced organism using the CAZy database

    SciTech Connect

    Karpinets, Tatiana V; Park, Byung; Syed, Mustafa H; Uberbacher, Edward C; Leuze, Michael Rex

    2010-01-01

    The Carbohydrate-Active Enzyme (CAZy) database provides a rich set of manually annotated enzymes that degrade, modify, or create glycosidic bonds. Despite the rich and invaluable information stored in the database, software tools utilizing this information for annotation of newly sequenced genomes by CAZy families are limited. We have employed two annotation approaches to fill the gap between the manually curated high-quality protein sequences collected in the CAZy database and the growing number of other protein sequences produced by genome or metagenome sequencing projects. The first approach is based on a similarity search against the entire non-redundant sequences of the CAZy database. The second approach performs annotation using links or correspondences between the CAZy families and protein family domains. The links were discovered using the association rule learning algorithm applied to sequences from the CAZy database. The approaches complement each other and in combination achieved high specificity and sensitivity when cross-evaluated with the manually curated genomes of Clostridium thermocellum ATCC 27405 and Saccharophagus degradans 2-40. The capability of the proposed framework to predict the function of domains of unknown function (DUFs) and of hypothetical proteins in the genome of Neurospora crassa is demonstrated. The framework is implemented as a Web service, the CAZymes Analysis Toolkit (CAT), and is available at http://cricket.ornl.gov/cgi-bin/cat.cgi.
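The link-discovery step can be illustrated with a toy confidence-based rule learner. This is only a sketch of association rule mining, with made-up domain and family names, not the algorithm CAT actually uses:

```python
from collections import Counter

def learn_rules(records, min_confidence=0.8):
    """records: list of (set_of_domains, cazy_family) pairs from annotated
    sequences. Returns {(domain, family): confidence} for rules whose
    confidence (co-occurrences / domain occurrences) clears the threshold."""
    domain_counts = Counter()
    pair_counts = Counter()
    for domains, family in records:
        for d in domains:
            domain_counts[d] += 1
            pair_counts[(d, family)] += 1
    return {pair: pair_counts[pair] / domain_counts[pair[0]]
            for pair in pair_counts
            if pair_counts[pair] / domain_counts[pair[0]] >= min_confidence}

# Illustrative annotated sequences (domain names and families are examples only).
records = [
    ({"Glyco_hydro_5"}, "GH5"),
    ({"Glyco_hydro_5", "CBM_2"}, "GH5"),
    ({"CBM_2"}, "GH6"),
]
rules = learn_rules(records)  # only the confident "Glyco_hydro_5 -> GH5" rule survives
```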

  5. SPdb – a signal peptide database

    PubMed Central

    Choo, Khar Heng; Tan, Tin Wee; Ranganathan, Shoba

    2005-01-01

    Background The signal peptide plays an important role in protein targeting and protein translocation in both prokaryotic and eukaryotic cells. This transient, short peptide sequence functions like a postal address on an envelope by targeting proteins for secretion or for transfer to specific organelles for further processing. Understanding how signal peptides function is crucial in predicting where proteins are translocated. To support this understanding, we present the SPdb signal peptide database, a repository of experimentally determined and computationally predicted signal peptides. Results SPdb integrates information from two sources: (a) the Swiss-Prot protein sequence database, which is now part of UniProt, and (b) the EMBL nucleotide sequence database. The database update is semi-automated, with human checking and verification to ensure the correctness of the data stored. The latest release, SPdb 3.2, contains 18,146 entries, of which 2,584 are experimentally verified signal sequences; the remaining 15,562 entries are either signal sequences that fail to meet our filtering criteria or entries that contain unverified signal sequences. Conclusion SPdb is a manually curated database constructed to support the understanding and analysis of signal peptides. SPdb tracks the major updates of the two underlying primary databases, thereby ensuring that its information remains up-to-date. PMID:16221310
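The notion of filtering criteria mentioned above can be illustrated with a toy screen on candidate signal peptides. The length and hydrophobicity thresholds here are purely hypothetical, not SPdb's actual criteria:

```python
# Hypothetical filter: accept a candidate signal peptide only if its length
# and hydrophobic-residue fraction fall within plausible ranges.
HYDROPHOBIC = set("AVLIMFWC")

def passes_filter(peptide, min_len=8, max_len=50, min_hydrophobic_frac=0.4):
    if not (min_len <= len(peptide) <= max_len):
        return False
    frac = sum(aa in HYDROPHOBIC for aa in peptide) / len(peptide)
    return frac >= min_hydrophobic_frac

ok = passes_filter("MKKTAIAIAVALAGFATVAQA")  # E. coli PhoA signal peptide
bad = passes_filter("MK")                     # far too short to be a signal peptide
```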

  6. Supervised Learning for Detection of Duplicates in Genomic Sequence Databases

    PubMed Central

    Zobel, Justin; Zhang, Xiuzhen; Verspoor, Karin

    2016-01-01

    Motivation First identified as an issue in 1996, duplication in biological databases introduces redundancy and even leads to inconsistency when contradictory information appears. The amount of data makes purely manual de-duplication impractical, and existing automatic systems cannot detect duplicates as precisely as can experts. Supervised learning has the potential to address such problems by building automatic systems that learn from expert curation to detect duplicates precisely and efficiently. While machine learning is a mature approach in other duplicate detection contexts, it has seen only preliminary application in genomic sequence databases. Results We developed and evaluated a supervised duplicate detection method based on an expert curated dataset of duplicates, containing over one million pairs across five organisms derived from genomic sequence databases. We selected 22 features to represent distinct attributes of the database records, and developed a binary model and a multi-class model. Both models achieve promising performance; under cross-validation, the binary model had over 90% accuracy in each of the five organisms, while the multi-class model maintains high accuracy and is more robust in generalisation. We performed an ablation study to quantify the impact of different sequence record features, finding that features derived from meta-data, sequence identity, and alignment quality impact performance most strongly. The study demonstrates machine learning can be an effective additional tool for de-duplication of genomic sequence databases. All Data are available as described in the supplementary material. PMID:27489953
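A sketch of the feature-extraction step such a supervised detector depends on, with two invented pairwise features rather than the paper's 22:

```python
# Illustrative only: derive simple pairwise features from two sequence
# records of the kind a duplicate classifier could be trained on.
def pair_features(rec_a, rec_b):
    """Each rec is a dict with 'description' and 'sequence'. Returns a
    feature dict: description word overlap and sequence length ratio."""
    words_a = set(rec_a["description"].lower().split())
    words_b = set(rec_b["description"].lower().split())
    jaccard = len(words_a & words_b) / len(words_a | words_b)
    len_ratio = (min(len(rec_a["sequence"]), len(rec_b["sequence"]))
                 / max(len(rec_a["sequence"]), len(rec_b["sequence"])))
    return {"desc_jaccard": jaccard, "seq_len_ratio": len_ratio}

a = {"description": "putative kinase protein", "sequence": "MKVLA"}
b = {"description": "putative kinase", "sequence": "MKVL"}
f = pair_features(a, b)
```

In the study itself, meta-data, sequence identity and alignment quality were the strongest feature groups; the vectors produced this way would feed the binary or multi-class model.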

  7. DDRprot: a database of DNA damage response-related proteins

    PubMed Central

    Andrés-León, Eduardo; Cases, Ildefonso; Arcas, Aida; Rojas, Ana M.

    2016-01-01

    The DNA Damage Response (DDR) signalling network is an essential system that protects the genome’s integrity. The DDRprot database presented here is a resource that integrates manually curated information on the human DDR network and its sub-pathways. For each particular DDR protein, we present detailed information about its function. For proteins that post-translationally modify one another, we depict the position of the modified residue(s) in the three-dimensional structure, when resolved structures are available. All this information is linked to the original publication from which it was obtained. Phylogenetic information is also shown, including time of emergence and conservation across 47 selected species, family trees and sequence alignments of homologues. The DDRprot database can be queried by different criteria: pathway, species, evolutionary age or involvement in post-translational modifications (PTMs). Sequence searches using hidden Markov models are also supported. Database URL: http://ddr.cbbio.es. PMID:27577567

  9. The National NeuroAIDS Tissue Consortium (NNTC) Database: an integrated database for HIV-related studies.

    PubMed

    Cserhati, Matyas F; Pandey, Sanjit; Beaudoin, James J; Baccaglini, Lorena; Guda, Chittibabu; Fox, Howard S

    2015-01-01

    We herein present the National NeuroAIDS Tissue Consortium-Data Coordinating Center (NNTC-DCC) database, which is the only available database for neuroAIDS studies that contains data in an integrated, standardized form. This database has been created in conjunction with the NNTC, which provides human tissue and biofluid samples to individual researchers to conduct studies focused on neuroAIDS. The database contains experimental datasets from 1206 subjects for the following categories (which are further broken down into subcategories): gene expression, genotype, proteins, endo-exo-chemicals, morphometrics and other (miscellaneous) data. The database also contains a wide variety of downloadable data and metadata for 95 HIV-related studies covering 170 assays from 61 principal investigators. The data represent 76 tissue types, 25 measurement types, and 38 technology types, and reaches a total of 33,017,407 data points. We used the ISA platform to create the database and develop a searchable web interface for querying the data. A gene search tool is also available, which searches for NCBI GEO datasets associated with selected genes. The database is manually curated with many user-friendly features, and is cross-linked to the NCBI, HUGO and PubMed databases. A free registration is required for qualified users to access the database.

  10. The Protein Ensemble Database.

    PubMed

    Varadi, Mihaly; Tompa, Peter

    2015-01-01

    The scientific community's major conceptual notion of structural biology has recently shifted in emphasis from the classical structure-function paradigm due to the emergence of intrinsically disordered proteins (IDPs). As opposed to their folded cousins, these proteins are defined by the lack of a stable 3D fold and a high degree of inherent structural heterogeneity that is closely tied to their function. Due to their flexible nature, solution techniques such as small-angle X-ray scattering (SAXS), nuclear magnetic resonance (NMR) spectroscopy and fluorescence resonance energy transfer (FRET) are particularly well-suited for characterizing their biophysical properties. Computationally derived structural ensembles based on such experimental measurements provide models of the conformational sampling displayed by these proteins, and they may offer valuable insights into the functional consequences of inherent flexibility. The Protein Ensemble Database (http://pedb.vib.be) is the first openly accessible, manually curated online resource storing the ensemble models, protocols used during the calculation procedure, and underlying primary experimental data derived from SAXS and/or NMR measurements. By making this previously inaccessible data freely available to researchers, this novel resource is expected to promote the development of more advanced modelling methodologies, facilitate the design of standardized calculation protocols, and consequently lead to a better understanding of how function arises from the disordered state.

  11. Genome databases

    SciTech Connect

    Courteau, J.

    1991-10-11

    Since the Genome Project began several years ago, a plethora of databases have been developed or are in the works. They range from the massive Genome Data Base at Johns Hopkins University, the central repository of all gene mapping information, to small databases focusing on single chromosomes or organisms. Some are publicly available, others are essentially private electronic lab notebooks. Still others limit access to a consortium of researchers working on, say, a single human chromosome. An increasing number incorporate sophisticated search and analytical software, while others operate as little more than data lists. In consultation with numerous experts in the field, a list has been compiled of some key genome-related databases. The list was not limited to map and sequence databases but also included the tools investigators use to interpret and elucidate genetic data, such as protein sequence and protein structure databases. Because a major goal of the Genome Project is to map and sequence the genomes of several experimental animals, including E. coli, yeast, fruit fly, nematode, and mouse, the available databases for those organisms are listed as well. The author also includes several databases that are still under development - including some ambitious efforts that go beyond data compilation to create what are being called electronic research communities, enabling many users, rather than just one or a few curators, to add or edit the data and tag it as raw or confirmed.

  12. Curating NASA's Past, Present, and Future Astromaterial Sample Collections

    NASA Technical Reports Server (NTRS)

    Zeigler, R. A.; Allton, J. H.; Evans, C. A.; Fries, M. D.; McCubbin, F. M.; Nakamura-Messenger, K.; Righter, K.; Zolensky, M.; Stansbery, E. K.

    2016-01-01

    The Astromaterials Acquisition and Curation Office at NASA Johnson Space Center (hereafter JSC curation) is responsible for curating all of NASA's extraterrestrial samples. JSC presently curates 9 different astromaterials collections in seven different clean-room suites: (1) Apollo Samples (ISO (International Standards Organization) class 6 + 7); (2) Antarctic Meteorites (ISO 6 + 7); (3) Cosmic Dust Particles (ISO 5); (4) Microparticle Impact Collection (ISO 7; formerly called Space-Exposed Hardware); (5) Genesis Solar Wind Atoms (ISO 4); (6) Stardust Comet Particles (ISO 5); (7) Stardust Interstellar Particles (ISO 5); (8) Hayabusa Asteroid Particles (ISO 5); (9) OSIRIS-REx Spacecraft Coupons and Witness Plates (ISO 7). Additional cleanrooms are currently being planned to house samples from two new collections, Hayabusa 2 (2021) and OSIRIS-REx (2023). In addition to the labs that house the samples, we maintain a wide variety of infrastructure facilities required to support the clean rooms: HEPA-filtered air-handling systems, ultrapure dry gaseous nitrogen systems, an ultrapure water system, and cleaning facilities to provide clean tools and equipment for the labs. We also have sample preparation facilities for making thin sections, microtome sections, and even focused ion-beam sections. We routinely monitor the cleanliness of our clean rooms and infrastructure systems, including measurements of inorganic or organic contamination, weekly airborne particle counts, compositional and isotopic monitoring of liquid N2 deliveries, and daily UPW system monitoring. In addition to the physical maintenance of the samples, we track within our databases the current and ever-changing characteristics (weight, location, etc.) of more than 250,000 individually numbered samples across our various collections, as well as more than 100,000 images, and countless "analog" records that record the sample processing records of each individual sample. JSC Curation is co-located with JSC

  13. The Comparative Toxicogenomics Database's 10th year anniversary: update 2015

    PubMed Central

    Davis, Allan Peter; Grondin, Cynthia J.; Lennon-Hopkins, Kelley; Saraceni-Richards, Cynthia; Sciaky, Daniela; King, Benjamin L.; Wiegers, Thomas C.; Mattingly, Carolyn J.

    2015-01-01

    Ten years ago, the Comparative Toxicogenomics Database (CTD; http://ctdbase.org/) was developed out of a need to formalize, harmonize and centralize the information on numerous genes and proteins responding to environmental toxic agents across diverse species. CTD's initial approach was to facilitate comparisons of nucleotide and protein sequences of toxicologically significant genes by curating these sequences and electronically annotating them with chemical terms from their associated references. Since then, however, CTD has vastly expanded its scope to robustly represent a triad of chemical–gene, chemical–disease and gene–disease interactions that are manually curated from the scientific literature by professional biocurators using controlled vocabularies, ontologies and structured notation. Today, CTD includes 24 million toxicogenomic connections relating chemicals/drugs, genes/proteins, diseases, taxa, phenotypes, Gene Ontology annotations, pathways and interaction modules. In this 10th year anniversary update, we outline the evolution of CTD, including our increased data content, new ‘Pathway View’ visualization tool, enhanced curation practices, pilot chemical–phenotype results and impending exposure data set. The prototype database originally described in our first report has transformed into a sophisticated resource used actively today to help scientists develop and test hypotheses about the etiologies of environmentally influenced diseases. PMID:25326323

  14. MobiDB 2.0: an improved database of intrinsically disordered and mobile proteins

    PubMed Central

    Potenza, Emilio; Domenico, Tomás Di; Walsh, Ian; Tosatto, Silvio C.E.

    2015-01-01

    MobiDB (http://mobidb.bio.unipd.it/) is a database of intrinsically disordered and mobile proteins. Intrinsically disordered regions are key for the function of numerous proteins. Here we provide a new version of MobiDB, a centralized source aimed at providing the most complete picture on different flavors of disorder in protein structures covering all UniProt sequences (currently over 80 million). The database features three levels of annotation: manually curated, indirect and predicted. Manually curated data is extracted from the DisProt database. Indirect data is inferred from PDB structures that are considered an indication of intrinsic disorder. The 10 predictors currently included (three ESpritz flavors, two IUPred flavors, two DisEMBL flavors, GlobPlot, VSL2b and JRONN) enable MobiDB to provide disorder annotations for every protein in absence of more reliable data. The new version also features a consensus annotation and classification for long disordered regions. In order to complement the disorder annotations, MobiDB features additional annotations from external sources. Annotations from the UniProt database include post-translational modifications and linear motifs. Pfam annotations are displayed in graphical form and are link-enabled, allowing the user to visit the corresponding Pfam page for further information. Experimental protein–protein interactions from STRING are also classified for disorder content. PMID:25361972
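A per-residue consensus over several predictors, as MobiDB provides, can be sketched as a majority vote. The 0/1 prediction vectors and the threshold here are illustrative assumptions, not MobiDB's actual consensus rule:

```python
def consensus(predictions, threshold=0.5):
    """predictions: list of equal-length 0/1 lists, one per predictor,
    one entry per residue. A residue is called disordered (1) when more
    than `threshold` of the predictors flag it."""
    n = len(predictions)
    return [1 if sum(col) / n > threshold else 0 for col in zip(*predictions)]

preds = [
    [1, 1, 0, 0],  # predictor A
    [1, 0, 0, 1],  # predictor B
    [1, 1, 0, 0],  # predictor C
]
calls = consensus(preds)  # residues 1-2 disordered, residues 3-4 ordered
```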

  15. VHLdb: A database of von Hippel-Lindau protein interactors and mutations.

    PubMed

    Tabaro, Francesco; Minervini, Giovanni; Sundus, Faiza; Quaglia, Federica; Leonardi, Emanuela; Piovesan, Damiano; Tosatto, Silvio C E

    2016-01-01

    Mutations in the von Hippel-Lindau tumor suppressor protein (pVHL) predispose to the development of tumors affecting specific target organs, such as the retina, epididymis, adrenal glands, pancreas and kidneys. Currently, more than 400 pVHL interacting proteins are either described in the literature or predicted in public databases. These data are scattered across several different sources, slowing comprehension of pVHL's biological role. Here we present VHLdb, a novel database collecting available interaction and mutation data on pVHL to provide novel integrated annotations. In VHLdb, pVHL interactors are organized according to two annotation levels, manual and automatic. Mutation data are easily accessible and a novel visualization tool has been implemented. A user-friendly feedback function to improve database content through community-driven curation is also provided. VHLdb presently contains 478 interactors, of which 117 have been manually curated, and 1,074 mutations. This makes it the largest available database for pVHL-related information. VHLdb is available at http://vhldb.bio.unipd.it/. PMID:27511743

  17. FLOR-ID: an interactive database of flowering-time gene networks in Arabidopsis thaliana.

    PubMed

    Bouché, Frédéric; Lobet, Guillaume; Tocquin, Pierre; Périlleux, Claire

    2016-01-01

    Flowering is a hot topic in Plant Biology, and important progress has been made in Arabidopsis thaliana toward unraveling the genetic networks involved. The increasing complexity of these networks and the explosion of literature, however, require new tools for information management and update. We therefore created an evolving, interactive database of flowering-time genes, named FLOR-ID (Flowering-Interactive Database), which is freely accessible at http://www.flor-id.org. The hand-curated database contains information on 306 genes and links to 1595 publications gathering the work of >4500 authors. Gene/protein functions and interactions within the flowering pathways were inferred from the analysis of related publications, included in the database and translated into interactive, manually drawn snapshots. PMID:26476447

  19. Pfam: the protein families database

    PubMed Central

    Finn, Robert D.; Bateman, Alex; Clements, Jody; Coggill, Penelope; Eberhardt, Ruth Y.; Eddy, Sean R.; Heger, Andreas; Hetherington, Kirstie; Holm, Liisa; Mistry, Jaina; Sonnhammer, Erik L. L.; Tate, John; Punta, Marco

    2014-01-01

    Pfam, available via servers in the UK (http://pfam.sanger.ac.uk/) and the USA (http://pfam.janelia.org/), is a widely used database of protein families, containing 14 831 manually curated entries in the current release, version 27.0. Since the last update article 2 years ago, we have generated 1182 new families and maintained sequence coverage of the UniProt Knowledgebase (UniProtKB) at nearly 80%, despite a 50% increase in the size of the underlying sequence database. Since our 2012 article describing Pfam, we have also undertaken a comprehensive review of the features that are provided by Pfam over and above the basic family data. For each feature, we determined the relevance, computational burden, usage statistics and the functionality of the feature in a website context. As a consequence of this review, we have removed some features, enhanced others and developed new ones to meet the changing demands of computational biology. Here, we describe the changes to Pfam content. Notably, we now provide family alignments based on four different representative proteome sequence data sets and a new interactive DNA search interface. We also discuss the mapping between Pfam and known 3D structures. PMID:24288371

  20. BGDB: a database of bivalent genes.

    PubMed

    Li, Qingyan; Lian, Shuabin; Dai, Zhiming; Xiang, Qian; Dai, Xianhua

    2013-01-01

    A bivalent gene is a gene marked with both H3K4me3 and H3K27me3 epigenetic modifications in the same region, and is proposed to play a pivotal role related to pluripotency in embryonic stem (ES) cells. Identification of these bivalent genes and understanding their functions are important for further research of lineage specification and embryo development. To date, abundant genome-wide histone modification data have been generated in mouse and human ES cells. These valuable data make it possible to identify bivalent genes, but no comprehensive data repository or analysis tool for bivalent genes is currently available. In this work, we develop BGDB, the database of bivalent genes. The database contains 6897 bivalent genes in human and mouse ES cells, which are manually collected from scientific literature. Each entry contains curated information, including genomic context, sequences, gene ontology and other relevant information. The web services of the BGDB database were implemented with PHP + MySQL + JavaScript, and provide diverse query functions. Database URL: http://dailab.sysu.edu.cn/bgdb/
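The defining test for a bivalent gene, overlap of both marks at the same region, reduces to interval intersection; a sketch with hypothetical peak coordinates:

```python
# Toy bivalency check: a promoter region is "bivalent" when it overlaps
# at least one H3K4me3 peak AND at least one H3K27me3 peak. Peaks and
# promoters are (start, end) half-open intervals on the same chromosome.
def overlaps(a, b):
    return a[0] < b[1] and b[0] < a[1]

def is_bivalent(promoter, k4me3_peaks, k27me3_peaks):
    return (any(overlaps(promoter, p) for p in k4me3_peaks) and
            any(overlaps(promoter, p) for p in k27me3_peaks))

promoter = (1000, 2000)  # hypothetical coordinates
bivalent = is_bivalent(promoter,
                       k4me3_peaks=[(900, 1100)],
                       k27me3_peaks=[(1500, 2500)])
```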

  1. The Human PAX6 Mutation Database.

    PubMed

    Brown, A; McKie, M; van Heyningen, V; Prosser, J

    1998-01-01

    The Human PAX6 Mutation Database contains details of 94 mutations of the PAX6 gene. A Microsoft Access program is used by the Curator to store, update and search the database entries. Mutations can be entered directly by the Curator, or imported from submissions made via the World Wide Web. The PAX6 Mutation Database web page at URL http://www.hgu.mrc.ac.uk/Softdata/PAX6/ provides information about PAX6, as well as a fill-in form through which new mutations can be submitted to the Curator. A search facility allows remote users to query the database. A plain text format file of the data can be downloaded via the World Wide Web. The Curation program contains prior knowledge of the genetic code and of the PAX6 gene including cDNA sequence, location of intron/exon boundaries, and protein domains, so that the minimum of information need be provided by the submitter or Curator.
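The Curation program's "prior knowledge of the genetic code" can be illustrated by classifying a point mutation from a codon table. The excerpted table and helper below are assumptions for illustration, not the program's code:

```python
# Small excerpt of the standard genetic code ('*' marks a stop codon).
CODON_TABLE = {
    "CGA": "R", "CGG": "R", "TGA": "*", "TCA": "S", "CAA": "Q", "TAA": "*",
}

def classify(codon, pos, new_base):
    """Classify a single-base substitution as silent, missense or nonsense."""
    mutant = codon[:pos] + new_base + codon[pos + 1:]
    before, after = CODON_TABLE[codon], CODON_TABLE[mutant]
    if after == before:
        return "silent"
    return "nonsense" if after == "*" else "missense"

kind = classify("CGA", 0, "T")  # CGA (Arg) -> TGA (stop)
```

Knowledge like this, plus the cDNA sequence and intron/exon boundaries, is what lets the submitter supply only minimal information.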

  2. Lynx: a database and knowledge extraction engine for integrative medicine.

    PubMed

    Sulakhe, Dinanath; Balasubramanian, Sandhya; Xie, Bingqing; Feng, Bo; Taylor, Andrew; Wang, Sheng; Berrocal, Eduardo; Dave, Utpal; Xu, Jinbo; Börnigen, Daniela; Gilliam, T Conrad; Maltsev, Natalia

    2014-01-01

    We have developed Lynx (http://lynx.ci.uchicago.edu), a web-based database and a knowledge extraction engine, supporting annotation and analysis of experimental data and generation of weighted hypotheses on molecular mechanisms contributing to human phenotypes and disorders of interest. Its underlying knowledge base (LynxKB) integrates various classes of information from >35 public databases and private collections, as well as manually curated data from our group and collaborators. Lynx provides advanced search capabilities and a variety of algorithms for enrichment analysis and network-based gene prioritization to assist the user in extracting meaningful knowledge from LynxKB and experimental data, whereas its service-oriented architecture provides public access to LynxKB and its analytical tools via user-friendly web services and interfaces.

  3. Saccharomyces Genome Database: the genomics resource of budding yeast

    PubMed Central

    Cherry, J. Michael; Hong, Eurie L.; Amundsen, Craig; Balakrishnan, Rama; Binkley, Gail; Chan, Esther T.; Christie, Karen R.; Costanzo, Maria C.; Dwight, Selina S.; Engel, Stacia R.; Fisk, Dianna G.; Hirschman, Jodi E.; Hitz, Benjamin C.; Karra, Kalpana; Krieger, Cynthia J.; Miyasato, Stuart R.; Nash, Rob S.; Park, Julie; Skrzypek, Marek S.; Simison, Matt; Weng, Shuai; Wong, Edith D.

    2012-01-01

    The Saccharomyces Genome Database (SGD, http://www.yeastgenome.org) is the community resource for the budding yeast Saccharomyces cerevisiae. The SGD project provides the highest-quality manually curated information from peer-reviewed literature. The experimental results reported in the literature are extracted and integrated within a well-developed database. These data are combined with quality high-throughput results and provided through Locus Summary pages, a powerful query engine and rich genome browser. The acquisition, integration and retrieval of these data allow SGD to facilitate experimental design and analysis by providing an encyclopedia of the yeast genome, its chromosomal features, their functions and interactions. Public access to these data is provided to researchers and educators via web pages designed for optimal ease of use. PMID:22110037

  5. Xenbase: expansion and updates of the Xenopus model organism database

    PubMed Central

    James-Zorn, Christina; Ponferrada, Virgilio G.; Jarabek, Chris J.; Burns, Kevin A.; Segerdell, Erik J.; Lee, Jacqueline; Snyder, Kevin; Bhattacharyya, Bishnu; Karpinka, J. Brad; Fortriede, Joshua; Bowes, Jeff B.; Zorn, Aaron M.; Vize, Peter D.

    2013-01-01

    Xenbase (http://www.xenbase.org) is a model organism database that provides genomic, molecular, cellular and developmental biology content to biomedical researchers working with the frog Xenopus, and Xenopus data to workers using other model organisms. As an amphibian, Xenopus serves as a useful evolutionary bridge between invertebrates and more complex vertebrates such as birds and mammals. Xenbase content is collated from a variety of external sources using automated and semi-automated pipelines, then processed via a combination of automated and manual annotation. A link-matching system allows the wide variety of synonyms used to describe biological data on unique features, such as a gene or an anatomical entity, to be used by the database in an equivalent manner. Recent updates to the database include the Xenopus laevis genome, a new Xenopus tropicalis genome build, epigenomic data, collections of RNA and protein sequences associated with genes, more powerful gene expression searches, a community and curated wiki, an extensive set of manually annotated gene expression patterns and a new database module that contains data on over 700 antibodies that are useful for exploring Xenopus cell and developmental biology. PMID:23125366

  6. Comprehensive coverage of cardiovascular disease data in the disease portals at the Rat Genome Database.

    PubMed

    Wang, Shur-Jen; Laulederkind, Stanley J F; Hayman, G Thomas; Petri, Victoria; Smith, Jennifer R; Tutaj, Marek; Nigam, Rajni; Dwinell, Melinda R; Shimoyama, Mary

    2016-08-01

    Cardiovascular diseases are complex diseases caused by a combination of genetic and environmental factors. To facilitate progress in complex disease research, the Rat Genome Database (RGD) provides the community with a disease portal where genome objects and biological data related to cardiovascular diseases are systematically organized. The purpose of this study is to present biocuration at RGD, including disease, genetic, and pathway data. The RGD curation team uses controlled vocabularies/ontologies to organize data curated from the published literature or imported from disease and pathway databases. These organized annotations are associated with genes, strains, and quantitative trait loci (QTLs), thus linking functional annotations to genome objects. Screen shots from the web pages are used to demonstrate the organization of annotations at RGD. The human cardiovascular disease genes identified by annotations were grouped according to data sources and their annotation profiles were compared by in-house tools and other enrichment tools available to the public. The analysis results show that the imported cardiovascular disease genes from ClinVar and OMIM are functionally different from the RGD manually curated genes in terms of pathway and Gene Ontology annotations. The inclusion of disease genes from other databases enriches the collection of disease genes not only in quantity but also in quality. PMID:27287925
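The comparison described above, contrasting imported (ClinVar/OMIM) and manually curated disease-gene sets, amounts to simple set operations over annotation lists. A minimal sketch; the gene symbols below are hypothetical placeholders, not RGD annotations:

```python
# Sketch: comparing disease-gene sets from two sources, as RGD does when
# contrasting imported and manually curated gene lists. The symbols here
# are invented examples, not real RGD data.

def jaccard(a, b):
    """Jaccard similarity between two annotation sets."""
    a, b = set(a), set(b)
    return len(a & b) / len(a | b) if a | b else 0.0

curated = {"Ace", "Nppa", "Myh7", "Ren"}
imported = {"Myh7", "Scn5a", "Kcnq1", "Ace"}

overlap = curated & imported          # genes supported by both sources
similarity = jaccard(curated, imported)
print(sorted(overlap), round(similarity, 2))
```

Running the sketch reports the shared genes and a similarity score; in practice the same idea is applied to pathway and Gene Ontology annotation profiles rather than raw gene lists.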

  7. Curative cancer chemotherapy.

    PubMed

    Frei, E

    1985-12-01

    Cancer chemotherapy provides variably effective treatment for the majority of forms of human cancer and curative treatment for some 12 categories of cancer. Curative treatment is defined as the proportion of patients who survive beyond the time after which the risk of treatment failure approaches zero, i.e., the disease-free survival plateau. This progress has resulted from a closely integrated scientific effort, including drug development, pharmacology, preclinical modeling, experimental design with respect to clinical trials, quantitative criteria for response, and a series of clinical trials (initially in children with acute lymphocytic leukemia) in which the importance of complete remission, of dose and schedule, of sequencing chemotherapeutic agents, of pharmacological sanctuaries, and particularly of combination chemotherapy was studied. The principles derived from these studies, particularly those relating to combination chemotherapy, resulted in curative treatment for disseminated Hodgkin's disease, non-Hodgkin's lymphoma, pediatric solid tumors, testicular cancer, and limited small cell lung cancer. Many patients with certain stages of solid tumors, such as breast cancer and osteogenic sarcoma, are at high risk of having disseminated microscopic disease. Experimental studies indicate that treatment which is only partially effective against macroscopic disease is much more effective against microscopic tumors. Therefore chemotherapy is administered immediately following control of the primary tumor in patients at high risk of having disseminated microscopic disease, a treatment known as adjuvant chemotherapy. This program has been highly successful in increasing the cure rate in patients with pediatric solid tumors and in prolonging disease-free survival in patients with premenopausal breast cancer. Given dissemination of the technology, it is estimated that 15,000-30,000 patients per year are potentially curable in the United States. 

  8. Is love curative?

    PubMed

    Glucksman, Myron L

    2010-01-01

    This article explores the phenomenon of love in the therapeutic relationship and its role as a curative factor. Since Freud's (1915) description of transference love, a major goal of treatment has been to understand its developmental antecedents. Most analysts agree that transference love is no different than ordinary love, except that it is overdetermined and requires the patient to view it as simultaneously real and illusory without reciprocity from the analyst. Nontransferential, realistic elements of the therapeutic relationship also play an important role in treatment. An important outgrowth of the therapeutic process is the development of a new object relationship between analyst and patient. This special or transformative friendship is a new object relationship characterized by genuine feelings of mutual respect, trust, caring, and even love. It facilitates the patient's capacity to form and maintain other loving relationships. Two case presentations are illustrative.

  9. Data mining in the MetaCyc family of pathway databases.

    PubMed

    Karp, Peter D; Paley, Suzanne; Altman, Tomer

    2013-01-01

    Pathway databases collect the bioreactions and molecular interactions that define the processes of life. The MetaCyc family of pathway databases consists of thousands of databases that were derived through computational inference of metabolic pathways from the MetaCyc pathway/genome database (PGDB). In some cases, these DBs underwent subsequent manual curation. Curated pathway DBs are now available for most of the major model organisms. Databases in the MetaCyc family are managed using the Pathway Tools software. This chapter presents methods for performing data mining on the MetaCyc family of pathway DBs. We discuss the major data access mechanisms for the family, which include data files in multiple formats; application programming interfaces (APIs) for the Lisp, Java, and Perl languages; and web services. We present an overview of the Pathway Tools schema, an understanding of which is needed to query the DBs. The chapter also presents several interactive data mining tools within Pathway Tools for performing omics data analysis. PMID:23192547

  10. Data Mining in the MetaCyc Family of Pathway Databases

    PubMed Central

    Karp, Peter D.; Paley, Suzanne; Altman, Tomer

    2013-01-01

    Pathway databases collect the bioreactions and molecular interactions that define the processes of life. The MetaCyc family of pathway databases consists of thousands of databases that were derived through computational inference of metabolic pathways from the MetaCyc Pathway/Genome Database (PGDB). In some cases these DBs underwent subsequent manual curation. Curated pathway DBs are now available for most of the major model organisms. Databases in the MetaCyc family are managed using the Pathway Tools software. This chapter presents methods for performing data mining on the MetaCyc family of pathway DBs. We discuss the major data access mechanisms for the family, which include data files in multiple formats; application programming interfaces (APIs) for the Lisp, Java, and Perl languages; and web services. We present an overview of the Pathway Tools schema, an understanding of which is needed to query the DBs. The chapter also presents several interactive data mining tools within Pathway Tools for performing omics data analysis. PMID:23192547
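The data-access mechanisms listed above (data files, language APIs, web services) can be exercised without a local Pathway Tools install via BioCyc-style web services. A minimal Python sketch; the `getxml` endpoint and its `id`/`detail` parameters are assumptions based on BioCyc's documented service, so verify against the current web-services documentation before relying on them:

```python
# Sketch: querying a MetaCyc-family PGDB through BioCyc-style web services.
# The getxml endpoint and its parameters are assumptions; check the current
# Pathway Tools / BioCyc web-services documentation.
from urllib.parse import urlencode
from urllib.request import urlopen  # used only in the commented fetch below
import xml.etree.ElementTree as ET

BASE = "https://websvc.biocyc.org/getxml"

def pathway_url(orgid, frame_id, detail="low"):
    """Build a getxml request for object `frame_id` in database `orgid`."""
    return f"{BASE}?{urlencode({'id': f'{orgid}:{frame_id}', 'detail': detail})}"

def common_names(xml_text):
    """Pull common-name elements out of a getxml response (illustrative parse)."""
    root = ET.fromstring(xml_text)
    return [el.text for el in root.iter("common-name") if el.text]

# Tryptophan biosynthesis in EcoCyc; the network fetch is left to the caller:
url = pathway_url("ECOLI", "TRPSYN-PWY")
print(url)
#   with urlopen(url) as resp:
#       print(common_names(resp.read().decode()))
```

The same pattern applies to any organism-specific PGDB in the family by swapping the `orgid` database identifier.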

  11. HistoneDB 2.0: a histone database with variants--an integrated resource to explore histones and their variants.

    PubMed

    Draizen, Eli J; Shaytan, Alexey K; Mariño-Ramírez, Leonardo; Talbert, Paul B; Landsman, David; Panchenko, Anna R

    2016-01-01

    Compaction of DNA into chromatin is a characteristic feature of eukaryotic organisms. The core (H2A, H2B, H3, H4) and linker (H1) histone proteins are responsible for this compaction through the formation of nucleosomes and higher order chromatin aggregates. Moreover, histones are intricately involved in chromatin functioning and provide a means for genome dynamic regulation through specific histone variants and histone post-translational modifications. 'HistoneDB 2.0--with variants' is a comprehensive database of histone protein sequences, classified by histone types and variants. All entries in the database are supplemented by rich sequence and structural annotations with many interactive tools to explore and compare sequences of different variants from various organisms. The core of the database is a manually curated set of histone sequences grouped into 30 different variant subsets with variant-specific annotations. The curated set is supplemented by an automatically extracted set of histone sequences from the non-redundant protein database using algorithms trained on the curated set. The interactive web site supports various searching strategies in both datasets: browsing of phylogenetic trees; on-demand generation of multiple sequence alignments with feature annotations; classification of histone-like sequences and browsing of the taxonomic diversity for every histone variant. HistoneDB 2.0 is a resource for the interactive comparative analysis of histone protein sequences and their implications for chromatin function. Database URL: http://www.ncbi.nlm.nih.gov/projects/HistoneDB2.0.

  12. Prioritizing PubMed articles for the Comparative Toxicogenomic Database utilizing semantic information.

    PubMed

    Kim, Sun; Kim, Won; Wei, Chih-Hsuan; Lu, Zhiyong; Wilbur, W John

    2012-01-01

    The Comparative Toxicogenomics Database (CTD) contains manually curated literature that describes chemical-gene interactions, chemical-disease relationships and gene-disease relationships. Finding articles that contain this information is an important first step toward efficient manual curation. However, the complex nature of named entities and their relationships makes it challenging to choose relevant articles. In this article, we introduce a machine learning framework for prioritizing CTD-relevant articles based on our prior system for the protein-protein interaction article classification task in BioCreative III. To address new challenges in the CTD task, we explore a new entity identification method for genes, chemicals and diseases. In addition, latent topics are analyzed and used as a feature type to overcome the small size of the training set. Applied to the BioCreative 2012 Triage dataset, our method achieved 0.8030 mean average precision (MAP) in the official runs, resulting in the top MAP system among participants. Integrated with PubTator, a Web interface for annotating biomedical literature, the proposed system also received a positive review from the CTD curation team.
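The MAP figure quoted above is mean average precision over ranked retrieval lists: precision is sampled at each rank where a relevant (curatable) article appears, averaged per query, then averaged across queries. A minimal sketch with made-up relevance judgments:

```python
# Sketch: mean average precision (MAP), the metric used to score the CTD
# triage runs. `rankings` holds, per query, a ranked list of relevance
# flags (True = curatable article); the data is illustrative only.

def average_precision(relevance):
    """AP for one ranked list of booleans."""
    hits, score = 0, 0.0
    for rank, rel in enumerate(relevance, start=1):
        if rel:
            hits += 1
            score += hits / rank       # precision at each relevant rank
    return score / hits if hits else 0.0

def mean_average_precision(rankings):
    return sum(average_precision(r) for r in rankings) / len(rankings)

rankings = [
    [True, False, True, False],        # AP = (1/1 + 2/3) / 2
    [False, True, True],               # AP = (1/2 + 2/3) / 2
]
print(round(mean_average_precision(rankings), 4))
```

Because AP rewards placing relevant items near the top, MAP is a natural fit for triage, where curators read the ranked list from the top down.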

  13. Curation and Computational Design of Bioenergy-Related Metabolic Pathways

    SciTech Connect

    Karp, Peter D.

    2014-09-12

    Pathway Tools is a systems-biology software package written by SRI International (SRI) that produces Pathway/Genome Databases (PGDBs) for organisms with a sequenced genome. Pathway Tools also provides a wide range of capabilities for analyzing predicted metabolic networks and user-generated omics data. More than 5,000 academic, industrial, and government groups have licensed Pathway Tools. This user community includes researchers at all three DOE bioenergy centers, as well as academic and industrial metabolic engineering (ME) groups. An integral part of the Pathway Tools software is MetaCyc, a large, multiorganism database of metabolic pathways and enzymes that SRI and its academic collaborators manually curate. This project included two main goals: I. Enhance the MetaCyc content of bioenergy-related enzymes and pathways. II. Develop computational tools for engineering metabolic pathways that satisfy specified design goals, in particular for bioenergy-related pathways. In part I, SRI proposed to significantly expand the coverage of bioenergy-related metabolic information in MetaCyc, followed by the generation of organism-specific PGDBs for all energy-relevant organisms sequenced at the DOE Joint Genome Institute (JGI). Part I objectives included: 1: Expand the content of MetaCyc to include bioenergy-related enzymes and pathways. 2: Enhance the Pathway Tools software to enable display of complex polymer degradation processes. 3: Create new PGDBs for the energy-related organisms sequenced by JGI, update existing PGDBs with new MetaCyc content, and make these data available to JBEI via the BioCyc website. In part II, SRI proposed to develop an efficient computational tool for the engineering of metabolic pathways. Part II objectives included: 4: Develop computational tools for generating metabolic pathways that satisfy specified design goals, enabling users to specify parameters such as starting and ending compounds, and preferred or disallowed intermediate compounds

  14. TimeLineCurator: Interactive Authoring of Visual Timelines from Unstructured Text.

    PubMed

    Fulda, Johanna; Brehmer, Matthew; Munzner, Tamara

    2016-01-01

    We present TimeLineCurator, a browser-based authoring tool that automatically extracts event data from temporal references in unstructured text documents using natural language processing and encodes them along a visual timeline. Our goal is to facilitate the timeline creation process for journalists and others who tell temporal stories online. Current solutions involve manually extracting and formatting event data from source documents, a process that tends to be tedious and error prone. With TimeLineCurator, a prospective timeline author can quickly identify the extent of time encompassed by a document, as well as the distribution of events occurring along this timeline. Authors can speculatively browse possible documents to quickly determine whether they are appropriate sources of timeline material. TimeLineCurator provides controls for curating and editing events on a timeline, the ability to combine timelines from multiple source documents, and export curated timelines for online deployment. We evaluate TimeLineCurator through a benchmark comparison of entity extraction error against a manual timeline curation process, a preliminary evaluation of the user experience of timeline authoring, a brief qualitative analysis of its visual output, and a discussion of prospective use cases suggested by members of the target author communities following its deployment. PMID:26529709
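The extract-and-order step that TimeLineCurator automates can be illustrated in miniature. A real system uses an NLP temporal tagger covering many kinds of expressions; this regex sketch handles only "Month YYYY" and bare-year mentions, purely as an illustration of the idea:

```python
# Sketch of the kind of temporal extraction TimeLineCurator automates:
# pull date mentions out of unstructured text and order them into events.
# Only "Month YYYY" and bare years are recognized here; the document
# string is invented.
import re

MONTHS = ("January|February|March|April|May|June|July|August|"
          "September|October|November|December")
DATE_RE = re.compile(rf"(?:(?:{MONTHS})\s+)?\b(1[89]\d{{2}}|20\d{{2}})\b")

def extract_events(text):
    """Return (year, context snippet) pairs sorted chronologically."""
    events = []
    for m in DATE_RE.finditer(text):
        snippet = text[max(0, m.start() - 20):m.end()]
        events.append((int(m.group(1)), snippet.strip()))
    return sorted(events)

doc = "Founded in 1999, the lab moved in March 2004 and closed in 2006."
for year, snippet in extract_events(doc):
    print(year, "|", snippet)
```

Each extracted event keeps a context snippet so an author can curate or discard it, mirroring the tool's speculative-browsing workflow.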

  15. JSC Stardust Curation Team

    NASA Technical Reports Server (NTRS)

    Zolensky, Michael E.

    2000-01-01

    STARDUST, a NASA Discovery-class mission, is the first to return samples from a comet. Grains from comet Wild 2's coma (the gas and dust envelope that surrounds the nucleus) will be collected, as well as interstellar dust. The mission, which launched on February 7, 1999, will encounter the comet on January 10, 2004. As the spacecraft passes through the coma, a tray of silica aerogel will be exposed, and coma grains will impact there and become captured. Following the collection, the aerogel tray is closed for return to Earth in 2006. A dust impact mass spectrometer on board the STARDUST spacecraft will be used to gather spectra of dust during the entire mission, including the coma passage. This instrument will be the best chance to obtain data on volatile grains, which will not be well-collected in the aerogel. The dust impact mass spectrometer will also be used to study the composition of interstellar grains. In the past 5 years, analysis of data from dust detectors aboard the Ulysses and Galileo spacecraft has revealed that there is a stream of interstellar dust flowing through our solar system. These grains will be captured during the cruise phase of the STARDUST mission, as the spacecraft travels toward the comet. The sample return capsule will parachute to Earth in February 2006, and will land in western Utah. Once on the ground, the sample return capsule will be placed into a dry nitrogen environment and flown to the curation lab at JSC.

  16. Kalium: a database of potassium channel toxins from scorpion venom.

    PubMed

    Kuzmenkov, Alexey I; Krylov, Nikolay A; Chugunov, Anton O; Grishin, Eugene V; Vassilevski, Alexander A

    2016-01-01

    Kalium (http://kaliumdb.org/) is a manually curated database that accumulates data on potassium channel toxins purified from scorpion venom (KTx). This database is an open-access resource, and provides easy access to pages of other databases of interest, such as UniProt, PDB, NCBI Taxonomy Browser, and PubMed. General achievements of Kalium are a strict and easy regulation of KTx classification based on the unified nomenclature supported by researchers in the field, removal of peptides with partial sequence and entries supported by transcriptomic information only, classification of β-family toxins, and addition of a novel λ-family. Molecules presented in the database can be processed by the Clustal Omega server using a one-click option. Molecular masses of mature peptides are calculated and available activity data are compiled for all KTx. We believe that Kalium is not only of high interest to professional toxinologists, but also of general utility to the scientific community. Database URL: http://kaliumdb.org/. PMID:27087309

  17. Kalium: a database of potassium channel toxins from scorpion venom

    PubMed Central

    Kuzmenkov, Alexey I.; Krylov, Nikolay A.; Chugunov, Anton O.; Grishin, Eugene V.; Vassilevski, Alexander A.

    2016-01-01

    Kalium (http://kaliumdb.org/) is a manually curated database that accumulates data on potassium channel toxins purified from scorpion venom (KTx). This database is an open-access resource, and provides easy access to pages of other databases of interest, such as UniProt, PDB, NCBI Taxonomy Browser, and PubMed. General achievements of Kalium are a strict and easy regulation of KTx classification based on the unified nomenclature supported by researchers in the field, removal of peptides with partial sequence and entries supported by transcriptomic information only, classification of β-family toxins, and addition of a novel λ-family. Molecules presented in the database can be processed by the Clustal Omega server using a one-click option. Molecular masses of mature peptides are calculated and available activity data are compiled for all KTx. We believe that Kalium is not only of high interest to professional toxinologists, but also of general utility to the scientific community. Database URL: http://kaliumdb.org/ PMID:27087309
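The per-toxin molecular masses that Kalium reports can be approximated for a linear, unmodified peptide by summing standard residue masses plus one water. A sketch using standard monoisotopic values; note that real KTx carry disulfide bonds (each removing two hydrogens), and the sequence below is invented, not a Kalium entry:

```python
# Sketch: monoisotopic mass of a mature peptide from its sequence, as
# Kalium reports for each KTx. Residue masses (Da) are standard values;
# disulfide bonds and other modifications are ignored, and the example
# sequence is made up.

MONO = {"G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
        "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
        "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
        "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
        "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931}
WATER = 18.01056  # mass of H2O added for the peptide termini

def peptide_mass(seq):
    """Monoisotopic mass of a linear, unmodified peptide (no disulfides)."""
    return sum(MONO[aa] for aa in seq.upper()) + WATER

print(round(peptide_mass("GVEINVKCSGSPQCLKPCKDAG"), 3))  # hypothetical toxin
```

For cysteine-rich toxins one would subtract 2 × 1.00783 Da per disulfide bridge from this value.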

  18. Curation of the genome annotation of Pichia pastoris (Komagataella phaffii) CBS7435 from gene level to protein function.

    PubMed

    Valli, Minoska; Tatto, Nadine E; Peymann, Armin; Gruber, Clemens; Landes, Nils; Ekker, Heinz; Thallinger, Gerhard G; Mattanovich, Diethard; Gasser, Brigitte; Graf, Alexandra B

    2016-09-01

    As manually curated and non-automated BLAST analysis of the published Pichia pastoris genome sequences revealed many differences between the gene annotations of the strains GS115 and CBS7435, RNA-Seq analysis, supported by proteomics, was performed to improve the genome annotation. Detailed analysis of sequence alignment and protein domain predictions were made to extend the functional genome annotation to all P. pastoris sequences. This allowed the identification of 492 new ORFs, 4916 hypothetical UTRs and the correction of 341 incorrect ORF predictions, which were mainly due to the presence of upstream ATG or erroneous intron predictions. Moreover, 175 previously erroneously annotated ORFs need to be removed from the annotation. In total, we have annotated 5325 ORFs. Regarding the functionality of those genes, we improved all gene and protein descriptions. Thereby, the percentage of ORFs with functional annotation was increased from 48% to 73%. Furthermore, we defined functional groups, covering 25 biological cellular processes of interest, by grouping all genes that are part of the defined process. All data are presented in the newly launched genome browser and database available at www.pichiagenome.org. In summary, we present a wide spectrum of curation of the P. pastoris genome annotation from gene level to protein function. PMID:27388471
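One correction class described above, ORF starts mis-called because of an overlooked upstream ATG, can be illustrated with a toy in-frame scanner. The sequence is invented, and real annotation also weighs introns, the reverse strand and expression evidence:

```python
# Sketch: why an overlooked upstream ATG changes an ORF call, one of the
# correction types in the P. pastoris re-annotation. The scanner finds,
# per forward-strand reading frame, the first ATG-to-stop span; the DNA
# string is a toy example.

STOPS = {"TAA", "TAG", "TGA"}

def first_orf(dna, frame=0):
    """Return (start, end) of the first ATG..stop ORF in `frame`, else None."""
    codons = [dna[i:i + 3] for i in range(frame, len(dna) - 2, 3)]
    start = None
    for idx, codon in enumerate(codons):
        if start is None and codon == "ATG":
            start = frame + 3 * idx
        elif start is not None and codon in STOPS:
            return (start, frame + 3 * idx + 3)
    return None

seq = "ATGAAAATGCCCTAA"   # two in-frame ATGs; the upstream one wins
print(first_orf(seq))      # -> (0, 15)
```

An annotator that missed the first ATG would report the shorter (8, 15) ORF instead, which is exactly the kind of discrepancy RNA-Seq and proteomics evidence resolves.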

  19. KID, a Kinase Inhibitor Database project.

    PubMed

    Collin, O; Meijer, L

    1999-01-01

    The Kinase Inhibitor Database is a small specialized database dedicated to the gathering of information on protein kinase inhibitors. The database is accessible through the World Wide Web system and gives access to structural and bibliographic information on protein kinase inhibitors. The data in the database will be collected and submitted by researchers working in the kinase inhibitor field. The submitted data will be checked by the curator of the database before entry.

  20. A MOD(ern) perspective on literature curation.

    PubMed

    Hirschman, Jodi; Berardini, Tanya Z; Drabkin, Harold J; Howe, Doug

    2010-05-01

    Curation of biological data is a multi-faceted task whose goal is to create a structured, comprehensive, integrated, and accurate resource of current biological knowledge. These structured data facilitate the work of the scientific community by providing knowledge about genes or genomes and by generating validated connections between the data that yield new information and stimulate new research approaches. For the model organism databases (MODs), an important source of data is research publications. Every published paper containing experimental information about a particular model organism is a candidate for curation. All such papers are examined carefully by curators for relevant information. Here, four curators from different MODs describe the literature curation process and highlight approaches taken by the four MODs to address: (1) the decision process by which papers are selected, and (2) the identification and prioritization of the data contained in the paper. We will highlight some of the challenges that MOD biocurators face, and point to ways in which researchers and publishers can support the work of biocurators and the value of such support.

  1. Management Manual.

    ERIC Educational Resources Information Center

    San Joaquin Delta Community Coll. District, CA.

    This manual articulates the rights, responsibilities, entitlements, and conditions of employment of management personnel at San Joaquin Delta College (SJDC). The manual first presents SJDC's mission statement and then discusses the college's management goals and priorities. An examination of SJDC's administrative organization and a list of…

  2. Terminology Manual.

    ERIC Educational Resources Information Center

    Felber, Helmut

    A product of the International Information Center for Terminology (Infoterm), this manual is designed to serve as a reference tool for practitioners active in terminology work and documentation. The manual explores the basic ideas of the Vienna School of Terminology and explains developments in the area of applied computer aided terminography…

  3. Resource Manual

    ERIC Educational Resources Information Center

    Human Development Institute, 2008

    2008-01-01

    This manual was designed primarily for use by individuals with developmental disabilities and related conditions. The main focus of this manual is to provide easy-to-read information concerning available resources, and to provide immediate contact information for the purpose of applying for resources and/or locating additional information. The…

  4. Quality Manual

    NASA Astrophysics Data System (ADS)

    Koch, Michael

    The quality manual is the “heart” of every management system related to quality. Quality assurance in analytical laboratories is most frequently linked with ISO/IEC 17025, which lists the standard requirements for a quality manual. In this chapter examples are used to demonstrate how these requirements can be met. But, certainly, there are many other ways to do this.

  5. EcoCyc: fusing model organism databases with systems biology.

    PubMed

    Keseler, Ingrid M; Mackie, Amanda; Peralta-Gil, Martin; Santos-Zavaleta, Alberto; Gama-Castro, Socorro; Bonavides-Martínez, César; Fulcher, Carol; Huerta, Araceli M; Kothari, Anamika; Krummenacker, Markus; Latendresse, Mario; Muñiz-Rascado, Luis; Ong, Quang; Paley, Suzanne; Schröder, Imke; Shearer, Alexander G; Subhraveti, Pallavi; Travers, Mike; Weerasinghe, Deepika; Weiss, Verena; Collado-Vides, Julio; Gunsalus, Robert P; Paulsen, Ian; Karp, Peter D

    2013-01-01

    EcoCyc (http://EcoCyc.org) is a model organism database built on the genome sequence of Escherichia coli K-12 MG1655. Expert manual curation of the functions of individual E. coli gene products in EcoCyc has been based on information found in the experimental literature for E. coli K-12-derived strains. Updates to EcoCyc content continue to improve the comprehensive picture of E. coli biology. The utility of EcoCyc is enhanced by new tools available on the EcoCyc web site, and the development of EcoCyc as a teaching tool is increasing the impact of the knowledge collected in EcoCyc.

  6. NESdb: a database of NES-containing CRM1 cargoes.

    PubMed

    Xu, Darui; Grishin, Nick V; Chook, Yuh Min

    2012-09-01

    The leucine-rich nuclear export signal (NES) is the only known class of targeting signal that directs macromolecules out of the cell nucleus. NESs are short stretches of 8-15 amino acids with regularly spaced hydrophobic residues that bind the export karyopherin CRM1. NES-containing proteins are involved in numerous cellular and disease processes. We compiled a database named NESdb that contains 221 NES-containing CRM1 cargoes that were manually curated from the published literature. Each NESdb entry is annotated with information about sequence and structure of both the NES and the cargo protein, as well as information about experimental evidence of NES-mapping and CRM1-mediated nuclear export. NESdb will be updated regularly and will serve as an important resource for nuclear export signals. NESdb is freely available to nonprofit organizations at http://prodata.swmed.edu/LRNes.

  7. NESdb: a database of NES-containing CRM1 cargoes.

    PubMed

    Xu, Darui; Grishin, Nick V; Chook, Yuh Min

    2012-09-01

    The leucine-rich nuclear export signal (NES) is the only known class of targeting signal that directs macromolecules out of the cell nucleus. NESs are short stretches of 8-15 amino acids with regularly spaced hydrophobic residues that bind the export karyopherin CRM1. NES-containing proteins are involved in numerous cellular and disease processes. We compiled a database named NESdb that contains 221 NES-containing CRM1 cargoes that were manually curated from the published literature. Each NESdb entry is annotated with information about sequence and structure of both the NES and the cargo protein, as well as information about experimental evidence of NES-mapping and CRM1-mediated nuclear export. NESdb will be updated regularly and will serve as an important resource for nuclear export signals. NESdb is freely available to nonprofit organizations at http://prodata.swmed.edu/LRNes. PMID:22833564

  8. NESdb: a database of NES-containing CRM1 cargoes

    PubMed Central

    Xu, Darui; Grishin, Nick V.; Chook, Yuh Min

    2012-01-01

    The leucine-rich nuclear export signal (NES) is the only known class of targeting signal that directs macromolecules out of the cell nucleus. NESs are short stretches of 8–15 amino acids with regularly spaced hydrophobic residues that bind the export karyopherin CRM1. NES-containing proteins are involved in numerous cellular and disease processes. We compiled a database named NESdb that contains 221 NES-containing CRM1 cargoes that were manually curated from the published literature. Each NESdb entry is annotated with information about sequence and structure of both the NES and the cargo protein, as well as information about experimental evidence of NES-mapping and CRM1-mediated nuclear export. NESdb will be updated regularly and will serve as an important resource for nuclear export signals. NESdb is freely available to nonprofit organizations at http://prodata.swmed.edu/LRNes. PMID:22833564
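The hydrophobic spacing of an NES lends itself to pattern scanning. The pattern Φ-X(2,3)-Φ-X(2,3)-Φ-X-Φ (Φ = L, I, V, F or M) is a commonly cited NES consensus, not the NESdb curation criterion; real CRM1 cargo mapping, as NESdb records, requires experimental evidence, so matches are only candidates:

```python
# Sketch: scanning a protein sequence for a leucine-rich NES using a
# commonly cited consensus. Consensus hits are candidates only; curated
# resources such as NESdb require experimental NES-mapping evidence.
import re

NES_RE = re.compile(r"[LIVFM].{2,3}[LIVFM].{2,3}[LIVFM].[LIVFM]")

def nes_candidates(seq):
    """Return (start, match) pairs for consensus hits in `seq`."""
    return [(m.start(), m.group()) for m in NES_RE.finditer(seq.upper())]

# The PKI NES region (LALKLAGLDI) is a classic CRM1-dependent example;
# the flanking residues here are invented padding.
print(nes_candidates("MDVETGSNELALKLAGLDINKTE"))  # -> [(9, 'LALKLAGLDI')]
```

Because the consensus is permissive, such scans produce many false positives; that gap between pattern matches and verified signals is precisely why a manually curated resource is valuable.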

  9. Rfam 12.0: updates to the RNA families database

    PubMed Central

    Nawrocki, Eric P.; Burge, Sarah W.; Bateman, Alex; Daub, Jennifer; Eberhardt, Ruth Y.; Eddy, Sean R.; Floden, Evan W.; Gardner, Paul P.; Jones, Thomas A.; Tate, John; Finn, Robert D.

    2015-01-01

    The Rfam database (available at http://rfam.xfam.org) is a collection of non-coding RNA families represented by manually curated sequence alignments, consensus secondary structures and annotation gathered from corresponding Wikipedia, taxonomy and ontology resources. In this article, we detail updates and improvements to the Rfam data and website for the Rfam 12.0 release. We describe the upgrade of our search pipeline to use Infernal 1.1 and demonstrate its improved homology detection ability by comparison with the previous version. The new pipeline is easier for users to apply to their own data sets, and we illustrate its ability to annotate RNAs in genomic and metagenomic data sets of various sizes. Rfam has been expanded to include 260 new families, including the well-studied large subunit ribosomal RNA family, and for the first time includes information on short sequence- and structure-based RNA motifs present within families. PMID:25392425

  10. Manual de Carpinteria (Carpentry Manual).

    ERIC Educational Resources Information Center

    TomSing, Luisa B.

    This manual is part of a Mexican series of instructional materials designed for Spanish speaking adults who are in the process of becoming literate or have recently become literate in their native language. The manual describes a carpentry course that is structured to appeal to the student as a self-directing adult. The following units are…

  11. VIEWCACHE: An incremental pointer-base access method for distributed databases. Part 1: The universal index system design document. Part 2: The universal index system low-level design document. Part 3: User's guide. Part 4: Reference manual. Part 5: UIMS test suite

    NASA Technical Reports Server (NTRS)

    Kelley, Steve; Roussopoulos, Nick; Sellis, Timos

    1992-01-01

    The goal of the Universal Index System (UIS) is to provide an easy-to-use and reliable interface to many different kinds of database systems. The impetus for this system was to simplify database index management for users, thus encouraging the use of indexes. As the idea grew into an actual system design, the concept of increasing database performance by facilitating the use of time-saving techniques at the user level became a theme for the project. This Final Report describes the Design, the Implementation of UIS, and its Language Interfaces. It also includes the User's Guide and the Reference Manual.

  12. Managing biological networks by using text mining and computer-aided curation

    NASA Astrophysics Data System (ADS)

    Yu, Seok Jong; Cho, Yongseong; Lee, Min-Ho; Lim, Jongtae; Yoo, Jaesoo

    2015-11-01

In order to understand a biological mechanism in a cell, a researcher must collect a huge number of protein interactions, with supporting experimental data, from experiments and the literature. Text mining systems that extract biological interactions from papers have been used to construct biological networks for a few decades. Even though text mining of the literature is necessary to construct a biological network, few systems with a text mining tool are available to biologists who want to construct their own biological networks. We have developed a biological network construction system called BioKnowledge Viewer that can generate a biological interaction network by using a text mining tool and biological taggers. It also provides Boolean simulation software for modeling and simulating the networks built with the text mining tool. A user can download PubMed articles and construct a biological network by using the Multi-level Knowledge Emergence Model (KMEM), MetaMap, and A Biomedical Named Entity Recognizer (ABNER) as text mining tools. To evaluate the system, we constructed an aging-related biological network that consists of 9,415 nodes (genes) by using manual curation. With network analysis, we found that several genes, including JNK, AP-1, and BCL-2, were highly connected in the aging biological network. We provide a semi-automatic curation environment so that users can obtain a graph database for managing text mining results generated in the server system and can navigate the network with BioKnowledge Viewer, which is freely available at http://bioknowledgeviewer.kisti.re.kr.
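The hub-gene analysis described above reduces, in its simplest form, to degree counting over the interaction network. A minimal sketch (the edge list below is invented toy data, not the actual 9,415-node aging network):

```python
from collections import Counter

# Toy protein-interaction edge list (hypothetical pairings for illustration only).
edges = [
    ("JNK", "AP-1"), ("JNK", "BCL-2"), ("AP-1", "BCL-2"),
    ("JNK", "P53"), ("AP-1", "FOS"), ("BCL-2", "BAX"),
]

# Degree centrality: count how many interactions each gene participates in.
degree = Counter()
for a, b in edges:
    degree[a] += 1
    degree[b] += 1

# Genes ranked by degree; high-degree nodes are candidate hubs.
hubs = [gene for gene, _ in degree.most_common(3)]
print(hubs)  # JNK, AP-1 and BCL-2 have the highest degree here
```

Real analyses would use a graph library and richer centrality measures, but degree ranking is often the first pass for hub detection.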

  13. Curation and Analysis of Multitargeting Agents for Polypharmacological Modeling

    PubMed Central

    2015-01-01

In drug discovery and development, the conventional “single drug, single target” concept has shifted to “single drug, multiple targets”, a concept coined polypharmacology. For studies in this emerging field, dedicated, high-quality databases of multitargeting ligands would be exceedingly beneficial. To this end, we conducted a comprehensive analysis of the structural and chemical/biological profiles of polypharmacological agents and present a Web-based database (Polypharma). All of the compounds curated herein have been cocrystallized with more than one unique protein, with intensive reports of their multitargeting activities. The present study provides more insight into drug multitargeting and is particularly useful for polypharmacology modeling. This specialized curation has been made publicly available at http://imdlab.org/polypharma/. PMID:25133604

  14. The Comparative Toxicogenomics Database: update 2013.

    PubMed

    Davis, Allan Peter; Murphy, Cynthia Grondin; Johnson, Robin; Lay, Jean M; Lennon-Hopkins, Kelley; Saraceni-Richards, Cynthia; Sciaky, Daniela; King, Benjamin L; Rosenstein, Michael C; Wiegers, Thomas C; Mattingly, Carolyn J

    2013-01-01

    The Comparative Toxicogenomics Database (CTD; http://ctdbase.org/) provides information about interactions between environmental chemicals and gene products and their relationships to diseases. Chemical-gene, chemical-disease and gene-disease interactions manually curated from the literature are integrated to generate expanded networks and predict many novel associations between different data types. CTD now contains over 15 million toxicogenomic relationships. To navigate this sea of data, we added several new features, including DiseaseComps (which finds comparable diseases that share toxicogenomic profiles), statistical scoring for inferred gene-disease and pathway-chemical relationships, filtering options for several tools to refine user analysis and our new Gene Set Enricher (which provides biological annotations that are enriched for gene sets). To improve data visualization, we added a Cytoscape Web view to our ChemComps feature, included color-coded interactions and created a 'slim list' for our MEDIC disease vocabulary (allowing diseases to be grouped for meta-analysis, visualization and better data management). CTD continues to promote interoperability with external databases by providing content and cross-links to their sites. Together, this wealth of expanded chemical-gene-disease data, combined with novel ways to analyze and view content, continues to help users generate testable hypotheses about the molecular mechanisms of environmental diseases.
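CTD's expanded networks, which predict novel associations by integrating curated interaction types, can be illustrated in highly simplified form: a candidate chemical-disease association is inferred whenever a chemical-gene and a gene-disease record share a gene. The interactions below are invented for illustration, not CTD records:

```python
# Toy curated sets (chemical, gene) and (gene, disease); all pairs are hypothetical.
chem_gene = {("arsenic", "TP53"), ("benzene", "BCL2"), ("arsenic", "MYC")}
gene_disease = {("TP53", "lung cancer"), ("MYC", "lymphoma"), ("EGFR", "glioma")}

# Join on the shared gene to infer candidate chemical-disease associations.
inferred = {
    (chem, disease)
    for chem, gene in chem_gene
    for gene2, disease in gene_disease
    if gene == gene2
}
print(sorted(inferred))  # [('arsenic', 'lung cancer'), ('arsenic', 'lymphoma')]
```

CTD additionally scores such inferences statistically; this sketch shows only the join step.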

  15. MeioBase: a comprehensive database for meiosis.

    PubMed

    Li, Hao; Meng, Fanrui; Guo, Chunce; Wang, Yingxiang; Xie, Xiaojing; Zhu, Tiansheng; Zhou, Shuigeng; Ma, Hong; Shan, Hongyan; Kong, Hongzhi

    2014-01-01

Meiosis is a special type of cell division process necessary for the sexual reproduction of all eukaryotes. The ever-expanding meiosis research calls for an effective and specialized database that is not readily available yet. To fill this gap, we have developed a knowledge database, MeioBase (http://meiosis.ibcas.ac.cn), which comprises two core parts, Resources and Tools. In the Resources part, a wealth of meiosis data collected by curation and manual review of the published literature and biological databases is integrated and organized into various sections, such as Cytology, Pathway, Species, Interaction, and Expression. In the Tools part, some useful tools have been integrated into MeioBase, such as Search, Download, Blast, Comparison, My Favorites, Submission, and Advice. With a simplified and efficient web interface, users are able to search against the database with gene model IDs or keywords, and batch download the data for local investigation. We believe that MeioBase can greatly facilitate research related to meiosis. PMID:25566299

  16. DBatVir: the database of bat-associated viruses.

    PubMed

    Chen, Lihong; Liu, Bo; Yang, Jian; Jin, Qi

    2014-01-01

Emerging infectious diseases remain a significant threat to public health. Most emerging infectious disease agents in humans are of zoonotic origin. Bats are important reservoir hosts of many highly lethal zoonotic viruses and have been implicated in numerous emerging infectious disease events in recent years. It is essential to enhance our knowledge and understanding of the genetic diversity of the bat-associated viruses to prevent future outbreaks. To facilitate further research, we constructed the database of bat-associated viruses (DBatVir). Known viral sequences detected in bat samples were manually collected and curated, along with the related metadata, such as the sampling time, location, bat species and specimen type. Additional information concerning the bats, including common names, diet type, geographic distribution and phylogeny, was integrated into the database to bridge the gap between virologists and zoologists. The database currently covers >4100 bat-associated animal viruses of 23 viral families detected from 196 bat species in 69 countries worldwide. It provides an overview and snapshot of the current research regarding bat-associated viruses, which is essential now that the field is rapidly expanding. With a user-friendly interface and integrated online bioinformatics tools, DBatVir provides a convenient and powerful platform for virologists and zoologists to analyze the virome diversity of bats, as well as for epidemiologists and public health researchers to monitor and track current and future bat-related infectious diseases. Database URL: http://www.mgc.ac.cn/DBatVir/.

  17. Refining literature curated protein interactions using expert opinions.

    PubMed

    Tastan, Oznur; Qi, Yanjun; Carbonell, Jaime G; Klein-Seetharaman, Judith

    2015-01-01

The availability of high-quality physical interaction datasets is a prerequisite for system-level analysis of interactomes and supervised models to predict protein-protein interactions (PPIs). One source is literature-curated PPI databases, in which pairwise associations of proteins published in the scientific literature are deposited. However, PPIs may not be clearly labelled as physical interactions, which affects the quality of the entire dataset. In order to obtain a high-quality gold standard dataset for PPIs between human immunodeficiency virus (HIV-1) and its human host, we adopted a crowd-sourcing approach. We collected expert opinions and utilized an expectation-maximization-based approach to estimate expert labeling quality. These estimates are used to infer the probability of a reported PPI actually being a direct physical interaction given the set of expert opinions. The effectiveness of our approach is demonstrated through synthetic data experiments, and a high-quality physical interaction network between HIV-1 and human proteins is obtained. Since many literature-curated databases suffer from similar challenges, the framework described herein could be utilized in refining other databases. The curated data is available at http://www.cs.bilkent.edu.tr/~oznur.tastan/supp/psb2015/.
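The aggregation idea can be sketched with a majority-vote baseline: infer a consensus label per PPI, then score each expert by agreement with the consensus. (The paper uses a full expectation-maximization estimator; this toy version, with invented votes, only illustrates the principle.)

```python
# Hypothetical expert votes: 1 = "direct physical interaction", 0 = "not physical".
votes = {
    "ppi1": [1, 1, 0],   # two of three experts call it physical
    "ppi2": [0, 0, 1],
    "ppi3": [1, 1, 1],
}

# Step 1: consensus label per PPI by majority vote.
consensus = {p: int(sum(v) > len(v) / 2) for p, v in votes.items()}

# Step 2: estimate each expert's labeling quality as agreement with the consensus.
n_experts = 3
quality = [
    sum(votes[p][e] == consensus[p] for p in votes) / len(votes)
    for e in range(n_experts)
]
print(consensus, quality)
```

An EM approach iterates these two steps, re-weighting votes by the estimated qualities until convergence, rather than stopping after one pass.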

  18. lncRNAdb v2.0: expanding the reference database for functional long noncoding RNAs

    PubMed Central

    Quek, Xiu Cheng; Thomson, Daniel W.; Maag, Jesper L.V.; Bartonicek, Nenad; Signal, Bethany; Clark, Michael B.; Gloss, Brian S.; Dinger, Marcel E.

    2015-01-01

    Despite the prevalence of long noncoding RNA (lncRNA) genes in eukaryotic genomes, only a small proportion have been examined for biological function. lncRNAdb, available at http://lncrnadb.org, provides users with a comprehensive, manually curated reference database of 287 eukaryotic lncRNAs that have been described independently in the scientific literature. In addition to capturing a great proportion of the recent literature describing functions for individual lncRNAs, lncRNAdb now offers an improved user interface enabling greater accessibility to sequence information, expression data and the literature. The new features in lncRNAdb include the integration of Illumina Body Atlas expression profiles, nucleotide sequence information, a BLAST search tool and easy export of content via direct download or a REST API. lncRNAdb is now endorsed by RNAcentral and is in compliance with the International Nucleotide Sequence Database Collaboration. PMID:25332394

  19. Rice Annotation Project Database (RAP-DB): an integrative and interactive database for rice genomics.

    PubMed

    Sakai, Hiroaki; Lee, Sung Shin; Tanaka, Tsuyoshi; Numa, Hisataka; Kim, Jungsok; Kawahara, Yoshihiro; Wakimoto, Hironobu; Yang, Ching-chia; Iwamoto, Masao; Abe, Takashi; Yamada, Yuko; Muto, Akira; Inokuchi, Hachiro; Ikemura, Toshimichi; Matsumoto, Takashi; Sasaki, Takuji; Itoh, Takeshi

    2013-02-01

The Rice Annotation Project Database (RAP-DB, http://rapdb.dna.affrc.go.jp/) has been providing a comprehensive set of gene annotations for the genome sequence of rice, Oryza sativa (japonica group) cv. Nipponbare. Since the first release in 2005, RAP-DB has been updated several times along with the genome assembly updates. Here, we present our newest RAP-DB based on the latest genome assembly, Os-Nipponbare-Reference-IRGSP-1.0 (IRGSP-1.0), which was released in 2011. We detected 37,869 loci by mapping transcript and protein sequences of 150 monocot species. To provide plant researchers with highly reliable and up-to-date rice gene annotations, we have been incorporating literature-based manually curated data, and 1,626 loci currently incorporate literature-based annotation data, including commonly used gene names or gene symbols. Transcriptional activities are shown at the nucleotide level by mapping RNA-Seq reads derived from 27 samples. We also mapped the Illumina reads of a Japanese leading japonica cultivar, Koshihikari, and a Chinese indica cultivar, Guangluai-4, to the genome and show alignments together with the single nucleotide polymorphisms (SNPs) and gene functional annotations through a newly developed browser, Short-Read Assembly Browser (S-RAB). We have developed two satellite databases, Plant Gene Family Database (PGFD) and Integrative Database of Cereal Gene Phylogeny (IDCGP), which display gene family and homologous gene relationships among diverse plant species. RAP-DB and the satellite databases offer simple and user-friendly web interfaces, enabling plant and genome researchers to access the data easily and facilitating a broad range of plant research topics.

  20. The Ribosomal Database Project.

    PubMed Central

    Maidak, B L; Larsen, N; McCaughey, M J; Overbeek, R; Olsen, G J; Fogel, K; Blandy, J; Woese, C R

    1994-01-01

The Ribosomal Database Project (RDP) is a curated database that offers ribosome-related data, analysis services, and associated computer programs. The offerings include phylogenetically ordered alignments of ribosomal RNA (rRNA) sequences, derived phylogenetic trees, rRNA secondary structure diagrams, and various software for handling, analyzing and displaying alignments and trees. The data are available via anonymous ftp (rdp.life.uiuc.edu), electronic mail (server@rdp.life.uiuc.edu) and gopher (rdpgopher.life.uiuc.edu). The electronic mail server also provides ribosomal probe checking, approximate phylogenetic placement of user-submitted sequences, screening for the chimeric nature of newly sequenced rRNAs, and automated alignment. PMID:7524021

  1. Towards pathway curation through literature mining--a case study using PharmGKB.

    PubMed

    Ravikumar, K E; Wagholikar, Kavishwar B; Liu, Hongfang

    2014-01-01

The creation of biological pathway knowledge bases is largely driven by manual curation based on evidence from the scientific literature, and it is highly challenging for curators to keep up with that literature. Text mining applications have been developed in the last decade to assist human curators and speed up the curation pace; the majority of them aim to identify the most relevant papers for curation, with little attempt to directly extract pathway information from text. In this paper, we describe a rule-based literature mining system to extract pathway information from text. We evaluated the system using curated pharmacokinetic (PK) and pharmacodynamic (PD) pathways in PharmGKB. The system achieved an F-measure of 63.11% and 34.99% for entity extraction and event extraction, respectively, against all PubMed abstracts cited in PharmGKB. It may be possible to improve the system performance by incorporating statistical machine learning approaches. This study also helped us gain insights into the barriers towards automated event extraction from text for pathway curation. PMID:24297561
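The reported F-measure is the harmonic mean of precision and recall over the extracted entities or events. A worked example with toy counts (not the paper's actual confusion matrix):

```python
# Hypothetical extraction tallies: true positives, false positives, false negatives.
tp, fp, fn = 80, 30, 50

precision = tp / (tp + fp)   # fraction of extracted items that are correct
recall = tp / (tp + fn)      # fraction of gold items that were extracted
f1 = 2 * precision * recall / (precision + recall)
print(round(f1, 4))  # 0.6667
```

Equivalently, F1 = 2·TP / (2·TP + FP + FN), which here gives 160/240 = 2/3.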

  4. How should the completeness and quality of curated nanomaterial data be evaluated?

    PubMed

    Marchese Robinson, Richard L; Lynch, Iseult; Peijnenburg, Willie; Rumble, John; Klaessig, Fred; Marquardt, Clarissa; Rauscher, Hubert; Puzyn, Tomasz; Purian, Ronit; Åberg, Christoffer; Karcher, Sandra; Vriens, Hanne; Hoet, Peter; Hoover, Mark D; Hendren, Christine Ogilvie; Harper, Stacey L

    2016-05-21

    Nanotechnology is of increasing significance. Curation of nanomaterial data into electronic databases offers opportunities to better understand and predict nanomaterials' behaviour. This supports innovation in, and regulation of, nanotechnology. It is commonly understood that curated data need to be sufficiently complete and of sufficient quality to serve their intended purpose. However, assessing data completeness and quality is non-trivial in general and is arguably especially difficult in the nanoscience area, given its highly multidisciplinary nature. The current article, part of the Nanomaterial Data Curation Initiative series, addresses how to assess the completeness and quality of (curated) nanomaterial data. In order to address this key challenge, a variety of related issues are discussed: the meaning and importance of data completeness and quality, existing approaches to their assessment and the key challenges associated with evaluating the completeness and quality of curated nanomaterial data. Considerations which are specific to the nanoscience area and lessons which can be learned from other relevant scientific disciplines are considered. Hence, the scope of this discussion ranges from physicochemical characterisation requirements for nanomaterials and interference of nanomaterials with nanotoxicology assays to broader issues such as minimum information checklists, toxicology data quality schemes and computational approaches that facilitate evaluation of the completeness and quality of (curated) data. This discussion is informed by a literature review and a survey of key nanomaterial data curation stakeholders. Finally, drawing upon this discussion, recommendations are presented concerning the central question: how should the completeness and quality of curated nanomaterial data be evaluated? PMID:27143028
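One simple way to operationalise completeness, assumed here for illustration rather than taken from the article, is the fraction of checklist fields populated in a curated record:

```python
# Hypothetical minimum-information checklist for a nanomaterial record.
CHECKLIST = ["size", "shape", "surface_charge", "purity", "solubility"]

# A curated record with some fields missing or empty.
record = {"size": "25 nm", "shape": "spherical", "surface_charge": None}

# Completeness = populated checklist fields / total checklist fields.
filled = sum(1 for field in CHECKLIST if record.get(field) not in (None, ""))
completeness = filled / len(CHECKLIST)
print(completeness)  # 2 of 5 fields populated -> 0.4
```

Quality assessment is harder to automate than this completeness ratio, since it requires judging whether populated values are trustworthy, not merely present.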

  5. Enhanced annotations and features for comparing thousands of Pseudomonas genomes in the Pseudomonas genome database.

    PubMed

    Winsor, Geoffrey L; Griffiths, Emma J; Lo, Raymond; Dhillon, Bhavjinder K; Shay, Julie A; Brinkman, Fiona S L

    2016-01-01

    The Pseudomonas Genome Database (http://www.pseudomonas.com) is well known for the application of community-based annotation approaches for producing a high-quality Pseudomonas aeruginosa PAO1 genome annotation, and facilitating whole-genome comparative analyses with other Pseudomonas strains. To aid analysis of potentially thousands of complete and draft genome assemblies, this database and analysis platform was upgraded to integrate curated genome annotations and isolate metadata with enhanced tools for larger scale comparative analysis and visualization. Manually curated gene annotations are supplemented with improved computational analyses that help identify putative drug targets and vaccine candidates or assist with evolutionary studies by identifying orthologs, pathogen-associated genes and genomic islands. The database schema has been updated to integrate isolate metadata that will facilitate more powerful analysis of genomes across datasets in the future. We continue to place an emphasis on providing high-quality updates to gene annotations through regular review of the scientific literature and using community-based approaches including a major new Pseudomonas community initiative for the assignment of high-quality gene ontology terms to genes. As we further expand from thousands of genomes, we plan to provide enhancements that will aid data visualization and analysis arising from whole-genome comparative studies including more pan-genome and population-based approaches. PMID:26578582

  7. Teacher Training in Curative Education.

    ERIC Educational Resources Information Center

    Juul, Kristen D.; Maier, Manfred

    1992-01-01

    This article considers the application of the philosophical and educational principles of Rudolf Steiner, called "anthroposophy," to the training of teachers and curative educators in the Waldorf schools. Special emphasis is on the Camphill movement which focuses on therapeutic schools and communities for children with special needs. (DB)

  8. Cognitive Curations of Collaborative Curricula

    ERIC Educational Resources Information Center

    Ackerman, Amy S.

    2015-01-01

    Assuming the role of learning curators, 22 graduate students (in-service teachers) addressed authentic problems (challenges) within their respective classrooms by selecting digital tools as part of implementation of interdisciplinary lesson plans. Students focused on formative assessment tools as a means to gather evidence to make improvements in…

  9. Curation of OSIRIS-REx Asteroid Samples

    NASA Astrophysics Data System (ADS)

Righter, K.; Nakamura-Messenger, K.; Lauretta, D. S.; Osiris-Rex Curation Working Group

    2013-09-01

    An overview of the mission curation plan will be given, including the main elements of contamination control, sample recovery, sample cleanroom construction, and curation support once the sample is returned to Earth.

  10. Assisted curation: does text mining really help?

    PubMed

    Alex, Beatrice; Grover, Claire; Haddow, Barry; Kabadjov, Mijail; Klein, Ewan; Matthews, Michael; Roebuck, Stuart; Tobin, Richard; Wang, Xinglong

    2008-01-01

    Although text mining shows considerable promise as a tool for supporting the curation of biomedical text, there is little concrete evidence as to its effectiveness. We report on three experiments measuring the extent to which curation can be speeded up with assistance from Natural Language Processing (NLP), together with subjective feedback from curators on the usability of a curation tool that integrates NLP hypotheses for protein-protein interactions (PPIs). In our curation scenario, we found that a maximum speed-up of 1/3 in curation time can be expected if NLP output is perfectly accurate. The preference of one curator for consistent NLP output and output with high recall needs to be confirmed in a larger study with several curators.

  11. Biological Databases for Behavioral Neurobiology

    PubMed Central

    Baker, Erich J.

    2014-01-01

Databases are, at their core, abstractions of data and their intentionally derived relationships. They serve as a central organizing metaphor and repository, supporting or augmenting nearly all bioinformatics. Behavioral domains provide a unique stage for contemporary databases, as research in this area spans diverse data types, locations, and data relationships. This chapter provides foundational information on the diversity and prevalence of databases and on how data structures support the various needs of behavioral neuroscience analysis and interpretation. The focus is on the classes of databases, data curation, and advanced applications in bioinformatics, using examples largely drawn from research efforts in behavioral neuroscience. PMID:23195119

  12. Expert curation in UniProtKB: a case study on dealing with conflicting and erroneous data.

    PubMed

    Poux, Sylvain; Magrane, Michele; Arighi, Cecilia N; Bridge, Alan; O'Donovan, Claire; Laiho, Kati

    2014-01-01

    UniProtKB/Swiss-Prot provides expert curation with information extracted from literature and curator-evaluated computational analysis. As knowledgebases continue to play an increasingly important role in scientific research, a number of studies have evaluated their accuracy and revealed various errors. While some are curation errors, others are the result of incorrect information published in the scientific literature. By taking the example of sirtuin-5, a complex annotation case, we will describe the curation procedure of UniProtKB/Swiss-Prot and detail how we report conflicting information in the database. We will demonstrate the importance of collaboration between resources to ensure curation consistency and the value of contributions from the user community in helping maintain error-free resources. Database URL: www.uniprot.org.

  13. Pathway Interaction Database (PID) —

    Cancer.gov

    The National Cancer Institute (NCI) in collaboration with Nature Publishing Group has established the Pathway Interaction Database (PID) in order to provide a highly structured, curated collection of information about known biomolecular interactions and key cellular processes assembled into signaling pathways.

  14. The TP53 website: an integrative resource centre for the TP53 mutation database and TP53 mutant analysis

    PubMed Central

    Leroy, Bernard; Fournier, Jean Louis; Ishioka, Chikashi; Monti, Paola; Inga, Alberto; Fronza, Gilberto; Soussi, Thierry

    2013-01-01

    A novel resource centre for TP53 mutations and mutants has been developed (http://p53.fr). TP53 gene dysfunction can be found in the majority of human cancer types. The potential use of TP53 mutation as a biomarker for clinical studies or exposome analysis has led to the publication of thousands of reports describing the TP53 gene status in >10 000 tumours. The UMD TP53 mutation database was created in 1990 and has been regularly updated. The 2012 release of the database has been carefully curated, and all suspicious reports have been eliminated. It is available either as a flat file that can be easily manipulated or as novel multi-platform analytical software that has been designed to analyse various aspects of TP53 mutations. Several tools to ascertain TP53 mutations are also available for download. We have developed TP53MULTLoad, a manually curated database providing comprehensive details on the properties of 2549 missense TP53 mutants. More than 100 000 entries have been arranged in 39 different activity fields, such as change of transactivation on various promoters, apoptosis or growth arrest. For several hot spot mutants, multiple gain of function activities are also included. The database can be easily browsed via a graphical user interface. PMID:23161690

  15. Biosafety Manual

    SciTech Connect

    King, Bruce W.

    2010-05-18

    Work with or potential exposure to biological materials in the course of performing research or other work activities at Lawrence Berkeley National Laboratory (LBNL) must be conducted in a safe, ethical, environmentally sound, and compliant manner. Work must be conducted in accordance with established biosafety standards, the principles and functions of Integrated Safety Management (ISM), this Biosafety Manual, Chapter 26 (Biosafety) of the Health and Safety Manual (PUB-3000), and applicable standards and LBNL policies. The purpose of the Biosafety Program is to protect workers, the public, agriculture, and the environment from exposure to biological agents or materials that may cause disease or other detrimental effects in humans, animals, or plants. This manual provides workers; line management; Environment, Health, and Safety (EH&S) Division staff; Institutional Biosafety Committee (IBC) members; and others with a comprehensive overview of biosafety principles, requirements from biosafety standards, and measures needed to control biological risks in work activities and facilities at LBNL.

  16. ORegAnno 3.0: a community-driven resource for curated regulatory annotation.

    PubMed

    Lesurf, Robert; Cotto, Kelsy C; Wang, Grace; Griffith, Malachi; Kasaian, Katayoon; Jones, Steven J M; Montgomery, Stephen B; Griffith, Obi L

    2016-01-01

    The Open Regulatory Annotation database (ORegAnno) is a resource for curated regulatory annotation. It contains information about regulatory regions, transcription factor binding sites, RNA binding sites, regulatory variants, haplotypes, and other regulatory elements. ORegAnno differentiates itself from other regulatory resources by facilitating crowd-sourced interpretation and annotation of regulatory observations from the literature and highly curated resources. It contains a comprehensive annotation scheme that aims to describe both the elements and outcomes of regulatory events. Moreover, ORegAnno assembles these disparate data sources and annotations into a single, high quality catalogue of curated regulatory information. The current release is an update of the database previously featured in the NAR Database Issue, and now contains 1 948 307 records, across 18 species, with a combined coverage of 334 215 080 bp. Complete records, annotation, and other associated data are available for browsing and download at http://www.oreganno.org/.

  17. Manual for ERIC Awareness Workshop.

    ERIC Educational Resources Information Center

    Strohmenger, C. Todd; Lanham, Berma A.

    This manual, designed to be used with a video tape, provides information for conducting a workshop to familiarize educators with the Educational Resources Information Center (ERIC). Objectives of the workshop include: (1) to develop an understanding of the contents and structure of the ERIC database; (2) to develop an understanding of ERIC as a…

  18. TIPdb: A Database of Anticancer, Antiplatelet, and Antituberculosis Phytochemicals from Indigenous Plants in Taiwan

    PubMed Central

    Lin, Ying-Chi; Wang, Chia-Chi; Chen, Ih-Sheng; Jheng, Jhao-Liang; Li, Jih-Heng; Tung, Chun-Wei

    2013-01-01

    The rich indigenous and endemic plant species of Taiwan are attributed to its unique geographic features. These plants serve as a resourceful bank of biologically active phytochemicals. Given that such plant-derived chemicals are prototypes of potential drugs, databases connecting chemical structures with pharmacological activities may facilitate drug development. To enhance the utility of these data, it is desirable to develop a database of chemical compounds and corresponding activities from indigenous plants in Taiwan. We therefore constructed TIPdb, a database that compiles published anticancer, antiplatelet, and antituberculosis phytochemicals from indigenous plants in Taiwan in a standardized format. A browse function was implemented for users to explore the database in a taxonomy-based manner. Search functions can be utilized to filter records of interest by botanical name, plant part, chemical class, or compound name. The structured and searchable TIPdb serves as a comprehensive and standardized resource for searches of anticancer, antiplatelet, and antituberculosis compounds. The manually curated chemical structures and activities provide a great opportunity to develop quantitative structure-activity relationship models for the high-throughput screening of potential anticancer, antiplatelet, and antituberculosis drugs. PMID:23766708

  19. ZFIN, the Zebrafish Model Organism Database: increased support for mutants and transgenics.

    PubMed

    Howe, Douglas G; Bradford, Yvonne M; Conlin, Tom; Eagle, Anne E; Fashena, David; Frazer, Ken; Knight, Jonathan; Mani, Prita; Martin, Ryan; Moxon, Sierra A Taylor; Paddock, Holly; Pich, Christian; Ramachandran, Sridhar; Ruef, Barbara J; Ruzicka, Leyla; Schaper, Kevin; Shao, Xiang; Singer, Amy; Sprunger, Brock; Van Slyke, Ceri E; Westerfield, Monte

    2013-01-01

    ZFIN, the Zebrafish Model Organism Database (http://zfin.org), is the central resource for zebrafish genetic, genomic, phenotypic and developmental data. ZFIN curators manually curate and integrate comprehensive data involving zebrafish genes, mutants, transgenics, phenotypes, genotypes, gene expression, morpholinos, antibodies, anatomical structures and publications. Integrated views of these data, as well as data gathered through collaborations and data exchanges, are provided through a wide selection of web-based search forms. Among the vertebrate model organisms, zebrafish are uniquely well suited for rapid and targeted generation of mutant lines. The recent rapid production of mutant and transgenic zebrafish is making management of data associated with these resources particularly important to the research community. Here, we describe recent enhancements to ZFIN aimed at improving our support for mutant and transgenic lines, including (i) enhanced mutant/transgenic search functionality; (ii) more expressive phenotype curation methods; (iii) new download files and archival data access; (iv) incorporation of new data loads from laboratories undertaking large-scale generation of mutant or transgenic lines and (v) new GBrowse tracks for transgenic insertions, genes with antibodies and morpholinos.

  1. Recruiter's Manual.

    ERIC Educational Resources Information Center

    Reed, Michael; Recio, Manuel

    The manual assists both experienced and inexperienced personnel in defining and completing the entire range of tasks associated with the position of Pennsylvania Migrant Education Recruiter. The recruiter's primary responsibilities are to identify migrant children in the area and enroll those children eligible under Title I ESEA (Elementary and…

  2. Boilermaking Manual.

    ERIC Educational Resources Information Center

    British Columbia Dept. of Education, Victoria.

    This manual is intended (1) to provide an information resource to supplement the formal training program for boilermaker apprentices; (2) to assist the journeyworker to build on present knowledge to increase expertise and qualify for formal accreditation in the boilermaking trade; and (3) to serve as an on-the-job reference with sound, up-to-date…

  3. Computer Manual.

    ERIC Educational Resources Information Center

    Illinois State Office of Education, Springfield.

    This manual designed to provide the teacher with methods of understanding the computer and its potential in the classroom includes four units with exercises and an answer sheet. Unit 1 covers computer fundamentals, the mini computer, programming languages, an introduction to BASIC, and control instructions. Variable names and constants described…

  4. Student Manual.

    ERIC Educational Resources Information Center

    Stapleton, Diana L., Comp.

    This manual for student assistants employed in the government document section of the Eastern Kentucky University Library covers policy and procedures and use of the major reference tools in this area. General policies and procedures relating to working hours and conditions, and general responsibilities are discussed, as well as shelving rules and…

  5. WormBase 2014: new views of curated biology

    PubMed Central

    Harris, Todd W.; Baran, Joachim; Bieri, Tamberlyn; Cabunoc, Abigail; Chan, Juancarlos; Chen, Wen J.; Davis, Paul; Done, James; Grove, Christian; Howe, Kevin; Kishore, Ranjana; Lee, Raymond; Li, Yuling; Muller, Hans-Michael; Nakamura, Cecilia; Ozersky, Philip; Paulini, Michael; Raciti, Daniela; Schindelman, Gary; Tuli, Mary Ann; Auken, Kimberly Van; Wang, Daniel; Wang, Xiaodong; Williams, Gary; Wong, J. D.; Yook, Karen; Schedl, Tim; Hodgkin, Jonathan; Berriman, Matthew; Kersey, Paul; Spieth, John; Stein, Lincoln; Sternberg, Paul W.

    2014-01-01

    WormBase (http://www.wormbase.org/) is a highly curated resource dedicated to supporting research using the model organism Caenorhabditis elegans. With an electronic history predating the World Wide Web, WormBase contains information ranging from the sequence and phenotype of individual alleles to genome-wide studies generated using next-generation sequencing technologies. In recent years, we have expanded the contents to include data on additional nematodes of agricultural and medical significance, bringing the knowledge of C. elegans to bear on these systems and providing support for underserved research communities. Manual curation of the primary literature remains a central focus of the WormBase project, providing users with reliable, up-to-date and highly cross-linked information. In this update, we describe efforts to organize the original atomized and highly contextualized curated data into integrated syntheses of discrete biological topics. Next, we discuss our experiences coping with the vast increase in available genome sequences made possible through next-generation sequencing platforms. Finally, we describe some of the features and tools of the new WormBase Web site that help users better find and explore data of interest. PMID:24194605

  6. Clean and Cold Sample Curation

    NASA Technical Reports Server (NTRS)

    Allen, C. C.; Agee, C. B.; Beer, R.; Cooper, B. L.

    2000-01-01

    Curation of Mars samples includes both samples that are returned to Earth, and samples that are collected, examined, and archived on Mars. Both kinds of curation operations will require careful planning to ensure that the samples are not contaminated by the instruments that are used to collect and contain them. In both cases, sample examination and subdivision must take place in an environment that is organically, inorganically, and biologically clean. Some samples will need to be prepared for analysis under ultra-clean or cryogenic conditions. Inorganic and biological cleanliness are achievable separately by cleanroom and biosafety lab techniques. Organic cleanliness to the <50 ng/sq cm level requires material control and sorbent removal - techniques being applied in our Class 10 cleanrooms and sample processing gloveboxes.

  7. SpliceDisease database: linking RNA splicing and disease.

    PubMed

    Wang, Juan; Zhang, Jie; Li, Kaibo; Zhao, Wei; Cui, Qinghua

    2012-01-01

    RNA splicing is an important aspect of gene regulation in many organisms. Splicing of RNA is regulated by complicated mechanisms involving numerous RNA-binding proteins and the intricate network of interactions among them. Mutations in cis-acting splicing elements or their regulatory proteins have been shown to be involved in human diseases, and defects in the pre-mRNA splicing process have emerged as a common disease-causing mechanism. Therefore, a database integrating RNA splicing and disease associations would be helpful for understanding not only RNA splicing but also its contribution to disease. For the SpliceDisease database, we manually curated 2337 splicing mutation disease entries involving 303 genes and 370 diseases, supported experimentally by 898 publications. The SpliceDisease database provides information including the nucleotide change in the sequence, the location of the mutation on the gene, the PubMed reference ID and a detailed description of the relationship among gene mutations, splicing defects and diseases. We standardized the names of the diseases and genes and provided links from these genes to NCBI and the UCSC genome browser for further annotation and genomic sequences. For the location of each mutation, we provide a direct link from the entry to the respective position/region in the genome browser. Users can freely browse, search and download the data in SpliceDisease at http://cmbi.bjmu.edu.cn/sdisease.
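
    Record fields like those described above (gene, nucleotide change, disease, supporting publication) lend themselves to simple programmatic filtering once downloaded. A minimal sketch using hypothetical placeholder entries, not actual SpliceDisease records:

```python
# Hypothetical records mirroring the fields described above
entries = [
    {"gene": "GENE1", "mutation": "c.100+2T>C", "disease": "disease A", "pmid": "00000001"},
    {"gene": "GENE2", "mutation": "c.52-1G>A",  "disease": "disease B", "pmid": "00000002"},
    {"gene": "GENE1", "mutation": "c.301G>T",   "disease": "disease C", "pmid": "00000003"},
]

def records_for_gene(entries, gene):
    """All curated splicing-mutation records for a given gene."""
    return [e for e in entries if e["gene"] == gene]

print(len(records_for_gene(entries, "GENE1")))  # → 2
```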

  8. PSORTdb: a protein subcellular localization database for bacteria

    PubMed Central

    Rey, Sébastien; Acab, Michael; Gardy, Jennifer L.; Laird, Matthew R.; deFays, Katalin; Lambert, Christophe; Brinkman, Fiona S. L.

    2005-01-01

    Information about bacterial subcellular localization (SCL) is important for protein function prediction and identification of suitable drug/vaccine/diagnostic targets. PSORTdb (http://db.psort.org/) is a web-accessible database of SCL for bacteria that contains both information determined through laboratory experimentation and computational predictions. The dataset of experimentally verified information (∼2000 proteins) was manually curated by us and represents the largest dataset of its kind. Earlier versions have been used for training SCL predictors, and its incorporation now into this new PSORTdb resource, with its associated additional annotation information and dataset version control, should aid researchers in future development of improved SCL predictors. The second component of this database contains computational analyses of proteins deduced from the most recent NCBI dataset of completely sequenced genomes. Analyses are currently calculated using PSORTb, the most precise automated SCL predictor for bacterial proteins. Both datasets can be accessed through the web using a very flexible text search engine, a data browser, or using BLAST, and the entire database or search results may be downloaded in various formats. Features such as GO ontologies and multiple accession numbers are incorporated to facilitate integration with other bioinformatics resources. PSORTdb is freely available under GNU General Public License. PMID:15608169

  9. The SUPERFAMILY database in 2007: families and functions.

    PubMed

    Wilson, Derek; Madera, Martin; Vogel, Christine; Chothia, Cyrus; Gough, Julian

    2007-01-01

    The SUPERFAMILY database provides protein domain assignments, at the SCOP 'superfamily' level, for the predicted protein sequences in over 400 completed genomes. A superfamily groups together domains of different families which have a common evolutionary ancestor based on structural, functional and sequence data. SUPERFAMILY domain assignments are generated using an expert curated set of profile hidden Markov models. All models and structural assignments are available for browsing and download from http://supfam.org. The web interface includes services such as domain architectures and alignment details for all protein assignments, searchable domain combinations, domain occurrence network visualization, detection of over- or under-represented superfamilies for a given genome by comparison with other genomes, assignment of manually submitted sequences and keyword searches. In this update we describe the SUPERFAMILY database and outline two major developments: (i) incorporation of family level assignments and (ii) a superfamily-level functional annotation. The SUPERFAMILY database can be used for general protein evolution and superfamily-specific studies, genomic annotation, and structural genomics target suggestion and assessment.

  10. EVOG: a database for evolutionary analysis of overlapping genes.

    PubMed

    Kim, Dae-Soo; Cho, Chi-Young; Huh, Jae-Won; Kim, Heui-Soo; Cho, Hwan-Gue

    2009-01-01

    Overlapping genes are defined as pairs of genes whose transcripts overlap. Recently, many cases of overlapping genes have been investigated in various eukaryotic organisms; however, their origin and transcriptional control mechanisms have not yet been clearly determined. In this study, we implemented the Evolutionary Visualizer for Overlapping Genes (EVOG), a Web-based database with a novel visualization interface, to investigate the evolutionary relationships between overlapping genes. Using this technique, we collected and analyzed all overlapping genes in human, chimpanzee, orangutan, marmoset, rhesus, cow, dog, mouse, rat, chicken, Xenopus, zebrafish and Drosophila. This integrated, manually curated database displays the evolutionary features of overlapping genes. The EVOG DB comprises 10,074 overlapping genes in human, 10,009 in chimpanzee, 67,039 in orangutan, 51,001 in marmoset, 219 in rhesus, 3627 in cow, 209 in dog, 10,700 in mouse, 7987 in rat, 1439 in chicken, 597 in Xenopus, 2457 in zebrafish and 4115 in Drosophila. EVOG is effective and easy to use for analyzing the evolutionary history of overlapping genes across species. Therefore, EVOG could potentially serve as a main tool to investigate the evolution of the human genome in relation to disease by comparing the expression profiles of overlapping genes. EVOG is available at http://neobio.cs.pusan.ac.kr/evog/.
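
    The defining test for an overlapping gene pair, two transcripts on the same chromosome whose coordinate ranges intersect, can be sketched as follows (gene names and coordinates are hypothetical):

```python
def transcripts_overlap(a, b):
    """True if two transcripts share a chromosome and their
    half-open [start, end) coordinate ranges intersect."""
    return (a["chrom"] == b["chrom"]
            and a["start"] < b["end"]
            and b["start"] < a["end"])

gene_a = {"chrom": "chr1", "start": 1000, "end": 5000}  # hypothetical
gene_b = {"chrom": "chr1", "start": 4500, "end": 9000}  # hypothetical
gene_c = {"chrom": "chr2", "start": 4500, "end": 9000}  # hypothetical

print(transcripts_overlap(gene_a, gene_b))  # → True
print(transcripts_overlap(gene_a, gene_c))  # → False (different chromosome)
```

    A genome-wide scan applies this test to all transcript pairs per chromosome, typically after sorting by start coordinate to avoid the quadratic comparison.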

  11. The MetaboLights repository: curation challenges in metabolomics

    PubMed Central

    Salek, Reza M.; Haug, Kenneth; Conesa, Pablo; Hastings, Janna; Williams, Mark; Mahendraker, Tejasvi; Maguire, Eamonn; González-Beltrán, Alejandra N.; Rocca-Serra, Philippe; Sansone, Susanna-Assunta; Steinbeck, Christoph

    2013-01-01

    MetaboLights is the first general-purpose open-access curated repository for metabolomic studies, their raw experimental data and associated metadata, maintained by one of the major open-access data providers in molecular biology. Increases in the number of depositions, number of samples per study and the file size of data submitted to MetaboLights present a challenge for the objective of ensuring high-quality and standardized data in the context of diverse metabolomic workflows and data representations. Here, we describe the MetaboLights curation pipeline, its challenges and its practical application in quality control of complex data depositions. Database URL: http://www.ebi.ac.uk/metabolights PMID:23630246

  12. LncRNAWiki: harnessing community knowledge in collaborative curation of human long non-coding RNAs.

    PubMed

    Ma, Lina; Li, Ang; Zou, Dong; Xu, Xingjian; Xia, Lin; Yu, Jun; Bajic, Vladimir B; Zhang, Zhang

    2015-01-01

    Long non-coding RNAs (lncRNAs) perform a diversity of functions in numerous important biological processes and are implicated in many human diseases. In this report we present lncRNAWiki (http://lncrna.big.ac.cn), an open-content, publicly editable wiki-based platform aimed at community-based curation and collection of information on human lncRNAs. Current related databases depend primarily on curation by experts, making it laborious to annotate the exponentially accumulating information on lncRNAs, which inevitably requires collective effort in community-based curation. Unlike existing databases, lncRNAWiki features comprehensive integration of information on human lncRNAs obtained from multiple different resources and allows not only existing lncRNAs to be edited, updated and curated by different users but also the addition of newly identified lncRNAs by any user. It harnesses the community's collective knowledge in collecting, editing and annotating human lncRNAs and rewards community-curated efforts by providing explicit authorship based on quantified contributions. LncRNAWiki relies on the underlying knowledge of the scientific community for collective and collaborative curation of human lncRNAs and thus has the potential to serve as an up-to-date and comprehensive knowledgebase for human lncRNAs. PMID:25399417

  13. The Candidate Cancer Gene Database: a database of cancer driver genes from forward genetic screens in mice.

    PubMed

    Abbott, Kenneth L; Nyre, Erik T; Abrahante, Juan; Ho, Yen-Yi; Isaksson Vogel, Rachel; Starr, Timothy K

    2015-01-01

    Identification of cancer driver gene mutations is crucial for advancing cancer therapeutics. Due to the overwhelming number of passenger mutations in the human tumor genome, it is difficult to pinpoint causative driver genes. Using transposon mutagenesis in mice many laboratories have conducted forward genetic screens and identified thousands of candidate driver genes that are highly relevant to human cancer. Unfortunately, this information is difficult to access and utilize because it is scattered across multiple publications using different mouse genome builds and strength metrics. To improve access to these findings and facilitate meta-analyses, we developed the Candidate Cancer Gene Database (CCGD, http://ccgd-starrlab.oit.umn.edu/). The CCGD is a manually curated database containing a unified description of all identified candidate driver genes and the genomic location of transposon common insertion sites (CISs) from all currently published transposon-based screens. To demonstrate relevance to human cancer, we performed a modified gene set enrichment analysis using KEGG pathways and show that human cancer pathways are highly enriched in the database. We also used hierarchical clustering to identify pathways enriched in blood cancers compared to solid cancers. The CCGD is a novel resource available to scientists interested in the identification of genetic drivers of cancer.
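
    The abstract does not specify the details of its modified gene set enrichment analysis; a common over-representation flavor of pathway enrichment scores a candidate gene list against a pathway with the hypergeometric upper-tail probability. A sketch under that assumption, using illustrative counts rather than CCGD data:

```python
from math import comb

def enrichment_pvalue(n_universe, n_pathway, n_hits, k_observed):
    """Hypergeometric upper-tail P-value: probability that a random
    draw of n_hits genes from an n_universe-gene universe contains at
    least k_observed members of an n_pathway-gene pathway."""
    total = comb(n_universe, n_hits)
    return sum(
        comb(n_pathway, k) * comb(n_universe - n_pathway, n_hits - k)
        for k in range(k_observed, min(n_pathway, n_hits) + 1)
    ) / total

# Illustrative numbers: a 20,000-gene universe, a 100-gene pathway,
# 500 candidate drivers, 15 of which fall in the pathway.
print(f"{enrichment_pvalue(20000, 100, 500, 15):.2e}")
```

    With these numbers the expected overlap by chance is only 2.5 genes, so observing 15 yields a very small P-value; in practice the per-pathway P-values are corrected for multiple testing across all KEGG pathways.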

  14. MET network in PubMed: a text-mined network visualization and curation system

    PubMed Central

    Dai, Hong-Jie; Su, Chu-Hsien; Lai, Po-Ting; Huang, Ming-Siang; Jonnagaddala, Jitendra; Rose Jue, Toni; Rao, Shruti; Chou, Hui-Jou; Milacic, Marija; Singh, Onkar; Syed-Abdul, Shabbir; Hsu, Wen-Lian

    2016-01-01

    Metastasis is the dissemination of a cancer/tumor from one organ to another, and it is the most dangerous stage during cancer progression, causing more than 90% of cancer deaths. Improving the understanding of the complicated cellular mechanisms underlying metastasis requires investigations of the signaling pathways. To this end, we developed a METastasis (MET) network visualization and curation tool to assist metastasis researchers in retrieving network information of interest while browsing the large volume of studies in PubMed. MET can recognize relations among genes, cancers, tissues and organs of metastasis mentioned in the literature through text-mining techniques, and then produce a visualization of all mined relations in a metastasis network. To facilitate the curation process, MET was developed as a browser extension that allows curators to review and edit concepts and relations related to metastasis directly in PubMed. PubMed users can also view the metastatic networks integrated from the large collection of research papers directly through MET. For the BioCreative 2015 interactive track (IAT), a curation task was proposed to curate metastatic networks among PubMed abstracts. Six curators participated in the proposed task and a post-IAT task, curating 963 unique metastatic relations from 174 PubMed abstracts using MET. Database URL: http://btm.tmu.edu.tw/metastasisway PMID:27242035
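
    Text-mined relations of the kind MET collects are naturally stored as directed triples and aggregated into an adjacency map for network visualization. A minimal sketch with hypothetical relations (not actual MET output):

```python
from collections import defaultdict

# Hypothetical text-mined triples: (source concept, relation, target concept)
relations = [
    ("breast cancer", "metastasizes_to", "bone"),
    ("breast cancer", "metastasizes_to", "lung"),
    ("lung cancer", "metastasizes_to", "brain"),
]

# Aggregate triples into an adjacency map; a set deduplicates
# the same relation mined from multiple abstracts.
network = defaultdict(set)
for source, relation, target in relations:
    network[source].add((relation, target))

for source, edges in sorted(network.items()):
    for relation, target in sorted(edges):
        print(f"{source} --{relation}--> {target}")
```

    A real pipeline would additionally keep the supporting PMID on each edge so curators can trace every relation back to its source abstract.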

  16. Carbohydrate structure database merged from bacterial, archaeal, plant and fungal parts.

    PubMed

    Toukach, Philip V; Egorova, Ksenia S

    2016-01-01

    The Carbohydrate Structure Databases (CSDBs, http://csdb.glycoscience.ru) store structural, bibliographic, taxonomic, NMR spectroscopic, and other data on natural carbohydrates and their derivatives published in the scientific literature. The CSDB project was launched in 2005 for bacterial saccharides (as BCSDB). Until recently, it comprised two parts, the Bacterial CSDB and the Plant&Fungal CSDB; in March 2015, these databases were merged into a single CSDB. The combined CSDB includes information on bacterial and archaeal glycans and derivatives (with coverage close to complete), as well as on plant and fungal glycans and glycoconjugates (almost all structures published up to 1998). CSDB is regularly updated via manual expert annotation of original publications. Both newly annotated data and data imported from other databases are manually curated. The CSDB data are exportable in a number of modern formats, such as GlycoRDF. CSDB provides additional services for simulation of 1H, 13C and 2D NMR spectra of saccharides, NMR-based structure prediction, glycan-based taxon clustering and other tasks.

  18. The Web-Based DNA Vaccine Database DNAVaxDB and Its Usage for Rational DNA Vaccine Design.

    PubMed

    Racz, Rebecca; He, Yongqun

    2016-01-01

    A DNA vaccine is a vaccine that uses a mammalian expression vector to express one or more protein antigens and is administered in vivo to induce an adaptive immune response. Since the 1990s, a significant amount of research has been performed on DNA vaccines and the mechanisms behind them. To meet the needs of the DNA vaccine research community, we created DNAVaxDB ( http://www.violinet.org/dnavaxdb ), the first Web-based database and analysis resource of experimentally verified DNA vaccines. All the data in DNAVaxDB, including plasmids, antigens, vaccines, and sources, are manually curated and experimentally verified. This chapter details the DNAVaxDB system and shows how the DNA vaccine database, combined with the Vaxign vaccine design tool, can be used for the rational design of a DNA vaccine against a pathogen such as Mycobacterium bovis.

  19. Field procedures manual: Sample handling, Salton Sea Scientific Drilling Project

    SciTech Connect

    Goff, S.; Mehegan, J.; Michels, D.

    1989-02-01

    This Field Procedures Manual is the comprehensive operations guide that was used to curate samples obtained from the Salton Sea Scientific Drilling Project (SSSDP). It is being published in the form used on site by the curation team. Samples recovered from the SSSDP were curated following the Policy Guidelines established for the Department of Energy/Office of Basic Energy Sciences (DOE/OBES) Continental Scientific Drilling Program (CSDP)/Thermal Regimes effort, which recognizes the uniqueness and site-specific nature of each drilling project. The SSSDP is a rotary drilling project that has provided cuttings and spot cores as well as liquid and gas samples. This manual provides details on handling all of these sample types. 6 refs., 10 figs.

  20. National Radiobiology Archives Distributed Access user's manual

    SciTech Connect

    Watson, C.; Smith, S. ); Prather, J. )

    1991-11-01

    This User's Manual describes installation and use of the National Radiobiology Archives (NRA) Distributed Access package. The package consists of a distributed subset of information representative of the NRA databases and database access software which provide an introduction to the scope and style of the NRA Information Systems.

  1. Biological Databases for Human Research

    PubMed Central

    Zou, Dong; Ma, Lina; Yu, Jun; Zhang, Zhang

    2015-01-01

    The completion of the Human Genome Project laid a foundation for systematically studying the human genome, from evolutionary history to precision medicine against diseases. With the explosive growth of biological data, an increasing number of biological databases have been developed to aid human-related research. Here we present a collection of human-related biological databases and provide a mini-review by classifying them into different categories according to their data types. As human-related databases continue to grow not only in count but also in volume, challenges lie ahead in big data storage, processing, exchange and curation. PMID:25712261

  3. FR database 1.0: a resource focused on fruit development and ripening.

    PubMed

    Yue, Junyang; Ma, Xiaojing; Ban, Rongjun; Huang, Qianli; Wang, Wenjie; Liu, Jia; Liu, Yongsheng

    2015-01-01

    Fruits constitute a unique period in the life cycle of higher plants. They provide essential nutrients and have beneficial effects on human health. Characterizing the genes involved in fruit development and ripening is fundamental to understanding the biological process and improving horticultural crops. Although numerous genes participating in the regulation of fruit development and ripening at different stages have been characterized, no dedicated bioinformatic resource for fruit development and ripening has been available. In this study, we developed such a database, FR database 1.0, through manual curation of 38 423 articles published before 1 April 2014 and integration of protein interactomes and several transcriptome datasets. It provides detailed information for 904 genes derived from 53 organisms reported to participate in fleshy fruit development and ripening. Genes from climacteric and non-climacteric fruits are also annotated, with several interesting Gene Ontology (GO) terms enriched in these two gene sets and seven ethylene-related GO terms found only in the climacteric fruit group. Furthermore, protein-protein interaction analysis integrating information from FR database suggests a functional network that affects fleshy fruit size formation. Collectively, FR database will be a valuable platform for comprehensive understanding and future experiments in fruit biology. Database URL: http://www.fruitech.org/

  4. GLASS: a comprehensive database for experimentally validated GPCR-ligand associations

    PubMed Central

    Chan, Wallace K. B.; Zhang, Hongjiu; Yang, Jianyi; Brender, Jeffrey R.; Hur, Junguk; Özgür, Arzucan; Zhang, Yang

    2015-01-01

    Motivation: G protein-coupled receptors (GPCRs) are among the most attractive membrane-protein drug targets, constituting nearly half of the drug targets in the contemporary drug discovery industry. While the majority of drug discovery studies employ existing GPCR and ligand interactions to identify new compounds, there remains a shortage of specific databases with precisely annotated GPCR-ligand associations. Results: We have developed a new database, GLASS, which aims to provide a comprehensive, manually curated resource for experimentally validated GPCR-ligand associations. A new text-mining algorithm was proposed to collect GPCR-ligand interactions from the biomedical literature; these are then crosschecked with five primary pharmacological datasets to enhance the coverage and accuracy of the GPCR-ligand association data. A special architecture has been designed to allow users to perform homologous ligand searches with flexible bioactivity parameters. The current database contains ∼500 000 unique entries, of which the vast majority stems from ligand associations with rhodopsin- and secretin-like receptors. The GLASS database should find its most useful application in various in silico GPCR screening and functional annotation studies. Availability and implementation: The website of the GLASS database is freely available at http://zhanglab.ccmb.med.umich.edu/GLASS/. Contact: zhng@umich.edu Supplementary information: Supplementary data are available at Bioinformatics online. PMID:25971743

  5. Tools and databases of the KOMICS web portal for preprocessing, mining, and dissemination of metabolomics data.

    PubMed

    Sakurai, Nozomu; Ara, Takeshi; Enomoto, Mitsuo; Motegi, Takeshi; Morishita, Yoshihiko; Kurabayashi, Atsushi; Iijima, Yoko; Ogata, Yoshiyuki; Nakajima, Daisuke; Suzuki, Hideyuki; Shibata, Daisuke

    2014-01-01

    A metabolome--the collection of comprehensive quantitative data on metabolites in an organism--has been increasingly utilized for applications such as data-intensive systems biology, disease diagnostics, biomarker discovery, and assessment of food quality. A considerable number of tools and databases have been developed to date for the analysis of data generated by various combinations of chromatography and mass spectrometry. We report here a web portal named KOMICS (The Kazusa Metabolomics Portal), where the tools and databases that we developed are available for free to academic users. KOMICS includes the tools and databases for preprocessing, mining, visualization, and publication of metabolomics data. Improvements in the annotation of unknown metabolites and dissemination of comprehensive metabolomic data are the primary aims behind the development of this portal. For this purpose, PowerGet and FragmentAlign include a manual curation function for the results of metabolite feature alignments. A metadata-specific wiki-based database, Metabolonote, functions as a hub of web resources related to the submitters' work. This feature is expected to increase citation of the submitters' work, thereby promoting data publication. As an example of the practical use of KOMICS, a workflow for a study on Jatropha curcas is presented. The tools and databases available at KOMICS should contribute to enhanced production, interpretation, and utilization of metabolomic Big Data. PMID:24949426

  6. FR database 1.0: a resource focused on fruit development and ripening

    PubMed Central

    Yue, Junyang; Ma, Xiaojing; Ban, Rongjun; Huang, Qianli; Wang, Wenjie; Liu, Jia; Liu, Yongsheng

    2015-01-01

    Fruits form a unique growing period in the life cycle of higher plants. They provide essential nutrients and have beneficial effects on human health. Characterizing the genes involved in fruit development and ripening is fundamental to understanding the biological process and to improving horticultural crops. Although numerous genes have been characterized that participate in regulating fruit development and ripening at different stages, no dedicated bioinformatic resource for fruit development and ripening is available. In this study, we have developed such a database, FR database 1.0, using manual curation from 38 423 articles published before 1 April 2014, and integrating protein interactomes and several transcriptome datasets. It provides detailed information for 904 genes derived from 53 organisms reported to participate in fleshy fruit development and ripening. Genes from climacteric and non-climacteric fruits are also annotated, with several interesting Gene Ontology (GO) terms being enriched for these two gene sets and seven ethylene-related GO terms found only in the climacteric fruit group. Furthermore, protein–protein interaction analysis integrating information from FR database presents a possible functional network that affects fleshy fruit size formation. Collectively, FR database will be a valuable platform for comprehensive understanding and future experiments in fruit biology. Database URL: http://www.fruitech.org/ PMID:25725058

  7. Tools and databases of the KOMICS web portal for preprocessing, mining, and dissemination of metabolomics data.

    PubMed

    Sakurai, Nozomu; Ara, Takeshi; Enomoto, Mitsuo; Motegi, Takeshi; Morishita, Yoshihiko; Kurabayashi, Atsushi; Iijima, Yoko; Ogata, Yoshiyuki; Nakajima, Daisuke; Suzuki, Hideyuki; Shibata, Daisuke

    2014-01-01

    A metabolome--the collection of comprehensive quantitative data on metabolites in an organism--has been increasingly utilized for applications such as data-intensive systems biology, disease diagnostics, biomarker discovery, and assessment of food quality. A considerable number of tools and databases have been developed to date for the analysis of data generated by various combinations of chromatography and mass spectrometry. We report here a web portal named KOMICS (The Kazusa Metabolomics Portal), where the tools and databases that we developed are available for free to academic users. KOMICS includes the tools and databases for preprocessing, mining, visualization, and publication of metabolomics data. Improvements in the annotation of unknown metabolites and dissemination of comprehensive metabolomic data are the primary aims behind the development of this portal. For this purpose, PowerGet and FragmentAlign include a manual curation function for the results of metabolite feature alignments. A metadata-specific wiki-based database, Metabolonote, functions as a hub of web resources related to the submitters' work. This feature is expected to increase citation of the submitters' work, thereby promoting data publication. As an example of the practical use of KOMICS, a workflow for a study on Jatropha curcas is presented. The tools and databases available at KOMICS should contribute to enhanced production, interpretation, and utilization of metabolomic Big Data.

  8. MVsCarta: A protein database of matrix vesicles to aid understanding of biomineralization.

    PubMed

    Cui, Yazhou; Xu, Quan; Luan, Jing; Hu, Shichang; Pan, Jianbo; Han, Jinxiang; Ji, Zhiliang

    2015-06-01

    Matrix vesicles (MVs) are membranous nanovesicles released by chondrocytes, osteoblasts, and odontoblasts. They play a critical role in modulating mineralization. Here, we present a manually curated database of MV proteins, named MVsCarta, to provide comprehensive information on the protein components of MVs. In the current version, the database contains 2,713 proteins of six organisms identified in bone, cartilage, tooth tissues, and cells capable of producing a mineralized bone matrix. The MVsCarta database is freely accessible at http://bioinf.xmu.edu.cn/MVsCarta. Search and browse methods were developed for better retrieval of data. In addition, bioinformatic tools such as Gene Ontology (GO) analysis, network visualization and protein-protein interaction analysis were implemented for a functional understanding of MV components. No similar database has been reported to date. We believe that this free web-based database might serve as a useful repository to elucidate the novel function and regulation of MVs during mineralization, and to stimulate the advancement of MV studies. PMID:26166372

  9. The Mouse Genome Database: integration of and access to knowledge about the laboratory mouse.

    PubMed

    Blake, Judith A; Bult, Carol J; Eppig, Janan T; Kadin, James A; Richardson, Joel E

    2014-01-01

    The Mouse Genome Database (MGD) (http://www.informatics.jax.org) is the community model organism database resource for the laboratory mouse, a premier animal model for the study of genetic and genomic systems relevant to human biology and disease. MGD maintains a comprehensive catalog of genes, functional RNAs and other genome features as well as heritable phenotypes and quantitative trait loci. The genome feature catalog is generated by the integration of computational and manual genome annotations generated by NCBI, Ensembl and Vega/HAVANA. MGD curates and maintains the comprehensive listing of functional annotations for mouse genes using the Gene Ontology, and MGD curates and integrates comprehensive phenotype annotations including associations of mouse models with human diseases. Recent improvements include integration of the latest mouse genome build (GRCm38), improved access to comparative and functional annotations for mouse genes with expanded representation of comparative vertebrate genomes and new loads of phenotype data from high-throughput phenotyping projects. All MGD resources are freely available to the research community.

  10. EpiFactors: a comprehensive database of human epigenetic factors and complexes

    PubMed Central

    Medvedeva, Yulia A.; Lennartsson, Andreas; Ehsani, Rezvan; Kulakovskiy, Ivan V.; Vorontsov, Ilya E.; Panahandeh, Pouda; Khimulya, Grigory; Kasukawa, Takeya; Drabløs, Finn

    2015-01-01

    Epigenetics refers to stable and long-term alterations of cellular traits that are not caused by changes in the DNA sequence per se. Rather, covalent modifications of DNA and histones affect gene expression and genome stability via proteins that recognize and act upon such modifications. Many enzymes that catalyse epigenetic modifications or are critical for enzymatic complexes have been discovered, and this is encouraging investigators to study the role of these proteins in diverse normal and pathological processes. Rapidly growing knowledge in the area has resulted in the need for a resource that compiles, organizes and presents curated information to the researchers in an easily accessible and user-friendly form. Here we present EpiFactors, a manually curated database providing information about epigenetic regulators, their complexes, targets and products. EpiFactors contains information on 815 proteins, including 95 histones and protamines. For 789 of these genes, we include expression values across several samples, in particular a collection of 458 human primary cell samples (for approximately 200 cell types, in many cases from three individual donors), covering most mammalian cell steady states, 255 different cancer cell lines (representing approximately 150 cancer subtypes) and 134 human postmortem tissues. Expression values were obtained by the FANTOM5 consortium using the Cap Analysis of Gene Expression (CAGE) technique. EpiFactors also contains information on 69 protein complexes that are involved in epigenetic regulation. The resource is practical for a wide range of users, including biologists, pharmacologists and clinicians. Database URL: http://epifactors.autosome.ru PMID:26153137

  11. Curcumin Resource Database

    PubMed Central

    Kumar, Anil; Chetia, Hasnahana; Sharma, Swagata; Kabiraj, Debajyoti; Talukdar, Narayan Chandra; Bora, Utpal

    2015-01-01

    Curcumin is one of the most intensively studied diarylheptanoids, Curcuma longa being its principal producer. Beyond this, a class of promising curcumin analogs, aptly named curcuminoids, has been generated in laboratories and is showing huge potential in the fields of medicine, food technology, etc. The lack of a universal source of data on curcumin and curcuminoids has long been felt by the curcumin research community. Hence, in an attempt to address this stumbling block, we have developed the Curcumin Resource Database (CRDB), which aims to serve as a gateway-cum-repository to access all relevant data and related information on curcumin and its analogs. Currently, this database encompasses 1186 curcumin analogs, 195 molecular targets, 9075 peer-reviewed publications, 489 patents and 176 varieties of C. longa obtained by extensive data mining and careful curation from numerous sources. Each data entry is identified by a unique CRDB ID (identifier). Furnished with a user-friendly web interface and an in-built search engine, CRDB provides well-curated and cross-referenced information hyperlinked with external sources. CRDB is expected to be highly useful to researchers working on structure- as well as ligand-based molecular design of curcumin analogs. Database URL: http://www.crdb.in PMID:26220923

  12. Curation of OSIRIS-REx Asteroid Samples

    NASA Technical Reports Server (NTRS)

    Righter, K.; Nakamura-Messenger, K.; Lauretta, D. S.

    2013-01-01

    The New Frontiers mission, OSIRIS-REx, will encounter carbonaceous asteroid 101955 Bennu (1999 RQ36; [1]) in 2018, collect a sample and return it to Earth and deliver it to NASA-JSC for curation in 2023. The mission curation plan is being developed and an overview will be given, including the main elements of contamination control, sample recovery, cleanroom construction, and curation support once the sample is returned to Earth.

  13. The Genomes OnLine Database (GOLD) v.5: a metadata management system based on a four level (meta)genome project classification

    SciTech Connect

    Reddy, Tatiparthi B. K.; Thomas, Alex D.; Stamatis, Dimitri; Bertsch, Jon; Isbandi, Michelle; Jansson, Jakob; Mallajosyula, Jyothi; Pagani, Ioanna; Lobos, Elizabeth A.; Kyrpides, Nikos C.

    2014-10-27

    The Genomes OnLine Database (GOLD; http://www.genomesonline.org) is a comprehensive online resource to catalog and monitor genetic studies worldwide. GOLD provides up-to-date status on complete and ongoing sequencing projects along with a broad array of curated metadata. Within this paper, we report version 5 (v.5) of the database. The newly designed database schema and web user interface supports several new features including the implementation of a four level (meta)genome project classification system and a simplified intuitive web interface to access reports and launch search tools. The database currently hosts information for about 19 200 studies, 56 000 Biosamples, 56 000 sequencing projects and 39 400 analysis projects. More than just a catalog of worldwide genome projects, GOLD is a manually curated, quality-controlled metadata warehouse. The problems encountered in integrating disparate and varying quality data into GOLD are briefly highlighted. Lastly, GOLD fully supports and follows the Genomic Standards Consortium (GSC) Minimum Information standards.
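The four-level hierarchy implied by the entity counts above (studies containing Biosamples, which group sequencing projects, which in turn feed analysis projects) can be sketched as nested records. This is an illustrative sketch only: the class and field names below are hypothetical and do not reflect GOLD's actual schema.

```python
from dataclasses import dataclass, field

# Hypothetical sketch of a four-level (meta)genome project hierarchy,
# inferred from the entity counts in the GOLD abstract. Names are
# illustrative, not GOLD's real data model.

@dataclass
class AnalysisProject:
    name: str

@dataclass
class SequencingProject:
    name: str
    analysis_projects: list = field(default_factory=list)

@dataclass
class Biosample:
    name: str
    sequencing_projects: list = field(default_factory=list)

@dataclass
class Study:
    name: str
    biosamples: list = field(default_factory=list)

# Build one toy path through all four levels
study = Study("Example soil metagenome study")
sample = Biosample("Soil sample A")
run = SequencingProject("Shotgun sequencing run 1")
run.analysis_projects.append(AnalysisProject("Assembly and annotation"))
sample.sequencing_projects.append(run)
study.biosamples.append(sample)
print(len(study.biosamples))  # 1
```

The point of such a classification is that metadata attached at one level (e.g. sampling environment on the Biosample) need not be duplicated on every downstream sequencing or analysis project.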

  14. The Genomes OnLine Database (GOLD) v.5: a metadata management system based on a four level (meta)genome project classification

    PubMed Central

    Reddy, T.B.K.; Thomas, Alex D.; Stamatis, Dimitri; Bertsch, Jon; Isbandi, Michelle; Jansson, Jakob; Mallajosyula, Jyothi; Pagani, Ioanna; Lobos, Elizabeth A.; Kyrpides, Nikos C.

    2015-01-01

    The Genomes OnLine Database (GOLD; http://www.genomesonline.org) is a comprehensive online resource to catalog and monitor genetic studies worldwide. GOLD provides up-to-date status on complete and ongoing sequencing projects along with a broad array of curated metadata. Here we report version 5 (v.5) of the database. The newly designed database schema and web user interface supports several new features including the implementation of a four level (meta)genome project classification system and a simplified intuitive web interface to access reports and launch search tools. The database currently hosts information for about 19 200 studies, 56 000 Biosamples, 56 000 sequencing projects and 39 400 analysis projects. More than just a catalog of worldwide genome projects, GOLD is a manually curated, quality-controlled metadata warehouse. The problems encountered in integrating disparate and varying quality data into GOLD are briefly highlighted. GOLD fully supports and follows the Genomic Standards Consortium (GSC) Minimum Information standards. PMID:25348402

  15. The NHGRI GWAS Catalog, a curated resource of SNP-trait associations.

    PubMed

    Welter, Danielle; MacArthur, Jacqueline; Morales, Joannella; Burdett, Tony; Hall, Peggy; Junkins, Heather; Klemm, Alan; Flicek, Paul; Manolio, Teri; Hindorff, Lucia; Parkinson, Helen

    2014-01-01

    The National Human Genome Research Institute (NHGRI) Catalog of Published Genome-Wide Association Studies (GWAS) Catalog provides a publicly available manually curated collection of published GWAS assaying at least 100,000 single-nucleotide polymorphisms (SNPs) and all SNP-trait associations with P <1 × 10(-5). The Catalog includes 1751 curated publications of 11 912 SNPs. In addition to the SNP-trait association data, the Catalog also publishes a quarterly diagram of all SNP-trait associations mapped to the SNPs' chromosomal locations. The Catalog can be accessed via a tabular web interface, via a dynamic visualization on the human karyotype, as a downloadable tab-delimited file and as an OWL knowledge base. This article presents a number of recent improvements to the Catalog, including novel ways for users to interact with the Catalog and changes to the curation infrastructure.
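Since the Catalog is distributed as a tab-delimited file, a few lines of Python suffice to filter associations at the P < 1 × 10(-5) inclusion threshold described above. This is a hedged sketch: the column names ('SNPS', 'DISEASE/TRAIT', 'P-VALUE') are assumed to match the download format and may need adjusting, and the sample rows are invented for illustration.

```python
import csv
import io

def strongest_associations(tsv_text, p_threshold=1e-5):
    """Return (SNP, trait, p) tuples at or below a p-value threshold.

    Column names ('SNPS', 'DISEASE/TRAIT', 'P-VALUE') are assumed to
    match the catalog's tab-delimited download format.
    """
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    hits = []
    for row in reader:
        try:
            p = float(row["P-VALUE"])
        except (KeyError, ValueError):
            continue  # skip rows with missing or non-numeric p-values
        if p <= p_threshold:
            hits.append((row["SNPS"], row["DISEASE/TRAIT"], p))
    return sorted(hits, key=lambda t: t[2])

# Invented sample rows, for illustration only
sample = (
    "SNPS\tDISEASE/TRAIT\tP-VALUE\n"
    "rs0000001\tExample trait A\t2e-8\n"
    "rs0000002\tExample trait B\t3e-4\n"
)
print(strongest_associations(sample))  # only the 2e-8 association passes
```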

  16. Ci4SeR--curation interface for semantic resources--evaluation with adverse drug reactions.

    PubMed

    Souvignet, Julien; Asfari, Hadyl; Declerck, Gunnar; Lardon, Jérémy; Trombert-Paviot, Béatrice; Jaulent, Marie-Christine; Bousquet, Cédric

    2014-01-01

    Evaluation and validation have become a crucial problem for the development of semantic resources. We developed Ci4SeR, a Graphical User Interface to optimize the curation work (not taking into account structural aspects), suitable for any type of resource with lightweight description logic. We tested it on OntoADR, an ontology of adverse drug reactions. A single curator has reviewed 326 terms (1020 axioms) in an estimated time of 120 hours (2.71 concepts and 8.5 axioms reviewed per hour) and added 1874 new axioms (15.6 axioms per hour). Compared with previous manual endeavours, the interface allows increasing the speed-rate of reviewed concepts by 68% and axiom addition by 486%. A wider use of Ci4SeR would help semantic resources curation and improve completeness of knowledge modelling.
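The throughput figures in the abstract follow directly from the reported totals (326 terms, 1020 reviewed axioms and 1874 added axioms over an estimated 120 hours); note that 326/120 ≈ 2.72, so the abstract's "2.71 concepts per hour" appears to truncate rather than round.

```python
# Reported totals from the Ci4SeR evaluation abstract
terms_reviewed = 326
axioms_reviewed = 1020
axioms_added = 1874
hours = 120

concepts_per_hour = terms_reviewed / hours          # 2.716... (reported as 2.71)
axioms_reviewed_per_hour = axioms_reviewed / hours  # 8.5
axioms_added_per_hour = axioms_added / hours        # 15.616... (reported as 15.6)

print(f"{concepts_per_hour:.2f} {axioms_reviewed_per_hour} {axioms_added_per_hour:.1f}")
# prints "2.72 8.5 15.6"
```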

  17. The Listeria monocytogenes strain 10403S BioCyc database.

    PubMed

    Orsi, Renato H; Bergholz, Teresa M; Wiedmann, Martin; Boor, Kathryn J

    2015-01-01

    Listeria monocytogenes is a food-borne pathogen of humans and other animals. The striking ability to survive several stresses usually used for food preservation makes L. monocytogenes one of the biggest concerns to the food industry, while the high mortality of listeriosis in specific groups of humans makes it a great concern for public health. Previous studies have shown that a regulatory network involving alternative sigma (σ) factors and transcription factors is pivotal to stress survival. However, few studies have evaluated the metabolic networks controlled by these regulatory mechanisms. The L. monocytogenes BioCyc database uses the strain 10403S as a model. Computer-generated initial annotation for all genes also allowed for identification, annotation and display of predicted reactions and pathways carried out by a single cell. Further ongoing manual curation based on published data as well as database mining for selected genes allowed the more refined annotation of functions, which, in turn, allowed for annotation of new pathways and fine-tuning of previously defined pathways to more L. monocytogenes-specific pathways. Using RNA-Seq data, several transcription start sites and promoter regions were mapped to the 10403S genome and annotated within the database. Additionally, the identification of promoter regions and a comprehensive review of available literature allowed the annotation of several regulatory interactions involving σ factors and transcription factors. The L. monocytogenes 10403S BioCyc database is a new resource for researchers studying Listeria and related organisms. It allows users to (i) have a comprehensive view of all reactions and pathways predicted to take place within the cell in the cellular overview, as well as to (ii) upload their own data, such as differential expression data, to visualize the data in the scope of predicted pathways and regulatory networks and to carry on enrichment analyses using several different annotations

  18. HistoneDB 2.0: a histone database with variants—an integrated resource to explore histones and their variants

    PubMed Central

    Draizen, Eli J.; Shaytan, Alexey K.; Mariño-Ramírez, Leonardo; Talbert, Paul B.; Landsman, David; Panchenko, Anna R.

    2016-01-01

    Compaction of DNA into chromatin is a characteristic feature of eukaryotic organisms. The core (H2A, H2B, H3, H4) and linker (H1) histone proteins are responsible for this compaction through the formation of nucleosomes and higher order chromatin aggregates. Moreover, histones are intricately involved in chromatin functioning and provide a means for genome dynamic regulation through specific histone variants and histone post-translational modifications. ‘HistoneDB 2.0 – with variants’ is a comprehensive database of histone protein sequences, classified by histone types and variants. All entries in the database are supplemented by rich sequence and structural annotations with many interactive tools to explore and compare sequences of different variants from various organisms. The core of the database is a manually curated set of histone sequences grouped into 30 different variant subsets with variant-specific annotations. The curated set is supplemented by an automatically extracted set of histone sequences from the non-redundant protein database using algorithms trained on the curated set. The interactive web site supports various searching strategies in both datasets: browsing of phylogenetic trees; on-demand generation of multiple sequence alignments with feature annotations; classification of histone-like sequences and browsing of the taxonomic diversity for every histone variant. HistoneDB 2.0 is a resource for the interactive comparative analysis of histone protein sequences and their implications for chromatin function. Database URL: http://www.ncbi.nlm.nih.gov/projects/HistoneDB2.0 PMID:26989147

  19. Data Curation Is for Everyone! The Case for Master's and Baccalaureate Institutional Engagement with Data Curation

    ERIC Educational Resources Information Center

    Shorish, Yasmeen

    2012-01-01

    This article describes the fundamental challenges to data curation, how these challenges may be compounded for smaller institutions, and how data management is an essential and manageable component of data curation. Data curation is often discussed within the confines of large research universities. As a result, master's and baccalaureate…

  20. DESTAF: a database of text-mined associations for reproductive toxins potentially affecting human fertility.

    PubMed

    Dawe, Adam S; Radovanovic, Aleksandar; Kaur, Mandeep; Sagar, Sunil; Seshadri, Sundararajan V; Schaefer, Ulf; Kamau, Allan A; Christoffels, Alan; Bajic, Vladimir B

    2012-01-01

    The Dragon Exploration System for Toxicants and Fertility (DESTAF) is a publicly available resource which enables researchers to efficiently explore both known and potentially novel information and associations in the field of reproductive toxicology. To create DESTAF we used data from the literature (including over 10500 PubMed abstracts), several publicly available biomedical repositories, and specialized, curated dictionaries. DESTAF has an interface designed to facilitate rapid assessment of the key associations between relevant concepts, allowing for a more in-depth exploration of information based on different gene/protein-, enzyme/metabolite-, toxin/chemical-, disease- or anatomically centric perspectives. As a special feature, DESTAF allows for the creation and initial testing of potentially new association hypotheses that suggest links between biological entities identified through the database. DESTAF, along with a PDF manual, can be found at http://cbrc.kaust.edu.sa/destaf. It is free to academic and non-commercial users and will be updated quarterly. PMID:22198179

  1. Manual compactor

    NASA Technical Reports Server (NTRS)

    Stevenson, Grant E. (Inventor)

    1979-01-01

    A manual compactor having two handles, each pivoted at one end for movement through adjacent arcs toward and away from each other. This reciprocating actuation motion is translated into rotary motion in a single direction by ratchet-and-pawl arrangements about the pivot shaft of each handle, and thence to rotary motion of opposing screws, one driven by each handle, which in turn act through ball-nut structures to forcibly draw together plates with force sufficient for compacting. The handles also contain push rods for disengaging the pawls from the ratchets, thereby allowing spring-loaded retraction of the plates and repositioning of the apparatus for subsequent compacting.

  2. The RadAssessor manual

    SciTech Connect

    Seitz, Sharon L.

    2007-02-01

    This manual will describe the functions and capabilities that are available from the RadAssessor database and will demonstrate how to retrieve and view its information. You’ll learn how to start the database application, how to log in, how to use the common commands, and how to use the online help if you have a question or need extra guidance. RadAssessor can be viewed from any standard web browser. Therefore, you will not need to install any special software before using RadAssessor.

  3. How should the completeness and quality of curated nanomaterial data be evaluated?

    NASA Astrophysics Data System (ADS)

    Marchese Robinson, Richard L.; Lynch, Iseult; Peijnenburg, Willie; Rumble, John; Klaessig, Fred; Marquardt, Clarissa; Rauscher, Hubert; Puzyn, Tomasz; Purian, Ronit; Åberg, Christoffer; Karcher, Sandra; Vriens, Hanne; Hoet, Peter; Hoover, Mark D.; Hendren, Christine Ogilvie; Harper, Stacey L.

    2016-05-01

    Nanotechnology is of increasing significance. Curation of nanomaterial data into electronic databases offers opportunities to better understand and predict nanomaterials' behaviour. This supports innovation in, and regulation of, nanotechnology. It is commonly understood that curated data need to be sufficiently complete and of sufficient quality to serve their intended purpose. However, assessing data completeness and quality is non-trivial in general and is arguably especially difficult in the nanoscience area, given its highly multidisciplinary nature. The current article, part of the Nanomaterial Data Curation Initiative series, addresses how to assess the completeness and quality of (curated) nanomaterial data. In order to address this key challenge, a variety of related issues are discussed: the meaning and importance of data completeness and quality, existing approaches to their assessment and the key challenges associated with evaluating the completeness and quality of curated nanomaterial data. Considerations which are specific to the nanoscience area and lessons which can be learned from other relevant scientific disciplines are considered. Hence, the scope of this discussion ranges from physicochemical characterisation requirements for nanomaterials and interference of nanomaterials with nanotoxicology assays to broader issues such as minimum information checklists, toxicology data quality schemes and computational approaches that facilitate evaluation of the completeness and quality of (curated) data. This discussion is informed by a literature review and a survey of key nanomaterial data curation stakeholders. Finally, drawing upon this discussion, recommendations are presented concerning the central question: how should the completeness and quality of curated nanomaterial data be evaluated?

  4. The Pfam protein families database: towards a more sustainable future.

    PubMed

    Finn, Robert D; Coggill, Penelope; Eberhardt, Ruth Y; Eddy, Sean R; Mistry, Jaina; Mitchell, Alex L; Potter, Simon C; Punta, Marco; Qureshi, Matloob; Sangrador-Vegas, Amaia; Salazar, Gustavo A; Tate, John; Bateman, Alex

    2016-01-01

    In the last two years the Pfam database (http://pfam.xfam.org) has undergone a substantial reorganisation to reduce the effort involved in making a release, thereby permitting more frequent releases. Arguably the most significant of these changes is that Pfam is now primarily based on the UniProtKB reference proteomes, with the counts of matched sequences and species reported on the website restricted to this smaller set. Building families on reference proteomes sequences brings greater stability, which decreases the amount of manual curation required to maintain them. It also reduces the number of sequences displayed on the website, whilst still providing access to many important model organisms. Matches to the full UniProtKB database are, however, still available and Pfam annotations for individual UniProtKB sequences can still be retrieved. Some Pfam entries (1.6%) which have no matches to reference proteomes remain; we are working with UniProt to see if sequences from them can be incorporated into reference proteomes. Pfam-B, the automatically-generated supplement to Pfam, has been removed. The current release (Pfam 29.0) includes 16 295 entries and 559 clans. The facility to view the relationship between families within a clan has been improved by the introduction of a new tool.

  5. TarNet: An Evidence-Based Database for Natural Medicine Research

    PubMed Central

    Ren, Guomin; Sun, Guibo; Sun, Xiaobo

    2016-01-01

    Background: Complex diseases seriously threaten human health. Drug discovery approaches based on “single genes, single drugs, and single targets” are limited in targeting complex diseases. The development of new multicomponent drugs for complex diseases is imperative, and the establishment of a suitable solution for drug group-target protein network analysis is a key scientific problem that must be addressed. Herbal medicines have formed the basis of sophisticated systems of traditional medicine and have given rise to some key drugs that remain in use today. The search for new molecules is currently taking a different route, whereby scientific principles of ethnobotany and ethnopharmacognosy are being used by chemists in the discovery of different sources and classes of compounds. Results: In this study, we developed TarNet, a manually curated database and platform of traditional medicinal plants with natural compounds that includes potential bio-target information. We gathered information on proteins that are related to or affected by medicinal plant ingredients and data on protein–protein interactions (PPIs). TarNet includes in-depth information on both plant–compound–protein relationships and PPIs. Additionally, TarNet can provide researchers with network construction analyses of biological pathways and PPIs associated with specific diseases. Researchers can upload a gene or protein list mapped to our PPI database, which has been manually curated, to generate relevant networks. Multiple functions are accessible for network topological calculations, subnetwork analyses, pathway analyses, and compound–protein relationships. Conclusions: TarNet will serve as a useful analytical tool that will provide information on medicinal plant compound-affected proteins (potential targets) and system-level analyses for systems biology and network pharmacology researchers. TarNet is freely available at http://www.herbbol.org:8001/tarnet

  6. Plant Omics Data Center: an integrated web repository for interspecies gene expression networks with NLP-based curation.

    PubMed

    Ohyanagi, Hajime; Takano, Tomoyuki; Terashima, Shin; Kobayashi, Masaaki; Kanno, Maasa; Morimoto, Kyoko; Kanegae, Hiromi; Sasaki, Yohei; Saito, Misa; Asano, Satomi; Ozaki, Soichi; Kudo, Toru; Yokoyama, Koji; Aya, Koichiro; Suwabe, Keita; Suzuki, Go; Aoki, Koh; Kubo, Yasutaka; Watanabe, Masao; Matsuoka, Makoto; Yano, Kentaro

    2015-01-01

    Comprehensive integration of large-scale omics resources such as genomes, transcriptomes and metabolomes will provide deeper insights into broader aspects of molecular biology. For better understanding of plant biology, we aim to construct a next-generation sequencing (NGS)-derived gene expression network (GEN) repository for a broad range of plant species. So far we have incorporated information about 745 high-quality mRNA sequencing (mRNA-Seq) samples from eight plant species (Arabidopsis thaliana, Oryza sativa, Solanum lycopersicum, Sorghum bicolor, Vitis vinifera, Solanum tuberosum, Medicago truncatula and Glycine max) from the public short read archive, digitally profiled the entire set of gene expression profiles, and drawn GENs by using correspondence analysis (CA) to take advantage of gene expression similarities. In order to understand the evolutionary significance of the GENs from multiple species, they were linked according to the orthology of each node (gene) among species. In addition to other gene expression information, functional annotation of the genes will facilitate biological comprehension. Currently we are improving the given gene annotations with natural language processing (NLP) techniques and manual curation. Here we introduce the current status of our analyses and the web database, PODC (Plant Omics Data Center; http://bioinf.mind.meiji.ac.jp/podc/), now open to the public, providing GENs, functional annotations and additional comprehensive omics resources.
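
The GEN construction above rests on linking genes whose expression profiles behave similarly across samples. PODC does this with correspondence analysis, which requires an SVD; the stdlib-only sketch below substitutes plain Pearson correlation to show how co-expression edges could be drawn (gene names and expression values are invented):

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation between two equal-length expression profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

# toy expression profiles (one list of sample values per gene)
profiles = {
    "GeneA": [1.0, 2.0, 3.0, 4.0],
    "GeneB": [2.1, 3.9, 6.2, 8.0],   # tracks GeneA closely
    "GeneC": [5.0, 1.0, 4.0, 2.0],   # unrelated profile
}

edges = []
names = sorted(profiles)
for i, g1 in enumerate(names):
    for g2 in names[i + 1:]:
        r = pearson(profiles[g1], profiles[g2])
        if r > 0.9:                   # similarity threshold for drawing an edge
            edges.append((g1, g2, round(r, 3)))
print(edges)
```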

  7. Plant Omics Data Center: An Integrated Web Repository for Interspecies Gene Expression Networks with NLP-Based Curation

    PubMed Central

    Ohyanagi, Hajime; Takano, Tomoyuki; Terashima, Shin; Kobayashi, Masaaki; Kanno, Maasa; Morimoto, Kyoko; Kanegae, Hiromi; Sasaki, Yohei; Saito, Misa; Asano, Satomi; Ozaki, Soichi; Kudo, Toru; Yokoyama, Koji; Aya, Koichiro; Suwabe, Keita; Suzuki, Go; Aoki, Koh; Kubo, Yasutaka; Watanabe, Masao; Matsuoka, Makoto; Yano, Kentaro

    2015-01-01

    Comprehensive integration of large-scale omics resources such as genomes, transcriptomes and metabolomes will provide deeper insights into broader aspects of molecular biology. For better understanding of plant biology, we aim to construct a next-generation sequencing (NGS)-derived gene expression network (GEN) repository for a broad range of plant species. So far we have incorporated information about 745 high-quality mRNA sequencing (mRNA-Seq) samples from eight plant species (Arabidopsis thaliana, Oryza sativa, Solanum lycopersicum, Sorghum bicolor, Vitis vinifera, Solanum tuberosum, Medicago truncatula and Glycine max) from the public short read archive, digitally profiled the entire set of gene expression profiles, and drawn GENs by using correspondence analysis (CA) to take advantage of gene expression similarities. In order to understand the evolutionary significance of the GENs from multiple species, they were linked according to the orthology of each node (gene) among species. In addition to other gene expression information, functional annotation of the genes will facilitate biological comprehension. Currently we are improving the given gene annotations with natural language processing (NLP) techniques and manual curation. Here we introduce the current status of our analyses and the web database, PODC (Plant Omics Data Center; http://bioinf.mind.meiji.ac.jp/podc/), now open to the public, providing GENs, functional annotations and additional comprehensive omics resources. PMID:25505034

  8. Curatable Named-Entity Recognition Using Semantic Relations.

    PubMed

    Hsu, Yi-Yu; Kao, Hung-Yu

    2015-01-01

Named-entity recognition (NER) plays an important role in the development of biomedical databases. However, the existing NER tools produce multifarious named-entities which may result in both curatable and non-curatable markers. To facilitate biocuration with a straightforward approach, classifying curatable named-entities is helpful with regard to accelerating the biocuration workflow. Co-occurrence Interaction Nexus with Named-entity Recognition (CoINNER) is a web-based tool that allows users to identify genes, chemicals, diseases, and action term mentions in the Comparative Toxicogenomic Database (CTD). To further discover interactions, CoINNER uses multiple advanced algorithms to recognize the mentions in the BioCreative IV CTD Track. CoINNER is developed based on a prototype system that annotated gene, chemical, and disease mentions in PubMed abstracts at BioCreative 2012 Track I (literature triage). We extended our previous system in developing CoINNER. The pre-tagging results of CoINNER were developed based on the state-of-the-art named entity recognition tools in BioCreative III. Next, a method based on conditional random fields (CRFs) is proposed to predict chemical and disease mentions in the articles. Finally, action term mentions were collected by latent Dirichlet allocation (LDA). At the BioCreative IV CTD Track, the best F-measures reached for gene/protein, chemical/drug and disease NER were 54 percent while CoINNER achieved a 61.5 percent F-measure. System URL: http://ikmbio.csie.ncku.edu.tw/coinner/introduction.htm. PMID:26357317
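
The "co-occurrence interaction" idea behind CoINNER can be illustrated separately from its CRF and LDA machinery: once entities are tagged, a chemical and a disease mentioned in the same sentence become a candidate interaction for curation. A dictionary-based toy version (the dictionaries and sentences below are invented, not CTD data):

```python
import re

# toy NER dictionaries standing in for real tagger output
CHEMICALS = {"curcumin", "cisplatin"}
DISEASES = {"melanoma", "epilepsy"}

def cooccurrence_pairs(text):
    """Emit (chemical, disease) pairs that co-occur within one sentence."""
    pairs = set()
    for sentence in re.split(r"(?<=[.!?])\s+", text):
        words = {w.lower().strip(".,") for w in sentence.split()}
        for chem in CHEMICALS & words:
            for dis in DISEASES & words:
                pairs.add((chem, dis))
    return sorted(pairs)

abstract = ("Cisplatin resistance was observed in melanoma. "
            "Curcumin showed no effect in this assay.")
print(cooccurrence_pairs(abstract))
```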

  9. Human Ageing Genomic Resources: Integrated databases and tools for the biology and genetics of ageing

    PubMed Central

    Tacutu, Robi; Craig, Thomas; Budovsky, Arie; Wuttke, Daniel; Lehmann, Gilad; Taranukha, Dmitri; Costa, Joana; Fraifeld, Vadim E.; de Magalhães, João Pedro

    2013-01-01

The Human Ageing Genomic Resources (HAGR, http://genomics.senescence.info) is a freely available online collection of research databases and tools for the biology and genetics of ageing. HAGR now features several databases with high-quality manually curated data: (i) GenAge, a database of genes associated with ageing in humans and model organisms; (ii) AnAge, an extensive collection of longevity records and complementary traits for >4000 vertebrate species; and (iii) GenDR, a newly incorporated database, containing both gene mutations that interfere with dietary restriction-mediated lifespan extension and consistent gene expression changes induced by dietary restriction. Since its creation about 10 years ago, major efforts have been undertaken to maintain the quality of data in HAGR, while further continuing to develop, improve and extend it. This article briefly describes the content of HAGR and details the major updates since its previous publications, in terms of both structure and content. The completely redesigned interface, more intuitive and more integrative of HAGR resources, is also presented. Altogether, we hope that through its improvements, the current version of HAGR will continue to provide users with the most comprehensive and accessible resources available today in the field of biogerontology. PMID:23193293

  10. Human Ageing Genomic Resources: integrated databases and tools for the biology and genetics of ageing.

    PubMed

    Tacutu, Robi; Craig, Thomas; Budovsky, Arie; Wuttke, Daniel; Lehmann, Gilad; Taranukha, Dmitri; Costa, Joana; Fraifeld, Vadim E; de Magalhães, João Pedro

    2013-01-01

The Human Ageing Genomic Resources (HAGR, http://genomics.senescence.info) is a freely available online collection of research databases and tools for the biology and genetics of ageing. HAGR now features several databases with high-quality manually curated data: (i) GenAge, a database of genes associated with ageing in humans and model organisms; (ii) AnAge, an extensive collection of longevity records and complementary traits for >4000 vertebrate species; and (iii) GenDR, a newly incorporated database, containing both gene mutations that interfere with dietary restriction-mediated lifespan extension and consistent gene expression changes induced by dietary restriction. Since its creation about 10 years ago, major efforts have been undertaken to maintain the quality of data in HAGR, while further continuing to develop, improve and extend it. This article briefly describes the content of HAGR and details the major updates since its previous publications, in terms of both structure and content. The completely redesigned interface, more intuitive and more integrative of HAGR resources, is also presented. Altogether, we hope that through its improvements, the current version of HAGR will continue to provide users with the most comprehensive and accessible resources available today in the field of biogerontology.

  11. BioModels Database: a repository of mathematical models of biological processes.

    PubMed

    Chelliah, Vijayalakshmi; Laibe, Camille; Le Novère, Nicolas

    2013-01-01

BioModels Database is a public online resource that allows storing and sharing of published, peer-reviewed quantitative, dynamic models of biological processes. The model components and behaviour are thoroughly checked to correspond to the original publication and manually curated to ensure reliability. Furthermore, the model elements are annotated with terms from controlled vocabularies as well as linked to relevant external data resources. This greatly helps in model interpretation and reuse. Models are stored in SBML format (submissions are accepted in both SBML and CellML formats) and are available for download in various other common formats such as BioPAX, Octave, SciLab, VCML, XPP and PDF, in addition to SBML. The reaction network diagram of the models is also available in several formats. BioModels Database features a search engine, which provides simple and more advanced searches. Features such as online simulation and creation of smaller models (submodels) from the selected model elements of a larger one are provided. BioModels Database can be accessed both via a web interface and programmatically via web services. New models are available in BioModels Database at regular releases, about every 4 months. PMID:23715986
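
Since BioModels stores models as SBML, which is plain XML, pulling species and reaction identifiers out of a downloaded model needs nothing beyond the standard library. A minimal sketch against an invented toy model (real models come from the database downloads; only the SBML Level 3 core namespace is assumed):

```python
import xml.etree.ElementTree as ET

SBML_NS = "http://www.sbml.org/sbml/level3/version1/core"

# invented minimal SBML document standing in for a downloaded model file
sbml_doc = f"""<?xml version="1.0"?>
<sbml xmlns="{SBML_NS}" level="3" version="1">
  <model id="toy">
    <listOfSpecies>
      <species id="S1"/><species id="S2"/>
    </listOfSpecies>
    <listOfReactions>
      <reaction id="R1"/>
    </listOfReactions>
  </model>
</sbml>"""

root = ET.fromstring(sbml_doc)
# iterate namespaced tags to collect model element identifiers
species = [s.get("id") for s in root.iter(f"{{{SBML_NS}}}species")]
reactions = [r.get("id") for r in root.iter(f"{{{SBML_NS}}}reaction")]
print(species, reactions)
```

For serious work a dedicated SBML library is preferable, but the XML view shows why format conversion and annotation linking are tractable for the database.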

  12. CyanoLyase: a database of phycobilin lyase sequences, motifs and functions.

    PubMed

    Bretaudeau, Anthony; Coste, François; Humily, Florian; Garczarek, Laurence; Le Corguillé, Gildas; Six, Christophe; Ratin, Morgane; Collin, Olivier; Schluchter, Wendy M; Partensky, Frédéric

    2013-01-01

CyanoLyase (http://cyanolyase.genouest.org/) is a manually curated sequence and motif database of phycobilin lyases and related proteins. These enzymes catalyze the covalent ligation of chromophores (phycobilins) to specific binding sites of phycobiliproteins (PBPs). The latter constitute the building bricks of phycobilisomes, the major light-harvesting systems of cyanobacteria and red algae. Phycobilin lyase sequences are poorly annotated in public databases. Sequences included in CyanoLyase were retrieved from all available genomes of these organisms and a few others by similarity searches using biochemically characterized enzyme sequences and then classified into 3 clans and 32 families. Amino acid motifs were computed for each family using Protomata learner. CyanoLyase also includes BLAST and a novel pattern matching tool (Protomatch) that allow users to rapidly retrieve and annotate lyases from any new genome. In addition, it provides phylogenetic analyses of all phycobilin lyases families, describes their function, their presence/absence in all genomes of the database (phyletic profiles) and predicts the chromophorylation of PBPs in each strain. The site also includes a thorough bibliography about phycobilin lyases and genomes included in the database. This resource should be useful to scientists and companies interested in natural or artificial PBPs, which have a number of biotechnological applications, notably as fluorescent markers.
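
The family-specific motif matching CyanoLyase performs with Protomata/Protomatch can be caricatured as a pattern scan over protein sequences. The regular-expression motif below is an invented example for illustration, not a real lyase signature:

```python
import re

# hypothetical motif: Gly, any residue, two hydrophobic residues, then Glu/Asp
MOTIF = re.compile(r"G.[AVLI]{2}[ED]")

def scan(seq):
    """Return (position, matched fragment) for every motif hit in a sequence."""
    return [(m.start(), m.group()) for m in MOTIF.finditer(seq)]

print(scan("MKTGAVLEDRRGSLIE"))
```

Real protein grammars (as learned by Protomata) are richer than regular expressions, but the retrieve-and-annotate workflow over a new genome has this same scan-and-report shape.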

  13. Using the Reactome Database

    PubMed Central

    Haw, Robin

    2012-01-01

    There is considerable interest in the bioinformatics community in creating pathway databases. The Reactome project (a collaboration between the Ontario Institute for Cancer Research, Cold Spring Harbor Laboratory, New York University Medical Center and the European Bioinformatics Institute) is one such pathway database and collects structured information on all the biological pathways and processes in the human. It is an expert-authored and peer-reviewed, curated collection of well-documented molecular reactions that span the gamut from simple intermediate metabolism to signaling pathways and complex cellular events. This information is supplemented with likely orthologous molecular reactions in mouse, rat, zebrafish, worm and other model organisms. This unit describes how to use the Reactome database to learn the steps of a biological pathway; navigate and browse through the Reactome database; identify the pathways in which a molecule of interest is involved; use the Pathway and Expression analysis tools to search the database for and visualize possible connections within user-supplied experimental data set and Reactome pathways; and the Species Comparison tool to compare human and model organism pathways. PMID:22700314

  14. Curcumin Resource Database.

    PubMed

    Kumar, Anil; Chetia, Hasnahana; Sharma, Swagata; Kabiraj, Debajyoti; Talukdar, Narayan Chandra; Bora, Utpal

    2015-01-01

Curcumin is one of the most intensively studied diarylheptanoids, with Curcuma longa being its principal producer. Apart from this, a class of promising curcumin analogs has been generated in laboratories, aptly named Curcuminoids, which show huge potential in the fields of medicine, food technology, etc. The lack of a universal source of data on curcumin as well as curcuminoids has long been felt by the curcumin research community. Hence, in an attempt to address this stumbling block, we have developed the Curcumin Resource Database (CRDB), which aims to perform as a gateway-cum-repository to access all relevant data and related information on curcumin and its analogs. Currently, this database encompasses 1186 curcumin analogs, 195 molecular targets, 9075 peer-reviewed publications, 489 patents and 176 varieties of C. longa obtained by extensive data mining and careful curation from numerous sources. Each data entry is identified by a unique CRDB ID (identifier). Furnished with a user-friendly web interface and in-built search engine, CRDB provides well-curated and cross-referenced information hyperlinked with external sources. CRDB is expected to be highly useful to researchers working on structure- as well as ligand-based molecular design of curcumin analogs.

  15. Curcumin Resource Database.

    PubMed

    Kumar, Anil; Chetia, Hasnahana; Sharma, Swagata; Kabiraj, Debajyoti; Talukdar, Narayan Chandra; Bora, Utpal

    2015-01-01

Curcumin is one of the most intensively studied diarylheptanoids, with Curcuma longa being its principal producer. Apart from this, a class of promising curcumin analogs has been generated in laboratories, aptly named Curcuminoids, which show huge potential in the fields of medicine, food technology, etc. The lack of a universal source of data on curcumin as well as curcuminoids has long been felt by the curcumin research community. Hence, in an attempt to address this stumbling block, we have developed the Curcumin Resource Database (CRDB), which aims to perform as a gateway-cum-repository to access all relevant data and related information on curcumin and its analogs. Currently, this database encompasses 1186 curcumin analogs, 195 molecular targets, 9075 peer-reviewed publications, 489 patents and 176 varieties of C. longa obtained by extensive data mining and careful curation from numerous sources. Each data entry is identified by a unique CRDB ID (identifier). Furnished with a user-friendly web interface and in-built search engine, CRDB provides well-curated and cross-referenced information hyperlinked with external sources. CRDB is expected to be highly useful to researchers working on structure- as well as ligand-based molecular design of curcumin analogs. PMID:26220923

  16. Use of Semantic Technology to Create Curated Data Albums

    NASA Technical Reports Server (NTRS)

    Ramachandran, Rahul; Kulkarni, Ajinkya; Li, Xiang; Sainju, Roshan; Bakare, Rohan; Basyal, Sabin; Fox, Peter (Editor); Norack, Tom (Editor)

    2014-01-01

    One of the continuing challenges in any Earth science investigation is the discovery and access of useful science content from the increasingly large volumes of Earth science data and related information available online. Current Earth science data systems are designed with the assumption that researchers access data primarily by instrument or geophysical parameter. Those who know exactly the data sets they need can obtain the specific files using these systems. However, in cases where researchers are interested in studying an event of research interest, they must manually assemble a variety of relevant data sets by searching the different distributed data systems. Consequently, there is a need to design and build specialized search and discovery tools in Earth science that can filter through large volumes of distributed online data and information and only aggregate the relevant resources needed to support climatology and case studies. This paper presents a specialized search and discovery tool that automatically creates curated Data Albums. The tool was designed to enable key elements of the search process such as dynamic interaction and sense-making. The tool supports dynamic interaction via different modes of interactivity and visual presentation of information. The compilation of information and data into a Data Album is analogous to a shoebox within the sense-making framework. This tool automates most of the tedious information/data gathering tasks for researchers. Data curation by the tool is achieved via an ontology-based, relevancy ranking algorithm that filters out non-relevant information and data. The curation enables better search results as compared to the simple keyword searches provided by existing data systems in Earth science.
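
The ontology-based relevancy ranking described above can be sketched as a weighted term-matching score with a cutoff. The terms, weights and threshold below are illustrative assumptions, not the tool's actual ontology:

```python
# toy ontology term weights for an event-centric Data Album (invented)
ONTOLOGY_WEIGHTS = {"hurricane": 3.0, "precipitation": 2.0, "wind": 1.5}

def relevancy(text):
    """Score a resource description by weighted ontology-term matches."""
    return sum(ONTOLOGY_WEIGHTS.get(w, 0.0) for w in text.lower().split())

def curate(resources, threshold=2.0):
    """Rank resources by relevancy and drop anything below the threshold."""
    ranked = sorted(resources, key=relevancy, reverse=True)
    return [r for r in ranked if relevancy(r) >= threshold]

docs = ["hurricane wind speed data", "precipitation totals", "site login page"]
print(curate(docs))
```

The filtering step is what separates a curated Data Album from a raw keyword search: non-relevant resources never enter the album at all.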

  17. Use of Semantic Technology to Create Curated Data Albums

    NASA Technical Reports Server (NTRS)

    Ramachandran, Rahul; Kulkarni, Ajinkya; Li, Xiang; Sainju, Roshan; Bakare, Rohan; Basyal, Sabin

    2014-01-01

One of the continuing challenges in any Earth science investigation is the discovery and access of useful science content from the increasingly large volumes of Earth science data and related information available online. Current Earth science data systems are designed with the assumption that researchers access data primarily by instrument or geophysical parameter. Those who know exactly the data sets they need can obtain the specific files using these systems. However, in cases where researchers are interested in studying an event of research interest, they must manually assemble a variety of relevant data sets by searching the different distributed data systems. Consequently, there is a need to design and build specialized search and discovery tools in Earth science that can filter through large volumes of distributed online data and information and only aggregate the relevant resources needed to support climatology and case studies. This paper presents a specialized search and discovery tool that automatically creates curated Data Albums. The tool was designed to enable key elements of the search process such as dynamic interaction and sense-making. The tool supports dynamic interaction via different modes of interactivity and visual presentation of information. The compilation of information and data into a Data Album is analogous to a shoebox within the sense-making framework. This tool automates most of the tedious information/data gathering tasks for researchers. Data curation by the tool is achieved via an ontology-based, relevancy ranking algorithm that filters out non-relevant information and data. The curation enables better search results as compared to the simple keyword searches provided by existing data systems in Earth science.

  18. Literature-Based Gene Curation and Proposed Genetic Nomenclature for Cryptococcus

    PubMed Central

    Inglis, Diane O.; Skrzypek, Marek S.; Liaw, Edward; Moktali, Venkatesh; Sherlock, Gavin

    2014-01-01

    Cryptococcus, a major cause of disseminated infections in immunocompromised patients, kills over 600,000 people per year worldwide. Genes involved in the virulence of the meningitis-causing fungus are being characterized at an increasing rate, and to date, at least 648 Cryptococcus gene names have been published. However, these data are scattered throughout the literature and are challenging to find. Furthermore, conflicts in locus identification exist, so that named genes have been subsequently published under new names or names associated with one locus have been used for another locus. To avoid these conflicts and to provide a central source of Cryptococcus gene information, we have collected all published Cryptococcus gene names from the scientific literature and associated them with standard Cryptococcus locus identifiers and have incorporated them into FungiDB (www.fungidb.org). FungiDB is a panfungal genome database that collects gene information and functional data and provides search tools for 61 species of fungi and oomycetes. We applied these published names to a manually curated ortholog set of all Cryptococcus species currently in FungiDB, including Cryptococcus neoformans var. neoformans strains JEC21 and B-3501A, C. neoformans var. grubii strain H99, and Cryptococcus gattii strains R265 and WM276, and have written brief descriptions of their functions. We also compiled a protocol for gene naming that summarizes guidelines proposed by members of the Cryptococcus research community. The centralization of genomic and literature-based information for Cryptococcus at FungiDB will help researchers communicate about genes of interest, such as those related to virulence, and will further facilitate research on the pathogen. PMID:24813190
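
The curation conflict described above (one published gene name attached to different loci across papers) reduces to an inverted-index check. A minimal sketch, where the name-to-locus pairings are made up for illustration:

```python
from collections import defaultdict

# invented (gene name, locus ID) pairs as they might be harvested from papers
published = [
    ("GEN1", "CNAG_00001"),
    ("GEN1", "CNAG_00001"),   # consistent re-use of the same name
    ("GEN2", "CNAG_11111"),
    ("GEN2", "CNAG_22222"),   # same name published for a different locus
]

name_to_loci = defaultdict(set)
for name, locus in published:
    name_to_loci[name].add(locus)

# any name mapped to more than one locus needs manual resolution
conflicts = sorted(n for n, loci in name_to_loci.items() if len(loci) > 1)
print(conflicts)
```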

  19. Asphalt Raking. Instructor Manual. Trainee Manual.

    ERIC Educational Resources Information Center

    Laborers-AGC Education and Training Fund, Pomfret Center, CT.

    This packet consists of the instructor and trainee manuals for an asphalt raking course. The instructor manual contains a course schedule for 4 days of instruction, content outline, and instructor outline. The trainee manual is divided into five sections: safety, asphalt basics, placing methods, repair and patching, and clean-up and maintenance.…

  20. The immune epitope database: a historical retrospective of the first decade

    PubMed Central

    Salimi, Nima; Fleri, Ward; Peters, Bjoern; Sette, Alessandro

    2012-01-01

    As the amount of biomedical information available in the literature continues to increase, databases that aggregate this information continue to grow in importance and scope. The population of databases can occur either through fully automated text mining approaches or through manual curation by human subject experts. We here report our experiences in populating the National Institute of Allergy and Infectious Diseases sponsored Immune Epitope Database and Analysis Resource (IEDB, http://iedb.org), which was created in 2003, and as of 2012 captures the epitope information from approximately 99% of all papers published to date that describe immune epitopes (with the exception of cancer and HIV data). This was achieved using a hybrid model based on automated document categorization and extensive human expert involvement. This task required automated scanning of over 22 million PubMed abstracts followed by classification and curation of over 13 000 references, including over 7000 infectious disease-related manuscripts, over 1000 allergy-related manuscripts, roughly 4000 related to autoimmunity, and 1000 transplant/alloantigen-related manuscripts. The IEDB curation involves an unprecedented level of detail, capturing for each paper the actual experiments performed for each different epitope structure. Key to enabling this process was the extensive use of ontologies to ensure rigorous and consistent data representation as well as interoperability with other bioinformatics resources, including the Protein Data Bank, Chemical Entities of Biological Interest, and the NIAID Bioinformatics Resource Centers. A growing fraction of the IEDB data derives from direct submissions by research groups engaged in epitope discovery, and is being facilitated by the implementation of novel data submission tools. The present explosion of information contained in biological databases demands effective query and display capabilities to optimize the user experience. Accordingly, the
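
The automated document categorization step that precedes IEDB curation can be caricatured as scoring abstracts for epitope-related cue words; the real system uses trained classifiers over millions of abstracts, so the cues and threshold here are purely illustrative:

```python
# invented cue-word weights for a toy literature-triage scorer
CUES = {"epitope": 2, "mhc": 1, "antibody": 1, "t-cell": 1}

def triage_score(abstract):
    """Sum weights of cue words appearing in the abstract."""
    text = abstract.lower()
    return sum(w for cue, w in CUES.items() if cue in text)

def is_curatable(abstract, threshold=2):
    """Route an abstract to human curators only if it scores high enough."""
    return triage_score(abstract) >= threshold

print(is_curatable("A novel T-cell epitope restricted by MHC class I"))
print(is_curatable("Asphalt raking course schedule"))
```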

  1. Expression Atlas update—a database of gene and transcript expression from microarray- and sequencing-based functional genomics experiments

    PubMed Central

    Petryszak, Robert; Burdett, Tony; Fiorelli, Benedetto; Fonseca, Nuno A.; Gonzalez-Porta, Mar; Hastings, Emma; Huber, Wolfgang; Jupp, Simon; Keays, Maria; Kryvych, Nataliya; McMurry, Julie; Marioni, John C.; Malone, James; Megy, Karine; Rustici, Gabriella; Tang, Amy Y.; Taubert, Jan; Williams, Eleanor; Mannion, Oliver; Parkinson, Helen E.; Brazma, Alvis

    2014-01-01

    Expression Atlas (http://www.ebi.ac.uk/gxa) is a value-added database providing information about gene, protein and splice variant expression in different cell types, organism parts, developmental stages, diseases and other biological and experimental conditions. The database consists of selected high-quality microarray and RNA-sequencing experiments from ArrayExpress that have been manually curated, annotated with Experimental Factor Ontology terms and processed using standardized microarray and RNA-sequencing analysis methods. The new version of Expression Atlas introduces the concept of ‘baseline’ expression, i.e. gene and splice variant abundance levels in healthy or untreated conditions, such as tissues or cell types. Differential gene expression data benefit from an in-depth curation of experimental intent, resulting in biologically meaningful ‘contrasts’, i.e. instances of differential pairwise comparisons between two sets of biological replicates. Other novel aspects of Expression Atlas are its strict quality control of raw experimental data, up-to-date RNA-sequencing analysis methods, expression data at the level of gene sets, as well as genes and a more powerful search interface designed to maximize the biological value provided to the user. PMID:24304889
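
A "contrast" in Expression Atlas is a differential pairwise comparison between two sets of biological replicates; at its core that is a fold-change computation, sketched below for a single gene (values invented; the real pipeline adds statistical testing and multiple-testing control):

```python
from math import log2

def log2_fold_change(test_replicates, reference_replicates):
    """Log2 ratio of mean expression between two replicate groups."""
    mean_t = sum(test_replicates) / len(test_replicates)
    mean_r = sum(reference_replicates) / len(reference_replicates)
    return log2(mean_t / mean_r)

treated = [40.0, 44.0, 36.0]   # expression in the condition of interest
control = [10.0, 9.0, 11.0]    # baseline replicates
print(round(log2_fold_change(treated, control), 2))
```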

  2. Reflections on curative health care in Nicaragua.

    PubMed Central

    Slater, R G

    1989-01-01

    Improved health care in Nicaragua is a major priority of the Sandinista revolution; it has been pursued by major reforms of the national health care system, something few developing countries have attempted. In addition to its internationally recognized advances in public health, considerable progress has been made in health care delivery by expanding curative medical services through training more personnel and building more facilities to fulfill a commitment to free universal health coverage. The very uneven quality of medical care is the leading problem facing curative medicine now. Underlying factors include the difficulty of adequately training the greatly increased number of new physicians. Misdiagnosis and mismanagement continue to be major problems. The curative medical system is not well coordinated with the preventive sector. Recent innovations include initiation of a "medicina integral" residency, similar to family practice. Despite its inadequacies and the handicaps of war and poverty, the Nicaraguan curative medical system has made important progress. PMID:2705603

  3. Human transporter database: comprehensive knowledge and discovery tools in the human transporter genes.

    PubMed

    Ye, Adam Y; Liu, Qing-Rong; Li, Chuan-Yun; Zhao, Min; Qu, Hong

    2014-01-01

Transporters are essential in the homeostatic exchange of endogenous and exogenous substances at the systemic, organ, cellular, and subcellular levels. Gene mutations of transporters are often related to pharmacogenetic traits. Recent developments in high-throughput technologies in genomics, transcriptomics and proteomics allow in-depth studies of transporter genes in normal cellular processes and diverse disease conditions. The flood of high-throughput data has resulted in an urgent need for an updated knowledgebase with curated, organized, and annotated human transporters in an easily accessible form. Using a pipeline combining automated keyword queries, sequence similarity searches and manual curation of transporters, we collected 1,555 non-redundant human transporter genes to develop the Human Transporter Database (HTD) (http://htd.cbi.pku.edu.cn). Based on the extensive annotations, global properties of the transporter genes were illustrated, such as expression patterns and polymorphisms in relation to their ligands. We noted that the human transporters were enriched in many fundamental biological processes such as oxidative phosphorylation and cardiac muscle contraction, and were significantly associated with Mendelian and complex diseases such as epilepsy and sudden infant death syndrome. Overall, HTD provides a well-organized interface to help research communities search detailed molecular and genetic information on transporters for the development of personalized medicine.

  4. Human Transporter Database: Comprehensive Knowledge and Discovery Tools in the Human Transporter Genes

    PubMed Central

    Ye, Adam Y.; Liu, Qing-Rong; Li, Chuan-Yun; Zhao, Min; Qu, Hong

    2014-01-01

Transporters are essential in the homeostatic exchange of endogenous and exogenous substances at the systemic, organ, cellular, and subcellular levels. Gene mutations of transporters are often related to pharmacogenetic traits. Recent developments in high-throughput technologies in genomics, transcriptomics and proteomics allow in-depth studies of transporter genes in normal cellular processes and diverse disease conditions. The flood of high-throughput data has resulted in an urgent need for an updated knowledgebase with curated, organized, and annotated human transporters in an easily accessible form. Using a pipeline combining automated keyword queries, sequence similarity searches and manual curation of transporters, we collected 1,555 non-redundant human transporter genes to develop the Human Transporter Database (HTD) (http://htd.cbi.pku.edu.cn). Based on the extensive annotations, global properties of the transporter genes were illustrated, such as expression patterns and polymorphisms in relation to their ligands. We noted that the human transporters were enriched in many fundamental biological processes such as oxidative phosphorylation and cardiac muscle contraction, and were significantly associated with Mendelian and complex diseases such as epilepsy and sudden infant death syndrome. Overall, HTD provides a well-organized interface to help research communities search detailed molecular and genetic information on transporters for the development of personalized medicine. PMID:24558441

  5. JASPAR 2014: an extensively expanded and updated open-access database of transcription factor binding profiles

    PubMed Central

    Mathelier, Anthony; Zhao, Xiaobei; Zhang, Allen W.; Parcy, François; Worsley-Hunt, Rebecca; Arenillas, David J.; Buchman, Sorana; Chen, Chih-yu; Chou, Alice; Ienasescu, Hans; Lim, Jonathan; Shyr, Casper; Tan, Ge; Zhou, Michelle; Lenhard, Boris; Sandelin, Albin; Wasserman, Wyeth W.

    2014-01-01

    JASPAR (http://jaspar.genereg.net) is the largest open-access database of matrix-based nucleotide profiles describing the binding preference of transcription factors from multiple species. The fifth major release greatly expands the heart of JASPAR—the JASPAR CORE subcollection, which contains curated, non-redundant profiles—with 135 new curated profiles (74 in vertebrates, 8 in Drosophila melanogaster, 10 in Caenorhabditis elegans and 43 in Arabidopsis thaliana; a 30% increase in total) and 43 older updated profiles (36 in vertebrates, 3 in D. melanogaster and 4 in A. thaliana; a 9% update in total). The new and updated profiles are mainly derived from published chromatin immunoprecipitation-seq experimental datasets. In addition, the web interface has been enhanced with advanced capabilities in browsing, searching and subsetting. Finally, the new JASPAR release is accompanied by a new BioPython package, a new R tool package and a new R/Bioconductor data package to facilitate access for both manual and automated methods. PMID:24194598

  6. The Pathogen-Host Interactions database (PHI-base): additions and future developments

    PubMed Central

    Urban, Martin; Pant, Rashmi; Raghunath, Arathi; Irvine, Alistair G.; Pedro, Helder; Hammond-Kosack, Kim E.

    2015-01-01

    Rapidly evolving pathogens cause a diverse array of diseases and epidemics that threaten crop yield and food security, as well as human, animal and ecosystem health. To combat infection, greater comparative knowledge is required of the pathogenic process in multiple species. The Pathogen-Host Interactions database (PHI-base) catalogues experimentally verified pathogenicity, virulence and effector genes from bacterial, fungal and protist pathogens. Mutant phenotypes are associated with gene information. The included pathogens infect a wide range of hosts including humans, animals, plants, insects, fish and other fungi. The current version, PHI-base 3.6, available at http://www.phi-base.org, stores information on 2875 genes, 4102 interactions, 110 host species, 160 pathogenic species (103 plant, 3 fungal and 54 animal infecting species) and 181 diseases drawn from 1243 references. Phenotypic and gene function information has been obtained by manual curation of the peer-reviewed literature. A controlled vocabulary consisting of nine high-level phenotype terms permits comparisons and data analysis across the taxonomic space. PHI-base phenotypes were mapped via their associated gene information to reference genomes available in Ensembl Genomes. Virulence genes and hotspots can be visualized directly in genome browsers. Future plans for PHI-base include development of tools facilitating community-led curation and inclusion of the corresponding host target(s). PMID:25414340

  7. OpenTrials: towards a collaborative open database of all available information on all clinical trials.

    PubMed

    Goldacre, Ben; Gray, Jonathan

    2016-01-01

    OpenTrials is a collaborative and open database for all available structured data and documents on all clinical trials, threaded together by individual trial. With a versatile and expandable data schema, it is initially designed to host and match the following documents and data for each trial: registry entries; links, abstracts, or texts of academic journal papers; portions of regulatory documents describing individual trials; structured data on methods and results extracted by systematic reviewers or other researchers; clinical study reports; and additional documents such as blank consent forms, blank case report forms, and protocols. The intention is to create an open, freely re-usable index of all such information and to increase discoverability, facilitate research, identify inconsistent data, enable audits on the availability and completeness of this information, support advocacy for better data and drive up standards around open data in evidence-based medicine. The project has phase I funding. This will allow us to create a practical data schema and populate the database initially through web-scraping, basic record linkage techniques, crowd-sourced curation around selected drug areas, and import of existing sources of structured data and documents. It will also allow us to create user-friendly web interfaces onto the data and conduct user engagement workshops to optimise the database and interface designs. Where other projects have set out to manually and perfectly curate a narrow range of information on a smaller number of trials, we aim to use a broader range of techniques and attempt to match a very large quantity of information on all trials. We are currently seeking feedback and additional sources of structured data. PMID:27056367

  8. Curating NASA's Past, Present, and Future Extraterrestrial Sample Collections

    NASA Technical Reports Server (NTRS)

    McCubbin, F. M.; Allton, J. H.; Evans, C. A.; Fries, M. D.; Nakamura-Messenger, K.; Righter, K.; Zeigler, R. A.; Zolensky, M.; Stansbery, E. K.

    2016-01-01

    The Astromaterials Acquisition and Curation Office (henceforth referred to as the NASA Curation Office) at NASA Johnson Space Center (JSC) is responsible for curating all of NASA's extraterrestrial samples. Under the governing document, NASA Policy Directive (NPD) 7100.10E "Curation of Extraterrestrial Materials", JSC is charged with "...curation of all extra-terrestrial material under NASA control, including future NASA missions." The Directive goes on to define Curation as including "...documentation, preservation, preparation, and distribution of samples for research, education, and public outreach." Here we describe some of the past, present, and future activities of the NASA Curation Office.

  9. The Molecular Biology Database Collection: 2002 update

    PubMed Central

    Baxevanis, Andreas D.

    2002-01-01

    The Molecular Biology Database Collection is an online resource listing key databases of value to the biological community. This Collection is intended to bring fellow scientists’ attention to high-quality databases that are available throughout the world, rather than just be a lengthy listing of all available databases. As such, this up-to-date listing is intended to serve as the initial point from which to find specialized databases that may be of use in biological research. The databases included in this Collection provide new value to the underlying data by virtue of curation, new data connections or other innovative approaches. Short, searchable summaries and updates for each of the databases included in the Collection are available through the Nucleic Acids Research Web site at http://nar.oupjournals.org. PMID:11752241

  10. The Molecular Biology Database Collection: 2003 update

    PubMed Central

    Baxevanis, Andreas D.

    2003-01-01

    The Molecular Biology Database Collection is an online resource listing key databases of value to the biological community. This Collection is intended to bring fellow scientists' attention to high-quality databases that are available throughout the world, rather than just be a lengthy listing of all available databases. As such, this up-to-date listing is intended to serve as the jumping-off point from which to find specialized databases that may be of use in advancing biological research. The databases included in this Collection provide new value to the underlying data by virtue of curation, new data connections or other innovative approaches. Short, searchable summaries and updates for each of the databases included in this Collection are available through the Nucleic Acids Research Web site at http://nar.oupjournals.org. PMID:12519937

  11. RefSeq curation and annotation of antizyme and antizyme inhibitor genes in vertebrates

    PubMed Central

    Rajput, Bhanu; Murphy, Terence D.; Pruitt, Kim D.

    2015-01-01

    Polyamines are ubiquitous cations that are involved in regulating fundamental cellular processes such as cell growth and proliferation; hence, their intracellular concentration is tightly regulated. Antizyme and antizyme inhibitor have a central role in maintaining cellular polyamine levels. Antizyme is unique in that it is expressed via a novel programmed ribosomal frameshifting mechanism. Conventional computational tools are unable to predict a programmed frameshift, resulting in misannotation of antizyme transcripts and proteins on transcript and genomic sequences. Correct annotation of a programmed frameshifting event requires manual evaluation. Our goal was to provide an accurately curated and annotated Reference Sequence (RefSeq) data set of antizyme transcript and protein records across a broad taxonomic scope that would serve as standards for accurate representation of these gene products. As antizyme and antizyme inhibitor proteins are functionally connected, we also curated antizyme inhibitor genes to more fully represent the elegant biology of polyamine regulation. Manual review of genes for three members of the antizyme family and two members of the antizyme inhibitor family in 91 vertebrate organisms resulted in a total of 461 curated RefSeq records. PMID:26170238

  12. RefSeq curation and annotation of antizyme and antizyme inhibitor genes in vertebrates.

    PubMed

    Rajput, Bhanu; Murphy, Terence D; Pruitt, Kim D

    2015-09-01

    Polyamines are ubiquitous cations that are involved in regulating fundamental cellular processes such as cell growth and proliferation; hence, their intracellular concentration is tightly regulated. Antizyme and antizyme inhibitor have a central role in maintaining cellular polyamine levels. Antizyme is unique in that it is expressed via a novel programmed ribosomal frameshifting mechanism. Conventional computational tools are unable to predict a programmed frameshift, resulting in misannotation of antizyme transcripts and proteins on transcript and genomic sequences. Correct annotation of a programmed frameshifting event requires manual evaluation. Our goal was to provide an accurately curated and annotated Reference Sequence (RefSeq) data set of antizyme transcript and protein records across a broad taxonomic scope that would serve as standards for accurate representation of these gene products. As antizyme and antizyme inhibitor proteins are functionally connected, we also curated antizyme inhibitor genes to more fully represent the elegant biology of polyamine regulation. Manual review of genes for three members of the antizyme family and two members of the antizyme inhibitor family in 91 vertebrate organisms resulted in a total of 461 curated RefSeq records. PMID:26170238

  13. Project ADVANCE. Evaluation Manual. Mt. Diablo Unified School District.

    ERIC Educational Resources Information Center

    Tenenbaum, Bonnie

    The organizational structure of this manual parallels the evolutionary stages in school improvement--needs assessment, planning, implementation, and outcomes. The manual provides procedures for data-based management and includes within each section, sample instruments, data collection and analyses procedures, and questions for decision-making. The…

  14. MoonProt: a database for proteins that are known to moonlight

    PubMed Central

    Mani, Mathew; Chen, Chang; Amblee, Vaishak; Liu, Haipeng; Mathur, Tanu; Zwicke, Grant; Zabad, Shadi; Patel, Bansi; Thakkar, Jagravi; Jeffery, Constance J.

    2015-01-01

    Moonlighting proteins comprise a class of multifunctional proteins in which a single polypeptide chain performs multiple biochemical functions that are not due to gene fusions, multiple RNA splice variants or pleiotropic effects. The known moonlighting proteins perform a variety of diverse functions in many different cell types and species, and information about their structures and functions is scattered in many publications. We have constructed the manually curated, searchable, internet-based MoonProt Database (http://www.moonlightingproteins.org) with information about the over 200 proteins that have been experimentally verified to be moonlighting proteins. The availability of this organized information provides a more complete picture of what is currently known about moonlighting proteins. The database will also aid researchers in other fields, including determining the functions of genes identified in genome sequencing projects, interpreting data from proteomics projects and annotating protein sequence and structural databases. In addition, information about the structures and functions of moonlighting proteins can be helpful in understanding how novel protein functional sites evolved on an ancient protein scaffold, which can also help in the design of proteins with novel functions. PMID:25324305

  15. uORFdb—a comprehensive literature database on eukaryotic uORF biology

    PubMed Central

    Wethmar, Klaus; Barbosa-Silva, Adriano; Andrade-Navarro, Miguel A.; Leutz, Achim

    2014-01-01

    Approximately half of all human transcripts contain at least one upstream translational initiation site that precedes the main coding sequence (CDS) and gives rise to an upstream open reading frame (uORF). We generated uORFdb, publicly available at http://cbdm.mdc-berlin.de/tools/uorfdb, to serve as a comprehensive literature database on eukaryotic uORF biology. Upstream ORFs affect downstream translation by interfering with the unrestrained progression of ribosomes across the transcript leader sequence. Although the first uORF-related translational activity was observed >30 years ago, and an increasing number of studies link defective uORF-mediated translational control to the development of human diseases, the features that determine uORF-mediated regulation of downstream translation are not well understood. The uORFdb was manually curated from all uORF-related literature listed at the PubMed database. It categorizes individual publications by a variety of denominators including taxon, gene and type of study. Furthermore, the database can be filtered for multiple structural and functional uORF-related properties to allow convenient and targeted access to the complex field of eukaryotic uORF biology. PMID:24163100

  16. PhenoMiner: from text to a database of phenotypes associated with OMIM diseases.

    PubMed

    Collier, Nigel; Groza, Tudor; Smedley, Damian; Robinson, Peter N; Oellrich, Anika; Rebholz-Schuhmann, Dietrich

    2015-01-01

    Analysis of scientific and clinical phenotypes reported in the experimental literature has been curated manually to build high-quality databases such as the Online Mendelian Inheritance in Man (OMIM). However, the identification and harmonization of phenotype descriptions struggles with the diversity of human expressivity. We introduce a novel automated extraction approach called PhenoMiner that exploits full parsing and conceptual analysis. Apriori association mining is then used to identify relationships to human diseases. We applied PhenoMiner to the BMC open access collection and identified 13,636 phenotype candidates. We identified 28,155 phenotype-disorder hypotheses covering 4898 phenotypes and 1659 Mendelian disorders. Analysis showed: (i) the semantic distribution of the extracted terms against linked ontologies; (ii) a comparison of term overlap with the Human Phenotype Ontology (HP); (iii) moderate support for phenotype-disorder pairs in both OMIM and the literature; (iv) strong associations of phenotype-disorder pairs to known disease-genes pairs using PhenoDigm. The full list of PhenoMiner phenotypes (S1), phenotype-disorder associations (S2), association-filtered linked data (S3) and user database documentation (S5) is available as supplementary data and can be downloaded at http://github.com/nhcollier/PhenoMiner under a Creative Commons Attribution 4.0 license. Database URL: phenominer.mml.cam.ac.uk.
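    The Apriori association-mining step the entry describes can be illustrated with a minimal sketch over toy co-occurrence data. This is an independent illustration of the general technique, not the authors' PhenoMiner pipeline; the `apriori_pairs` function and all example terms below are hypothetical.

    ```python
    from itertools import combinations
    from collections import Counter

    def apriori_pairs(transactions, min_support):
        """Find frequently co-occurring item pairs (a minimal Apriori-style pass)."""
        # Count single items and prune those below the support threshold.
        item_counts = Counter(item for t in transactions for item in set(t))
        frequent_items = {i for i, c in item_counts.items() if c >= min_support}
        # Build candidate pairs only from frequent items (the Apriori pruning idea),
        # then keep pairs that themselves meet the threshold.
        pair_counts = Counter()
        for t in transactions:
            kept = sorted(set(t) & frequent_items)
            for pair in combinations(kept, 2):
                pair_counts[pair] += 1
        return {p: c for p, c in pair_counts.items() if c >= min_support}

    # Toy "documents": each set lists terms extracted from one article
    # (lowercase = phenotype candidate, uppercase = disorder).
    docs = [
        {"seizure", "EPILEPSY"},
        {"seizure", "EPILEPSY", "ataxia"},
        {"seizure", "EPILEPSY"},
        {"ataxia", "MARFAN"},
    ]
    print(apriori_pairs(docs, min_support=3))  # {('EPILEPSY', 'seizure'): 3}
    ```

    Pairs involving rare items are never even counted, which is what makes Apriori tractable on large literature collections.
    
    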

  17. dbPEC: a comprehensive literature-based database for preeclampsia related genes and phenotypes

    PubMed Central

    Uzun, Alper; Triche, Elizabeth W.; Schuster, Jessica; Dewan, Andrew T.; Padbury, James F.

    2016-01-01

    Preeclampsia is one of the most common causes of fetal and maternal morbidity and mortality in the world. We built a Database for Preeclampsia (dbPEC) consisting of the clinical features, concurrent conditions, published literature and genes associated with Preeclampsia. We included gene sets associated with severity, concurrent conditions, tissue sources and networks. The published scientific literature is the primary repository for all information documenting human disease. We used semantic data mining to retrieve and extract the articles pertaining to preeclampsia-associated genes and performed manual curation. We deposited the articles, genes, preeclampsia phenotypes and other supporting information into the dbPEC. It is publicly available and freely accessible. Previously, we developed a database for preterm birth (dbPTB) using a similar approach. Using the gene sets in dbPTB, we were able to successfully analyze a genome-wide study of preterm birth including 4000 women and children. We identified important genes and pathways associated with preterm birth that were not otherwise demonstrable using genome-wide approaches. dbPEC serves not only as a resource for genes and articles associated with preeclampsia but also as a robust source of gene sets for analyzing a wide range of high-throughput data through gene set enrichment analysis. Database URL: http://ptbdb.cs.brown.edu/dbpec/ PMID:26946289

  18. PhenoMiner: from text to a database of phenotypes associated with OMIM diseases

    PubMed Central

    Collier, Nigel; Groza, Tudor; Smedley, Damian; Robinson, Peter N.; Oellrich, Anika; Rebholz-Schuhmann, Dietrich

    2015-01-01

    Analysis of scientific and clinical phenotypes reported in the experimental literature has been curated manually to build high-quality databases such as the Online Mendelian Inheritance in Man (OMIM). However, the identification and harmonization of phenotype descriptions struggles with the diversity of human expressivity. We introduce a novel automated extraction approach called PhenoMiner that exploits full parsing and conceptual analysis. Apriori association mining is then used to identify relationships to human diseases. We applied PhenoMiner to the BMC open access collection and identified 13 636 phenotype candidates. We identified 28 155 phenotype-disorder hypotheses covering 4898 phenotypes and 1659 Mendelian disorders. Analysis showed: (i) the semantic distribution of the extracted terms against linked ontologies; (ii) a comparison of term overlap with the Human Phenotype Ontology (HP); (iii) moderate support for phenotype-disorder pairs in both OMIM and the literature; (iv) strong associations of phenotype-disorder pairs to known disease-genes pairs using PhenoDigm. The full list of PhenoMiner phenotypes (S1), phenotype-disorder associations (S2), association-filtered linked data (S3) and user database documentation (S5) is available as supplementary data and can be downloaded at http://github.com/nhcollier/PhenoMiner under a Creative Commons Attribution 4.0 license. Database URL: phenominer.mml.cam.ac.uk PMID:26507285

  19. dbPEC: a comprehensive literature-based database for preeclampsia related genes and phenotypes.

    PubMed

    Uzun, Alper; Triche, Elizabeth W; Schuster, Jessica; Dewan, Andrew T; Padbury, James F

    2016-01-01

    Preeclampsia is one of the most common causes of fetal and maternal morbidity and mortality in the world. We built a Database for Preeclampsia (dbPEC) consisting of the clinical features, concurrent conditions, published literature and genes associated with Preeclampsia. We included gene sets associated with severity, concurrent conditions, tissue sources and networks. The published scientific literature is the primary repository for all information documenting human disease. We used semantic data mining to retrieve and extract the articles pertaining to preeclampsia-associated genes and performed manual curation. We deposited the articles, genes, preeclampsia phenotypes and other supporting information into the dbPEC. It is publicly available and freely accessible. Previously, we developed a database for preterm birth (dbPTB) using a similar approach. Using the gene sets in dbPTB, we were able to successfully analyze a genome-wide study of preterm birth including 4000 women and children. We identified important genes and pathways associated with preterm birth that were not otherwise demonstrable using genome-wide approaches. dbPEC serves not only as a resource for genes and articles associated with preeclampsia but also as a robust source of gene sets for analyzing a wide range of high-throughput data through gene set enrichment analysis. Database URL: http://ptbdb.cs.brown.edu/dbpec/. PMID:26946289

  20. Follicle Online: an integrated database of follicle assembly, development and ovulation.

    PubMed

    Hua, Juan; Xu, Bo; Yang, Yifan; Ban, Rongjun; Iqbal, Furhan; Cooke, Howard J; Zhang, Yuanwei; Shi, Qinghua

    2015-01-01

    Folliculogenesis is an important part of ovarian function as it provides the oocytes for female reproductive life. Characterizing genes/proteins involved in folliculogenesis is fundamental for understanding the mechanisms associated with this biological function and for treating the diseases associated with folliculogenesis. A large number of genes/proteins associated with folliculogenesis have been identified from different species. However, no dedicated public resource is currently available for folliculogenesis-related genes/proteins that are validated by experiments. Here, we report a database, 'Follicle Online', that provides an experimentally validated gene/protein map of folliculogenesis in a number of species. Follicle Online is a web-based database system for storing and retrieving folliculogenesis-related experimental data. It provides detailed information for 580 genes/proteins (from 23 model organisms, including Homo sapiens, Mus musculus, Rattus norvegicus, Mesocricetus auratus, Bos taurus, Drosophila and Xenopus laevis) that have been reported to be involved in folliculogenesis, POF (premature ovarian failure) and PCOS (polycystic ovary syndrome). The literature was manually curated from more than 43,000 published articles (till 1 March 2014). The Follicle Online database is implemented in PHP + MySQL + JavaScript and this user-friendly web application provides access to the stored data. In summary, we have developed a centralized database that provides users with comprehensive information about genes/proteins involved in folliculogenesis. This database can be accessed freely and all the stored data can be viewed without any registration. Database URL: http://mcg.ustc.edu.cn/sdap1/follicle/index.php PMID:25931457

  1. Follicle Online: an integrated database of follicle assembly, development and ovulation

    PubMed Central

    Hua, Juan; Xu, Bo; Yang, Yifan; Ban, Rongjun; Iqbal, Furhan; Zhang, Yuanwei; Shi, Qinghua

    2015-01-01

    Folliculogenesis is an important part of ovarian function as it provides the oocytes for female reproductive life. Characterizing genes/proteins involved in folliculogenesis is fundamental for understanding the mechanisms associated with this biological function and for treating the diseases associated with folliculogenesis. A large number of genes/proteins associated with folliculogenesis have been identified from different species. However, no dedicated public resource is currently available for folliculogenesis-related genes/proteins that are validated by experiments. Here, we report a database, ‘Follicle Online’, that provides an experimentally validated gene/protein map of folliculogenesis in a number of species. Follicle Online is a web-based database system for storing and retrieving folliculogenesis-related experimental data. It provides detailed information for 580 genes/proteins (from 23 model organisms, including Homo sapiens, Mus musculus, Rattus norvegicus, Mesocricetus auratus, Bos taurus, Drosophila and Xenopus laevis) that have been reported to be involved in folliculogenesis, POF (premature ovarian failure) and PCOS (polycystic ovary syndrome). The literature was manually curated from more than 43 000 published articles (till 1 March 2014). The Follicle Online database is implemented in PHP + MySQL + JavaScript and this user-friendly web application provides access to the stored data. In summary, we have developed a centralized database that provides users with comprehensive information about genes/proteins involved in folliculogenesis. This database can be accessed freely and all the stored data can be viewed without any registration. Database URL: http://mcg.ustc.edu.cn/sdap1/follicle/index.php PMID:25931457

  2. AtomPy: an open atomic-data curation environment

    NASA Astrophysics Data System (ADS)

    Bautista, Manuel; Mendoza, Claudio; Boswell, Josiah S; Ajoku, Chukwuemeka

    2014-06-01

    We present a cloud-computing environment for atomic data curation, networking among atomic data providers and users, teaching-and-learning, and interfacing with spectral modeling software. The system is based on Google-Drive Sheets, Pandas (Python Data Analysis Library) DataFrames, and IPython Notebooks for open community-driven curation of atomic data for scientific and technological applications. The atomic model for each ionic species is contained in a multi-sheet Google-Drive workbook, where the atomic parameters from all known public sources are progressively stored. Metadata (provenance, community discussion, etc.) accompanying every entry in the database are stored through Notebooks. Education tools on the physics of atomic processes as well as their relevance to plasma and spectral modeling are based on IPython Notebooks that integrate written material, images, videos, and active computer-tool workflows. Data processing workflows and collaborative software developments are encouraged and managed through the GitHub social network. Relevant issues this platform intends to address are: (i) data quality by allowing open access to both data producers and users in order to attain completeness, accuracy, consistency, provenance and currentness; (ii) comparisons of different datasets to facilitate accuracy assessment; (iii) downloading to local data structures (i.e. Pandas DataFrames) for further manipulation and analysis by prospective users; and (iv) data preservation by avoiding the discard of outdated sets.
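    The dataset-comparison workflow the entry describes, stacking parameters from several providers into Pandas DataFrames and flagging discrepancies for accuracy assessment, might look like the following sketch. The column names, the sample A-values, and the 10% discrepancy threshold are illustrative assumptions, not part of AtomPy itself.

    ```python
    import pandas as pd

    # Two hypothetical sources of transition probabilities (A-values) for the
    # same ion, each row tagged with provenance, as AtomPy-style curation
    # keeps parameters from all known public sources side by side.
    source_a = pd.DataFrame({
        "transition": ["1-2", "1-3"],
        "A_value": [1.2e8, 3.4e7],
        "source": "dataset_A",
    })
    source_b = pd.DataFrame({
        "transition": ["1-2", "1-3"],
        "A_value": [1.1e8, 3.9e7],
        "source": "dataset_B",
    })

    # Stack both sources, then compare datasets per transition and flag
    # discrepancies larger than 10% (a simple accuracy-assessment pass).
    merged = pd.concat([source_a, source_b], ignore_index=True)
    spread = merged.groupby("transition")["A_value"].agg(["min", "max"])
    spread["rel_diff"] = (spread["max"] - spread["min"]) / spread["max"]
    flagged = spread[spread["rel_diff"] > 0.10].index.tolist()
    print(flagged)  # ['1-3']
    ```

    Keeping every source's value alongside its provenance, rather than overwriting with a "best" value, is what makes this kind of cross-dataset comparison possible later.
    
    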

  3. Science Orders Systems and Operations Manual.

    ERIC Educational Resources Information Center

    Kriz, Harry M.

    This manual describes the implementation and operation of SCIENCE ORDERS, an online orders management system used by the Science and Technology Department of Newman Library at Virginia Polytechnic Institute and State University. Operational since January 1985, the system is implemented using the SPIRES database management system and is used to (1)…

  4. Pesticide Dealers Training Manual. Manual 100.

    ERIC Educational Resources Information Center

    Missouri Univ., Columbia. Agricultural Experiment Station.

    This training manual provides information needed by pesticide dealers to meet the requirements of the Missouri Pesticide Act of 1974. A complete copy of the Act accompanies the manual. Pertinent sections to dealers are quoted and additionally explained through comment, interpretation, and example. (CS)

  5. Concrete Practices & Procedures. Instructor Manual. Trainee Manual.

    ERIC Educational Resources Information Center

    Laborers-AGC Education and Training Fund, Pomfret Center, CT.

    This packet consists of the instructor and trainee manuals for a concrete practices and procedures course. The instructor manual contains a schedule for an 80-hour, 10-day course and instructor outline. The outline provides a step-by-step description of the instructor's activities and includes answer sheets to accompany questions on information…

  6. Value, but high costs in post-deposition data curation

    PubMed Central

    ten Hoopen, Petra; Amid, Clara; Luigi Buttigieg, Pier; Pafilis, Evangelos; Bravakos, Panos; Cerdeño-Tárraga, Ana M.; Gibson, Richard; Kahlke, Tim; Legaki, Aglaia; Narayana Murthy, Kada; Papastefanou, Gabriella; Pereira, Emiliano; Rossello, Marc; Luisa Toribio, Ana; Cochrane, Guy

    2016-01-01

    Discoverability of sequence data in primary data archives is proportional to the richness of contextual information associated with the data. Here, we describe an exercise in the improvement of contextual information surrounding sample records associated with metagenomics sequence reads available in the European Nucleotide Archive. We outline the annotation process and summarize findings of this effort aimed at increasing usability of publicly available environmental data. Furthermore, we emphasize the benefits of such an exercise and detail its costs. We conclude that such a third party annotation approach is expensive and has value as an element of curation, but should form only part of a more sustainable submitter-driven approach. Database URL: http://www.ebi.ac.uk/ena PMID:26861660

  7. The BioGRID interaction database: 2015 update.

    PubMed

    Chatr-Aryamontri, Andrew; Breitkreutz, Bobby-Joe; Oughtred, Rose; Boucher, Lorrie; Heinicke, Sven; Chen, Daici; Stark, Chris; Breitkreutz, Ashton; Kolas, Nadine; O'Donnell, Lara; Reguly, Teresa; Nixon, Julie; Ramage, Lindsay; Winter, Andrew; Sellam, Adnane; Chang, Christie; Hirschman, Jodi; Theesfeld, Chandra; Rust, Jennifer; Livstone, Michael S; Dolinski, Kara; Tyers, Mike

    2015-01-01

    The Biological General Repository for Interaction Datasets (BioGRID: http://thebiogrid.org) is an open access database that houses genetic and protein interactions curated from the primary biomedical literature for all major model organism species and humans. As of September 2014, the BioGRID contains 749,912 interactions as drawn from 43,149 publications that represent 30 model organisms. This interaction count represents a 50% increase compared to our previous 2013 BioGRID update. BioGRID data are freely distributed through partner model organism databases and meta-databases and are directly downloadable in a variety of formats. In addition to general curation of the published literature for the major model species, BioGRID undertakes themed curation projects in areas of particular relevance for biomedical sciences, such as the ubiquitin-proteasome system and various human disease-associated interaction networks. BioGRID curation is coordinated through an Interaction Management System (IMS) that facilitates the compilation of interaction records through structured evidence codes, phenotype ontologies, and gene annotation. The BioGRID architecture has been improved in order to support a broader range of interaction and post-translational modification types, to allow the representation of more complex multi-gene/protein interactions, to account for cellular phenotypes through structured ontologies, to expedite curation through semi-automated text-mining approaches, and to enhance curation quality control.

  8. The BioGRID interaction database: 2015 update

    PubMed Central

    Chatr-aryamontri, Andrew; Breitkreutz, Bobby-Joe; Oughtred, Rose; Boucher, Lorrie; Heinicke, Sven; Chen, Daici; Stark, Chris; Breitkreutz, Ashton; Kolas, Nadine; O'Donnell, Lara; Reguly, Teresa; Nixon, Julie; Ramage, Lindsay; Winter, Andrew; Sellam, Adnane; Chang, Christie; Hirschman, Jodi; Theesfeld, Chandra; Rust, Jennifer; Livstone, Michael S.; Dolinski, Kara; Tyers, Mike

    2015-01-01

    The Biological General Repository for Interaction Datasets (BioGRID: http://thebiogrid.org) is an open access database that houses genetic and protein interactions curated from the primary biomedical literature for all major model organism species and humans. As of September 2014, the BioGRID contains 749 912 interactions as drawn from 43 149 publications that represent 30 model organisms. This interaction count represents a 50% increase compared to our previous 2013 BioGRID update. BioGRID data are freely distributed through partner model organism databases and meta-databases and are directly downloadable in a variety of formats. In addition to general curation of the published literature for the major model species, BioGRID undertakes themed curation projects in areas of particular relevance for biomedical sciences, such as the ubiquitin-proteasome system and various human disease-associated interaction networks. BioGRID curation is coordinated through an Interaction Management System (IMS) that facilitates the compilation of interaction records through structured evidence codes, phenotype ontologies, and gene annotation. The BioGRID architecture has been improved in order to support a broader range of interaction and post-translational modification types, to allow the representation of more complex multi-gene/protein interactions, to account for cellular phenotypes through structured ontologies, to expedite curation through semi-automated text-mining approaches, and to enhance curation quality control. PMID:25428363
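    Since the entry notes that BioGRID data are directly downloadable in a variety of formats, a first processing step is typically parsing a tab-delimited export. The sketch below uses a deliberately simplified three-column layout (interactor A, interactor B, PubMed ID) with made-up rows; real BioGRID downloads carry many more columns, so this is not the actual download schema.

    ```python
    import csv
    import io
    from collections import Counter

    # A toy tab-delimited export with a simplified three-column layout;
    # the gene names and PubMed IDs below are illustrative only.
    raw = """geneA\tgeneB\tpubmed
    CDC28\tCLN2\t123
    CDC28\tCLB5\t123
    SIC1\tCDC28\t456
    """

    reader = csv.DictReader(io.StringIO(raw), delimiter="\t")
    interactions = [(row["geneA"].strip(), row["geneB"], row["pubmed"])
                    for row in reader]

    # Count curated interactions per source publication.
    per_pub = Counter(pub for _, _, pub in interactions)
    print(dict(per_pub))  # {'123': 2, '456': 1}
    ```

    Grouping interactions by publication like this is one way to reproduce the "interactions drawn from N publications" style of summary statistic the update reports.
    
    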

  9. Nutrient Control Design Manual

    EPA Science Inventory

    The Nutrient Control Design Manual will present an extensive state-of-the-technology review of the engineering design and operation of nitrogen and phosphorous control technologies and techniques applied at municipal wastewater treatment plants (WWTPs). This manual will present ...

  10. Biofuel Database

    National Institute of Standards and Technology Data Gateway

    Biofuel Database (Web, free access)   This database brings together structural, biological, and thermodynamic data for enzymes that are either in current use or are being considered for use in the production of biofuels.

  11. RepeatsDB: a database of tandem repeat protein structures

    PubMed Central

    Di Domenico, Tomás; Potenza, Emilio; Walsh, Ian; Gonzalo Parra, R.; Giollo, Manuel; Minervini, Giovanni; Piovesan, Damiano; Ihsan, Awais; Ferrari, Carlo; Kajava, Andrey V.; Tosatto, Silvio C.E.

    2014-01-01

    RepeatsDB (http://repeatsdb.bio.unipd.it/) is a database of annotated tandem repeat protein structures. Tandem repeats pose a difficult problem for the analysis of protein structures, as the underlying sequence can be highly degenerate. Several repeat types have been studied over the years, but their annotation was done on a case-by-case basis, thus making large-scale analysis difficult. We developed RepeatsDB to fill this gap. Using state-of-the-art repeat detection methods and manual curation, we systematically annotated the Protein Data Bank, predicting 10 745 repeat structures. In all, 2797 structures were classified according to a recently proposed classification schema, which was expanded to accommodate new findings. In addition, detailed annotations were performed in a subset of 321 proteins. These annotations feature information on start and end positions for the repeat regions and units. RepeatsDB is an ongoing effort to systematically classify and annotate structural protein repeats in a consistent way. It provides users with the possibility to access and download high-quality datasets either interactively or programmatically through web services. PMID:24311564

  12. Pancreatic Cancer Database: an integrative resource for pancreatic cancer.

    PubMed

    Thomas, Joji Kurian; Kim, Min-Sik; Balakrishnan, Lavanya; Nanjappa, Vishalakshi; Raju, Rajesh; Marimuthu, Arivusudar; Radhakrishnan, Aneesha; Muthusamy, Babylakshmi; Khan, Aafaque Ahmad; Sakamuri, Sruthi; Tankala, Shantal Gupta; Singal, Mukul; Nair, Bipin; Sirdeshmukh, Ravi; Chatterjee, Aditi; Prasad, T S Keshava; Maitra, Anirban; Gowda, Harsha; Hruban, Ralph H; Pandey, Akhilesh

    2014-08-01

    Pancreatic cancer is the fourth leading cause of cancer-related death in the world. The etiology of pancreatic cancer is heterogeneous with a wide range of alterations that have already been reported at the level of the genome, transcriptome, and proteome. The past decade has witnessed a large number of experimental studies using high-throughput technology platforms to identify genes whose expression at the transcript or protein levels is altered in pancreatic cancer. Based on expression studies, a number of molecules have also been proposed as potential biomarkers for diagnosis and prognosis of this deadly cancer. Currently, there are no repositories which provide an integrative view of multiple Omics data sets from published research on pancreatic cancer. Here, we describe the development of a web-based resource, Pancreatic Cancer Database (PCD; http://www.pancreaticcancerdatabase.org), as a unified platform for pancreatic cancer research. PCD contains manually curated information pertaining to quantitative alterations in miRNA, mRNA, and proteins obtained from small-scale as well as high-throughput studies of pancreatic cancer tissues and cell lines. We believe that PCD will serve as an integrative platform for the scientific community involved in pancreatic cancer research.

  13. CancerHSP: anticancer herbs database of systems pharmacology

    NASA Astrophysics Data System (ADS)

    Tao, Weiyang; Li, Bohui; Gao, Shuo; Bai, Yaofei; Shar, Piar Ali; Zhang, Wenjuan; Guo, Zihu; Sun, Ke; Fu, Yingxue; Huang, Chao; Zheng, Chunli; Mu, Jiexin; Pei, Tianli; Wang, Yuan; Li, Yan; Wang, Yonghua

    2015-06-01

    The numerous natural products and their bioactivity potentially afford an extraordinary resource for new drug discovery and have been employed in cancer treatment. However, the underlying pharmacological mechanisms of most natural anticancer compounds remain elusive, which has become one of the major obstacles in developing novel effective anticancer agents. Here, to address these unmet needs, we developed an anticancer herbs database of systems pharmacology (CancerHSP), which records anticancer herbs related information through manual curation. Currently, CancerHSP contains 2439 anticancer herbal medicines with 3575 anticancer ingredients. For each ingredient, the molecular structure and nine key ADME parameters are provided. Moreover, we also provide the anticancer activities of these compounds based on 492 different cancer cell lines. Further, the protein targets of the compounds are predicted by state-of-the-art methods or collected from the literature. CancerHSP will help reveal the molecular mechanisms of natural anticancer products and accelerate anticancer drug development, especially facilitate future investigations on drug repositioning and drug discovery. CancerHSP is freely available on the web at http://lsp.nwsuaf.edu.cn/CancerHSP.php.
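Databases of this kind expose per-ingredient ADME parameters precisely so that users can pre-filter compounds. As a hedged sketch, the function below filters ingredient records by oral bioavailability (OB) and drug-likeness (DL) thresholds; the field names, record shape, and cutoff values are illustrative assumptions, not CancerHSP's actual schema.

```python
def druglike_ingredients(ingredients, min_ob=30.0, min_dl=0.18):
    """Keep ingredients whose oral bioavailability (OB, percent) and
    drug-likeness (DL, unitless index) clear the given thresholds.
    Field names and default cutoffs are illustrative assumptions."""
    return [i["name"] for i in ingredients
            if i["OB"] >= min_ob and i["DL"] >= min_dl]

# Hypothetical records mimicking a per-ingredient ADME table.
sample = [
    {"name": "quercetin", "OB": 46.4, "DL": 0.28},
    {"name": "compound-x", "OB": 12.0, "DL": 0.05},
]
hits = druglike_ingredients(sample)
```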

  14. The Mouse Genome Database (MGD): mouse biology and model systems.

    PubMed

    Bult, Carol J; Eppig, Janan T; Kadin, James A; Richardson, Joel E; Blake, Judith A

    2008-01-01

    The Mouse Genome Database (MGD, http://www.informatics.jax.org/) integrates genetic, genomic and phenotypic information about the laboratory mouse, a primary animal model for studying human biology and disease. MGD data content includes comprehensive characterization of genes and their functions, standardized descriptions of mouse phenotypes, extensive integration of DNA and protein sequence data, normalized representation of genome and genome variant information including comparative data on mammalian genes. Data within MGD are obtained from diverse sources including manual curation of the biomedical literature, direct contributions from individual investigators' laboratories and major informatics resource centers such as Ensembl, UniProt and NCBI. MGD collaborates with the bioinformatics community on the development of data and semantic standards such as the Gene Ontology (GO) and the Mammalian Phenotype (MP) Ontology. MGD provides a data-mining platform that enables the development of translational research hypotheses based on comparative genotype, phenotype and functional analyses. Both web-based querying and computational access to data are provided. Recent improvements in MGD described here include the association of gene trap data with mouse genes and a new batch query capability for customized data access and retrieval.

  15. Correcting Inconsistencies and Errors in Bacterial Genome Metadata Using an Automated Curation Tool in Excel (AutoCurE).

    PubMed

    Schmedes, Sarah E; King, Jonathan L; Budowle, Bruce

    2015-01-01

    Whole-genome data are invaluable for large-scale comparative genomic studies. Current sequencing technologies have made it feasible to sequence entire bacterial genomes with relative ease and speed at a substantially reduced cost per nucleotide, hence cost per genome. More than 3,000 bacterial genomes have been sequenced and are available at the finished status. Publicly available genomes can be readily downloaded; however, there are challenges to verify the specific supporting data contained within the download and to identify errors and inconsistencies that may be present within the organizational data content and metadata. AutoCurE, an automated tool for bacterial genome database curation in Excel, was developed to facilitate local database curation of supporting data that accompany downloaded genomes from the National Center for Biotechnology Information. AutoCurE provides an automated approach to curate local genomic databases by flagging inconsistencies or errors by comparing the downloaded supporting data to the genome reports to verify genome name, RefSeq accession numbers, the presence of archaea, BioProject/UIDs, and sequence file descriptions. Flags are generated for nine metadata fields if there are inconsistencies between the downloaded genomes and genome reports and if erroneous or missing data are evident. AutoCurE is an easy-to-use tool for local database curation for large-scale genome data prior to downstream analyses.
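The flagging idea described above is straightforward to express in code. The sketch below is a simplified re-implementation in Python of the cross-check AutoCurE performs in Excel, comparing fields in a downloaded genome record against the corresponding genome report; the field names and flag labels are assumptions for illustration, not the tool's actual nine metadata fields.

```python
def flag_inconsistencies(downloaded, report, fields):
    """Compare metadata fields from a downloaded genome record against
    the matching genome report entry; flag mismatched or missing
    values. A simplified sketch of the AutoCurE-style cross-check."""
    flags = []
    for field in fields:
        d, r = downloaded.get(field), report.get(field)
        if d is None or r is None:
            flags.append((field, "missing"))
        elif d != r:
            flags.append((field, "mismatch"))
    return flags

# Hypothetical records: the RefSeq accessions disagree, and the
# BioProject field is absent from both sources.
downloaded = {"genome_name": "Escherichia coli K-12",
              "refseq_accession": "NC_000913.3"}
report = {"genome_name": "Escherichia coli K-12",
          "refseq_accession": "NC_000913.2"}
flags = flag_inconsistencies(downloaded, report,
                             ["genome_name", "refseq_accession", "bioproject"])
```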

  16. Database Administrator

    ERIC Educational Resources Information Center

    Moore, Pam

    2010-01-01

    The Internet and electronic commerce (e-commerce) generate lots of data. Data must be stored, organized, and managed. Database administrators, or DBAs, work with database software to find ways to do this. They identify user needs, set up computer databases, and test systems. They ensure that systems perform as they should and add people to the…

  17. Nutrient Control Design Manual

    EPA Science Inventory

    The purpose of this EPA design manual is to provide updated, state‐of‐the‐technology design guidance on nitrogen and phosphorus control at municipal Wastewater Treatment Plants (WWTPs). Similar to previous EPA manuals, this manual contains extensive information on the principles ...

  18. KDYNA user's manual

    SciTech Connect

    Levatin, J.A.L.; Attia, A.V.; Hallquist, J.O.

    1990-09-28

    This report is a complete user's manual for KDYNA, the Earth Sciences version of DYNA2D. Because most features of DYNA2D have been retained in KDYNA much of this manual is identical to the DYNA2D user's manual.

  19. Astromaterials Acquisition and Curation Office (KT) Overview

    NASA Technical Reports Server (NTRS)

    Allen, Carlton

    2014-01-01

    The Astromaterials Acquisition and Curation Office has the unique responsibility to curate NASA's extraterrestrial samples - from past and forthcoming missions - into the indefinite future. Currently, curation includes documentation, preservation, physical security, preparation, and distribution of samples from the Moon, asteroids, comets, the solar wind, and the planet Mars. Each of these sample sets has a unique history and comes from a unique environment. The curation laboratories and procedures developed over 40 years have proven both necessary and sufficient to serve the evolving needs of a worldwide research community. A new generation of sample return missions to destinations across the solar system is being planned and proposed. The curators are developing the tools and techniques to meet the challenges of these new samples. Extraterrestrial samples pose unique curation requirements. These samples were formed and exist under conditions strikingly different from those on the Earth's surface. Terrestrial contamination would destroy much of the scientific significance of extraterrestrial materials. To preserve the research value of these precious samples, contamination must be minimized, understood, and documented. In addition, the samples must be preserved - as far as possible - from physical and chemical alteration. The elaborate curation facilities at JSC were designed and constructed, and have been operated for many years, to keep sample contamination and alteration to a minimum.
Currently, JSC curates seven collections of extraterrestrial samples: (a) Lunar rocks and soils collected by the Apollo astronauts, (b) Meteorites collected on dedicated expeditions to Antarctica, (c) Cosmic dust collected by high-altitude NASA aircraft, (d) Solar wind atoms collected by the Genesis spacecraft, (e) Comet particles collected by the Stardust spacecraft, (f) Interstellar dust particles collected by the Stardust spacecraft, and (g) Asteroid soil particles collected

  20. SymbioGenomesDB: a database for the integration and access to knowledge on host-symbiont relationships.

    PubMed

    Reyes-Prieto, Mariana; Vargas-Chávez, Carlos; Latorre, Amparo; Moya, Andrés

    2015-01-01

    Symbiotic relationships occur naturally throughout the tree of life, either in a commensal, mutualistic or pathogenic manner. The genomes of multiple organisms involved in symbiosis are rapidly being sequenced and becoming available, especially those from the microbial world. Currently, there are numerous databases that offer information on specific organisms or models, but none offer a global understanding on relationships between organisms, their interactions and capabilities within their niche, as well as their role as part of a system, in this case, their role in symbiosis. We have developed the SymbioGenomesDB as a community database resource for laboratories which intend to investigate and use information on the genetics and the genomics of organisms involved in these relationships. The ultimate goal of SymbioGenomesDB is to host and support the growing and vast symbiotic-host relationship information, to uncover the genetic basis of such associations. SymbioGenomesDB maintains a comprehensive organization of information on genomes of symbionts from diverse hosts throughout the Tree of Life, including their sequences, their metadata and their genomic features. This catalog of relationships was generated using computational tools, custom R scripts and manual integration of data available in public literature. As a highly curated and comprehensive systems database, SymbioGenomesDB provides web access to all the information of symbiotic organisms, their features and links to the central database NCBI. Three different tools can be found within the database to explore symbiosis-related organisms, their genes and their genomes. Also, we offer an orthology search for one or multiple genes in one or multiple organisms within symbiotic relationships, and every table, graph and output file is downloadable and easy to parse for further analysis. The robust SymbioGenomesDB will be constantly updated to cope with all the data being generated and included in major

  3. NASA's Astromaterials Curation Digital Repository: Enabling Research Through Increased Access to Sample Data, Metadata and Imagery

    NASA Astrophysics Data System (ADS)

    Evans, C. A.; Todd, N. S.

    2014-12-01

    The Astromaterials Acquisition & Curation Office at NASA's Johnson Space Center (JSC) is the designated facility for curating all of NASA's extraterrestrial samples. Today, the suite of collections includes the lunar samples from the Apollo missions, cosmic dust particles falling into the Earth's atmosphere, meteorites collected in Antarctica, comet and interstellar dust particles from the Stardust mission, asteroid particles from Japan's Hayabusa mission, solar wind atoms collected during the Genesis mission, and space-exposed hardware from several missions. To support planetary science research on these samples, JSC's Astromaterials Curation Office hosts NASA's Astromaterials Curation digital repository and data access portal [http://curator.jsc.nasa.gov/], providing descriptions of the missions and collections, and critical information about each individual sample. Our office is designing and implementing several informatics initiatives to better serve the planetary research community. First, we are re-hosting the basic database framework by consolidating legacy databases for individual collections and providing a uniform access point for information (descriptions, imagery, classification) on all of our samples. Second, we continue to upgrade and host digital compendia that summarize and highlight published findings on the samples (e.g., lunar samples, meteorites from Mars). We host high resolution imagery of samples as it becomes available, including newly scanned images of historical prints from the Apollo missions. Finally we are creating plans to collect and provide new data, including 3D imagery, point cloud data, micro CT data, and external links to other data sets on selected samples. Together, these individual efforts will provide unprecedented digital access to NASA's Astromaterials, enabling preservation of the samples through more specific and targeted requests, and supporting new planetary science research and collaborations on the samples.

  4. MINT: a Molecular INTeraction database.

    PubMed

    Zanzoni, Andreas; Montecchi-Palazzi, Luisa; Quondam, Michele; Ausiello, Gabriele; Helmer-Citterich, Manuela; Cesareni, Gianni

    2002-02-20

    Protein interaction databases represent unique tools to store, in a computer readable form, the protein interaction information disseminated in the scientific literature. Well organized and easily accessible databases permit the easy retrieval and analysis of large interaction data sets. Here we present MINT, a database (http://cbm.bio.uniroma2.it/mint/index.html) designed to store data on functional interactions between proteins. Beyond cataloguing binary complexes, MINT was conceived to store other types of functional interactions, including enzymatic modifications of one of the partners. Release 1.0 of MINT focuses on experimentally verified protein-protein interactions. Both direct and indirect relationships are considered. Furthermore, MINT aims at being exhaustive in the description of the interaction and, whenever available, information about kinetic and binding constants and about the domains participating in the interaction is included in the entry. MINT consists of entries extracted from the scientific literature by expert curators assisted by 'MINT Assistant', a software that targets abstracts containing interaction information and presents them to the curator in a user-friendly format. The interaction data can be easily extracted and viewed graphically through 'MINT Viewer'. Presently MINT contains 4568 interactions, 782 of which are indirect or genetic interactions.

  5. If we build it, will they come? Curation and use of the ESO telescope bibliography

    NASA Astrophysics Data System (ADS)

    Grothkopf, Uta; Meakins, Silvia; Bordelon, Dominic

    2015-12-01

    The ESO Telescope Bibliography (telbib) is a database of refereed papers published by the ESO users community. It links data in the ESO Science Archive with the published literature, and vice versa. Developed and maintained by the ESO library, telbib also provides insights into the organization's research output and impact as measured through bibliometric studies. Curating telbib is a multi-step process that involves extensive tagging of the database records. Based on selected use cases, this talk will explain how the rich metadata provide parameters for reports and statistics in order to investigate the performance of ESO's facilities and to understand trends and developments in the publishing behaviour of the user community.

  6. The Curative Fantasy and Psychic Recovery

    PubMed Central

    ORNSTEIN, ANNA

    1992-01-01

    The discovery of selfobject transferences and the interpretation of symptomatic behavior from within the patient’s perspective have altered the conduct of psychoanalytic psychotherapy in fundamental ways. A review of the treatment process from a self psychological perspective serves as a background for pointing to the significance of curative fantasies in the process of recovery. PMID:22700052

  7. Curating and Nudging in Virtual CLIL Environments

    ERIC Educational Resources Information Center

    Nielsen, Helle Lykke

    2014-01-01

    Foreign language teachers can benefit substantially from the notions of curation and nudging when scaffolding CLIL activities on the internet. This article shows how these principles can be integrated into CLILstore, a free multimedia-rich learning tool with seamless access to online dictionaries, and presents feedback from first and second year…

  8. Curating Media Learning: Towards a Porous Expertise

    ERIC Educational Resources Information Center

    McDougall, Julian; Potter, John

    2015-01-01

    This article combines research results from a range of projects with two consistent themes. Firstly, we explore the potential for curation to offer a productive metaphor for the convergence of digital media learning across and between home/lifeworld and formal educational/system-world spaces--or between the public and private spheres. Secondly, we…

  9. A framework for organizing cancer-related variations from existing databases, publications and NGS data using a High-performance Integrated Virtual Environment (HIVE).

    PubMed

    Wu, Tsung-Jung; Shamsaddini, Amirhossein; Pan, Yang; Smith, Krista; Crichton, Daniel J; Simonyan, Vahan; Mazumder, Raja

    2014-01-01

    Years of sequence feature curation by UniProtKB/Swiss-Prot, PIR-PSD, NCBI-CDD, RefSeq and other database biocurators has led to a rich repository of information on functional sites of genes and proteins. This information along with variation-related annotation can be used to scan human short sequence reads from next-generation sequencing (NGS) pipelines for presence of non-synonymous single-nucleotide variations (nsSNVs) that affect functional sites. This and similar workflows are becoming more important because thousands of NGS data sets are being made available through projects such as The Cancer Genome Atlas (TCGA), and researchers want to evaluate their biomarkers in genomic data. BioMuta, an integrated sequence feature database, provides a framework for automated and manual curation and integration of cancer-related sequence features so that they can be used in NGS analysis pipelines. Sequence feature information in BioMuta is collected from the Catalogue of Somatic Mutations in Cancer (COSMIC), ClinVar, UniProtKB and through biocuration of information available from publications. Additionally, nsSNVs identified through automated analysis of NGS data from TCGA are also included in the database. Because of the petabytes of data and information present in NGS primary repositories, a platform HIVE (High-performance Integrated Virtual Environment) for storing, analyzing, computing and curating NGS data and associated metadata has been developed. Using HIVE, 31 979 nsSNVs were identified in TCGA-derived NGS data from breast cancer patients. All variations identified through this process are stored in a Curated Short Read archive, and the nsSNVs from the tumor samples are included in BioMuta. Currently, BioMuta has 26 cancer types with 13 896 small-scale and 308 986 large-scale study-derived variations. Integration of variation data allows identifications of novel or common nsSNVs that can be prioritized in validation studies. Database URL: BioMuta: http
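The core of the workflow described above is intersecting variant calls with curated sequence features. As a hedged sketch under assumed data shapes (simple dicts keyed by UniProt accession and residue position, which is not BioMuta's or HIVE's actual format), flagging nsSNVs that fall on curated functional-site positions might look like:

```python
def variants_at_functional_sites(variants, sites):
    """Return the variants whose (protein, position) pair matches a
    curated functional-site annotation. Record shapes are
    illustrative assumptions, not the BioMuta/HIVE schema."""
    site_index = {(s["protein"], s["position"]) for s in sites}
    return [v for v in variants
            if (v["protein"], v["position"]) in site_index]

# Hypothetical curated site and variant calls for illustration.
sites = [{"protein": "P04637", "position": 273}]
variants = [
    {"protein": "P04637", "position": 273, "change": "R273H"},
    {"protein": "P04637", "position": 500, "change": "X500Y"},
]
hits = variants_at_functional_sites(variants, sites)
```

With curated sites loaded into a set, the scan stays linear in the number of variant calls, which matters at the TCGA scale mentioned above.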

  10. Clinical Genomic Database

    PubMed Central

    Solomon, Benjamin D.; Nguyen, Anh-Dao; Bear, Kelly A.; Wolfsberg, Tyra G.

    2013-01-01

    Technological advances have greatly increased the availability of human genomic sequencing. However, the capacity to analyze genomic data in a clinically meaningful way lags behind the ability to generate such data. To help address this obstacle, we reviewed all conditions with genetic causes and constructed the Clinical Genomic Database (CGD) (http://research.nhgri.nih.gov/CGD/), a searchable, freely Web-accessible database of conditions based on the clinical utility of genetic diagnosis and the availability of specific medical interventions. The CGD currently includes a total of 2,616 genes organized clinically by affected organ systems and interventions (including preventive measures, disease surveillance, and medical or surgical interventions) that could be reasonably warranted by the identification of pathogenic mutations. To aid independent analysis and optimize new data incorporation, the CGD also includes all genetic conditions for which genetic knowledge may affect the selection of supportive care, informed medical decision-making, prognostic considerations, reproductive decisions, and allow avoidance of unnecessary testing, but for which specific interventions are not otherwise currently available. For each entry, the CGD includes the gene symbol, conditions, allelic conditions, clinical categorization (for both manifestations and interventions), mode of inheritance, affected age group, description of interventions/rationale, links to other complementary databases, including databases of variants and presumed pathogenic mutations, and links to PubMed references (>20,000). The CGD will be regularly maintained and updated to keep pace with scientific discovery. Further content-based expert opinions are actively solicited. Eventually, the CGD may assist the rapid curation of individual genomes as part of active medical care. PMID:23696674
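Because CGD organizes genes by intervention categories, a common downstream task is selecting the entries relevant to one category. The sketch below assumes a simple tabular export with `gene` and `intervention_categories` fields; these names are illustrative assumptions rather than CGD's exact download schema.

```python
def genes_with_interventions(entries, category):
    """Select genes whose set of intervention categories includes the
    requested category (e.g. disease surveillance). Field names are
    assumptions about a CGD-style export, not its exact schema."""
    return [e["gene"] for e in entries
            if category in e["intervention_categories"]]

# Hypothetical entries for illustration.
entries = [
    {"gene": "MLH1", "intervention_categories": {"surveillance", "surgical"}},
    {"gene": "GENE2", "intervention_categories": {"pharmacologic"}},
]
sel = genes_with_interventions(entries, "surveillance")
```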

  11. How Workflow Documentation Facilitates Curation Planning

    NASA Astrophysics Data System (ADS)

    Wickett, K.; Thomer, A. K.; Baker, K. S.; DiLauro, T.; Asangba, A. E.

    2013-12-01

    The description of the specific processes and artifacts that led to the creation of a data product provide a detailed picture of data provenance in the form of a workflow. The Site-Based Data Curation project, hosted by the Center for Informatics Research in Science and Scholarship at the University of Illinois, has been investigating how workflows can be used in developing curation processes and policies that move curation "upstream" in the research process. The team has documented an individual workflow for geobiology data collected during a single field trip to Yellowstone National Park. This specific workflow suggests a generalized three-part process for field data collection that comprises three distinct elements: a Planning Stage, a Fieldwork Stage, and a Processing and Analysis Stage. Beyond supplying an account of data provenance, the workflow has allowed the team to identify 1) points of intervention for curation processes and 2) data products that are likely candidates for sharing or deposit. Although these objects may be viewed by individual researchers as 'intermediate' data products, discussions with geobiology researchers have suggested that with appropriate packaging and description they may serve as valuable observational data for other researchers. Curation interventions may include the introduction of regularized data formats during the planning process, data description procedures, the identification and use of established controlled vocabularies, and data quality and validation procedures. We propose a poster that shows the individual workflow and our generalization into a three-stage process. We plan to discuss with attendees how well the three-stage view applies to other types of field-based research, likely points of intervention, and what kinds of interventions are appropriate and feasible in the example workflow.

  12. AlzPathway, an Updated Map of Curated Signaling Pathways: Towards Deciphering Alzheimer's Disease Pathogenesis.

    PubMed

    Ogishima, Soichi; Mizuno, Satoshi; Kikuchi, Masataka; Miyashita, Akinori; Kuwano, Ryozo; Tanaka, Hiroshi; Nakaya, Jun

    2016-01-01

    Alzheimer's disease (AD) is a complex neurodegenerative disorder in which loss of neurons and synaptic function causes dementia in the elderly. To clarify AD pathogenesis and develop drugs for AD, thousands of studies have elucidated signaling pathways involved. However, knowledge of AD signaling pathways has not been compiled as a pathway map. In this chapter, we introduce the manual construction of a pathway map of AD, called "AlzPathway", which comprehensively catalogs signaling pathways in the field of AD. We have collected and manually curated over 100 review articles related to AD, and have built the AD pathway map. AlzPathway is currently composed of thousands of molecules and reactions in neurons, brain blood barrier, presynaptic, postsynaptic, astrocyte, and microglial cells, with their cellular localizations. AlzPathway provides a systems-biology platform of comprehensive AD signaling and related pathways which is expected to contribute to clarification of AD pathogenesis and AD drug development.

  13. CSTEM User Manual

    NASA Technical Reports Server (NTRS)

    Hartle, M.; McKnight, R. L.

    2000-01-01

    This manual is a combination of a user manual, theory manual, and programmer manual. The reader is assumed to have some previous exposure to the finite element method. This manual is written with the idea that the CSTEM (Coupled Structural Thermal Electromagnetic-Computer Code) user needs to have a basic understanding of what the code is actually doing in order to properly use the code. For that reason, the underlying theory and methods used in the code are described to a basic level of detail. The manual gives an overview of the CSTEM code: how the code came into existence, a basic description of what the code does, and the order in which it happens (a flowchart). Appendices provide a listing and very brief description of every file used by the CSTEM code, including the type of file it is, what routine regularly accesses the file, and what routine opens the file, as well as special features included in CSTEM.

  14. ComPPI: a cellular compartment-specific database for protein–protein interaction network analysis

    PubMed Central

    Veres, Daniel V.; Gyurkó, Dávid M.; Thaler, Benedek; Szalay, Kristóf Z.; Fazekas, Dávid; Korcsmáros, Tamás; Csermely, Peter

    2015-01-01

    Here we present ComPPI, a cellular compartment-specific database of proteins and their interactions enabling an extensive, compartmentalized protein–protein interaction network analysis (URL: http://ComPPI.LinkGroup.hu). ComPPI enables the user to filter biologically unlikely interactions, where the two interacting proteins have no common subcellular localizations and to predict novel properties, such as compartment-specific biological functions. ComPPI is an integrated database covering four species (S. cerevisiae, C. elegans, D. melanogaster and H. sapiens). The compilation of nine protein–protein interaction and eight subcellular localization data sets had four curation steps including a manually built, comprehensive hierarchical structure of >1600 subcellular localizations. ComPPI provides confidence scores for protein subcellular localizations and protein–protein interactions. ComPPI has user-friendly search options for individual proteins giving their subcellular localization, their interactions and the likelihood of their interactions considering the subcellular localization of their interacting partners. Download options of search results, whole-proteomes, organelle-specific interactomes and subcellular localization data are available on its website. Due to its novel features, ComPPI is useful for the analysis of experimental results in biochemistry and molecular biology, as well as for proteome-wide studies in bioinformatics and network science helping cellular biology, medicine and drug design. PMID:25348397
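The compartment-aware filtering ComPPI describes can be illustrated with a minimal sketch: an interaction is kept only if its two partners share at least one subcellular localization. The protein names, localization sets, and schema below are invented for illustration and are not ComPPI's actual data model.

```python
# Toy localization table: protein -> set of subcellular compartments.
localizations = {
    "P1": {"cytosol", "nucleus"},
    "P2": {"nucleus"},
    "P3": {"mitochondrion"},
}

# Toy interaction list (undirected protein pairs).
interactions = [("P1", "P2"), ("P1", "P3")]

def plausible(pair, loc):
    """Keep a PPI only if the partners share at least one compartment."""
    a, b = pair
    return bool(loc.get(a, set()) & loc.get(b, set()))

# ("P1", "P2") survives (shared nucleus); ("P1", "P3") is filtered out.
filtered = [p for p in interactions if plausible(p, localizations)]
```

ComPPI additionally attaches confidence scores to both localizations and interactions; this sketch reduces that to a plain set intersection.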

  15. MetaboLights: An Open-Access Database Repository for Metabolomics Data.

    PubMed

    Kale, Namrata S; Haug, Kenneth; Conesa, Pablo; Jayseelan, Kalaivani; Moreno, Pablo; Rocca-Serra, Philippe; Nainala, Venkata Chandrasekhar; Spicer, Rachel A; Williams, Mark; Li, Xuefei; Salek, Reza M; Griffin, Julian L; Steinbeck, Christoph

    2016-01-01

    MetaboLights is the first general purpose, open-access database repository for cross-platform and cross-species metabolomics research at the European Bioinformatics Institute (EMBL-EBI). Based upon the open-source ISA framework, MetaboLights provides Metabolomics Standard Initiative (MSI) compliant metadata and raw experimental data associated with metabolomics experiments. Users can upload their study datasets into the MetaboLights Repository. These studies are then automatically assigned a stable and unique identifier (e.g., MTBLS1) that can be used for publication reference. The MetaboLights Reference Layer associates metabolites with metabolomics studies in the archive and is extensively annotated with data fields such as structural and chemical information, NMR and MS spectra, target species, metabolic pathways, and reactions. The database is manually curated with no specific release schedules. MetaboLights is also recommended by journals for metabolomics data deposition. This unit provides a guide to using MetaboLights, downloading experimental data, and depositing metabolomics datasets using user-friendly submission tools.
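The stable, sequential study identifiers mentioned above (e.g., MTBLS1) can be mimicked with a small helper. This is an illustrative sketch only; MetaboLights' real accession-assignment logic is internal to the repository.

```python
def next_accession(existing, prefix="MTBLS"):
    """Return the next sequential study identifier after those in `existing`.

    Illustrative only -- not MetaboLights' actual assignment code.
    """
    numbers = [int(acc[len(prefix):]) for acc in existing
               if acc.startswith(prefix)]
    return f"{prefix}{max(numbers, default=0) + 1}"

# A repository holding MTBLS1 and MTBLS2 would assign MTBLS3 next.
print(next_accession(["MTBLS1", "MTBLS2"]))
```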

  16. EK3D: an E. coli K antigen 3-dimensional structure database

    PubMed Central

    Kunduru, Bharathi Reddy; Nair, Sanjana Anilkumar; Rathinavelan, Thenmalarchelvi

    2016-01-01

A very high rate of multidrug resistance (MDR) seen among Gram-negative bacteria such as Escherichia, Klebsiella, Salmonella, Shigella, etc. is a major threat to public health and safety. One of the major virulence determinants of Gram-negative bacteria is the capsular polysaccharide, or K antigen, located on the bacterial outer membrane surface, which is a potential drug and vaccine target. It plays a key role in host–pathogen interactions as well as host immune evasion and thus mandates detailed structural information. Nonetheless, acquiring structural information on K antigens is not straightforward due to their innate, enormous conformational flexibility. Here, we have developed a manually curated database of K antigens corresponding to various E. coli serotypes, which differ from each other in their monosaccharide composition, the linkage between the monosaccharides, and their stereoisomeric forms. Subsequently, we have modeled their 3D structures and developed an organized repository, namely EK3D, that can be accessed through www.iith.ac.in/EK3D/. Such a database would facilitate the development of antibacterial drugs to combat E. coli infections, as the species has evolved resistance against two major drugs, namely third-generation cephalosporins and fluoroquinolones. EK3D also enables the generation of polymeric K antigens of varying lengths and thus provides comprehensive information about E. coli K antigens. PMID:26615200

  17. DDMGD: the database of text-mined associations between genes methylated in diseases from different species.

    PubMed

    Bin Raies, Arwa; Mansour, Hicham; Incitti, Roberto; Bajic, Vladimir B

    2015-01-01

Gathering information about associations between methylated genes and diseases is important for disease diagnosis and treatment decisions. Recent advancements in epigenetics research allow for large-scale discoveries of associations of genes methylated in diseases in different species. Searching manually for such information is not easy, as it is scattered across a large number of electronic publications and repositories. Therefore, we developed the DDMGD database (http://www.cbrc.kaust.edu.sa/ddmgd/) to provide a comprehensive repository of information related to genes methylated in diseases that can be found through text mining. DDMGD's scope is not limited to a particular group of genes, diseases or species. Using DEMGD, a text mining system we developed earlier, and additional post-processing, we extracted associations of genes methylated in different diseases from PubMed Central articles and PubMed abstracts. The accuracy of extracted associations is 82% as estimated on 2500 hand-curated entries. DDMGD provides a user-friendly interface facilitating retrieval of these associations ranked according to confidence scores. Submission of new associations to DDMGD is supported. A comparison of DDMGD with several other databases focused on genes methylated in diseases shows that DDMGD is comprehensive and includes most of the recent information on genes methylated in diseases.

  18. EcoliNet: a database of cofunctional gene network for Escherichia coli.

    PubMed

    Kim, Hanhae; Shim, Jung Eun; Shin, Junha; Lee, Insuk

    2015-01-01

During the past several decades, Escherichia coli has been a treasure chest for molecular biology. The molecular mechanisms of many fundamental cellular processes have been discovered through research on this bacterium. Although much basic research now focuses on more complex model organisms, E. coli still remains important in metabolic engineering and synthetic biology. Despite its long history as a subject of molecular investigation, more than one-third of the E. coli genome has no pathway annotation supported by either experimental evidence or manual curation. Recently, a network-assisted genetics approach to the efficient identification of novel gene functions has increased in popularity. To accelerate pathway annotation for the remaining uncharacterized part of the E. coli genome, we have constructed a cofunctional gene network database with near-complete genome coverage of the organism, dubbed EcoliNet. We find that EcoliNet is highly predictive for diverse bacterial phenotypes, including antibiotic response, indicating that it will be useful in prioritizing novel candidate genes for a wide spectrum of bacterial phenotypes. We have implemented a web server where biologists can easily run network algorithms over EcoliNet to predict novel genes involved in a pathway or novel functions for a gene. All integrated cofunctional associations can be downloaded, enabling orthology-based reconstruction of gene networks for other bacterial species as well. Database URL: http://www.inetbio.org/ecolinet.
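Network-assisted function prediction of the kind the abstract describes is typically a guilt-by-association scheme: an uncharacterized gene is scored by the summed weights of its network links to genes already annotated to a pathway. A minimal sketch, with invented gene names and link weights rather than actual EcoliNet data:

```python
# Toy cofunctional network: (gene, gene) pair -> link weight.
network = {
    ("geneA", "geneB"): 2.0,
    ("geneB", "geneC"): 1.5,
    ("geneA", "geneD"): 0.5,
}

def association_score(gene, seed_genes, net):
    """Sum the weights of `gene`'s links to known pathway members."""
    score = 0.0
    for (u, v), w in net.items():
        if gene in (u, v):
            other = v if gene == u else u
            if other in seed_genes:
                score += w
    return score

# Rank uncharacterized candidates against a seed set of known pathway genes.
seeds = {"geneB"}
candidates = ["geneA", "geneC", "geneD"]
ranked = sorted(candidates,
                key=lambda g: association_score(g, seeds, network),
                reverse=True)
```

Real implementations add normalization and statistical significance on top of this raw neighbor-weight sum, but the ranking idea is the same.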

  19. BaAMPs: the database of biofilm-active antimicrobial peptides.

    PubMed

    Di Luca, Mariagrazia; Maccari, Giuseppe; Maisetta, Giuseppantonio; Batoni, Giovanna

    2015-01-01

    Antimicrobial peptides (AMPs) are increasingly being considered as novel agents against biofilms. The development of AMP-based anti-biofilm strategies strongly relies on the design of sequences optimized to target specific features of sessile bacterial/fungal communities. Although several AMP databases have been created and successfully exploited for AMP design, all of these use data collected on peptides tested against planktonic microorganisms. Here, an open-access, manually curated database of AMPs specifically assayed against microbial biofilms (BaAMPs) is presented for the first time. In collecting relevant data from the literature an effort was made to define a minimal standard set of essential information including, for each AMP, the microbial species and biofilm conditions against which it was tested, and the specific assay and peptide concentration used. The availability of these data in an organized framework will benefit anti-biofilm research and support the design of novel molecules active against biofilm. BaAMPs is accessible at http://www.baamps.it. PMID:25760404

  2. MediaDB: a database of microbial growth conditions in defined media.

    PubMed

    Richards, Matthew A; Cassen, Victor; Heavner, Benjamin D; Ajami, Nassim E; Herrmann, Andrea; Simeonidis, Evangelos; Price, Nathan D

    2014-01-01

    Isolating pure microbial cultures and cultivating them in the laboratory on defined media is used to more fully characterize the metabolism and physiology of organisms. However, identifying an appropriate growth medium for a novel isolate remains a challenging task. Even organisms with sequenced and annotated genomes can be difficult to grow, despite our ability to build genome-scale metabolic networks that connect genomic data with metabolic function. The scientific literature is scattered with information about defined growth media used successfully for cultivating a wide variety of organisms, but to date there exists no centralized repository to inform efforts to cultivate less characterized organisms by bridging the gap between genomic data and compound composition for growth media. Here we present MediaDB, a manually curated database of defined media that have been used for cultivating organisms with sequenced genomes, with an emphasis on organisms with metabolic network models. The database is accessible online, can be queried by keyword searches or downloaded in its entirety, and can generate exportable individual media formulation files. The data assembled in MediaDB facilitate comparative studies of organism growth media, serve as a starting point for formulating novel growth media, and contribute to formulating media for in silico investigation of metabolic networks. MediaDB is freely available for public use at https://mediadb.systemsbiology.net. PMID:25098325
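The keyword-query capability described above can be sketched as a simple filter over media records. The field names and example formulations here are invented for illustration; MediaDB's actual schema may differ.

```python
# Toy defined-media records (illustrative fields, not MediaDB's schema).
media = [
    {"name": "M9 minimal", "organism": "Escherichia coli",
     "compounds": ["glucose", "NH4Cl", "KH2PO4"]},
    {"name": "YNB", "organism": "Saccharomyces cerevisiae",
     "compounds": ["glucose", "biotin"]},
]

def query(records, keyword):
    """Return media whose name, organism, or compound list mentions keyword."""
    kw = keyword.lower()
    return [r for r in records
            if kw in r["name"].lower()
            or kw in r["organism"].lower()
            or any(kw in c.lower() for c in r["compounds"])]

# e.g. query(media, "glucose") matches both formulations,
# while query(media, "coli") matches only the E. coli medium.
hits = query(media, "coli")
```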

  3. Industrial labor relations manual

    NASA Technical Reports Server (NTRS)

    1992-01-01

    The NASA Industrial Labor Relations Manual provides internal guidelines and procedures to assist NASA Field Installations in dealing with contractor labor management disputes, Service Contract Act variance hearings, and to provide access of Labor Union Representatives to NASA for the purpose of maintaining schedules and goals in connection with vital NASA programs. This manual will be revised by page changes as revisions become necessary. Initial distribution of this manual has been made to NASA Headquarters and Field Installations.

  4. The Molecular Biology Database Collection: an online compilation of relevant database resources

    PubMed Central

    Baxevanis, Andreas D.

    2000-01-01

    The Molecular Biology Database Collection represents an effort geared at making molecular biology database resources more accessible to biologists. This online resource, available at http://www.oup.co.uk/nar/Volume_28/Issue_01/html/gkd115_gml.html, is intended to serve as a searchable, up-to-date, centralized jumping-off point to individual Web sites. An emphasis has also been placed on including databases where new value is added to the underlying data by virtue of curation, new data connections, or other innovative approaches. PMID:10592167

  5. Database Manager

    ERIC Educational Resources Information Center

    Martin, Andrew

    2010-01-01

    It is normal practice today for organizations to store large quantities of records of related information as computer-based files or databases. Purposeful information is retrieved by performing queries on the data sets. The purpose of DATABASE MANAGER is to communicate to students the method by which the computer performs these queries. This…

  6. Maize databases

    Technology Transfer Automated Retrieval System (TEKTRAN)

    This chapter is a succinct overview of maize data held in the species-specific database MaizeGDB (the Maize Genomics and Genetics Database), and selected multi-species data repositories, such as Gramene/Ensembl Plants, Phytozome, UniProt and the National Center for Biotechnology Information (NCBI), ...

  7. Instruct coders' manual

    NASA Technical Reports Server (NTRS)

    Friend, J.

    1971-01-01

A manual designed both as an instructional manual for beginning coders and as a reference manual for the coding language INSTRUCT is presented. The manual includes the major programs necessary to implement the teaching system and lists the limitations of the current implementation. A detailed description is given of how to code a lesson, what buttons to push, and what utility programs to use. Suggestions for debugging coded lessons and the error messages that may be received during assembly or while running the lesson are given.

  8. EMSL Operations Manual

    SciTech Connect

    Foster, Nancy S.

    2009-06-18

This manual is a general resource tool to assist EMSL users and Laboratory staff in locating official policy, practice and subject matter experts. It is not intended to replace or amend any formal Battelle policy or practice. Users of this manual should rely only on Battelle’s Standard Based Management System (SBMS) for official policy. No contractual commitment or right of any kind is created by this manual. Battelle management reserves the right to alter, change, or delete any information contained within this manual without prior notice.

  9. EMSL Operations Manual

    SciTech Connect

    Foster, Nancy S.

    2009-03-25

This manual is a general resource tool to assist EMSL users and Laboratory staff in locating official policy, practice and subject matter experts. It is not intended to replace or amend any formal Battelle policy or practice. Users of this manual should rely only on Battelle’s Standard Based Management System (SBMS) for official policy. No contractual commitment or right of any kind is created by this manual. Battelle management reserves the right to alter, change, or delete any information contained within this manual without prior notice.

  10. Radiological Control Manual

    SciTech Connect

    Not Available

    1993-04-01

    This manual has been prepared by Lawrence Berkeley Laboratory to provide guidance for site-specific additions, supplements, and clarifications to the DOE Radiological Control Manual. The guidance provided in this manual is based on the requirements given in Title 10 Code of Federal Regulations Part 835, Radiation Protection for Occupational Workers, DOE Order 5480.11, Radiation Protection for Occupational Workers, and the DOE Radiological Control Manual. The topics covered are (1) excellence in radiological control, (2) radiological standards, (3) conduct of radiological work, (4) radioactive materials, (5) radiological health support operations, (6) training and qualification, and (7) radiological records.

  11. NASA's Earth Observatory Natural Event Tracker: Curating Metadata for Linking Data and Images to Natural Events

    NASA Astrophysics Data System (ADS)

    Ward, K.

    2015-12-01

    On any given date, there are multiple natural events occurring on our planet. Storms, wildfires, volcanoes and algal blooms can be analyzed and represented using multiple dataset parameters. These parameters, in turn, may be visualized in multiple ways and disseminated via multiple web services. Given these multiple-to-multiple relationships, we already have the makings of a microverse of linked data. In an attempt to begin putting this microverse to practical use, NASA's Earth Observatory Group has developed a prototype system called the Earth Observatory Natural Event Tracker (EONET). EONET is a metadata-driven service that is exploring digital curation as a means to adding value to the intersection of natural event-related data and existing web service-enabled visualization systems. A curated natural events database maps specific events to topical groups (e.g., storms, fires, volcanoes), from those groups to related web service visualization systems and, eventually, to the source data products themselves. I will discuss the complexities that arise from attempting to map event types to dataset parameters, and the issues of granularity that come from trying to define exactly what is, and what constrains, a single natural event, particularly in a system where one of the end goals is to provide a group-curated database.
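The event-to-topical-group mapping at the heart of the curated events database can be sketched as a grouping over an events table. The identifiers and fields below are invented for illustration and are not the actual EONET API schema.

```python
# Toy curated events table (illustrative records, not real EONET data).
events = [
    {"id": "EVT_1", "title": "Hurricane X", "category": "storms"},
    {"id": "EVT_2", "title": "Kilauea eruption", "category": "volcanoes"},
    {"id": "EVT_3", "title": "Hurricane Y", "category": "storms"},
]

def group_by_category(evts):
    """Map each topical group (storms, fires, volcanoes, ...) to its events."""
    groups = {}
    for e in evts:
        groups.setdefault(e["category"], []).append(e["id"])
    return groups

grouped = group_by_category(events)
```

From such groups, a service can then link each category onward to visualization systems and source data products, as the abstract describes.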

  12. The Ribosomal Database Project (RDP).

    PubMed Central

    Maidak, B L; Olsen, G J; Larsen, N; Overbeek, R; McCaughey, M J; Woese, C R

    1996-01-01

    The Ribosomal Database Project (RDP) is a curated database that offers ribosome-related data, analysis services and associated computer programs. The offerings include phylogenetically ordered alignments of ribosomal RNA (rRNA) sequences, derived phylogenetic trees, rRNA secondary structure diagrams and various software for handling, analyzing and displaying alignments and trees. The data are available via anonymous ftp (rdp.life.uiuc.edu), electronic mail (server@rdp.life.uiuc.edu), gopher (rdpgopher.life.uiuc.edu) and World Wide Web (WWW)(http://rdpwww.life.uiuc.edu/). The electronic mail and WWW servers provide ribosomal probe checking, screening for possible chimeric rRNA sequences, automated alignment and approximate phylogenetic placement of user-submitted sequences on an existing phylogenetic tree. PMID:8594608

  14. Data Albums: An Event Driven Search, Aggregation and Curation Tool for Earth Science

    NASA Technical Reports Server (NTRS)

    Ramachandran, Rahul; Kulkarni, Ajinkya; Maskey, Manil; Bakare, Rohan; Basyal, Sabin; Li, Xiang; Flynn, Shannon

    2014-01-01

Approaches used in Earth science research, such as case study analysis and climatology studies, involve discovering and gathering diverse data sets and information to support the research goals. Gathering relevant data and information for case studies and climatology analysis is both tedious and time consuming. Current Earth science data systems are designed with the assumption that researchers access data primarily by instrument or geophysical parameter. In cases where researchers are interested in studying a significant event, they have to manually assemble a variety of datasets relevant to it by searching the different distributed data systems. This paper presents a specialized search, aggregation and curation tool for Earth science to address these challenges. The search tool automatically creates curated 'Data Albums', aggregated collections of information related to a specific event, containing links to relevant data files [granules] from different instruments, tools and services for visualization and analysis, and information about the event contained in news reports, images or videos to supplement research analysis. Curation in the tool is driven by an ontology-based relevancy ranking algorithm that filters out non-relevant information and data.

  15. An emerging role: the nurse content curator.

    PubMed

    Brooks, Beth A

    2015-01-01

A new phenomenon, the inverted or "flipped" classroom, assumes that students are no longer acquiring knowledge exclusively through textbooks or lectures. Instead, they are seeking out the vast amount of free information available to them online (the very essence of open source) to supplement learning gleaned in textbooks and lectures. With so much open-source content available to nursing faculty, it benefits the faculty to use readily available, technologically advanced content. The nurse content curator supports nursing faculty in its use of such content. Even more importantly, the highly paid, time-strapped faculty is not spending an inordinate amount of effort surfing for and evaluating content. The nurse content curator does that work, while the faculty uses its time more effectively to help students vet the truth, make meaning of the content, and learn to problem-solve.

  16. Data Curation Education in Research Centers (DCERC)

    NASA Astrophysics Data System (ADS)

    Marlino, M. R.; Mayernik, M. S.; Kelly, K.; Allard, S.; Tenopir, C.; Palmer, C.; Varvel, V. E., Jr.

    2012-12-01

Digital data both enable and constrain scientific research. Scientists are enabled by digital data to develop new research methods, utilize new data sources, and investigate new topics, but they also face new data collection, management, and preservation burdens. The current data workforce consists primarily of scientists who receive little formal training in data management and data managers who are typically educated through on-the-job training. The Data Curation Education in Research Centers (DCERC) program is investigating a new model for educating data professionals to contribute to scientific research. DCERC is a collaboration between the University of Illinois at Urbana-Champaign Graduate School of Library and Information Science, the University of Tennessee School of Information Sciences, and the National Center for Atmospheric Research. The program is organized around a foundations course in data curation and provides field experiences in research and data centers for both master's and doctoral students. This presentation will outline the aims and the structure of the DCERC program and discuss results and lessons learned from the first set of summer internships in 2012. Four master's students participated and worked with both data mentors and science mentors, gaining first-hand experience in the issues, methods, and challenges of scientific data curation. They engaged in a diverse set of topics, including climate model metadata, observational data management workflows, and data cleaning, documentation, and ingest processes within a data archive. The students learned current data management practices and challenges while developing expertise and conducting research. They also made important contributions to NCAR data and science teams by evaluating data management workflows and processes, preparing data sets to be archived, and developing recommendations for particular data management activities. The master's student interns will return in summer of 2013.

  17. A Reflection on a Data Curation Journey

    PubMed Central

    van Zyl, Christa

    2015-01-01

    This commentary is a reflection on experience of data preservation and sharing (i.e., data curation) practices developed in a South African research organization. The lessons learned from this journey have echoes in the findings and recommendations emerging from the present study in Low and Middle-Income Countries (LMIC) and may usefully contribute to more general reflection on the management of change in data practice. PMID:26297756

  18. Demonstration and Validation Assets: User Manual Development

    SciTech Connect

    2008-06-30

    This report documents the development of a database-supported user manual for DEMVAL assets in the NSTI area of operations and focuses on providing comprehensive user information on DEMVAL assets serving businesses with national security technology applications in southern New Mexico. The DEMVAL asset program is being developed as part of the NSPP, funded by both Department of Energy (DOE) and NNSA. This report describes the development of a comprehensive user manual system for delivering indexed DEMVAL asset information to be used in marketing and visibility materials and to NSTI clients, prospective clients, stakeholders, and any person or organization seeking it. The data about area DEMVAL asset providers are organized in an SQL database with updateable application structure that optimizes ease of access and customizes search ability for the user.

  19. Sharing and community curation of mass spectrometry data with Global Natural Products Social Molecular Networking.

    PubMed

  20. Sharing and community curation of mass spectrometry data with Global Natural Products Social Molecular Networking.

    PubMed

    Wang, Mingxun; Carver, Jeremy J; Phelan, Vanessa V; Sanchez, Laura M; Garg, Neha; Peng, Yao; Nguyen, Don Duy; Watrous, Jeramie; Kapono, Clifford A; Luzzatto-Knaan, Tal; Porto, Carla; Bouslimani, Amina; Melnik, Alexey V; Meehan, Michael J; Liu, Wei-Ting; Crüsemann, Max; Boudreau, Paul D; Esquenazi, Eduardo; Sandoval-Calderón, Mario; Kersten, Roland D; Pace, Laura A; Quinn, Robert A; Duncan, Katherine R; Hsu, Cheng-Chih; Floros, Dimitrios J; Gavilan, Ronnie G; Kleigrewe, Karin; Northen, Trent; Dutton, Rachel J; Parrot, Delphine; Carlson, Erin E; Aigle, Bertrand; Michelsen, Charlotte F; Jelsbak, Lars; Sohlenkamp, Christian; Pevzner, Pavel; Edlund, Anna; McLean, Jeffrey; Piel, Jörn; Murphy, Brian T; Gerwick, Lena; Liaw, Chih-Chuang; Yang, Yu-Liang; Humpf, Hans-Ulrich; Maansson, Maria; Keyzers, Robert A; Sims, Amy C; Johnson, Andrew R; Sidebottom, Ashley M; Sedio, Brian E; Klitgaard, Andreas; Larson, Charles B; Boya P, Cristopher A; Torres-Mendoza, Daniel; Gonzalez, David J; Silva, Denise B; Marques, Lucas M; Demarque, Daniel P; Pociute, Egle; O'Neill, Ellis C; Briand, Enora; Helfrich, Eric J N; Granatosky, Eve A; Glukhov, Evgenia; Ryffel, Florian; Houson, Hailey; Mohimani, Hosein; Kharbush, Jenan J; Zeng, Yi; Vorholt, Julia A; Kurita, Kenji L; Charusanti, Pep; McPhail, Kerry L; Nielsen, Kristian Fog; Vuong, Lisa; Elfeki, Maryam; Traxler, Matthew F; Engene, Niclas; Koyama, Nobuhiro; Vining, Oliver B; Baric, Ralph; Silva, Ricardo R; Mascuch, Samantha J; Tomasi, Sophie; Jenkins, Stefan; Macherla, Venkat; Hoffman, Thomas; Agarwal, Vinayak; Williams, Philip G; Dai, Jingqui; Neupane, Ram; Gurr, Joshua; Rodríguez, Andrés M C; Lamsa, Anne; Zhang, Chen; Dorrestein, Kathleen; Duggan, Brendan M; Almaliti, Jehad; Allard, Pierre-Marie; Phapale, Prasad; Nothias, Louis-Felix; Alexandrov, Theodore; Litaudon, Marc; Wolfender, Jean-Luc; Kyle, Jennifer E; Metz, Thomas O; Peryea, Tyler; Nguyen, Dac-Trung; VanLeer, Danielle; Shinn, Paul; Jadhav, Ajit; Müller, Rolf; Waters, Katrina M; 
Shi, Wenyuan; Liu, Xueting; Zhang, Lixin; Knight, Rob; Jensen, Paul R; Palsson, Bernhard Ø; Pogliano, Kit; Linington, Roger G; Gutiérrez, Marcelino; Lopes, Norberto P; Gerwick, William H; Moore, Bradley S; Dorrestein, Pieter C; Bandeira, Nuno

    2016-08-01

    The potential of the diverse chemistries present in natural products (NP) for biotechnology and medicine remains untapped because NP databases are not searchable with raw data and the NP community has no way to share data other than in published papers. Although mass spectrometry (MS) techniques are well-suited to high-throughput characterization of NP, there is a pressing need for an infrastructure to enable sharing and curation of data. We present Global Natural Products Social Molecular Networking (GNPS; http://gnps.ucsd.edu), an open-access knowledge base for community-wide organization and sharing of raw, processed or identified tandem mass (MS/MS) spectrometry data. In GNPS, crowdsourced curation of freely available community-wide reference MS libraries will underpin improved annotations. Data-driven social-networking should facilitate identification of spectra and foster collaborations. We also introduce the concept of 'living data' through continuous reanalysis of deposited data. PMID:27504778
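    Molecular networking of the kind GNPS performs rests on scoring the similarity of MS/MS spectra, commonly with a cosine score over matched peaks. The sketch below is illustrative only and is not the GNPS implementation (production tools typically weight intensities, e.g. by square root, and solve the peak matching optimally rather than greedily):

```python
import math

def cosine_score(spec_a, spec_b, tol=0.01):
    """Greedy cosine similarity between two MS/MS peak lists.

    Each spectrum is a list of (mz, intensity) pairs. Peaks are matched
    greedily within an m/z tolerance; unmatched peaks contribute nothing
    to the dot product but still count toward the norms.
    """
    used_b = set()
    dot = 0.0
    for mz_a, int_a in spec_a:
        for j, (mz_b, int_b) in enumerate(spec_b):
            if j not in used_b and abs(mz_a - mz_b) <= tol:
                dot += int_a * int_b
                used_b.add(j)
                break
    norm_a = math.sqrt(sum(i * i for _, i in spec_a))
    norm_b = math.sqrt(sum(i * i for _, i in spec_b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)

s1 = [(100.0, 50.0), (150.0, 100.0)]
print(cosine_score(s1, s1))               # ~1.0 (identical spectra)
print(cosine_score(s1, [(300.0, 10.0)]))  # 0.0 (no shared peaks)
```

    Clustering spectra whose pairwise score exceeds a threshold is what yields the "molecular network" of related compounds.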

  1. School Fire Safety Manual.

    ERIC Educational Resources Information Center

    Arkansas State Dept. of Education, Little Rock. General Education Div.

    This manual provides the background information necessary for the planning of school fire safety programs by local school officials, particularly in Arkansas. The manual first discusses the need for such programs and cites the Arkansas state law regarding them. Policies established by the Arkansas State Board of Education to implement the legal…

  2. Manual for Refugee Sponsorship.

    ERIC Educational Resources Information Center

    Brewer, Kathleen; Taran, Patrick A.

    This manual provides guidelines for religious congregations interested in setting up programs to sponsor refugees. The manual describes the psychological implications of the refugee experience and discusses initial steps in organizing for sponsorship. Detailed information is provided for sponsors regarding: finances and mobilization of resources;…

  3. Technical Manual. The ACT®

    ERIC Educational Resources Information Center

    ACT, Inc., 2014

    2014-01-01

    This manual contains technical information about the ACT® college readiness assessment. The principal purpose of this manual is to document the technical characteristics of the ACT in light of its intended purposes. ACT regularly conducts research as part of the ongoing formative evaluation of its programs. The research is intended to ensure that…

  4. Quality Circle: Member Manual.

    ERIC Educational Resources Information Center

    Dewar, Donald L.

    This training manual provides information on the operation and function of "quality circles," a factory program introduced in Japan in which workers participate voluntarily in quality control and management decisions through various means of communication. Following an introduction, the manual presents a series of frequently asked questions about…

  5. School District Energy Manual.

    ERIC Educational Resources Information Center

    Association of School Business Officials International, Reston, VA.

    This manual serves as an energy conservation reference and management guide for school districts. The School District Energy Program (SDEP) is designed to provide information and/or assistance to school administrators planning to implement a comprehensive energy management program. The manual consists of 15 parts. Part 1 describes the SDEP; Parts…

  6. Dental Charting. Student's Manual.

    ERIC Educational Resources Information Center

    Weaver, Trudy Karlene; Apfel, Maura

    This manual is part of a series dealing with skills and information needed by students in dental assisting. The individualized student materials are suitable for classroom, laboratory, or cooperative training programs. This student manual contains four units covering the following topics: dental anatomical terminology; tooth numbering systems;…

  7. Indoor Air Quality Manual.

    ERIC Educational Resources Information Center

    Baldwin Union Free School District, NY.

    This manual identifies ways to improve a school's indoor air quality (IAQ) and discusses practical actions that can be carried out by school staff in managing air quality. The manual includes discussions of the many sources contributing to school indoor air pollution and the preventive planning for each including renovation and repair work,…

  8. Functional Handwriting Manual.

    ERIC Educational Resources Information Center

    Metzger, Louise; Lehotsky, Rutheda R.

    An inservice project to review the functional handwriting being taught in the Williamsport, Pennsylvania, school district produced a handwriting manual that provides teachers and students with models of letter forms and instructional exercises leading to the development of an individualized style of handwriting. The manual describes student…

  9. Boating Safety Training Manual.

    ERIC Educational Resources Information Center

    Coast Guard, Washington, DC.

    The training manual serves as the text for the Coast Guard's boating safety 32-hour course and for the D-8 Qualification Code Recertification Course. The manual is designed for self-study or for use with an instructor-led course. Each chapter concludes with a quiz to be used as a review of chapter content. Opening chapters review the use of the…

  10. Circulation Aide Training Manual.

    ERIC Educational Resources Information Center

    Bergeson, Alan O.

    This training manual provides instruction on shelving and other duties for student assistants in the learning resources center at the College of DuPage, located in Illinois. It is noted that prospective student circulation aides are required to read the manual and pass a written test on policies and procedures before they are allowed to shelve…

  11. Home Maintenance Manual.

    ERIC Educational Resources Information Center

    Richmond, Jim; And Others

    This manual, written especially for the Navajo and Hopi Indian Relocation Commission, is a simply worded, step-by-step guide to home maintenance for new homeowners. It can be used for self-study or it can serve as instructional material for a training class on home ownership. The manual is organized in nine sections that cover the following…

  12. Materials inventory management manual

    NASA Technical Reports Server (NTRS)

    1992-01-01

    This NASA Materials Inventory Management Manual (NHB 4100.1) is issued pursuant to Section 203(c)(1) of the National Aeronautics and Space Act of 1958 (42 USC 2473). It sets forth policy, performance standards, and procedures governing the acquisition, management and use of materials. This Manual is effective upon receipt.

  13. Automatic Versus Manual Indexing

    ERIC Educational Resources Information Center

    Vander Meulen, W. A.; Janssen, P. J. F. C.

    1977-01-01

    A comparative evaluation of results in terms of recall and precision from queries submitted to systems with automatic and manual subject indexing. Differences were attributed to query formulation. The effectiveness of automatic indexing was found equivalent to manual indexing. (Author/KP)
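    The recall and precision figures such indexing comparisons rest on are simple set ratios over the retrieved and relevant document sets; a minimal sketch (document IDs are invented):

```python
def recall_precision(retrieved, relevant):
    """Recall: fraction of relevant documents that were retrieved.
    Precision: fraction of retrieved documents that are relevant."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)
    recall = hits / len(relevant) if relevant else 0.0
    precision = hits / len(retrieved) if retrieved else 0.0
    return recall, precision

# 4 documents retrieved; 2 of the 3 relevant ones are among them.
r, p = recall_precision({"d1", "d2", "d7", "d9"}, {"d1", "d2", "d3"})
print(r, p)  # -> 0.6666666666666666 0.5
```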

  14. Biology Laboratory Safety Manual.

    ERIC Educational Resources Information Center

    Case, Christine L.

    The Centers for Disease Control (CDC) recommends that schools prepare or adapt a biosafety manual, and that instructors develop a list of safety procedures applicable to their own lab and distribute it to each student. In this way, safety issues will be brought to each student's attention. This document is an example of such a manual. It contains…

  15. Police Robbery Control Manual.

    ERIC Educational Resources Information Center

    Ward, Richard H.; And Others

    The manual was designed as a practical guide for police department personnel in developing robbery control programs. An introductory chapter defines types of robberies and suggests administrative and operational uses of the manual. Research and control strategies are reported according to five robbery types: street (visible and non-visible),…

  16. Induction brazing manual

    NASA Technical Reports Server (NTRS)

    1971-01-01

    Manual presents standards and techniques which are known or are particular to a specific industry, and is useful as a guide in close-tolerance brazing. Material and equipment specifications, tool setting tables, and quality control data and instructions are included. Since similar standards are available, the manual is a supplementary reference.

  17. Interlibrary Loan Training Manual.

    ERIC Educational Resources Information Center

    Kaufman, David, Ed.

    This manual is designed to provide information to interlibrary loan (ILL) practitioners about the philosophy behind interlibrary loan, as well as a functional knowledge of its routines. The first part of the manual presents general and background material about interlibrary loans. Included in this section are a glossary of library terms, the…

  18. [Physiotherapy as manual therapy].

    PubMed

    Torstensen, T A; Nielsen, L L; Jensen, R; Reginiussen, T; Wiesener, T; Kirkesola, G; Mengshoel, A M

    1999-05-30

    Manual therapy includes methods where the therapist's hands are used to stretch, mobilize or manipulate the spinal column, paravertebral structures or extremity joints. The aims of these methods are to relieve pain and improve function. In Norway only specially qualified physiotherapists and chiropractors are authorized to perform manipulation of joints (high velocity thrust techniques). To become a qualified manual therapist in Norway one must have a minimum of two years of clinical practice as a physiotherapist, followed by two years of full-time postgraduate training in manual therapy (a total of six years). Historically the Norwegian manual therapy system was developed in the 1950s by physiotherapists and medical doctors in England (James Cyriax and James Mennell) and Norway. As a result, doctors allowed physiotherapists to use manipulation as a treatment method for both spinal and peripheral joints. In 1957 the Norwegian health authorities introduced reimbursement for manual therapy performed by physiotherapists.

  19. Completing Invoices. Student's Manual and Instructor's Manual.

    ERIC Educational Resources Information Center

    Hamer, Jean

    Supporting performance objective 46 of the V-TECS (Vocational-Technical Education Consortium of States) Secretarial Catalog, both a set of student materials and an instructor's manual on completing invoices are included in this packet. (The packet is the fifth in a set of nine on performing computational clerical activities--CE 016 951-959.) The…

  20. Research Problems in Data Curation: Outcomes from the Data Curation Education in Research Centers Program

    NASA Astrophysics Data System (ADS)

    Palmer, C. L.; Mayernik, M. S.; Weber, N.; Baker, K. S.; Kelly, K.; Marlino, M. R.; Thompson, C. A.

    2013-12-01

    The need for data curation is being recognized in numerous institutional settings as national research funding agencies extend data archiving mandates to cover more types of research grants. Data curation, however, is not only a practical challenge. It presents many conceptual and theoretical challenges that must be investigated to design appropriate technical systems, social practices and institutions, policies, and services. This presentation reports on outcomes from an investigation of research problems in data curation conducted as part of the Data Curation Education in Research Centers (DCERC) program. DCERC is developing a new model for educating data professionals to contribute to scientific research. The program is organized around foundational courses and field experiences in research and data centers for both master's and doctoral students. The initiative is led by the Graduate School of Library and Information Science at the University of Illinois at Urbana-Champaign, in collaboration with the School of Information Sciences at the University of Tennessee, and library and data professionals at the National Center for Atmospheric Research (NCAR). At the doctoral level DCERC is educating future faculty and researchers in data curation and establishing a research agenda to advance the field. The doctoral seminar, Research Problems in Data Curation, was developed and taught in 2012 by the DCERC principal investigator and two doctoral fellows at the University of Illinois. It was designed to define the problem space of data curation, examine relevant concepts and theories related to both technical and social perspectives, and articulate research questions that are either unexplored or under theorized in the current literature. There was a particular emphasis on the Earth and environmental sciences, with guest speakers brought in from NCAR, National Snow and Ice Data Center (NSIDC), and Rensselaer Polytechnic Institute. Through the assignments, students

  1. Digital Management and Curation of the National Rock and Ore Collections at NMNH, Smithsonian

    NASA Astrophysics Data System (ADS)

    Cottrell, E.; Andrews, B.; Sorensen, S. S.; Hale, L. J.

    2011-12-01

    The National Museum of Natural History, Smithsonian Institution, is home to the world's largest curated rock collection. The collection houses 160,680 physical rock and ore specimen lots ("samples"), all of which already have a digital record that can be accessed by the public through a searchable web interface (http://collections.mnh.si.edu/search/ms/). In addition, there are 66 accessions pending that when catalogued will add approximately 60,000 specimen lots. NMNH's collections are digitally managed on the KE EMu platform, which has emerged as the premier system for managing collections in natural history museums worldwide. In 2010 the Smithsonian released an ambitious 5 year Digitization Strategic Plan. In Mineral Sciences, new digitization efforts in the next five years will focus on integrating various digital resources for volcanic specimens. EMu sample records will link to the corresponding records for physical eruption information housed within the database of Smithsonian's Global Volcanism Program (GVP). Linkages are also planned between our digital records and geochemical databases (like EarthChem or PetDB) maintained by third parties. We anticipate that these linkages will increase the use of NMNH collections as well as engender new scholarly directions for research. Another large project the museum is currently undertaking involves the integration of the functionality of in-house designed Transaction Management software with the EMu database. This will allow access to the details (borrower, quantity, date, and purpose) of all loans of a given specimen through its catalogue record. We hope this will enable cross-referencing and fertilization of research ideas while avoiding duplicate efforts. While these digitization efforts are critical, we propose that the greatest challenge to sample curation is not posed by digitization and that a global sample registry alone will not ensure that samples are available for reuse. We suggest instead that the ability…

  2. Searching the MINT database for protein interaction information.

    PubMed

    Cesareni, Gianni; Chatr-aryamontri, Andrew; Licata, Luana; Ceol, Arnaud

    2008-06-01

    The Molecular Interactions Database (MINT) is a relational database designed to store information about protein interactions. Expert curators extract the relevant information from the scientific literature and deposit it in a computer readable form. Currently (April 2008), MINT contains information on almost 29,000 proteins and more than 100,000 interactions from more than 30 model organisms. This unit provides protocols for searching MINT over the Internet, using the MINT Viewer.
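    MINT, like other PSI-MI databases, exports interaction records in the tab-separated MITAB format, whose first two columns hold the unique identifiers of interactors A and B. A minimal parsing sketch (the sample row is invented and heavily truncated; real MITAB rows carry 15 or more columns):

```python
def parse_mitab_line(line):
    """Extract the two interactor identifiers from one PSI-MI TAB
    (MITAB) record: columns 1 and 2 are the unique IDs of interactors
    A and B in 'database:accession' form, e.g. 'uniprotkb:P12345'."""
    fields = line.rstrip("\n").split("\t")
    id_a = fields[0].split(":", 1)[1]
    id_b = fields[1].split(":", 1)[1]
    return id_a, id_b

# Invented example row (only the first few columns shown).
row = "uniprotkb:P04637\tuniprotkb:Q00987\tgene:TP53\tgene:MDM2"
print(parse_mitab_line(row))  # -> ('P04637', 'Q00987')
```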

  3. Semantic mediation in the national geologic map database (US)

    USGS Publications Warehouse

    Percy, D.; Richard, S.; Soller, D.

    2008-01-01

    Controlled language is the primary challenge in merging heterogeneous databases of geologic information. Each agency or organization produces databases with different schema, and different terminology for describing the objects within. In order to make some progress toward merging these databases using current technology, we have developed software and a workflow that allows for the "manual semantic mediation" of these geologic map databases. Enthusiastic support from many state agencies (stakeholders and data stewards) has shown that the community supports this approach. Future implementations will move toward a more Artificial Intelligence-based approach, using expert-systems or knowledge-bases to process data based on the training sets we have developed manually.
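    In practice, manual semantic mediation of this kind produces a curated lookup from each agency's local terminology to a shared controlled vocabulary; the sketch below illustrates the idea (agency names and geologic terms are invented examples, not from the NGMDB workflow):

```python
# Hand-curated mapping from local (agency-specific) terms to a shared
# controlled vocabulary -- the product of manual semantic mediation.
TERM_MAP = {
    ("state_a", "qal"): "Quaternary alluvium",
    ("state_a", "granite"): "granitoid",
    ("state_b", "alluvial deposits"): "Quaternary alluvium",
}

def mediate(agency, local_term):
    """Translate a local term to the shared vocabulary; unmapped terms
    are returned as None and flagged for a human curator, not guessed."""
    return TERM_MAP.get((agency, local_term.lower()))

print(mediate("state_a", "Qal"))                # -> 'Quaternary alluvium'
print(mediate("state_b", "alluvial deposits"))  # -> 'Quaternary alluvium'
print(mediate("state_b", "schist"))             # -> None (needs curation)
```

    The mappings accumulated this way can later serve as training data for the more automated, knowledge-based approach the abstract anticipates.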

  4. Challenges for an enzymatic reaction kinetics database.

    PubMed

    Wittig, Ulrike; Rey, Maja; Kania, Renate; Bittkowski, Meik; Shi, Lei; Golebiewski, Martin; Weidemann, Andreas; Müller, Wolfgang; Rojas, Isabel

    2014-01-01

    The scientific literature contains a tremendous amount of kinetic data describing the dynamic behaviour of biochemical reactions over time. These data are needed for computational modelling to create models of biochemical reaction networks and to obtain a better understanding of the processes in living cells. To extract the knowledge from the literature, biocurators are required to understand a paper and interpret the data. For modellers, as well as experimentalists, this process is very time consuming because the information is distributed across the publication and, in most cases, is insufficiently structured and often described without standard terminology. In recent years, biological databases for different data types have been developed. The advantages of these databases lie in their unified structure, searchability and the potential for augmented analysis by software, which supports the modelling process. We have developed the SABIO-RK database for biochemical reaction kinetics. In the present review, we describe the challenges for database developers and curators, beginning with an analysis of relevant publications up to the export of database information in a standardized format. The aim of the present review is to draw the experimentalist's attention to the problem (from a data integration point of view) of incompletely and imprecisely written publications. We describe how to lower the barrier to curators and improve this situation. At the same time, we are aware that curating experimental data takes time. There is a community concerned with making the task of publishing data with the proper structure and annotation to ontologies much easier. In this respect, we highlight some useful initiatives and tools.
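    A structured kinetics entry of the kind SABIO-RK curates couples a named rate law with its parameters and units, which is what makes the data searchable and reusable by modellers. A minimal sketch using the Michaelis-Menten law (all values are invented, and the record layout is illustrative, not SABIO-RK's schema):

```python
def michaelis_menten(s, vmax, km):
    """Michaelis-Menten rate law: v = Vmax * [S] / (Km + [S])."""
    return vmax * s / (km + s)

# A curated entry stores the law name plus named, unit-annotated
# parameters -- the structure a free-text publication often lacks.
entry = {
    "kinetic_law": "Michaelis-Menten",
    "parameters": {"Vmax": 2.0, "Km": 0.5},  # units: mM/s, mM (invented)
}

v = michaelis_menten(0.5, entry["parameters"]["Vmax"], entry["parameters"]["Km"])
print(v)  # -> 1.0  (at [S] = Km the rate is half of Vmax)
```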

  5. Fire Protection Program Manual

    SciTech Connect

    Sharry, J A

    2012-05-18

    This manual documents the Lawrence Livermore National Laboratory (LLNL) Fire Protection Program. Department of Energy (DOE) Order 420.1B, Facility Safety, requires LLNL to have a comprehensive and effective fire protection program that protects LLNL personnel and property, the public and the environment. The manual provides LLNL and its facilities with general information and guidance for meeting DOE 420.1B requirements. The intended readers of this manual are: fire protection officers, fire protection engineers, fire fighters, facility managers, directorate assurance managers, facility coordinators, and ES and H team members.

  6. Salinas: theory manual.

    SciTech Connect

    Walsh, Timothy Francis; Reese, Garth M.; Bhardwaj, Manoj Kumar

    2004-08-01

    This manual describes the theory behind many of the constructs in Salinas. For a more detailed description of how to use Salinas, we refer the reader to the Salinas User's Notes. Many of the constructs in Salinas are pulled directly from published material. Where possible, these materials are referenced herein. However, certain functions in Salinas are specific to our implementation. We try to be far more complete in those areas. The theory manual was developed from several sources, including general notes, a programmer's notes manual, the user's notes and, of course, the material in the open literature.

  7. Prophylactic and curative foodstuffs from topinambour

    NASA Astrophysics Data System (ADS)

    Bobrovnik, Leonid D.; Gulyi, Ivan S.; Remeslo, Natalia V.; Yefimov, Andrey S.; Melnik, Ivan M.; Vysotskiy, Vadim G.

    1999-01-01

    Topinambour belongs to the medicinal plants with a unique and useful chemical composition characterized by a wide range of biologically active substances. This composition accounts for the high nutrient value of topinambour. On this basis the researchers of the Ukrainian State University of Food Technologies elaborated technologies for prophylactic-curative foodstuffs. These products promote the strengthening of the immune system, and are therefore very useful for all people, particularly for those who live in regions contaminated by radionuclides. The results of medical-biological and clinical investigations are discussed in this paper.

  8. Improving the Acquisition and Management of Sample Curation Data

    NASA Technical Reports Server (NTRS)

    Todd, Nancy S.; Evans, Cindy A.; Labasse, Dan

    2011-01-01

    This paper discusses the current sample documentation processes used during and after a mission, examines the challenges and special considerations needed for designing effective sample curation data systems, and looks at the results of a simulated sample return mission and the lessons learned from this simulation. In addition, it introduces a new data architecture for an integrated sample curation data system being implemented at the NASA Astromaterials Acquisition and Curation department and discusses how it improves on existing data management systems.

  9. Creating a VAPEPS database: A VAPEPS tutorial

    NASA Technical Reports Server (NTRS)

    Graves, George

    1989-01-01

    A procedural method is outlined for creating a Vibroacoustic Payload Environment Prediction System (VAPEPS) Database. The method of presentation employs flowcharts of sequential VAPEPS Commands used to create a VAPEPS Database. The commands are accompanied by explanatory text to the right of the command in order to minimize the need for repetitive reference to the VAPEPS user's manual. The method is demonstrated by examples of varying complexity. It is assumed that the reader has acquired a basic knowledge of the VAPEPS software program.

  10. Crowdsourcing and curation: perspectives from biology and natural language processing

    PubMed Central

    Hirschman, Lynette; Fort, Karën; Boué, Stéphanie; Kyrpides, Nikos; Islamaj Doğan, Rezarta; Cohen, Kevin Bretonnel

    2016-01-01

    Crowdsourcing is increasingly utilized for performing tasks in both natural language processing and biocuration. Although there have been many applications of crowdsourcing in these fields, there have been fewer high-level discussions of the methodology and its applicability to biocuration. This paper explores crowdsourcing for biocuration through several case studies that highlight different ways of leveraging ‘the crowd’; these raise issues about the kind(s) of expertise needed, the motivations of participants, and questions related to feasibility, cost and quality. The paper is an outgrowth of a panel session held at BioCreative V (Seville, September 9–11, 2015). The session consisted of four short talks, followed by a discussion. In their talks, the panelists explored the role of expertise and the potential to improve crowd performance by training; the challenge of decomposing tasks to make them amenable to crowdsourcing; and the capture of biological data and metadata through community editing. Database URL: http://www.mitre.org/publications/technical-papers/crowdsourcing-and-curation-perspectives PMID:27504010

  11. Crowdsourcing and curation: perspectives from biology and natural language processing.

    PubMed

    Hirschman, Lynette; Fort, Karën; Boué, Stéphanie; Kyrpides, Nikos; Islamaj Doğan, Rezarta; Cohen, Kevin Bretonnel

    2016-01-01

    Crowdsourcing is increasingly utilized for performing tasks in both natural language processing and biocuration. Although there have been many applications of crowdsourcing in these fields, there have been fewer high-level discussions of the methodology and its applicability to biocuration. This paper explores crowdsourcing for biocuration through several case studies that highlight different ways of leveraging 'the crowd'; these raise issues about the kind(s) of expertise needed, the motivations of participants, and questions related to feasibility, cost and quality. The paper is an outgrowth of a panel session held at BioCreative V (Seville, September 9-11, 2015). The session consisted of four short talks, followed by a discussion. In their talks, the panelists explored the role of expertise and the potential to improve crowd performance by training; the challenge of decomposing tasks to make them amenable to crowdsourcing; and the capture of biological data and metadata through community editing.Database URL: http://www.mitre.org/publications/technical-papers/crowdsourcing-and-curation-perspectives. PMID:27504010

  13. Peace Corps Aquaculture Training Manual. Training Manual T0057.

    ERIC Educational Resources Information Center

    Peace Corps, Washington, DC. Information Collection and Exchange Div.

    This Peace Corps training manual was developed from two existing manuals to provide a comprehensive training program in fish production for Peace Corps volunteers. The manual encompasses the essential elements of the University of Oklahoma program that has been training volunteers in aquaculture for 25 years. The 22 chapters of the manual are…

  14. FDAS EPROM programming manual

    SciTech Connect

    Baker, J.

    1982-07-01

    This brief manual is intended as a guide for producing programmed Erasable Programmable Read-Only Memories (EPROMs) for the microprocessor-based data acquisition system. Also included is a section on verifying the EPROMs in an FDAS unit.

  15. Geochemical engineering reference manual

    SciTech Connect

    Owen, L.B.; Michels, D.E.

    1984-01-01

    The following topics are included in this manual: physical and chemical properties of geothermal brine and steam, scale and solids control, processing spent brine for reinjection, control of noncondensable gas emissions, and geothermal mineral recovery. (MHR)

  16. Interactive Office user's manual

    NASA Technical Reports Server (NTRS)

    Montgomery, Edward E.; Lowers, Benjamin; Nabors, Terri L.

    1990-01-01

    Given here is a user's manual for Interactive Office (IO), an executive office tool for organization and planning, written specifically for Macintosh. IO is a paperless management tool to automate a related group of individuals into one productive system.

  17. Solubility Database

    National Institute of Standards and Technology Data Gateway

    SRD 106 IUPAC-NIST Solubility Database (Web, free access)   These solubilities are compiled from 18 volumes (Click here for List) of the International Union of Pure and Applied Chemistry (IUPAC)-NIST Solubility Data Series. The database includes liquid-liquid, solid-liquid, and gas-liquid systems. Typical solvents and solutes include water, seawater, heavy water, inorganic compounds, and a variety of organic compounds such as hydrocarbons, halogenated hydrocarbons, alcohols, acids, esters and nitrogen compounds. There are over 67,500 solubility measurements and over 1800 references.

  18. MaizeGDB update: New tools, data, and interface for the maize model organism database

    Technology Transfer Automated Retrieval System (TEKTRAN)

    MaizeGDB is a highly curated, community-oriented database and informatics service to researchers focused on the crop plant and model organism Zea mays ssp. mays. Although some form of the maize community database has existed over the last 25 years, there have only been two major releases. In 1991, ...

  19. CottonGen: a genomics, genetics and breeding database for cotton research

    Technology Transfer Automated Retrieval System (TEKTRAN)

    CottonGen (http://www.cottongen.org) is a curated and integrated web-based relational database providing access to publicly available genomic, genetic and breeding data for cotton. CottonGen supersedes CottonDB and the Cotton Marker Database, with enhanced tools for easier data sharing, mining, vis...

  20. VAPEPS user's reference manual, version 5.0

    NASA Technical Reports Server (NTRS)

    Park, D. M.

    1988-01-01

    This is the reference manual for the VibroAcoustic Payload Environment Prediction System (VAPEPS). The system consists of a computer program and a vibroacoustic database. The purpose of the system is to collect measurements of vibroacoustic data taken from flight events and ground tests, and to retrieve this data and provide a means of using the data to predict future payload environments. This manual describes the operating language of the program. Topics covered include database commands, Statistical Energy Analysis (SEA) prediction commands, stress prediction command, and general computational commands.

  1. OCDB: a database collecting genes, miRNAs and drugs for obsessive-compulsive disorder

    PubMed Central

    Privitera, Anna P.; Distefano, Rosario; Wefer, Hugo A.; Ferro, Alfredo; Pulvirenti, Alfredo; Giugno, Rosalba

    2015-01-01

    Obsessive-compulsive disorder (OCD) is a psychiatric condition characterized by intrusive and unwilling thoughts (obsessions) giving rise to anxiety. The patients feel obliged to perform a behavior (compulsions) induced by the obsessions. The World Health Organization ranks OCD as one of the 10 most disabling medical conditions. In the class of Anxiety Disorders, OCD is a pathology that shows a hereditary component. Consequently, an online resource collecting and integrating scientific discoveries and genetic evidence about OCD would be helpful to improve the current knowledge on this disorder. We have developed a manually curated database, OCD Database (OCDB), collecting the relations between candidate genes in OCD, microRNAs (miRNAs) involved in the pathophysiology of OCD and drugs used in its treatment. We have screened articles from PubMed and MEDLINE. For each gene, the bibliographic references with a brief description of the gene and the experimental conditions are shown. The database also lists the polymorphisms within genes and their chromosomal regions. OCDB data are enriched with both validated and predicted miRNA-target and drug-target information. Transcription factor regulations, which are also included, are taken from David and TransmiR. Moreover, a scoring function ranks the relevance of data in the OCDB context. The database is also integrated with the main online resources (PubMed, Entrez-gene, HGNC, dbSNP, DrugBank, miRBase, PubChem, Kegg, Disease-ontology and ChEBI). The web interface has been developed using phpMyAdmin and Bootstrap software. This allows users (i) to browse data by category and (ii) to navigate the database by searching genes, miRNAs, drugs, SNPs, regions, drug targets and articles. The data can be exported in textual format, and the whole database in .sql or tabular format. OCDB is an essential resource to support genome-wide analysis, genetic and pharmacological studies. It also facilitates the evaluation of genetic data

  2. Sequence resources at the Candida Genome Database.

    PubMed

    Arnaud, Martha B; Costanzo, Maria C; Skrzypek, Marek S; Shah, Prachi; Binkley, Gail; Lane, Christopher; Miyasato, Stuart R; Sherlock, Gavin

    2007-01-01

    The Candida Genome Database (CGD, http://www.candidagenome.org/) contains a curated collection of genomic information and community resources for researchers who are interested in the molecular biology of the opportunistic pathogen Candida albicans. With the recent release of a new assembly of the C. albicans genome, Assembly 20, C. albicans genomics has entered a new era. Although the C. albicans genome assembly continues to undergo refinement, multiple assemblies and gene nomenclatures will remain in widespread use by the research community. CGD has now taken on the responsibility of maintaining the most up-to-date version of the genome sequence by providing the data from this new assembly alongside the data from the previous assemblies, as well as any future corrections and refinements. In this database update, we describe the sequence information available for C. albicans, the sequence information contained in CGD, and the tools for sequence retrieval, analysis and comparison that CGD provides. CGD is freely accessible at http://www.candidagenome.org/ and CGD curators may be contacted by email at candida-curator@genome.stanford.edu.

  3. PAH Mutation Analysis Consortium Database: 1997. Prototype for relational locus-specific mutation databases.

    PubMed Central

    Nowacki, P M; Byck, S; Prevost, L; Scriver, C R

    1998-01-01

    PAHdb (http://www.mcgill.ca/pahdb) is a curated relational database (Fig. 1) of nucleotide variation in the human PAH cDNA (GenBank U49897). Among the 328 different mutations by state (Fig. 2), the majority are rare mutations causing hyperphenylalaninemia (HPA) (OMIM 261600); the remainder are polymorphic variants without apparent effect on phenotype. PAHdb modules contain mutations, polymorphic haplotypes, genotype-phenotype correlations, expression analysis, sources of information and the reference sequence; the database also contains pages of clinical information and data on three ENU mouse orthologues of human HPA. Only six different mutations account for 60% of human HPA chromosomes worldwide; mutations stratify by population and geographic region, and the Oriental and Caucasian mutation sets are different (Fig. 3). PAHdb provides curated electronic publication, and one third of its incoming reports are direct submissions. Each different mutation receives a systematic (nucleotide) name and a unique identifier (UID). Data are accessed both by a Newsletter and a search engine on the website; integrity of the database is ensured by keeping the curated template offline. There have been >6500 online interrogations of the website. PMID:9399840
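    The systematic (nucleotide) names mentioned above follow the familiar cDNA position/substitution pattern (e.g. c.1222C>T); a toy registration function might look like the sketch below. Both the naming helper and the UID format here are illustrative assumptions, not PAHdb's actual implementation.

    ```python
    import itertools

    # Hypothetical monotonically increasing counter standing in for
    # whatever UID allocation PAHdb actually uses.
    _uid_counter = itertools.count(1)

    def register_mutation(position, ref, alt):
        """Build a systematic cDNA substitution name and assign a unique ID.

        The c.<position><ref>><alt> pattern is a common convention for
        cDNA substitutions; the PAH-prefixed UID scheme is invented here.
        """
        name = f"c.{position}{ref}>{alt}"
        uid = f"PAH{next(_uid_counter):04d}"
        return {"systematic_name": name, "uid": uid}

    print(register_mutation(1222, "C", "T"))
    # → {'systematic_name': 'c.1222C>T', 'uid': 'PAH0001'}
    ```

    Keeping the human-readable name separate from the immutable UID lets the name be corrected later without breaking cross-references.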

  4. Hybrid curation of gene–mutation relations combining automated extraction and crowdsourcing

    PubMed Central

    Burger, John D.; Doughty, Emily; Khare, Ritu; Wei, Chih-Hsuan; Mishra, Rajashree; Aberdeen, John; Tresner-Kirsch, David; Wellner, Ben; Kann, Maricel G.; Lu, Zhiyong; Hirschman, Lynette

    2014-01-01

    Background: This article describes capture of biological information using a hybrid approach that combines natural language processing to extract biological entities and crowdsourcing with annotators recruited via Amazon Mechanical Turk to judge correctness of candidate biological relations. These techniques were applied to extract gene–mutation relations from biomedical abstracts with the goal of supporting production scale capture of gene–mutation–disease findings as an open source resource for personalized medicine. Results: The hybrid system could be configured to provide good performance for gene–mutation extraction (precision ∼82%; recall ∼70% against an expert-generated gold standard) at a cost of $0.76 per abstract. This demonstrates that crowd labor platforms such as Amazon Mechanical Turk can be used to recruit quality annotators, even in an application requiring subject matter expertise; aggregated Turker judgments for gene–mutation relations exceeded 90% accuracy. Over half of the precision errors were due to mismatches against the gold standard hidden from annotator view (e.g. incorrect EntrezGene identifier or incorrect mutation position extracted), or incomplete task instructions (e.g. the need to exclude nonhuman mutations). Conclusions: The hybrid curation model provides a readily scalable cost-effective approach to curation, particularly if coupled with expert human review to filter precision errors. We plan to generalize the framework and make it available as open source software. Database URL: http://www.mitre.org/publications/technical-papers/hybrid-curation-of-gene-mutation-relations-combining-automated PMID:25246425
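    Aggregating several independent Turker judgments per candidate relation, as described above, can be sketched as a simple majority vote. This is a minimal illustration only; the paper's actual aggregation scheme, and the relation identifiers used below, are assumptions.

    ```python
    from collections import Counter

    def aggregate_judgments(judgments):
        """Majority-vote aggregation of crowd judgments.

        judgments: dict mapping a candidate-relation ID to a list of
        yes/no labels collected from independent annotators.
        Returns, per relation, the winning label and its vote fraction.
        """
        consensus = {}
        for relation_id, labels in judgments.items():
            counts = Counter(labels)
            label, votes = counts.most_common(1)[0]
            consensus[relation_id] = (label, votes / len(labels))
        return consensus

    # Three annotators judged each candidate gene-mutation relation
    # (relation IDs are invented for illustration).
    crowd = {
        "BRCA1:c.68_69delAG": ["yes", "yes", "no"],
        "TP53:p.R175H": ["yes", "yes", "yes"],
    }
    print(aggregate_judgments(crowd))
    ```

    The vote fraction doubles as a cheap confidence score: relations with a slim majority are natural candidates for the expert review pass the conclusions recommend.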

  5. Causal biological network database: a comprehensive platform of causal biological network models focused on the pulmonary and vascular systems

    PubMed Central

    Boué, Stéphanie; Talikka, Marja; Westra, Jurjen Willem; Hayes, William; Di Fabio, Anselmo; Park, Jennifer; Schlage, Walter K.; Sewer, Alain; Fields, Brett; Ansari, Sam; Martin, Florian; Veljkovic, Emilija; Kenney, Renee; Peitsch, Manuel C.; Hoeng, Julia

    2015-01-01

    With the wealth of publications and data available, powerful and transparent computational approaches are required to represent measured data and scientific knowledge in a computable and searchable format. We developed a set of biological network models, scripted in the Biological Expression Language, that reflect causal signaling pathways across a wide range of biological processes, including cell fate, cell stress, cell proliferation, inflammation, tissue repair and angiogenesis in the pulmonary and cardiovascular context. This comprehensive collection of networks is now freely available to the scientific community in a centralized web-based repository, the Causal Biological Network database, which is composed of over 120 manually curated and well annotated biological network models and can be accessed at http://causalbionet.com. The website accesses a MongoDB, which stores all versions of the networks as JSON objects and allows users to search for genes, proteins, biological processes, small molecules and keywords in the network descriptions to retrieve biological networks of interest. The content of the networks can be visualized and browsed. Nodes and edges can be filtered and all supporting evidence for the edges can be browsed and is linked to the original articles in PubMed. Moreover, networks may be downloaded for further visualization and evaluation. Database URL: http://causalbionet.com PMID:25887162
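    The keyword search over JSON network documents described above might look roughly like the following. This is a sketch against an invented, simplified schema; the real CBN documents stored in MongoDB are far richer, and the field names here are assumptions.

    ```python
    def search_networks(networks, term):
        """Return names of networks whose description or node labels mention term.

        `networks` mimics a list of JSON documents as they might sit in a
        document store such as MongoDB; the schema is invented for illustration.
        """
        term = term.lower()
        hits = []
        for net in networks:
            haystack = [net.get("description", "")]
            haystack += [node["label"] for node in net.get("nodes", [])]
            if any(term in text.lower() for text in haystack):
                hits.append(net["name"])
        return hits

    # Two toy network documents (names and contents are illustrative).
    networks = [
        {"name": "Cell Stress", "description": "oxidative stress signaling",
         "nodes": [{"label": "NFE2L2"}, {"label": "KEAP1"}]},
        {"name": "Tissue Repair", "description": "wound healing",
         "nodes": [{"label": "TGFB1"}]},
    ]
    print(search_networks(networks, "keap1"))  # → ['Cell Stress']
    ```

    A production system would push this matching into the database's own query engine rather than scanning documents in application code.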

  7. hERGAPDbase: a database documenting hERG channel inhibitory potentials and APD-prolongation activities of chemical compounds.

    PubMed

    Hishigaki, Haretsugu; Kuhara, Satoru

    2011-01-01

    Drug-induced QT interval prolongation is one of the most common reasons for the withdrawal of drugs from the market. In the past decade, at least nine drugs, i.e. terfenadine, astemizole, grepafloxacin, terodiline, droperidol, lidoflazine, sertindole, levomethadyl and cisapride, have been removed from the market or their use has been severely restricted because of drug-induced QT interval prolongation. Therefore, this irregularity is a major safety concern in the case of drugs submitted for regulatory approval. The most common mechanism of drug-induced QT interval prolongation may be drug-related inhibition of the human ether-à-go-go-related gene (hERG) channel, which subsequently results in prolongation of the cardiac action potential duration (APD). hERGAPDbase is a database of electrophysiological experimental data documenting potential hERG channel inhibitory actions and the APD-prolongation activities of chemical compounds. All data entries are manually collected from scientific papers and curated by a person. With hERGAPDbase, we aim to provide useful information for chemical and pharmacological scientists and enable easy access to electrophysiological experimental data on chemical compounds. Database URL: http://www.grt.kyushu-u.ac.jp/hergapdbase/.

  9. AromaDeg, a novel database for phylogenomics of aerobic bacterial degradation of aromatics.

    PubMed

    Duarte, Márcia; Jauregui, Ruy; Vilchez-Vargas, Ramiro; Junca, Howard; Pieper, Dietmar H

    2014-01-01

    Understanding prokaryotic transformation of recalcitrant pollutants and the in-situ metabolic networks involved requires the integration of massive amounts of biological data. Decades of biochemical studies together with novel next-generation sequencing data have exponentially increased information on aerobic aromatic degradation pathways. However, the majority of protein sequences in public databases have not been experimentally characterized, and homology-based methods are still the most routinely used approach to assign protein function, allowing the propagation of misannotations. AromaDeg is a web-based resource targeting aerobic degradation of aromatics that comprises recently updated (September 2013) and manually curated databases constructed using a phylogenomic approach. Grounded in phylogenetic analyses of protein sequences of key catabolic protein families and of proteins of documented function, AromaDeg allows query and data mining of novel genomic, metagenomic or metatranscriptomic data sets. Essentially, each query sequence that matches a given protein family of AromaDeg is associated with a specific cluster of a given phylogenetic tree, and further functional annotation and/or substrate specificity may be inferred from the neighboring cluster members with experimentally validated function. This allows a detailed characterization of individual protein superfamilies as well as high-throughput functional classifications. Thus, AromaDeg addresses the deficiencies of homology-based protein function prediction, combining phylogenetic tree construction and integration of experimental data to obtain more accurate annotations of new biological data related to aerobic aromatic biodegradation pathways. We plan to expand AromaDeg to other enzyme families involved in aromatic degradation and to update it regularly. Database URL: http://aromadeg.siona.helmholtz-hzi.de
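    The phylogeny-guided annotation transfer described above can be caricatured as: place a query in a cluster, then inherit function only from that cluster's experimentally validated members. This is a bare-bones sketch under stated assumptions; real placement uses phylogenetic trees, not a lookup table, and the cluster and protein names below are invented.

    ```python
    def annotate_query(query_cluster, cluster_members):
        """Infer function for a query from validated members of its cluster.

        cluster_members: dict cluster_id -> list of (protein, function, validated)
        Returns the set of experimentally supported functions in the cluster,
        or None when the cluster holds no validated member (i.e. no
        annotation should be transferred).
        """
        members = cluster_members.get(query_cluster, [])
        functions = {fn for _, fn, validated in members if validated}
        return functions or None

    # Toy cluster: one biochemically characterized member, one uncharacterized.
    clusters = {
        "ring_cleaving_I": [
            ("CatA", "catechol 1,2-dioxygenase", True),
            ("UncharX", "unknown", False),
        ],
    }
    print(annotate_query("ring_cleaving_I", clusters))
    # → {'catechol 1,2-dioxygenase'}
    ```

    Returning None instead of guessing is the point: refusing to transfer annotations from uncharacterized neighbors is what curbs the misannotation propagation the abstract warns about.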

  10. Evolutionary Characters, Phenotypes and Ontologies: Curating Data from the Systematic Biology Literature

    PubMed Central

    Dahdul, Wasila M.; Balhoff, James P.; Engeman, Jeffrey; Grande, Terry; Hilton, Eric J.; Kothari, Cartik; Lapp, Hilmar; Lundberg, John G.; Midford, Peter E.; Vision, Todd J.; Westerfield, Monte; Mabee, Paula M.

    2010-01-01

    Background The wealth of phenotypic descriptions documented in the published articles, monographs, and dissertations of phylogenetic systematics is traditionally reported in a free-text format, and it is therefore largely inaccessible for linkage to biological databases for genetics, development, and phenotypes, and difficult to manage for large-scale integrative work. The Phenoscape project aims to represent these complex and detailed descriptions with rich and formal semantics that are amenable to computation and integration with phenotype data from other fields of biology. This entails reconceptualizing the traditional free-text characters into the computable Entity-Quality (EQ) formalism using ontologies. Methodology/Principal Findings We used ontologies and the EQ formalism to curate a collection of 47 phylogenetic studies on ostariophysan fishes (including catfishes, characins, minnows, knifefishes) and their relatives with the goal of integrating these complex phenotype descriptions with information from an existing model organism database (zebrafish, http://zfin.org). We developed a curation workflow for the collection of character, taxonomic and specimen data from these publications. A total of 4,617 phenotypic characters (10,512 states) for 3,449 taxa, primarily species, were curated into EQ formalism (for a total of 12,861 EQ statements) using anatomical and taxonomic terms from teleost-specific ontologies (Teleost Anatomy Ontology and Teleost Taxonomy Ontology) in combination with terms from a quality ontology (Phenotype and Trait Ontology). Standards and guidelines for consistently and accurately representing phenotypes were developed in response to the challenges that were evident from two annotation experiments and from feedback from curators. 
Conclusions/Significance The challenges we encountered and many of the curation standards and methods for improving consistency that we developed are generally applicable to any effort to represent phenotypes
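    An Entity-Quality (EQ) statement as described above pairs an anatomical entity term with a quality term for a given taxon. A minimal representation is sketched below; the field layout is a simplification, and the example terms are illustrative rather than taken from the curated dataset.

    ```python
    from dataclasses import dataclass

    @dataclass(frozen=True)
    class EQStatement:
        """One Entity-Quality phenotype annotation.

        entity:  anatomical term (e.g. from the Teleost Anatomy Ontology)
        quality: quality term (e.g. from the Phenotype and Trait Ontology)
        taxon:   taxon in which the character state was observed
                 (e.g. from the Teleost Taxonomy Ontology)
        """
        entity: str
        quality: str
        taxon: str

        def render(self):
            return f"{self.taxon}: {self.entity} is {self.quality}"

    # Hypothetical annotation for a catfish species.
    stmt = EQStatement(entity="dorsal fin", quality="absent",
                       taxon="Ictalurus punctatus")
    print(stmt.render())  # → Ictalurus punctatus: dorsal fin is absent
    ```

    Because the dataclass is frozen and hashable, thousands of such statements can be deduplicated in a set, which is convenient when the same character state is curated from multiple studies.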

  11. Drinking Water Treatability Database (Database)

    EPA Science Inventory

    The drinking Water Treatability Database (TDB) will provide data taken from the literature on the control of contaminants in drinking water, and will be housed on an interactive, publicly-available USEPA web site. It can be used for identifying effective treatment processes, rec...

  12. The Role of Preoperative TIPSS to Facilitate Curative Gastric Surgery

    SciTech Connect

    Norton, S.A.; Vickers, J.; Callaway, M.P.; Alderson, D.

    2003-08-15

    The use of TIPSS to facilitate radical curative upper gastrointestinal surgery has not been reported. We describe a case in which curative gastric resection was performed for carcinoma of the stomach after a preoperative TIPSS and embolization of a large gastric varix in a patient with portal hypertension.

  13. The MIntAct Project and Molecular Interaction Databases.

    PubMed

    Licata, Luana; Orchard, Sandra

    2016-01-01

    Molecular interaction databases collect, organize, and enable the analysis of the increasing amounts of molecular interaction data being produced and published as we move towards a more complete understanding of the interactomes of key model organisms. The organization of these data in a structured format supports analyses such as the modeling of pairwise relationships between interactors into interaction networks and is a powerful tool for understanding the complex molecular machinery of the cell. This chapter gives an overview of the principal molecular interaction databases, in particular the IMEx databases, and their curation policies, use of standardized data formats and quality control rules. Special attention is given to the MIntAct project, in which IntAct and MINT joined forces to create a single resource to improve curation and software development efforts. This is exemplified as a model for the future of molecular interaction data collation and dissemination. PMID:27115627

  14. Data mining in forensic image databases

    NASA Astrophysics Data System (ADS)

    Geradts, Zeno J.; Bijhold, Jurrien

    2002-07-01

    Forensic image databases come in a wide variety. The oldest computerized database holds fingerprints. Other examples are databases of shoeprints, handwriting, cartridge cases, toolmarks, drug tablets and faces. In these databases, searches are conducted on shape, color and other forensic features. A wide variety of methods exist for searching images in these databases. The result is a list of candidates that must be compared manually. The challenge in forensic science is to combine the information acquired. The combination of the shape of a partial shoe print with information on a cartridge case can result in stronger evidence. It is expected that by searching these databases in combination with other databases (e.g. network traffic information), more crimes will be solved. Searching in image databases is still difficult, as we can see in databases of faces. Due to lighting conditions and alteration of the face by aging, it is nearly impossible to rank the right face in top position from a database of one million faces using an image-searching method alone, without using other information. The methods for data mining in image databases (e.g. the MPEG-7 framework) are discussed, and expectations of future developments are presented in this study.

  15. Radmodl Modeling Manual

    SciTech Connect

    Murphy, S.L.; Stevens, K.A.; Weister, T.E.

    1993-04-01

    RADMODL is a set of computer codes that model the transport of material through a network of interconnected compartments. This version of RADMODL is designed to model the transport of radioactive material from either a loss of coolant accident (LOCA) or a fuel handling accident into a set of interconnected compartments and determine the equivalent dose each compartment will receive for the duration of the accident. The code is flexible enough to allow modeling of other types of materials and their dilution/dispersion during transport by designing a new set of input files corresponding to the facility layout and the dilution/dispersion factors. This report documents the issuance of the Modeling Manual, originally developed by EGS Corporation and ISSI as volume 1 of the Programmer's Manual, with the incorporation of minor editorial comments. Though a Modeling Manual is not itself required for RADMODL code certification, it does contain information that is required for the code User's Manual and hence for certification.

  16. Sample Curation at a Lunar Outpost

    NASA Technical Reports Server (NTRS)

    Allen, Carlton C.; Lofgren, Gary E.; Treiman, A. H.; Lindstrom, Marilyn L.

    2007-01-01

    The six Apollo surface missions returned 2,196 individual rock and soil samples, with a total mass of 381.6 kg. Samples were collected based on visual examination by the astronauts and consultation with geologists in the science back room in Houston. The samples were photographed during collection, packaged in uniquely-identified containers, and transported to the Lunar Module. All samples collected on the Moon were returned to Earth. NASA's upcoming return to the Moon will be different. Astronauts will have extended stays at an outpost and will collect more samples than they will return. They will need curation and analysis facilities on the Moon in order to carefully select samples for return to Earth.

  17. HMDB: the Human Metabolome Database

    PubMed Central

    Wishart, David S.; Tzur, Dan; Knox, Craig; Eisner, Roman; Guo, An Chi; Young, Nelson; Cheng, Dean; Jewell, Kevin; Arndt, David; Sawhney, Summit; Fung, Chris; Nikolai, Lisa; Lewis, Mike; Coutouly, Marie-Aude; Forsythe, Ian; Tang, Peter; Shrivastava, Savita; Jeroncic, Kevin; Stothard, Paul; Amegbey, Godwin; Block, David; Hau, David. D.; Wagner, James; Miniaci, Jessica; Clements, Melisa; Gebremedhin, Mulu; Guo, Natalie; Zhang, Ying; Duggan, Gavin E.; MacInnis, Glen D.; Weljie, Alim M.; Dowlatabadi, Reza; Bamforth, Fiona; Clive, Derrick; Greiner, Russ; Li, Liang; Marrie, Tom; Sykes, Brian D.; Vogel, Hans J.; Querengesser, Lori

    2007-01-01

    The Human Metabolome Database (HMDB) is currently the most complete and comprehensive curated collection of human metabolite and human metabolism data in the world. It contains records for more than 2180 endogenous metabolites with information gathered from thousands of books, journal articles and electronic databases. In addition to its comprehensive literature-derived data, the HMDB also contains an extensive collection of experimental metabolite concentration data compiled from hundreds of mass spectrometry (MS) and Nuclear Magnetic Resonance (NMR) metabolomic analyses performed on urine, blood and cerebrospinal fluid samples. This is further supplemented with thousands of NMR and MS spectra collected on purified, reference metabolites. Each metabolite entry in the HMDB contains an average of 90 separate data fields including a comprehensive compound description, names and synonyms, structural information, physico-chemical data, reference NMR and MS spectra, biofluid concentrations, disease associations, pathway information, enzyme data, gene sequence data, SNP and mutation data as well as extensive links to images, references and other public databases. Extensive searching, relational querying and data browsing tools are also provided. The HMDB is designed to address the broad needs of biochemists, clinical chemists, physicians, medical geneticists, nutritionists and members of the metabolomics community. The HMDB is available at: PMID:17202168

  18. Italian Rett database and biobank.

    PubMed

    Sampieri, Katia; Meloni, Ilaria; Scala, Elisa; Ariani, Francesca; Caselli, Rossella; Pescucci, Chiara; Longo, Ilaria; Artuso, Rosangela; Bruttini, Mirella; Mencarelli, Maria Antonietta; Speciale, Caterina; Causarano, Vincenza; Hayek, Giuseppe; Zappella, Michele; Renieri, Alessandra; Mari, Francesca

    2007-04-01

    Rett syndrome is the second most common cause of severe mental retardation in females, with an incidence of approximately 1 out of 10,000 live female births. In addition to the classic form, a number of Rett variants have been described. MECP2 gene mutations are responsible for about 90% of classic cases and for a lower percentage of variant cases. Recently, CDKL5 mutations have been identified in the early onset seizures variant and other atypical Rett patients. While the high percentage of MECP2 mutations in classic patients supports the hypothesis of a single disease gene, the low frequency of mutated variant cases suggests genetic heterogeneity. Since 1998, we have performed clinical evaluation and molecular analysis of a large number of Italian Rett patients. The Italian Rett Syndrome (RTT) database has been developed to share data and samples of our RTT collection with the scientific community (http://www.biobank.unisi.it). This is the first RTT database that has been connected with a biobank. It allows the user to immediately visualize the list of available RTT samples and, using the "Search by" tool, to rapidly select those with specific clinical and molecular features. By contacting bank curators, users can request the samples of interest for their studies. This database encourages collaboration projects with clinicians and researchers from around the world and provides important resources that will help to better define the pathogenic mechanisms underlying Rett syndrome.

  19. Salinas : theory manual.

    SciTech Connect

    Walsh, Timothy Francis; Reese, Garth M.; Bhardwaj, Manoj Kumar

    2011-11-01

    Salinas provides a massively parallel implementation of structural dynamics finite element analysis, required for high fidelity, validated models used in modal, vibration, static and shock analysis of structural systems. This manual describes the theory behind many of the constructs in Salinas. For a more detailed description of how to use Salinas, we refer the reader to Salinas, User's Notes. Many of the constructs in Salinas are pulled directly from published material. Where possible, these materials are referenced herein. However, certain functions in Salinas are specific to our implementation. We try to be far more complete in those areas. The theory manual was developed from several sources including general notes, a programmer notes manual, the user's notes and of course the material in the open literature.

  20. Nuclear material operations manual

    SciTech Connect

    Tyler, R.P.

    1981-02-01

    This manual provides a concise and comprehensive documentation of the operating procedures currently practiced at Sandia National Laboratories with regard to the management, control, and accountability of nuclear materials. The manual is divided into chapters which are devoted to the separate functions performed in nuclear material operations: management, control, accountability, and safeguards. The final two chapters comprise a document which is also issued separately to provide a summary of the information and operating procedures relevant to custodians and users of radioactive and nuclear materials. The manual also contains samples of the forms utilized in carrying out nuclear material activities. To enhance the clarity of presentation, operating procedures are presented in the form of playscripts in which the responsible organizations and necessary actions are clearly delineated in a chronological fashion from the initiation of a transaction to its completion.

  1. Aircraft operations management manual

    NASA Technical Reports Server (NTRS)

    1992-01-01

    The NASA aircraft operations program is a multifaceted, highly diverse entity that directly supports the agency mission in aeronautical research and development, space science and applications, space flight, astronaut readiness training, and related activities through research and development, program support, and mission management aircraft operations flights. Users of the program include other agencies, other governments, international partners, and the business community. This manual provides guidelines establishing policy for the management of NASA aircraft resources, aircraft operations, and related matters. This policy is an integral part of, and must be followed when establishing, field installation policy and procedures covering the management of NASA aircraft operations. Each operating location will develop appropriate local procedures that conform with the requirements of this handbook. This manual should be used in conjunction with other governing instructions, handbooks, and manuals.

  2. Developing a policy manual.

    PubMed

    Hotta, Tracey A

    2013-01-01

    Do you really need a policy and procedure manual in the office? Frequently such manuals are seen sitting on the shelf, collecting dust. The answer is yes, for a number of very important reasons. A policy and procedure manual is a tool to set guidelines and expectations on the basis of the mission and vision of the office. A well-written manual is a powerful training tool for new staff, helping them get a feel for the office culture. Furthermore, it is a provincial or state legislative requirement that can reduce management's concern about potential legal issues or problems. If an office does not have a manual to set guidelines, employees may be forced to make their own decisions to solve problems, which can result in confusion, inconsistencies, and mistakes. PMID:23446507

  3. The IST-LISBON database on LXCat

    NASA Astrophysics Data System (ADS)

    Alves, L. L.

    2014-12-01

    LXCat is a web-based, community-wide project on the curation of data needed in the modelling of low-temperature plasmas. LXCat is organized into databases, contributed by members of the community around the world and identified by the contributor's chosen title. This paper presents the status of the data available in the IST-LISBON database on LXCat. IST-LISBON contains up-to-date electron-neutral collisional data (together with the measured swarm parameters used to validate these data) resulting from the research effort of the Group of Gas Discharges and Gaseous Electronics at the Instituto de Plasmas e Fusão Nuclear, Instituto Superior Técnico, Lisbon, Portugal. Presently, the IST-LISBON database includes complete and consistent sets of electron scattering cross sections for argon, helium, nitrogen, oxygen, hydrogen and methane.

  4. Librarians Prefer Italian Food: An Alternative Approach to Introducing Databases.

    ERIC Educational Resources Information Center

    Trickey, Keith V.

    1990-01-01

    Describes the development of a manual participative game that was designed to demonstrate the functions of a database. Use of the game in a variety of academic contexts is discussed, including technical college students and degree students; and use of the data with software to create a database is described. (LRW)

  5. ABS: a database of Annotated regulatory Binding Sites from orthologous promoters

    PubMed Central

    Blanco, Enrique; Farré, Domènec; Albà, M. Mar; Messeguer, Xavier; Guigó, Roderic

    2006-01-01

    Information about the genomic coordinates and the sequence of experimentally identified transcription factor binding sites is found scattered under a variety of diverse formats. The availability of standard collections of such high-quality data is important to design, evaluate and improve novel computational approaches to identify binding motifs on promoter sequences from related genes. ABS () is a public database of known binding sites identified in promoters of orthologous vertebrate genes that have been manually curated from the literature. We have annotated 650 experimental binding sites from 68 transcription factors and 100 orthologous target genes in human, mouse, rat or chicken genome sequences. Computational predictions and promoter alignment information are also provided for each entry. A simple and easy-to-use web interface facilitates data retrieval, allowing different views of the information. In addition, release 1.0 of ABS includes a customizable generator of artificial datasets based on the known sites contained in the collection and an evaluation tool to aid in the training and assessment of motif-finding programs. PMID:16381947
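
    The artificial-dataset idea described above (implanting known binding sites into random background sequence so that motif-finding programs can be scored against known positions) can be sketched in a few lines. This is a generic illustration under assumed parameters, not the ABS release 1.0 generator itself; `implant_motif` and its arguments are hypothetical names.

```python
import random

def implant_motif(motif, n_seqs=5, length=50, seed=0):
    """Generate random background sequences, each with one copy of `motif`
    implanted at a random position. Returns (sequence, implant_position)
    pairs so a motif-finder's predictions can be scored against the truth.
    A toy benchmarking-style generator, not ABS's actual tool."""
    rng = random.Random(seed)
    data = []
    for _ in range(n_seqs):
        # Background of uniform-random nucleotides, leaving room for the motif.
        bg = [rng.choice("ACGT") for _ in range(length - len(motif))]
        pos = rng.randrange(len(bg) + 1)
        seq = "".join(bg[:pos]) + motif + "".join(bg[pos:])
        data.append((seq, pos))
    return data

for seq, pos in implant_motif("TATAAT", n_seqs=2, length=30):
    print(pos, seq)
```

    Recording the implant position alongside each sequence is what makes the dataset usable for evaluation: a predicted site counts as a hit only if it overlaps the known position.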

  6. InvFEST, a database integrating information of polymorphic inversions in the human genome.

    PubMed

    Martínez-Fundichely, Alexander; Casillas, Sònia; Egea, Raquel; Ràmia, Miquel; Barbadilla, Antonio; Pantano, Lorena; Puig, Marta; Cáceres, Mario

    2014-01-01

    The newest genomic advances have uncovered an unprecedented degree of structural variation throughout genomes, with great amounts of data accumulating rapidly. Here we introduce InvFEST (http://invfestdb.uab.cat), a database combining multiple sources of information to generate a complete catalogue of non-redundant human polymorphic inversions. Due to the complexity of this type of changes and the underlying high false-positive discovery rate, it is necessary to integrate all the available data to get a reliable estimate of the real number of inversions. InvFEST automatically merges predictions into different inversions, refines the breakpoint locations, and finds associations with genes and segmental duplications. In addition, it includes data on experimental validation, population frequency, functional effects and evolutionary history. All this information is readily accessible through a complete and user-friendly web report for each inversion. In its current version, InvFEST combines information from 34 different studies and contains 1092 candidate inversions, which are categorized based on internal scores and manual curation. Therefore, InvFEST aims to represent the most reliable set of human inversions and become a central repository to share information, guide future studies and contribute to the analysis of the functional and evolutionary impact of inversions on the human genome.
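
    The redundancy-collapsing step described above, in which overlapping predictions from many studies are merged into a non-redundant set of candidate inversions, follows the classic interval-merging pattern. The sketch below is a minimal illustration of that idea under assumed coordinates, not InvFEST's actual pipeline (which also refines breakpoints and scores the evidence).

```python
def merge_predictions(intervals):
    """Merge overlapping (start, end) predictions into a non-redundant
    list of candidate regions. Sorting by start lets a single pass decide
    whether each prediction extends the previous region or opens a new one."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            # Overlaps the last region: extend it.
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged

print(merge_predictions([(10, 40), (35, 60), (100, 120)]))
# → [(10, 60), (100, 120)]
```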

  7. OCDB: a database collecting genes, miRNAs and drugs for obsessive-compulsive disorder.

    PubMed

    Privitera, Anna P; Distefano, Rosario; Wefer, Hugo A; Ferro, Alfredo; Pulvirenti, Alfredo; Giugno, Rosalba

    2015-01-01

    Obsessive-compulsive disorder (OCD) is a psychiatric condition characterized by intrusive and unwanted thoughts (obsessions) giving rise to anxiety. Patients feel obliged to perform behaviors (compulsions) induced by the obsessions. The World Health Organization ranks OCD as one of the 10 most disabling medical conditions. Within the class of anxiety disorders, OCD is a pathology that shows a hereditary component. Consequently, an online resource collecting and integrating scientific discoveries and genetic evidence about OCD is helpful for improving current knowledge of this disorder. We have developed a manually curated database, the OCD Database (OCDB), collecting the relations between candidate genes in OCD, microRNAs (miRNAs) involved in the pathophysiology of OCD, and drugs used in its treatment. We screened articles from PubMed and MEDLINE. For each gene, the bibliographic references are shown with a brief description of the gene and the experimental conditions. The database also lists the polymorphisms within genes and their chromosomal regions. OCDB data are enriched with both validated and predicted miRNA-target and drug-target information. The transcription factor regulations, which are also included, are taken from DAVID and TransmiR. Moreover, a scoring function ranks the relevance of data in the OCDB context. The database is also integrated with the main online resources (PubMed, Entrez Gene, HGNC, dbSNP, DrugBank, miRBase, PubChem, KEGG, Disease Ontology and ChEBI). The web interface has been developed using phpMyAdmin and Bootstrap. This allows users (i) to browse data by category and (ii) to navigate the database by searching genes, miRNAs, drugs, SNPs, regions, drug targets and articles. The data can be exported in textual format, as well as the whole database in .sql or tabular format. OCDB is an essential resource to support genome-wide analysis, genetic and pharmacological studies. 
It also facilitates the evaluation of genetic data

  9. Manuals of Cultural Systems

    NASA Astrophysics Data System (ADS)

    Ballonoff, Paul

    2014-10-01

    Ethnography often studies social networks including empirical descriptions of marriages and families. We initially concentrate on a special subset of networks which we call configurations. We show that descriptions of the possible outcomes of viable histories form a manual, and an orthoalgebra. We then study cases where family sizes vary, and show that this also forms a manual. In fact, it demonstrates adiabatic invariance, a property often associated with physical system conservation laws, and which here expresses conservation of the viability of a cultural system.

  10. Quality Assurance Manual

    SciTech Connect

    McGarrah, J.E.

    1995-05-01

    In order to provide clients with quality products and services, Pacific Northwest Laboratory (PNL) has established and implemented a formal quality assurance program. These management controls are documented in this manual (PNL-MA-70) and its accompanying standards and procedures. The QA Program meets the basic requirements and supplements of ANSI/ASME NQA-1, Quality Assurance Program Requirements for Nuclear Facilities, as interpreted for PNL activities. Additionally, the quality requirements are augmented to include the Total Quality approach defined in Department of Energy Order 5700.6C, Quality Assurance. This manual provides requirements and an overview of the administrative procedures that apply to projects and activities.

  11. AIS training manual

    SciTech Connect

    Kramer, C.F.; Barancik, J.I.

    1989-05-01

    This Training Manual was developed by the Injury Prevention and Analysis Group (IPAG) as part of a training program in AIS 85 and AIS-EM (Epidemiological Modifications) coding. The IPAG Program is designed primarily to train medical record and other health professionals from diverse backgrounds and experience levels in the use of AIS 85 and AIS 85-EM. The Manual is designed to be used as a reference text after completion of the Program and includes copies of visual projection materials used during the training sessions.

  12. Fastener Design Manual

    NASA Technical Reports Server (NTRS)

    Barrett, Richard T.

    1990-01-01

    This manual was written for design engineers to enable them to choose appropriate fasteners for their designs. Subject matter includes fastener material selection, platings, lubricants, corrosion, locking methods, washers, inserts, thread types and classes, fatigue loading, and fastener torque. A section on design criteria covers the derivation of torque formulas, loads on a fastener group, combining simultaneous shear and tension loads, pullout load for tapped holes, grip length, head styles, and fastener strengths. The second half of this manual presents general guidelines and selection criteria for rivets and lockbolts.

  13. Equipment Management Manual

    NASA Technical Reports Server (NTRS)

    1992-01-01

    The NASA Equipment Management Manual (NHB 4200.1) is issued pursuant to Section 203(c)(1) of the National Aeronautics and Space Act of 1958, as amended (42 USC 2473), and sets forth policy, uniform performance standards, and procedural guidance to NASA personnel for the acquisition, management, and use of NASA-owned equipment. This revision is effective upon receipt. This is a controlled manual, issued in loose-leaf form, and revised through page changes. Additional copies for internal use may be obtained through normal distribution.

  14. Literature mining of genetic variants for curation: quantifying the importance of supplementary material

    PubMed Central

    Jimeno Yepes, Antonio; Verspoor, Karin

    2014-01-01

    A major focus of modern biological research is the understanding of how genomic variation relates to disease. Although there are significant ongoing efforts to capture this understanding in curated resources, much of the information remains locked in unstructured sources, in particular, the scientific literature. Thus, there have been several text mining systems developed to target extraction of mutations and other genetic variation from the literature. We have performed the first study of the use of text mining for the recovery of genetic variants curated directly from the literature. We consider two curated databases, COSMIC (Catalogue Of Somatic Mutations In Cancer) and InSiGHT (International Society for Gastro-intestinal Hereditary Tumours), that contain explicit links to the source literature for each included mutation. Our analysis shows that the recall of the mutations catalogued in the databases using a text mining tool is very low, despite the well-established good performance of the tool and even when the full text of the associated article is available for processing. We demonstrate that this discrepancy can be explained by considering the supplementary material linked to the published articles, not previously considered by text mining tools. Although it is anecdotally known that supplementary material contains ‘all of the information’, and some researchers have speculated about the role of supplementary material (Schenck et al. Extraction of genetic mutations associated with cancer from public literature. J Health Med Inform 2012;S2:2.), our analysis substantiates the significant extent to which this material is critical. Our results highlight the need for literature mining tools to consider not only the narrative content of a publication but also the full set of material related to a publication. PMID:24520105
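
    The headline measurement in the study above is recall: the fraction of database-curated mutations that the text mining tool recovers from the associated articles. As a minimal sketch of that metric (the variant identifiers here are illustrative, not taken from COSMIC or InSiGHT):

```python
def recall(curated, mined):
    """Fraction of the curated gold-standard variants that the text
    miner recovered. Empty gold sets yield 0.0 by convention."""
    curated, mined = set(curated), set(mined)
    return len(curated & mined) / len(curated) if curated else 0.0

gold = {"p.G12D", "p.V600E", "c.35G>A"}   # variants curated from the paper
found = {"p.V600E"}                        # variants the miner extracted
print(recall(gold, found))                 # one of three variants recovered
```

    The paper's observation is that `found` stays small when only the narrative text is processed, because many of the variants in `gold` appear only in supplementary tables.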

  16. FMiR: A Curated Resource of Mitochondrial DNA Information for Fish

    PubMed Central

    Nagpure, Naresh Sahebrao; Rashid, Iliyas; Pathak, Ajey Kumar; Singh, Mahender; Pati, Rameshwar; Singh, Shri Prakash; Sarkar, Uttam Kumar

    2015-01-01

    Mitochondrial genome sequences have been widely used for evolutionary and phylogenetic studies. Among vertebrates, fish are an important, diverse group, and their mitogenome sequences are growing rapidly in public repositories. To facilitate mitochondrial genome analysis and to explore this valuable genetic information, we developed the Fish Mitogenome Resource (FMiR) database to provide a workbench for mitogenome annotation, species identification and microsatellite marker mining. Microsatellites, also known as simple sequence repeats (SSRs), are used as molecular markers in studies of population genetics, gene duplication and marker-assisted selection. Easy-to-use tools have been implemented for mining SSRs and for designing primers to identify species- or habitat-specific markers. In addition, FMiR can analyze complete or partial mitochondrial genome sequences to identify species and to deduce relational distances among sequences across species. The database presently contains curated mitochondrial genomes from 1302 fish species belonging to 297 families and 47 orders, reported from saltwater and freshwater ecosystems. In addition, the database covers information on fish species such as conservation status, ecosystem, family, distribution and occurrence, downloaded from the FishBase and IUCN Red List databases. This information can be used to browse mitogenome records for species belonging to a particular category. The database is scalable in terms of content and the inclusion of other analytical modules. FMiR runs under Linux on a high-performance server accessible at http://mail.nbfgr.res.in/fmir. PMID:26317619
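
    The SSR-mining task mentioned above amounts to scanning a sequence for short motifs repeated in perfect tandem. A common way to do this is with a backreference regex, sketched below; this is a toy detector with assumed thresholds, not FMiR's actual mining tool.

```python
import re

def find_ssrs(seq, min_unit=1, max_unit=6, min_repeats=4):
    """Scan a DNA sequence for perfect tandem repeats (microsatellites).

    For each motif length, the regex captures a motif and requires it to
    repeat at least `min_repeats` times in a row via the \\1 backreference.
    Returns (start, motif, repeat_count) tuples.
    """
    hits = []
    for unit in range(min_unit, max_unit + 1):
        # A `unit`-long motif followed by (min_repeats - 1) further copies.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (unit, min_repeats - 1))
        for m in pattern.finditer(seq):
            motif = m.group(1)
            count = len(m.group(0)) // unit
            hits.append((m.start(), motif, count))
    return hits

print(find_ssrs("GGATATATATATCC"))  # → [(2, 'AT', 5)]
```

    A production miner would additionally collapse hits reported at several motif lengths for the same locus and allow imperfect repeats; the regex core, however, looks much like this.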

  17. A curated census of autophagy-modulating proteins and small molecules: candidate targets for cancer therapy.

    PubMed

    Lorenzi, Philip L; Claerhout, Sofie; Mills, Gordon B; Weinstein, John N

    2014-07-01

    Autophagy, a programmed process in which cell contents are delivered to lysosomes for degradation, appears to have both tumor-suppressive and tumor-promoting functions; both stimulation and inhibition of autophagy have been reported to induce cancer cell death, and particular genes and proteins have been associated both positively and negatively with autophagy. To provide a basis for incisive analysis of those complexities and ambiguities and to guide development of new autophagy-targeted treatments for cancer, we have compiled a comprehensive, curated inventory of autophagy modulators by integrating information from published siRNA screens, multiple pathway analysis algorithms, and extensive, manually curated text-mining of the literature. The resulting inventory includes 739 proteins and 385 chemicals (including drugs, small molecules, and metabolites). Because autophagy is still at an early stage of investigation, we provide extensive analysis of our sources of information and their complex relationships with each other. We conclude with a discussion of novel strategies that could potentially be used to target autophagy for cancer therapy.

  18. CAM Coordinator's Manual.

    ERIC Educational Resources Information Center

    Rodel, Lee J.

    The technical implementation and operation of the Comprehensive Achievement Monitoring (CAM) system are discussed in this manual. It is written as a guide for those responsible for implementing CAM and for processing CAM-related work requests. Subjects covered include: CAM system operation summary: objectives and test items; conferencing with the…

  19. Manually operated coded switch

    DOEpatents

    Barnette, Jon H.

    1978-01-01

    The disclosure relates to a manually operated recodable coded switch in which a code may be inserted, tried and used to actuate a lever controlling an external device. After attempting a code, the switch's code wheels must be returned to their zero positions before another try is made.
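
    The try-and-reset discipline described above (attempt a code, then return the wheels to zero before any further attempt) can be modeled as a small state machine. The sketch below is a software analogy of the mechanical behavior, with invented names; it is not part of the patent disclosure.

```python
class CodedSwitch:
    """Toy model of a recodable coded switch: a stored code, settable
    wheels, and the rule that wheels must be back at zero before
    another attempt is made."""

    def __init__(self, code):
        self.code = list(code)
        self.wheels = [0] * len(code)
        self.reset_done = True

    def set_wheels(self, positions):
        """Attempt a code. Returns True (lever actuated) on a match."""
        if not self.reset_done:
            raise RuntimeError("return wheels to zero before another try")
        self.wheels = list(positions)
        self.reset_done = False
        return self.wheels == self.code

    def reset(self):
        """Return all wheels to their zero positions, enabling a new try."""
        self.wheels = [0] * len(self.code)
        self.reset_done = True
```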

  20. Enforcement Field Operations Manual.

    ERIC Educational Resources Information Center

    Colorado State Dept. of Health, Denver. Div. of Air Pollution Control.

    This manual is designed as a practical handbook for air pollution control officials. A variety of situations and tasks encountered in field operations are described, and the appropriate action to be taken is explained. Topics include source detection, source inspection, recording and reporting inspections, maintaining a recording system,…

  1. NASCAP user's manual, 1978

    NASA Technical Reports Server (NTRS)

    Cassidy, J. J., III

    1978-01-01

    NASCAP simulates the charging process for a complex object in either tenuous plasma (geosynchronous orbit) or ground test (electron gun source) environment. Program control words, the structure of user input files, and various user options available are described in this computer programmer's user manual.

  2. Therapeutic Recreation Practicum Manual.

    ERIC Educational Resources Information Center

    Schneegas, Kay

    This manual provides information on the practicum program offered by Moraine Valley Community College (MVCC) for students in its therapeutic recreation program. Sections I and II outline the rationale and goals for providing practical, on-the-job work experiences for therapeutic recreation students. Section III specifies MVCC's responsibilities…

  3. Student Attendance Accounting Manual.

    ERIC Educational Resources Information Center

    Freitas, Joseph M.

    In response to state legislation authorizing procedures for changes in academic calendars and measurement of student workload in California community colleges, this manual from the Chancellor's Office provides guidelines for student attendance accounting. Chapter 1 explains general items such as the academic calendar, admissions policies, student…

  4. Elementary School Finance Manual.

    ERIC Educational Resources Information Center

    National Catholic Educational Association, Washington, DC.

    Developed to assist those responsible for financial matters in Catholic elementary schools, this manual presents each topic briefly and simply, taking into account administrators' minimal formal financial training. It is divided into six sections. Chapter 1, "Daily Financial Operations," describes the specifics of handling receipts, billings, and…

  5. Water Quality Monitoring Manual.

    ERIC Educational Resources Information Center

    Mason, Fred J.; Houdart, Joseph F.

    This manual is designed for students involved in environmental education programs dealing with water pollution problems. By establishing a network of Environmental Monitoring Stations within the educational system, four steps toward the prevention, control, and abatement of water pollution are proposed. (1) Train students to recognize, monitor,…

  6. The TUTOR Manual.

    ERIC Educational Resources Information Center

    Avner, R. A.; Tenczar, Paul

    The TUTOR logic-building language to be used with the PLATO (Programmed Logic for Automated Teaching Operations) system is explained in this manual. TUTOR is designed to transcend the difficulties of FORTRAN for a computer-based educational system utilizing graphical screen displays. The language consists of about seventy words or "commands" which…

  7. Environmental Science Laboratory Manual.

    ERIC Educational Resources Information Center

    Strobbe, Maurice A.

    The objective of this manual is to provide a set of basic analytical procedures commonly used to determine environmental quality. Procedures are designed to be used in an introductory course in environmental science and are explicit enough to allow them to be performed by both the non-science or beginning science student. Stressing ecology and…

  8. General Drafting. Technical Manual.

    ERIC Educational Resources Information Center

    Department of the Army, Washington, DC.

    The manual provides instructional guidance and reference material in the principles and procedures of general drafting and constitutes the primary study text for personnel in drafting as a military occupational specialty. Included is information on drafting equipment and its use; line weights, conventions and formats; lettering; engineering charts…

  9. Enucleation Procedure Manual.

    ERIC Educational Resources Information Center

    Davis, Kevin; Poston, George

    This manual provides information on the enucleation procedure (removal of the eyes for organ banks). An introductory section focuses on the anatomy of the eye and defines each of the parts. Diagrams of the eye are provided. A list of enucleation materials follows. Other sections present outlines of (1) a sterile procedure; (2) preparation for eye…

  10. Manual for physical fitness

    NASA Technical Reports Server (NTRS)

    Coleman, A. E.

    1981-01-01

    This training manual, used for preflight conditioning of NASA astronauts, is written for an audience with diverse backgrounds and interests. It suggests programs for various levels of fitness, including sample starter programs, safe progression schedules, and stretching exercises. Related information on equipment needs, environmental considerations, and precautions can help readers design safe and effective running programs.

  11. Rescue Manual. Module 8.

    ERIC Educational Resources Information Center

    Ohio State Univ., Columbus. Instructional Materials Lab.

    This learner manual for rescuers covers the current techniques or practices required in the rescue service. The eighth of 10 modules contains 6 chapters: (1) trench rescue; (2) shoring and tunneling techniques; (3) farm accident rescue; (4) wilderness search and rescue; (5) aircraft rescue; and (6) helicopter information. Key points, an…

  12. Intrinsic Analysis Training Manual.

    ERIC Educational Resources Information Center

    Gow, Doris T.

    This manual is for the training of linking agents between Education R&D and schools and for training teachers in the process of intrinsic analysis of curriculum materials. Intrinsic analysis means analysis of the instruction or process through examination of the materials, or artifacts, including teacher and student materials, developer's…

  13. Fieldwork Supervisor's Manual.

    ERIC Educational Resources Information Center

    Bowler, Rosemarie M.; And Others

    This manual, intended for community agency personnel who supervise students in undergraduate field placements, presents suggestions to aid the supervisor in providing the student with a valuable and rewarding experience which will also be of value to the agency. Issues are presented in sections reflecting the common sequence of events in a…

  14. Rescue Manual. Module 4.

    ERIC Educational Resources Information Center

    Ohio State Univ., Columbus. Instructional Materials Lab.

    This learner manual for rescuers covers the current techniques or practices required in the rescue service. The fourth of 10 modules contains 8 chapters: (1) construction and characteristics of rescue rope; (2) knots, bends, and hitches; (3) critical angles; (4) raising systems; (5) rigging; (6) using the brake-bar rack for rope rescue; (7) rope…

  15. Typing Manuscripts. General Manual.

    ERIC Educational Resources Information Center

    Snapp, Jane

    Supporting performance objective 83 of the V-TECS (Vocational-Technical Education Consortium of States) Secretarial Catalog, both a set of student materials and an instructor's manual on typing manuscripts are included in this packet. (The packet is the tenth in a set of fifteen on typewriting--CE 016 920-934.) The packet includes four learning…

  16. Child Welfare Policy Manual

    ERIC Educational Resources Information Center

    Administration for Children & Families, 2008

    2008-01-01

    This document conveys mandatory policies that have their basis in Federal Law and/or program regulations. It also provides interpretations of Federal Statutes and program regulations initiated by inquiries from State Child Welfare agencies or Administration for Children and Families (ACF) Regional Offices. The manual replaces the Children's…

  17. Aquatic Microbiology Laboratory Manual.

    ERIC Educational Resources Information Center

    Cooper, Robert C.; And Others

    This laboratory manual presents information and techniques dealing with aquatic microbiology as it relates to environmental health science, sanitary engineering, and environmental microbiology. The contents are divided into three categories: (1) ecological and physiological considerations; (2) public health aspects; and (3)microbiology of water…

  18. Disaster Preparedness Manual.

    ERIC Educational Resources Information Center

    Michael, Douglas O.

    Prepared for use by the staff of a community college library, this manual describes the natural, building, and human hazards which can result in disaster in a library and outlines a set of disaster prevention measures and salvage procedures. A list of salvage priorities, floor plans for all three levels of Bourke Memorial Library, and staff duties…

  19. Manual for CLE Lecturers.

    ERIC Educational Resources Information Center

    Shellaberger, Donna J.

    This manual is designed to help lawyers develop the skills needed to present effective, stimulating continuing legal education (CLE) lectures. It focuses on the particular purpose and nature of CLE lecturing, relationships and interplay of personalities in CLE, commitments and constraints which lecturers should observe, program structure and…

  20. Marketing Manual: Workplace Literacy.

    ERIC Educational Resources Information Center

    Fanshawe Coll., Strathroy (Ontario).

    This manual applies marketing concepts and methods, selling techniques and principles to the workplace literacy program for the purpose of assisting individuals involved in promoting and selling these programs. Part I provides a rationale for marketing and discusses the following: the role of the sponsor in marketing, market versus marketing,…

  1. Records Management Manual.

    ERIC Educational Resources Information Center

    Alaska State Dept. of Education, Juneau. State Archives and Records Management.

    This manual, prepared primarily for state government agencies, describes the organization and management of Alaska government records. Information is presented in nine topic areas: (1) Alaska's Archives and Records Management Program, which describes the program, its mission, services available, and employee responsibilities; (2) Records in…

  2. Student Teacher Manual.

    ERIC Educational Resources Information Center

    Wright, Cheryl

    Written in outline form to provide basic information to student teachers about the University of Utah's Child and Family Development Center preschool program, this manual describes a model preschool program based on creative thinking skills. Included are: a description of the program's creative philosophy which also contains a list of blocks to…

  3. Consumer Decisions. Student Manual.

    ERIC Educational Resources Information Center

    Florida State Dept. of Education, Tallahassee. Div. of Vocational Education.

    This student manual covers five areas relating to consumer decisions. Titles of the five sections are Consumer Law, Consumer Decision Making, Buying a Car, Convenience Foods, and Books for Preschool Children. Each section may contain some or all of these materials: list of objectives, informative sections, questions on the information and answers,…

  4. Political Education Manual.

    ERIC Educational Resources Information Center

    League of United Latin American Citizens, Washington, DC.

    Written to help Hispanics understand the electoral process more thoroughly and to encourage them to participate more actively in the political arena, this manual begins by describing the present status of the Hispanic electorate and then explains how laws are made, how Hispanics can influence legislation, and how to organize a voter registration…

  5. The Tree Worker's Manual.

    ERIC Educational Resources Information Center

    Smithyman, S. J.

    This manual is designed to prepare students for entry-level positions as tree care professionals. Addressed in the individual chapters of the guide are the following topics: the tree service industry; clothing, equipment, and tools; tree workers; basic tree anatomy; techniques of pruning; procedures for climbing and working in the tree; aerial…

  6. Outdoor Education Manual.

    ERIC Educational Resources Information Center

    Nashville - Davidson County Metropolitan Public Schools, TN.

    Creative ways to use the outdoors as a part of the regular school curriculum are outlined in this teacher's manual for the elementary grades. Presented for consideration are the general objectives of outdoor education, suggestions for evaluating outdoor education experiences, and techniques for teaching outdoor education. The purpose and functions…

  7. Educator Effectiveness Administrative Manual

    ERIC Educational Resources Information Center

    Pennsylvania Department of Education, 2014

    2014-01-01

    The goal of this manual is to provide guidance in the evaluation of educators, highlight critical components of effectiveness training, and offer opportunities for professional growth. The term "educator" includes teachers, all professional and temporary professional employees, education specialists, and school administrators/principals.…

  8. Consumer Education Reference Manual.

    ERIC Educational Resources Information Center

    Tennessee Univ., Knoxville. State Agency for Title I.

    This manual contains information for consumer education, which is defined as the process of imparting to an individual the skills, concepts, knowledge, and insights required to help each person evolve his or her own values, evaluate alternative choices in the marketplace, manage personal resources effectively, and obtain the best buys for his or…

  9. DYS Volunteer Services Manual.

    ERIC Educational Resources Information Center

    Boyles, Al

    This manual provides information for volunteers with the North Carolina Division of Youth Services. It describes the Division's history in developing correctional facilities, its philosophy and goals, and the administration of its training schools and detention centers. It cites examples of volunteer involvement in the areas of administrative and…

  10. System Documentation Manual.

    ERIC Educational Resources Information Center

    Semmel, Melvyn I.; Olson, Jerry

    The document is a system documentation manual of the Computer-Assisted Teacher Training System (CATTS) developed by the Center for Innovation in Teaching the Handicapped (Indiana University). CATTS is characterized as a system capable of providing continuous, instantaneous, and/or delayed feedback of relevant teacher-student interaction data to a…

  11. Radiological Defense Manual.

    ERIC Educational Resources Information Center

    Defense Civil Preparedness Agency (DOD), Washington, DC.

    Originally prepared for use as a student textbook in Radiological Defense (RADEF) courses, this manual provides the basic technical information necessary for an understanding of RADEF. It also briefly discusses the need for RADEF planning and expected postattack emergency operations. There are 14 chapters covering these major topics: introduction…

  12. Discretionary Grants Administration Manual.

    ERIC Educational Resources Information Center

    Office of Human Development Services (DHHS), Washington, DC.

    This manual sets forth applicable administrative policies and procedures to recipients of discretionary project grants or cooperative agreements awarded by program offices in the Office of Human Development Services (HDS). It is intended to serve as a basic reference for project directors and business officers of recipient organizations who are…

  13. DFLOW USER'S MANUAL

    EPA Science Inventory

    DFLOW is a computer program for estimating design stream flows for use in water quality studies. The manual describes the use of the program on both the EPA's IBM mainframe system and on a personal computer (PC). The mainframe version of DFLOW can extract a river's daily flow rec...

  14. Microcomputer Laboratory Manual.

    ERIC Educational Resources Information Center

    Van Dusseldorp, Ralph; And Others

    Intended to serve as a guide to the development of basic computer competencies for students in teacher preparation programs or as inservice training for teachers, this primarily self-instructional manual provides basic information, worksheets, computer programs, and hands-on learning activities for 16 proficiencies related to computer literacy.…

  15. LOGO Manual. Draft.

    ERIC Educational Resources Information Center

    Abelson, Hal; And Others

    This manual describes the LOGO system implemented for the PDP 11/45 at the MIT Artificial Intelligence Laboratory. The "system" includes a LOGO evaluator, a dedicated time-sharing system, and various special devices related to output such as robot turtles, tone generators, and cathode ray tube displays. (Author/SD)

  16. Head Start Facilities Manual.

    ERIC Educational Resources Information Center

    Research Assessment Management, Inc., Silver Spring, MD.

    A quality Head Start facility should provide a physical environment responsive both to the needs of the children and families served and to the needs of staff, volunteers, and community agencies that share space with Head Start. This manual is a tool for Head Start grantees and delegate agencies for assessing existing facilities, making…

  17. Rescue Manual. Module 6.

    ERIC Educational Resources Information Center

    Ohio State Univ., Columbus. Instructional Materials Lab.

    This learner manual for rescuers covers the current techniques or practices required in the rescue service. The sixth of 10 modules contains 4 chapters: (1) industrial rescue; (2) rescue from a confined space; (3) extrication from heavy equipment; and (4) rescue operations involving elevators. Key points, an introduction, and conclusion accompany…

  18. HEMPDS user's manual

    SciTech Connect

    Warren, K.H.

    1983-02-01

    HEMPDS, the double-slide version of two-dimensional HEMP, allows the intersection of slide lines and slide lines in any direction, thus making use of triangular zones. This revised user's manual aids the physicist, computer scientist, and computer technician in using, maintaining, and converting HEMPDS. Equations, EOS models, and sample problems are included.

  19. Fiscal Accounting Manual.

    ERIC Educational Resources Information Center

    California State Dept. of Housing and Community Development, Sacramento. Indian Assistance Program.

    Written in simple, easy-to-understand form, the manual provides a vehicle for the person untrained in bookkeeping to control funds received from grants for Indian Tribal Councils and Indian organizations. The method used to control grants (federal, state, or private) is fund accounting, designed to organize rendering services on a non-profit…

  20. Rescue Manual. Module 5.

    ERIC Educational Resources Information Center

    Ohio State Univ., Columbus. Instructional Materials Lab.

    This learner manual for rescuers covers the current techniques or practices required in the rescue service. The fifth of 10 modules contains information on hazardous materials. Key points, an introduction, and conclusion accompany substantive material in this module. In addition, the module contains a Department of Transportation guide chart on…