Sample records for resource integrated database

  1. Emission & Generation Resource Integrated Database (eGRID)

    EPA Pesticide Factsheets

    The Emissions & Generation Resource Integrated Database (eGRID) is an integrated source of data on environmental characteristics of electric power generation. Twelve federal databases are represented by eGRID, which provides air emission and resource mix information for thousands of power plants and generating companies. eGRID allows direct comparison of the environmental attributes of electricity from different plants, companies, States, or regions of the power grid.

  2. The designing and implementation of PE teaching information resource database based on broadband network

    NASA Astrophysics Data System (ADS)

    Wang, Jian

    2017-01-01

    In order to change the traditional PE teaching mode and realize the interconnection, interworking and sharing of PE teaching resources, a distance PE teaching platform based on a broadband network is designed and a PE teaching information resource database is set up. The database design uses Windows NT 4/2000 Server as the operating system platform and Microsoft SQL Server 7.0 as the RDBMS, and adopts NAS technology for data storage and streaming technology for video services. Analysis of the system design and implementation shows that a dynamic PE teaching information resource sharing platform based on Web Services supports loosely coupled collaboration and dynamic, active integration, and offers good integration, openness and encapsulation. The distance PE teaching platform based on Web Services and the accompanying database design scheme effectively realize the interconnection, interworking and sharing of PE teaching resources and meet the demands of the informatization of PE teaching.

  3. E-MSD: an integrated data resource for bioinformatics.

    PubMed

    Velankar, S; McNeil, P; Mittard-Runte, V; Suarez, A; Barrell, D; Apweiler, R; Henrick, K

    2005-01-01

    The Macromolecular Structure Database (MSD) group (http://www.ebi.ac.uk/msd/) continues to enhance the quality and consistency of macromolecular structure data in the worldwide Protein Data Bank (wwPDB) and to work towards the integration of various bioinformatics data resources. One of the major obstacles to the improved integration of structural databases such as MSD and sequence databases like UniProt is the absence of up to date and well-maintained mapping between corresponding entries. We have worked closely with the UniProt group at the EBI to clean up the taxonomy and sequence cross-reference information in the MSD and UniProt databases. This information is vital for the reliable integration of the sequence family databases such as Pfam and Interpro with the structure-oriented databases of SCOP and CATH. This information has been made available to the eFamily group (http://www.efamily.org.uk/) and now forms the basis of the regular interchange of information between the member databases (MSD, UniProt, Pfam, Interpro, SCOP and CATH). This exchange of annotation information has enriched the structural information in the MSD database with annotation from wider sequence-oriented resources. This work was carried out under the 'Structure Integration with Function, Taxonomy and Sequences (SIFTS)' initiative (http://www.ebi.ac.uk/msd-srv/docs/sifts) in the MSD group.
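
    The SIFTS initiative mentioned above distributes its PDB-to-UniProt mappings as downloadable flat files. The sketch below shows how such a mapping table could be loaded with pandas; the file URL, the comment-line offset and the column names (PDB, SP_PRIMARY) are assumptions based on the publicly documented SIFTS flat-file layout, not details taken from this abstract.

      # Sketch: load a SIFTS PDB-chain-to-UniProt mapping table with pandas.
      # The URL, the skipped comment line and the column names are assumptions
      # about the public SIFTS flat files, not details stated in the abstract.
      import pandas as pd

      SIFTS_CSV = ("https://ftp.ebi.ac.uk/pub/databases/msd/sifts/"
                   "flatfiles/csv/pdb_chain_uniprot.csv.gz")

      # The file is assumed to carry one comment line before the header.
      mapping = pd.read_csv(SIFTS_CSV, compression="gzip", skiprows=1)

      # Group UniProt accessions by PDB entry to get a structure-to-sequence map.
      pdb_to_uniprot = (mapping.groupby("PDB")["SP_PRIMARY"]
                               .apply(lambda accs: sorted(set(accs))))
      print(pdb_to_uniprot.head())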

  4. An Integrated Korean Biodiversity and Genetic Information Retrieval System

    PubMed Central

    Lim, Jeongheui; Bhak, Jong; Oh, Hee-Mock; Kim, Chang-Bae; Park, Yong-Ha; Paek, Woon Kee

    2008-01-01

    Background On-line biodiversity information databases are growing quickly and being integrated into general bioinformatics systems due to the advances of fast gene sequencing technologies and the Internet. These can reduce the cost and effort of performing biodiversity surveys and genetic searches, which allows scientists to spend more time researching and less time collecting and maintaining data. This will increase the rate of knowledge build-up and improve conservation. The biodiversity databases in Korea have been scattered among several institutes and local natural history museums with incompatible data types. Therefore, a comprehensive database and a nationwide web portal for biodiversity information are necessary in order to integrate diverse information resources, including molecular and genomic databases. Results The Korean Natural History Research Information System (NARIS) was built and serviced as the central biodiversity information system to collect and integrate the biodiversity data of various institutes and natural history museums in Korea. This database aims to be an integrated resource that contains additional biological information, such as genome sequences and molecular level diversity. Currently, twelve institutes and museums in Korea are integrated by the DiGIR (Distributed Generic Information Retrieval) protocol, with Darwin Core 2.0 format as its metadata standard for data exchange. Data quality control and statistical analysis functions have been implemented. In particular, integrating molecular and genetic information from the National Center for Biotechnology Information (NCBI) databases with NARIS was recently accomplished. NARIS can also be extended to accommodate other institutes abroad, and the whole system can be exported to establish local biodiversity management servers. Conclusion A Korean data portal, NARIS, has been developed to efficiently manage and utilize biodiversity data, which includes genetic resources. NARIS aims to be integral in maximizing bio-resource utilization for conservation, management, research, education, industrial applications, and integration with other bioinformation data resources. It can be found at . PMID:19091024
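
    DiGIR, the protocol named above, exchanges occurrence records as XML documents whose elements are Darwin Core terms. The sketch below parses a minimal Darwin Core-style record with the Python standard library purely to illustrate the kind of metadata exchanged; the record content and the selected terms are illustrative, not fields confirmed for NARIS.

      # Sketch: parse a minimal Darwin Core-style occurrence record such as a
      # DiGIR provider might return. Element names and values are illustrative.
      import xml.etree.ElementTree as ET

      DWC_NS = "http://rs.tdwg.org/dwc/terms/"
      sample_record = f"""
      <record xmlns:dwc="{DWC_NS}">
        <dwc:scientificName>Hynobius leechii</dwc:scientificName>
        <dwc:country>Republic of Korea</dwc:country>
        <dwc:institutionCode>NARIS-DEMO</dwc:institutionCode>
      </record>
      """

      root = ET.fromstring(sample_record)
      for term in ("scientificName", "country", "institutionCode"):
          element = root.find(f"dwc:{term}", {"dwc": DWC_NS})
          print(term, "=", element.text.strip())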

  5. E-MSD: an integrated data resource for bioinformatics

    PubMed Central

    Velankar, S.; McNeil, P.; Mittard-Runte, V.; Suarez, A.; Barrell, D.; Apweiler, R.; Henrick, K.

    2005-01-01

    The Macromolecular Structure Database (MSD) group (http://www.ebi.ac.uk/msd/) continues to enhance the quality and consistency of macromolecular structure data in the worldwide Protein Data Bank (wwPDB) and to work towards the integration of various bioinformatics data resources. One of the major obstacles to the improved integration of structural databases such as MSD and sequence databases like UniProt is the absence of up to date and well-maintained mapping between corresponding entries. We have worked closely with the UniProt group at the EBI to clean up the taxonomy and sequence cross-reference information in the MSD and UniProt databases. This information is vital for the reliable integration of the sequence family databases such as Pfam and Interpro with the structure-oriented databases of SCOP and CATH. This information has been made available to the eFamily group (http://www.efamily.org.uk/) and now forms the basis of the regular interchange of information between the member databases (MSD, UniProt, Pfam, Interpro, SCOP and CATH). This exchange of annotation information has enriched the structural information in the MSD database with annotation from wider sequence-oriented resources. This work was carried out under the ‘Structure Integration with Function, Taxonomy and Sequences (SIFTS)’ initiative (http://www.ebi.ac.uk/msd-srv/docs/sifts) in the MSD group. PMID:15608192

  6. National Vulnerability Database (NVD)

    National Institute of Standards and Technology Data Gateway

    National Vulnerability Database (NVD) (Web, free access)   NVD is a comprehensive cyber security vulnerability database that integrates all publicly available U.S. Government vulnerability resources and provides references to industry resources. It is based on and synchronized with the CVE vulnerability naming standard.
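
    Because NVD entries are keyed by CVE identifiers, individual records can also be retrieved programmatically. The following sketch assumes the public NVD CVE API 2.0 endpoint and its cveId parameter (an assumption to verify against current NVD documentation) and prints the English description of one record.

      # Sketch: fetch a single vulnerability record by CVE id.
      # The endpoint and parameter names assume the public NVD CVE API 2.0;
      # verify against current NVD documentation before relying on them.
      import requests

      NVD_API = "https://services.nvd.nist.gov/rest/json/cves/2.0"

      def fetch_cve(cve_id: str) -> dict:
          response = requests.get(NVD_API, params={"cveId": cve_id}, timeout=30)
          response.raise_for_status()
          return response.json()

      data = fetch_cve("CVE-2021-44228")
      for item in data.get("vulnerabilities", []):
          cve = item["cve"]
          descriptions = [d["value"] for d in cve.get("descriptions", [])
                          if d.get("lang") == "en"]
          print(cve["id"], "-", descriptions[0] if descriptions else "(no description)")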

  7. biochem4j: Integrated and extensible biochemical knowledge through graph databases.

    PubMed

    Swainston, Neil; Batista-Navarro, Riza; Carbonell, Pablo; Dobson, Paul D; Dunstan, Mark; Jervis, Adrian J; Vinaixa, Maria; Williams, Alan R; Ananiadou, Sophia; Faulon, Jean-Loup; Mendes, Pedro; Kell, Douglas B; Scrutton, Nigel S; Breitling, Rainer

    2017-01-01

    Biologists and biochemists have at their disposal a number of excellent, publicly available data resources such as UniProt, KEGG, and NCBI Taxonomy, which catalogue biological entities. Despite the usefulness of these resources, they remain fundamentally unconnected. While links may appear between entries across these databases, users are typically only able to follow such links by manual browsing or through specialised workflows. Although many of the resources provide web-service interfaces for computational access, performing federated queries across databases remains a non-trivial but essential activity in interdisciplinary systems and synthetic biology programmes. What is needed are integrated repositories to catalogue both biological entities and, crucially, the relationships between them. Such a resource should be extensible, such that newly discovered relationships (for example, those between novel, synthetic enzymes and non-natural products) can be added over time. With the introduction of graph databases, the barrier to the rapid generation, extension and querying of such a resource has been lowered considerably. With a particular focus on metabolic engineering as an illustrative application domain, biochem4j, freely available at http://biochem4j.org, is introduced to provide an integrated, queryable database that warehouses chemical, reaction, enzyme and taxonomic data from a range of reliable resources. The biochem4j framework establishes a starting point for the flexible integration and exploitation of an ever-wider range of biological data sources, from public databases to laboratory-specific experimental datasets, for the benefit of systems biologists, biosystems engineers and the wider community of molecular biologists and biological chemists.
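
    Since biochem4j is built on a graph database, relationship-centric queries are the natural access pattern. The sketch below shows how such a query might be issued with the official Neo4j Python driver; the Bolt URI, credentials, node labels (Enzyme, Reaction) and relationship type are illustrative assumptions rather than documented biochem4j schema details.

      # Sketch: query a graph database of biochemical entities with Cypher.
      # The Bolt URI, credentials, node labels and relationship type below are
      # illustrative assumptions, not values documented for biochem4j itself.
      from neo4j import GraphDatabase

      driver = GraphDatabase.driver("bolt://localhost:7687",
                                    auth=("neo4j", "password"))

      CYPHER = """
      MATCH (e:Enzyme)-[:CATALYSES]->(r:Reaction)
      RETURN e.name AS enzyme, r.name AS reaction
      LIMIT 10
      """

      with driver.session() as session:
          for record in session.run(CYPHER):
              print(record["enzyme"], "->", record["reaction"])

      driver.close()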

  8. biochem4j: Integrated and extensible biochemical knowledge through graph databases

    PubMed Central

    Batista-Navarro, Riza; Dunstan, Mark; Jervis, Adrian J.; Vinaixa, Maria; Ananiadou, Sophia; Faulon, Jean-Loup; Kell, Douglas B.

    2017-01-01

    Biologists and biochemists have at their disposal a number of excellent, publicly available data resources such as UniProt, KEGG, and NCBI Taxonomy, which catalogue biological entities. Despite the usefulness of these resources, they remain fundamentally unconnected. While links may appear between entries across these databases, users are typically only able to follow such links by manual browsing or through specialised workflows. Although many of the resources provide web-service interfaces for computational access, performing federated queries across databases remains a non-trivial but essential activity in interdisciplinary systems and synthetic biology programmes. What is needed are integrated repositories to catalogue both biological entities and–crucially–the relationships between them. Such a resource should be extensible, such that newly discovered relationships–for example, those between novel, synthetic enzymes and non-natural products–can be added over time. With the introduction of graph databases, the barrier to the rapid generation, extension and querying of such a resource has been lowered considerably. With a particular focus on metabolic engineering as an illustrative application domain, biochem4j, freely available at http://biochem4j.org, is introduced to provide an integrated, queryable database that warehouses chemical, reaction, enzyme and taxonomic data from a range of reliable resources. The biochem4j framework establishes a starting point for the flexible integration and exploitation of an ever-wider range of biological data sources, from public databases to laboratory-specific experimental datasets, for the benefit of systems biologists, biosystems engineers and the wider community of molecular biologists and biological chemists. PMID:28708831

  9. The ChEMBL database as linked open data

    PubMed Central

    2013-01-01

    Background Making data available as Linked Data using Resource Description Framework (RDF) promotes integration with other web resources. RDF documents can natively link to related data, and others can link back using Uniform Resource Identifiers (URIs). RDF makes the data machine-readable and uses extensible vocabularies for additional information, making it easier to scale up inference and data analysis. Results This paper describes recent developments in an ongoing project converting data from the ChEMBL database into RDF triples. Relative to earlier versions, this updated version of ChEMBL-RDF uses recently introduced ontologies, including CHEMINF and CiTO; exposes more information from the database; and is now available as dereferencable, linked data. To demonstrate these new features, we present novel use cases showing further integration with other web resources, including Bio2RDF, Chem2Bio2RDF, and ChemSpider, and showing the use of standard ontologies for querying. Conclusions We have illustrated the advantages of using open standards and ontologies to link the ChEMBL database to other databases. Using those links and the knowledge encoded in standards and ontologies, the ChEMBL-RDF resource creates a foundation for integrated semantic web cheminformatics applications, such as the presented decision support. PMID:23657106
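
    Publishing ChEMBL as RDF means the data can be queried with SPARQL and joined with other linked-data resources. The sketch below uses SPARQLWrapper against the EBI RDF platform endpoint; both the endpoint URL and the cco vocabulary terms are assumptions about the current deployment rather than facts stated in the abstract.

      # Sketch: run a SPARQL query over ChEMBL RDF triples.
      # The endpoint URL and the cco: vocabulary are assumptions about the
      # EBI RDF platform, not details given in the abstract.
      from SPARQLWrapper import SPARQLWrapper, JSON

      sparql = SPARQLWrapper("https://www.ebi.ac.uk/rdf/services/sparql")
      sparql.setQuery("""
          PREFIX cco:  <http://rdf.ebi.ac.uk/terms/chembl#>
          PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
          SELECT ?molecule ?label
          WHERE {
              ?molecule a cco:SmallMolecule ;
                        rdfs:label ?label .
          }
          LIMIT 10
      """)
      sparql.setReturnFormat(JSON)

      results = sparql.query().convert()
      for binding in results["results"]["bindings"]:
          print(binding["molecule"]["value"], binding["label"]["value"])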

  10. An Integrated Molecular Database on Indian Insects.

    PubMed

    Pratheepa, Maria; Venkatesan, Thiruvengadam; Gracy, Gandhi; Jalali, Sushil Kumar; Rangheswaran, Rajagopal; Antony, Jomin Cruz; Rai, Anil

    2018-01-01

    MOlecular Database on Indian Insects (MODII) is an online database linking several databases like Insect Pest Info, Insect Barcode Information System (IBIn), Insect Whole Genome sequence, Other Genomic Resources of National Bureau of Agricultural Insect Resources (NBAIR), Whole Genome sequencing of Honey bee viruses, Insecticide resistance gene database and Genomic tools. This database was developed with a holistic approach for collecting phenomic and genomic information on agriculturally important insects. This insect resource database is available online for free at http://cib.res.in/.

  11. Influenza Research Database: an integrated bioinformatics resource for influenza research and surveillance

    PubMed Central

    Squires, R. Burke; Noronha, Jyothi; Hunt, Victoria; García‐Sastre, Adolfo; Macken, Catherine; Baumgarth, Nicole; Suarez, David; Pickett, Brett E.; Zhang, Yun; Larsen, Christopher N.; Ramsey, Alvin; Zhou, Liwei; Zaremba, Sam; Kumar, Sanjeev; Deitrich, Jon; Klem, Edward; Scheuermann, Richard H.

    2012-01-01

    Please cite this paper as: Squires et al. (2012) Influenza research database: an integrated bioinformatics resource for influenza research and surveillance. Influenza and Other Respiratory Viruses 6(6), 404–416. Background  The recent emergence of the 2009 pandemic influenza A/H1N1 virus has highlighted the value of free and open access to influenza virus genome sequence data integrated with information about other important virus characteristics. Design  The Influenza Research Database (IRD, http://www.fludb.org) is a free, open, publicly‐accessible resource funded by the U.S. National Institute of Allergy and Infectious Diseases through the Bioinformatics Resource Centers program. IRD provides a comprehensive, integrated database and analysis resource for influenza sequence, surveillance, and research data, including user‐friendly interfaces for data retrieval, visualization and comparative genomics analysis, together with personal log in‐protected ‘workbench’ spaces for saving data sets and analysis results. IRD integrates genomic, proteomic, immune epitope, and surveillance data from a variety of sources, including public databases, computational algorithms, external research groups, and the scientific literature. Results  To demonstrate the utility of the data and analysis tools available in IRD, two scientific use cases are presented. A comparison of hemagglutinin sequence conservation and epitope coverage information revealed highly conserved protein regions that can be recognized by the human adaptive immune system as possible targets for inducing cross‐protective immunity. Phylogenetic and geospatial analysis of sequences from wild bird surveillance samples revealed a possible evolutionary connection between influenza virus from Delaware Bay shorebirds and Alberta ducks. Conclusions  The IRD provides a wealth of integrated data and information about influenza virus to support research of the genetic determinants dictating virus pathogenicity, host range restriction and transmission, and to facilitate development of vaccines, diagnostics, and therapeutics. PMID:22260278

  12. Spatial database for a global assessment of undiscovered copper resources: Chapter Z in Global mineral resource assessment

    USGS Publications Warehouse

    Dicken, Connie L.; Dunlap, Pamela; Parks, Heather L.; Hammarstrom, Jane M.; Zientek, Michael L.; Zientek, Michael L.; Hammarstrom, Jane M.; Johnson, Kathleen M.

    2016-07-13

    As part of the first-ever U.S. Geological Survey global assessment of undiscovered copper resources, data common to several regional spatial databases published by the U.S. Geological Survey, including one report from Finland and one from Greenland, were standardized, updated, and compiled into a global copper resource database. This integrated collection of spatial databases provides location, geologic and mineral resource data, and source references for deposits, significant prospects, and areas permissive for undiscovered deposits of both porphyry copper and sediment-hosted copper. The copper resource database allows for efficient modeling on a global scale in a geographic information system (GIS) and is provided in an Esri ArcGIS file geodatabase format.
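
    Because the compilation is distributed as an Esri file geodatabase, it can be read directly into a Python GIS session. The sketch below assumes GeoPandas with a GDAL/OGR backend that supports the OpenFileGDB format; the file path and layer name are placeholders, not the actual names used in the USGS release.

      # Sketch: inspect and load layers from an Esri file geodatabase.
      # The file path and layer name are placeholders; GeoPandas relies on a
      # GDAL/OGR backend (fiona or pyogrio) for the OpenFileGDB driver.
      import fiona
      import geopandas as gpd

      GDB_PATH = "global_copper_assessment.gdb"  # placeholder path

      # Enumerate the feature classes stored in the geodatabase.
      print(fiona.listlayers(GDB_PATH))

      # Load one layer (name is illustrative) and inspect it.
      deposits = gpd.read_file(GDB_PATH, layer="porphyry_copper_deposits")
      print(deposits.head())
      print(deposits.crs)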

  13. Protein Bioinformatics Databases and Resources

    PubMed Central

    Chen, Chuming; Huang, Hongzhan; Wu, Cathy H.

    2017-01-01

    Many publicly available data repositories and resources have been developed to support protein related information management, data-driven hypothesis generation and biological knowledge discovery. To help researchers quickly find the appropriate protein related informatics resources, we present a comprehensive review (with categorization and description) of major protein bioinformatics databases in this chapter. We also discuss the challenges and opportunities for developing next-generation protein bioinformatics databases and resources to support data integration and data analytics in the Big Data era. PMID:28150231

  14. MIPSPlantsDB—plant database resource for integrative and comparative plant genome research

    PubMed Central

    Spannagl, Manuel; Noubibou, Octave; Haase, Dirk; Yang, Li; Gundlach, Heidrun; Hindemitt, Tobias; Klee, Kathrin; Haberer, Georg; Schoof, Heiko; Mayer, Klaus F. X.

    2007-01-01

    Genome-oriented plant research delivers a rapidly increasing amount of plant genome data. Comprehensive and structured information resources are required to structure and communicate genome and associated analytical data for model organisms as well as for crops. The increase in available plant genomic data enables powerful comparative analyses and integrative approaches. PlantsDB aims to provide data and information resources for individual plant species and, in addition, to build a platform for integrative and comparative plant genome research. PlantsDB is constituted from genome databases for Arabidopsis, Medicago, Lotus, rice, maize and tomato. Complementary data resources for cis elements, repetitive elements and extensive cross-species comparisons are implemented. The PlantsDB portal can be reached at . PMID:17202173

  15. The EBI SRS server-new features.

    PubMed

    Zdobnov, Evgeny M; Lopez, Rodrigo; Apweiler, Rolf; Etzold, Thure

    2002-08-01

    Here we report on recent developments at the EBI SRS server (http://srs.ebi.ac.uk). SRS has become an integration system for both data retrieval and sequence analysis applications. The EBI SRS server is a primary gateway to major databases in the field of molecular biology produced and supported at the EBI, as well as the European public access point to the MEDLINE database provided by the US National Library of Medicine (NLM). It is a reference server for the latest developments in data and application integration. The new additions include: the concept of virtual databases; integration of XML databases such as the Integrated Resource of Protein Domains and Functional Sites (InterPro), Gene Ontology (GO), MEDLINE and metabolic pathways; user-friendly data representation in 'Nice views'; and SRSQuickSearch bookmarklets. SRS6 is a licensed product of LION Bioscience AG, freely available for academics. The EBI SRS server (http://srs.ebi.ac.uk) is a free central resource for molecular biology data as well as a reference server for the latest developments in data integration.

  16. National Maternal and Child Oral Health Resource Center

    MedlinePlus

    Portal excerpt: links to the Organizations Database and the Center for Oral Health Systems Integration and Improvement (COHSII), a consortium promoting ...; featured resources include a Consensus Statement, an Integration Framework, the Bright Futures Pocket Guide and consumer materials.

  17. Influenza research database: an integrated bioinformatics resource for influenza virus research

    USDA-ARS?s Scientific Manuscript database

    The Influenza Research Database (IRD) is a U.S. National Institute of Allergy and Infectious Diseases (NIAID)-sponsored Bioinformatics Resource Center dedicated to providing bioinformatics support for influenza virus research. IRD facilitates the research and development of vaccines, diagnostics, an...

  18. Retrovirus Integration Database (RID): a public database for retroviral insertion sites into host genomes.

    PubMed

    Shao, Wei; Shan, Jigui; Kearney, Mary F; Wu, Xiaolin; Maldarelli, Frank; Mellors, John W; Luke, Brian; Coffin, John M; Hughes, Stephen H

    2016-07-04

    The NCI Retrovirus Integration Database is a MySql-based relational database created for storing and retrieving comprehensive information about retroviral integration sites, primarily, but not exclusively, HIV-1. The database is accessible to the public for submission or extraction of data originating from experiments aimed at collecting information related to retroviral integration sites including: the site of integration into the host genome, the virus family and subtype, the origin of the sample, gene exons/introns associated with integration, and proviral orientation. Information about the references from which the data were collected is also stored in the database. Tools are built into the website that can be used to map the integration sites to UCSC genome browser, to plot the integration site patterns on a chromosome, and to display provirus LTRs in their inserted genome sequence. The website is robust, user friendly, and allows users to query the database and analyze the data dynamically. https://rid.ncifcrf.gov ; or http://home.ncifcrf.gov/hivdrp/resources.htm .

  19. The BIG Data Center: from deposition to integration to translation

    PubMed Central

    2017-01-01

    Biological data are generated at unprecedentedly exponential rates, posing considerable challenges in big data deposition, integration and translation. The BIG Data Center, established at Beijing Institute of Genomics (BIG), Chinese Academy of Sciences, provides a suite of database resources, including (i) Genome Sequence Archive, a data repository specialized for archiving raw sequence reads, (ii) Gene Expression Nebulas, a data portal of gene expression profiles based entirely on RNA-Seq data, (iii) Genome Variation Map, a comprehensive collection of genome variations for featured species, (iv) Genome Warehouse, a centralized resource housing genome-scale data with particular focus on economically important animals and plants, (v) Methylation Bank, an integrated database of whole-genome single-base resolution methylomes and (vi) Science Wikis, a central access point for biological wikis developed for community annotations. The BIG Data Center is dedicated to constructing and maintaining biological databases through big data integration and value-added curation, conducting basic research to translate big data into big knowledge and providing freely open access to a variety of data resources in support of worldwide research activities in both academia and industry. All of these resources are publicly available and can be found at http://bigd.big.ac.cn. PMID:27899658

  20. ChlamyCyc: an integrative systems biology database and web-portal for Chlamydomonas reinhardtii.

    PubMed

    May, Patrick; Christian, Jan-Ole; Kempa, Stefan; Walther, Dirk

    2009-05-04

    The unicellular green alga Chlamydomonas reinhardtii is an important eukaryotic model organism for the study of photosynthesis and plant growth. In the era of modern high-throughput technologies there is an imperative need to integrate large-scale data sets from high-throughput experimental techniques using computational methods and database resources to provide comprehensive information about the molecular and cellular organization of a single organism. In the framework of the German Systems Biology initiative GoFORSYS, a pathway database and web-portal for Chlamydomonas (ChlamyCyc) was established, which currently features about 250 metabolic pathways with associated genes, enzymes, and compound information. ChlamyCyc was assembled using an integrative approach combining the recently published genome sequence, bioinformatics methods, and experimental data from metabolomics and proteomics experiments. We analyzed and integrated a combination of primary and secondary database resources, such as existing genome annotations from JGI, EST collections, orthology information, and MapMan classification. ChlamyCyc provides a curated and integrated systems biology repository that will enable and assist in systematic studies of fundamental cellular processes in Chlamydomonas. The ChlamyCyc database and web-portal is freely available under http://chlamycyc.mpimp-golm.mpg.de.

  1. The Protein Information Resource: an integrated public resource of functional annotation of proteins

    PubMed Central

    Wu, Cathy H.; Huang, Hongzhan; Arminski, Leslie; Castro-Alvear, Jorge; Chen, Yongxing; Hu, Zhang-Zhi; Ledley, Robert S.; Lewis, Kali C.; Mewes, Hans-Werner; Orcutt, Bruce C.; Suzek, Baris E.; Tsugita, Akira; Vinayaka, C. R.; Yeh, Lai-Su L.; Zhang, Jian; Barker, Winona C.

    2002-01-01

    The Protein Information Resource (PIR) serves as an integrated public resource of functional annotation of protein data to support genomic/proteomic research and scientific discovery. The PIR, in collaboration with the Munich Information Center for Protein Sequences (MIPS) and the Japan International Protein Information Database (JIPID), produces the PIR-International Protein Sequence Database (PSD), the major annotated protein sequence database in the public domain, containing about 250 000 proteins. To improve protein annotation and the coverage of experimentally validated data, a bibliography submission system is developed for scientists to submit, categorize and retrieve literature information. Comprehensive protein information is available from iProClass, which includes family classification at the superfamily, domain and motif levels, structural and functional features of proteins, as well as cross-references to over 40 biological databases. To provide timely and comprehensive protein data with source attribution, we have introduced a non-redundant reference protein database, PIR-NREF. The database consists of about 800 000 proteins collected from PIR-PSD, SWISS-PROT, TrEMBL, GenPept, RefSeq and PDB, with composite protein names and literature data. To promote database interoperability, we provide XML data distribution and open database schema, and adopt common ontologies. The PIR web site (http://pir.georgetown.edu/) features data mining and sequence analysis tools for information retrieval and functional identification of proteins based on both sequence and annotation information. The PIR databases and other files are also available by FTP (ftp://nbrfa.georgetown.edu/pir_databases). PMID:11752247

  2. MIPS PlantsDB: a database framework for comparative plant genome research.

    PubMed

    Nussbaumer, Thomas; Martis, Mihaela M; Roessner, Stephan K; Pfeifer, Matthias; Bader, Kai C; Sharma, Sapna; Gundlach, Heidrun; Spannagl, Manuel

    2013-01-01

    The rapidly increasing amount of plant genome (sequence) data enables powerful comparative analyses and integrative approaches and also requires structured and comprehensive information resources. Databases are needed for both model and crop plant organisms and both intuitive search/browse views and comparative genomics tools should communicate the data to researchers and help them interpret it. MIPS PlantsDB (http://mips.helmholtz-muenchen.de/plant/genomes.jsp) was initially described in NAR in 2007 [Spannagl,M., Noubibou,O., Haase,D., Yang,L., Gundlach,H., Hindemitt, T., Klee,K., Haberer,G., Schoof,H. and Mayer,K.F. (2007) MIPSPlantsDB-plant database resource for integrative and comparative plant genome research. Nucleic Acids Res., 35, D834-D840] and was set up from the start to provide data and information resources for individual plant species as well as a framework for integrative and comparative plant genome research. PlantsDB comprises database instances for tomato, Medicago, Arabidopsis, Brachypodium, Sorghum, maize, rice, barley and wheat. Building up on that, state-of-the-art comparative genomics tools such as CrowsNest are integrated to visualize and investigate syntenic relationships between monocot genomes. Results from novel genome analysis strategies targeting the complex and repetitive genomes of triticeae species (wheat and barley) are provided and cross-linked with model species. The MIPS Repeat Element Database (mips-REdat) and Catalog (mips-REcat) as well as tight connections to other databases, e.g. via web services, are further important components of PlantsDB.

  3. MIPS PlantsDB: a database framework for comparative plant genome research

    PubMed Central

    Nussbaumer, Thomas; Martis, Mihaela M.; Roessner, Stephan K.; Pfeifer, Matthias; Bader, Kai C.; Sharma, Sapna; Gundlach, Heidrun; Spannagl, Manuel

    2013-01-01

    The rapidly increasing amount of plant genome (sequence) data enables powerful comparative analyses and integrative approaches and also requires structured and comprehensive information resources. Databases are needed for both model and crop plant organisms and both intuitive search/browse views and comparative genomics tools should communicate the data to researchers and help them interpret it. MIPS PlantsDB (http://mips.helmholtz-muenchen.de/plant/genomes.jsp) was initially described in NAR in 2007 [Spannagl,M., Noubibou,O., Haase,D., Yang,L., Gundlach,H., Hindemitt, T., Klee,K., Haberer,G., Schoof,H. and Mayer,K.F. (2007) MIPSPlantsDB–plant database resource for integrative and comparative plant genome research. Nucleic Acids Res., 35, D834–D840] and was set up from the start to provide data and information resources for individual plant species as well as a framework for integrative and comparative plant genome research. PlantsDB comprises database instances for tomato, Medicago, Arabidopsis, Brachypodium, Sorghum, maize, rice, barley and wheat. Building up on that, state-of-the-art comparative genomics tools such as CrowsNest are integrated to visualize and investigate syntenic relationships between monocot genomes. Results from novel genome analysis strategies targeting the complex and repetitive genomes of triticeae species (wheat and barley) are provided and cross-linked with model species. The MIPS Repeat Element Database (mips-REdat) and Catalog (mips-REcat) as well as tight connections to other databases, e.g. via web services, are further important components of PlantsDB. PMID:23203886

  4. The NCBI BioSystems database.

    PubMed

    Geer, Lewis Y; Marchler-Bauer, Aron; Geer, Renata C; Han, Lianyi; He, Jane; He, Siqian; Liu, Chunlei; Shi, Wenyao; Bryant, Stephen H

    2010-01-01

    The NCBI BioSystems database, found at http://www.ncbi.nlm.nih.gov/biosystems/, centralizes and cross-links existing biological systems databases, increasing their utility and target audience by integrating their pathways and systems into NCBI resources. This integration allows users of NCBI's Entrez databases to quickly categorize proteins, genes and small molecules by metabolic pathway, disease state or other BioSystem type, without requiring time-consuming inference of biological relationships from the literature or multiple experimental datasets.
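
    Because BioSystems records are cross-linked with the other Entrez databases, linked entries can be pulled through NCBI E-utilities. The sketch below uses Biopython's Entrez module; the gene ID and the assumption that a gene-to-biosystems link is available through elink are illustrative rather than guaranteed.

      # Sketch: find BioSystems records linked to an Entrez Gene record via
      # NCBI E-utilities. The gene ID and link database are illustrative.
      from Bio import Entrez

      Entrez.email = "you@example.org"  # NCBI asks for a contact address

      handle = Entrez.elink(dbfrom="gene", db="biosystems", id="672")  # BRCA1
      linksets = Entrez.read(handle)
      handle.close()

      for linkset in linksets:
          for linksetdb in linkset.get("LinkSetDb", []):
              print(linksetdb["LinkName"], ":",
                    [link["Id"] for link in linksetdb["Link"]])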

  5. Outline for Research in Large Data Base Resources.

    ERIC Educational Resources Information Center

    Kahn, Paul

    This paper uses a hypothetical application entitled "VAPORTRAILS" to examine how an integrated application can be used to solve the problems of search and retrieval from a range of qualitatively different databases, and the organization of the resulting information into a personal database resource. In addition, four general classes of databases…

  6. Integrating diverse databases into an unified analysis framework: a Galaxy approach

    PubMed Central

    Blankenberg, Daniel; Coraor, Nathan; Von Kuster, Gregory; Taylor, James; Nekrutenko, Anton

    2011-01-01

    Recent technological advances have led to the ability to generate large amounts of data for model and non-model organisms. Whereas, in the past, there were a relatively small number of central repositories serving genomic data, an increasing number of distinct specialized data repositories and resources have been established. Here, we describe a generic approach that provides for the integration of a diverse spectrum of data resources into a unified analysis framework, Galaxy (http://usegalaxy.org). This approach allows the simplified coupling of external data resources with the data analysis tools available to Galaxy users, while leveraging the native data mining facilities of the external data resources. Database URL: http://usegalaxy.org PMID:21531983
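
    Galaxy's coupling of external data resources with its analysis tools can also be driven from scripts. The sketch below assumes the BioBlend client library, a reachable Galaxy server and a valid API key; the server URL, key and dataset URL are placeholders, and the put_url upload call should be checked against the BioBlend version in use.

      # Sketch: drive a Galaxy server programmatically with BioBlend.
      # The server URL, API key and dataset URL are placeholders.
      from bioblend.galaxy import GalaxyInstance

      gi = GalaxyInstance(url="https://usegalaxy.org", key="YOUR_API_KEY")

      # Create a fresh history to hold imported data.
      history = gi.histories.create_history(name="external-resource-demo")

      # Pull a dataset from an external resource straight into the history
      # (the file_type keyword is assumed to be passed through to the upload tool).
      upload = gi.tools.put_url(
          "https://example.org/annotations.bed",  # placeholder URL
          history_id=history["id"],
          file_type="bed",
      )
      print([dataset["id"] for dataset in upload.get("outputs", [])])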

  7. The Next Step in Educational Program Budgets and Information Resource Management: Integrated Data Structures.

    ERIC Educational Resources Information Center

    Jackowski, Edward M.

    1988-01-01

    Discusses the role that information resource management (IRM) plays in educational program-oriented budgeting (POB), and presents a theoretical IRM model. Highlights include design considerations for integrated data systems; database management systems (DBMS); and how POB data can be integrated to enhance its value and use within an educational…

  8. Human Ageing Genomic Resources: Integrated databases and tools for the biology and genetics of ageing

    PubMed Central

    Tacutu, Robi; Craig, Thomas; Budovsky, Arie; Wuttke, Daniel; Lehmann, Gilad; Taranukha, Dmitri; Costa, Joana; Fraifeld, Vadim E.; de Magalhães, João Pedro

    2013-01-01

    The Human Ageing Genomic Resources (HAGR, http://genomics.senescence.info) is a freely available online collection of research databases and tools for the biology and genetics of ageing. HAGR features now several databases with high-quality manually curated data: (i) GenAge, a database of genes associated with ageing in humans and model organisms; (ii) AnAge, an extensive collection of longevity records and complementary traits for >4000 vertebrate species; and (iii) GenDR, a newly incorporated database, containing both gene mutations that interfere with dietary restriction-mediated lifespan extension and consistent gene expression changes induced by dietary restriction. Since its creation about 10 years ago, major efforts have been undertaken to maintain the quality of data in HAGR, while further continuing to develop, improve and extend it. This article briefly describes the content of HAGR and details the major updates since its previous publications, in terms of both structure and content. The completely redesigned interface, more intuitive and more integrative of HAGR resources, is also presented. Altogether, we hope that through its improvements, the current version of HAGR will continue to provide users with the most comprehensive and accessible resources available today in the field of biogerontology. PMID:23193293

  9. The NCBI BioSystems database

    PubMed Central

    Geer, Lewis Y.; Marchler-Bauer, Aron; Geer, Renata C.; Han, Lianyi; He, Jane; He, Siqian; Liu, Chunlei; Shi, Wenyao; Bryant, Stephen H.

    2010-01-01

    The NCBI BioSystems database, found at http://www.ncbi.nlm.nih.gov/biosystems/, centralizes and cross-links existing biological systems databases, increasing their utility and target audience by integrating their pathways and systems into NCBI resources. This integration allows users of NCBI’s Entrez databases to quickly categorize proteins, genes and small molecules by metabolic pathway, disease state or other BioSystem type, without requiring time-consuming inference of biological relationships from the literature or multiple experimental datasets. PMID:19854944

  10. The integrated web service and genome database for agricultural plants with biotechnology information.

    PubMed

    Kim, Changkug; Park, Dongsuk; Seol, Youngjoo; Hahn, Jangho

    2011-01-01

    The National Agricultural Biotechnology Information Center (NABIC) constructed an agricultural biology-based infrastructure and developed a Web based relational database for agricultural plants with biotechnology information. The NABIC has concentrated on functional genomics of major agricultural plants, building an integrated biotechnology database for agro-biotech information that focuses on genomics of major agricultural resources. This genome database provides annotated genome information from 1,039,823 records mapped to rice, Arabidopsis, and Chinese cabbage.

  11. Tomato functional genomics database (TFGD): a comprehensive collection and analysis package for tomato functional genomics

    USDA-ARS?s Scientific Manuscript database

    Tomato Functional Genomics Database (TFGD; http://ted.bti.cornell.edu) provides a comprehensive systems biology resource to store, mine, analyze, visualize and integrate large-scale tomato functional genomics datasets. The database is expanded from the previously described Tomato Expression Database...

  12. SInCRe—structural interactome computational resource for Mycobacterium tuberculosis

    PubMed Central

    Metri, Rahul; Hariharaputran, Sridhar; Ramakrishnan, Gayatri; Anand, Praveen; Raghavender, Upadhyayula S.; Ochoa-Montaño, Bernardo; Higueruelo, Alicia P.; Sowdhamini, Ramanathan; Chandra, Nagasuma R.; Blundell, Tom L.; Srinivasan, Narayanaswamy

    2015-01-01

    We have developed an integrated database for Mycobacterium tuberculosis H37Rv (Mtb) that collates information on protein sequences, domain assignments, functional annotation and 3D structural information along with protein–protein and protein–small molecule interactions. SInCRe (Structural Interactome Computational Resource) was developed out of the CamBan (Cambridge and Bangalore) collaboration. The motivation for development of this database is to provide an integrated platform that allows easy access to, and interpretation of, the data and results obtained by all the groups in CamBan in the field of Mtb informatics. In-house algorithms and databases developed independently by various academic groups in CamBan are used to generate Mtb-specific datasets and are integrated in this database to provide a structural dimension to studies on tuberculosis. The SInCRe database readily provides information on identification of functional domains, genome-scale modelling of structures of Mtb proteins and characterization of the small-molecule binding sites within Mtb. The resource also provides structure-based function annotation, information on small-molecule binders including FDA (Food and Drug Administration)-approved drugs, protein–protein interactions (PPIs) and natural compounds that potentially bind to pathogen proteins and weaken or eliminate host–pathogen protein–protein interactions. Together they provide prerequisites for identification of off-target binding. Database URL: http://proline.biochem.iisc.ernet.in/sincre PMID:26130660

  13. Study on resources and environmental data integration towards data warehouse construction covering trans-boundary area of China, Russia and Mongolia

    NASA Astrophysics Data System (ADS)

    Wang, J.; Song, J.; Gao, M.; Zhu, L.

    2014-02-01

    The trans-boundary area between northern China, Mongolia and eastern Siberia (Russia) is a continuous geographical region in north-eastern Asia. Many common issues in this region need to be addressed on the basis of a uniform resources and environmental data warehouse. Drawing on the practice of a joint scientific expedition, the paper presents a data integration solution comprising three steps: drawing up data collection standards and specifications, data reorganization and processing, and data warehouse design and development. A series of data collection standards and specifications covering more than 10 domains was drawn up first. According to this uniform standard, 20 regional-scale resources and environmental survey databases and 11 in-situ observation databases were reorganized and integrated. The North East Asia Resources and Environmental Data Warehouse was designed with four layers: a resources layer, a core business logic layer, an internet interoperation layer and a web portal layer. An initial prototype of the data warehouse has been developed and deployed, and all of the integrated data for this area can be accessed online.

  14. The BIG Data Center: from deposition to integration to translation.

    PubMed

    2017-01-04

    Biological data are generated at unprecedentedly exponential rates, posing considerable challenges in big data deposition, integration and translation. The BIG Data Center, established at Beijing Institute of Genomics (BIG), Chinese Academy of Sciences, provides a suite of database resources, including (i) Genome Sequence Archive, a data repository specialized for archiving raw sequence reads, (ii) Gene Expression Nebulas, a data portal of gene expression profiles based entirely on RNA-Seq data, (iii) Genome Variation Map, a comprehensive collection of genome variations for featured species, (iv) Genome Warehouse, a centralized resource housing genome-scale data with particular focus on economically important animals and plants, (v) Methylation Bank, an integrated database of whole-genome single-base resolution methylomes and (vi) Science Wikis, a central access point for biological wikis developed for community annotations. The BIG Data Center is dedicated to constructing and maintaining biological databases through big data integration and value-added curation, conducting basic research to translate big data into big knowledge and providing freely open access to a variety of data resources in support of worldwide research activities in both academia and industry. All of these resources are publicly available and can be found at http://bigd.big.ac.cn. © The Author(s) 2016. Published by Oxford University Press on behalf of Nucleic Acids Research.

  15. The integrated web service and genome database for agricultural plants with biotechnology information

    PubMed Central

    Kim, ChangKug; Park, DongSuk; Seol, YoungJoo; Hahn, JangHo

    2011-01-01

    The National Agricultural Biotechnology Information Center (NABIC) constructed an agricultural biology-based infrastructure and developed a Web based relational database for agricultural plants with biotechnology information. The NABIC has concentrated on functional genomics of major agricultural plants, building an integrated biotechnology database for agro-biotech information that focuses on genomics of major agricultural resources. This genome database provides annotated genome information from 1,039,823 records mapped to rice, Arabidopsis, and Chinese cabbage. PMID:21887015

  16. Making proteomics data accessible and reusable: Current state of proteomics databases and repositories

    PubMed Central

    Perez-Riverol, Yasset; Alpi, Emanuele; Wang, Rui; Hermjakob, Henning; Vizcaíno, Juan Antonio

    2015-01-01

    Compared to other data-intensive disciplines such as genomics, public deposition and storage of MS-based proteomics, data are still less developed due to, among other reasons, the inherent complexity of the data and the variety of data types and experimental workflows. In order to address this need, several public repositories for MS proteomics experiments have been developed, each with different purposes in mind. The most established resources are the Global Proteome Machine Database (GPMDB), PeptideAtlas, and the PRIDE database. Additionally, there are other useful (in many cases recently developed) resources such as ProteomicsDB, Mass Spectrometry Interactive Virtual Environment (MassIVE), Chorus, MaxQB, PeptideAtlas SRM Experiment Library (PASSEL), Model Organism Protein Expression Database (MOPED), and the Human Proteinpedia. In addition, the ProteomeXchange consortium has been recently developed to enable better integration of public repositories and the coordinated sharing of proteomics information, maximizing its benefit to the scientific community. Here, we will review each of the major proteomics resources independently and some tools that enable the integration, mining and reuse of the data. We will also discuss some of the major challenges and current pitfalls in the integration and sharing of the data. PMID:25158685

  17. Geo-spatial Service and Application based on National E-government Network Platform and Cloud

    NASA Astrophysics Data System (ADS)

    Meng, X.; Deng, Y.; Li, H.; Yao, L.; Shi, J.

    2014-04-01

    With the acceleration of China's informatization process, the Chinese government has made substantive strides in advancing the development and application of digital technology, which promotes the evolution of e-government and its informatization. Meanwhile, as a service mode built on shared resources, cloud computing connects huge resource pools to provide a variety of IT services, and has become a relatively mature technical pattern supported by further studies and massive practical applications. Based on cloud computing technology and the national e-government network platform, the "National Natural Resources and Geospatial Database (NRGD)" project integrated and transformed natural resources and geospatial information dispersed across various sectors and regions, established a logically unified and physically dispersed fundamental database, and developed a national integrated information database system supporting main e-government applications. Cross-sector e-government applications and services are realized to provide long-term, stable and standardized natural resources and geospatial fundamental information products and services for national e-government and public users.

  18. Integrated web visualizations for protein-protein interaction databases.

    PubMed

    Jeanquartier, Fleur; Jean-Quartier, Claire; Holzinger, Andreas

    2015-06-16

    Understanding living systems is crucial for curing diseases. To achieve this task we have to understand biological networks based on protein-protein interactions. Bioinformatics has produced a great number of databases and tools that support analysts in exploring protein-protein interactions on an integrated level for knowledge discovery. They provide predictions and correlations, indicate possibilities for future experimental research and fill the gaps to complete the picture of biochemical processes. There are numerous and huge databases of protein-protein interactions used to gain insights into answering some of the many questions of systems biology. Many computational resources integrate interaction data with additional information on molecular background. However, the vast number of diverse bioinformatics resources poses an obstacle to the goal of understanding. We present a survey of databases that enable the visual analysis of protein networks. We selected M=10 out of N=53 resources supporting visualization and tested them against the following set of criteria: interoperability, data integration, quantity of possible interactions, data visualization quality and data coverage. The study reveals differences in usability, visualization features and quality as well as in the quantity of interactions. StringDB is the recommended first choice. CPDB presents a comprehensive dataset and IntAct lets the user change the network layout. A comprehensive comparison table is available via the web; the supplementary table can be accessed at http://tinyurl.com/PPI-DB-Comparison-2015. Only some web resources featuring graph visualization can be successfully applied to interactive visual analysis of protein-protein interactions. The study results underline the necessity for further enhancements of visualization integration in biochemical analysis tools. Identified challenges include data comprehensiveness, confidence scoring, interactive features and visualization maturity.
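
    STRING, the survey's recommended first choice, also exposes a REST API, so interaction neighbourhoods can be retrieved without the browser. The sketch below assumes the public tsv network method and its parameter names; verify the URL pattern against current STRING documentation before relying on it.

      # Sketch: retrieve a protein-protein interaction network around one
      # protein from the STRING REST API. The URL pattern and parameters are
      # assumptions about the public API; verify before production use.
      import requests

      STRING_API = "https://string-db.org/api/tsv/network"

      params = {
          "identifiers": "TP53",   # query protein
          "species": 9606,         # NCBI taxon id for Homo sapiens
          "required_score": 700,   # medium-high confidence threshold
      }

      response = requests.get(STRING_API, params=params, timeout=30)
      response.raise_for_status()

      lines = response.text.strip().splitlines()
      header = lines[0].split("\t")
      for row in lines[1:6]:
          record = dict(zip(header, row.split("\t")))
          print(record.get("preferredName_A"), "-",
                record.get("preferredName_B"), record.get("score"))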

  19. Development of a conceptual integrated traffic safety problem identification database

    DOT National Transportation Integrated Search

    1999-12-01

    The project conceptualized a traffic safety risk management information system and statistical database for improved problem-driver identification, countermeasure development, and resource allocation. The California Department of Motor Vehicles Drive...

  20. DBGC: A Database of Human Gastric Cancer

    PubMed Central

    Wang, Chao; Zhang, Jun; Cai, Mingdeng; Zhu, Zhenggang; Gu, Wenjie; Yu, Yingyan; Zhang, Xiaoyan

    2015-01-01

    The Database of Human Gastric Cancer (DBGC) is a comprehensive database that integrates various human gastric cancer-related data resources. Human gastric cancer-related transcriptomics projects, proteomics projects, mutations, biomarkers and drug-sensitive genes from different sources were collected and unified in this database. Moreover, epidemiological statistics of gastric cancer patients in China and clinicopathological information annotated with gastric cancer cases were also integrated into the DBGC. We believe that this database will greatly facilitate research regarding human gastric cancer in many fields. DBGC is freely available at http://bminfor.tongji.edu.cn/dbgc/index.do PMID:26566288

  1. [Integrated DNA barcoding database for identifying Chinese animal medicine].

    PubMed

    Shi, Lin-Chun; Yao, Hui; Xie, Li-Fang; Zhu, Ying-Jie; Song, Jing-Yuan; Zhang, Hui; Chen, Shi-Lin

    2014-06-01

    In order to construct an integrated DNA barcoding database for identifying Chinese animal medicine, the authors and their collaborators have carried out extensive research on identifying Chinese animal medicines using DNA barcoding technology; sequences from GenBank have been analyzed in parallel. Three different methods, BLAST, barcoding gap and tree building, were used to confirm the reliability of the barcode records in the database. The integrated DNA barcoding database for identifying Chinese animal medicine was constructed from three different parts: specimen, sequence and literature information. The database contains about 800 animal medicines together with their adulterants and closely related species. Unknown specimens can be identified by pasting their sequence record into the window on the ID page of the species identification system for traditional Chinese medicine (www.tcmbarcode.cn). The integrated DNA barcoding database for identifying Chinese animal medicine is of considerable importance for animal species identification, the conservation of rare and endangered species and the sustainable utilization of animal resources.
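
    The BLAST-based identification step described above can be approximated against public sequence collections. The sketch below uses Biopython's qblast interface to compare a query sequence with GenBank's nucleotide database; the query fragment shown is a placeholder rather than a real barcode record, and the search requires network access to NCBI.

      # Sketch: identify a barcode-like sequence by BLAST against GenBank (nt).
      # The query sequence is a placeholder fragment, not a real barcode record.
      from Bio.Blast import NCBIWWW, NCBIXML

      query_seq = ("TTCTCAACCAACCACAAAGACATTGGCACCCTTTATCTAGTATTTGGTGCCTGAGCC"
                   "GGAATAGTAGGAACTGCCCTAAGCCTCCTCATTCGAGCAGAACTAAGCCAACCCGGA")

      result_handle = NCBIWWW.qblast("blastn", "nt", query_seq, hitlist_size=5)
      blast_record = NCBIXML.read(result_handle)
      result_handle.close()

      for alignment in blast_record.alignments:
          best_hsp = alignment.hsps[0]
          identity = 100.0 * best_hsp.identities / best_hsp.align_length
          print(f"{alignment.title[:60]}  identity={identity:.1f}%")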

  2. PGSB/MIPS PlantsDB Database Framework for the Integration and Analysis of Plant Genome Data.

    PubMed

    Spannagl, Manuel; Nussbaumer, Thomas; Bader, Kai; Gundlach, Heidrun; Mayer, Klaus F X

    2017-01-01

    Plant Genome and Systems Biology (PGSB), formerly Munich Institute for Protein Sequences (MIPS) PlantsDB, is a database framework for the integration and analysis of plant genome data, developed and maintained for more than a decade now. Major components of that framework are genome databases and analysis resources focusing on individual (reference) genomes providing flexible and intuitive access to data. Another main focus is the integration of genomes from both model and crop plants to form a scaffold for comparative genomics, assisted by specialized tools such as the CrowsNest viewer to explore conserved gene order (synteny). Data exchange and integrated search functionality with/over many plant genome databases is provided within the transPLANT project.

  3. DSSTOX: NEW ON-LINE RESOURCE FOR PUBLISHING AND INTEGRATING STANDARDIZED STRUCTURE-INCLUSIVE TOXICITY DATABASES

    EPA Science Inventory

    DSSTox: New On-line Resource for Publishing Structure-Standardized Toxicity Databases

    Ann M. Richard(1), Jamie Burch(2), ClarLynda Williams(3)
    (1) Nat. Health and Environ. Effects Res. Lab, US EPA, Research Triangle Park, NC 27711; (2) EPA-NC Central Univ. Student COOP, US EPA, Res. Tri...

  4. Workshop report: Identifying opportunities for global integration of toxicogenomics databases, 26-27 June 2013, Research Triangle Park, NC, USA.

    PubMed

    Hendrickx, Diana M; Boyles, Rebecca R; Kleinjans, Jos C S; Dearry, Allen

    2014-12-01

    A joint US-EU workshop on enhancing data sharing and exchange in toxicogenomics was held at the National Institute of Environmental Health Sciences. Currently, efficient reuse of data is hampered by problems related to public data availability, data quality, database interoperability (the ability to exchange information), standardization and sustainability. At the workshop, experts from universities and research institutes presented databases, studies, organizations and tools that attempt to deal with these problems. Furthermore, a case study was presented showing that combining toxicogenomics data from multiple resources leads to more accurate predictions in risk assessment. All participants agreed that there is a need for a web portal describing the diverse, heterogeneous data resources relevant for toxicogenomics research, and that linking more data resources would improve toxicogenomics data analysis. To outline a roadmap for enhancing interoperability between data resources, the participants recommend collecting user stories from the toxicogenomics research community on barriers to data sharing and exchange that currently prevent certain research questions from being answered. These user stories may guide the prioritization of steps to be taken to enhance the integration of toxicogenomics databases.

  5. Using glycome databases for drug discovery.

    PubMed

    Aoki-Kinoshita, Kiyoko F

    2008-08-01

    The glycomics field has made great advances in the last decade thanks to technologies for glycan synthesis and analysis, including carbohydrate microarrays. Accordingly, databases for glycomics research have emerged and been made publicly available by many major institutions worldwide. This review introduces these and other useful databases on which new methods for drug discovery can be built. The scope of this review covers currently documented and accessible databases and resources pertaining to glycomics, selected with the expectation that they may be useful for drug discovery research. There is a plethora of glycomics databases with much potential for drug discovery. This may seem daunting at first, but this review helps to put some of these resources into perspective. Additionally, some thoughts are presented on how to integrate these resources to allow more efficient research.

  6. RAIN: RNA–protein Association and Interaction Networks

    PubMed Central

    Junge, Alexander; Refsgaard, Jan C.; Garde, Christian; Pan, Xiaoyong; Santos, Alberto; Alkan, Ferhat; Anthon, Christian; von Mering, Christian; Workman, Christopher T.; Jensen, Lars Juhl; Gorodkin, Jan

    2017-01-01

    Protein association networks can be inferred from a range of resources including experimental data, literature mining and computational predictions. These types of evidence are emerging for non-coding RNAs (ncRNAs) as well. However, integration of ncRNAs into protein association networks is challenging due to data heterogeneity. Here, we present a database of ncRNA–RNA and ncRNA–protein interactions and its integration with the STRING database of protein–protein interactions. These ncRNA associations cover four organisms and have been established from curated examples, experimental data, interaction predictions and automatic literature mining. RAIN uses an integrative scoring scheme to assign a confidence score to each interaction. We demonstrate that RAIN outperforms the underlying microRNA-target predictions in inferring ncRNA interactions. RAIN can be operated through an easily accessible web interface and all interaction data can be downloaded. Database URL: http://rth.dk/resources/rain PMID:28077569
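
    A minimal sketch of how independent evidence channels might be combined into a single confidence score, in the spirit of the integrative scoring described above. This is not RAIN's published algorithm; the channel names and the prior value are assumptions made for illustration.

    ```python
    # Illustrative only: a generic noisy-OR combination of per-channel evidence
    # scores into one confidence value. NOT the published RAIN scoring scheme;
    # the channel names and the prior probability are hypothetical.

    def combine_confidence(channel_scores, prior=0.05):
        """Combine independent evidence scores (each in [0, 1]) into one score."""
        combined_failure = 1.0
        for s in channel_scores.values():
            corrected = max(0.0, (s - prior) / (1.0 - prior))  # subtract the prior
            combined_failure *= (1.0 - corrected)
        total = 1.0 - combined_failure
        return prior + (1.0 - prior) * total  # add the prior back

    example = {"experiments": 0.7, "predictions": 0.4, "textmining": 0.2}
    print(round(combine_confidence(example), 3))
    ```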

  7. Integrated and Applied Curricula Discussion Group and Data Base Project. Final Report.

    ERIC Educational Resources Information Center

    Wisconsin Univ. - Stout, Menomonie. Center for Vocational, Technical and Adult Education.

    A project was conducted to compile integrated and applied curriculum resources, develop databases on the World Wide Web, and encourage networking for high school and technical college educators through an Internet discussion group. Activities conducted during the project include the creation of a web page to guide users to resource banks…

  8. Applications and Methods Utilizing the Simple Semantic Web Architecture and Protocol (SSWAP) for Bioinformatics Resource Discovery and Disparate Data and Service Integration

    USDA-ARS?s Scientific Manuscript database

    Scientific data integration and computational service discovery are challenges for the bioinformatic community. This process is made more difficult by the separate and independent construction of biological databases, which makes the exchange of scientific data between information resources difficu...

  9. miRNEST database: an integrative approach in microRNA search and annotation

    PubMed Central

    Szcześniak, Michał Wojciech; Deorowicz, Sebastian; Gapski, Jakub; Kaczyński, Łukasz; Makałowska, Izabela

    2012-01-01

    Despite accumulating data on animal and plant microRNAs and their functions, existing public miRNA resources usually collect miRNAs from a very limited number of species. Many microRNAs, including those from model organisms, remain undiscovered. As a result there is a continuous need to search for new microRNAs. We present miRNEST (http://mirnest.amu.edu.pl), a comprehensive database of animal, plant and virus microRNAs. The core part of the database is built from our miRNA predictions conducted on Expressed Sequence Tags of 225 animal and 202 plant species. The miRNA search was performed based on sequence similarity, and as many as 10 004 miRNA candidates in 221 animal and 199 plant species were discovered. Of these, only 299 have already been deposited in miRBase. Additionally, miRNEST has been integrated with external miRNA data from the literature and 13 databases, which includes miRNA sequences, small RNA sequencing data, expression, polymorphism and target data as well as links to external miRNA resources, wherever applicable. All this makes miRNEST a considerable miRNA resource in terms of the number of species covered (544), integrating scattered miRNA data into a uniform format with a user-friendly web interface. PMID:22135287

  10. Making proteomics data accessible and reusable: current state of proteomics databases and repositories.

    PubMed

    Perez-Riverol, Yasset; Alpi, Emanuele; Wang, Rui; Hermjakob, Henning; Vizcaíno, Juan Antonio

    2015-03-01

    Compared to other data-intensive disciplines such as genomics, public deposition and storage of MS-based proteomics data are still less developed due to, among other reasons, the inherent complexity of the data and the variety of data types and experimental workflows. In order to address this need, several public repositories for MS proteomics experiments have been developed, each with different purposes in mind. The most established resources are the Global Proteome Machine Database (GPMDB), PeptideAtlas, and the PRIDE database. Additionally, there are other useful (in many cases recently developed) resources such as ProteomicsDB, Mass Spectrometry Interactive Virtual Environment (MassIVE), Chorus, MaxQB, PeptideAtlas SRM Experiment Library (PASSEL), Model Organism Protein Expression Database (MOPED), and the Human Proteinpedia. In addition, the ProteomeXchange consortium has recently been developed to enable better integration of public repositories and the coordinated sharing of proteomics information, maximizing its benefit to the scientific community. Here, we review each of the major proteomics resources independently and some tools that enable the integration, mining and reuse of the data. We also discuss some of the major challenges and current pitfalls in the integration and sharing of the data. © 2014 The Authors. PROTEOMICS published by Wiley-VCH Verlag GmbH & Co. KGaA, Weinheim.

  11. MIPS Arabidopsis thaliana Database (MAtDB): an integrated biological knowledge resource based on the first complete plant genome

    PubMed Central

    Schoof, Heiko; Zaccaria, Paolo; Gundlach, Heidrun; Lemcke, Kai; Rudd, Stephen; Kolesov, Grigory; Arnold, Roland; Mewes, H. W.; Mayer, Klaus F. X.

    2002-01-01

    Arabidopsis thaliana is the first plant for which the complete genome has been sequenced and published. Annotation of complex eukaryotic genomes requires more than the assignment of genetic elements to the sequence. Besides completing the list of genes, we need to discover their cellular roles, their regulation and their interactions in order to understand the workings of the whole plant. The MIPS Arabidopsis thaliana Database (MAtDB; http://mips.gsf.de/proj/thal/db) started out as a repository for genome sequence data in the European Scientists Sequencing Arabidopsis (ESSA) project and the Arabidopsis Genome Initiative. Our aim is to transform MAtDB into an integrated biological knowledge resource by integrating diverse data, tools, query and visualization capabilities and by creating a comprehensive resource for Arabidopsis as a reference model for other species, including crop plants. PMID:11752263

  12. CHOmine: an integrated data warehouse for CHO systems biology and modeling

    PubMed Central

    Hanscho, Michael; Ruckerbauer, David E.; Zanghellini, Jürgen; Borth, Nicole

    2017-01-01

    Abstract The last decade has seen a surge in published genome-scale information for Chinese hamster ovary (CHO) cells, which are the main production vehicles for therapeutic proteins. While a single access point is available at www.CHOgenome.org, the primary data are distributed over several databases at different institutions. Research is currently hampered by a plethora of gene names and IDs that vary between published draft genomes and databases, making systems biology analyses cumbersome and laborious. Here we present CHOmine, an integrative data warehouse that connects data from various databases and links out to other resources. Furthermore, we introduce CHOmodel, a web-based resource that provides access to recently published CHO cell line specific metabolic reconstructions. Both resources allow users to query CHO-relevant data and find interconnections between different types of data, and thus provide a simple, standardized entry point to the world of CHO systems biology. Database URL: http://www.chogenome.org PMID:28605771

  13. WheatGenome.info: A Resource for Wheat Genomics.

    PubMed

    Lai, Kaitao

    2016-01-01

    WheatGenome.info is an integrated database that hosts wheat genome and genomic data through a variety of Web-based systems and has been developed to support wheat research and crop improvement. Its Web-based applications include a GBrowse2-based wheat genome viewer with a BLAST search portal, TAGdb for searching wheat second-generation genome sequence data, wheat autoSNPdb, links to wheat genetic maps using CMap and CMap3D, and a wheat genome Wiki to allow interaction between diverse wheat genome sequencing activities. This portal provides links to a variety of wheat genome resources hosted at other research organizations. This integrated database aims to accelerate wheat genome research and is freely accessible via the web interface at http://www.wheatgenome.info/.

  14. TryTransDB: A web-based resource for transport proteins in Trypanosomatidae.

    PubMed

    Sonar, Krushna; Kabra, Ritika; Singh, Shailza

    2018-03-12

    TryTransDB is a web-based resource that stores transport protein data, which can be retrieved using a standalone BLAST tool. We have attempted to create an integrated database that can be a one-stop shop for researchers working with transport proteins of the Trypanosomatidae family. TryTransDB (Trypanosomatidae Transport Protein Database) is a comprehensive web-based resource that can run a BLAST search against most transport protein sequences (protein and nucleotide) from organisms of the Trypanosomatidae family. The resource further allows users to compute a phylogenetic tree by performing multiple sequence alignment (MSA) with the embedded CLUSTALW suite. Cross-links to other databases also help gather more information about a given transport protein from a single website.
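
    The BLAST workflow described above can also be reproduced locally; the following is a minimal sketch, assuming NCBI BLAST+ is installed and a FASTA file of transporter sequences has been downloaded from the resource (the file names here are hypothetical).

    ```python
    # Sketch: local BLASTP against a downloaded transporter FASTA.
    # Assumes NCBI BLAST+ (makeblastdb, blastp) is on PATH and that
    # 'trytrans_proteins.fasta' and 'query.fasta' exist (hypothetical file names).
    import subprocess

    # Build a protein BLAST database once from the downloaded sequences.
    subprocess.run(
        ["makeblastdb", "-in", "trytrans_proteins.fasta", "-dbtype", "prot",
         "-out", "trytrans_db"],
        check=True,
    )

    # Run blastp and capture tabular output (outfmt 6).
    result = subprocess.run(
        ["blastp", "-query", "query.fasta", "-db", "trytrans_db",
         "-outfmt", "6 qseqid sseqid pident evalue bitscore", "-evalue", "1e-5"],
        check=True, capture_output=True, text=True,
    )

    for line in result.stdout.splitlines():
        qseqid, sseqid, pident, evalue, bitscore = line.split("\t")
        print(f"{qseqid} -> {sseqid}: {pident}% identity, E={evalue}")
    ```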

  15. Database Resources of the BIG Data Center in 2018

    PubMed Central

    Xu, Xingjian; Hao, Lili; Zhu, Junwei; Tang, Bixia; Zhou, Qing; Song, Fuhai; Chen, Tingting; Zhang, Sisi; Dong, Lili; Lan, Li; Wang, Yanqing; Sang, Jian; Hao, Lili; Liang, Fang; Cao, Jiabao; Liu, Fang; Liu, Lin; Wang, Fan; Ma, Yingke; Xu, Xingjian; Zhang, Lijuan; Chen, Meili; Tian, Dongmei; Li, Cuiping; Dong, Lili; Du, Zhenglin; Yuan, Na; Zeng, Jingyao; Zhang, Zhewen; Wang, Jinyue; Shi, Shuo; Zhang, Yadong; Pan, Mengyu; Tang, Bixia; Zou, Dong; Song, Shuhui; Sang, Jian; Xia, Lin; Wang, Zhennan; Li, Man; Cao, Jiabao; Niu, Guangyi; Zhang, Yang; Sheng, Xin; Lu, Mingming; Wang, Qi; Xiao, Jingfa; Zou, Dong; Wang, Fan; Hao, Lili; Liang, Fang; Li, Mengwei; Sun, Shixiang; Zou, Dong; Li, Rujiao; Yu, Chunlei; Wang, Guangyu; Sang, Jian; Liu, Lin; Li, Mengwei; Li, Man; Niu, Guangyi; Cao, Jiabao; Sun, Shixiang; Xia, Lin; Yin, Hongyan; Zou, Dong; Xu, Xingjian; Ma, Lina; Chen, Huanxin; Sun, Yubin; Yu, Lei; Zhai, Shuang; Sun, Mingyuan; Zhang, Zhang; Zhao, Wenming; Xiao, Jingfa; Bao, Yiming; Song, Shuhui; Hao, Lili; Li, Rujiao; Ma, Lina; Sang, Jian; Wang, Yanqing; Tang, Bixia; Zou, Dong; Wang, Fan

    2018-01-01

    Abstract The BIG Data Center at Beijing Institute of Genomics (BIG) of the Chinese Academy of Sciences provides freely open access to a suite of database resources in support of worldwide research activities in both academia and industry. With the vast amounts of omics data generated at ever-greater scales and rates, the BIG Data Center is continually expanding, updating and enriching its core database resources through big-data integration and value-added curation, including BioCode (a repository archiving bioinformatics tool codes), BioProject (a biological project library), BioSample (a biological sample library), Genome Sequence Archive (GSA, a data repository for archiving raw sequence reads), Genome Warehouse (GWH, a centralized resource housing genome-scale data), Genome Variation Map (GVM, a public repository of genome variations), Gene Expression Nebulas (GEN, a database of gene expression profiles based on RNA-Seq data), Methylation Bank (MethBank, an integrated databank of DNA methylomes), and Science Wikis (a series of biological knowledge wikis for community annotations). In addition, three featured web services are provided, viz., BIG Search (search as a service; a scalable inter-domain text search engine), BIG SSO (single sign-on as a service; a user access control system to gain access to multiple independent systems with a single ID and password) and Gsub (submission as a service; a unified submission service for all relevant resources). All of these resources are publicly accessible through the home page of the BIG Data Center at http://bigd.big.ac.cn. PMID:29036542

  16. Emotional Intelligence Research within Human Resource Development Scholarship

    ERIC Educational Resources Information Center

    Farnia, Forouzan; Nafukho, Fredrick Muyia

    2016-01-01

    Purpose: The purpose of this study is to review and synthesize pertinent emotional intelligence (EI) research within the human resource development (HRD) scholarship. Design/methodology/approach: An integrative review of literature was conducted and multiple electronic databases were searched to find the relevant resources. Using the content…

  17. Construction of an ortholog database using the semantic web technology for integrative analysis of genomic data.

    PubMed

    Chiba, Hirokazu; Nishide, Hiroyo; Uchiyama, Ikuo

    2015-01-01

    Recently, various types of biological data, including genomic sequences, have been rapidly accumulating. To discover biological knowledge from such growing heterogeneous data, a flexible framework for data integration is necessary. Ortholog information is a central resource for interlinking corresponding genes among different organisms, and the Semantic Web provides a key technology for the flexible integration of heterogeneous data. We have constructed an ortholog database using Semantic Web technology, aiming at the integration of numerous genomic data and various types of biological information. To formalize the structure of the ortholog information in the Semantic Web, we have constructed the Ortholog Ontology (OrthO). While OrthO is a compact ontology for general use, it is designed to be extended to the description of database-specific concepts. On the basis of OrthO, we described the ortholog information from our Microbial Genome Database for Comparative Analysis (MBGD) in the form of the Resource Description Framework (RDF) and made it available through a SPARQL endpoint, which accepts arbitrary queries specified by users. In this framework based on OrthO, the biological data of different organisms can be integrated using the ortholog information as a hub. In addition, ortholog information from different data sources can be compared using OrthO as a shared ontology. Here we show some examples demonstrating that the ortholog information described in RDF can be used to link various biological data such as taxonomy information and the Gene Ontology. Thus, an ortholog database using Semantic Web technology can contribute to biological knowledge discovery through integrative data analysis.
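
    Because the ortholog data are exposed through a SPARQL endpoint that accepts arbitrary queries, a client can be very small. The sketch below uses SPARQLWrapper; the endpoint URL and the OrthO property names are placeholders, not the actual MBGD/OrthO schema.

    ```python
    # Sketch of querying an ortholog SPARQL endpoint with SPARQLWrapper.
    # The endpoint URL and the namespace/property IRIs below are HYPOTHETICAL
    # placeholders; consult the MBGD/OrthO documentation for the real schema.
    from SPARQLWrapper import SPARQLWrapper, JSON

    ENDPOINT = "https://example.org/sparql"  # placeholder endpoint

    query = """
    PREFIX orth: <http://example.org/ortho#>   # placeholder OrthO namespace
    SELECT ?group ?gene WHERE {
        ?group a orth:OrthologGroup ;
               orth:member ?gene .
    } LIMIT 10
    """

    sparql = SPARQLWrapper(ENDPOINT)
    sparql.setQuery(query)
    sparql.setReturnFormat(JSON)
    results = sparql.query().convert()

    for row in results["results"]["bindings"]:
        print(row["group"]["value"], row["gene"]["value"])
    ```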

  18. Integrating GIS, Archeology, and the Internet.

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Sera White; Brenda Ringe Pace; Randy Lee

    2004-08-01

    At the Idaho National Engineering and Environmental Laboratory's (INEEL) Cultural Resource Management Office, a newly developed Data Management Tool (DMT) is improving management and long-term stewardship of cultural resources. The fully integrated system links an archaeological database, a historical database, and a research database to spatial data through a customized user interface using ArcIMS and Active Server Pages. Components of the new DMT are tailored specifically to the INEEL and include automated data entry forms for historic and prehistoric archaeological sites, specialized queries and reports that address both yearly and project-specific documentation requirements, and unique field recording forms. The predictive modeling component increases the DMT’s value for land use planning and long-term stewardship. The DMT enhances the efficiency of archive searches, improving customer service, oversight, and management of the large INEEL cultural resource inventory. In the future, the DMT will facilitate data sharing with regulatory agencies, tribal organizations, and the general public.

  19. WheatGenome.info: an integrated database and portal for wheat genome information.

    PubMed

    Lai, Kaitao; Berkman, Paul J; Lorenc, Michal Tadeusz; Duran, Chris; Smits, Lars; Manoli, Sahana; Stiller, Jiri; Edwards, David

    2012-02-01

    Bread wheat (Triticum aestivum) is one of the most important crop plants, globally providing staple food for a large proportion of the human population. However, improvement of this crop has been limited due to its large and complex genome. Advances in genomics are supporting wheat crop improvement. We provide a variety of web-based systems hosting wheat genome and genomic data to support wheat research and crop improvement. WheatGenome.info is an integrated database resource which includes multiple web-based applications. These include a GBrowse2-based wheat genome viewer with BLAST search portal, TAGdb for searching wheat second-generation genome sequence data, wheat autoSNPdb, links to wheat genetic maps using CMap and CMap3D, and a wheat genome Wiki to allow interaction between diverse wheat genome sequencing activities. This system includes links to a variety of wheat genome resources hosted at other research organizations. This integrated database aims to accelerate wheat genome research and is freely accessible via the web interface at http://www.wheatgenome.info/.

  20. ERAIZDA: a model for holistic annotation of animal infectious and zoonotic diseases

    PubMed Central

    Buza, Teresia M.; Jack, Sherman W.; Kirunda, Halid; Khaitsa, Margaret L.; Lawrence, Mark L.; Pruett, Stephen; Peterson, Daniel G.

    2015-01-01

    There is an urgent need for a unified resource that integrates trans-disciplinary annotations of emerging and reemerging animal infectious and zoonotic diseases. Such data integration will provide a valuable opportunity for epidemiologists, researchers and health policy makers to make data-driven decisions designed to improve animal health. Integrating emerging and reemerging animal infectious and zoonotic disease data from a large variety of sources into a unified open-access resource provides a stronger basis for achieving a better understanding of infectious and zoonotic diseases. We have developed a model for interlinking annotations of these diseases. These diseases are of particular interest because of the threats they pose to animal health, human health and global health security. We demonstrated the application of this model using brucellosis, an infectious and zoonotic disease. Preliminary annotations were deposited into the VetBioBase database (http://vetbiobase.igbb.msstate.edu). This database is associated with user-friendly tools to facilitate searching, retrieving and downloading of disease-related information. Database URL: http://vetbiobase.igbb.msstate.edu PMID:26581408

  1. New data sources and derived products for the SRER digital spatial database

    Treesearch

    Craig Wissler; Deborah Angell

    2003-01-01

    The Santa Rita Experimental Range (SRER) digital database was developed to automate and preserve ecological data and increase their accessibility. The digital data holdings include a spatial database that is used to integrate ecological data in a known reference system and to support spatial analyses. Recently, the Advanced Resource Technology (ART) facility has added...

  2. Searching Across the International Space Station Databases

    NASA Technical Reports Server (NTRS)

    Maluf, David A.; McDermott, William J.; Smith, Ernest E.; Bell, David G.; Gurram, Mohana

    2007-01-01

    Data access in the enterprise generally requires combining data from different sources and different formats. It is therefore advantageous to focus on the intersection of knowledge across sources and domains; keeping irrelevant knowledge around only makes the integration more unwieldy and more complicated than necessary. This paper proposes a context search over multiple domains that uses context-sensitive queries to support disciplined manipulation of domain knowledge resources. The objective of a context search is to provide the capability for interrogating many domain knowledge resources, which are largely semantically disjoint. The search formally supports the tasks of selecting, combining, extending, specializing, and modifying components from a diverse set of domains. This paper demonstrates a new paradigm in the composition of information for enterprise applications. In particular, it discusses an approach to achieving data integration across multiple sources in a manner that does not require heavy investment in database and middleware maintenance. This lean approach to integration leads to cost-effectiveness and scalability of data integration with an underlying schemaless object-relational database management system. This highly scalable, information-on-demand framework, called NX-Search, is an implementation of an information system built on NETMARK, a flexible, high-throughput open database integration framework for managing, storing, and searching unstructured or semi-structured arbitrary XML and HTML that is used widely at the National Aeronautics and Space Administration (NASA) and in industry.
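
    As a loose illustration of context-sensitive search over schemaless XML (not the NETMARK/NX-Search implementation itself), the Python standard library is enough to walk arbitrary documents and match on both element context and text; the sample document below is invented.

    ```python
    # Sketch: context-sensitive search over arbitrary XML using only the stdlib.
    # The sample document and the 'context' convention are invented for illustration.
    import xml.etree.ElementTree as ET

    doc = ET.fromstring("""
    <records>
      <experiment><title>Thermal test</title><status>complete</status></experiment>
      <payload><title>Thermal blanket</title><status>stowed</status></payload>
    </records>
    """)

    def context_search(root, context_tag, keyword):
        """Yield elements whose tag matches the context and whose text matches keyword."""
        for parent in root.iter(context_tag):
            for child in parent.iter():
                if child.text and keyword.lower() in child.text.lower():
                    yield parent.tag, child.tag, child.text
                    break

    print(list(context_search(doc, "experiment", "thermal")))
    ```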

  3. Chesapeake Bay Program Water Quality Database

    EPA Pesticide Factsheets

    The Chesapeake Information Management System (CIMS), designed in 1996, is an integrated, accessible information management system for the Chesapeake Bay Region. CIMS is an organized, distributed library of information and software tools designed to increase basin-wide public access to Chesapeake Bay information. The information delivered by CIMS includes technical and public information, educational material, environmental indicators, policy documents, and scientific data. Through the use of relational databases, web-based programming, and web-based GIS, a large number of Internet resources have been established. These resources include multiple distributed on-line databases, on-demand graphing and mapping of environmental data, and geographic searching tools for environmental information. Also available are baseline monitoring data, summarized data, and environmental indicators that document ecosystem status and trends and confirm linkages between water quality, habitat quality and abundance, and the distribution and integrity of biological populations. One of the major features of the CIMS network is the Chesapeake Bay Program's Data Hub, which provides users access to a suite of long-term water quality and living resources databases. Chesapeake Bay mainstem and tidal tributary water quality, benthic macroinvertebrate, toxics, plankton, and fluorescence data can be obtained for a network of over 800 monitoring stations.

  4. PosMed-plus: an intelligent search engine that inferentially integrates cross-species information resources for molecular breeding of plants.

    PubMed

    Makita, Yuko; Kobayashi, Norio; Mochizuki, Yoshiki; Yoshida, Yuko; Asano, Satomi; Heida, Naohiko; Deshpande, Mrinalini; Bhatia, Rinki; Matsushima, Akihiro; Ishii, Manabu; Kawaguchi, Shuji; Iida, Kei; Hanada, Kosuke; Kuromori, Takashi; Seki, Motoaki; Shinozaki, Kazuo; Toyoda, Tetsuro

    2009-07-01

    Molecular breeding of crops is an efficient way to upgrade plant functions useful to mankind. A key step is forward genetics or positional cloning to identify the genes that confer useful functions. In order to accelerate the whole research process, we have developed an integrated database system powered by an intelligent data-retrieval engine termed PosMed-plus (Positional Medline for plant upgrading science), allowing us to prioritize highly promising candidate genes in a given chromosomal interval(s) of Arabidopsis thaliana and rice, Oryza sativa. By inferentially integrating cross-species information resources including genomes, transcriptomes, proteomes, localizomes, phenomes and literature, the system compares a user's query, such as phenotypic or functional keywords, with the literature associated with the relevant genes located within the interval. By utilizing orthologous and paralogous correspondences, PosMed-plus efficiently integrates cross-species information to facilitate the ranking of rice candidate genes based on evidence from other model species such as Arabidopsis. PosMed-plus is a plant science version of the PosMed system widely used by mammalian researchers, and provides both a powerful integrative search function and a rich integrative display of the integrated databases. PosMed-plus is the first cross-species integrated database that inferentially prioritizes candidate genes for forward genetics approaches in plant science, and will be expanded for wider use in plant upgrading in many species.
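
    A toy sketch of the kind of cross-species, keyword-driven candidate ranking described above (not the actual PosMed-plus inference engine); the gene identifiers, ortholog mapping, literature hit counts and weighting are all invented for illustration.

    ```python
    # Toy sketch of cross-species candidate-gene ranking by keyword evidence.
    # Gene lists, ortholog mappings and literature annotations are invented;
    # this is not the actual PosMed-plus inference engine.

    candidate_genes = ["Os01g0100100", "Os01g0100200", "Os01g0100300"]
    orthologs = {"Os01g0100200": "AT1G01060"}  # rice gene -> Arabidopsis ortholog

    # Keyword hit counts mined from literature associated with each gene.
    direct_hits = {"Os01g0100100": 0, "Os01g0100200": 1, "Os01g0100300": 2}
    ortholog_hits = {"AT1G01060": 5}

    ORTHOLOG_WEIGHT = 0.5  # evidence transferred via orthology counts for less

    def score(gene):
        s = direct_hits.get(gene, 0)
        ortholog = orthologs.get(gene)
        if ortholog is not None:
            s += ORTHOLOG_WEIGHT * ortholog_hits.get(ortholog, 0)
        return s

    for gene in sorted(candidate_genes, key=score, reverse=True):
        print(gene, score(gene))
    ```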

  5. Database resources for the Tuberculosis community

    PubMed Central

    Lew, Jocelyne M.; Mao, Chunhong; Shukla, Maulik; Warren, Andrew; Will, Rebecca; Kuznetsov, Dmitry; Xenarios, Ioannis; Robertson, Brian D.; Gordon, Stephen V.; Schnappinger, Dirk; Cole, Stewart T.; Sobral, Bruno

    2013-01-01

    Summary Access to online repositories for genomic and associated “-omics” datasets is now an essential part of everyday research activity. It is important therefore that the Tuberculosis community is aware of the databases and tools available to them online, as well as for the database hosts to know what the needs of the research community are. One of the goals of the Tuberculosis Annotation Jamboree, held in Washington DC on March 7th–8th 2012, was therefore to provide an overview of the current status of three key Tuberculosis resources, TubercuList (tuberculist.epfl.ch), TB Database (www.tbdb.org), and Pathosystems Resource Integration Center (PATRIC, www.patricbrc.org). Here we summarize some key updates and upcoming features in TubercuList, and provide an overview of the PATRIC site and its online tools for pathogen RNA-Seq analysis. PMID:23332401

  6. SoyBase, The USDA-ARS Soybean Genetics and Genomics Database

    USDA-ARS?s Scientific Manuscript database

    SoyBase, the USDA-ARS soybean genetic database, is a comprehensive repository for professionally curated genetics, genomics and related data resources for soybean. SoyBase contains the most current genetic, physical and genomic sequence maps integrated with qualitative and quantitative traits. The...

  7. MaizeGDB: New tools and resource

    USDA-ARS?s Scientific Manuscript database

    MaizeGDB, the USDA-ARS genetics and genomics database, is a highly curated, community-oriented informatics service to researchers focused on the crop plant and model organism Zea mays. MaizeGDB facilitates maize research by curating, integrating, and maintaining a database that serves as the central...

  8. Database Resources of the BIG Data Center in 2018.

    PubMed

    2018-01-04

    The BIG Data Center at Beijing Institute of Genomics (BIG) of the Chinese Academy of Sciences provides freely open access to a suite of database resources in support of worldwide research activities in both academia and industry. With the vast amounts of omics data generated at ever-greater scales and rates, the BIG Data Center is continually expanding, updating and enriching its core database resources through big-data integration and value-added curation, including BioCode (a repository archiving bioinformatics tool codes), BioProject (a biological project library), BioSample (a biological sample library), Genome Sequence Archive (GSA, a data repository for archiving raw sequence reads), Genome Warehouse (GWH, a centralized resource housing genome-scale data), Genome Variation Map (GVM, a public repository of genome variations), Gene Expression Nebulas (GEN, a database of gene expression profiles based on RNA-Seq data), Methylation Bank (MethBank, an integrated databank of DNA methylomes), and Science Wikis (a series of biological knowledge wikis for community annotations). In addition, three featured web services are provided, viz., BIG Search (search as a service; a scalable inter-domain text search engine), BIG SSO (single sign-on as a service; a user access control system to gain access to multiple independent systems with a single ID and password) and Gsub (submission as a service; a unified submission service for all relevant resources). All of these resources are publicly accessible through the home page of the BIG Data Center at http://bigd.big.ac.cn. © The Author(s) 2017. Published by Oxford University Press on behalf of Nucleic Acids Research.

  9. Regulators of Androgen Action Resource: a one-stop shop for the comprehensive study of androgen receptor action.

    PubMed

    DePriest, Adam D; Fiandalo, Michael V; Schlanger, Simon; Heemers, Frederike; Mohler, James L; Liu, Song; Heemers, Hannelore V

    2016-01-01

    Androgen receptor (AR) is a ligand-activated transcription factor that is the main target for treatment of non-organ-confined prostate cancer (CaP). Failure of life-prolonging AR-targeting androgen deprivation therapy is due to flexibility in the steroidogenic pathways that control intracrine androgen levels and to variability in the AR transcriptional output. Androgen biosynthesis enzymes, androgen transporters and AR-associated coregulators are attractive novel CaP treatment targets. These proteins, however, are characterized by multiple transcript variants and isoforms, are subject to genomic alterations, and are differentially expressed among CaPs. Determining their therapeutic potential requires evaluation of extensive, diverse datasets that are dispersed over multiple databases, websites and literature reports. Mining and integrating these datasets are cumbersome, time-consuming tasks and provide only snapshots of relevant information. To overcome this impediment to effective, efficient study of AR and potential drug targets, we developed the Regulators of Androgen Action Resource (RAAR), a non-redundant, curated and user-friendly searchable web interface. RAAR centralizes information on gene function, clinical relevance, and resources for 55 genes that encode proteins involved in biosynthesis, metabolism and transport of androgens and for 274 AR-associated coregulator genes. Data in RAAR are organized in two levels: (i) information pertaining to production of androgens is contained in a 'pre-receptor level' database and coregulator gene information is provided in a 'post-receptor level' database, and (ii) an 'other resources' database contains links to additional databases that are complementary to and useful for pursuing further the information provided in RAAR. For each of its 329 entries, RAAR provides access to more than 20 well-curated publicly available databases, and thus, access to thousands of data points. Hyperlinks provide direct access to gene-specific entries in the respective database(s). RAAR is a novel, freely available resource that provides fast, reliable and easy access to integrated information that is needed to develop alternative CaP therapies. Database URL: http://www.lerner.ccf.org/cancerbio/heemers/RAAR/search/. © The Author(s) 2016. Published by Oxford University Press.

  10. EuroPhenome and EMPReSS: online mouse phenotyping resource

    PubMed Central

    Mallon, Ann-Marie; Hancock, John M.

    2008-01-01

    EuroPhenome (http://www.europhenome.org) and EMPReSS (http://empress.har.mrc.ac.uk/) form an integrated resource to provide access to data and procedures for mouse phenotyping. EMPReSS describes 96 Standard Operating Procedures for mouse phenotyping. EuroPhenome contains data resulting from carrying out EMPReSS protocols on four inbred laboratory mouse strains. As well as web interfaces, both resources support web services to enable integration with other mouse phenotyping and functional genetics resources, and are committed to initiatives to improve integration of mouse phenotype databases. EuroPhenome will be the repository for a recently initiated effort to carry out large-scale phenotyping on a large number of knockout mouse lines (EUMODIC). PMID:17905814

  11. EuroPhenome and EMPReSS: online mouse phenotyping resource.

    PubMed

    Mallon, Ann-Marie; Blake, Andrew; Hancock, John M

    2008-01-01

    EuroPhenome (http://www.europhenome.org) and EMPReSS (http://empress.har.mrc.ac.uk/) form an integrated resource to provide access to data and procedures for mouse phenotyping. EMPReSS describes 96 Standard Operating Procedures for mouse phenotyping. EuroPhenome contains data resulting from carrying out EMPReSS protocols on four inbred laboratory mouse strains. As well as web interfaces, both resources support web services to enable integration with other mouse phenotyping and functional genetics resources, and are committed to initiatives to improve integration of mouse phenotype databases. EuroPhenome will be the repository for a recently initiated effort to carry out large-scale phenotyping on a large number of knockout mouse lines (EUMODIC).

  12. [Data validation methods and discussion on Chinese materia medica resource survey].

    PubMed

    Zhang, Yue; Ma, Wei-Feng; Zhang, Xiao-Bo; Zhu, Shou-Dong; Guo, Lan-Ping; Wang, Xing-Xing

    2013-07-01

    Since the beginning of the fourth national survey of Chinese materia medica resources, 22 provinces have conducted pilot surveys. The survey teams have reported an immense amount of data, which places very high demands on the construction of the database system. To ensure quality, it is necessary to check and validate the data in the database system. Data validation is an important method for ensuring the validity, integrity and accuracy of census data. This paper comprehensively introduces the data validation system of the database for the fourth national survey of Chinese materia medica resources and further improves the design ideas and procedures for data validation. The purpose of this study is to help the survey work proceed smoothly.
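
    The abstract does not spell out the concrete validation rules; the sketch below only illustrates the general flavor of such checks (completeness, ranges, controlled vocabularies) on hypothetical survey records with invented field names.

    ```python
    # Generic data-validation sketch for survey records. Field names and rules are
    # hypothetical; they illustrate completeness, range and vocabulary checks only.

    REQUIRED_FIELDS = {"record_id", "province", "species_name", "latitude", "longitude"}
    VALID_PROVINCES = {"Anhui", "Yunnan", "Sichuan"}  # toy controlled vocabulary

    def validate_record(rec):
        errors = []
        # Completeness: every required field must be present and non-empty.
        for field in REQUIRED_FIELDS:
            if rec.get(field) in (None, ""):
                errors.append(f"missing field: {field}")
        # Range checks on coordinates.
        if rec.get("latitude") is not None and not (-90 <= rec["latitude"] <= 90):
            errors.append("latitude out of range")
        if rec.get("longitude") is not None and not (-180 <= rec["longitude"] <= 180):
            errors.append("longitude out of range")
        # Controlled vocabulary check.
        if rec.get("province") and rec["province"] not in VALID_PROVINCES:
            errors.append(f"unknown province: {rec['province']}")
        return errors

    record = {"record_id": "R0001", "province": "Yunnan",
              "species_name": "Panax notoginseng", "latitude": 25.1, "longitude": 102.7}
    print(validate_record(record) or "record passes all checks")
    ```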

  13. Exploring Short Linear Motifs Using the ELM Database and Tools.

    PubMed

    Gouw, Marc; Sámano-Sánchez, Hugo; Van Roey, Kim; Diella, Francesca; Gibson, Toby J; Dinkel, Holger

    2017-06-27

    The Eukaryotic Linear Motif (ELM) resource is dedicated to the characterization and prediction of short linear motifs (SLiMs). SLiMs are compact, degenerate peptide segments found in many proteins and essential to almost all cellular processes. However, despite their abundance, SLiMs remain largely uncharacterized. The ELM database is a collection of manually annotated SLiM instances curated from experimental literature. In this article we illustrate how to browse and search the database for curated SLiM data, and cover the different types of data integrated in the resource. We also cover how to use this resource in order to predict SLiMs in known as well as novel proteins, and how to interpret the results generated by the ELM prediction pipeline. The ELM database is a very rich resource, and in the following protocols we give helpful examples to demonstrate how this knowledge can be used to improve your own research. © 2017 by John Wiley & Sons, Inc.

  14. PathCase-SB architecture and database design

    PubMed Central

    2011-01-01

    Background Integration of metabolic pathway resources and regulatory metabolic network models, and the deployment of new tools on the integrated platform, can help perform more effective and more efficient systems biology research on understanding the regulation of metabolic networks. Therefore, (a) integrating regulatory metabolic networks and existing models under a single database environment and (b) building tools to help with modeling and analysis are desirable and intellectually challenging computational tasks. Description PathCase Systems Biology (PathCase-SB) has been built and released. The PathCase-SB database provides data and an API for multiple user interfaces and software tools. The current PathCase-SB system provides a database-enabled framework and web-based computational tools for facilitating the development of kinetic models of biological systems. PathCase-SB aims to integrate data from selected biological data sources on the web (currently, the BioModels database and KEGG) and to provide more powerful and/or new capabilities via the new web-based integrative framework. This paper describes the architecture and database design issues encountered in PathCase-SB's design and implementation, and presents the current design of PathCase-SB's architecture and database. Conclusions The PathCase-SB architecture and database provide a highly extensible and scalable environment with easy and fast (real-time) access to the data in the database. PathCase-SB itself is already being used by researchers across the world. PMID:22070889

  15. The Advent of Portals.

    ERIC Educational Resources Information Center

    Jackson, Mary E.

    2002-01-01

    Explains portals as tools that gather a variety of electronic information resources, including local library resources, into a single Web page. Highlights include cross-database searching; integration with university portals and course management software; the ARL (Association of Research Libraries) Scholars Portal Initiative; and selected vendors…

  16. PATRIC, the bacterial bioinformatics database and analysis resource.

    PubMed

    Wattam, Alice R; Abraham, David; Dalay, Oral; Disz, Terry L; Driscoll, Timothy; Gabbard, Joseph L; Gillespie, Joseph J; Gough, Roger; Hix, Deborah; Kenyon, Ronald; Machi, Dustin; Mao, Chunhong; Nordberg, Eric K; Olson, Robert; Overbeek, Ross; Pusch, Gordon D; Shukla, Maulik; Schulman, Julie; Stevens, Rick L; Sullivan, Daniel E; Vonstein, Veronika; Warren, Andrew; Will, Rebecca; Wilson, Meredith J C; Yoo, Hyun Seung; Zhang, Chengdong; Zhang, Yan; Sobral, Bruno W

    2014-01-01

    The Pathosystems Resource Integration Center (PATRIC) is the all-bacterial Bioinformatics Resource Center (BRC) (http://www.patricbrc.org). A joint effort by two of the original National Institute of Allergy and Infectious Diseases-funded BRCs, PATRIC provides researchers with an online resource that stores and integrates a variety of data types [e.g. genomics, transcriptomics, protein-protein interactions (PPIs), three-dimensional protein structures and sequence typing data] and associated metadata. Datatypes are summarized for individual genomes and across taxonomic levels. All genomes in PATRIC, currently more than 10,000, are consistently annotated using RAST, the Rapid Annotations using Subsystems Technology. Summaries of different data types are also provided for individual genes, where comparisons of different annotations are available, and also include available transcriptomic data. PATRIC provides a variety of ways for researchers to find data of interest and a private workspace where they can store both genomic and gene associations, and their own private data. Both private and public data can be analyzed together using a suite of tools to perform comparative genomic or transcriptomic analysis. PATRIC also includes integrated information related to disease and PPIs. All the data and integrated analysis and visualization tools are freely available. This manuscript describes updates to the PATRIC since its initial report in the 2007 NAR Database Issue.

  17. PATRIC, the bacterial bioinformatics database and analysis resource

    PubMed Central

    Wattam, Alice R.; Abraham, David; Dalay, Oral; Disz, Terry L.; Driscoll, Timothy; Gabbard, Joseph L.; Gillespie, Joseph J.; Gough, Roger; Hix, Deborah; Kenyon, Ronald; Machi, Dustin; Mao, Chunhong; Nordberg, Eric K.; Olson, Robert; Overbeek, Ross; Pusch, Gordon D.; Shukla, Maulik; Schulman, Julie; Stevens, Rick L.; Sullivan, Daniel E.; Vonstein, Veronika; Warren, Andrew; Will, Rebecca; Wilson, Meredith J.C.; Yoo, Hyun Seung; Zhang, Chengdong; Zhang, Yan; Sobral, Bruno W.

    2014-01-01

    The Pathosystems Resource Integration Center (PATRIC) is the all-bacterial Bioinformatics Resource Center (BRC) (http://www.patricbrc.org). A joint effort by two of the original National Institute of Allergy and Infectious Diseases-funded BRCs, PATRIC provides researchers with an online resource that stores and integrates a variety of data types [e.g. genomics, transcriptomics, protein–protein interactions (PPIs), three-dimensional protein structures and sequence typing data] and associated metadata. Datatypes are summarized for individual genomes and across taxonomic levels. All genomes in PATRIC, currently more than 10 000, are consistently annotated using RAST, the Rapid Annotations using Subsystems Technology. Summaries of different data types are also provided for individual genes, where comparisons of different annotations are available, and also include available transcriptomic data. PATRIC provides a variety of ways for researchers to find data of interest and a private workspace where they can store both genomic and gene associations, and their own private data. Both private and public data can be analyzed together using a suite of tools to perform comparative genomic or transcriptomic analysis. PATRIC also includes integrated information related to disease and PPIs. All the data and integrated analysis and visualization tools are freely available. This manuscript describes updates to the PATRIC since its initial report in the 2007 NAR Database Issue. PMID:24225323

  18. Ginseng Genome Database: an open-access platform for genomics of Panax ginseng.

    PubMed

    Jayakodi, Murukarthick; Choi, Beom-Soon; Lee, Sang-Choon; Kim, Nam-Hoon; Park, Jee Young; Jang, Woojong; Lakshmanan, Meiyappan; Mohan, Shobhana V G; Lee, Dong-Yup; Yang, Tae-Jin

    2018-04-12

    Ginseng (Panax ginseng C.A. Meyer) is a perennial herbaceous plant that has been used in traditional oriental medicine for thousands of years. Ginsenosides, which have significant pharmacological effects on human health, are the foremost bioactive constituents of this plant. Given the importance of this plant to humans, an integrated omics resource is indispensable to facilitate genomic research, molecular breeding and pharmacological study of this herb. The first draft genome sequences of the P. ginseng cultivar "Chunpoong" were reported recently. Here, using the draft genome, transcriptome, and functional annotation datasets of P. ginseng, we have constructed the Ginseng Genome Database (http://ginsengdb.snu.ac.kr/), the first open-access platform to provide comprehensive genomic resources for P. ginseng. The current version of this database provides the most up-to-date draft genome sequence (approximately 3000 Mbp of scaffold sequences) along with structural and functional annotations for 59,352 genes and digital expression of genes based on transcriptome data from different tissues, growth stages and treatments. In addition, tools for visualization and the genomic data from various analyses are provided. All data in the database were manually curated and integrated within a user-friendly query page. This database provides valuable resources for a range of research fields related to P. ginseng and other species belonging to the Apiales order, as well as for plant research communities in general. The Ginseng Genome Database can be accessed at http://ginsengdb.snu.ac.kr/.

  19. A Novel Concept for the Search and Retrieval of the Derwent Markush Resource Database.

    PubMed

    Barth, Andreas; Stengel, Thomas; Litterst, Edwin; Kraut, Hans; Matuszczyk, Henry; Ailer, Franz; Hajkowski, Steve

    2016-05-23

    The representation of and search for generic chemical structures (Markush) remains a continuing challenge. Several research groups have addressed this problem, and over time a limited number of practical solutions have been proposed. Today there are two large commercial providers of Markush databases: Chemical Abstracts Service (CAS) and Thomson Reuters. The Thomson Reuters "Derwent" Markush database is currently offered via the online services Questel and STN and as a data feed for in-house use. The aim of this paper is to briefly review the existing Markush systems (databases plus search engines) and to describe our new approach for the implementation of the Derwent Markush Resource on STN. Our new approach demonstrates the integration of the Derwent Markush Resource database into the existing chemistry-focused STN platform without loss of detail. This provides compatibility with other structure and Markush databases on STN and at the same time makes it possible to deploy the specific features and functions of the Derwent approach. It is shown that the different Markush languages developed by CAS and Derwent can be combined into a single general Markush description. In this concept the generic nodes are grouped together in a unique hierarchy where all chemical elements and fragments can be integrated. As a consequence, both systems are searchable using a single structure query. Moreover, the presented concept could serve as a promising starting point for a common generalized description of Markush structures.
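
    A toy illustration of grouping generic nodes into a single hierarchy with a subsumption test, which is the kind of structure the unified Markush description above relies on; the node names and parent/child relations are invented, not the actual CAS/Derwent generic node definitions.

    ```python
    # Toy hierarchy of generic node types with a subsumption check.
    # Node names and parent/child relations are invented for illustration;
    # they are not the actual CAS/Derwent generic node definitions.

    PARENT = {
        "alkyl": "hydrocarbyl",
        "aryl": "hydrocarbyl",
        "hydrocarbyl": "any_group",
        "halogen": "any_group",
    }

    def subsumes(general, specific):
        """True if 'general' is 'specific' or an ancestor of it in the hierarchy."""
        node = specific
        while node is not None:
            if node == general:
                return True
            node = PARENT.get(node)
        return False

    # A query node matches a database node if either subsumes the other.
    print(subsumes("hydrocarbyl", "alkyl"))   # True: alkyl is a kind of hydrocarbyl
    print(subsumes("halogen", "aryl"))        # False
    ```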

  20. The Biofuel Feedstock Genomics Resource: a web-based portal and database to enable functional genomics of plant biofuel feedstock species.

    PubMed

    Childs, Kevin L; Konganti, Kranti; Buell, C Robin

    2012-01-01

    Major feedstock sources for future biofuel production are likely to be high-biomass-producing plant species such as poplar, pine, switchgrass, sorghum and maize. One active area of research in these species is genome-enabled improvement of lignocellulosic biofuel feedstock quality and yield. To facilitate genomic-based investigations in these species, we developed the Biofuel Feedstock Genomics Resource (BFGR), a database and web portal that provides high-quality, uniform and integrated functional annotation of gene and transcript assembly sequences from species of interest to lignocellulosic biofuel feedstock researchers. The BFGR includes sequence data from 54 species and permits researchers to view, analyze and obtain annotation at the gene, transcript, protein and genome level. Annotation of biochemical pathways permits the identification of key genes and transcripts central to the improvement of lignocellulosic properties in these species. The integrated nature of the BFGR in terms of annotation methods, orthologous/paralogous relationships and linkage to seven species with complete genome sequences allows comparative analyses for biofuel feedstock species with limited sequence resources. Database URL: http://bfgr.plantbiology.msu.edu.

  1. Human Ageing Genomic Resources: new and updated databases

    PubMed Central

    Tacutu, Robi; Thornton, Daniel; Johnson, Emily; Budovsky, Arie; Barardo, Diogo; Craig, Thomas; Diana, Eugene; Lehmann, Gilad; Toren, Dmitri; Wang, Jingwei; Fraifeld, Vadim E

    2018-01-01

    Abstract In spite of a growing body of research and data, human ageing remains a poorly understood process. Over 10 years ago we developed the Human Ageing Genomic Resources (HAGR), a collection of databases and tools for studying the biology and genetics of ageing. Here, we present HAGR’s main functionalities, highlighting new additions and improvements. HAGR consists of six core databases: (i) the GenAge database of ageing-related genes, in turn composed of a dataset of >300 human ageing-related genes and a dataset with >2000 genes associated with ageing or longevity in model organisms; (ii) the AnAge database of animal ageing and longevity, featuring >4000 species; (iii) the GenDR database with >200 genes associated with the life-extending effects of dietary restriction; (iv) the LongevityMap database of human genetic association studies of longevity with >500 entries; (v) the DrugAge database with >400 ageing or longevity-associated drugs or compounds; (vi) the CellAge database with >200 genes associated with cell senescence. All our databases are manually curated by experts and regularly updated to ensure high-quality data. Cross-links across our databases and to external resources help researchers locate and integrate relevant information. HAGR is freely available online (http://genomics.senescence.info/). PMID:29121237

  2. MOPED 2.5—An Integrated Multi-Omics Resource: Multi-Omics Profiling Expression Database Now Includes Transcriptomics Data

    PubMed Central

    Montague, Elizabeth; Stanberry, Larissa; Higdon, Roger; Janko, Imre; Lee, Elaine; Anderson, Nathaniel; Choiniere, John; Stewart, Elizabeth; Yandl, Gregory; Broomall, William; Kolker, Natali

    2014-01-01

    Abstract Multi-omics data-driven scientific discovery crucially rests on high-throughput technologies and data sharing. Currently, data are scattered across single omics repositories, stored in varying raw and processed formats, and are often accompanied by limited or no metadata. The Multi-Omics Profiling Expression Database (MOPED, http://moped.proteinspire.org) version 2.5 is a freely accessible multi-omics expression database. Continual improvement and expansion of MOPED is driven by feedback from the Life Sciences Community. In order to meet the emergent need for an integrated multi-omics data resource, MOPED 2.5 now includes gene relative expression data in addition to protein absolute and relative expression data from over 250 large-scale experiments. To facilitate accurate integration of experiments and increase reproducibility, MOPED provides extensive metadata through the Data-Enabled Life Sciences Alliance (DELSA Global, http://delsaglobal.org) metadata checklist. MOPED 2.5 has greatly increased the number of proteomics absolute and relative expression records to over 500,000, in addition to adding more than four million transcriptomics relative expression records. MOPED has an intuitive user interface with tabs for querying different types of omics expression data and new tools for data visualization. Summary information including expression data, pathway mappings, and direct connection between proteins and genes can be viewed on Protein and Gene Details pages. These connections in MOPED provide a context for multi-omics expression data exploration. Researchers are encouraged to submit omics data which will be consistently processed into expression summaries. MOPED as a multi-omics data resource is a pivotal public database, interdisciplinary knowledge resource, and platform for multi-omics understanding. PMID:24910945

  3. Database resources for the tuberculosis community.

    PubMed

    Lew, Jocelyne M; Mao, Chunhong; Shukla, Maulik; Warren, Andrew; Will, Rebecca; Kuznetsov, Dmitry; Xenarios, Ioannis; Robertson, Brian D; Gordon, Stephen V; Schnappinger, Dirk; Cole, Stewart T; Sobral, Bruno

    2013-01-01

    Access to online repositories for genomic and associated "-omics" datasets is now an essential part of everyday research activity. It is important therefore that the Tuberculosis community is aware of the databases and tools available to them online, as well as for the database hosts to know what the needs of the research community are. One of the goals of the Tuberculosis Annotation Jamboree, held in Washington DC on March 7th-8th 2012, was therefore to provide an overview of the current status of three key Tuberculosis resources, TubercuList (tuberculist.epfl.ch), TB Database (www.tbdb.org), and Pathosystems Resource Integration Center (PATRIC, www.patricbrc.org). Here we summarize some key updates and upcoming features in TubercuList, and provide an overview of the PATRIC site and its online tools for pathogen RNA-Seq analysis. Copyright © 2012 Elsevier Ltd. All rights reserved.

  4. ERAIZDA: a model for holistic annotation of animal infectious and zoonotic diseases.

    PubMed

    Buza, Teresia M; Jack, Sherman W; Kirunda, Halid; Khaitsa, Margaret L; Lawrence, Mark L; Pruett, Stephen; Peterson, Daniel G

    2015-01-01

    There is an urgent need for a unified resource that integrates trans-disciplinary annotations of emerging and reemerging animal infectious and zoonotic diseases. Such data integration will provide a valuable opportunity for epidemiologists, researchers and health policy makers to make data-driven decisions designed to improve animal health. Integrating emerging and reemerging animal infectious and zoonotic disease data from a large variety of sources into a unified open-access resource provides a stronger basis for achieving a better understanding of infectious and zoonotic diseases. We have developed a model for interlinking annotations of these diseases. These diseases are of particular interest because of the threats they pose to animal health, human health and global health security. We demonstrated the application of this model using brucellosis, an infectious and zoonotic disease. Preliminary annotations were deposited into the VetBioBase database (http://vetbiobase.igbb.msstate.edu). This database is associated with user-friendly tools to facilitate searching, retrieving and downloading of disease-related information. Database URL: http://vetbiobase.igbb.msstate.edu. © The Author(s) 2015. Published by Oxford University Press.

  5. Methods to Register Models and Input/Output Parameters for Integrated Modeling

    EPA Science Inventory

    Significant resources can be required when constructing integrated modeling systems. In a typical application, components (e.g., models and databases) created by different developers are assimilated, requiring the framework’s functionality to bridge the gap between the user’s kno...

  6. RAID v2.0: an updated resource of RNA-associated interactions across organisms

    PubMed Central

    Yi, Ying; Zhao, Yue; Li, Chunhua; Zhang, Lin; Huang, Huiying; Li, Yana; Liu, Lanlan; Hou, Ping; Cui, Tianyu; Tan, Puwen; Hu, Yongfei; Zhang, Ting; Huang, Yan; Li, Xiaobo; Yu, Jia; Wang, Dong

    2017-01-01

    With the development of biotechnologies and computational prediction algorithms, the number of experimentally determined and computationally predicted RNA-associated interactions has grown rapidly in recent years. However, diverse RNA-associated interactions are scattered over a wide variety of resources and organisms, and a fully comprehensive view of diverse RNA-associated interactions is still not available for any species. Hence, we have updated the RAID database to version 2.0 (RAID v2.0, www.rna-society.org/raid/) by integrating experimentally determined and computationally predicted interactions from manual literature curation and other database resources under one common framework. The new developments in RAID v2.0 include (i) an over 850-fold increase in RNA-associated interactions compared to the previous version; (ii) numerous resources integrated with experimental or computational prediction evidence for each RNA-associated interaction; (iii) a reliability assessment for each RNA-associated interaction based on an integrative confidence score; and (iv) an increase of species coverage to 60. Consequently, RAID v2.0 recruits more than 5.27 million RNA-associated interactions, including more than 4 million RNA–RNA interactions and more than 1.2 million RNA–protein interactions, referring to nearly 130 000 RNA/protein symbols across 60 species. PMID:27899615

  7. DRUMS: a human disease related unique gene mutation search engine.

    PubMed

    Li, Zuofeng; Liu, Xingnan; Wen, Jingran; Xu, Ye; Zhao, Xin; Li, Xuan; Liu, Lei; Zhang, Xiaoyan

    2011-10-01

    With the completion of the human genome project and the development of new methods for gene variant detection, the integration of mutation data and its phenotypic consequences has become more important than ever. Among all available resources, locus-specific databases (LSDBs) curate the mutation data of one or more specific genes along with high-quality phenotypes. Although some genotype-phenotype data from LSDBs have been integrated into central databases, little effort has been made to integrate all these data through a search engine approach. In this work, we have developed the disease related unique gene mutation search engine (DRUMS), a search engine for human disease-related unique gene mutations, as a convenient tool for biologists or physicians to retrieve gene variant and related phenotype information. Gene variant and phenotype information were stored in a gene-centred relational database. Moreover, the relationships between mutations and diseases were indexed by the uniform resource identifier from the LSDB or another central database. By querying DRUMS, users can access the most popular mutation databases under one interface. DRUMS can be treated as a domain-specific search engine. By using web crawling, indexing, and searching technologies, it provides a competitively efficient interface for searching and retrieving mutation data and their relationships to diseases. The present system is freely accessible at http://www.scbit.org/glif/new/drums/index.html. © 2011 Wiley-Liss, Inc.
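
    A toy inverted index conveys the indexing/searching idea mentioned above (it is not the DRUMS implementation); the gene-variant descriptions are invented.

    ```python
    # Toy inverted index over gene-variant descriptions (documents are invented).
    # This only illustrates the indexing/searching idea, not the DRUMS internals.
    from collections import defaultdict

    documents = {
        "BRCA1:c.68_69delAG": "frameshift variant associated with breast cancer",
        "CFTR:p.Phe508del": "deletion variant causing cystic fibrosis",
        "HBB:c.20A>T": "missense variant associated with sickle cell disease",
    }

    index = defaultdict(set)
    for doc_id, text in documents.items():
        for token in text.lower().split():
            index[token].add(doc_id)

    def search(query):
        """Return documents containing every query token (simple AND semantics)."""
        tokens = query.lower().split()
        if not tokens:
            return set()
        hits = index.get(tokens[0], set()).copy()
        for token in tokens[1:]:
            hits &= index.get(token, set())
        return hits

    print(search("variant associated"))  # -> the BRCA1 and HBB entries
    ```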

  8. NRF2-ome: an integrated web resource to discover protein interaction and regulatory networks of NRF2.

    PubMed

    Türei, Dénes; Papp, Diána; Fazekas, Dávid; Földvári-Nagy, László; Módos, Dezső; Lenti, Katalin; Csermely, Péter; Korcsmáros, Tamás

    2013-01-01

    NRF2 is the master transcriptional regulator of oxidative and xenobiotic stress responses. NRF2 has important roles in carcinogenesis, inflammation, and neurodegenerative diseases. We developed an online resource, NRF2-ome, to provide an integrated and systems-level database for NRF2. The database contains manually curated and predicted interactions of NRF2 as well as data from external interaction databases. We integrated the NRF2 interactome with NRF2 target genes, NRF2-regulating transcription factors, and miRNAs. We connected NRF2-ome to signaling pathways to allow mapping of upstream regulatory components that could directly or indirectly influence NRF2 activity, totaling 35,967 protein-protein and signaling interactions. The user-friendly website allows researchers without a computational background to search, browse, and download the database. The database can be downloaded in SQL, CSV, BioPAX, SBML, and PSI-MI formats and as a Cytoscape CYS file. We illustrated the applicability of the website by suggesting a posttranscriptional negative feedback of NRF2 by the MAFG protein and by raising the possibility of a connection between NRF2 and the JAK/STAT pathway through STAT1 and STAT3. NRF2-ome can also be used as an evaluation tool to help researchers and drug developers understand the hidden regulatory mechanisms in the complex network of NRF2.
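
    To suggest how such an export could be used downstream, the sketch below loads a hypothetical CSV edge list into a directed graph and lists the immediate neighbours of NRF2; the file name and the 'source', 'target', and 'interaction_type' columns are assumptions, not the documented NRF2-ome export schema.

    ```python
    import csv
    import networkx as nx

    def load_interactions(path):
        """Build a directed graph from a CSV export with assumed columns
        'source', 'target' and 'interaction_type'."""
        g = nx.DiGraph()
        with open(path, newline="") as fh:
            for row in csv.DictReader(fh):
                g.add_edge(row["source"], row["target"],
                           interaction=row.get("interaction_type", "unknown"))
        return g

    if __name__ == "__main__":
        g = load_interactions("nrf2ome_interactions.csv")  # hypothetical file name
        nrf2 = "NFE2L2"   # gene symbol for NRF2
        print("regulators of NRF2:", sorted(g.predecessors(nrf2)))
        print("targets of NRF2:", sorted(g.successors(nrf2)))
    ```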

  9. Brassica database (BRAD) version 2.0: integrating and mining Brassicaceae species genomic resources.

    PubMed

    Wang, Xiaobo; Wu, Jian; Liang, Jianli; Cheng, Feng; Wang, Xiaowu

    2015-01-01

    The Brassica database (BRAD) was built initially to assist users in applying Brassica rapa and Arabidopsis thaliana genomic data efficiently to their research. However, many Brassicaceae genomes have been sequenced and released since its construction. These genomes are rich resources for comparative genomics, gene annotation and functional evolutionary studies of Brassica crops. Therefore, we have updated BRAD to version 2.0 (V2.0). In BRAD V2.0, 11 more Brassicaceae genomes have been integrated into the database, namely those of Arabidopsis lyrata, Aethionema arabicum, Brassica oleracea, Brassica napus, Camelina sativa, Capsella rubella, Leavenworthia alabamica, Sisymbrium irio and three extremophiles Schrenkiella parvula, Thellungiella halophila and Thellungiella salsuginea. BRAD V2.0 provides plots of syntenic genomic fragments between pairs of Brassicaceae species, from the level of chromosomes to genomic blocks. The Generic Synteny Browser (GBrowse_syn), a module of the Genome Browser (GBrowse), is used to show syntenic relationships between multiple genomes. Search functions for retrieving syntenic and non-syntenic orthologs, as well as their annotation and sequences, are also provided. Furthermore, genome and annotation information have been imported into GBrowse so that all functional elements can be visualized in one frame. We plan to continually update BRAD by integrating more Brassicaceae genomes into the database. Database URL: http://brassicadb.org/brad/. © The Author(s) 2015. Published by Oxford University Press.

  10. EcoliWiki: a wiki-based community resource for Escherichia coli

    PubMed Central

    McIntosh, Brenley K.; Renfro, Daniel P.; Knapp, Gwendowlyn S.; Lairikyengbam, Chanchala R.; Liles, Nathan M.; Niu, Lili; Supak, Amanda M.; Venkatraman, Anand; Zweifel, Adrienne E.; Siegele, Deborah A.; Hu, James C.

    2012-01-01

    EcoliWiki is the community annotation component of the PortEco (http://porteco.org; formerly EcoliHub) project, an online data resource that integrates information on laboratory strains of Escherichia coli, its phages, plasmids and mobile genetic elements. As one of the early adopters of the wiki approach to model organism databases, EcoliWiki was designed to not only facilitate community-driven sharing of biological knowledge about E. coli as a model organism, but also to be interoperable with other data resources. EcoliWiki content currently covers genes from five laboratory E. coli strains, 21 bacteriophage genomes, F plasmid and eight transposons. EcoliWiki integrates the Mediawiki wiki platform with other open-source software tools and in-house software development to extend how wikis can be used for model organism databases. EcoliWiki can be accessed online at http://ecoliwiki.net. PMID:22064863

  11. Second-Tier Database for Ecosystem Focus, 2003-2004 Annual Report.

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    University of Washington, Columbia Basin Research, DART Project Staff,

    2004-12-01

    The Second-Tier Database for Ecosystem Focus (Contract 00004124) provides direct and timely public access to Columbia Basin environmental, operational, fishery and riverine data resources for federal, state, public and private entities essential to sound operational and resource management. The database also assists with juvenile and adult mainstem passage modeling supporting federal decisions affecting the operation of the FCRPS. The Second-Tier Database known as Data Access in Real Time (DART) integrates public data for effective access, consideration and application. DART also provides analysis tools and performance measures for evaluating the condition of Columbia Basin salmonid stocks. These services are critical to BPA's implementation of its fish and wildlife responsibilities under the Endangered Species Act (ESA).

  12. Development and implementation of an Integrated Water Resources Management System (IWRMS)

    NASA Astrophysics Data System (ADS)

    Flügel, W.-A.; Busch, C.

    2011-04-01

    One of the innovative objectives in the EC project BRAHMATWINN was the development of a stakeholder-oriented Integrated Water Resources Management System (IWRMS). The toolset integrates the findings of the project and presents them in a user-friendly way for decision support in sustainable integrated water resources management (IWRM) in river basins. IWRMS is a framework which integrates different types of basin information and which supports the development of IWRM options for climate change mitigation. It is based on the River Basin Information System (RBIS) data models and delivers a graphical user interface for stakeholders. A special interface was developed for the integration of the enhanced DANUBIA model input and the NetSyMod model with its Mulino decision support system (mulino mDss) component. The web-based IWRMS contains and combines different types of data and methods to provide river basin data and information for decision support. IWRMS is based on a three-tier software framework which uses (i) html/javascript at the client tier, (ii) the PHP programming language to realize the application tier, and (iii) a postgresql/postgis database tier to manage and store all data, except the DANUBIA modelling raw data, which are file-based and registered in the database tier. All three tiers can reside on one or different computers and are adapted to the local hardware infrastructure. IWRMS as well as RBIS are based on Open Source Software (OSS) components, and flexible and time-saving access to the database is guaranteed by web-based interfaces for data visualization and retrieval. The IWRMS is accessible via the BRAHMATWINN homepage: http://www.brahmatwinn.uni-jena.de and a user manual for the RBIS is available for download as well.
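
    The database tier described above is PostgreSQL/PostGIS. Purely as an illustration of the kind of spatial query that tier could serve, here is a Python sketch; the connection string, table, and column names are invented rather than taken from the actual RBIS schema.

    ```python
    import json
    import psycopg2

    def basin_features(basin_name, dsn="dbname=rbis user=rbis_reader"):
        # Table and column names are hypothetical stand-ins for the RBIS schema.
        with psycopg2.connect(dsn) as conn:
            with conn.cursor() as cur:
                cur.execute(
                    """
                    SELECT name, ST_AsGeoJSON(geom)
                    FROM monitoring_stations       -- hypothetical table
                    WHERE basin = %s
                    """,
                    (basin_name,),
                )
                return [(name, json.loads(geojson)) for name, geojson in cur.fetchall()]

    if __name__ == "__main__":
        for name, geometry in basin_features("Upper Brahmaputra"):
            print(name, geometry["type"])
    ```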

  13. The Eukaryotic Pathogen Databases: a functional genomic resource integrating data from human and veterinary parasites.

    PubMed

    Harb, Omar S; Roos, David S

    2015-01-01

    Over the past 20 years, advances in high-throughput biological techniques and the availability of computational resources including fast Internet access have resulted in an explosion of large genome-scale data sets "big data." While such data are readily available for download and personal use and analysis from a variety of repositories, often such analysis requires access to seldom-available computational skills. As a result a number of databases have emerged to provide scientists with online tools enabling the interrogation of data without the need for sophisticated computational skills beyond basic knowledge of Internet browser utility. This chapter focuses on the Eukaryotic Pathogen Databases (EuPathDB: http://eupathdb.org) Bioinformatic Resource Center (BRC) and illustrates some of the available tools and methods.

  14. IDAAPM: integrated database of ADMET and adverse effects of predictive modeling based on FDA approved drug data.

    PubMed

    Legehar, Ashenafi; Xhaard, Henri; Ghemtio, Leo

    2016-01-01

    The disposition of a pharmaceutical compound within an organism, i.e. its Absorption, Distribution, Metabolism, Excretion, Toxicity (ADMET) properties and adverse effects, critically affects late stage failure of drug candidates and has led to the withdrawal of approved drugs. Computational methods are effective approaches to reduce the number of safety issues by analyzing possible links between chemical structures and ADMET or adverse effects, but this is limited by the size, quality, and heterogeneity of the data available from individual sources. Thus, large, clean and integrated databases of approved drug data, associated with fast and efficient predictive tools are desirable early in the drug discovery process. We have built a relational database (IDAAPM) to integrate available approved drug data such as drug approval information, ADMET and adverse effects, chemical structures and molecular descriptors, targets, bioactivity and related references. The database has been coupled with a searchable web interface and modern data analytics platform (KNIME) to allow data access, data transformation, initial analysis and further predictive modeling. Data were extracted from FDA resources and supplemented from other publicly available databases. Currently, the database contains information regarding about 19,226 FDA approval applications for 31,815 products (small molecules and biologics) with their approval history, 2505 active ingredients, together with as many ADMET properties, 1629 molecular structures, 2.5 million adverse effects and 36,963 experimental drug-target bioactivity data. IDAAPM is a unique resource that, in a single relational database, provides detailed information on FDA approved drugs including their ADMET properties and adverse effects, the corresponding targets with bioactivity data, coupled with a data analytics platform. It can be used to perform basic to complex drug-target ADMET or adverse effects analysis and predictive modeling. IDAAPM is freely accessible at http://idaapm.helsinki.fi and can be exploited through a KNIME workflow connected to the database. Graphical abstract: FDA approved drug data integration for predictive modeling.
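
    As a sketch of the kind of question such a relational layout supports, the query below joins a drug to its adverse effects and target bioactivities; the tables and columns are invented for illustration and are not IDAAPM's actual schema.

    ```python
    import sqlite3

    # Invented schema for illustration; IDAAPM's real tables will differ.
    QUERY = """
    SELECT d.name, ae.effect_term, t.target_name, b.activity_nm
    FROM drugs d
    JOIN adverse_effects ae ON ae.drug_id = d.id
    JOIN bioactivities   b  ON b.drug_id  = d.id
    JOIN targets         t  ON t.id       = b.target_id
    WHERE d.name = ?
    """

    def drug_profile(db_path, drug_name):
        with sqlite3.connect(db_path) as conn:
            return conn.execute(QUERY, (drug_name,)).fetchall()

    if __name__ == "__main__":
        for row in drug_profile("idaapm_local.db", "ibuprofen"):  # hypothetical local copy
            print(row)
    ```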

  15. The NIF DISCO Framework: facilitating automated integration of neuroscience content on the web.

    PubMed

    Marenco, Luis; Wang, Rixin; Shepherd, Gordon M; Miller, Perry L

    2010-06-01

    This paper describes the capabilities of DISCO, an extensible approach that supports integrative Web-based information dissemination. DISCO is a component of the Neuroscience Information Framework (NIF), an NIH Neuroscience Blueprint initiative that facilitates integrated access to diverse neuroscience resources via the Internet. DISCO facilitates the automated maintenance of several distinct capabilities using a collection of files 1) that are maintained locally by the developers of participating neuroscience resources and 2) that are "harvested" on a regular basis by a central DISCO server. This approach allows central NIF capabilities to be updated as each resource's content changes over time. DISCO currently supports the following capabilities: 1) resource descriptions, 2) "LinkOut" to a resource's data items from NCBI Entrez resources such as PubMed, 3) Web-based interoperation with a resource, 4) sharing a resource's lexicon and ontology, 5) sharing a resource's database schema, and 6) participation by the resource in neuroscience-related RSS news dissemination. The developers of a resource are free to choose which DISCO capabilities their resource will participate in. Although DISCO is used by NIF to facilitate neuroscience data integration, its capabilities have general applicability to other areas of research.
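
    The harvesting pattern described here (resource-maintained description files pulled periodically by a central server) can be sketched in a few lines of Python; the registry entry, file format, and scheduling comment below are placeholders rather than NIF's actual conventions.

    ```python
    import json
    import urllib.request

    # Hypothetical registry: resource name -> URL of its locally maintained
    # description file. The real NIF/DISCO registry and file format differ.
    REGISTERED_RESOURCES = {
        "ExampleNeuroDB": "https://example.org/nif/disco_description.json",
    }

    def harvest_once(registry):
        """Pull each resource's self-description; record failures instead of stopping."""
        snapshot = {}
        for name, url in registry.items():
            try:
                with urllib.request.urlopen(url, timeout=10) as resp:
                    snapshot[name] = json.load(resp)
            except OSError as exc:
                snapshot[name] = {"error": str(exc)}
        return snapshot

    if __name__ == "__main__":
        # In production this would run on a schedule (e.g. a daily cron job).
        print(json.dumps(harvest_once(REGISTERED_RESOURCES), indent=2))
    ```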

  16. Establishment of the Northeast Coastal Watershed Geospatial Data Network (NECWGDN)

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Hannigan, Robyn

    The goals of NECWGDN were to establish integrated geospatial databases that interfaced with existing open-source environmental data server technologies (e.g., HydroDesktop) and included ecological and human data to enable evaluation, prediction, and adaptation in coastal environments to climate- and human-induced threats to the coastal marine resources within the Gulf of Maine. We have completed the development and testing of a "test bed" architecture that is compatible with HydroDesktop and have identified key metadata structures that will enable seamless integration and delivery of environmental, ecological, and human data as well as models to predict threats to end-users. Uniquely, this database integrates point as well as model data and so offers capacities to end-users that are unique among databases. Future efforts will focus on the development of integrated environmental-human dimension models that can serve, in near real time, visualizations of threats to coastal resources and habitats.

  17. EuPathDB: the eukaryotic pathogen genomics database resource

    PubMed Central

    Aurrecoechea, Cristina; Barreto, Ana; Basenko, Evelina Y.; Brestelli, John; Brunk, Brian P.; Cade, Shon; Crouch, Kathryn; Doherty, Ryan; Falke, Dave; Fischer, Steve; Gajria, Bindu; Harb, Omar S.; Heiges, Mark; Hertz-Fowler, Christiane; Hu, Sufen; Iodice, John; Kissinger, Jessica C.; Lawrence, Cris; Li, Wei; Pinney, Deborah F.; Pulman, Jane A.; Roos, David S.; Shanmugasundram, Achchuthan; Silva-Franco, Fatima; Steinbiss, Sascha; Stoeckert, Christian J.; Spruill, Drew; Wang, Haiming; Warrenfeltz, Susanne; Zheng, Jie

    2017-01-01

    The Eukaryotic Pathogen Genomics Database Resource (EuPathDB, http://eupathdb.org) is a collection of databases covering 170+ eukaryotic pathogens (protists & fungi), along with relevant free-living and non-pathogenic species, and select pathogen hosts. To facilitate the discovery of meaningful biological relationships, the databases couple preconfigured searches with visualization and analysis tools for comprehensive data mining via intuitive graphical interfaces and APIs. All data are analyzed with the same workflows, including creation of gene orthology profiles, so data are easily compared across data sets, data types and organisms. EuPathDB is updated with numerous new analysis tools, features, data sets and data types. New tools include GO, metabolic pathway and word enrichment analyses plus an online workspace for analysis of personal, non-public, large-scale data. Expanded data content is mostly genomic and functional genomic data while new data types include protein microarray, metabolic pathways, compounds, quantitative proteomics, copy number variation, and polysomal transcriptomics. New features include consistent categorization of searches, data sets and genome browser tracks; redesigned gene pages; effective integration of alternative transcripts; and a EuPathDB Galaxy instance for private analyses of a user's data. Forthcoming upgrades include user workspaces for private integration of data with existing EuPathDB data and improved integration and presentation of host–pathogen interactions. PMID:27903906

  18. A dedicated database system for handling multi-level data in systems biology.

    PubMed

    Pornputtapong, Natapol; Wanichthanarak, Kwanjeera; Nilsson, Avlant; Nookaew, Intawat; Nielsen, Jens

    2014-01-01

    Advances in high-throughput technologies have enabled extensive generation of multi-level omics data. These data are crucial for systems biology research, though they are complex, heterogeneous, highly dynamic, incomplete and distributed among public databases. This leads to difficulties in data accessibility and often results in errors when data are merged and integrated from varied resources. Therefore, integration and management of systems biological data remain very challenging. To overcome this, we designed and developed a dedicated database system that can serve and solve the vital issues in data management and thereby facilitate data integration, modeling and analysis in systems biology within a sole database. In addition, a yeast data repository was implemented as an integrated database environment which is operated by the database system. Two applications were implemented to demonstrate extensibility and utilization of the system. Both illustrate how the user can access the database via the web query function and implemented scripts. These scripts are specific for two sample cases: 1) Detecting the pheromone pathway in protein interaction networks; and 2) Finding metabolic reactions regulated by Snf1 kinase. In this study we present the design of a database system that offers an extensible environment to efficiently capture the majority of biological entities and relations encountered in systems biology. Critical functions and control processes were designed and implemented to ensure consistent, efficient, secure and reliable transactions. The two sample cases on the yeast integrated data clearly demonstrate the value of a sole database environment for systems biology research.

  19. Groundwater modeling in integrated water resources management--visions for 2020.

    PubMed

    Refsgaard, Jens Christian; Højberg, Anker Lajer; Møller, Ingelise; Hansen, Martin; Søndergaard, Verner

    2010-01-01

    Groundwater modeling is undergoing a change from traditional stand-alone studies toward being an integrated part of holistic water resources management procedures. This is illustrated by the development in Denmark, where comprehensive national databases for geologic borehole data, groundwater-related geophysical data, geologic models, as well as a national groundwater-surface water model have been established and integrated to support water management. This has enhanced the benefits of using groundwater models. Based on insight gained from this Danish experience, a scientifically realistic scenario for the use of groundwater modeling in 2020 has been developed, in which groundwater models will be a part of sophisticated databases and modeling systems. The databases and numerical models will be seamlessly integrated, and the tasks of monitoring and modeling will be merged. Numerical models for atmospheric, surface water, and groundwater processes will be coupled in one integrated modeling system that can operate at a wide range of spatial scales. Furthermore, the management systems will be constructed with a focus on building credibility of model and data use among all stakeholders and on facilitating a learning process whereby data and models, as well as stakeholders' understanding of the system, are updated to currently available information. The key scientific challenges for achieving this are (1) developing new methodologies for integration of statistical and qualitative uncertainty; (2) mapping geological heterogeneity and developing scaling methodologies; (3) developing coupled model codes; and (4) developing integrated information systems, including quality assurance and uncertainty information that facilitate active stakeholder involvement and learning.

  20. Spending Patterns of Metropolitan Universities: A Longitudinal Study.

    ERIC Educational Resources Information Center

    Schuh, John H.

    2002-01-01

    Using data from the Integrated Post-secondary Education Data System relational database, assessed the financial resources and expenditure patterns of members of the Coalition of Urban and Metropolitan Universities and compared them to non-metropolitan peers. Verified that metropolitan institutions have fewer resources to work with than others in…

  1. USGS national surveys and analysis projects: Preliminary compilation of integrated geological datasets for the United States

    USGS Publications Warehouse

    Nicholson, Suzanne W.; Stoeser, Douglas B.; Wilson, Frederic H.; Dicken, Connie L.; Ludington, Steve

    2007-01-01

    The growth in the use of Geographic Information Systems (GIS) has highlighted the need for regional and national digital geologic maps attributed with age and rock type information. Such spatial data can be conveniently used to generate derivative maps for purposes that include mineral-resource assessment, metallogenic studies, tectonic studies, human health and environmental research. In 1997, the United States Geological Survey’s Mineral Resources Program initiated an effort to develop national digital databases for use in mineral resource and environmental assessments. One primary activity of this effort was to compile a national digital geologic map database, utilizing state geologic maps, to support mineral resource studies in the range of 1:250,000- to 1:1,000,000-scale. Over the course of the past decade, state databases were prepared using a common standard for the database structure, fields, attributes, and data dictionaries. As of late 2006, standardized geological map databases for all conterminous (CONUS) states have been available on-line as USGS Open-File Reports. For Alaska and Hawaii, new state maps are being prepared, and the preliminary work for Alaska is being released as a series of 1:500,000-scale regional compilations. See below for a list of all published databases.

  2. Sustainable funding for biocuration: The Arabidopsis Information Resource (TAIR) as a case study of a subscription-based funding model.

    PubMed

    Reiser, Leonore; Berardini, Tanya Z; Li, Donghui; Muller, Robert; Strait, Emily M; Li, Qian; Mezheritsky, Yarik; Vetushko, Andrey; Huala, Eva

    2016-01-01

    Databases and data repositories provide essential functions for the research community by integrating, curating, archiving and otherwise packaging data to facilitate discovery and reuse. Despite their importance, funding for maintenance of these resources is increasingly hard to obtain. Fueled by a desire to find long-term, sustainable solutions to database funding, staff from the Arabidopsis Information Resource (TAIR) founded the nonprofit organization Phoenix Bioinformatics, using TAIR as a test case for user-based funding. Subscription-based funding has been proposed as an alternative to grant funding but its application has been very limited within the nonprofit sector. Our testing of this model indicates that it is a viable option, at least for some databases, and that it is possible to strike a balance that maximizes access while still incentivizing subscriptions. One year after transitioning to subscription support, TAIR is self-sustaining and Phoenix is poised to expand and support additional resources that wish to incorporate user-based funding strategies. Database URL: www.arabidopsis.org. © The Author(s) 2016. Published by Oxford University Press.

  3. Sustainable funding for biocuration: The Arabidopsis Information Resource (TAIR) as a case study of a subscription-based funding model

    PubMed Central

    Berardini, Tanya Z.; Li, Donghui; Muller, Robert; Strait, Emily M.; Li, Qian; Mezheritsky, Yarik; Vetushko, Andrey; Huala, Eva

    2016-01-01

    Databases and data repositories provide essential functions for the research community by integrating, curating, archiving and otherwise packaging data to facilitate discovery and reuse. Despite their importance, funding for maintenance of these resources is increasingly hard to obtain. Fueled by a desire to find long-term, sustainable solutions to database funding, staff from the Arabidopsis Information Resource (TAIR) founded the nonprofit organization Phoenix Bioinformatics, using TAIR as a test case for user-based funding. Subscription-based funding has been proposed as an alternative to grant funding but its application has been very limited within the nonprofit sector. Our testing of this model indicates that it is a viable option, at least for some databases, and that it is possible to strike a balance that maximizes access while still incentivizing subscriptions. One year after transitioning to subscription support, TAIR is self-sustaining and Phoenix is poised to expand and support additional resources that wish to incorporate user-based funding strategies. Database URL: www.arabidopsis.org PMID:26989150

  4. A semantic problem solving environment for integrative parasite research: identification of intervention targets for Trypanosoma cruzi.

    PubMed

    Parikh, Priti P; Minning, Todd A; Nguyen, Vinh; Lalithsena, Sarasi; Asiaee, Amir H; Sahoo, Satya S; Doshi, Prashant; Tarleton, Rick; Sheth, Amit P

    2012-01-01

    Research on the biology of parasites requires a sophisticated and integrated computational platform to query and analyze large volumes of data, representing both unpublished (internal) and public (external) data sources. Effective analysis of an integrated data resource using knowledge discovery tools would significantly aid biologists in conducting their research, for example, through identifying various intervention targets in parasites and in deciding the future direction of ongoing as well as planned projects. A key challenge in achieving this objective is the heterogeneity between the internal lab data, usually stored as flat files, Excel spreadsheets or custom-built databases, and the external databases. Reconciling the different forms of heterogeneity and effectively integrating data from disparate sources is a nontrivial task for biologists and requires a dedicated informatics infrastructure. Thus, we developed an integrated environment using Semantic Web technologies that may provide biologists the tools for managing and analyzing their data, without the need for acquiring in-depth computer science knowledge. We developed a semantic problem-solving environment (SPSE) that uses ontologies to integrate internal lab data with external resources in a Parasite Knowledge Base (PKB), which has the ability to query across these resources in a unified manner. The SPSE includes Web Ontology Language (OWL)-based ontologies, experimental data with its provenance information represented using the Resource Description Framework (RDF), and a visual querying tool, Cuebee, that features integrated use of Web services. We demonstrate the use and benefit of SPSE using example queries for identifying gene knockout targets of Trypanosoma cruzi for vaccine development. Answers to these queries involve looking up multiple sources of data, linking them together and presenting the results. The SPSE facilitates parasitologists in leveraging the growing, but disparate, parasite data resources by offering an integrative platform that utilizes Semantic Web techniques, while keeping their workload increase minimal.
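
    Because the environment is built on RDF and OWL, the flavour of query it supports can be hinted at with a short SPARQL example issued through rdflib; the data file, namespace, and predicates below are invented placeholders, not the actual Parasite Knowledge Base vocabulary.

    ```python
    from rdflib import Graph

    # Invented namespace, predicates, and data file; the real PKB ontologies
    # and the Cuebee query interface are not reproduced here.
    SPARQL = """
    PREFIX ex: <http://example.org/parasite#>
    SELECT ?gene ?stage
    WHERE {
        ?gene ex:expressedIn ?stage .
        ?gene ex:hasKnockoutPhenotype ex:AttenuatedGrowth .
    }
    """

    g = Graph()
    g.parse("pkb_sample.ttl", format="turtle")   # hypothetical local RDF export
    for gene, stage in g.query(SPARQL):
        print(gene, stage)
    ```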

  5. Preliminary Integrated Geologic Map Databases for the United States: Connecticut, Maine, Massachusetts, New Hampshire, New Jersey, Rhode Island and Vermont

    USGS Publications Warehouse

    Nicholson, Suzanne W.; Dicken, Connie L.; Horton, John D.; Foose, Michael P.; Mueller, Julia A.L.; Hon, Rudi

    2006-01-01

    The rapid growth in the use of Geographic Information Systems (GIS) has highlighted the need for regional and national scale digital geologic maps that have standardized information about geologic age and lithology. Such maps can be conveniently used to generate derivative maps for manifold special purposes such as mineral-resource assessment, metallogenic studies, tectonic studies, and environmental research. Although two digital geologic maps (Schruben and others, 1994; Reed and Bush, 2004) of the United States currently exist, their scales (1:2,500,000 and 1:5,000,000) are too general for many regional applications. Most states have digital geologic maps at scales of about 1:500,000, but the databases are not comparably structured and, thus, it is difficult to use the digital database for more than one state at a time. This report describes the result for a seven state region of an effort by the U.S. Geological Survey to produce a series of integrated and standardized state geologic map databases that cover the entire United States. In 1997, the United States Geological Survey's Mineral Resources Program initiated the National Surveys and Analysis (NSA) Project to develop national digital databases. One primary activity of this project was to compile a national digital geologic map database, utilizing state geologic maps, to support studies in the range of 1:250,000- to 1:1,000,000-scale. To accomplish this, state databases were prepared using a common standard for the database structure, fields, attribution, and data dictionaries. For Alaska and Hawaii new state maps are being prepared and the preliminary work for Alaska is being released as a series of 1:250,000 scale quadrangle reports. This document provides background information and documentation for the integrated geologic map databases of this report. This report is one of a series of such reports releasing preliminary standardized geologic map databases for the United States. The data products of the project consist of two main parts, the spatial databases and a set of supplemental tables relating to geologic map units. The datasets serve as a data resource to generate a variety of stratigraphic, age, and lithologic maps. This documentation is divided into four main sections: (1) description of the set of data files provided in this report, (2) specifications of the spatial databases, (3) specifications of the supplemental tables, and (4) an appendix containing the data dictionaries used to populate some fields of the spatial database and supplemental tables.

  6. Semantic SenseLab: implementing the vision of the Semantic Web in neuroscience

    PubMed Central

    Samwald, Matthias; Chen, Huajun; Ruttenberg, Alan; Lim, Ernest; Marenco, Luis; Miller, Perry; Shepherd, Gordon; Cheung, Kei-Hoi

    2011-01-01

    Objective: Integrative neuroscience research needs a scalable informatics framework that enables semantic integration of diverse types of neuroscience data. This paper describes the use of the Web Ontology Language (OWL) and other Semantic Web technologies for the representation and integration of molecular-level data provided by several members of the SenseLab suite of neuroscience databases. Methods: Based on the original database structure, we semi-automatically translated the databases into OWL ontologies with manual addition of semantic enrichment. The SenseLab ontologies are extensively linked to other biomedical Semantic Web resources, including the Subcellular Anatomy Ontology, Brain Architecture Management System, the Gene Ontology, BIRNLex and UniProt. The SenseLab ontologies have also been mapped to the Basic Formal Ontology and Relation Ontology, which helps ease interoperability with many other existing and future biomedical ontologies for the Semantic Web. In addition, approaches to representing contradictory research statements are described. The SenseLab ontologies are designed for use on the Semantic Web, which enables their integration into a growing collection of biomedical information resources. Conclusion: We demonstrate that our approach can yield significant potential benefits and that the Semantic Web is rapidly becoming mature enough to realize its anticipated promises. The ontologies are available online at http://neuroweb.med.yale.edu/senselab/ PMID:20006477

  7. BIOZON: a system for unification, management and analysis of heterogeneous biological data.

    PubMed

    Birkland, Aaron; Yona, Golan

    2006-02-15

    Integration of heterogeneous data types is a challenging problem, especially in biology, where the number of databases and data types increase rapidly. Amongst the problems that one has to face are integrity, consistency, redundancy, connectivity, expressiveness and updatability. Here we present a system (Biozon) that addresses these problems, and offers biologists a new knowledge resource to navigate through and explore. Biozon unifies multiple biological databases consisting of a variety of data types (such as DNA sequences, proteins, interactions and cellular pathways). It is fundamentally different from previous efforts as it uses a single extensive and tightly connected graph schema wrapped with hierarchical ontology of documents and relations. Beyond warehousing existing data, Biozon computes and stores novel derived data, such as similarity relationships and functional predictions. The integration of similarity data allows propagation of knowledge through inference and fuzzy searches. Sophisticated methods of query that span multiple data types were implemented and first-of-a-kind biological ranking systems were explored and integrated. The Biozon system is an extensive knowledge resource of heterogeneous biological data. Currently, it holds more than 100 million biological documents and 6.5 billion relations between them. The database is accessible through an advanced web interface that supports complex queries, "fuzzy" searches, data materialization and more, online at http://biozon.org.
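
    One idea highlighted above is propagating knowledge over stored similarity relations. A toy sketch of that idea follows; the node names, weights, threshold, and single-pass rule are all invented and far simpler than Biozon's actual inference.

    ```python
    import networkx as nx

    # Toy document graph: proteins linked by precomputed similarity relations.
    g = nx.Graph()
    g.add_edge("proteinA", "proteinB", similarity=0.9)
    g.add_edge("proteinB", "proteinC", similarity=0.4)
    seed_annotations = {"proteinA": {"kinase"}}

    def propagate(graph, seeds, threshold=0.5):
        """Single-pass toy rule: copy annotations across edges above the threshold."""
        inferred = {node: set(terms) for node, terms in seeds.items()}
        for u, v, data in graph.edges(data=True):
            if data["similarity"] >= threshold:
                for a, b in ((u, v), (v, u)):
                    if a in inferred:
                        inferred.setdefault(b, set()).update(inferred[a])
        return inferred

    # proteinB inherits "kinase"; proteinC does not, because its edge is too weak.
    print(propagate(g, seed_annotations))
    ```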

  8. Semantic SenseLab: Implementing the vision of the Semantic Web in neuroscience.

    PubMed

    Samwald, Matthias; Chen, Huajun; Ruttenberg, Alan; Lim, Ernest; Marenco, Luis; Miller, Perry; Shepherd, Gordon; Cheung, Kei-Hoi

    2010-01-01

    Integrative neuroscience research needs a scalable informatics framework that enables semantic integration of diverse types of neuroscience data. This paper describes the use of the Web Ontology Language (OWL) and other Semantic Web technologies for the representation and integration of molecular-level data provided by several members of the SenseLab suite of neuroscience databases. Based on the original database structure, we semi-automatically translated the databases into OWL ontologies with manual addition of semantic enrichment. The SenseLab ontologies are extensively linked to other biomedical Semantic Web resources, including the Subcellular Anatomy Ontology, Brain Architecture Management System, the Gene Ontology, BIRNLex and UniProt. The SenseLab ontologies have also been mapped to the Basic Formal Ontology and Relation Ontology, which helps ease interoperability with many other existing and future biomedical ontologies for the Semantic Web. In addition, approaches to representing contradictory research statements are described. The SenseLab ontologies are designed for use on the Semantic Web, which enables their integration into a growing collection of biomedical information resources. We demonstrate that our approach can yield significant potential benefits and that the Semantic Web is rapidly becoming mature enough to realize its anticipated promises. The ontologies are available online at http://neuroweb.med.yale.edu/senselab/. 2009 Elsevier B.V. All rights reserved.

  9. RNAcentral: A comprehensive database of non-coding RNA sequences

    DOE PAGES

    Williams, Kelly Porter; Lau, Britney Yan

    2016-10-28

    RNAcentral is a database of non-coding RNA (ncRNA) sequences that aggregates data from specialised ncRNA resources and provides a single entry point for accessing ncRNA sequences of all ncRNA types from all organisms. Since its launch in 2014, RNAcentral has integrated twelve new resources, taking the total number of collaborating databases to 22, and has begun importing new types of data, such as modified nucleotides from MODOMICS and PDB. We created new species-specific identifiers that refer to unique RNA sequences within the context of a single species. Furthermore, the website has been subject to continuous improvements focusing on text and sequence similarity searches as well as genome browsing functionality.
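
    The species-specific identifiers mentioned above pair a sequence accession with an NCBI taxonomy ID. Assuming the 'URS' accession followed by an underscore and the taxid (the example identifier below is made up), a small parser might look like this.

    ```python
    import re

    # Assumes the 'URS' accession followed by '_<NCBI taxid>' pattern; the
    # example identifier below is made up.
    SPECIES_ID = re.compile(r"^(URS[0-9A-F]{10})_(\d+)$")

    def split_species_id(identifier):
        match = SPECIES_ID.match(identifier)
        if not match:
            raise ValueError(f"not a species-specific identifier: {identifier}")
        accession, taxid = match.groups()
        return accession, int(taxid)

    print(split_species_id("URS00000000A1_9606"))  # ('URS00000000A1', 9606)
    ```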

  10. RNAcentral: A comprehensive database of non-coding RNA sequences

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Williams, Kelly Porter; Lau, Britney Yan

    RNAcentral is a database of non-coding RNA (ncRNA) sequences that aggregates data from specialised ncRNA resources and provides a single entry point for accessing ncRNA sequences of all ncRNA types from all organisms. Since its launch in 2014, RNAcentral has integrated twelve new resources, taking the total number of collaborating databases to 22, and has begun importing new types of data, such as modified nucleotides from MODOMICS and PDB. We created new species-specific identifiers that refer to unique RNA sequences within the context of a single species. Furthermore, the website has been subject to continuous improvements focusing on text and sequence similarity searches as well as genome browsing functionality.

  11. Second-Tier Database for Ecosystem Focus, 2002-2003 Annual Report.

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Van Holmes, Chris; Muongchanh, Christine; Anderson, James J.

    2003-11-01

    The Second-Tier Database for Ecosystem Focus (Contract 00004124) provides direct and timely public access to Columbia Basin environmental, operational, fishery and riverine data resources for federal, state, public and private entities. The Second-Tier Database known as Data Access in Real Time (DART) integrates public data for effective access, consideration and application. DART also provides analysis tools and performance measures helpful in evaluating the condition of Columbia Basin salmonid stocks.

  12. Graph-Based Weakly-Supervised Methods for Information Extraction & Integration

    ERIC Educational Resources Information Center

    Talukdar, Partha Pratim

    2010-01-01

    The variety and complexity of potentially-related data resources available for querying--webpages, databases, data warehouses--has been growing ever more rapidly. There is a growing need to pose integrative queries "across" multiple such sources, exploiting foreign keys and other means of interlinking data to merge information from diverse…

  13. CHOmine: an integrated data warehouse for CHO systems biology and modeling.

    PubMed

    Gerstl, Matthias P; Hanscho, Michael; Ruckerbauer, David E; Zanghellini, Jürgen; Borth, Nicole

    2017-01-01

    The last decade has seen a surge in published genome-scale information for Chinese hamster ovary (CHO) cells, which are the main production vehicles for therapeutic proteins. While a single access point is available at www.CHOgenome.org, the primary data is distributed over several databases at different institutions. Currently, research is frequently hampered by a plethora of gene names and IDs that vary between published draft genomes and databases, making systems biology analyses cumbersome and elaborate. Here we present CHOmine, an integrative data warehouse connecting data from various databases and providing links to others. Furthermore, we introduce CHOmodel, a web-based resource that provides access to recently published CHO cell line-specific metabolic reconstructions. Both resources allow users to query CHO-relevant data and find interconnections between different types of data, and thus provide a simple, standardized entry point to the world of CHO systems biology. http://www.chogenome.org. © The Author(s) 2017. Published by Oxford University Press.

  14. Social Gerontology--Integrative and Territorial Aspects: A Citation Analysis of Subject Scatter and Database Coverage

    ERIC Educational Resources Information Center

    Lasda Bergman, Elaine M.

    2011-01-01

    To determine the mix of resources used in social gerontology research, a citation analysis was conducted. A representative sample of citations was selected from three prominent gerontology journals and information was added to determine subject scatter and database coverage for the cited materials. Results indicate that a significant portion of…

  15. RAID v2.0: an updated resource of RNA-associated interactions across organisms.

    PubMed

    Yi, Ying; Zhao, Yue; Li, Chunhua; Zhang, Lin; Huang, Huiying; Li, Yana; Liu, Lanlan; Hou, Ping; Cui, Tianyu; Tan, Puwen; Hu, Yongfei; Zhang, Ting; Huang, Yan; Li, Xiaobo; Yu, Jia; Wang, Dong

    2017-01-04

    With the development of biotechnologies and computational prediction algorithms, the number of experimentally determined and computationally predicted RNA-associated interactions has grown rapidly in recent years. However, diverse RNA-associated interactions are scattered over a wide variety of resources and organisms, and a fully comprehensive view of these interactions is still not available for any species. Hence, we have updated the RAID database to version 2.0 (RAID v2.0, www.rna-society.org/raid/) by integrating experimentally determined and computationally predicted interactions from manual literature curation and other database resources under one common framework. The new developments in RAID v2.0 include (i) an over 850-fold increase in RNA-associated interactions compared with the previous version; (ii) numerous integrated resources providing experimental or computational prediction evidence for each RNA-associated interaction; (iii) a reliability assessment for each RNA-associated interaction based on an integrative confidence score; and (iv) an increase of species coverage to 60. Consequently, RAID v2.0 recruits more than 5.27 million RNA-associated interactions, including more than 4 million RNA-RNA interactions and more than 1.2 million RNA-protein interactions, referring to nearly 130 000 RNA/protein symbols across 60 species. © The Author(s) 2016. Published by Oxford University Press on behalf of Nucleic Acids Research.

  16. Design research about coastal zone planning and management information system based on GIS and database technologies

    NASA Astrophysics Data System (ADS)

    Huang, Pei; Wu, Sangyun; Feng, Aiping; Guo, Yacheng

    2008-10-01

    Littoral areas, with their concentrated populations, abundant resources, developed industry and active economies, are bound to become the forward positions and supporting regions for marine exploitation. In the 21st century, coastal zones face pressures that include population growth and urbanization, sea-level rise and coastal erosion, shortage and deterioration of freshwater resources, and degradation of fishery resources. The resources of coastal zones should therefore be planned and used rationally to support the sustainable development of the economy and the environment. This paper proposes a design study for the construction of a coastal zone planning and management information system based on GIS and database technologies. With this system, coastal zone planning results can be queried and displayed conveniently through the system interface. It is concluded that the integrated application of GIS and database technologies provides a new, modern method for managing coastal zone resources and makes it possible to ensure their rational development and utilization, along with the sustainable development of the economy and environment.

  17. Autophagy Regulatory Network - a systems-level bioinformatics resource for studying the mechanism and regulation of autophagy.

    PubMed

    Türei, Dénes; Földvári-Nagy, László; Fazekas, Dávid; Módos, Dezső; Kubisch, János; Kadlecsik, Tamás; Demeter, Amanda; Lenti, Katalin; Csermely, Péter; Vellai, Tibor; Korcsmáros, Tamás

    2015-01-01

    Autophagy is a complex cellular process having multiple roles, depending on tissue, physiological, or pathological conditions. Major post-translational regulators of autophagy are well known; however, they have not yet been collected comprehensively. The precise and context-dependent regulation of autophagy necessitates additional regulators, including transcriptional and post-transcriptional components that are listed in various datasets. Prompted by the lack of systems-level autophagy-related information, we manually collected the literature and integrated external resources to gain a high-coverage autophagy database. We developed an online resource, Autophagy Regulatory Network (ARN; http://autophagy-regulation.org), to provide an integrated and systems-level database for autophagy research. ARN contains manually curated, imported, and predicted interactions of autophagy components (1,485 proteins with 4,013 interactions) in humans. We listed 413 transcription factors and 386 miRNAs that could regulate autophagy components or their protein regulators. We also connected the above-mentioned autophagy components and regulators with signaling pathways from the SignaLink 2 resource. The user-friendly website of ARN allows researchers without a computational background to search, browse, and download the database. The database can be downloaded in SQL, CSV, BioPAX, SBML, and PSI-MI formats and as a Cytoscape CYS file. ARN has the potential to facilitate the experimental validation of novel autophagy components and regulators. In addition, ARN helps the investigation of transcription factors, miRNAs and signaling pathways implicated in the control of the autophagic pathway. The list of such known and predicted regulators could be important in pharmacological attempts against cancer and neurodegenerative diseases.
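
    As one small example of working with such a download, the sketch below tallies transcription factor and miRNA regulators per target protein from a CSV export; the file name and column names are assumptions rather than ARN's documented export format.

    ```python
    import csv
    from collections import defaultdict

    def regulator_counts(path):
        """Tally TF and miRNA regulators per target from a CSV export.
        Column names ('regulator_type', 'target') are assumptions."""
        counts = defaultdict(lambda: {"TF": 0, "miRNA": 0})
        with open(path, newline="") as fh:
            for row in csv.DictReader(fh):
                kind = row["regulator_type"]
                if kind in ("TF", "miRNA"):
                    counts[row["target"]][kind] += 1
        return counts

    if __name__ == "__main__":
        for target, tally in sorted(regulator_counts("arn_regulators.csv").items())[:5]:
            print(target, tally)
    ```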

  18. Integrating In Silico Resources to Map a Signaling Network

    PubMed Central

    Liu, Hanqing; Beck, Tim N.; Golemis, Erica A.; Serebriiskii, Ilya G.

    2013-01-01

    The abundance of publicly available life science databases offers a wealth of information that can support interpretation of experimentally derived data and greatly enhance hypothesis generation. Protein interaction and functional networks are not simply new renditions of existing data: they provide the opportunity to gain insights into the specific physical and functional role a protein plays as part of the biological system. In this chapter, we describe different in silico tools that can quickly and conveniently retrieve data from existing data repositories and discuss how the available tools are best utilized for different purposes. While emphasizing protein-protein interaction databases (e.g., BioGrid and IntAct), we also introduce metasearch platforms such as STRING and GeneMANIA, pathway databases (e.g., BioCarta and Pathway Commons), text mining approaches (e.g., PubMed and Chilibot), and resources for drug-protein interactions, genetic information for model organisms and gene expression information based on microarray data mining. Furthermore, we provide a simple step-by-step protocol for building customized protein-protein interaction networks in Cytoscape, a powerful network assembly and visualization program, integrating data retrieved from these various databases. As we illustrate, generation of composite interaction networks enables investigators to extract significantly more information about a given biological system than utilization of a single database or sole reliance on primary literature. PMID:24233784
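
    To illustrate composite network assembly outside of Cytoscape itself, the sketch below merges two tab-separated edge lists exported from different databases and keeps only the interactions supported by both; the file names and the two-column layout are assumptions, since real BioGRID or IntAct exports carry many more columns.

    ```python
    import csv
    import networkx as nx

    def add_edges(graph, path, source_db):
        """Read a two-column (gene A, gene B) tab-separated edge list."""
        with open(path, newline="") as fh:
            for a, b in csv.reader(fh, delimiter="\t"):
                if graph.has_edge(a, b):
                    graph[a][b]["sources"].add(source_db)
                else:
                    graph.add_edge(a, b, sources={source_db})

    composite = nx.Graph()
    add_edges(composite, "biogrid_edges.tsv", "BioGRID")   # hypothetical exports
    add_edges(composite, "intact_edges.tsv", "IntAct")

    # Keep only interactions reported by both databases, then write an edge list
    # that Cytoscape can import as a network table.
    confirmed = [(a, b) for a, b, d in composite.edges(data=True) if len(d["sources"]) > 1]
    nx.write_edgelist(nx.Graph(confirmed), "composite_network.tsv", delimiter="\t", data=False)
    ```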

  19. Emissions & Generation Resource Integrated Database (eGRID), eGRID2002 (with years 1996 - 2000 data)

    EPA Pesticide Factsheets

    The Emissions & Generation Resource Integrated Database (eGRID) is a comprehensive source of data on the environmental characteristics of almost all electric power generated in the United States. These environmental characteristics include air emissions for nitrogen oxides, sulfur dioxide, carbon dioxide, methane, nitrous oxide, and mercury; emissions rates; net generation; resource mix; and many other attributes. eGRID2002 (years 1996 through 2000 data) contains 16 Excel spreadsheets and the Technical Support Document, as well as the eGRID Data Browser, User's Manual, and Readme file. Archived eGRID data can be viewed as spreadsheets or by using the eGRID Data Browser. The eGRID spreadsheets can be manipulated by data users and enable them to view all the data underlying eGRID. The eGRID Data Browser enables users to view key data using powerful search features. Note that the eGRID Data Browser will not run on a Mac-based machine without Windows emulation.
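
    Because the spreadsheets are intended to be manipulated directly, a short pandas sketch shows one way a state-level resource mix could be derived from a plant-level sheet; the file, sheet, and column names are placeholders, and the real workbook layout should be taken from the eGRID User's Manual.

    ```python
    # Placeholder file, sheet, and column names; check the eGRID documentation
    # for the real workbook layout before using this pattern.
    import pandas as pd

    plants = pd.read_excel("eGRID2002_plant.xls", sheet_name="PLNT00")

    # Annual net generation summed by state and fuel, then normalised within
    # each state to give a fractional resource mix.
    gen = plants.groupby(["STATE", "FUEL"])["NET_GEN_MWH"].sum()
    mix = gen / gen.groupby(level="STATE").transform("sum")
    print(mix.head())
    ```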

  20. Comprehensive, comprehensible, distributed and intelligent databases: current status.

    PubMed

    Frishman, D; Heumann, K; Lesk, A; Mewes, H W

    1998-01-01

    It is only a matter of time until a user will see not many but one integrated database of information for molecular biology. Is this true? Is it a good thing? Why will it happen? Where are we now? What developments are fostering and what developments are impeding progress towards this end? A list of WWW resources devoted to database issues in molecular biology is available at http://www.mips.biochem.mpg.de (contact: frishman@mips.biochem.mpg.de).

  1. Application GIS on university planning: building a spatial database aided spatial decision

    NASA Astrophysics Data System (ADS)

    Miao, Lei; Wu, Xiaofang; Wang, Kun; Nong, Yu

    2007-06-01

    As universities develop and grow in size, their many kinds of resources urgently need effective management. A spatial database is the right tool to support administrators' spatial decisions, and, by integrating with existing OMS, it also prepares the ground for a digital campus. Campus planning is first examined in detail; then, taking South China Agricultural University as an example, the paper demonstrates how to build a geographic database of campus buildings and housing to support university administrators' spatial decisions.

  2. The NIF DISCO Framework: Facilitating Automated Integration of Neuroscience Content on the Web

    PubMed Central

    Marenco, Luis; Wang, Rixin; Shepherd, Gordon M.; Miller, Perry L.

    2013-01-01

    This paper describes the capabilities of DISCO, an extensible approach that supports integrative Web-based information dissemination. DISCO is a component of the Neuroscience Information Framework (NIF), an NIH Neuroscience Blueprint initiative that facilitates integrated access to diverse neuroscience resources via the Internet. DISCO facilitates the automated maintenance of several distinct capabilities using a collection of files 1) that are maintained locally by the developers of participating neuroscience resources and 2) that are “harvested” on a regular basis by a central DISCO server. This approach allows central NIF capabilities to be updated as each resource’s content changes over time. DISCO currently supports the following capabilities: 1) resource descriptions, 2) “LinkOut” to a resource’s data items from NCBI Entrez resources such as PubMed, 3) Web-based interoperation with a resource, 4) sharing a resource’s lexicon and ontology, 5) sharing a resource’s database schema, and 6) participation by the resource in neuroscience-related RSS news dissemination. The developers of a resource are free to choose which DISCO capabilities their resource will participate in. Although DISCO is used by NIF to facilitate neuroscience data integration, its capabilities have general applicability to other areas of research. PMID:20387131

  3. YAdumper: extracting and translating large information volumes from relational databases to structured flat files.

    PubMed

    Fernández, José M; Valencia, Alfonso

    2004-10-12

    Downloading the information stored in relational databases into XML and other flat formats is a common task in bioinformatics. This periodical dumping of information requires considerable CPU time, disk and memory resources. YAdumper has been developed as a purpose-specific tool to deal with the integral structured information download of relational databases. YAdumper is a Java application that organizes database extraction following an XML template based on an external Document Type Declaration. Compared with other non-native alternatives, YAdumper substantially reduces memory requirements and considerably improves writing performance.
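
    The underlying task, streaming relational rows out as structured XML, can be suggested with a short sketch. YAdumper itself is a Java tool driven by an XML template and DTD; the Python toy below simply dumps one invented table and, unlike YAdumper, builds the whole tree in memory.

    ```python
    # Simplified illustration only; table name and database file are invented.
    import sqlite3
    import xml.etree.ElementTree as ET

    def dump_table(db_path, table, out_path):
        conn = sqlite3.connect(db_path)
        try:
            cur = conn.execute(f"SELECT * FROM {table}")
            columns = [d[0] for d in cur.description]
            root = ET.Element(table)
            for row in cur:
                rec = ET.SubElement(root, "record")
                for col, value in zip(columns, row):
                    ET.SubElement(rec, col).text = "" if value is None else str(value)
            ET.ElementTree(root).write(out_path, encoding="utf-8", xml_declaration=True)
        finally:
            conn.close()

    dump_table("proteins.db", "protein", "protein_dump.xml")  # hypothetical database
    ```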

  4. ExPASy: SIB bioinformatics resource portal.

    PubMed

    Artimo, Panu; Jonnalagedda, Manohar; Arnold, Konstantin; Baratin, Delphine; Csardi, Gabor; de Castro, Edouard; Duvaud, Séverine; Flegel, Volker; Fortier, Arnaud; Gasteiger, Elisabeth; Grosdidier, Aurélien; Hernandez, Céline; Ioannidis, Vassilios; Kuznetsov, Dmitry; Liechti, Robin; Moretti, Sébastien; Mostaguir, Khaled; Redaschi, Nicole; Rossier, Grégoire; Xenarios, Ioannis; Stockinger, Heinz

    2012-07-01

    ExPASy (http://www.expasy.org) has a worldwide reputation as one of the main bioinformatics resources for proteomics. It has now evolved, becoming an extensible and integrative portal accessing many scientific resources, databases and software tools in different areas of life sciences. Scientists can now seamlessly access a wide range of resources in many different domains, such as proteomics, genomics, phylogeny/evolution, systems biology, population genetics, transcriptomics, etc. The individual resources (databases, web-based and downloadable software tools) are hosted in a 'decentralized' way by different groups of the SIB Swiss Institute of Bioinformatics and partner institutions. Specifically, a single web portal provides a common entry point to a wide range of resources developed and operated by different SIB groups and external institutions. The portal features a search function across 'selected' resources. Additionally, the availability and usage of resources are monitored. The portal is aimed at both expert users and people who are not familiar with a specific domain in life sciences. The new web interface provides, in particular, visual guidance for newcomers to ExPASy.

  5. Hermes, the Information Messenger, Integrating Information Services and Delivering Them to the End User.

    ERIC Educational Resources Information Center

    Coello-Coutino, Gerardo; Ainsworth, Shirley; Escalante-Gonzalbo, Ana Marie

    2002-01-01

    Describes Hermes, a research tool that uses specially designed acquisition, parsing and presentation methods to integrate information resources on the Internet, from searching in disparate bibliographic databases, to accessing full text articles online, and developing a web of information associated with each reference via one common interface.…

  6. Network-based drug discovery by integrating systems biology and computational technologies

    PubMed Central

    Leung, Elaine L.; Cao, Zhi-Wei; Jiang, Zhi-Hong; Zhou, Hua

    2013-01-01

    Network-based intervention has become a trend in curing systemic diseases, but it relies on regimen optimization and valid multi-target actions of the drugs. The complex multi-component nature of medicinal herbs may serve as a valuable resource for network-based multi-target drug discovery because of its potential treatment effects through synergy. Recently, multiple systems biology platforms have proven powerful for uncovering molecular mechanisms and connections between drugs and their targeted dynamic networks. However, optimization methods for drug combinations remain insufficient, owing to the lack of tighter integration across multiple ‘-omics’ databases. Newly developed algorithm- or network-based computational models can tightly integrate ‘-omics’ databases and optimize combinational regimens of drug development, which encourages using medicinal herbs to develop a new wave of network-based multi-target drugs. However, challenges remain for further integration of medicinal herb databases with multiple systems biology platforms for multi-target drug optimization, owing to the uncertain reliability of individual data sets and the limited width, depth and degree of standardization of herbal medicine. Standardization of the methodology and terminology of multiple systems biology platforms and herbal databases would facilitate this integration. Enhancing publicly accessible databases and increasing the number of studies that apply systems biology platforms to herbal medicine would also help. Further integration across various ‘-omics’ platforms and computational tools would accelerate the development of network-based drug discovery and network medicine. PMID:22877768

  7. Integration of Evidence Base into a Probabilistic Risk Assessment

    NASA Technical Reports Server (NTRS)

    Saile, Lyn; Lopez, Vilma; Bickham, Grandin; Kerstman, Eric; FreiredeCarvalho, Mary; Byrne, Vicky; Butler, Douglas; Myers, Jerry; Walton, Marlei

    2011-01-01

    INTRODUCTION: A probabilistic decision support model such as the Integrated Medical Model (IMM) utilizes an immense amount of input data, which necessitates a systematic, integrated approach to data collection and management. As a result of this approach, IMM is able to forecast medical events, resource utilization and crew health during space flight. METHODS: Inflight data is the most desirable input for the Integrated Medical Model. Non-attributable inflight data is collected from the Lifetime Surveillance of Astronaut Health study as well as from engineers, flight surgeons, and the astronauts themselves. When inflight data is unavailable, cohort studies, other models and Bayesian analyses are used, supplemented on occasion by subject matter experts' input. To determine the quality of evidence for a medical condition, the data source is categorized and assigned a level of evidence from 1 to 5, with 1 being the highest. The collected data reside and are managed in a relational SQL database with a web-based interface for data entry and review. The database is also capable of interfacing with outside applications, which expands capabilities within the database itself. Via the public interface, customers can access a formatted Clinical Findings Form (CLiFF) that outlines the model input and evidence base for each medical condition. Changes to the database are tracked using a documented Configuration Management process. DISCUSSION: This strategic approach provides a comprehensive data management plan for IMM. The IMM Database's structure and architecture have proven to support additional uses, as seen in the analysis of resource utilization across medical conditions. In addition, the IMM Database's web-based interface provides a user-friendly format for customers to browse and download the clinical information for medical conditions. It is this type of functionality that will provide Exploratory Medicine Capabilities with the evidence base for their medical condition list. CONCLUSION: The IMM Database, in conjunction with the IMM, is helping the NASA aerospace program improve health care and reduce risk for the astronaut crew. Both the database and the model will continue to expand to meet customer needs through a multi-disciplinary, evidence-based approach to managing data. Future expansion could serve as a platform for a Space Medicine Wiki of medical conditions.
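    As an illustration of the kind of evidence-tracking relational layout described above, the sketch below builds a tiny SQLite schema with medical conditions and evidence levels 1-5 and reports the best available evidence level per condition. The table and column names are hypothetical, not the actual IMM schema.

```python
# Minimal sketch of an evidence-tracking schema in the spirit of the IMM
# database described above; table and column names are hypothetical.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE medical_condition (
    condition_id INTEGER PRIMARY KEY,
    name         TEXT NOT NULL
);
CREATE TABLE evidence (
    evidence_id  INTEGER PRIMARY KEY,
    condition_id INTEGER REFERENCES medical_condition(condition_id),
    source_type  TEXT,   -- e.g. 'inflight', 'cohort study', 'expert opinion'
    level        INTEGER CHECK (level BETWEEN 1 AND 5)  -- 1 = highest quality
);
""")
conn.execute("INSERT INTO medical_condition VALUES (1, 'Back pain')")
conn.execute("INSERT INTO evidence VALUES (1, 1, 'inflight', 1)")

# Report the best (lowest-numbered) evidence level per condition.
for row in conn.execute("""
    SELECT c.name, MIN(e.level)
    FROM medical_condition c JOIN evidence e USING (condition_id)
    GROUP BY c.condition_id"""):
    print(row)
```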

  8. MARRVEL: Integration of Human and Model Organism Genetic Resources to Facilitate Functional Annotation of the Human Genome.

    PubMed

    Wang, Julia; Al-Ouran, Rami; Hu, Yanhui; Kim, Seon-Young; Wan, Ying-Wooi; Wangler, Michael F; Yamamoto, Shinya; Chao, Hsiao-Tuan; Comjean, Aram; Mohr, Stephanie E; Perrimon, Norbert; Liu, Zhandong; Bellen, Hugo J

    2017-06-01

    One major challenge encountered with interpreting human genetic variants is the limited understanding of the functional impact of genetic alterations on biological processes. Furthermore, there remains an unmet demand for an efficient survey of the wealth of information on human homologs in model organisms across numerous databases. To efficiently assess the large volume of publicly available information, it is important to provide a concise summary of the most relevant information in a rapid user-friendly format. To this end, we created MARRVEL (model organism aggregated resources for rare variant exploration). MARRVEL is a publicly available website that integrates information from six human genetic databases and seven model organism databases. For any given variant or gene, MARRVEL displays information from OMIM, ExAC, ClinVar, Geno2MP, DGV, and DECIPHER. Importantly, it curates model organism-specific databases to concurrently display a concise summary regarding the human gene homologs in budding and fission yeast, worm, fly, fish, mouse, and rat on a single webpage. Experiment-based information on tissue expression, protein subcellular localization, biological process, and molecular function for the human gene and homologs in the seven model organisms are arranged into a concise output. Hence, rather than visiting multiple separate databases for variant and gene analysis, users can obtain important information by searching once through MARRVEL. Altogether, MARRVEL dramatically improves efficiency and accessibility to data collection and facilitates analysis of human genes and variants by cross-disciplinary integration of 18 million records available in public databases to facilitate clinical diagnosis and basic research. Copyright © 2017 American Society of Human Genetics. Published by Elsevier Inc. All rights reserved.

  9. Coastal resource and sensitivity mapping of Vietnam

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Odin, L.M.

    1997-08-01

    This paper describes a project to establish a relationship between environmental sensitivity (primarily to oil pollution) and response planning and prevention priorities for Vietnamese coastal regions. An inventory of coastal environmental sensitivity and the creation of index mapping were performed. Satellite and geographical information system data were integrated and used for database creation. The database was used to create a coastal resource map, coastal sensitivity map, and a field inventory base map. The final coastal environment sensitivity classification showed that almost 40 percent of the 7448 km of mapped shoreline has a high to medium-high sensitivity to oil pollution.

  10. Towards Semantic e-Science for Traditional Chinese Medicine

    PubMed Central

    Chen, Huajun; Mao, Yuxin; Zheng, Xiaoqing; Cui, Meng; Feng, Yi; Deng, Shuiguang; Yin, Aining; Zhou, Chunying; Tang, Jinming; Jiang, Xiaohong; Wu, Zhaohui

    2007-01-01

    Background Recent advances in Web and information technologies, together with the increasing decentralization of organizational structures, have resulted in massive amounts of information resources and domain-specific services in Traditional Chinese Medicine. The massive volume and diversity of the information and services available have made it difficult to achieve seamless and interoperable e-Science for knowledge-intensive disciplines like TCM. Therefore, information integration and service coordination are two major challenges in e-Science for TCM. We still lack sophisticated approaches to integrate scientific data and services for TCM e-Science. Results We present a comprehensive approach to building dynamic and extendable e-Science applications for knowledge-intensive disciplines like TCM based on semantic and knowledge-based techniques. The semantic e-Science infrastructure for TCM supports large-scale database integration and service coordination in a virtual organization. We use domain ontologies to integrate TCM database resources and services in a semantic cyberspace and deliver a semantically superior experience, including browsing, searching, querying and knowledge discovery, to users. We have developed a collection of semantic-based toolkits to facilitate information sharing and collaborative research by TCM scientists and researchers. Conclusion Semantic and knowledge-based techniques are well suited to knowledge-intensive disciplines like TCM. It is possible to build an on-demand e-Science system for TCM based on existing semantic and knowledge-based techniques. The approach presented in the paper integrates heterogeneous, distributed TCM databases and services, and provides scientists with a semantically superior experience to support collaborative research in the TCM discipline. PMID:17493289

  11. NCBI Bookshelf: books and documents in life sciences and health care

    PubMed Central

    Hoeppner, Marilu A.

    2013-01-01

    Bookshelf (http://www.ncbi.nlm.nih.gov/books/) is a full-text electronic literature resource of books and documents in life sciences and health care at the National Center for Biotechnology Information (NCBI). Created in 1999 with a single book as an encyclopedic reference for resources such as PubMed and GenBank, it has grown to its current size of >1300 titles. Unlike other NCBI databases, such as GenBank and Gene, which have a strict data structure, books come in all forms; they are diverse in publication types, formats, sizes and authoring models. The Bookshelf data format is XML tagged in the NCBI Book DTD (Document Type Definition), modeled after the National Library of Medicine journal article DTDs. The book DTD has been used for systematically tagging the diverse data formats of books, a move that has set the foundation for the growth of this resource. Books at NCBI followed the route of journal articles in the PubMed Central project, using the PubMed Central architectural framework, workflows and processes. Through integration with other NCBI molecular databases, books at NCBI can be used to provide reference information for biological data and facilitate its discovery. This article describes Bookshelf at NCBI: its growth, data handling and retrieval and integration with molecular databases. PMID:23203889

  12. NCBI Bookshelf: books and documents in life sciences and health care.

    PubMed

    Hoeppner, Marilu A

    2013-01-01

    Bookshelf (http://www.ncbi.nlm.nih.gov/books/) is a full-text electronic literature resource of books and documents in life sciences and health care at the National Center for Biotechnology Information (NCBI). Created in 1999 with a single book as an encyclopedic reference for resources such as PubMed and GenBank, it has grown to its current size of >1300 titles. Unlike other NCBI databases, such as GenBank and Gene, which have a strict data structure, books come in all forms; they are diverse in publication types, formats, sizes and authoring models. The Bookshelf data format is XML tagged in the NCBI Book DTD (Document Type Definition), modeled after the National Library of Medicine journal article DTDs. The book DTD has been used for systematically tagging the diverse data formats of books, a move that has set the foundation for the growth of this resource. Books at NCBI followed the route of journal articles in the PubMed Central project, using the PubMed Central architectural framework, workflows and processes. Through integration with other NCBI molecular databases, books at NCBI can be used to provide reference information for biological data and facilitate its discovery. This article describes Bookshelf at NCBI: its growth, data handling and retrieval and integration with molecular databases.

  13. TDR Targets: a chemogenomics resource for neglected diseases.

    PubMed

    Magariños, María P; Carmona, Santiago J; Crowther, Gregory J; Ralph, Stuart A; Roos, David S; Shanmugam, Dhanasekaran; Van Voorhis, Wesley C; Agüero, Fernán

    2012-01-01

    The TDR Targets Database (http://tdrtargets.org) has been designed and developed as an online resource to facilitate the rapid identification and prioritization of molecular targets for drug development, focusing on pathogens responsible for neglected human diseases. The database integrates pathogen specific genomic information with functional data (e.g. expression, phylogeny, essentiality) for genes collected from various sources, including literature curation. This information can be browsed and queried using an extensive web interface with functionalities for combining, saving, exporting and sharing the query results. Target genes can be ranked and prioritized using numerical weights assigned to the criteria used for querying. In this report we describe recent updates to the TDR Targets database, including the addition of new genomes (specifically helminths), and integration of chemical structure, property and bioactivity information for biological ligands, drugs and inhibitors and cheminformatic tools for querying and visualizing these chemical data. These changes greatly facilitate exploration of linkages (both known and predicted) between genes and small molecules, yielding insight into whether particular proteins may be druggable, effectively allowing the navigation of chemical space in a genomics context.
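    The weighted prioritization described above can be illustrated with a minimal sketch: each query criterion carries a user-assigned numerical weight, and a gene's priority score is the sum of the weights of the criteria it satisfies. The criterion names, weights and genes below are illustrative, not taken from the TDR Targets database.

```python
# Hedged sketch of weight-based target ranking in the spirit of TDR Targets;
# criteria, weights and genes are made up for illustration.
criteria_weights = {
    "essential_in_model_organism": 10,
    "has_3d_structure": 5,
    "expressed_in_relevant_stage": 3,
    "has_known_inhibitor": 8,
}

genes = {
    "gene_A": {"essential_in_model_organism", "has_known_inhibitor"},
    "gene_B": {"has_3d_structure", "expressed_in_relevant_stage"},
    "gene_C": {"essential_in_model_organism", "has_3d_structure",
               "expressed_in_relevant_stage"},
}

def score(hits):
    """Sum the weights of the criteria a gene satisfies."""
    return sum(criteria_weights[c] for c in hits)

# Rank genes from highest to lowest priority score.
for name, hits in sorted(genes.items(), key=lambda kv: score(kv[1]), reverse=True):
    print(name, score(hits))
```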

  14. TDR Targets: a chemogenomics resource for neglected diseases

    PubMed Central

    Magariños, María P.; Carmona, Santiago J.; Crowther, Gregory J.; Ralph, Stuart A.; Roos, David S.; Shanmugam, Dhanasekaran; Van Voorhis, Wesley C.; Agüero, Fernán

    2012-01-01

    The TDR Targets Database (http://tdrtargets.org) has been designed and developed as an online resource to facilitate the rapid identification and prioritization of molecular targets for drug development, focusing on pathogens responsible for neglected human diseases. The database integrates pathogen specific genomic information with functional data (e.g. expression, phylogeny, essentiality) for genes collected from various sources, including literature curation. This information can be browsed and queried using an extensive web interface with functionalities for combining, saving, exporting and sharing the query results. Target genes can be ranked and prioritized using numerical weights assigned to the criteria used for querying. In this report we describe recent updates to the TDR Targets database, including the addition of new genomes (specifically helminths), and integration of chemical structure, property and bioactivity information for biological ligands, drugs and inhibitors and cheminformatic tools for querying and visualizing these chemical data. These changes greatly facilitate exploration of linkages (both known and predicted) between genes and small molecules, yielding insight into whether particular proteins may be druggable, effectively allowing the navigation of chemical space in a genomics context. PMID:22116064

  15. BioMart Central Portal: an open database network for the biological community

    PubMed Central

    Guberman, Jonathan M.; Ai, J.; Arnaiz, O.; Baran, Joachim; Blake, Andrew; Baldock, Richard; Chelala, Claude; Croft, David; Cros, Anthony; Cutts, Rosalind J.; Di Génova, A.; Forbes, Simon; Fujisawa, T.; Gadaleta, E.; Goodstein, D. M.; Gundem, Gunes; Haggarty, Bernard; Haider, Syed; Hall, Matthew; Harris, Todd; Haw, Robin; Hu, S.; Hubbard, Simon; Hsu, Jack; Iyer, Vivek; Jones, Philip; Katayama, Toshiaki; Kinsella, R.; Kong, Lei; Lawson, Daniel; Liang, Yong; Lopez-Bigas, Nuria; Luo, J.; Lush, Michael; Mason, Jeremy; Moreews, Francois; Ndegwa, Nelson; Oakley, Darren; Perez-Llamas, Christian; Primig, Michael; Rivkin, Elena; Rosanoff, S.; Shepherd, Rebecca; Simon, Reinhard; Skarnes, B.; Smedley, Damian; Sperling, Linda; Spooner, William; Stevenson, Peter; Stone, Kevin; Teague, J.; Wang, Jun; Wang, Jianxin; Whitty, Brett; Wong, D. T.; Wong-Erasmus, Marie; Yao, L.; Youens-Clark, Ken; Yung, Christina; Zhang, Junjun; Kasprzyk, Arek

    2011-01-01

    BioMart Central Portal is a first of its kind, community-driven effort to provide unified access to dozens of biological databases spanning genomics, proteomics, model organisms, cancer data, ontology information and more. Anybody can contribute an independently maintained resource to the Central Portal, allowing it to be exposed to and shared with the research community, and linking it with the other resources in the portal. Users can take advantage of the common interface to quickly utilize different sources without learning a new system for each. The system also simplifies cross-database searches that might otherwise require several complicated steps. Several integrated tools streamline common tasks, such as converting between ID formats and retrieving sequences. The combination of a wide variety of databases, an easy-to-use interface, robust programmatic access and the array of tools make Central Portal a one-stop shop for biological data querying. Here, we describe the structure of Central Portal and show example queries to demonstrate its capabilities. Database URL: http://central.biomart.org. PMID:21930507

  16. Gramene database in 2010: updates and extensions.

    PubMed

    Youens-Clark, Ken; Buckler, Ed; Casstevens, Terry; Chen, Charles; Declerck, Genevieve; Derwent, Paul; Dharmawardhana, Palitha; Jaiswal, Pankaj; Kersey, Paul; Karthikeyan, A S; Lu, Jerry; McCouch, Susan R; Ren, Liya; Spooner, William; Stein, Joshua C; Thomason, Jim; Wei, Sharon; Ware, Doreen

    2011-01-01

    Now in its 10th year, the Gramene database (http://www.gramene.org) has grown from its primary focus on rice, the first fully-sequenced grass genome, to become a resource for major model and crop plants including Arabidopsis, Brachypodium, maize, sorghum, poplar and grape in addition to several species of rice. Gramene began with the addition of an Ensembl genome browser and has expanded in the last decade to become a robust resource for plant genomics hosting a wide array of data sets including quantitative trait loci (QTL), metabolic pathways, genetic diversity, genes, proteins, germplasm, literature, ontologies and a fully-structured markers and sequences database integrated with genome browsers and maps from various published studies (genetic, physical, bin, etc.). In addition, Gramene now hosts a variety of web services including a Distributed Annotation Server (DAS), BLAST and a public MySQL database. Twice a year, Gramene releases a major build of the database and makes interim releases to correct errors or to make important updates to software and/or data.

  17. An Innovative Infrastructure with a Universal Geo-spatiotemporal Data Representation Supporting Cost-effective Integration of Diverse Earth Science Data

    NASA Astrophysics Data System (ADS)

    Kuo, K. S.; Rilee, M. L.

    2017-12-01

    Existing pathways for bringing together massive, diverse Earth Science datasets for integrated analyses burden end users with data packaging and management details irrelevant to their domain goals. The major data repositories focus on archival, discovery, and dissemination of products (files) in a standardized manner. End users must download and then adapt these files using local resources and custom methods before analysis can proceed. This reduces scientific or other domain productivity, as scarce resources and expertise must be diverted to data processing. The Spatio-Temporal Adaptive Resolution Encoding (STARE) is a unifying scheme that encodes geospatial and temporal information for organizing data on scalable computing/storage resources, minimizing expensive data transfers. STARE provides a compact representation that turns set-logic functions, e.g. conditional subsetting, into integer operations and that takes into account the representative spatiotemporal resolutions of the data in each dataset, which is needed to align the placement of geo-spatiotemporally diverse data on massively parallel resources. Automating important scientific functions (e.g. regridding) and computational functions (e.g. data placement) allows scientists to focus on domain-specific questions instead of expending their expertise on data processing. While STARE is not tied to any particular computing technology, we have used STARE for visualization and the SciDB array database to analyze Earth Science data on a 28-node compute cluster. STARE's automatic data placement and coupling of geometric and array indexing allows complicated data comparisons to be realized as straightforward database operations like "join." With STARE-enabled automation, SciDB+STARE provides a database interface, reducing costly data preparation, increasing the volume and variety of integrable data, and easing result sharing. Using SciDB+STARE as part of an integrated analysis infrastructure, we demonstrate the dramatic ease of combining diametrically different datasets, i.e. gridded (NMQ radar) vs. spacecraft swath (TRMM). SciDB+STARE is an important step towards a computational infrastructure for integrating and sharing diverse, complex Earth Science data and science products derived from them.
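    A minimal sketch of the underlying idea, turning spatial subsetting into integer operations via a hierarchical index, is given below. It uses a plain Morton (Z-order) encoding of latitude/longitude as a stand-in; the actual STARE scheme is built on a hierarchical triangular mesh and also encodes resolution and time, so this is an analogy, not the real encoding.

```python
# Toy Morton (Z-order) index: encode location hierarchically into an integer so
# that "is this point inside that cell?" becomes an integer range comparison.
# This is NOT the actual STARE encoding, only an illustration of the principle.
def morton_index(lat, lon, level=16):
    """Interleave the bits of quantized lat/lon into a single integer key."""
    y = int((lat + 90.0) / 180.0 * (1 << level))
    x = int((lon + 180.0) / 360.0 * (1 << level))
    key = 0
    for bit in range(level):
        key |= ((x >> bit) & 1) << (2 * bit)
        key |= ((y >> bit) & 1) << (2 * bit + 1)
    return key

def cell_range(lat, lon, coarse_level, level=16):
    """Integer key range covered by the coarse cell containing (lat, lon)."""
    shift = 2 * (level - coarse_level)
    prefix = morton_index(lat, lon, level) >> shift
    return prefix << shift, ((prefix + 1) << shift) - 1

# Conditional subsetting as an integer range test: keep observations whose
# key falls inside the coarse cell around (35N, 139E).
obs = [(35.1, 139.2), (35.3, 139.9), (48.9, 2.3)]
lo, hi = cell_range(35.0, 139.0, coarse_level=4)
inside = [p for p in obs if lo <= morton_index(*p) <= hi]
print(inside)
```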

  18. LiverAtlas: a unique integrated knowledge database for systems-level research of liver and hepatic disease.

    PubMed

    Zhang, Yanqiong; Yang, Chunyuan; Wang, Shaochuang; Chen, Tao; Li, Mansheng; Wang, Xue; Li, Dongsheng; Wang, Kang; Ma, Jie; Wu, Songfeng; Zhang, Xueli; Zhu, Yunping; Wu, Jinsheng; He, Fuchu

    2013-09-01

    A large amount of liver-related physiological and pathological data exists in publicly available biological and bibliographic databases, which are usually far from comprehensive or integrated. Data collection, integration and mining processes pose a great challenge to scientific researchers and clinicians interested in the liver. To address these problems, we constructed LiverAtlas (http://liveratlas.hupo.org.cn), a comprehensive resource of biomedical knowledge related to the liver and various hepatic diseases by incorporating 53 databases. In the present version, LiverAtlas covers data on liver-related genomics, transcriptomics, proteomics, metabolomics and hepatic diseases. Additionally, LiverAtlas provides a wealth of manually curated information, relevant literature citations and cross-references to other databases. Importantly, an expert-confirmed Human Liver Disease Ontology, including relevant information for 227 types of hepatic disease, has been constructed and is used to annotate LiverAtlas data. Furthermore, we have demonstrated two examples of applying LiverAtlas data to identify candidate markers for hepatocellular carcinoma (HCC) at the systems level and to develop a systems biology-based classifier by combining the differential gene expression with topological features of human protein interaction networks to enhance the ability of HCC differential diagnosis. LiverAtlas is the most comprehensive liver and hepatic disease resource, which helps biologists and clinicians to analyse their data at the systems level and will contribute much to biomarker discovery and improved diagnostic performance for liver diseases. © 2013 John Wiley & Sons A/S. Published by John Wiley & Sons Ltd.

  19. NeMedPlant: a database of therapeutic applications and chemical constituents of medicinal plants from north-east region of India

    PubMed Central

    Meetei, Potshangbam Angamba; Singh, Pankaj; Nongdam, Potshangbam; Prabhu, N Prakash; Rathore, RS; Vindal, Vaibhav

    2012-01-01

    The North-East region of India is one of the twelve mega-biodiversity regions, containing many rare and endangered species. NeMedPlant, a curated database of medicinal and aromatic plants from the region, has been developed. The database contains traditional, scientific and medicinal information about plants and their active constituents, obtained from scholarly literature and local sources. The database is cross-linked with major biochemical databases and analytical tools. The integrated database provides a resource for investigations into hitherto unexplored medicinal plants and serves to speed up the discovery of natural product-based drugs. Availability: The database is available for free at http://bif.uohyd.ac.in/nemedplant/ or http://202.41.85.11/nemedplant/ PMID:22419844

  20. The Papillomavirus Episteme: a central resource for papillomavirus sequence data and analysis.

    PubMed

    Van Doorslaer, Koenraad; Tan, Qina; Xirasagar, Sandhya; Bandaru, Sandya; Gopalan, Vivek; Mohamoud, Yasmin; Huyen, Yentram; McBride, Alison A

    2013-01-01

    The goal of the Papillomavirus Episteme (PaVE) is to provide an integrated resource for the analysis of papillomavirus (PV) genome sequences and related information. The PaVE is a freely accessible, web-based tool (http://pave.niaid.nih.gov) created around a relational database, which enables storage, analysis and exchange of sequence information. From a design perspective, the PaVE adopts an Open Source software approach and stresses the integration and reuse of existing tools. Reference PV genome sequences have been extracted from publicly available databases and reannotated using a custom-created tool. To date, the PaVE contains 241 annotated PV genomes, 2245 genes and regions, 2004 protein sequences and 47 protein structures, which users can explore, analyze or download. The PaVE provides scientists with the data and tools needed to accelerate scientific progress for the study and treatment of diseases caused by PVs.

  1. Comprehensive data resources and analytical tools for pathological association of aminoacyl tRNA synthetases with cancer

    PubMed Central

    Lee, Ji-Hyun; You, Sungyong; Hyeon, Do Young; Kang, Byeongsoo; Kim, Hyerim; Park, Kyoung Mii; Han, Byungwoo; Hwang, Daehee; Kim, Sunghoon

    2015-01-01

    Mammalian cells have cytoplasmic and mitochondrial aminoacyl-tRNA synthetases (ARSs) that catalyze aminoacylation of tRNAs during protein synthesis. Despite their housekeeping functions in protein synthesis, ARSs and ARS-interacting multifunctional proteins (AIMPs) have recently been shown to play important roles in disease pathogenesis through their interactions with disease-related molecules. However, there is a lack of data resources and analytical tools that can be used to examine disease associations of ARS/AIMPs. Here, we developed an Integrated Database for ARSs (IDA), a resource database including cancer genomic/proteomic and interaction data of ARS/AIMPs. IDA includes mRNA expression, somatic mutation, copy number variation and phosphorylation data of ARS/AIMPs and their interacting proteins in various cancers. IDA further includes an array of analytical tools for exploration of disease association of ARS/AIMPs, identification of disease-associated ARS/AIMP interactors and reconstruction of ARS-dependent disease-perturbed network models. Therefore, IDA provides both comprehensive data resources and analytical tools for understanding potential roles of ARS/AIMPs in cancers. Database URL: http://ida.biocon.re.kr/, http://ars.biocon.re.kr/ PMID:25824651

  2. dbPTM 2016: 10-year anniversary of a resource for post-translational modification of proteins.

    PubMed

    Huang, Kai-Yao; Su, Min-Gang; Kao, Hui-Ju; Hsieh, Yun-Chung; Jhong, Jhih-Hua; Cheng, Kuang-Hao; Huang, Hsien-Da; Lee, Tzong-Yi

    2016-01-04

    Owing to the importance of the post-translational modifications (PTMs) of proteins in regulating biological processes, the dbPTM (http://dbPTM.mbc.nctu.edu.tw/) was developed as a comprehensive database of experimentally verified PTMs from several databases with annotations of potential PTMs for all UniProtKB protein entries. For this 10th anniversary of dbPTM, the updated resource provides not only a comprehensive dataset of experimentally verified PTMs, supported by the literature, but also an integrative interface for accessing all available databases and tools that are associated with PTM analysis. As well as collecting experimental PTM data from 14 public databases, this update manually curates over 12 000 modified peptides, including the emerging S-nitrosylation, S-glutathionylation and succinylation, from approximately 500 research articles, which were retrieved by text mining. As the number of available PTM prediction methods increases, this work compiles a non-homologous benchmark dataset to evaluate the predictive power of online PTM prediction tools. An increasing interest in the structural investigation of PTM substrate sites motivated the mapping of all experimental PTM peptides to protein entries of Protein Data Bank (PDB) based on database identifier and sequence identity, which enables users to examine spatially neighboring amino acids, solvent-accessible surface area and side-chain orientations for PTM substrate sites on tertiary structures. Since drug binding in PDB is annotated, this update identified over 1100 PTM sites that are associated with drug binding. The update also integrates metabolic pathways and protein-protein interactions to support the PTM network analysis for a group of proteins. Finally, the web interface is redesigned and enhanced to facilitate access to this resource. © The Author(s) 2015. Published by Oxford University Press on behalf of Nucleic Acids Research.
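    The peptide-to-structure mapping mentioned above can be sketched as follows: locate each experimentally verified PTM peptide within a structure chain's sequence and translate the modified position into a residue index on that chain. The records below are invented, and the real dbPTM pipeline also relies on database cross-references rather than exact substring matches alone.

```python
# Hedged sketch of mapping PTM peptides onto structure chains by sequence
# identity; the chain sequences and peptide records are illustrative only.
pdb_chains = {
    ("1ABC", "A"): "MKTAYIAKQRQISFVKSHFSRQLEERLGLIEVQ",
    ("2XYZ", "B"): "GSHMSLFDKELQQKALELFTRSQGG",
}

ptm_peptides = [
    {"uniprot": "P00000", "peptide": "QISFVKSHFSR", "site": 6, "type": "Phosphoserine"},
    {"uniprot": "P11111", "peptide": "AAAAAAAAAAA", "site": 3, "type": "Acetylation"},
]

for rec in ptm_peptides:
    for (pdb_id, chain), seq in pdb_chains.items():
        offset = seq.find(rec["peptide"])
        if offset != -1:
            resi = offset + rec["site"]  # 1-based residue index within the chain
            print(f'{rec["type"]} maps to {pdb_id} chain {chain} residue {resi}')
```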

  3. From data repositories to submission portals: rethinking the role of domain-specific databases in CollecTF.

    PubMed

    Kılıç, Sefa; Sagitova, Dinara M; Wolfish, Shoshannah; Bely, Benoit; Courtot, Mélanie; Ciufo, Stacy; Tatusova, Tatiana; O'Donovan, Claire; Chibucos, Marcus C; Martin, Maria J; Erill, Ivan

    2016-01-01

    Domain-specific databases are essential resources for the biomedical community, leveraging expert knowledge to curate published literature and provide access to referenced data and knowledge. The limited scope of these databases, however, poses important challenges on their infrastructure, visibility, funding and usefulness to the broader scientific community. CollecTF is a community-oriented database documenting experimentally validated transcription factor (TF)-binding sites in the Bacteria domain. In its quest to become a community resource for the annotation of transcriptional regulatory elements in bacterial genomes, CollecTF aims to move away from the conventional data-repository paradigm of domain-specific databases. Through the adoption of well-established ontologies, identifiers and collaborations, CollecTF has progressively become also a portal for the annotation and submission of information on transcriptional regulatory elements to major biological sequence resources (RefSeq, UniProtKB and the Gene Ontology Consortium). This fundamental change in database conception capitalizes on the domain-specific knowledge of contributing communities to provide high-quality annotations, while leveraging the availability of stable information hubs to promote long-term access and provide high-visibility to the data. As a submission portal, CollecTF generates TF-binding site information through direct annotation of RefSeq genome records, definition of TF-based regulatory networks in UniProtKB entries and submission of functional annotations to the Gene Ontology. As a database, CollecTF provides enhanced search and browsing, targeted data exports, binding motif analysis tools and integration with motif discovery and search platforms. This innovative approach will allow CollecTF to focus its limited resources on the generation of high-quality information and the provision of specialized access to the data.Database URL: http://www.collectf.org/. © The Author(s) 2016. Published by Oxford University Press.

  4. NCBI Epigenomics: what's new for 2013.

    PubMed

    Fingerman, Ian M; Zhang, Xuan; Ratzat, Walter; Husain, Nora; Cohen, Robert F; Schuler, Gregory D

    2013-01-01

    The Epigenomics resource at the National Center for Biotechnology Information (NCBI) has been created to serve as a comprehensive public repository for whole-genome epigenetic data sets (www.ncbi.nlm.nih.gov/epigenomics). We have constructed this resource by selecting the subset of epigenetics-specific data from the Gene Expression Omnibus (GEO) database and then subjecting them to further review and annotation. Associated data tracks can be viewed using popular genome browsers or downloaded for local analysis. We have performed extensive user testing throughout the development of this resource, and new features and improvements are continuously being implemented based on the results. We have made substantial usability improvements to user interfaces, enhanced functionality, made identification of data tracks of interest easier and created new tools for preliminary data analyses. Additionally, we have made efforts to enhance the integration between the Epigenomics resource and other NCBI databases, including the Gene database and PubMed. Data holdings have also increased dramatically since the initial publication describing the NCBI Epigenomics resource and currently consist of >3700 viewable and downloadable data tracks from 955 biological sources encompassing five well-studied species. This updated manuscript highlights these changes and improvements.

  5. NCBI Epigenomics: What’s new for 2013

    PubMed Central

    Fingerman, Ian M.; Zhang, Xuan; Ratzat, Walter; Husain, Nora; Cohen, Robert F.; Schuler, Gregory D.

    2013-01-01

    The Epigenomics resource at the National Center for Biotechnology Information (NCBI) has been created to serve as a comprehensive public repository for whole-genome epigenetic data sets (www.ncbi.nlm.nih.gov/epigenomics). We have constructed this resource by selecting the subset of epigenetics-specific data from the Gene Expression Omnibus (GEO) database and then subjecting them to further review and annotation. Associated data tracks can be viewed using popular genome browsers or downloaded for local analysis. We have performed extensive user testing throughout the development of this resource, and new features and improvements are continuously being implemented based on the results. We have made substantial usability improvements to user interfaces, enhanced functionality, made identification of data tracks of interest easier and created new tools for preliminary data analyses. Additionally, we have made efforts to enhance the integration between the Epigenomics resource and other NCBI databases, including the Gene database and PubMed. Data holdings have also increased dramatically since the initial publication describing the NCBI Epigenomics resource and currently consist of >3700 viewable and downloadable data tracks from 955 biological sources encompassing five well-studied species. This updated manuscript highlights these changes and improvements. PMID:23193265

  6. KaBOB: ontology-based semantic integration of biomedical databases.

    PubMed

    Livingston, Kevin M; Bada, Michael; Baumgartner, William A; Hunter, Lawrence E

    2015-04-23

    The ability to query many independent biological databases using a common ontology-based semantic model would facilitate deeper integration and more effective utilization of these diverse and rapidly growing resources. Despite ongoing work moving toward shared data formats and linked identifiers, significant problems persist in semantic data integration in order to establish shared identity and shared meaning across heterogeneous biomedical data sources. We present five processes for semantic data integration that, when applied collectively, solve seven key problems. These processes include making explicit the differences between biomedical concepts and database records, aggregating sets of identifiers denoting the same biomedical concepts across data sources, and using declaratively represented forward-chaining rules to take information that is variably represented in source databases and integrating it into a consistent biomedical representation. We demonstrate these processes and solutions by presenting KaBOB (the Knowledge Base Of Biomedicine), a knowledge base of semantically integrated data from 18 prominent biomedical databases using common representations grounded in Open Biomedical Ontologies. An instance of KaBOB with data about humans and seven major model organisms can be built using on the order of 500 million RDF triples. All source code for building KaBOB is available under an open-source license. KaBOB is an integrated knowledge base of biomedical data representationally based in prominent, actively maintained Open Biomedical Ontologies, thus enabling queries of the underlying data in terms of biomedical concepts (e.g., genes and gene products, interactions and processes) rather than features of source-specific data schemas or file formats. KaBOB resolves many of the issues that routinely plague biomedical researchers intending to work with data from multiple data sources and provides a platform for ongoing data integration and development and for formal reasoning over a wealth of integrated biomedical data.
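    A minimal sketch of forward-chaining over RDF-style triples, in the spirit of the rule-based integration described above, is shown below; the triples, predicate names and the single rule are illustrative only and do not reflect KaBOB's actual representation or rule set.

```python
# Toy forward-chaining over subject-predicate-object triples: a rule links a
# gene record to the biomedical concept whose protein product record it
# encodes. Identifiers and predicates are illustrative, not KaBOB's.
triples = {
    ("uniprot:P04637", "denotes", "concept:TP53_protein"),
    ("ncbigene:7157",  "encodes_product_record", "uniprot:P04637"),
}

def rule_gene_about_concept(kb):
    """If a gene record encodes a protein record that denotes a concept,
    infer a link from the gene record to that concept."""
    new = set()
    for (g, p1, prot) in kb:
        if p1 != "encodes_product_record":
            continue
        for (prot2, p2, concept) in kb:
            if prot2 == prot and p2 == "denotes":
                new.add((g, "about_gene_of", concept))
    return new

# Apply the rule until no new triples are produced (forward chaining to fixpoint).
changed = True
while changed:
    inferred = rule_gene_about_concept(triples) - triples
    changed = bool(inferred)
    triples |= inferred

print(sorted(t for t in triples if t[1] == "about_gene_of"))
```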

  7. On a High-Performance VLSI Solution to Database Problems.

    DTIC Science & Technology

    1981-08-01

    offer such attractive features as automatic verification and maintenance of semantic integrity, usage of views as abstraction and authorization...course, is the waste of too much potential resource. The global database may contain information for many different users and applications. In processing...working on, this may cause no damage at all, but some waste of space. Therefore one solution may perhaps be to do nothing to prevent its occurrence

  8. BNDB - the Biochemical Network Database.

    PubMed

    Küntzer, Jan; Backes, Christina; Blum, Torsten; Gerasch, Andreas; Kaufmann, Michael; Kohlbacher, Oliver; Lenhof, Hans-Peter

    2007-10-02

    Technological advances in high-throughput techniques and efficient data acquisition methods have resulted in a massive amount of life science data. The data is stored in numerous databases that have been established over the last decades and are essential resources for scientists nowadays. However, the diversity of the databases and the underlying data models make it difficult to combine this information for solving complex problems in systems biology. Currently, researchers typically have to browse several, often highly focused, databases to obtain the required information. Hence, there is a pressing need for more efficient systems for integrating, analyzing, and interpreting these data. The standardization and virtual consolidation of the databases is a major challenge resulting in a unified access to a variety of data sources. We present the Biochemical Network Database (BNDB), a powerful relational database platform, allowing a complete semantic integration of an extensive collection of external databases. BNDB is built upon a comprehensive and extensible object model called BioCore, which is powerful enough to model most known biochemical processes and at the same time easily extensible to be adapted to new biological concepts. Besides a web interface for the search and curation of the data, a Java-based viewer (BiNA) provides a powerful platform-independent visualization and navigation of the data. BiNA uses sophisticated graph layout algorithms for an interactive visualization and navigation of BNDB. BNDB allows a simple, unified access to a variety of external data sources. Its tight integration with the biochemical network library BN++ offers the possibility for import, integration, analysis, and visualization of the data. BNDB is freely accessible at http://www.bndb.org.

  9. Kinase Pathway Database: An Integrated Protein-Kinase and NLP-Based Protein-Interaction Resource

    PubMed Central

    Koike, Asako; Kobayashi, Yoshiyuki; Takagi, Toshihisa

    2003-01-01

    Protein kinases play a crucial role in the regulation of cellular functions. Various kinds of information about these molecules are important for understanding signaling pathways and organism characteristics. We have developed the Kinase Pathway Database, an integrated database involving major completely sequenced eukaryotes. It contains the classification of protein kinases and their functional conservation, ortholog tables among species, protein–protein, protein–gene, and protein–compound interaction data, domain information, and structural information. It also provides an automatic pathway graphic image interface. The protein, gene, and compound interactions are automatically extracted from abstracts for all genes and proteins by natural-language processing (NLP).The method of automatic extraction uses phrase patterns and the GENA protein, gene, and compound name dictionary, which was developed by our group. With this database, pathways are easily compared among species using data with more than 47,000 protein interactions and protein kinase ortholog tables. The database is available for querying and browsing at http://kinasedb.ontology.ims.u-tokyo.ac.jp/. PMID:12799355
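    The dictionary- and pattern-based extraction described above can be sketched roughly as follows; the protein names and the single phrase pattern are illustrative stand-ins for the GENA name dictionary and the curated pattern set used by the actual pipeline.

```python
# Hedged sketch of dictionary- plus pattern-based interaction extraction.
import re

name_dictionary = {"MEK1": "protein", "ERK2": "protein", "RAF1": "protein"}

# One toy phrase pattern: "<A> phosphorylates/activates/inhibits <B>"
pattern = re.compile(
    r"\b(?P<a>\w+)\s+(?P<verb>phosphorylates|activates|inhibits)\s+(?P<b>\w+)\b",
    re.IGNORECASE,
)

def extract_interactions(sentence):
    """Return (A, verb, B) triples where both names are in the dictionary."""
    hits = []
    for m in pattern.finditer(sentence):
        a, b = m.group("a"), m.group("b")
        if a in name_dictionary and b in name_dictionary:
            hits.append((a, m.group("verb").lower(), b))
    return hits

print(extract_interactions("In this assay MEK1 phosphorylates ERK2, while RAF1 activates MEK1."))
# -> [('MEK1', 'phosphorylates', 'ERK2'), ('RAF1', 'activates', 'MEK1')]
```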

  10. PatGen--a consolidated resource for searching genetic patent sequences.

    PubMed

    Rouse, Richard J D; Castagnetto, Jesus; Niedner, Roland H

    2005-04-15

    Compared to the wealth of online resources covering genomic, proteomic and derived data, the bioinformatics community is rather underserved when it comes to patent information related to biological sequences. The current online resources are either incomplete or rather expensive. This paper describes PatGen, an integrated database containing data from bioinformatics and patent resources. This effort addresses the inconsistency of publicly available genetic patent data coverage by providing access to a consolidated dataset. PatGen can be searched at http://www.patgendb.com. Contact: rjdrouse@patentinformatics.com.

  11. ESCAPE: database for integrating high-content published data collected from human and mouse embryonic stem cells.

    PubMed

    Xu, Huilei; Baroukh, Caroline; Dannenfelser, Ruth; Chen, Edward Y; Tan, Christopher M; Kou, Yan; Kim, Yujin E; Lemischka, Ihor R; Ma'ayan, Avi

    2013-01-01

    High content studies that profile mouse and human embryonic stem cells (m/hESCs) using various genome-wide technologies such as transcriptomics and proteomics are constantly being published. However, efforts to integrate such data to obtain a global view of the molecular circuitry in m/hESCs are lagging behind. Here, we present an m/hESC-centered database called Embryonic Stem Cell Atlas from Pluripotency Evidence integrating data from many recent diverse high-throughput studies including chromatin immunoprecipitation followed by deep sequencing, genome-wide inhibitory RNA screens, gene expression microarrays or RNA-seq after knockdown (KD) or overexpression of critical factors, immunoprecipitation followed by mass spectrometry proteomics and phosphoproteomics. The database provides web-based interactive search and visualization tools that can be used to build subnetworks and to identify known and novel regulatory interactions across various regulatory layers. The web-interface also includes tools to predict the effects of combinatorial KDs by additive effects controlled by sliders, or through simulation software implemented in MATLAB. Overall, the Embryonic Stem Cell Atlas from Pluripotency Evidence database is a comprehensive resource for the stem cell systems biology community. Database URL: http://www.maayanlab.net/ESCAPE

  12. ILDgenDB: integrated genetic knowledge resource for interstitial lung diseases (ILDs).

    PubMed

    Mishra, Smriti; Shah, Mohammad I; Sarkar, Malay; Asati, Nimisha; Rout, Chittaranjan

    2018-01-01

    Interstitial lung diseases (ILDs) are a diverse group of ∼200 acute and chronic pulmonary disorders that are characterized by variable amounts of inflammation, fibrosis and architectural distortion, with substantial morbidity and mortality. Inaccurate and delayed diagnoses increase the risk, especially in developing countries. Studies have indicated the significant roles of genetic elements in ILD pathogenesis. Therefore, the first genetic knowledge resource, ILDgenDB, has been developed with the objective of providing ILD genetic data and their integrated analyses for a better understanding of disease pathogenesis and the identification of diagnostics-based biomarkers. This resource contains literature-curated disease candidate genes (DCGs) enriched with various regulatory elements that have been generated using an integrated bioinformatics workflow of database searches, literature mining and DCG-microRNA (miRNA)-single nucleotide polymorphism (SNP) association analyses. To provide statistical significance to disease-gene associations, an ILD-specificity index and hypergeometric test scores were also incorporated. Association analyses of miRNAs, SNPs and pathways responsible for the pathogenesis of different sub-classes of ILDs were also incorporated. A total of 299 manually verified DCGs and their significant associations with 1932 SNPs, 2966 miRNAs and 9170 miR-polymorphisms are also provided. Furthermore, 216 literature-mined and proposed biomarkers were identified. The ILDgenDB resource provides user-friendly browsing and extensive query-based information retrieval systems. Additionally, this resource facilitates a graphical view of predicted DCG-SNP/miRNA and literature-associated DCG-ILD interactions for each ILD to support efficient data interpretation. Outcomes of the analyses suggested the significant involvement of the immune system and defense mechanisms in ILD pathogenesis. This resource may potentially facilitate genetic-based disease monitoring and diagnosis. Database URL: http://14.139.240.55/ildgendb/index.php.
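    The hypergeometric scoring mentioned above is essentially a standard enrichment test; a minimal sketch with made-up counts is shown below (the values are not taken from ILDgenDB).

```python
# Hedged sketch of a hypergeometric enrichment test: given K disease candidate
# genes out of N genes overall, how surprising is it to see k of them among the
# n genes annotated to a pathway? All counts here are invented.
from scipy.stats import hypergeom

N = 20000   # genes in the background (population size)
K = 299     # disease candidate genes (successes in the population)
n = 150     # genes annotated to the pathway being tested (sample size)
k = 12      # candidate genes observed in that pathway

# P(X >= k) under the hypergeometric null; sf(k - 1) gives the upper tail.
p_value = hypergeom.sf(k - 1, N, K, n)
print(f"enrichment p-value = {p_value:.3g}")
```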

  13. Database citation in full text biomedical articles.

    PubMed

    Kafkas, Şenay; Kim, Jee-Hyub; McEntyre, Johanna R

    2013-01-01

    Molecular biology and literature databases represent essential infrastructure for life science research. Effective integration of these data resources requires that there are structured cross-references at the level of individual articles and biological records. Here, we describe the current patterns of how database entries are cited in research articles, based on analysis of the full text Open Access articles available from Europe PMC. Focusing on citation of entries in the European Nucleotide Archive (ENA), UniProt and Protein Data Bank, Europe (PDBe), we demonstrate that text mining doubles the number of structured annotations of database record citations supplied in journal articles by publishers. Many thousands of new literature-database relationships are found by text mining, since these relationships are also not present in the set of articles cited by database records. We recommend that structured annotation of database records in articles is extended to other databases, such as ArrayExpress and Pfam, entries from which are also cited widely in the literature. The very high precision and high-throughput of this text-mining pipeline makes this activity possible both accurately and at low cost, which will allow the development of new integrated data services.
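    A rough sketch of the accession-spotting step of such a text-mining pipeline is given below; the regular expressions are simplified approximations of PDB, UniProt and ENA identifier syntax and the sentence is invented, so this is not the Europe PMC production pipeline.

```python
# Hedged sketch of scanning article text for database accession citations.
# Real pipelines use stricter patterns plus context and validation to
# disambiguate identifier types; these regexes are deliberately simplified.
import re

patterns = {
    "PDBe":    re.compile(r"\bPDB[ :]+([1-9][A-Za-z0-9]{3})\b"),
    "UniProt": re.compile(r"\b([OPQ][0-9][A-Z0-9]{3}[0-9]|[A-NR-Z][0-9][A-Z][A-Z0-9]{2}[0-9])\b"),
    "ENA":     re.compile(r"\b([A-Z][0-9]{5}|[A-Z]{2}[0-9]{6})\b"),
}

text = ("The structure (PDB: 1TUP) of a protein with UniProt accession Q9Y6K9 "
        "was compared with the transcript deposited as X02469.")

for db, rx in patterns.items():
    for m in rx.finditer(text):
        print(db, m.group(1))
```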

  14. Database Citation in Full Text Biomedical Articles

    PubMed Central

    Kafkas, Şenay; Kim, Jee-Hyub; McEntyre, Johanna R.

    2013-01-01

    Molecular biology and literature databases represent essential infrastructure for life science research. Effective integration of these data resources requires that there are structured cross-references at the level of individual articles and biological records. Here, we describe the current patterns of how database entries are cited in research articles, based on analysis of the full text Open Access articles available from Europe PMC. Focusing on citation of entries in the European Nucleotide Archive (ENA), UniProt and Protein Data Bank, Europe (PDBe), we demonstrate that text mining doubles the number of structured annotations of database record citations supplied in journal articles by publishers. Many thousands of new literature-database relationships are found by text mining, since these relationships are also not present in the set of articles cited by database records. We recommend that structured annotation of database records in articles is extended to other databases, such as ArrayExpress and Pfam, entries from which are also cited widely in the literature. The very high precision and high-throughput of this text-mining pipeline makes this activity possible both accurately and at low cost, which will allow the development of new integrated data services. PMID:23734176

  15. PIGD: a database for intronless genes in the Poaceae.

    PubMed

    Yan, Hanwei; Jiang, Cuiping; Li, Xiaoyu; Sheng, Lei; Dong, Qing; Peng, Xiaojian; Li, Qian; Zhao, Yang; Jiang, Haiyang; Cheng, Beijiu

    2014-10-01

    Intronless genes are a feature of prokaryotes; however, they are widespread and unequally distributed among eukaryotes and represent an important resource to study the evolution of gene architecture. Although many databases on exons and introns exist, there is currently no cohesive database that collects intronless genes in plants into a single database. In this study, we present the Poaceae Intronless Genes Database (PIGD), a user-friendly web interface to explore information on intronless genes from different plants. Five Poaceae species, Sorghum bicolor, Zea mays, Setaria italica, Panicum virgatum and Brachypodium distachyon, are included in the current release of PIGD. Gene annotations and sequence data were collected and integrated from different databases. The primary focus of this study was to provide gene descriptions and gene product records. In addition, functional annotations, subcellular localization prediction and taxonomic distribution are reported. PIGD allows users to readily browse, search and download data. BLAST and comparative analyses are also provided through this online database, which is available at http://pigd.ahau.edu.cn/. PIGD provides a solid platform for the collection, integration and analysis of intronless genes in the Poaceae. As such, this database will be useful for subsequent bio-computational analysis in comparative genomics and evolutionary studies.
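    The core criterion behind a collection like this, namely that a transcript with a single exon has no introns, can be sketched from a GFF3-style annotation as below; the annotation lines are made up for illustration.

```python
# Hedged sketch of flagging intronless transcripts from GFF3-style exon records:
# count exons per transcript and keep those with exactly one exon.
from collections import defaultdict

gff_lines = """\
chr1\tsrc\texon\t100\t900\t.\t+\t.\tParent=mRNA1
chr1\tsrc\texon\t2000\t2500\t.\t+\t.\tParent=mRNA2
chr1\tsrc\texon\t2700\t3200\t.\t+\t.\tParent=mRNA2
""".splitlines()

exon_counts = defaultdict(int)
for line in gff_lines:
    cols = line.split("\t")
    if cols[2] == "exon":
        attrs = dict(field.split("=", 1) for field in cols[8].split(";"))
        exon_counts[attrs["Parent"]] += 1

intronless = [tx for tx, n in exon_counts.items() if n == 1]
print(intronless)   # -> ['mRNA1']
```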

  16. NEIBank: Genomics and bioinformatics resources for vision research

    PubMed Central

    Peterson, Katherine; Gao, James; Buchoff, Patee; Jaworski, Cynthia; Bowes-Rickman, Catherine; Ebright, Jessica N.; Hauser, Michael A.; Hoover, David

    2008-01-01

    NEIBank is an integrated resource for genomics and bioinformatics in vision research. It includes expressed sequence tag (EST) data and sequence-verified cDNA clones for multiple eye tissues of several species, web-based access to human eye-specific SAGE data through EyeSAGE, and comprehensive, annotated databases of known human eye disease genes and candidate disease gene loci. All expression- and disease-related data are integrated in EyeBrowse, an eye-centric genome browser. NEIBank provides a comprehensive overview of current knowledge of the transcriptional repertoires of eye tissues and their relation to pathology. PMID:18648525

  17. Resource physiology of conifers: Acquisition, allocation, and utilization

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Smith, W.K.; Hinckley, T.M.

    1995-03-01

    This book focuses on a synthetic view of the resource physiology of conifer trees with an emphasis on developing a perspective that can integrate across the biological hierarchy. This objective is in concert with more scientific goals of maintaining biological diversity and the sustainability of forest systems. The preservation of coniferous forest ecosystems is a major concern today. This volume deals with the topics of resource acquisition, allocation, and utilization in conifers. Selected papers are indexed separately for inclusion in the Energy Science and Technology Database.

  18. Large-Scale Collection and Analysis of Full-Length cDNAs from Brachypodium distachyon and Integration with Pooideae Sequence Resources

    PubMed Central

    Mochida, Keiichi; Uehara-Yamaguchi, Yukiko; Takahashi, Fuminori; Yoshida, Takuhiro; Sakurai, Tetsuya; Shinozaki, Kazuo

    2013-01-01

    A comprehensive collection of full-length cDNAs is essential for correct structural gene annotation and functional analyses of genes. We constructed a mixed full-length cDNA library from 21 different tissues of Brachypodium distachyon Bd21, and obtained 78,163 high quality expressed sequence tags (ESTs) from both ends of ca. 40,000 clones (including 16,079 contigs). We updated gene structure annotations of Brachypodium genes based on full-length cDNA sequences in comparison with the latest publicly available annotations. About 10,000 non-redundant gene models were supported by full-length cDNAs; ca. 6,000 showed some transcription unit modifications. We also found ca. 580 novel gene models, including 362 newly identified in Bd21. Using the updated transcription start sites, we searched a total of 580 plant cis-motifs in the −3 kb promoter regions and determined a genome-wide Brachypodium promoter architecture. Furthermore, we integrated the Brachypodium full-length cDNAs and updated gene structures with available sequence resources in wheat and barley in a web-accessible database, the RIKEN Brachypodium FL cDNA database. The database represents a “one-stop” information resource for all genomic information in the Pooideae, facilitating functional analysis of genes in this model grass plant and seamless knowledge transfer to the Triticeae crops. PMID:24130698

  19. Databases, Repositories, and Other Data Resources in Structural Biology.

    PubMed

    Zheng, Heping; Porebski, Przemyslaw J; Grabowski, Marek; Cooper, David R; Minor, Wladek

    2017-01-01

    Structural biology, like many other areas of modern science, produces an enormous amount of primary, derived, and "meta" data with a high demand on data storage and manipulations. Primary data come from various steps of sample preparation, diffraction experiments, and functional studies. These data are not only used to obtain tangible results, like macromolecular structural models, but also to enrich and guide our analysis and interpretation of various biomedical problems. Herein we define several categories of data resources, (a) Archives, (b) Repositories, (c) Databases, and (d) Advanced Information Systems, that can accommodate primary, derived, or reference data. Data resources may be used either as web portals or internally by structural biology software. To be useful, each resource must be maintained, curated, as well as integrated with other resources. Ideally, the system of interconnected resources should evolve toward comprehensive "hubs", or Advanced Information Systems. Such systems, encompassing the PDB and UniProt, are indispensable not only for structural biology, but for many related fields of science. The categories of data resources described herein are applicable well beyond our usual scientific endeavors.

  20. MIPS: curated databases and comprehensive secondary data resources in 2010.

    PubMed

    Mewes, H Werner; Ruepp, Andreas; Theis, Fabian; Rattei, Thomas; Walter, Mathias; Frishman, Dmitrij; Suhre, Karsten; Spannagl, Manuel; Mayer, Klaus F X; Stümpflen, Volker; Antonov, Alexey

    2011-01-01

    The Munich Information Center for Protein Sequences (MIPS at the Helmholtz Center for Environmental Health, Neuherberg, Germany) has many years of experience in providing annotated collections of biological data. Selected data sets of high relevance, such as model genomes, are subjected to careful manual curation, while the bulk of high-throughput data is annotated by automatic means. High-quality reference resources developed in the past and still actively maintained include Saccharomyces cerevisiae, Neurospora crassa and Arabidopsis thaliana genome databases as well as several protein interaction data sets (MPACT, MPPI and CORUM). More recent projects are PhenomiR, the database on microRNA-related phenotypes, and MIPS PlantsDB for integrative and comparative plant genome research. The interlinked resources SIMAP and PEDANT provide homology relationships as well as up-to-date and consistent annotation for 38,000,000 protein sequences. PPLIPS and CCancer are versatile tools for proteomics and functional genomics interfacing to a database of compilations from gene lists extracted from literature. A novel literature-mining tool, EXCERBT, gives access to structured information on classified relations between genes, proteins, phenotypes and diseases extracted from Medline abstracts by semantic analysis. All databases described here, as well as the detailed descriptions of our projects can be accessed through the MIPS WWW server (http://mips.helmholtz-muenchen.de).

  1. MIPS: curated databases and comprehensive secondary data resources in 2010

    PubMed Central

    Mewes, H. Werner; Ruepp, Andreas; Theis, Fabian; Rattei, Thomas; Walter, Mathias; Frishman, Dmitrij; Suhre, Karsten; Spannagl, Manuel; Mayer, Klaus F.X.; Stümpflen, Volker; Antonov, Alexey

    2011-01-01

    The Munich Information Center for Protein Sequences (MIPS at the Helmholtz Center for Environmental Health, Neuherberg, Germany) has many years of experience in providing annotated collections of biological data. Selected data sets of high relevance, such as model genomes, are subjected to careful manual curation, while the bulk of high-throughput data is annotated by automatic means. High-quality reference resources developed in the past and still actively maintained include Saccharomyces cerevisiae, Neurospora crassa and Arabidopsis thaliana genome databases as well as several protein interaction data sets (MPACT, MPPI and CORUM). More recent projects are PhenomiR, the database on microRNA-related phenotypes, and MIPS PlantsDB for integrative and comparative plant genome research. The interlinked resources SIMAP and PEDANT provide homology relationships as well as up-to-date and consistent annotation for 38 000 000 protein sequences. PPLIPS and CCancer are versatile tools for proteomics and functional genomics interfacing to a database of compilations from gene lists extracted from literature. A novel literature-mining tool, EXCERBT, gives access to structured information on classified relations between genes, proteins, phenotypes and diseases extracted from Medline abstracts by semantic analysis. All databases described here, as well as the detailed descriptions of our projects can be accessed through the MIPS WWW server (http://mips.helmholtz-muenchen.de). PMID:21109531

  2. Genevar: a database and Java application for the analysis and visualization of SNP-gene associations in eQTL studies.

    PubMed

    Yang, Tsun-Po; Beazley, Claude; Montgomery, Stephen B; Dimas, Antigone S; Gutierrez-Arcelus, Maria; Stranger, Barbara E; Deloukas, Panos; Dermitzakis, Emmanouil T

    2010-10-01

    Genevar (GENe Expression VARiation) is a database and Java tool designed to integrate multiple datasets, and provides analysis and visualization of associations between sequence variation and gene expression. Genevar allows researchers to investigate expression quantitative trait loci (eQTL) associations within a gene locus of interest in real time. The database and application can be installed on a standard computer in database mode and, in addition, on a server to share discoveries among affiliations or the broader community over the Internet via web services protocols. http://www.sanger.ac.uk/resources/software/genevar.

  3. GénoPlante-Info (GPI): a collection of databases and bioinformatics resources for plant genomics

    PubMed Central

    Samson, Delphine; Legeai, Fabrice; Karsenty, Emmanuelle; Reboux, Sébastien; Veyrieras, Jean-Baptiste; Just, Jeremy; Barillot, Emmanuel

    2003-01-01

    Génoplante is a partnership program between public French institutes (INRA, CIRAD, IRD and CNRS) and private companies (Biogemma, Bayer CropScience and Bioplante) that aims at developing genome analysis programs for crop species (corn, wheat, rapeseed, sunflower and pea) and model plants (Arabidopsis and rice). The outputs of these programs form a wealth of information (genomic sequence, transcriptome, proteome, allelic variability, mapping and synteny, and mutation data) and tools (databases, interfaces, analysis software), that are being integrated and made public at the public bioinformatics resource centre of Génoplante: GénoPlante-Info (GPI). This continuous flood of data and tools is regularly updated and will grow continuously during the coming two years. Access to the GPI databases and tools is available at http://genoplante-info.infobiogen.fr/. PMID:12519976

  4. Columba: an integrated database of proteins, structures, and annotations.

    PubMed

    Trissl, Silke; Rother, Kristian; Müller, Heiko; Steinke, Thomas; Koch, Ina; Preissner, Robert; Frömmel, Cornelius; Leser, Ulf

    2005-03-31

    Structural and functional research often requires the computation of sets of protein structures based on certain properties of the proteins, such as sequence features, fold classification, or functional annotation. Compiling such sets using current web resources is tedious because the necessary data are spread over many different databases. To facilitate this task, we have created COLUMBA, an integrated database of annotations of protein structures. COLUMBA currently integrates twelve different databases, including PDB, KEGG, Swiss-Prot, CATH, SCOP, the Gene Ontology, and ENZYME. The database can be searched using either keyword search or data source-specific web forms. Users can thus quickly select and download PDB entries that, for instance, participate in a particular pathway, are classified as containing a certain CATH architecture, are annotated as having a certain molecular function in the Gene Ontology, and whose structures have a resolution under a defined threshold. The results of queries are provided in both machine-readable extensible markup language and human-readable formats. The structures themselves can be viewed interactively on the web. The COLUMBA database facilitates the creation of protein structure data sets for many structure-based studies. It allows users to combine queries on a number of structure-related databases not covered by other projects at present. Thus, information on both many and few protein structures can be used efficiently. The web interface for COLUMBA is available at http://www.columba-db.de.

  5. REFOLDdb: a new and sustainable gateway to experimental protocols for protein refolding.

    PubMed

    Mizutani, Hisashi; Sugawara, Hideaki; Buckle, Ashley M; Sangawa, Takeshi; Miyazono, Ken-Ichi; Ohtsuka, Jun; Nagata, Koji; Shojima, Tomoki; Nosaki, Shohei; Xu, Yuqun; Wang, Delong; Hu, Xiao; Tanokura, Masaru; Yura, Kei

    2017-04-24

    More than 7000 papers related to "protein refolding" have been published to date, with approximately 300 reports each year during the last decade. Whilst some of these papers provide experimental protocols for protein refolding, a survey in the structural life science communities showed a necessity for a comprehensive database for refolding techniques. We therefore have developed a new resource - "REFOLDdb" that collects refolding techniques into a single, searchable repository to help researchers develop refolding protocols for proteins of interest. We based our resource on the existing REFOLD database, which has not been updated since 2009. We redesigned the data format to be more concise, allowing consistent representations among data entries compared with the original REFOLD database. The remodeled data architecture enhances the search efficiency and improves the sustainability of the database. After an exhaustive literature search we added experimental refolding protocols from reports published from 2009 to early 2017. In addition to this new data, we fully converted and integrated existing REFOLD data into our new resource. REFOLDdb contains 1877 entries as of March 17th, 2017, and is freely available at http://p4d-info.nig.ac.jp/refolddb/. REFOLDdb is a unique database for the life sciences research community, providing annotated information for designing new refolding protocols and customizing existing methodologies. We envisage that this resource will find wide utility across broad disciplines that rely on the production of pure, active, recombinant proteins. Furthermore, the database also provides a useful overview of the recent trends and statistics in refolding technology development.

  6. The Image Data Resource: A Bioimage Data Integration and Publication Platform.

    PubMed

    Williams, Eleanor; Moore, Josh; Li, Simon W; Rustici, Gabriella; Tarkowska, Aleksandra; Chessel, Anatole; Leo, Simone; Antal, Bálint; Ferguson, Richard K; Sarkans, Ugis; Brazma, Alvis; Salas, Rafael E Carazo; Swedlow, Jason R

    2017-08-01

    Access to primary research data is vital for the advancement of science. To extend the data types supported by community repositories, we built a prototype Image Data Resource (IDR) that collects and integrates imaging data acquired across many different imaging modalities. IDR links data from several imaging modalities, including high-content screening, super-resolution and time-lapse microscopy, digital pathology, public genetic or chemical databases, and cell and tissue phenotypes expressed using controlled ontologies. Using this integration, IDR facilitates the analysis of gene networks and reveals functional interactions that are inaccessible to individual studies. To enable re-analysis, we also established a computational resource based on Jupyter notebooks that allows remote access to the entire IDR. IDR is also an open source platform that others can use to publish their own image data. Thus IDR provides both a novel on-line resource and a software infrastructure that promotes and extends publication and re-analysis of scientific image data.
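
    For orientation, here is a minimal sketch of remote, programmatic access in the spirit of the notebook-based access described above, written in Python with the requests library against the public IDR JSON API; the endpoint path and the response fields are assumptions and should be checked against the current IDR/OMERO API documentation.

      # Hedged sketch: listing a few screens from the public IDR JSON API.
      # The URL path and the "data"/"@id"/"Name" fields are assumptions.
      import requests

      base = "https://idr.openmicroscopy.org"
      resp = requests.get(f"{base}/api/v0/m/screens/", params={"limit": 3}, timeout=30)
      resp.raise_for_status()

      for screen in resp.json().get("data", []):
          print(screen.get("@id"), screen.get("Name"))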

  7. RegNetwork: an integrated database of transcriptional and post-transcriptional regulatory networks in human and mouse

    PubMed Central

    Liu, Zhi-Ping; Wu, Canglin; Miao, Hongyu; Wu, Hulin

    2015-01-01

    Transcriptional and post-transcriptional regulation of gene expression is of fundamental importance to numerous biological processes. Nowadays, an increasing number of gene regulatory relationships have been documented in various databases and literature. However, to more efficiently exploit such knowledge for biomedical research and applications, it is necessary to construct a genome-wide regulatory network database to integrate the information on gene regulatory relationships that are widely scattered in many different places. Therefore, in this work, we build a knowledge-based database, named ‘RegNetwork’, of gene regulatory networks for human and mouse by collecting and integrating the documented regulatory interactions among transcription factors (TFs), microRNAs (miRNAs) and target genes from 25 selected databases. Moreover, we also inferred and incorporated potential regulatory relationships based on transcription factor binding site (TFBS) motifs into RegNetwork. As a result, RegNetwork contains a comprehensive set of experimentally observed or predicted transcriptional and post-transcriptional regulatory relationships, and the database framework is flexibly designed for potential extensions to include gene regulatory networks for other organisms in the future. Based on RegNetwork, we characterized the statistical and topological properties of genome-wide regulatory networks for human and mouse, and we also extracted and interpreted simple yet important network motifs that involve the interplay between TFs, miRNAs and their targets. In summary, RegNetwork provides an integrated resource on the prior information for gene regulatory relationships, and it enables us to further investigate context-specific transcriptional and post-transcriptional regulatory interactions based on domain-specific experimental data. Database URL: http://www.regnetworkweb.org PMID:26424082
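
    As an illustration of the motif analysis mentioned above, the sketch below enumerates feed-forward loops in a toy regulator/target edge list with networkx; the edges and node names are invented for the example and are not RegNetwork data.

      # Illustrative sketch (not RegNetwork code): enumerating feed-forward
      # loops (a->b, a->c, b->c) in a toy TF/miRNA/gene regulatory network.
      import networkx as nx
      from itertools import permutations

      edges = [
          ("TF_A", "miR-1"), ("TF_A", "GeneX"), ("miR-1", "GeneX"),
          ("TF_B", "GeneY"), ("TF_B", "TF_A"),
      ]
      g = nx.DiGraph(edges)

      def feed_forward_loops(graph):
          """Yield (a, b, c) triples where a->b, a->c and b->c all exist."""
          for a, b, c in permutations(graph.nodes, 3):
              if graph.has_edge(a, b) and graph.has_edge(a, c) and graph.has_edge(b, c):
                  yield a, b, c

      for motif in feed_forward_loops(g):
          print(motif)  # e.g. ('TF_A', 'miR-1', 'GeneX')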

  8. ChemProt-2.0: visual navigation in a disease chemical biology database

    PubMed Central

    Kim Kjærulff, Sonny; Wich, Louis; Kringelum, Jens; Jacobsen, Ulrik P.; Kouskoumvekaki, Irene; Audouze, Karine; Lund, Ole; Brunak, Søren; Oprea, Tudor I.; Taboureau, Olivier

    2013-01-01

    ChemProt-2.0 (http://www.cbs.dtu.dk/services/ChemProt-2.0) is a publicly available compilation of multiple chemical–protein annotation resources integrated with diseases and clinical outcomes information. The database has been updated to >1.15 million compounds with 5.32 million bioactivity measurements for 15 290 proteins. Each protein is linked to quality-scored human protein–protein interaction data based on more than half a million interactions, for studying diseases and biological outcomes (diseases, pathways and GO terms) through protein complexes. In ChemProt-2.0, therapeutic effects as well as adverse drug reactions have been integrated, allowing it to suggest proteins associated with clinical outcomes. New chemical structure fingerprints were computed based on the similarity ensemble approach. Protein sequence similarity search was also integrated to evaluate the promiscuity of proteins, which can help in the prediction of off-target effects. Finally, the database was integrated into a visual interface that enables navigation of the pharmacological space for small molecules. Filtering options were included in order to facilitate and guide dynamic searches of specific queries. PMID:23185041

  9. TISSUES 2.0: an integrative web resource on mammalian tissue expression

    PubMed Central

    Palasca, Oana; Santos, Alberto; Stolte, Christian; Gorodkin, Jan; Jensen, Lars Juhl

    2018-01-01

    Abstract Physiological and molecular similarities between organisms make it possible to translate findings from simpler experimental systems—model organisms—into more complex ones, such as human. This translation facilitates the understanding of biological processes under normal or disease conditions. Researchers aiming to identify the similarities and differences between organisms at the molecular level need resources collecting multi-organism tissue expression data. We have developed a database of gene–tissue associations in human, mouse, rat and pig by integrating multiple sources of evidence: transcriptomics covering all four species and proteomics (human only), manually curated and mined from the scientific literature. Through a scoring scheme, these associations are made comparable across all sources of evidence and across organisms. Furthermore, the scoring produces a confidence score assigned to each of the associations. The TISSUES database (version 2.0) is publicly accessible through a user-friendly web interface and as part of the STRING app for Cytoscape. In addition, we analyzed the agreement between datasets, across and within organisms, and identified that the agreement is mainly affected by the quality of the datasets rather than by the technologies used or organisms compared. Database URL: http://tissues.jensenlab.org/ PMID:29617745

  10. Emissions & Generation Resource Integrated Database (eGRID), eGRID2012

    EPA Pesticide Factsheets

    The Emissions & Generation Resource Integrated Database (eGRID) is a comprehensive source of data on the environmental characteristics of almost all electric power generated in the United States. These environmental characteristics include air emissions for nitrogen oxides, sulfur dioxide, carbon dioxide, methane, and nitrous oxide; emissions rates; net generation; resource mix; and many other attributes. eGRID2012 Version 1.0 is the eighth edition of eGRID, which contains the complete release of year 2009 data, as well as year 2007, 2005, and 2004 data. For year 2009 data, all the data are contained in a single Microsoft Excel workbook, which contains boiler, generator, plant, state, power control area, eGRID subregion, NERC region, U.S. total and grid gross loss factor tabs. Full documentation, summary data, eGRID subregion and NERC region representational maps, and GHG emission factors are also released in this edition. The fourth edition of eGRID, eGRID2002 Version 2.01, containing year 1996 through 2000 data is located on the eGRID Archive page (http://www.epa.gov/cleanenergy/energy-resources/egrid/archive.html). The current edition of eGRID and the archived edition of eGRID contain the following years of data: 1996 - 2000, 2004, 2005, and 2007. eGRID has no other years of data.
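
    Because each release is distributed as an Excel workbook with one tab per aggregation level, a common programmatic starting point is to load a single tab with pandas; the file name and sheet name below are placeholders rather than the actual eGRID2012 labels.

      # Hedged sketch: loading one tab of an eGRID workbook with pandas.
      # "eGRID2012_data.xlsx" and the sheet name "PLNT09" are placeholders;
      # check the downloaded workbook for the real file and tab names.
      import pandas as pd

      plants = pd.read_excel("eGRID2012_data.xlsx", sheet_name="PLNT09")
      print(plants.shape)               # rows = plants, columns = reported attributes
      print(list(plants.columns[:10]))  # inspect the first few column headers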

  11. Realizing what's essential: a case study on integrating electronic journal management into a print-centric technical services department.

    PubMed

    Dollar, Daniel M; Gallagher, John; Glover, Janis; Marone, Regina Kenny; Crooker, Cynthia

    2007-04-01

    To support migration from print to electronic resources, the Cushing/Whitney Medical Library at Yale University reorganized its Technical Services Department to focus on managing electronic resources. The library hired consultants to help plan the changes and to present recommendations for integrating electronic resource management into every position. The library task force decided to focus initial efforts on the periodical collection. To free staff time to devote to electronic journals, most of the print subscriptions were switched to online only and new workflows were developed for e-journals. Staff learned new responsibilities such as activating e-journals, maintaining accurate holdings information in the online public access catalog and e-journals database ("electronic shelf reading"), updating the link resolver knowledgebase, and troubleshooting. All of the serials team members now spend significant amounts of time managing e-journals. The serials staff now spends its time managing the materials most important to the library's clientele (e-journals and databases). The team's proactive approach to maintenance work and rapid response to reported problems should improve patrons' experiences using e-journals. The library is taking advantage of new technologies such as an electronic resource management system, and library workflows and procedures will continue to evolve as technology changes.

  12. PATRIC: the Comprehensive Bacterial Bioinformatics Resource with a Focus on Human Pathogenic Species ▿ ‡ #

    PubMed Central

    Gillespie, Joseph J.; Wattam, Alice R.; Cammer, Stephen A.; Gabbard, Joseph L.; Shukla, Maulik P.; Dalay, Oral; Driscoll, Timothy; Hix, Deborah; Mane, Shrinivasrao P.; Mao, Chunhong; Nordberg, Eric K.; Scott, Mark; Schulman, Julie R.; Snyder, Eric E.; Sullivan, Daniel E.; Wang, Chunxia; Warren, Andrew; Williams, Kelly P.; Xue, Tian; Seung Yoo, Hyun; Zhang, Chengdong; Zhang, Yan; Will, Rebecca; Kenyon, Ronald W.; Sobral, Bruno W.

    2011-01-01

    Funded by the National Institute of Allergy and Infectious Diseases, the Pathosystems Resource Integration Center (PATRIC) is a genomics-centric relational database and bioinformatics resource designed to assist scientists in infectious-disease research. Specifically, PATRIC provides scientists with (i) a comprehensive bacterial genomics database, (ii) a plethora of associated data relevant to genomic analysis, and (iii) an extensive suite of computational tools and platforms for bioinformatics analysis. While the primary aim of PATRIC is to advance the knowledge underlying the biology of human pathogens, all publicly available genome-scale data for bacteria are compiled and continually updated, thereby enabling comparative analyses to reveal the basis for differences between infectious free-living and commensal species. Herein we summarize the major features available at PATRIC, dividing the resources into two major categories: (i) organisms, genomes, and comparative genomics and (ii) recurrent integration of community-derived associated data. Additionally, we present two experimental designs typical of bacterial genomics research and report on the execution of both projects using only PATRIC data and tools. These applications encompass a broad range of the data and analysis tools available, illustrating practical uses of PATRIC for the biologist. Finally, a summary of PATRIC's outreach activities, collaborative endeavors, and future research directions is provided. PMID:21896772

  13. Realizing what's essential: a case study on integrating electronic journal management into a print-centric technicalservices department

    PubMed Central

    Dollar, Daniel M.; Gallagher, John; Glover, Janis; Marone, Regina Kenny; Crooker, Cynthia

    2007-01-01

    Objective: To support migration from print to electronic resources, the Cushing/Whitney Medical Library at Yale University reorganized its Technical Services Department to focus on managing electronic resources. Methods: The library hired consultants to help plan the changes and to present recommendations for integrating electronic resource management into every position. The library task force decided to focus initial efforts on the periodical collection. To free staff time to devote to electronic journals, most of the print subscriptions were switched to online only and new workflows were developed for e-journals. Results: Staff learned new responsibilities such as activating e-journals, maintaining accurate holdings information in the online public access catalog and e-journals database (“electronic shelf reading”), updating the link resolver knowledgebase, and troubleshooting. All of the serials team members now spend significant amounts of time managing e-journals. Conclusions: The serials staff now spends its time managing the materials most important to the library's clientele (e-journals and databases). The team's proactive approach to maintenance work and rapid response to reported problems should improve patrons' experiences using e-journals. The library is taking advantage of new technologies such as an electronic resource management system, and library workflows and procedures will continue to evolve as technology changes. PMID:17443247

  14. Emissions & Generation Resource Integrated Database (eGRID) Questions and Answers

    EPA Pesticide Factsheets

    eGRID is a comprehensive source of data on the environmental characteristics of almost all electric power generated in the United States. eGRID is based on available plant-specific data for all U.S. electricity generating plants that report data.

  15. BioMart Central Portal: an open database network for the biological community.

    PubMed

    Guberman, Jonathan M; Ai, J; Arnaiz, O; Baran, Joachim; Blake, Andrew; Baldock, Richard; Chelala, Claude; Croft, David; Cros, Anthony; Cutts, Rosalind J; Di Génova, A; Forbes, Simon; Fujisawa, T; Gadaleta, E; Goodstein, D M; Gundem, Gunes; Haggarty, Bernard; Haider, Syed; Hall, Matthew; Harris, Todd; Haw, Robin; Hu, S; Hubbard, Simon; Hsu, Jack; Iyer, Vivek; Jones, Philip; Katayama, Toshiaki; Kinsella, R; Kong, Lei; Lawson, Daniel; Liang, Yong; Lopez-Bigas, Nuria; Luo, J; Lush, Michael; Mason, Jeremy; Moreews, Francois; Ndegwa, Nelson; Oakley, Darren; Perez-Llamas, Christian; Primig, Michael; Rivkin, Elena; Rosanoff, S; Shepherd, Rebecca; Simon, Reinhard; Skarnes, B; Smedley, Damian; Sperling, Linda; Spooner, William; Stevenson, Peter; Stone, Kevin; Teague, J; Wang, Jun; Wang, Jianxin; Whitty, Brett; Wong, D T; Wong-Erasmus, Marie; Yao, L; Youens-Clark, Ken; Yung, Christina; Zhang, Junjun; Kasprzyk, Arek

    2011-01-01

    BioMart Central Portal is a first of its kind, community-driven effort to provide unified access to dozens of biological databases spanning genomics, proteomics, model organisms, cancer data, ontology information and more. Anybody can contribute an independently maintained resource to the Central Portal, allowing it to be exposed to and shared with the research community, and linking it with the other resources in the portal. Users can take advantage of the common interface to quickly utilize different sources without learning a new system for each. The system also simplifies cross-database searches that might otherwise require several complicated steps. Several integrated tools streamline common tasks, such as converting between ID formats and retrieving sequences. The combination of a wide variety of databases, an easy-to-use interface, robust programmatic access and the array of tools make Central Portal a one-stop shop for biological data querying. Here, we describe the structure of Central Portal and show example queries to demonstrate its capabilities.
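
    Programmatic access to a BioMart mart typically goes through a martservice URL that accepts an XML query document; in the sketch below the endpoint, dataset and attribute names are assumptions to be checked against the documentation of the specific mart being queried.

      # Hedged sketch of a BioMart-style XML query submitted over HTTP.
      # The endpoint URL, dataset, filter and attribute names are assumptions.
      import requests

      XML_QUERY = """<?xml version="1.0" encoding="UTF-8"?>
      <!DOCTYPE Query>
      <Query virtualSchemaName="default" formatter="TSV" header="1" uniqueRows="1">
        <Dataset name="hsapiens_gene_ensembl" interface="default">
          <Filter name="chromosome_name" value="21"/>
          <Attribute name="ensembl_gene_id"/>
          <Attribute name="external_gene_name"/>
        </Dataset>
      </Query>"""

      resp = requests.get(
          "https://www.ensembl.org/biomart/martservice",  # assumed endpoint
          params={"query": XML_QUERY},
          timeout=60,
      )
      resp.raise_for_status()
      print(resp.text.splitlines()[:5])  # first few TSV rows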

  16. Generation and validation of a universal perinatal database and biospecimen repository: PeriBank.

    PubMed

    Antony, K M; Hemarajata, P; Chen, J; Morris, J; Cook, C; Masalas, D; Gedminas, M; Brown, A; Versalovic, J; Aagaard, K

    2016-11-01

    There is a dearth of biospecimen repositories available to perinatal researchers. In order to address this need, here we describe the methodology used to establish such a resource. With the collaboration of MedSci.net, we generated an online perinatal database with 847 fields of clinical information. Simultaneously, we established a biospecimen repository of the same clinical participants. The demographic and clinical outcomes data are described for the first 10 000 participants enrolled. The demographic characteristics are consistent with the demographics of the delivery hospitals. Quality analysis of the biospecimens reveals variation in very few analytes. Furthermore, since the creation of PeriBank, we have demonstrated validity of the database and tissue integrity of the biospecimen repository. Here we establish that the creation of a universal perinatal database and biospecimen collection is not only possible, but allows for the performance of state-of-the-science translational perinatal research and is a potentially valuable resource to academic perinatal researchers.

  17. Integration of a neuroimaging processing pipeline into a pan-canadian computing grid

    NASA Astrophysics Data System (ADS)

    Lavoie-Courchesne, S.; Rioux, P.; Chouinard-Decorte, F.; Sherif, T.; Rousseau, M.-E.; Das, S.; Adalat, R.; Doyon, J.; Craddock, C.; Margulies, D.; Chu, C.; Lyttelton, O.; Evans, A. C.; Bellec, P.

    2012-02-01

    The ethos of the neuroimaging field is quickly moving towards the open sharing of resources, including both imaging databases and processing tools. As a neuroimaging database represents a large volume of datasets and as neuroimaging processing pipelines are composed of heterogeneous, computationally intensive tools, such open sharing raises specific computational challenges. This motivates the design of novel dedicated computing infrastructures. This paper describes an interface between PSOM, a code-oriented pipeline development framework, and CBRAIN, a web-oriented platform for grid computing. This interface was used to integrate a PSOM-compliant pipeline for preprocessing of structural and functional magnetic resonance imaging into CBRAIN. We further tested the capacity of our infrastructure to handle a real large-scale project. A neuroimaging database including close to 1000 subjects was preprocessed using our interface and publicly released to help the participants of the ADHD-200 international competition. This successful experiment demonstrated that our integrated grid-computing platform is a powerful solution for high-throughput pipeline analysis in the field of neuroimaging.

  18. CHEMICAL STRUCTURE INDEXING OF TOXICITY DATA ON ...

    EPA Pesticide Factsheets

    Standardized chemical structure annotation of public toxicity databases and information resources is playing an increasingly important role in the 'flattening' and integration of diverse sets of biological activity data on the Internet. This review discusses public initiatives that are accelerating the pace of this transformation, with particular reference to toxicology-related chemical information. Chemical content annotators, structure locator services, large structure/data aggregator web sites, structure browsers, International Union of Pure and Applied Chemistry (IUPAC) International Chemical Identifier (InChI) codes, toxicity data models and public chemical/biological activity profiling initiatives are all playing a role in overcoming barriers to the integration of toxicity data, and are bringing researchers closer to the reality of a mineable chemical Semantic Web. An example of this integration of data is provided by the collaboration among researchers involved with the Distributed Structure-Searchable Toxicity (DSSTox) project, the Carcinogenic Potency Project, projects at the National Cancer Institute and the PubChem database. Standardizing chemical structure annotation of public toxicity databases
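
    As a concrete example of the InChI-based structure annotation discussed above, the following sketch derives an InChI string and InChIKey from a SMILES structure with RDKit (assuming an RDKit build with InChI support); the molecule is only an illustrative example and is unrelated to the projects named in the record.

      # Sketch: standardized structure identifiers (InChI/InChIKey) with RDKit.
      from rdkit import Chem

      smiles = "CC(=O)Oc1ccccc1C(=O)O"          # aspirin, as an example structure
      mol = Chem.MolFromSmiles(smiles)
      inchi = Chem.MolToInchi(mol)              # canonical IUPAC InChI string
      inchi_key = Chem.InchiToInchiKey(inchi)   # hashed key, convenient for lookups

      print(inchi)      # InChI=1S/C9H8O4/...
      print(inchi_key)  # BSYNRYMUTXBXSQ-UHFFFAOYSA-N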

  19. The Histone Database: an integrated resource for histones and histone fold-containing proteins

    PubMed Central

    Mariño-Ramírez, Leonardo; Levine, Kevin M.; Morales, Mario; Zhang, Suiyuan; Moreland, R. Travis; Baxevanis, Andreas D.; Landsman, David

    2011-01-01

    Eukaryotic chromatin is composed of DNA and protein components—core histones—that act to compactly pack the DNA into nucleosomes, the fundamental building blocks of chromatin. These nucleosomes are connected to adjacent nucleosomes by linker histones. Nucleosomes are highly dynamic and, through various core histone post-translational modifications and incorporation of diverse histone variants, can serve as epigenetic marks to control processes such as gene expression and recombination. The Histone Sequence Database is a curated collection of sequences and structures of histones and non-histone proteins containing histone folds, assembled from major public databases. Here, we report a substantial increase in the number of sequences and taxonomic coverage for histone and histone fold-containing proteins available in the database. Additionally, the database now contains an expanded dataset that includes archaeal histone sequences. The database also provides comprehensive multiple sequence alignments for each of the four core histones (H2A, H2B, H3 and H4), the linker histones (H1/H5) and the archaeal histones. The database also includes current information on solved histone fold-containing structures. The Histone Sequence Database is an inclusive resource for the analysis of chromatin structure and function focused on histones and histone fold-containing proteins. Database URL: The Histone Sequence Database is freely available and can be accessed at http://research.nhgri.nih.gov/histones/. PMID:22025671

  20. Fungal genome resources at NCBI.

    PubMed

    Robbertse, B; Tatusova, T

    2011-09-01

    The National Center for Biotechnology Information (NCBI) is well known for the nucleotide sequence archive GenBank and the sequence analysis tool BLAST. However, NCBI integrates many types of biomolecular data from a variety of sources and makes them available to the scientific community as interactive web resources as well as organized releases of bulk data. These tools are available to explore and compare fungal genomes. Searching all databases with Fungi [organism] at http://www.ncbi.nlm.nih.gov/ is the quickest way to find resources of interest with fungal entries. Some tools, though, are resource specific and can be accessed indirectly from a particular database in the Entrez system. These include graphical viewers and comparative analysis tools such as TaxPlot, TaxMap and UniGene DDD (found via UniGene Homepage). Gene and BioProject pages also serve as portals to external data such as community annotation websites, BioGrid and UniProt. There are many different ways of accessing genomic data at NCBI. Depending on the focus and goal of research projects or the level of interest, a user would select a particular route for accessing genomic databases and resources. This review article describes methods of accessing fungal genome data and provides examples that illustrate the use of analysis tools.
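
    The "Fungi [organism]" search described above can also be run programmatically through the Entrez E-utilities, for example with Biopython; the contact e-mail and the choice of the Assembly database below are placeholders.

      # Sketch: querying NCBI Entrez for fungal genome assemblies via Biopython.
      from Bio import Entrez

      Entrez.email = "you@example.org"  # NCBI asks for a contact address (placeholder)

      handle = Entrez.esearch(db="assembly", term="Fungi[Organism]", retmax=5)
      record = Entrez.read(handle)
      handle.close()

      print(record["Count"])   # total number of matching assemblies
      print(record["IdList"])  # first few UIDs, usable with Entrez.esummary/efetch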

  1. An online spatial database of Australian Indigenous Biocultural Knowledge for contemporary natural and cultural resource management.

    PubMed

    Pert, Petina L; Ens, Emilie J; Locke, John; Clarke, Philip A; Packer, Joanne M; Turpin, Gerry

    2015-11-15

    With growing international calls for the enhanced involvement of Indigenous peoples and their biocultural knowledge in managing conservation and the sustainable use of the physical environment, it is timely to review the available literature and develop cross-cultural approaches to the management of biocultural resources. Online spatial databases are becoming common tools for educating land managers about Indigenous Biocultural Knowledge (IBK), specifically to raise a broad awareness of issues, identify knowledge gaps and opportunities, and to promote collaboration. Here we describe a novel approach to the application of internet and spatial analysis tools that provide an overview of publicly available documented Australian IBK (AIBK) and outline the processes used to develop the online resource. By funding an AIBK working group, the Australian Centre for Ecological Analysis and Synthesis (ACEAS) provided a unique opportunity to bring together cross-cultural, cross-disciplinary and trans-organizational contributors who developed these resources. Without such an intentionally collaborative process, this unique tool would not have been developed. The tool developed through this process is derived from a spatial and temporal literature review, case studies and a compilation of methods, as well as other relevant AIBK papers. The online resource illustrates the depth and breadth of documented IBK and identifies opportunities for further work, partnerships and investment for the benefit of not only Indigenous Australians, but all Australians. The database currently includes links to over 1500 publicly available IBK documents, of which 568 are geo-referenced and were mapped. It is anticipated that as awareness of the online resource grows, more documents will be provided through the website to build the database. It is envisaged that this will become a well-used tool, integral to future natural and cultural resource management and maintenance. Copyright © 2015. Published by Elsevier B.V.

  2. TogoTable: cross-database annotation system using the Resource Description Framework (RDF) data model.

    PubMed

    Kawano, Shin; Watanabe, Tsutomu; Mizuguchi, Sohei; Araki, Norie; Katayama, Toshiaki; Yamaguchi, Atsuko

    2014-07-01

    TogoTable (http://togotable.dbcls.jp/) is a web tool that adds user-specified annotations to a table that a user uploads. Annotations are drawn from several biological databases that use the Resource Description Framework (RDF) data model. TogoTable uses database identifiers (IDs) in the table as a query key for searching. RDF data, which form a network called Linked Open Data (LOD), can be searched from SPARQL endpoints using a SPARQL query language. Because TogoTable uses RDF, it can integrate annotations from not only the reference database to which the IDs originally belong, but also externally linked databases via the LOD network. For example, annotations in the Protein Data Bank can be retrieved using GeneID through links provided by the UniProt RDF. Because RDF has been standardized by the World Wide Web Consortium, any database with annotations based on the RDF data model can be easily incorporated into this tool. We believe that TogoTable is a valuable Web tool, particularly for experimental biologists who need to process huge amounts of data such as high-throughput experimental output. © The Author(s) 2014. Published by Oxford University Press on behalf of Nucleic Acids Research.
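
    A minimal sketch of the kind of SPARQL lookup that TogoTable performs against RDF endpoints, here using SPARQLWrapper and the public UniProt endpoint; the endpoint URL and the up:mnemonic predicate are assumptions to verify against the current UniProt RDF documentation.

      # Hedged sketch: a small SPARQL query against a public RDF endpoint.
      from SPARQLWrapper import SPARQLWrapper, JSON

      sparql = SPARQLWrapper("https://sparql.uniprot.org/sparql")  # assumed endpoint
      sparql.setReturnFormat(JSON)
      sparql.setQuery("""
      PREFIX up: <http://purl.uniprot.org/core/>
      SELECT ?protein ?mnemonic WHERE {
        ?protein a up:Protein ;
                 up:mnemonic ?mnemonic .
      } LIMIT 5
      """)

      for row in sparql.queryAndConvert()["results"]["bindings"]:
          print(row["protein"]["value"], row["mnemonic"]["value"])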

  3. STINGRAY: system for integrated genomic resources and analysis.

    PubMed

    Wagner, Glauber; Jardim, Rodrigo; Tschoeke, Diogo A; Loureiro, Daniel R; Ocaña, Kary A C S; Ribeiro, Antonio C B; Emmel, Vanessa E; Probst, Christian M; Pitaluga, André N; Grisard, Edmundo C; Cavalcanti, Maria C; Campos, Maria L M; Mattoso, Marta; Dávila, Alberto M R

    2014-03-07

    The STINGRAY system has been conceived to ease the tasks of integrating, analyzing, annotating and presenting genomic and expression data from Sanger and Next Generation Sequencing (NGS) platforms. STINGRAY includes: (a) a complete and integrated workflow (more than 20 bioinformatics tools) ranging from functional annotation to phylogeny; (b) a MySQL database schema, suitable for data integration and user access control; and (c) a user-friendly graphical web-based interface that makes the system intuitive, facilitating the tasks of data analysis and annotation. STINGRAY proved to be an easy-to-use and complete system for analyzing sequencing data. While both Sanger and NGS platforms are supported, the system could be faster using Sanger data, since the large NGS datasets could potentially slow down the MySQL database usage. STINGRAY is available at http://stingray.biowebdb.org and the open source code at http://sourceforge.net/projects/stingray-biowebdb/.

  4. STINGRAY: system for integrated genomic resources and analysis

    PubMed Central

    2014-01-01

    Background The STINGRAY system has been conceived to ease the tasks of integrating, analyzing, annotating and presenting genomic and expression data from Sanger and Next Generation Sequencing (NGS) platforms. Findings STINGRAY includes: (a) a complete and integrated workflow (more than 20 bioinformatics tools) ranging from functional annotation to phylogeny; (b) a MySQL database schema, suitable for data integration and user access control; and (c) a user-friendly graphical web-based interface that makes the system intuitive, facilitating the tasks of data analysis and annotation. Conclusion STINGRAY proved to be an easy-to-use and complete system for analyzing sequencing data. While both Sanger and NGS platforms are supported, the system could be faster using Sanger data, since the large NGS datasets could potentially slow down the MySQL database usage. STINGRAY is available at http://stingray.biowebdb.org and the open source code at http://sourceforge.net/projects/stingray-biowebdb/. PMID:24606808

  5. The Plant Genome Integrative Explorer Resource: PlantGenIE.org.

    PubMed

    Sundell, David; Mannapperuma, Chanaka; Netotea, Sergiu; Delhomme, Nicolas; Lin, Yao-Cheng; Sjödin, Andreas; Van de Peer, Yves; Jansson, Stefan; Hvidsten, Torgeir R; Street, Nathaniel R

    2015-12-01

    Accessing and exploring large-scale genomics data sets remains a significant challenge to researchers without specialist bioinformatics training. We present the integrated PlantGenIE.org platform for exploration of Populus, conifer and Arabidopsis genomics data, which includes expression networks and associated visualization tools. Standard features of a model organism database are provided, including genome browsers, gene list annotation, Blast homology searches and gene information pages. Community annotation updating is supported via integration of WebApollo. We have produced an RNA-sequencing (RNA-Seq) expression atlas for Populus tremula and have integrated these data within the expression tools. An updated version of the ComPlEx resource for performing comparative plant expression analyses of gene coexpression network conservation between species has also been integrated. The PlantGenIE.org platform provides intuitive access to large-scale and genome-wide genomics data from model forest tree species, facilitating both community contributions to annotation improvement and tools supporting use of the included data resources to inform biological insight. © 2015 The Authors. New Phytologist © 2015 New Phytologist Trust.

  6. The Virtual Watershed Observatory: Cyberinfrastructure for Model-Data Integration and Access

    NASA Astrophysics Data System (ADS)

    Duffy, C.; Leonard, L. N.; Giles, L.; Bhatt, G.; Yu, X.

    2011-12-01

    The Virtual Watershed Observatory (VWO) is a concept where scientists, water managers, educators and the general public can create a virtual observatory from integrated hydrologic model results, national databases and historical or real-time observations via web services. In this paper, we propose a prototype for automated and virtualized web services software using national data products for climate reanalysis, soils, geology, terrain and land cover. The VWO has the broad purpose of making accessible water resource simulations, real-time data assimilation, calibration and archival at the scale of HUC 12 watersheds (Hydrologic Unit Code) anywhere in the continental US. Our prototype for model-data integration focuses on creating tools for fast data storage from selected national databases, as well as the computational resources necessary for a dynamic, distributed watershed simulation. The paper will describe cyberinfrastructure tools and workflows that attempt to resolve the problem of model-data accessibility and scalability such that individuals, research teams, managers and educators can create a VWO in a desired context. Examples are given for the NSF-funded Shale Hills Critical Zone Observatory and the European Critical Zone Observatories within the SoilTrEC project. In the future, implementation of VWO services will benefit from the development of a cloud cyberinfrastructure as the prototype evolves toward data- and model-intensive computation for continental-scale water resource predictions.

  7. CHEMICAL STRUCTURE INDEXING OF TOXICITY DATA ON THE INTERNET: MOVING TOWARDS A FLAT WORLD

    EPA Science Inventory

    Standardized chemical structure annotation of public toxicity databases and information resources is playing an increasingly important role in the 'flattening' and integration of diverse sets of biological activity data on the Internet. This review discusses public initiatives th...

  8. Strategies for Change: Part I.

    ERIC Educational Resources Information Center

    Atkinson, Hugh C.

    1984-01-01

    Focusing on notion of change as integrated into personal and library settings, this essay discusses routines versus change, redistribution of resources, effects of economic recession, technological changes in information transfer media (papyrus and clay tablets, video disks, fiber optics, microcomputers, databases), and changes in library patrons'…

  9. EUROPA2: Plan Database Services for Planning and Scheduling Applications

    NASA Technical Reports Server (NTRS)

    Bedrax-Weiss, Tania; Frank, Jeremy; Jonsson, Ari; McGann, Conor

    2004-01-01

    NASA missions require solving a wide variety of planning and scheduling problems with temporal constraints; simple resources such as robotic arms, communications antennae and cameras; complex replenishable resources such as memory, power and fuel; and complex constraints on geometry, heat and lighting angles. Planners and schedulers that solve these problems are used in ground tools as well as onboard systems. The diversity of planning problems and applications of planners and schedulers precludes a one-size-fits-all solution. However, many of the underlying technologies are common across planning domains and applications. We describe CAPR, a formalism for planning that is general enough to cover a wide variety of planning and scheduling domains of interest to NASA. We then describe EUROPA2, a software framework implementing CAPR. EUROPA2 provides efficient, customizable Plan Database Services that enable the integration of CAPR into a wide variety of applications. We describe the design of EUROPA2 from the perspectives of modeling, customization and application integration for different classes of NASA missions.

  10. An Innovative Infrastructure with a Universal Geo-Spatiotemporal Data Representation Supporting Cost-Effective Integration of Diverse Earth Science Data

    NASA Technical Reports Server (NTRS)

    Rilee, Michael Lee; Kuo, Kwo-Sen

    2017-01-01

    The SpatioTemporal Adaptive Resolution Encoding (STARE) is a unifying scheme encoding geospatial and temporal information for organizing data on scalable computing/storage resources, minimizing expensive data transfers. STARE provides a compact representation that turns set-logic functions into integer operations, e.g. conditional subsetting, taking into account representative spatiotemporal resolutions of the data in the datasets. STARE geo-spatiotemporally aligns data placements of diverse data on massively parallel resources to maximize performance. Automating important scientific functions (e.g. regridding) and computational functions (e.g. data placement) allows scientists to focus on domain-specific questions instead of expending their efforts and expertise on data processing. With STARE-enabled automation, SciDB (Scientific Database) plus STARE provides a database interface, reducing costly data preparation, increasing the volume and variety of interoperable data, and easing result sharing. Using SciDB plus STARE as part of an integrated analysis infrastructure dramatically eases combining diametrically different datasets.
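
    The following is a conceptual sketch only, not STARE's actual encoding: it illustrates how a quadtree-style integer index can turn spatial containment tests into shift-and-compare operations on integers, which is the general idea behind encodings of this kind.

      # Conceptual sketch: a quadtree-style prefix code over lon/lat so that
      # "is this point inside this cell?" becomes an integer comparison.
      def quad_index(lon, lat, level):
          """Encode a point as an integer whose bit prefix identifies ancestor cells."""
          x0, x1, y0, y1, code = -180.0, 180.0, -90.0, 90.0, 0
          for _ in range(level):
              xm, ym = (x0 + x1) / 2, (y0 + y1) / 2
              qx, qy = int(lon >= xm), int(lat >= ym)
              code = (code << 2) | (qx << 1) | qy
              x0, x1 = (xm, x1) if qx else (x0, xm)
              y0, y1 = (ym, y1) if qy else (y0, ym)
          return code

      def contains(cell_code, cell_level, point_code, point_level):
          """Containment reduces to a shift-and-compare on the integer codes."""
          return (point_code >> 2 * (point_level - cell_level)) == cell_code

      p = quad_index(-77.0, 38.9, 10)  # a point encoded at level 10
      c = quad_index(-77.0, 38.9, 4)   # its level-4 ancestor cell
      print(contains(c, 4, p, 10))     # True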

  11. Ensembl variation resources

    PubMed Central

    2010-01-01

    Background The maturing field of genomics is rapidly increasing the number of sequenced genomes and producing more information from those previously sequenced. Much of this additional information is variation data derived from sampling multiple individuals of a given species with the goal of discovering new variants and characterising the population frequencies of the variants that are already known. These data have immense value for many studies, including those designed to understand evolution and connect genotype to phenotype. Maximising the utility of the data requires that it be stored in an accessible manner that facilitates the integration of variation data with other genome resources such as gene annotation and comparative genomics. Description The Ensembl project provides comprehensive and integrated variation resources for a wide variety of chordate genomes. This paper provides a detailed description of the sources of data and the methods for creating the Ensembl variation databases. It also explores the utility of the information by explaining the range of query options available, from using interactive web displays, to online data mining tools and connecting directly to the data servers programmatically. It gives a good overview of the variation resources and future plans for expanding the variation data within Ensembl. Conclusions Variation data is an important key to understanding the functional and phenotypic differences between individuals. The development of new sequencing and genotyping technologies is greatly increasing the amount of variation data known for almost all genomes. The Ensembl variation resources are integrated into the Ensembl genome browser and provide a comprehensive way to access this data in the context of a widely used genome bioinformatics system. All Ensembl data is freely available at http://www.ensembl.org and from the public MySQL database server at ensembldb.ensembl.org. PMID:20459805
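
    A minimal sketch of direct programmatic access to the public MySQL server named above, using pymysql; the anonymous account, port, schema name and table/column names are assumptions based on the general layout of Ensembl variation databases and should be checked against current Ensembl documentation.

      # Hedged sketch: querying the public Ensembl MySQL server directly.
      # The user, port, database and table/column names are assumptions.
      import pymysql

      conn = pymysql.connect(
          host="ensembldb.ensembl.org",             # server named in the abstract
          user="anonymous",                         # assumed public read-only account
          port=3306,
          database="homo_sapiens_variation_75_37",  # hypothetical release schema
      )
      try:
          with conn.cursor() as cur:
              cur.execute("SELECT name FROM variation LIMIT 5")  # assumed table/column
              for (name,) in cur.fetchall():
                  print(name)
      finally:
          conn.close()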

  12. Teaching resources for dermatology on the WWW--quiz system and dynamic lecture scripts using a HTTP-database demon.

    PubMed Central

    Bittorf, A.; Diepgen, T. L.

    1996-01-01

    The World Wide Web (WWW) is becoming the major way of acquiring information in all scientific disciplines as well as in business. It is well suited for the fast distribution and exchange of up-to-date teaching resources. However, to date most teaching applications on the Web do not use its full power by integrating interactive components. We have set up a computer-based training (CBT) framework for Dermatology, which consists of dynamic lecture scripts, case reports, an atlas and a quiz system. All these components heavily rely on an underlying image database that permits the creation of dynamic documents. We used a demon process that keeps the database open and can be accessed using HTTP to achieve better performance and avoid the overhead involved in starting CGI processes. The result of our evaluation was very encouraging. PMID:8947625

  13. FlyRNAi.org—the database of the Drosophila RNAi screening center and transgenic RNAi project: 2017 update

    PubMed Central

    Hu, Yanhui; Comjean, Aram; Roesel, Charles; Vinayagam, Arunachalam; Flockhart, Ian; Zirin, Jonathan; Perkins, Lizabeth; Perrimon, Norbert; Mohr, Stephanie E.

    2017-01-01

    The FlyRNAi database of the Drosophila RNAi Screening Center (DRSC) and Transgenic RNAi Project (TRiP) at Harvard Medical School and associated DRSC/TRiP Functional Genomics Resources website (http://fgr.hms.harvard.edu) serve as a reagent production tracking system, screen data repository, and portal to the community. Through this portal, we make available protocols, online tools, and other resources useful to researchers at all stages of high-throughput functional genomics screening, from assay design and reagent identification to data analysis and interpretation. In this update, we describe recent changes and additions to our website, database and suite of online tools. Recent changes reflect a shift in our focus from a single technology (RNAi) and model species (Drosophila) to the application of additional technologies (e.g. CRISPR) and support of integrated, cross-species approaches to uncovering gene function using functional genomics and other approaches. PMID:27924039

  14. Region and database management for HANDI 2000 business management system

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Wilson, D.

    The Data Integration 2000 Project will result in an integrated and comprehensive set of functional applications containing core information necessary to support the Project Hanford Management Contract. It is based on a Commercial-Off-The-Shelf (COTS) product solution with commercially proven business processes. The COTS product solution set, consisting of PassPort and PeopleSoft software, supports finance, supply, chemical management/Material Safety Data Sheets, and human resources.

  15. Influenza Research Database: An integrated bioinformatics resource for influenza virus research.

    PubMed

    Zhang, Yun; Aevermann, Brian D; Anderson, Tavis K; Burke, David F; Dauphin, Gwenaelle; Gu, Zhiping; He, Sherry; Kumar, Sanjeev; Larsen, Christopher N; Lee, Alexandra J; Li, Xiaomei; Macken, Catherine; Mahaffey, Colin; Pickett, Brett E; Reardon, Brian; Smith, Thomas; Stewart, Lucy; Suloway, Christian; Sun, Guangyu; Tong, Lei; Vincent, Amy L; Walters, Bryan; Zaremba, Sam; Zhao, Hongtao; Zhou, Liwei; Zmasek, Christian; Klem, Edward B; Scheuermann, Richard H

    2017-01-04

    The Influenza Research Database (IRD) is a U.S. National Institute of Allergy and Infectious Diseases (NIAID)-sponsored Bioinformatics Resource Center dedicated to providing bioinformatics support for influenza virus research. IRD facilitates the research and development of vaccines, diagnostics and therapeutics against influenza virus by providing a comprehensive collection of influenza-related data integrated from various sources, a growing suite of analysis and visualization tools for data mining and hypothesis generation, personal workbench spaces for data storage and sharing, and active user community support. Here, we describe the recent improvements in IRD including the use of cloud and high performance computing resources, analysis and visualization of user-provided sequence data with associated metadata, predictions of novel variant proteins, annotations of phenotype-associated sequence markers and their predicted phenotypic effects, hemagglutinin (HA) clade classifications, an automated tool for HA subtype numbering conversion, linkouts to disease event data and the addition of host factor and antiviral drug components. All data and tools are freely available without restriction from the IRD website at https://www.fludb.org. © The Author(s) 2016. Published by Oxford University Press on behalf of Nucleic Acids Research.

  16. Genevar: a database and Java application for the analysis and visualization of SNP-gene associations in eQTL studies

    PubMed Central

    Yang, Tsun-Po; Beazley, Claude; Montgomery, Stephen B.; Dimas, Antigone S.; Gutierrez-Arcelus, Maria; Stranger, Barbara E.; Deloukas, Panos; Dermitzakis, Emmanouil T.

    2010-01-01

    Summary: Genevar (GENe Expression VARiation) is a database and Java tool designed to integrate multiple datasets, and provides analysis and visualization of associations between sequence variation and gene expression. Genevar allows researchers to investigate expression quantitative trait loci (eQTL) associations within a gene locus of interest in real time. The database and application can be installed on a standard computer in database mode and, in addition, on a server to share discoveries among affiliations or the broader community over the Internet via web services protocols. Availability: http://www.sanger.ac.uk/resources/software/genevar Contact: emmanouil.dermitzakis@unige.ch PMID:20702402

  17. Integration among databases and data sets to support productive nanotechnology: Challenges and recommendations

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Karcher, Sandra; Willighagen, Egon L.; Rumble, John

    Many groups within the broad field of nanoinformatics are already developing data repositories and analytical tools driven by their individual organizational goals. Integrating these data resources across disciplines and with non-nanotechnology resources can support multiple objectives by enabling the reuse of the same information. Integration can also serve as the impetus for novel scientific discoveries by providing the framework to support deeper data analyses. This article discusses current data integration practices in nanoinformatics and in comparable mature fields, and nanotechnology-specific challenges impacting data integration. Based on results from a nanoinformatics-community-wide survey, recommendations for achieving integration of existing operational nanotechnology resources are presented. Nanotechnology-specific data integration challenges, if effectively resolved, can foster the application and validation of nanotechnology within and across disciplines. This paper is one of a series of articles by the Nanomaterial Data Curation Initiative that address data issues such as data curation workflows, data completeness and quality, curator responsibilities, and metadata.

  18. DDRprot: a database of DNA damage response-related proteins.

    PubMed

    Andrés-León, Eduardo; Cases, Ildefonso; Arcas, Aida; Rojas, Ana M

    2016-01-01

    The DNA Damage Response (DDR) signalling network is an essential system that protects the genome's integrity. The DDRprot database presented here is a resource that integrates manually curated information on the human DDR network and its sub-pathways. For each particular DDR protein, we present detailed information about its function. Where proteins are involved in post-translational modifications (PTMs) with each other, we depict the position of the modified residue(s) in the three-dimensional structures, when resolved structures are available for the proteins. All this information is linked to the original publication from where it was obtained. Phylogenetic information is also shown, including time of emergence and conservation across 47 selected species, family trees and sequence alignments of homologues. The DDRprot database can be queried by different criteria: pathways, species, evolutionary age or involvement in PTMs. Sequence searches using hidden Markov models can also be used. Database URL: http://ddr.cbbio.es. © The Author(s) 2016. Published by Oxford University Press.

  19. ATtRACT-a database of RNA-binding proteins and associated motifs.

    PubMed

    Giudice, Girolamo; Sánchez-Cabo, Fátima; Torroja, Carlos; Lara-Pezzi, Enrique

    2016-01-01

    RNA-binding proteins (RBPs) play a crucial role in key cellular processes, including RNA transport, splicing, polyadenylation and stability. Understanding the interaction between RBPs and RNA is key to improve our knowledge of RNA processing, localization and regulation in a global manner. Despite advances in recent years, a unified non-redundant resource that includes information on experimentally validated motifs, RBPs and integrated tools to exploit this information is lacking. Here, we developed a database named ATtRACT (available at http://attract.cnic.es) that compiles information on 370 RBPs and 1583 RBP consensus binding motifs, 192 of which are not present in any other database. To populate ATtRACT we (i) extracted and hand-curated experimentally validated data from the CISBP-RNA, SpliceAid-F and RBPDB databases, (ii) integrated and updated the unavailable ASD database and (iii) extracted information from Protein-RNA complexes present in the Protein Data Bank through computational analyses. ATtRACT also provides efficient algorithms to search a specific motif and scan one or more RNA sequences at a time. It also allows discovering de novo motifs enriched in a set of related sequences and comparing them with the motifs included in the database. Database URL: http://attract.cnic.es. © The Author(s) 2016. Published by Oxford University Press.
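
    As a generic illustration of the motif scanning the database supports (this is not ATtRACT's own scanner), the sketch below matches a made-up consensus motif against a couple of toy RNA sequences with a regular expression.

      # Illustrative sketch: scanning RNA sequences for a consensus motif.
      # The motif and sequences are invented, not entries from ATtRACT.
      import re

      motif = "UGCAUG"                                  # hypothetical RBP consensus motif
      pattern = re.compile(motif.replace("U", "[UT]"))  # tolerate RNA or DNA alphabets

      sequences = {
          "tx1": "AUGGCUUGCAUGCCAUUGCAUGA",
          "tx2": "CCGGAAUUCCAA",
      }
      for name, seq in sequences.items():
          hits = [m.start() for m in pattern.finditer(seq)]
          print(name, hits)  # 0-based positions of motif occurrences, if any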

  20. DB Dehydrogenase: an online integrated structural database on enzyme dehydrogenase.

    PubMed

    Nandy, Suman Kumar; Bhuyan, Rajabrata; Seal, Alpana

    2012-01-01

    Dehydrogenase enzymes are almost indispensable for metabolic processes. Shortage or malfunctioning of dehydrogenases often leads to several acute diseases such as cancers, retinal diseases, diabetes mellitus, Alzheimer's disease and hepatitis B and C. With the advancement of modern-day research, huge amounts of sequence, structural and functional data are generated every day, widening the gap between structural attributes and their functional understanding. DB Dehydrogenase is an effort to relate the functionalities of dehydrogenases with their structures. It is a completely web-based structural database, covering almost all dehydrogenases [~150 enzyme classes, ~1200 entries from ~160 organisms] whose structures are known. It was created by extracting and integrating various online resources to provide reliable data, and is implemented as a MySQL relational database accessed through user-friendly web interfaces using CGI Perl. Flexible search options are available for data extraction and exploration. To summarize, the database brings together the sequence, structure and function of all dehydrogenases in one place, along with the necessary cross-referencing, and will be useful for researchers carrying out further work in this field. The database is available for free at http://www.bifku.in/DBD/

  1. Using EMBL-EBI services via Web interface and programmatically via Web Services

    PubMed Central

    Lopez, Rodrigo; Cowley, Andrew; Li, Weizhong; McWilliam, Hamish

    2015-01-01

    The European Bioinformatics Institute (EMBL-EBI) provides access to a wide range of databases and analysis tools that are of key importance in bioinformatics. As well as providing Web interfaces to these resources, Web Services are available using SOAP and REST protocols that enable programmatic access to our resources and allow their integration into other applications and analytical workflows. This unit describes the various options available to a typical researcher or bioinformatician who wishes to use our resources via Web interface or programmatically via a range of programming languages. PMID:25501941
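
    A minimal sketch of the REST-style programmatic access this unit describes, retrieving a FASTA record through the EBI Dbfetch service with Python requests; the URL and parameter names follow the commonly documented pattern but should be verified against the current EMBL-EBI documentation.

      # Hedged sketch: fetching a UniProtKB record in FASTA format via Dbfetch.
      # The endpoint URL and parameter names are assumptions to double-check.
      import requests

      resp = requests.get(
          "https://www.ebi.ac.uk/Tools/dbfetch/dbfetch",
          params={"db": "uniprotkb", "id": "P04637", "format": "fasta", "style": "raw"},
          timeout=30,
      )
      resp.raise_for_status()
      print(resp.text[:200])  # beginning of the FASTA record for the accession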

  2. 'RetinoGenetics': a comprehensive mutation database for genes related to inherited retinal degeneration.

    PubMed

    Ran, Xia; Cai, Wei-Jun; Huang, Xiu-Feng; Liu, Qi; Lu, Fan; Qu, Jia; Wu, Jinyu; Jin, Zi-Bing

    2014-01-01

    Inherited retinal degeneration (IRD), a leading cause of human blindness worldwide, is exceptionally heterogeneous, both clinically and genetically. During the past decades, tremendous efforts have been made to explore this complex heterogeneity, and, with the significant advancement of sequencing technology, numerous mutations have been identified in different genes underlying IRD. In this study, we developed a comprehensive database, 'RetinoGenetics', which contains informative knowledge about all known IRD-related genes and mutations. 'RetinoGenetics' currently contains 4270 mutations in 186 genes, with detailed information associated with 164 phenotypes from 934 publications and various types of functional annotations. Extensive annotations were then performed for each gene using various resources, including Gene Ontology, KEGG pathways, protein-protein interactions, mutational annotations and a gene-disease network. Furthermore, with its search functions, convenient browsing and intuitive graphical displays, 'RetinoGenetics' could serve as a valuable resource for unveiling the genetic basis of IRD. Taken together, 'RetinoGenetics' is an integrative, informative and updatable resource for IRD-related genetic predispositions. Database URL: http://www.retinogenetics.org/. © The Author(s) 2014. Published by Oxford University Press.

  3. LIVIVO - the Vertical Search Engine for Life Sciences.

    PubMed

    Müller, Bernd; Poley, Christoph; Pössel, Jana; Hagelstein, Alexandra; Gübitz, Thomas

    2017-01-01

    The explosive growth of literature and data in the life sciences challenges researchers to keep track of current advancements in their disciplines. Novel approaches in the life sciences, such as the One Health paradigm, require integrated methodologies to link and connect heterogeneous information from databases and literature resources. Current publications in the life sciences are increasingly characterized by the use of trans-disciplinary methodologies comprising molecular and cell biology, genetics, and genomic, epigenomic, transcriptomic and proteomic high-throughput technologies, with data from humans, plants and animals. The literature search engine LIVIVO empowers retrieval functionality by incorporating various literature resources from medicine, health, environment, agriculture and nutrition. LIVIVO is developed in-house by ZB MED - Information Centre for Life Sciences. It provides a user-friendly and usability-tested search interface with a corpus of 55 million citations derived from 50 databases. Standardized application programming interfaces are available for data export and high-throughput retrieval. The search functions allow for semantic retrieval with filtering options based on life science entities. The service-oriented architecture of LIVIVO uses four different implementation layers to deliver search services. A Knowledge Environment is being developed by ZB MED to deal with the heterogeneity of data, as an integrative approach to model, store, and link semantic concepts within literature resources and databases. Future work will focus on the exploitation of life science ontologies and on the employment of NLP technologies in order to improve query expansion, filters in faceted search, and concept-based relevancy rankings in LIVIVO.

  4. dbSUPER: a database of super-enhancers in mouse and human genome

    PubMed Central

    Khan, Aziz; Zhang, Xuegong

    2016-01-01

    Super-enhancers are clusters of transcriptional enhancers that drive cell-type-specific gene expression and are crucial to cell identity. Many disease-associated sequence variations are enriched in super-enhancer regions of disease-relevant cell types. Thus, super-enhancers can be used as potential biomarkers for disease diagnosis and therapeutics. Current studies have identified super-enhancers in more than 100 cell types and demonstrated their functional importance. However, a centralized resource to integrate all these findings is not currently available. We developed dbSUPER (http://bioinfo.au.tsinghua.edu.cn/dbsuper/), the first integrated and interactive database of super-enhancers, with the primary goal of providing a resource for assistance in further studies related to transcriptional control of cell identity and disease. dbSUPER provides a responsive and user-friendly web interface to facilitate efficient and comprehensive search and browsing. The data can be easily sent to Galaxy instances, GREAT and Cistrome web servers for downstream analysis, and can also be visualized in the UCSC genome browser where custom tracks can be added automatically. The data can be downloaded and exported in a variety of formats. Furthermore, dbSUPER lists genes associated with super-enhancers and also links to external databases such as GeneCards, UniProt and Entrez. dbSUPER also provides an overlap analysis tool to annotate user-defined regions. We believe dbSUPER is a valuable resource for the biology and genetic research communities. PMID:26438538
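
    The overlap analysis tool mentioned above amounts to intersecting user-defined regions with stored super-enhancer coordinates. The following Python sketch illustrates that interval logic with invented coordinates; it is not dbSUPER's code and makes no assumption about its internal data format.

```python
# Sketch of the interval-overlap check behind an "overlap analysis" tool:
# annotate a user-defined region against stored super-enhancer coordinates.
# All coordinates below are made up for illustration.
from typing import List, Tuple

Region = Tuple[str, int, int]  # (chromosome, start, end), half-open

def overlaps(a: Region, b: Region) -> bool:
    """True if two intervals on the same chromosome overlap."""
    return a[0] == b[0] and a[1] < b[2] and b[1] < a[2]

def annotate(query: Region, super_enhancers: List[Region]) -> List[Region]:
    """Return every stored super-enhancer that overlaps the query region."""
    return [se for se in super_enhancers if overlaps(query, se)]

stored = [("chr1", 1_000_000, 1_050_000), ("chr2", 500_000, 560_000)]
print(annotate(("chr1", 1_040_000, 1_100_000), stored))
```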

  5. Gene: a gene-centered information resource at NCBI.

    PubMed

    Brown, Garth R; Hem, Vichet; Katz, Kenneth S; Ovetsky, Michael; Wallin, Craig; Ermolaeva, Olga; Tolstoy, Igor; Tatusova, Tatiana; Pruitt, Kim D; Maglott, Donna R; Murphy, Terence D

    2015-01-01

    The National Center for Biotechnology Information's (NCBI) Gene database (www.ncbi.nlm.nih.gov/gene) integrates gene-specific information from multiple data sources. NCBI Reference Sequence (RefSeq) genomes for viruses, prokaryotes and eukaryotes are the primary foundation for Gene records in that they form the critical association between sequence and a tracked gene upon which additional functional and descriptive content is anchored. Additional content is integrated based on the genomic location and RefSeq transcript and protein sequence data. The content of a Gene record represents the integration of curation and automated processing from RefSeq, collaborating model organism databases, consortia such as Gene Ontology, and other databases within NCBI. Records in Gene are assigned unique, tracked integers as identifiers. The content (citations, nomenclature, genomic location, gene products and their attributes, phenotypes, sequences, interactions, variation details, maps, expression, homologs, protein domains and external databases) is available via interactive browsing through NCBI's Entrez system, via NCBI's Entrez programming utilities (E-Utilities and Entrez Direct) and for bulk transfer by FTP. Published by Oxford University Press on behalf of Nucleic Acids Research 2014. This work is written by (a) US Government employee(s) and is in the public domain in the US.
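
    For the programmatic access route mentioned above (NCBI's E-utilities), the hedged Python sketch below retrieves a Gene record summary via esummary. The endpoint and parameters follow the public E-utilities documentation, but the exact JSON field names should be re-checked against a live response.

```python
# Hedged sketch of programmatic access to the Gene database via NCBI
# E-utilities (esummary); field names in the JSON reply are assumptions
# to be verified against an actual response.
import requests

EUTILS = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils"

def gene_summary(gene_id: str) -> dict:
    """Return the esummary record for one Gene identifier."""
    params = {"db": "gene", "id": gene_id, "retmode": "json"}
    response = requests.get(f"{EUTILS}/esummary.fcgi", params=params, timeout=30)
    response.raise_for_status()
    return response.json()["result"][gene_id]

if __name__ == "__main__":
    summary = gene_summary("7157")  # 7157 is the Gene ID for human TP53
    print(summary.get("name"), "-", summary.get("description"))
```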

  6. Resources for Functional Genomics Studies in Drosophila melanogaster

    PubMed Central

    Mohr, Stephanie E.; Hu, Yanhui; Kim, Kevin; Housden, Benjamin E.; Perrimon, Norbert

    2014-01-01

    Drosophila melanogaster has become a system of choice for functional genomic studies. Many resources, including online databases and software tools, are now available to support design or identification of relevant fly stocks and reagents or analysis and mining of existing functional genomic, transcriptomic, proteomic, etc. datasets. These include large community collections of fly stocks and plasmid clones, “meta” information sites like FlyBase and FlyMine, and an increasing number of more specialized reagents, databases, and online tools. Here, we introduce key resources useful to plan large-scale functional genomics studies in Drosophila and to analyze, integrate, and mine the results of those studies in ways that facilitate identification of highest-confidence results and generation of new hypotheses. We also discuss ways in which existing resources can be used and might be improved and suggest a few areas of future development that would further support large- and small-scale studies in Drosophila and facilitate use of Drosophila information by the research community more generally. PMID:24653003

  7. [Design and implementation of supply security monitoring and analysis system for Chinese patent medicines supply in national essential medicines].

    PubMed

    Wang, Hui; Zhang, Xiao-Bo; Huang, Lu-Qi; Guo, Lan-Ping; Wang, Ling; Zhao, Yu-Ping; Yang, Guang

    2017-11-01

    The supply of Chinese patent medicines is influenced by the price of their raw materials (Chinese herbal medicines) and by the stock of those resources. On the one hand, raw material prices show cyclical volatility or even irreversible surges, destabilizing the price of Chinese patent medicines and in some cases pushing production costs above selling prices. On the other hand, production of some Chinese patent medicines has been halted because of resource shortages or because certain ingredients can no longer be used. Based on a micro-service architecture and a Redis cluster deployment, the supply security monitoring and analysis system for Chinese patent medicines in the national essential medicines list realizes dynamic monitoring and intelligent early warning for herbs and Chinese patent medicines by connecting and integrating the Chinese medicine resources database, the dynamic monitoring system of traditional Chinese medicine resources and the essential medicines database of Chinese patent medicines. Copyright© by the Chinese Pharmaceutical Association.

  8. Integration of environmental simulation models with satellite remote sensing and geographic information systems technologies: case studies

    USGS Publications Warehouse

    Steyaert, Louis T.; Loveland, Thomas R.; Brown, Jesslyn F.; Reed, Bradley C.

    1993-01-01

    Environmental modelers are testing and evaluating a prototype land cover characteristics database for the conterminous United States developed by the EROS Data Center of the U.S. Geological Survey and the University of Nebraska Center for Advanced Land Management Information Technologies. This database was developed from multitemporal, 1-kilometer advanced very high resolution radiometer (AVHRR) data for 1990 and various ancillary data sets such as elevation, ecological regions, and selected climatic normals. Several case studies using this database were analyzed to illustrate the integration of satellite remote sensing and geographic information systems technologies with land-atmosphere interactions models at a variety of spatial and temporal scales. The case studies are representative of contemporary environmental simulation modeling at local to regional levels in global change research, land and water resource management, and environmental risk assessment. The case studies feature land surface parameterizations for atmospheric mesoscale and global climate models; biogenic-hydrocarbon emissions models; distributed-parameter watershed and other hydrological models; and various ecological models such as ecosystem dynamics, biogeochemical cycle, ecotone variability, and equilibrium vegetation models. The case studies demonstrate the importance of multitemporal AVHRR data for developing and maintaining a flexible, near-real-time land cover characteristics database. Moreover, such a flexible database is needed to derive various vegetation classification schemes, to aggregate data for nested models, to develop remote sensing algorithms, and to provide data on dynamic landscape characteristics. The case studies illustrate how such a database supports research on spatial heterogeneity, land use, sensitivity analysis, and scaling issues involving regional extrapolations and parameterizations of dynamic land processes within simulation models.

  9. Geologic database for digital geology of California, Nevada, and Utah: an application of the North American Data Model

    USGS Publications Warehouse

    Bedford, David R.; Ludington, Steve; Nutt, Constance M.; Stone, Paul A.; Miller, David M.; Miller, Robert J.; Wagner, David L.; Saucedo, George J.

    2003-01-01

    The USGS is creating an integrated national database for digital state geologic maps that includes stratigraphic, age, and lithologic information. The majority of the conterminous 48 states have digital geologic base maps available, often at scales of 1:500,000. This product is a prototype, and is intended to demonstrate the types of derivative maps that will be possible with the national integrated database. This database permits the creation of a number of types of maps via simple or sophisticated queries, maps that may be useful in a number of areas, including mineral-resource assessment, environmental assessment, and regional tectonic evolution. This database is distributed with three main parts: a Microsoft Access 2000 database containing geologic map attribute data, an Arc/Info (Environmental Systems Research Institute, Redlands, California) Export format file containing points representing designation of stratigraphic regions for the Geologic Map of Utah, and an ArcView 3.2 (Environmental Systems Research Institute, Redlands, California) project containing scripts and dialogs for performing a series of generalization and mineral resource queries. IMPORTANT NOTE: Spatial data for the respective state geologic maps are not distributed with this report. The digital state geologic maps for the states involved in this report are separate products, and two of them are produced by individual state agencies, which may be legally and/or financially responsible for this data. However, the spatial datasets for maps discussed in this report are available to the public. Questions regarding the distribution, sale, and use of individual state geologic maps should be sent to the respective state agency. We do provide suggestions for obtaining and formatting the spatial data to make it compatible with data in this report. See section ‘Obtaining and Formatting Spatial Data’ in the PDF version of the report.

  10. An integrated database-pipeline system for studying single nucleotide polymorphisms and diseases.

    PubMed

    Yang, Jin Ok; Hwang, Sohyun; Oh, Jeongsu; Bhak, Jong; Sohn, Tae-Kwon

    2008-12-12

    Studies on the relationship between disease and genetic variations such as single nucleotide polymorphisms (SNPs) are important. Genetic variations can cause disease by influencing important biological regulation processes. Despite the need to analyze correlations between SNPs and disease, most existing databases provide information only on functional variants at specific locations on the genome, or deal with only a few genes associated with disease. There is no combined resource to widely support gene-, SNP-, and disease-related information, and to capture relationships among such data. Therefore, we developed an integrated database-pipeline system for studying SNPs and diseases. To implement the pipeline system for the integrated database, we first unified complicated and redundant disease terms and gene names using the Unified Medical Language System (UMLS) for classification and noun modification, and the HUGO Gene Nomenclature Committee (HGNC) and NCBI gene databases. Next, we collected and integrated representative databases for three categories of information. For genes and proteins, we examined the NCBI mRNA, UniProt, UCSC Table Track and MitoDat databases. For genetic variants we used the dbSNP, JSNP, ALFRED, and HGVbase databases. For disease, we employed the OMIM, GAD, and HGMD databases. The database-pipeline system provides a disease thesaurus, including genes and SNPs associated with disease. The search results for these categories are available on the web page http://diseasome.kobic.re.kr/, and a genome browser is also available to highlight findings, as well as to permit the convenient review of potentially deleterious SNPs among genes strongly associated with specific diseases and clinical phenotypes. Our system is designed to capture the relationships between SNPs associated with disease and disease-causing genes. The integrated database-pipeline provides a list of candidate genes and SNP markers for evaluation in both epidemiological and molecular biological approaches to disease-gene association studies. Furthermore, researchers can then semi-automatically select data sets for association studies while considering the relationships between genetic variation and diseases. The database can also make disease-association studies more economical and facilitates an understanding of the processes that cause disease. Currently, the database contains 14,674 SNP records and 109,715 gene records associated with human diseases, and it is updated at regular intervals.
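
    At its core, the pipeline links diseases to genes and genes to SNPs through shared, normalized identifiers. The Python sketch below illustrates that join with a handful of placeholder records; it is not the authors' pipeline, and the mappings shown are for illustration only.

```python
# Minimal sketch of the disease -> gene -> SNP join such a pipeline performs.
# The records below are illustrative placeholders, not database content.
disease_to_genes = {"type 2 diabetes": ["TCF7L2", "PPARG"]}
gene_to_snps = {"TCF7L2": ["rs7903146"], "PPARG": ["rs1801282"]}

def candidate_markers(disease: str) -> dict:
    """Map each gene associated with a disease to its catalogued SNPs."""
    return {gene: gene_to_snps.get(gene, [])
            for gene in disease_to_genes.get(disease, [])}

print(candidate_markers("type 2 diabetes"))
```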

  11. The Neuroscience Information Framework: A Data and Knowledge Environment for Neuroscience

    PubMed Central

    Akil, Huda; Ascoli, Giorgio A.; Bowden, Douglas M.; Bug, William; Donohue, Duncan E.; Goldberg, David H.; Grafstein, Bernice; Grethe, Jeffrey S.; Gupta, Amarnath; Halavi, Maryam; Kennedy, David N.; Marenco, Luis; Martone, Maryann E.; Miller, Perry L.; Müller, Hans-Michael; Robert, Adrian; Shepherd, Gordon M.; Sternberg, Paul W.; Van Essen, David C.; Williams, Robert W.

    2009-01-01

    With support from the Institutes and Centers forming the NIH Blueprint for Neuroscience Research, we have designed and implemented a new initiative for integrating access to and use of Web-based neuroscience resources: the Neuroscience Information Framework. The Framework arises from the expressed need of the neuroscience community for neuroinformatic tools and resources to aid scientific inquiry, builds upon prior development of neuroinformatics by the Human Brain Project and others, and directly derives from the Society for Neuroscience’s Neuroscience Database Gateway. Partnered with the Society, its Neuroinformatics Committee, and volunteer consultant-collaborators, our multi-site consortium has developed: (1) a comprehensive, dynamic, inventory of Web-accessible neuroscience resources, (2) an extended and integrated terminology describing resources and contents, and (3) a framework accepting and aiding concept-based queries. Evolving instantiations of the Framework may be viewed at http://nif.nih.gov, http://neurogateway.org, and other sites as they come on line. PMID:18946742

  12. The neuroscience information framework: a data and knowledge environment for neuroscience.

    PubMed

    Gardner, Daniel; Akil, Huda; Ascoli, Giorgio A; Bowden, Douglas M; Bug, William; Donohue, Duncan E; Goldberg, David H; Grafstein, Bernice; Grethe, Jeffrey S; Gupta, Amarnath; Halavi, Maryam; Kennedy, David N; Marenco, Luis; Martone, Maryann E; Miller, Perry L; Müller, Hans-Michael; Robert, Adrian; Shepherd, Gordon M; Sternberg, Paul W; Van Essen, David C; Williams, Robert W

    2008-09-01

    With support from the Institutes and Centers forming the NIH Blueprint for Neuroscience Research, we have designed and implemented a new initiative for integrating access to and use of Web-based neuroscience resources: the Neuroscience Information Framework. The Framework arises from the expressed need of the neuroscience community for neuroinformatic tools and resources to aid scientific inquiry, builds upon prior development of neuroinformatics by the Human Brain Project and others, and directly derives from the Society for Neuroscience's Neuroscience Database Gateway. Partnered with the Society, its Neuroinformatics Committee, and volunteer consultant-collaborators, our multi-site consortium has developed: (1) a comprehensive, dynamic, inventory of Web-accessible neuroscience resources, (2) an extended and integrated terminology describing resources and contents, and (3) a framework accepting and aiding concept-based queries. Evolving instantiations of the Framework may be viewed at http://nif.nih.gov , http://neurogateway.org , and other sites as they come on line.

  13. Automated knowledge base development from CAD/CAE databases

    NASA Technical Reports Server (NTRS)

    Wright, R. Glenn; Blanchard, Mary

    1988-01-01

    Knowledge base development requires a substantial investment in time, money, and resources in order to capture the knowledge and information necessary for anything other than trivial applications. This paper addresses a means to integrate the design and knowledge base development process through automated knowledge base development from CAD/CAE databases and files. Benefits of this approach include the development of a more efficient means of knowledge engineering, resulting in the timely creation of large knowledge based systems that are inherently free of error.

  14. Investigation of blended learning video resources to teach health students clinical skills: An integrative review.

    PubMed

    Coyne, Elisabeth; Rands, Hazel; Frommolt, Valda; Kain, Victoria; Plugge, Melanie; Mitchell, Marion

    2018-04-01

    The aim of this review is to inform future educational strategies by synthesising research related to blended learning resources that use simulation videos to teach clinical skills to health students. An integrative review methodology was used to allow for the combination of diverse research methods to better understand the research topic. This review was guided by the framework described by Whittemore and Knafl (2005). Data sources: a systematic search of the SCOPUS, MEDLINE, COCHRANE and PsycINFO databases was conducted in consultation with a librarian. Keywords and MeSH terms: clinical skills, nursing, health, student, blended learning, video, simulation and teaching. Data extracted from the studies included author, year, aims, design, sample, skill taught, outcome measures and findings. After screening the articles, extracting project data and completing summary tables, critical appraisal of the projects was completed using the Mixed Methods Appraisal Tool (MMAT). Ten articles met all the inclusion criteria and were included in this review. The MMAT scores varied from 50% to 100%. Thematic analysis was undertaken and we identified the following three themes: linking theory to practice, autonomy of learning and challenges of developing a blended learning model. Blended learning allowed for different student learning styles, repeated viewing, and enabled links between theory and practice. The video presentation needed to be realistic and culturally appropriate, and this required both time and resources to create. A blended learning model, which incorporates video-assisted online resources, may be a useful tool to teach clinical skills to students of health including nursing. Blended learning not only increases students' knowledge and skills, but is often preferred by students due to its flexibility. Copyright © 2018 Elsevier Ltd. All rights reserved.

  15. The European Bioinformatics Institute's data resources 2014.

    PubMed

    Brooksbank, Catherine; Bergman, Mary Todd; Apweiler, Rolf; Birney, Ewan; Thornton, Janet

    2014-01-01

    Molecular Biology has been at the heart of the 'big data' revolution from its very beginning, and the need for access to biological data is a common thread running from the 1965 publication of Dayhoff's 'Atlas of Protein Sequence and Structure' through the Human Genome Project in the late 1990s and early 2000s to today's population-scale sequencing initiatives. The European Bioinformatics Institute (EMBL-EBI; http://www.ebi.ac.uk) is one of three organizations worldwide that provides free access to comprehensive, integrated molecular data sets. Here, we summarize the principles underpinning the development of these public resources and provide an overview of EMBL-EBI's database collection to complement the reviews of individual databases provided elsewhere in this issue.

  16. The evolution of a health hazard assessment database management system for military weapons, equipment, and materiel.

    PubMed

    Murnyak, George R; Spencer, Clark O; Chaney, Ann E; Roberts, Welford C

    2002-04-01

    During the 1970s, the Army health hazard assessment (HHA) process developed as a medical program to minimize hazards in military materiel during the development process. The HHA Program characterizes health hazards that soldiers and civilians may encounter as they interact with military weapons and equipment. Thus, it is a resource for medical planners and advisors to use that can identify and estimate potential hazards that soldiers may encounter as they train and conduct missions. The U.S. Army Center for Health Promotion and Preventive Medicine administers the program, which is integrated with the Army's Manpower and Personnel Integration program. As the HHA Program has matured, an electronic database has been developed to record and monitor the health hazards associated with military equipment and systems. The current database tracks the results of HHAs and provides reporting designed to assist the HHA Program manager in daily activities.

  17. CicerTransDB 1.0: a resource for expression and functional study of chickpea transcription factors.

    PubMed

    Gayali, Saurabh; Acharya, Shankar; Lande, Nilesh Vikram; Pandey, Aarti; Chakraborty, Subhra; Chakraborty, Niranjan

    2016-07-29

    Transcription factor (TF) databases are a major resource for systematic studies of TFs in specific species as well as of related family members. Even though several multi-species databases are publicly available, information on the number and diversity of TFs within individual species is fragmented, especially for newly sequenced genomes of non-model species of agricultural significance. We constructed CicerTransDB (Cicer Transcription Factor Database), the first database of its kind, which provides a centralized, putatively complete list of TFs in a food legume, chickpea. CicerTransDB, available at www.cicertransdb.esy.es, is based on chickpea (Cicer arietinum L.) annotation v 1.0. The database is the outcome of a genome-wide domain study and manual classification of TF families. It provides not only gene information but also gene ontology, domain and motif architecture. CicerTransDB v 1.0 comprises information on 1124 chickpea genes and enables the user not only to search, browse and download sequences but also to retrieve sequence features. CicerTransDB also provides several single-click interfaces connecting to various other databases to ease further analysis. Several web APIs integrated into the database allow end users direct access to the data. A critical comparison of CicerTransDB with PlantTFDB (Plant Transcription Factor Database) revealed 68 novel TFs in the chickpea genome, hitherto unexplored. Database URL: http://www.cicertransdb.esy.es.

  18. EarthChem: International Collaboration for Solid Earth Geochemistry in Geoinformatics

    NASA Astrophysics Data System (ADS)

    Walker, J. D.; Lehnert, K. A.; Hofmann, A. W.; Sarbas, B.; Carlson, R. W.

    2005-12-01

    The current on-line information systems for igneous rock geochemistry - PetDB, GEOROC, and NAVDAT - convincingly demonstrate the value of rigorous scientific data management of geochemical data for research and education. The next generation of hypothesis formulation and testing can be vastly facilitated by enhancing these electronic resources through integration of available datasets, expansion of data coverage in location, time, and tectonic setting, timely updates with new data, and through intuitive and efficient access and data analysis tools for the broader geosciences community. PetDB, GEOROC, and NAVDAT have therefore formed the EarthChem consortium (www.earthchem.org) as an international collaborative effort to address these needs and serve the larger earth science community by facilitating the compilation, communication, serving, and visualization of geochemical data, and their integration with other geological, geochronological, geophysical, and geodetic information to maximize their scientific application. We report on the status of and future plans for EarthChem activities. EarthChem's development plan includes: (1) expanding the functionality of the web portal to become a 'one-stop shop for geochemical data' with search capability across databases, standardized and integrated data output, generally applicable tools for data quality assessment, and data analysis/visualization including plotting methods and an information-rich map interface; and (2) expanding data holdings by generating new datasets as identified and prioritized through community outreach, and facilitating data contributions from the community by offering web-based data submission capability and technical assistance for design, implementation, and population of new databases and their integration with all EarthChem data holdings. Such federated databases and datasets will retain their identity within the EarthChem system. We also plan on working with publishers to ease the assimilation of geochemical data into the EarthChem database. As a community resource, EarthChem will address user concerns and respond to broad scientific and educational needs. EarthChem will hold yearly workshops, town hall meetings, and/or exhibits at major meetings. The group has established a two-tier committee structure to help ease the communication and coordination of database and IT issues between existing data management projects, and to receive feedback and support from individuals and groups from the larger geosciences community.

  19. AOP-DB: A database resource for the exploration of Adverse Outcome Pathways through integrated association networks.

    EPA Science Inventory

    The Adverse Outcome Pathway (AOP) framework describes the progression of a toxicity pathway from molecular perturbation to population-level outcome in a series of measurable, mechanistic responses. The controlled, computer-readable vocabulary that defines an AOP has the ability t...

  20. The research and development of water resources management information system based on ArcGIS

    NASA Astrophysics Data System (ADS)

    Cui, Weiqun; Gao, Xiaoli; Li, Yuzhi; Cui, Zhencai

    Given the large amount of data and the complexity of data types and formats involved in water resources management, we built a water resources calculation model and established a water resources management information system based on the ArcGIS and Visual Studio .NET development platforms. The system can integrate spatial data and attribute data organically and manage them uniformly. It can analyze spatial data, support bidirectional queries between maps and data, automatically generate various charts and report forms, link multimedia information, and manage the database. It can therefore provide integrated spatial and statistical information services for the study, management and decision-making of water resources, regional geology, the eco-environment and related fields.

  1. UKPMC: a full text article resource for the life sciences.

    PubMed

    McEntyre, Johanna R; Ananiadou, Sophia; Andrews, Stephen; Black, William J; Boulderstone, Richard; Buttery, Paula; Chaplin, David; Chevuru, Sandeepreddy; Cobley, Norman; Coleman, Lee-Ann; Davey, Paul; Gupta, Bharti; Haji-Gholam, Lesley; Hawkins, Craig; Horne, Alan; Hubbard, Simon J; Kim, Jee-Hyub; Lewin, Ian; Lyte, Vic; MacIntyre, Ross; Mansoor, Sami; Mason, Linda; McNaught, John; Newbold, Elizabeth; Nobata, Chikashi; Ong, Ernest; Pillai, Sharmila; Rebholz-Schuhmann, Dietrich; Rosie, Heather; Rowbotham, Rob; Rupp, C J; Stoehr, Peter; Vaughan, Philip

    2011-01-01

    UK PubMed Central (UKPMC) is a full-text article database that extends the functionality of the original PubMed Central (PMC) repository. The UKPMC project was launched as the first 'mirror' site to PMC, which in analogy to the International Nucleotide Sequence Database Collaboration, aims to provide international preservation of the open and free-access biomedical literature. UKPMC (http://ukpmc.ac.uk) has undergone considerable development since its inception in 2007 and now includes both a UKPMC and PubMed search, as well as access to other records such as Agricola, Patents and recent biomedical theses. UKPMC also differs from PubMed/PMC in that the full text and abstract information can be searched in an integrated manner from one input box. Furthermore, UKPMC contains 'Cited By' information as an alternative way to navigate the literature and has incorporated text-mining approaches to semantically enrich content and integrate it with related database resources. Finally, UKPMC also offers added-value services (UKPMC+) that enable grantees to deposit manuscripts, link papers to grants, publish online portfolios and view citation information on their papers. Here we describe UKPMC and clarify the relationship between PMC and UKPMC, providing historical context and future directions, 10 years on from when PMC was first launched.

  2. The BioExtract Server: a web-based bioinformatic workflow platform

    PubMed Central

    Lushbough, Carol M.; Jennewein, Douglas M.; Brendel, Volker P.

    2011-01-01

    The BioExtract Server (bioextract.org) is an open, web-based system designed to aid researchers in the analysis of genomic data by providing a platform for the creation of bioinformatic workflows. Scientific workflows are created within the system by recording tasks performed by the user. These tasks may include querying multiple, distributed data sources, saving query results as searchable data extracts, and executing local and web-accessible analytic tools. The series of recorded tasks can then be saved as a reproducible, sharable workflow available for subsequent execution with the original or modified inputs and parameter settings. Integrated data resources include interfaces to the National Center for Biotechnology Information (NCBI) nucleotide and protein databases, the European Molecular Biology Laboratory (EMBL-Bank) non-redundant nucleotide database, the Universal Protein Resource (UniProt), and the UniProt Reference Clusters (UniRef) database. The system offers access to numerous preinstalled, curated analytic tools and also provides researchers with the option of selecting computational tools from a large list of web services including the European Molecular Biology Open Software Suite (EMBOSS), BioMoby, and the Kyoto Encyclopedia of Genes and Genomes (KEGG). The system further allows users to integrate local command line tools residing on their own computers through a client-side Java applet. PMID:21546552

  3. UKPMC: a full text article resource for the life sciences

    PubMed Central

    McEntyre, Johanna R.; Ananiadou, Sophia; Andrews, Stephen; Black, William J.; Boulderstone, Richard; Buttery, Paula; Chaplin, David; Chevuru, Sandeepreddy; Cobley, Norman; Coleman, Lee-Ann; Davey, Paul; Gupta, Bharti; Haji-Gholam, Lesley; Hawkins, Craig; Horne, Alan; Hubbard, Simon J.; Kim, Jee-Hyub; Lewin, Ian; Lyte, Vic; MacIntyre, Ross; Mansoor, Sami; Mason, Linda; McNaught, John; Newbold, Elizabeth; Nobata, Chikashi; Ong, Ernest; Pillai, Sharmila; Rebholz-Schuhmann, Dietrich; Rosie, Heather; Rowbotham, Rob; Rupp, C. J.; Stoehr, Peter; Vaughan, Philip

    2011-01-01

    UK PubMed Central (UKPMC) is a full-text article database that extends the functionality of the original PubMed Central (PMC) repository. The UKPMC project was launched as the first ‘mirror’ site to PMC, which in analogy to the International Nucleotide Sequence Database Collaboration, aims to provide international preservation of the open and free-access biomedical literature. UKPMC (http://ukpmc.ac.uk) has undergone considerable development since its inception in 2007 and now includes both a UKPMC and PubMed search, as well as access to other records such as Agricola, Patents and recent biomedical theses. UKPMC also differs from PubMed/PMC in that the full text and abstract information can be searched in an integrated manner from one input box. Furthermore, UKPMC contains ‘Cited By’ information as an alternative way to navigate the literature and has incorporated text-mining approaches to semantically enrich content and integrate it with related database resources. Finally, UKPMC also offers added-value services (UKPMC+) that enable grantees to deposit manuscripts, link papers to grants, publish online portfolios and view citation information on their papers. Here we describe UKPMC and clarify the relationship between PMC and UKPMC, providing historical context and future directions, 10 years on from when PMC was first launched. PMID:21062818

  4. IMG: the integrated microbial genomes database and comparative analysis system

    PubMed Central

    Markowitz, Victor M.; Chen, I-Min A.; Palaniappan, Krishna; Chu, Ken; Szeto, Ernest; Grechkin, Yuri; Ratner, Anna; Jacob, Biju; Huang, Jinghua; Williams, Peter; Huntemann, Marcel; Anderson, Iain; Mavromatis, Konstantinos; Ivanova, Natalia N.; Kyrpides, Nikos C.

    2012-01-01

    The Integrated Microbial Genomes (IMG) system serves as a community resource for comparative analysis of publicly available genomes in a comprehensive integrated context. IMG integrates publicly available draft and complete genomes from all three domains of life with a large number of plasmids and viruses. IMG provides tools and viewers for analyzing and reviewing the annotations of genes and genomes in a comparative context. IMG's data content and analytical capabilities have been continuously extended through regular updates since its first release in March 2005. IMG is available at http://img.jgi.doe.gov. Companion IMG systems provide support for expert review of genome annotations (IMG/ER: http://img.jgi.doe.gov/er), teaching courses and training in microbial genome analysis (IMG/EDU: http://img.jgi.doe.gov/edu) and analysis of genomes related to the Human Microbiome Project (IMG/HMP: http://www.hmpdacc-resources.org/img_hmp). PMID:22194640

  5. IMG: the Integrated Microbial Genomes database and comparative analysis system.

    PubMed

    Markowitz, Victor M; Chen, I-Min A; Palaniappan, Krishna; Chu, Ken; Szeto, Ernest; Grechkin, Yuri; Ratner, Anna; Jacob, Biju; Huang, Jinghua; Williams, Peter; Huntemann, Marcel; Anderson, Iain; Mavromatis, Konstantinos; Ivanova, Natalia N; Kyrpides, Nikos C

    2012-01-01

    The Integrated Microbial Genomes (IMG) system serves as a community resource for comparative analysis of publicly available genomes in a comprehensive integrated context. IMG integrates publicly available draft and complete genomes from all three domains of life with a large number of plasmids and viruses. IMG provides tools and viewers for analyzing and reviewing the annotations of genes and genomes in a comparative context. IMG's data content and analytical capabilities have been continuously extended through regular updates since its first release in March 2005. IMG is available at http://img.jgi.doe.gov. Companion IMG systems provide support for expert review of genome annotations (IMG/ER: http://img.jgi.doe.gov/er), teaching courses and training in microbial genome analysis (IMG/EDU: http://img.jgi.doe.gov/edu) and analysis of genomes related to the Human Microbiome Project (IMG/HMP: http://www.hmpdacc-resources.org/img_hmp).

  6. HoPaCI-DB: host-Pseudomonas and Coxiella interaction database

    PubMed Central

    Bleves, Sophie; Dunger, Irmtraud; Walter, Mathias C.; Frangoulidis, Dimitrios; Kastenmüller, Gabi; Voulhoux, Romé; Ruepp, Andreas

    2014-01-01

    Bacterial infectious diseases are the result of multifactorial processes affected by the interplay between virulence factors and host targets. The host-Pseudomonas and Coxiella interaction database (HoPaCI-DB) is a publicly available manually curated integrative database (http://mips.helmholtz-muenchen.de/HoPaCI/) of host–pathogen interaction data from Pseudomonas aeruginosa and Coxiella burnetii. The resource provides structured information on 3585 experimentally validated interactions between molecules, bioprocesses and cellular structures extracted from the scientific literature. Systematic annotation and interactive graphical representation of disease networks make HoPaCI-DB a versatile knowledge base for biologists and network biology approaches. PMID:24137008

  7. The Coral Triangle Atlas: an integrated online spatial database system for improving coral reef management.

    PubMed

    Cros, Annick; Ahamad Fatan, Nurulhuda; White, Alan; Teoh, Shwu Jiau; Tan, Stanley; Handayani, Christian; Huang, Charles; Peterson, Nate; Venegas Li, Ruben; Siry, Hendra Yusran; Fitriana, Ria; Gove, Jamison; Acoba, Tomoko; Knight, Maurice; Acosta, Renerio; Andrew, Neil; Beare, Doug

    2014-01-01

    In this paper we describe the construction of an online GIS database system, hosted by WorldFish, which stores bio-physical, ecological and socio-economic data for the 'Coral Triangle Area' in South-east Asia and the Pacific. The database has been built in partnership with all six (Timor-Leste, Malaysia, Indonesia, The Philippines, Solomon Islands and Papua New Guinea) of the Coral Triangle countries, and represents a valuable source of information for natural resource managers at the regional scale. Its utility is demonstrated using biophysical data, data summarising marine habitats, and data describing the extent of marine protected areas in the region.

  8. Advanced SPARQL querying in small molecule databases.

    PubMed

    Galgonek, Jakub; Hurt, Tomáš; Michlíková, Vendula; Onderka, Petr; Schwarz, Jan; Vondrášek, Jiří

    2016-01-01

    In recent years, the Resource Description Framework (RDF) and the SPARQL query language have become more widely used in the area of cheminformatics and bioinformatics databases. These technologies allow better interoperability of various data sources and powerful searching facilities. However, we identified several deficiencies that make usage of such RDF databases restrictive or challenging for common users. We extended a SPARQL engine to be able to use special procedures inside SPARQL queries. This allows the user to work with data that cannot be simply precomputed and thus cannot be directly stored in the database. We designed an algorithm that checks a query against data ontology to identify possible user errors. This greatly improves query debugging. We also introduced an approach to visualize retrieved data in a user-friendly way, based on templates describing visualizations of resource classes. To integrate all of our approaches, we developed a simple web application. Our system was implemented successfully, and we demonstrated its usability on the ChEBI database transformed into RDF form. To demonstrate procedure call functions, we employed compound similarity searching based on OrChem. The application is publicly available at https://bioinfo.uochb.cas.cz/projects/chemRDF.
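
    As a hedged illustration of the kind of SPARQL access discussed above, the Python sketch below issues a simple query through the SPARQLWrapper library. The endpoint URL is a placeholder (not the service described in the abstract) and the query is deliberately generic.

```python
# Hedged sketch of issuing a SPARQL query from Python with SPARQLWrapper;
# the endpoint URL is a placeholder, not the actual service.
from SPARQLWrapper import SPARQLWrapper, JSON

endpoint = SPARQLWrapper("https://example.org/sparql")  # hypothetical endpoint
endpoint.setQuery("""
    PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
    SELECT ?compound ?label
    WHERE { ?compound rdfs:label ?label }
    LIMIT 5
""")
endpoint.setReturnFormat(JSON)

# Each binding maps variable names to {"type": ..., "value": ...} dicts.
for binding in endpoint.query().convert()["results"]["bindings"]:
    print(binding["compound"]["value"], binding["label"]["value"])
```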

  9. GALT protein database, a bioinformatics resource for the management and analysis of structural features of a galactosemia-related protein and its mutants.

    PubMed

    d'Acierno, Antonio; Facchiano, Angelo; Marabotti, Anna

    2009-06-01

    We describe the GALT-Prot database and its related web-based application that have been developed to collect information about the structural and functional effects of mutations on the human enzyme galactose-1-phosphate uridyltransferase (GALT), involved in the genetic disease named galactosemia type I. Besides a list of missense mutations at gene and protein sequence levels, GALT-Prot reports the analysis results of mutant GALT structures. In addition to the structural information about the wild-type enzyme, the database also includes structures of over 100 single point mutants simulated by means of a computational procedure, and the analysis of each mutant was performed with several bioinformatics programs in order to investigate the effect of the mutations. The web-based interface allows querying of the database, and several links are also provided in order to guarantee a high integration with other resources already present on the web. Moreover, the architecture of the database and the web application is flexible and can be easily adapted to store data related to other proteins with point mutations. GALT-Prot is freely available at http://bioinformatica.isa.cnr.it/GALT/.

  10. Chemical Informatics and the Drug Discovery Knowledge Pyramid

    PubMed Central

    Lushington, Gerald H.; Dong, Yinghua; Theertham, Bhargav

    2012-01-01

    The magnitude of the challenges in preclinical drug discovery is evident in the large amount of capital invested in such efforts in pursuit of a small static number of eventually successful marketable therapeutics. An explosion in the availability of potentially drug-like compounds and chemical biology data on these molecules can provide us with the means to improve the eventual success rates for compounds being considered at the preclinical level, but only if the community is able to access available information in an efficient and meaningful way. Thus, chemical database resources are critical to any serious drug discovery effort. This paper explores the basic principles underlying the development and implementation of chemical databases, and examines key issues of how molecular information may be encoded within these databases so as to enhance the likelihood that users will be able to extract meaningful information from data queries. In addition to a broad survey of conventional data representation and query strategies, key enabling technologies such as new context-sensitive chemical similarity measures and chemical cartridges are examined, with recommendations on how such resources may be integrated into a practical database environment. PMID:23782037
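
    As a baseline for the similarity-query discussion above, the sketch below computes a conventional Tanimoto similarity over Morgan fingerprints with RDKit; it stands in for the simplest kind of similarity measure, which the paper argues can be improved upon with context-sensitive alternatives. The two molecules are ordinary public examples, not data from the paper.

```python
# Illustrative baseline: fingerprint-based Tanimoto similarity with RDKit.
from rdkit import Chem, DataStructs
from rdkit.Chem import AllChem

aspirin = Chem.MolFromSmiles("CC(=O)Oc1ccccc1C(=O)O")
salicylic_acid = Chem.MolFromSmiles("Oc1ccccc1C(=O)O")

# Morgan (circular) fingerprints, radius 2, 2048-bit vectors.
fp1 = AllChem.GetMorganFingerprintAsBitVect(aspirin, 2, nBits=2048)
fp2 = AllChem.GetMorganFingerprintAsBitVect(salicylic_acid, 2, nBits=2048)

print("Tanimoto similarity:", DataStructs.TanimotoSimilarity(fp1, fp2))
```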

  11. Extending the NIF DISCO framework to automate complex workflow: coordinating the harvest and integration of data from diverse neuroscience information resources

    PubMed Central

    Marenco, Luis N.; Wang, Rixin; Bandrowski, Anita E.; Grethe, Jeffrey S.; Shepherd, Gordon M.; Miller, Perry L.

    2014-01-01

    This paper describes how DISCO, the data aggregator that supports the Neuroscience Information Framework (NIF), has been extended to play a central role in automating the complex workflow required to support and coordinate the NIF’s data integration capabilities. The NIF is an NIH Neuroscience Blueprint initiative designed to help researchers access the wealth of data related to the neurosciences available via the Internet. A central component is the NIF Federation, a searchable database that currently contains data from 231 data and information resources regularly harvested, updated, and warehoused in the DISCO system. In the past several years, DISCO has greatly extended its functionality and has evolved to play a central role in automating the complex, ongoing process of harvesting, validating, integrating, and displaying neuroscience data from a growing set of participating resources. This paper provides an overview of DISCO’s current capabilities and discusses a number of the challenges and future directions related to the process of coordinating the integration of neuroscience data within the NIF Federation. PMID:25018728

  12. Extending the NIF DISCO framework to automate complex workflow: coordinating the harvest and integration of data from diverse neuroscience information resources.

    PubMed

    Marenco, Luis N; Wang, Rixin; Bandrowski, Anita E; Grethe, Jeffrey S; Shepherd, Gordon M; Miller, Perry L

    2014-01-01

    This paper describes how DISCO, the data aggregator that supports the Neuroscience Information Framework (NIF), has been extended to play a central role in automating the complex workflow required to support and coordinate the NIF's data integration capabilities. The NIF is an NIH Neuroscience Blueprint initiative designed to help researchers access the wealth of data related to the neurosciences available via the Internet. A central component is the NIF Federation, a searchable database that currently contains data from 231 data and information resources regularly harvested, updated, and warehoused in the DISCO system. In the past several years, DISCO has greatly extended its functionality and has evolved to play a central role in automating the complex, ongoing process of harvesting, validating, integrating, and displaying neuroscience data from a growing set of participating resources. This paper provides an overview of DISCO's current capabilities and discusses a number of the challenges and future directions related to the process of coordinating the integration of neuroscience data within the NIF Federation.

  13. CEBS: a comprehensive annotated database of toxicological data

    PubMed Central

    Lea, Isabel A.; Gong, Hui; Paleja, Anand; Rashid, Asif; Fostel, Jennifer

    2017-01-01

    The Chemical Effects in Biological Systems database (CEBS) is a comprehensive and unique toxicology resource that compiles individual and summary animal data from the National Toxicology Program (NTP) testing program and other depositors into a single electronic repository. CEBS has undergone significant updates in recent years and currently contains over 11 000 test articles (exposure agents) and over 8000 studies including all available NTP carcinogenicity, short-term toxicity and genetic toxicity studies. Study data provided to CEBS are manually curated, accessioned and subject to quality assurance review prior to release to ensure high quality. The CEBS database has two main components: data collection and data delivery. To accommodate the breadth of data produced by NTP, the CEBS data collection component is an integrated relational design that allows the flexibility to capture any type of electronic data (to date). The data delivery component of the database comprises a series of dedicated user interface tables containing pre-processed data that support each component of the user interface. The user interface has been updated to include a series of nine Guided Search tools that allow access to NTP summary and conclusion data and larger non-NTP datasets. The CEBS database can be accessed online at http://www.niehs.nih.gov/research/resources/databases/cebs/. PMID:27899660

  14. DASMiner: discovering and integrating data from DAS sources

    PubMed Central

    2009-01-01

    Background: DAS is a widely adopted protocol for providing syntactic interoperability among biological databases. The popularity of DAS is due to a simplified and elegant mechanism for data exchange that consists of sources exposing their RESTful interfaces for data access. As a growing number of DAS services are available for molecular biology resources, there is an incentive to explore this protocol in order to advance data discovery and integration among these resources. Results: We developed DASMiner, a Matlab toolkit for querying DAS data sources that enables creation of integrated biological models using the information available in DAS-compliant repositories. DASMiner is composed of a browser application and an API that work together to facilitate gathering of data from different DAS sources, which can be used for creating enriched datasets from multiple sources. The browser is used to formulate queries and navigate data contained in DAS sources. Users can execute queries against these sources in an intuitive fashion, without needing to know the specific DAS syntax for the particular source. Using the source's metadata provided by the DAS Registry, the browser's layout adapts to expose only the set of commands and coordinate systems supported by the specific source. For this reason, the browser can interrogate any DAS source, independently of the type of data being served. The API component of DASMiner may be used for programmatic access to DAS sources by programs in Matlab. Once the desired data is found during navigation, the query is exported in the format of an API call to be used within any Matlab application. We illustrate the use of DASMiner by creating integrative models of histone modification maps and protein-protein interaction networks. These enriched datasets were built by retrieving and integrating distributed genomic and proteomic DAS sources using the API. Conclusion: Support for the DAS protocol allows hundreds of molecular biology databases to be treated as a federated, online collection of resources. DASMiner enables full exploration of these resources, and can be used to deploy applications and create integrated views of biological systems using the information deposited in DAS repositories. PMID:19919683
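
    DASMiner itself is a Matlab toolkit, but the underlying DAS exchange is a plain RESTful request returning XML. The hedged Python sketch below shows such a raw 'features' request against a placeholder server; the element names follow the DAS specification and should be checked against whichever source is actually queried.

```python
# Hedged sketch of a raw DAS "features" request of the kind DASMiner wraps.
# The server URL is a placeholder; element names (FEATURE, START, END)
# follow the DAS specification but may vary by source.
import requests
import xml.etree.ElementTree as ET

def das_features(server: str, segment: str):
    """Fetch the DAS features document for a genomic segment and yield
    (feature id, start, end) tuples."""
    response = requests.get(f"{server}/features",
                            params={"segment": segment}, timeout=30)
    response.raise_for_status()
    root = ET.fromstring(response.content)
    for feature in root.iter("FEATURE"):
        yield feature.get("id"), feature.findtext("START"), feature.findtext("END")

# Example call against a hypothetical DAS source:
# for fid, start, end in das_features("https://example.org/das/hg38genes",
#                                     "1:1000000,1050000"):
#     print(fid, start, end)
```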

  15. Databases, data integration, and expert systems: new directions in mineral resource assessment and mineral exploration

    USGS Publications Warehouse

    McCammon, Richard B.; Ramani, Raja V.; Mozumdar, Bijoy K.; Samaddar, Arun B.

    1994-01-01

    Overcoming future difficulties in searching for ore deposits deeper in the earth's crust will require closer attention to the collection and analysis of more diverse types of data and to more efficient use of current computer technologies. Computer technologies of greatest interest include methods of storage and retrieval of resource information, methods for integrating geologic, geochemical, and geophysical data, and the introduction of advanced computer technologies such as expert systems, multivariate techniques, and neural networks. Much experience has been gained in the past few years in applying these technologies. More experience is needed if they are to be implemented for everyday use in future assessments and exploration.

  16. Using EMBL-EBI Services via Web Interface and Programmatically via Web Services.

    PubMed

    Lopez, Rodrigo; Cowley, Andrew; Li, Weizhong; McWilliam, Hamish

    2014-12-12

    The European Bioinformatics Institute (EMBL-EBI) provides access to a wide range of databases and analysis tools that are of key importance in bioinformatics. As well as providing Web interfaces to these resources, Web Services are available using SOAP and REST protocols that enable programmatic access to our resources and allow their integration into other applications and analytical workflows. This unit describes the various options available to a typical researcher or bioinformatician who wishes to use our resources via Web interface or programmatically via a range of programming languages. Copyright © 2014 John Wiley & Sons, Inc.

  17. Plant Genome Resources at the National Center for Biotechnology Information

    PubMed Central

    Wheeler, David L.; Smith-White, Brian; Chetvernin, Vyacheslav; Resenchuk, Sergei; Dombrowski, Susan M.; Pechous, Steven W.; Tatusova, Tatiana; Ostell, James

    2005-01-01

    The National Center for Biotechnology Information (NCBI) integrates data from more than 20 biological databases through a flexible search and retrieval system called Entrez. A core Entrez database, Entrez Nucleotide, includes GenBank and is tightly linked to the NCBI Taxonomy database, the Entrez Protein database, and the scientific literature in PubMed. A suite of more specialized databases for genomes, genes, gene families, gene expression, gene variation, and protein domains dovetails with the core databases to make Entrez a powerful system for genomic research. Linked to the full range of Entrez databases is the NCBI Map Viewer, which displays aligned genetic, physical, and sequence maps for eukaryotic genomes including those of many plants. A specialized plant query page allows maps from all plant genomes covered by the Map Viewer to be searched in tandem to produce a display of aligned maps from several species. PlantBLAST searches against the sequences shown in the Map Viewer allow BLAST alignments to be viewed within a genomic context. In addition, precomputed sequence similarities, such as those for proteins offered by BLAST Link, enable fluid navigation from unannotated to annotated sequences, quickening the pace of discovery. NCBI Web pages for plants, such as Plant Genome Central, complete the system by providing centralized access to NCBI's genomic resources as well as links to organism-specific Web pages beyond NCBI. PMID:16010002

  18. Emissions & Generation Resource Integrated Database (eGRID), eGRID2010

    EPA Pesticide Factsheets

    The Emissions & Generation Resource Integrated Database (eGRID) is a comprehensive source of data on the environmental characteristics of almost all electric power generated in the United States. These environmental characteristics include air emissions for nitrogen oxides, sulfur dioxide, carbon dioxide, methane, and nitrous oxide; emissions rates; net generation; resource mix; and many other attributes. eGRID2010 contains the complete release of year 2007 data, as well as years 2005 and 2004 data. Excel spreadsheets, full documentation, summary data, eGRID subregion and NERC region representational maps, and GHG emission factors are included in this data set. The archived data in eGRID2002 contain years 1996 through 2000 data. For year 2007 data, the first Microsoft Excel workbook, Plant, contains boiler, generator, and plant spreadsheets. The second Microsoft Excel workbook, Aggregation, contains aggregated data by state, electric generating company, parent company, power control area, eGRID subregion, NERC region, and U.S. total levels. The third Microsoft Excel workbook, ImportExport, contains state import-export data, as well as U.S. generation and consumption data for years 2007, 2005, and 2004. For eGRID data for years 2005 and 2004, a user-friendly web application, eGRIDweb, is available to select, view, print, and export specified data.
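
    Because the eGRID2010 release is distributed as Excel workbooks, a typical first step is to load a worksheet into an analysis environment. The Python/pandas sketch below does this with placeholder file, sheet and column names (they are assumptions, not the documented eGRID layout); consult the documentation shipped with the data set for the actual names.

```python
# Hedged sketch of loading one eGRID2010 workbook with pandas.
# File name, sheet name, header offset and column names are placeholders.
import pandas as pd

# e.g. the "Plant" workbook described above, plant-level sheet (names assumed)
plants = pd.read_excel("eGRID2010_Plant.xls", sheet_name="PLNT07", skiprows=4)

# Example aggregation: total annual net generation by state, if the assumed
# state-abbreviation and net-generation columns are present.
if {"PSTATABB", "PLNGENAN"}.issubset(plants.columns):
    print(plants.groupby("PSTATABB")["PLNGENAN"].sum().head())
```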

  19. MicroScope—an integrated microbial resource for the curation and comparative analysis of genomic and metabolic data

    PubMed Central

    Vallenet, David; Belda, Eugeni; Calteau, Alexandra; Cruveiller, Stéphane; Engelen, Stefan; Lajus, Aurélie; Le Fèvre, François; Longin, Cyrille; Mornico, Damien; Roche, David; Rouy, Zoé; Salvignol, Gregory; Scarpelli, Claude; Thil Smith, Adam Alexander; Weiman, Marion; Médigue, Claudine

    2013-01-01

    MicroScope is an integrated platform dedicated to both the methodical updating of microbial genome annotation and to comparative analysis. The resource provides data from completed and ongoing genome projects (automatic and expert annotations), together with data sources from post-genomic experiments (i.e. transcriptomics, mutant collections) allowing users to perfect and improve the understanding of gene functions. MicroScope (http://www.genoscope.cns.fr/agc/microscope) combines tools and graphical interfaces to analyse genomes and to perform the manual curation of gene annotations in a comparative context. Since its first publication in January 2006, the system (previously named MaGe for Magnifying Genomes) has been continuously extended both in terms of data content and analysis tools. The last update of MicroScope was published in 2009 in the Database journal. Today, the resource contains data for >1600 microbial genomes, of which ∼300 are manually curated and maintained by biologists (1200 personal accounts today). Expert annotations are continuously gathered in the MicroScope database (∼50 000 a year), contributing to the improvement of the quality of microbial genomes annotations. Improved data browsing and searching tools have been added, original tools useful in the context of expert annotation have been developed and integrated and the website has been significantly redesigned to be more user-friendly. Furthermore, in the context of the European project Microme (Framework Program 7 Collaborative Project), MicroScope is becoming a resource providing for the curation and analysis of both genomic and metabolic data. An increasing number of projects are related to the study of environmental bacterial (meta)genomes that are able to metabolize a large variety of chemical compounds that may be of high industrial interest. PMID:23193269

  20. The European Bioinformatics Institute’s data resources 2014

    PubMed Central

    Brooksbank, Catherine; Bergman, Mary Todd; Apweiler, Rolf; Birney, Ewan; Thornton, Janet

    2014-01-01

    Molecular Biology has been at the heart of the ‘big data’ revolution from its very beginning, and the need for access to biological data is a common thread running from the 1965 publication of Dayhoff’s ‘Atlas of Protein Sequence and Structure’ through the Human Genome Project in the late 1990s and early 2000s to today’s population-scale sequencing initiatives. The European Bioinformatics Institute (EMBL-EBI; http://www.ebi.ac.uk) is one of three organizations worldwide that provides free access to comprehensive, integrated molecular data sets. Here, we summarize the principles underpinning the development of these public resources and provide an overview of EMBL-EBI’s database collection to complement the reviews of individual databases provided elsewhere in this issue. PMID:24271396

  1. G6PDdb, an integrated database of glucose-6-phosphate dehydrogenase (G6PD) mutations.

    PubMed

    Kwok, Colin J; Martin, Andrew C R; Au, Shannon W N; Lam, Veronica M S

    2002-03-01

    G6PDdb (http://www.rubic.rdg.ac.uk/g6pd/ or http://www.bioinf.org.uk/g6pd/) is a newly created web-accessible locus-specific mutation database for the human Glucose-6-phosphate dehydrogenase (G6PD) gene. The relational database integrates up-to-date mutational and structural data from various databanks (GenBank, Protein Data Bank, etc.) with biochemically characterized variants and their associated phenotypes obtained from published literature and the Favism website. An automated analysis of the mutations likely to have a significant impact on the structure of the protein has been performed using a recently developed procedure. The database may be queried online and the full results of the analysis of the structural impact of mutations are available. The web page provides a form for submitting additional mutation data and is linked to resources such as the Favism website, OMIM, HGMD, HGVBASE, and the PDB. This database provides insights into the molecular aspects and clinical significance of G6PD deficiency for researchers and clinicians and the web page functions as a knowledge base relevant to the understanding of G6PD deficiency and its management. Copyright 2002 Wiley-Liss, Inc.

  2. MiCroKit 3.0: an integrated database of midbody, centrosome and kinetochore.

    PubMed

    Ren, Jian; Liu, Zexian; Gao, Xinjiao; Jin, Changjiang; Ye, Mingliang; Zou, Hanfa; Wen, Longping; Zhang, Zhaolei; Xue, Yu; Yao, Xuebiao

    2010-01-01

    During cell division/mitosis, a specific subset of proteins is spatially and temporally assembled into protein super complexes in three distinct regions, i.e. centrosome/spindle pole, kinetochore/centromere and midbody/cleavage furrow/phragmoplast/bud neck, and modulates cell division process faithfully. Although many experimental efforts have been carried out to investigate the characteristics of these proteins, no integrated database was available. Here, we present the MiCroKit database (http://microkit.biocuckoo.org) of proteins that localize in midbody, centrosome and/or kinetochore. We collected into the MiCroKit database experimentally verified microkit proteins from the scientific literature that have unambiguous supportive evidence for subcellular localization under fluorescent microscope. The current version of MiCroKit 3.0 provides detailed information for 1489 microkit proteins from seven model organisms, including Saccharomyces cerevisiae, Schizosaccharomyces pombe, Caenorhabditis elegans, Drosophila melanogaster, Xenopus laevis, Mus musculus and Homo sapiens. Moreover, the orthologous information was provided for these microkit proteins, and could be a useful resource for further experimental identification. The online service of MiCroKit database was implemented in PHP + MySQL + JavaScript, while the local packages were developed in JAVA 1.5 (J2SE 5.0).

  3. MiCroKit 3.0: an integrated database of midbody, centrosome and kinetochore

    PubMed Central

    Liu, Zexian; Gao, Xinjiao; Jin, Changjiang; Ye, Mingliang; Zou, Hanfa; Wen, Longping; Zhang, Zhaolei; Xue, Yu; Yao, Xuebiao

    2010-01-01

    During cell division/mitosis, a specific subset of proteins is spatially and temporally assembled into protein super complexes in three distinct regions, i.e. centrosome/spindle pole, kinetochore/centromere and midbody/cleavage furrow/phragmoplast/bud neck, and modulates cell division process faithfully. Although many experimental efforts have been carried out to investigate the characteristics of these proteins, no integrated database was available. Here, we present the MiCroKit database (http://microkit.biocuckoo.org) of proteins that localize in midbody, centrosome and/or kinetochore. We collected into the MiCroKit database experimentally verified microkit proteins from the scientific literature that have unambiguous supportive evidence for subcellular localization under fluorescent microscope. The current version of MiCroKit 3.0 provides detailed information for 1489 microkit proteins from seven model organisms, including Saccharomyces cerevisiae, Schizosaccharomyces pombe, Caenorhabditis elegans, Drosophila melanogaster, Xenopus laevis, Mus musculus and Homo sapiens. Moreover, the orthologous information was provided for these microkit proteins, and could be a useful resource for further experimental identification. The online service of MiCroKit database was implemented in PHP + MySQL + JavaScript, while the local packages were developed in JAVA 1.5 (J2SE 5.0). PMID:19783819

  4. Teachers as Designers: Multimodal Immersion and Strategic Reading on the Internet

    ERIC Educational Resources Information Center

    Dalton, Bridget; Smith, Blaine E.

    2012-01-01

    This study examined teachers' literacy and technology integration in their design of Internet-based lessons for Grade 1-6 students using a tool that scaffolds the design process to focus on Internet resources and reading strategies. Twenty-six teachers' lessons on a public database were analyzed for design orientation, goals, curricular…

  5. MIPS Arabidopsis thaliana Database (MAtDB): an integrated biological knowledge resource for plant genomics

    PubMed Central

    Schoof, Heiko; Ernst, Rebecca; Nazarov, Vladimir; Pfeifer, Lukas; Mewes, Hans-Werner; Mayer, Klaus F. X.

    2004-01-01

    Arabidopsis thaliana is the most widely studied model plant. Functional genomics is intensively underway in many laboratories worldwide. Beyond the basic annotation of the primary sequence data, the annotated genetic elements of Arabidopsis must be linked to diverse biological data and higher order information such as metabolic or regulatory pathways. The MIPS Arabidopsis thaliana database MAtDB aims to provide a comprehensive resource for Arabidopsis as a genome model that serves as a primary reference for research in plants and is suitable for transfer of knowledge to other plants, especially crops. The genome sequence as a common backbone serves as a scaffold for the integration of data, while, in a complementary effort, these data are enhanced through the application of state-of-the-art bioinformatics tools. This information is visualized on a genome-wide and a gene-by-gene basis with access both for web users and applications. This report updates the information given in a previous report and provides an outlook on further developments. The MAtDB web interface can be accessed at http://mips.gsf.de/proj/thal/db. PMID:14681437

  6. Target-Pathogen: a structural bioinformatic approach to prioritize drug targets in pathogens

    PubMed Central

    Sosa, Ezequiel J; Burguener, Germán; Lanzarotti, Esteban; Radusky, Leandro; Pardo, Agustín M; Marti, Marcelo

    2018-01-01

    Available genomic data for pathogens have created new opportunities for drug discovery and development to fight them, including new resistant and multiresistant strains. In particular, structural data must be integrated with both gene information and experimental results. In this sense, there is a lack of an online resource that allows genome-wide data consolidation from diverse sources together with thorough bioinformatic analysis that allows easy filtering and scoring for fast target selection for drug discovery. Here, we present the Target-Pathogen database (http://target.sbg.qb.fcen.uba.ar/patho), designed and developed as an online resource that allows the integration and weighting of protein information such as: function, metabolic role, off-targeting, structural properties including druggability, essentiality and omic experiments, to facilitate the identification and prioritization of candidate drug targets in pathogens. We include in the database 10 genomes of some of the most relevant microorganisms for human health (Mycobacterium tuberculosis, Mycobacterium leprae, Klebsiella pneumoniae, Plasmodium vivax, Toxoplasma gondii, Leishmania major, Wolbachia bancrofti, Trypanosoma brucei, Shigella dysenteriae and Schistosoma mansoni) and show its applicability. New genomes can be uploaded upon request. PMID:29106651
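
    The weighting-and-ranking idea described above can be illustrated with a small, purely hypothetical scoring function; the feature names, weights and example Mycobacterium tuberculosis locus tags below are not Target-Pathogen's actual scheme, only a sketch of how such features might be combined into one prioritization score.

```python
# Illustrative (hypothetical) weighted scoring of candidate drug targets.
from dataclasses import dataclass

@dataclass
class TargetFeatures:
    druggability: float          # e.g. normalized pocket druggability, 0..1
    essentiality: float          # 1.0 if essential in knockout/omics data, else 0.0
    off_target_risk: float       # similarity to human proteins, 0..1 (higher is worse)
    metabolic_centrality: float  # normalized centrality in the metabolic network

WEIGHTS = {"druggability": 0.4, "essentiality": 0.3,
           "off_target_risk": -0.2, "metabolic_centrality": 0.1}

def priority_score(f):
    """Weighted sum of features; the negative weight penalizes off-target risk."""
    return (WEIGHTS["druggability"] * f.druggability
            + WEIGHTS["essentiality"] * f.essentiality
            + WEIGHTS["off_target_risk"] * f.off_target_risk
            + WEIGHTS["metabolic_centrality"] * f.metabolic_centrality)

# Example locus tags used purely as labels.
candidates = {
    "Rv1908c": TargetFeatures(0.8, 1.0, 0.1, 0.6),
    "Rv0667":  TargetFeatures(0.5, 1.0, 0.4, 0.9),
}
for name in sorted(candidates, key=lambda k: priority_score(candidates[k]), reverse=True):
    print(name, round(priority_score(candidates[name]), 3))
```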

  7. SPANG: a SPARQL client supporting generation and reuse of queries for distributed RDF databases.

    PubMed

    Chiba, Hirokazu; Uchiyama, Ikuo

    2017-02-08

    Toward improved interoperability of distributed biological databases, an increasing number of datasets have been published in the standardized Resource Description Framework (RDF). Although the powerful SPARQL Protocol and RDF Query Language (SPARQL) provides a basis for exploiting RDF databases, writing SPARQL code is burdensome for users including bioinformaticians. Thus, an easy-to-use interface is necessary. We developed SPANG, a SPARQL client that has unique features for querying RDF datasets. SPANG dynamically generates typical SPARQL queries according to specified arguments. It can also call SPARQL template libraries constructed in a local system or published on the Web. Further, it enables combinatorial execution of multiple queries, each with a distinct target database. These features facilitate easy and effective access to RDF datasets and integrative analysis of distributed data. SPANG helps users to exploit RDF datasets by generation and reuse of SPARQL queries through a simple interface. This client will enhance integrative exploitation of biological RDF datasets distributed across the Web. This software package is freely available at http://purl.org/net/spang .
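
    SPANG itself is a dedicated client, but the kind of SPARQL request it generates can be sketched with a generic Python client. The example below queries the public UniProt SPARQL endpoint using the SPARQLWrapper package; the endpoint URL and vocabulary prefixes follow UniProt's published documentation and should be re-checked before use.

```python
# Generic illustration (not the SPANG client) of querying an RDF database
# over SPARQL from Python.  pip install sparqlwrapper
from SPARQLWrapper import SPARQLWrapper, JSON

endpoint = SPARQLWrapper("https://sparql.uniprot.org/sparql")
endpoint.setQuery("""
PREFIX up:    <http://purl.uniprot.org/core/>
PREFIX taxon: <http://purl.uniprot.org/taxonomy/>
SELECT ?protein WHERE {
    ?protein a up:Protein ;
             up:organism taxon:9606 .   # human entries
} LIMIT 5
""")
endpoint.setReturnFormat(JSON)

for row in endpoint.query().convert()["results"]["bindings"]:
    print(row["protein"]["value"])
```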

  8. PlantNATsDB: a comprehensive database of plant natural antisense transcripts.

    PubMed

    Chen, Dijun; Yuan, Chunhui; Zhang, Jian; Zhang, Zhao; Bai, Lin; Meng, Yijun; Chen, Ling-Ling; Chen, Ming

    2012-01-01

    Natural antisense transcripts (NATs), as one type of regulatory RNAs, occur prevalently in plant genomes and play significant roles in physiological and pathological processes. Although their important biological functions have been reported widely, a comprehensive database is lacking up to now. Consequently, we constructed a plant NAT database (PlantNATsDB) involving approximately 2 million NAT pairs in 69 plant species. GO annotation and high-throughput small RNA sequencing data currently available were integrated to investigate the biological function of NATs. PlantNATsDB provides various user-friendly web interfaces to facilitate the presentation of NATs and an integrated, graphical network browser to display the complex networks formed by different NATs. Moreover, a 'Gene Set Analysis' module based on GO annotation was designed to dig out the statistical significantly overrepresented GO categories from the specific NAT network. PlantNATsDB is currently the most comprehensive resource of NATs in the plant kingdom, which can serve as a reference database to investigate the regulatory function of NATs. The PlantNATsDB is freely available at http://bis.zju.edu.cn/pnatdb/.

  9. Wikidata as a semantic framework for the Gene Wiki initiative.

    PubMed

    Burgstaller-Muehlbacher, Sebastian; Waagmeester, Andra; Mitraka, Elvira; Turner, Julia; Putman, Tim; Leong, Justin; Naik, Chinmay; Pavlidis, Paul; Schriml, Lynn; Good, Benjamin M; Su, Andrew I

    2016-01-01

    Open biological data are distributed over many resources making them challenging to integrate, to update and to disseminate quickly. Wikidata is a growing, open community database which can serve this purpose and also provides tight integration with Wikipedia. In order to improve the state of biological data, facilitate data management and dissemination, we imported all human and mouse genes, and all human and mouse proteins into Wikidata. In total, 59,721 human genes and 73,355 mouse genes have been imported from NCBI and 27,306 human proteins and 16,728 mouse proteins have been imported from the Swissprot subset of UniProt. As Wikidata is open and can be edited by anybody, our corpus of imported data serves as the starting point for integration of further data by scientists, the Wikidata community and citizen scientists alike. The first use case for these data is to populate Wikipedia Gene Wiki infoboxes directly from Wikidata with the data integrated above. This enables immediate updates of the Gene Wiki infoboxes as soon as the data in Wikidata are modified. Although Gene Wiki pages are currently only on the English language version of Wikipedia, the multilingual nature of Wikidata allows for usage of the data we imported in all 280 different language Wikipedias. Apart from the Gene Wiki infobox use case, a SPARQL endpoint and exporting functionality to several standard formats (e.g. JSON, XML) enable use of the data by scientists. In summary, we created a fully open and extensible data resource for human and mouse molecular biology and biochemistry data. This resource enriches all the Wikipedias with structured information and serves as a new linking hub for the biological semantic web. Database URL: https://www.wikidata.org/. © The Author(s) 2016. Published by Oxford University Press.
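
    The SPARQL endpoint mentioned above can be queried directly; the hedged sketch below asks the Wikidata Query Service for a handful of human genes and their Entrez Gene IDs. The property and item identifiers (P351 for Entrez Gene ID, P703 for "found in taxon", Q15978631 for Homo sapiens) are given as commonly documented values and should be confirmed on wikidata.org before relying on them.

```python
# Hedged sketch of querying the Wikidata SPARQL endpoint for Gene Wiki data.
import requests  # pip install requests

QUERY = """
SELECT ?gene ?geneLabel ?entrez WHERE {
  ?gene wdt:P703 wd:Q15978631 ;   # found in taxon: Homo sapiens
        wdt:P351 ?entrez .        # NCBI Entrez Gene ID
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
}
LIMIT 5
"""

resp = requests.get(
    "https://query.wikidata.org/sparql",
    params={"query": QUERY, "format": "json"},
    headers={"User-Agent": "gene-wiki-example/0.1 (illustrative script)"},
    timeout=60,
)
for binding in resp.json()["results"]["bindings"]:
    print(binding["entrez"]["value"], binding["geneLabel"]["value"])
```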

  10. New perspectives in toxicological information management, and the role of ISSTOX databases in assessing chemical mutagenicity and carcinogenicity.

    PubMed

    Benigni, Romualdo; Battistelli, Chiara Laura; Bossa, Cecilia; Tcheremenskaia, Olga; Crettaz, Pierre

    2013-07-01

    Currently, the public has access to a variety of databases containing mutagenicity and carcinogenicity data. These resources are crucial for the toxicologists and regulators involved in the risk assessment of chemicals, which necessitates access to all the relevant literature, and the capability to search across toxicity databases using both biological and chemical criteria. Towards the larger goal of screening chemicals for a wide range of toxicity end points of potential interest, publicly available resources across a large spectrum of biological and chemical data space must be effectively harnessed with current and evolving information technologies (i.e. systematised, integrated and mined), if long-term screening and prediction objectives are to be achieved. A key to rapid progress in the field of chemical toxicity databases is that of combining information technology with the chemical structure as identifier of the molecules. This permits an enormous range of operations (e.g. retrieving chemicals or chemical classes, describing the content of databases, finding similar chemicals, crossing biological and chemical interrogations, etc.) that other more classical databases cannot allow. This article describes the progress in the technology of toxicity databases, including the concepts of Chemical Relational Database and Toxicological Standardized Controlled Vocabularies (Ontology). Then it describes the ISSTOX cluster of toxicological databases at the Istituto Superiore di Sanità. It consists of freely available databases characterised by the use of modern information technologies and by curation of the quality of the biological data. Finally, this article provides examples of analyses and results made possible by ISSTOX.
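
    The "chemical structure as identifier" idea can be sketched with RDKit, an open-source cheminformatics toolkit that is not part of ISSTOX itself: canonical SMILES strings and InChIKeys provide structure-derived keys on which records from different toxicity databases can be joined. The example molecules are arbitrary, and the snippet assumes an RDKit build with InChI support.

```python
# Sketch: derive structure-based identifiers usable as cross-database join keys.
from rdkit import Chem  # pip install rdkit; assumes InChI support is compiled in

records = {
    "benzene": "c1ccccc1",
    "aniline": "Nc1ccccc1",
}

for name, smiles in records.items():
    mol = Chem.MolFromSmiles(smiles)
    if mol is None:                      # skip unparsable structures
        continue
    canonical = Chem.MolToSmiles(mol)    # canonical SMILES
    inchikey = Chem.MolToInchiKey(mol)   # hashed InChI, a stable lookup key
    print(name, canonical, inchikey)
```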

  11. Quality tools and resources to support organisational improvement integral to high-quality primary care: a systematic review of published and grey literature.

    PubMed

    Janamian, Tina; Upham, Susan J; Crossland, Lisa; Jackson, Claire L

    2016-04-18

    To conduct a systematic review of the literature to identify existing online primary care quality improvement tools and resources to support organisational improvement related to the seven elements in the Primary Care Practice Improvement Tool (PC-PIT), with the identified tools and resources to progress to a Delphi study for further assessment of relevance and utility. Systematic review of the international published and grey literature. CINAHL, Embase and PubMed databases were searched in March 2014 for articles published between January 2004 and December 2013. GreyNet International and other relevant websites and repositories were also searched in March-April 2014 for documents dated between 1992 and 2012. All citations were imported into a bibliographic database. Published and unpublished tools and resources were included in the review if they were in English, related to primary care quality improvement and addressed any of the seven PC-PIT elements of a high-performing practice. Tools and resources that met the eligibility criteria were then evaluated for their accessibility, relevance, utility and comprehensiveness using a four-criteria appraisal framework. We used a data extraction template to systematically extract information from eligible tools and resources. A content analysis approach was used to explore the tools and resources and collate relevant information: name of the tool or resource, year and country of development, author, name of the organisation that provided access and its URL, accessibility information or problems, overview of each tool or resource and the quality improvement element(s) it addresses. If available, a copy of the tool or resource was downloaded into the bibliographic database, along with supporting evidence (published or unpublished) on its use in primary care. This systematic review identified 53 tools and resources that can potentially be provided as part of a suite of tools and resources to support primary care practices in improving the quality of their practice, to achieve improved health outcomes.

  12. The EPA CompTox Chemistry Dashboard - an online resource ...

    EPA Pesticide Factsheets

    The U.S. Environmental Protection Agency (EPA) Computational Toxicology Program integrates advances in biology, chemistry, and computer science to help prioritize chemicals for further research based on potential human health risks. This work involves computational and data driven approaches that integrate chemistry, exposure and biological data. As an outcome of these efforts the National Center for Computational Toxicology (NCCT) has measured, assembled and delivered an enormous quantity and diversity of data for the environmental sciences including high-throughput in vitro screening data, in vivo and functional use data, exposure models and chemical databases with associated properties. A series of software applications and databases have been produced over the past decade to deliver these data. Recent work has focused on the development of a new architecture that assembles the resources into a single platform. With a focus on delivering access to Open Data streams, web service integration accessibility and a user-friendly web application the CompTox Dashboard provides access to data associated with ~720,000 chemical substances. These data include research data in the form of bioassay screening data associated with the ToxCast program, experimental and predicted physicochemical properties, product and functional use information and related data of value to environmental scientists. This presentation will provide an overview of the CompTox Dashboard and its va

  13. The Watershed and River Systems Management Program: Decision Support for Water- and Environmental-Resource Management

    NASA Astrophysics Data System (ADS)

    Leavesley, G.; Markstrom, S.; Frevert, D.; Fulp, T.; Zagona, E.; Viger, R.

    2004-12-01

    Increasing demands for limited fresh-water supplies, and increasing complexity of water-management issues, present the water-resource manager with the difficult task of achieving an equitable balance of water allocation among a diverse group of water users. The Watershed and River System Management Program (WARSMP) is a cooperative effort between the U.S. Geological Survey (USGS) and the Bureau of Reclamation (BOR) to develop and deploy a database-centered, decision-support system (DSS) to address these multi-objective, resource-management problems. The decision-support system couples the USGS Modular Modeling System (MMS) with the BOR RiverWare tools using a shared relational database. MMS is an integrated system of computer software that provides a research and operational framework to support the development and integration of a wide variety of hydrologic and ecosystem models, and their application to water- and ecosystem-resource management. RiverWare is an object-oriented reservoir and river-system modeling framework developed to provide tools for evaluating and applying water-allocation and management strategies. The modeling capabilities of MMS and Riverware include simulating watershed runoff, reservoir inflows, and the impacts of resource-management decisions on municipal, agricultural, and industrial water users, environmental concerns, power generation, and recreational interests. Forecasts of future climatic conditions are a key component in the application of MMS models to resource-management decisions. Forecast methods applied in MMS include a modified version of the National Weather Service's Extended Streamflow Prediction Program (ESP) and statistical downscaling from atmospheric models. The WARSMP DSS is currently operational in the Gunnison River Basin, Colorado; Yakima River Basin, Washington; Rio Grande Basin in Colorado and New Mexico; and Truckee River Basin in California and Nevada.

  14. CMD: a Cotton Microsatellite Database resource for Gossypium genomics

    PubMed Central

    Blenda, Anna; Scheffler, Jodi; Scheffler, Brian; Palmer, Michael; Lacape, Jean-Marc; Yu, John Z; Jesudurai, Christopher; Jung, Sook; Muthukumar, Sriram; Yellambalase, Preetham; Ficklin, Stephen; Staton, Margaret; Eshelman, Robert; Ulloa, Mauricio; Saha, Sukumar; Burr, Ben; Liu, Shaolin; Zhang, Tianzhen; Fang, Deqiu; Pepper, Alan; Kumpatla, Siva; Jacobs, John; Tomkins, Jeff; Cantrell, Roy; Main, Dorrie

    2006-01-01

    Background The Cotton Microsatellite Database (CMD) is a curated and integrated web-based relational database providing centralized access to publicly available cotton microsatellites, an invaluable resource for basic and applied research in cotton breeding. Description At present CMD contains publication, sequence, primer, mapping and homology data for nine major cotton microsatellite projects, collectively representing 5,484 microsatellites. In addition, CMD displays data for three of the microsatellite projects that have been screened against a panel of core germplasm. The standardized panel consists of 12 diverse genotypes including genetic standards, mapping parents, BAC donors, subgenome representatives, unique breeding lines, exotic introgression sources, and contemporary Upland cottons with significant acreage. A suite of online microsatellite data mining tools are accessible at CMD. These include an SSR server which identifies microsatellites, primers, open reading frames, and GC-content of uploaded sequences; BLAST and FASTA servers providing sequence similarity searches against the existing cotton SSR sequences and primers, a CAP3 server to assemble EST sequences into longer transcripts prior to mining for SSRs, and CMap, a viewer for comparing cotton SSR maps. Conclusion The collection of publicly available cotton SSR markers in a centralized, readily accessible and curated web-enabled database provides a more efficient utilization of microsatellite resources and will help accelerate basic and applied research in molecular breeding and genetic mapping in Gossypium spp. PMID:16737546
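
    A local stand-in for the similarity searches offered by CMD's BLAST server can be sketched with NCBI BLAST+ driven from Python; the snippet assumes makeblastdb and blastn are installed and on the PATH, and the FASTA file names are placeholders for downloaded cotton SSR sequences and a query marker.

```python
# Hedged sketch: build a local BLAST database of SSR sequences and search it.
# Assumes NCBI BLAST+ is installed; file names are placeholders.
import subprocess

# Build a nucleotide database from downloaded cotton SSR sequences.
subprocess.run(
    ["makeblastdb", "-in", "cotton_ssr.fasta", "-dbtype", "nucl", "-out", "cotton_ssr"],
    check=True,
)

# Search a query marker against it, writing tabular output (outfmt 6).
subprocess.run(
    ["blastn", "-query", "my_marker.fasta", "-db", "cotton_ssr",
     "-outfmt", "6", "-evalue", "1e-10", "-out", "hits.tsv"],
    check=True,
)

# Columns of hits.tsv: query, subject, % identity, alignment length, ...
with open("hits.tsv") as hits:
    for line in hits:
        print(line.rstrip())
```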

  15. Information resources at the National Center for Biotechnology Information.

    PubMed Central

    Woodsmall, R M; Benson, D A

    1993-01-01

    The National Center for Biotechnology Information (NCBI), part of the National Library of Medicine, was established in 1988 to perform basic research in the field of computational molecular biology as well as build and distribute molecular biology databases. The basic research has led to new algorithms and analysis tools for interpreting genomic data and has been instrumental in the discovery of human disease genes for neurofibromatosis and Kallmann syndrome. The principal database responsibility is the National Institutes of Health (NIH) genetic sequence database, GenBank. NCBI, in collaboration with international partners, builds, distributes, and provides online and CD-ROM access to over 112,000 DNA sequences. Another major program is the integration of multiple sequences databases and related bibliographic information and the development of network-based retrieval systems for Internet access. PMID:8374583

  16. The Coral Triangle Atlas: An Integrated Online Spatial Database System for Improving Coral Reef Management

    PubMed Central

    Cros, Annick; Ahamad Fatan, Nurulhuda; White, Alan; Teoh, Shwu Jiau; Tan, Stanley; Handayani, Christian; Huang, Charles; Peterson, Nate; Venegas Li, Ruben; Siry, Hendra Yusran; Fitriana, Ria; Gove, Jamison; Acoba, Tomoko; Knight, Maurice; Acosta, Renerio; Andrew, Neil; Beare, Doug

    2014-01-01

    In this paper we describe the construction of an online GIS database system, hosted by WorldFish, which stores bio-physical, ecological and socio-economic data for the ‘Coral Triangle Area’ in South-east Asia and the Pacific. The database has been built in partnership with all six (Timor-Leste, Malaysia, Indonesia, The Philippines, Solomon Islands and Papua New Guinea) of the Coral Triangle countries, and represents a valuable source of information for natural resource managers at the regional scale. Its utility is demonstrated using biophysical data, data summarising marine habitats, and data describing the extent of marine protected areas in the region. PMID:24941442

  17. Taverna: a tool for building and running workflows of services

    PubMed Central

    Hull, Duncan; Wolstencroft, Katy; Stevens, Robert; Goble, Carole; Pocock, Mathew R.; Li, Peter; Oinn, Tom

    2006-01-01

    Taverna is an application that eases the use and integration of the growing number of molecular biology tools and databases available on the web, especially web services. It allows bioinformaticians to construct workflows or pipelines of services to perform a range of different analyses, such as sequence analysis and genome annotation. These high-level workflows can integrate many different resources into a single analysis. Taverna is available freely under the terms of the GNU Lesser General Public License (LGPL) from . PMID:16845108

  18. The Development of PIPA: An Integrated and Automated Pipeline for Genome-Wide Protein Function Annotation

    DTIC Science & Technology

    2008-01-25

    limitations and plans for improvement: Perhaps one of PIPA's main limitations is that all of its currently integrated resources to predict protein function... are planning on expanding PIPA's function prediction capabilities by incorporating comparative analysis approaches, e.g., phylogenetic tree analysis... tools and services.

  19. CerealsDB 3.0: expansion of resources and data integration.

    PubMed

    Wilkinson, Paul A; Winfield, Mark O; Barker, Gary L A; Tyrrell, Simon; Bian, Xingdong; Allen, Alexandra M; Burridge, Amanda; Coghill, Jane A; Waterfall, Christy; Caccamo, Mario; Davey, Robert P; Edwards, Keith J

    2016-06-24

    The increase in human populations around the world has put pressure on resources, and as a consequence food security has become an important challenge for the 21st century. Wheat (Triticum aestivum) is one of the most important crops in human and livestock diets, and the development of wheat varieties that produce higher yields, combined with increased resistance to pests and resilience to changes in climate, has meant that wheat breeding has become an important focus of scientific research. In an attempt to facilitate these improvements in wheat, plant breeders have employed molecular tools to help them identify genes for important agronomic traits that can be bred into new varieties. Modern molecular techniques have ensured that the rapid and inexpensive characterisation of SNP markers and their validation with modern genotyping methods have produced a valuable resource that can be used in marker assisted selection. CerealsDB was created as a means of quickly disseminating this information to breeders and researchers around the globe. CerealsDB version 3.0 is an online resource that contains a wide range of genomic datasets for wheat that will assist plant breeders and scientists to select the most appropriate markers for use in marker assisted selection. CerealsDB includes a database which currently contains in excess of a million putative varietal SNPs, of which several hundred thousand have been experimentally validated. In addition, CerealsDB also contains new data on functional SNPs predicted to have a major effect on protein function, and we have constructed a web service to encourage data integration and high-throughput programmatic access. CerealsDB is an open access website that hosts information on SNPs that are considered useful for both plant breeders and research scientists. The recent inclusion of web services designed to federate genomic data resources allows the information on CerealsDB to be more fully integrated with the WheatIS network and other biological databases.

  20. SPARQL-enabled identifier conversion with Identifiers.org

    PubMed Central

    Wimalaratne, Sarala M.; Bolleman, Jerven; Juty, Nick; Katayama, Toshiaki; Dumontier, Michel; Redaschi, Nicole; Le Novère, Nicolas; Hermjakob, Henning; Laibe, Camille

    2015-01-01

    Motivation: On the semantic web, in life sciences in particular, data is often distributed via multiple resources. Each of these sources is likely to use their own International Resource Identifier for conceptually the same resource or database record. The lack of correspondence between identifiers introduces a barrier when executing federated SPARQL queries across life science data. Results: We introduce a novel SPARQL-based service to enable on-the-fly integration of life science data. This service uses the identifier patterns defined in the Identifiers.org Registry to generate a plurality of identifier variants, which can then be used to match source identifiers with target identifiers. We demonstrate the utility of this identifier integration approach by answering queries across major producers of life science Linked Data. Availability and implementation: The SPARQL-based identifier conversion service is available without restriction at http://identifiers.org/services/sparql. Contact: sarala@ebi.ac.uk PMID:25638809

  1. SPARQL-enabled identifier conversion with Identifiers.org.

    PubMed

    Wimalaratne, Sarala M; Bolleman, Jerven; Juty, Nick; Katayama, Toshiaki; Dumontier, Michel; Redaschi, Nicole; Le Novère, Nicolas; Hermjakob, Henning; Laibe, Camille

    2015-06-01

    On the semantic web, in life sciences in particular, data is often distributed via multiple resources. Each of these sources is likely to use their own International Resource Identifier for conceptually the same resource or database record. The lack of correspondence between identifiers introduces a barrier when executing federated SPARQL queries across life science data. We introduce a novel SPARQL-based service to enable on-the-fly integration of life science data. This service uses the identifier patterns defined in the Identifiers.org Registry to generate a plurality of identifier variants, which can then be used to match source identifiers with target identifiers. We demonstrate the utility of this identifier integration approach by answering queries across major producers of life science Linked Data. The SPARQL-based identifier conversion service is available without restriction at http://identifiers.org/services/sparql. © The Author 2015. Published by Oxford University Press.
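
    A hedged sketch of using the conversion service follows; it sends a small SPARQL query to the endpoint given above, asking for URI variants of a single record. The owl:sameAs pattern is an assumption based on the kind of identifier expansion the paper describes, so the exact predicate and the example UniProt URI should be checked against the service documentation before use.

```python
# Hedged sketch of identifier expansion via the Identifiers.org SPARQL service.
# The owl:sameAs pattern and the example URI are assumptions to verify.
from SPARQLWrapper import SPARQLWrapper, JSON  # pip install sparqlwrapper

QUERY = """
PREFIX owl: <http://www.w3.org/2002/07/owl#>
SELECT ?variant WHERE {
  <http://identifiers.org/uniprot/P05067> owl:sameAs ?variant .
}
"""

service = SPARQLWrapper("http://identifiers.org/services/sparql")
service.setQuery(QUERY)
service.setReturnFormat(JSON)
for row in service.query().convert()["results"]["bindings"]:
    print(row["variant"]["value"])
```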

  2. A service-based framework for pharmacogenomics data integration

    NASA Astrophysics Data System (ADS)

    Wang, Kun; Bai, Xiaoying; Li, Jing; Ding, Cong

    2010-08-01

    Data are central to scientific research and practices. The advance of experiment methods and information retrieval technologies leads to explosive growth of scientific data and databases. However, due to the heterogeneous problems in data formats, structures and semantics, it is hard to integrate the diversified data that grow explosively and analyse them comprehensively. As more and more public databases are accessible through standard protocols like programmable interfaces and Web portals, Web-based data integration becomes a major trend to manage and synthesise data that are stored in distributed locations. Mashup, a Web 2.0 technique, presents a new way to compose content and software from multiple resources. The paper proposes a layered framework for integrating pharmacogenomics data in a service-oriented approach using the mashup technology. The framework separates the integration concerns from three perspectives including data, process and Web-based user interface. Each layer encapsulates the heterogeneous issues of one aspect. To facilitate the mapping and convergence of data, the ontology mechanism is introduced to provide consistent conceptual models across different databases and experiment platforms. To support user-interactive and iterative service orchestration, a context model is defined to capture information of users, tasks and services, which can be used for service selection and recommendation during a dynamic service composition process. A prototype system is implemented and cases studies are presented to illustrate the promising capabilities of the proposed approach.
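
    The data-layer idea in the framework above can be sketched in a few lines: records from heterogeneous sources are renamed onto a shared, ontology-derived vocabulary before being merged. Everything in the snippet (source names, field names, the mapping itself) is hypothetical and merely stands in for the paper's ontology mechanism.

```python
# Purely illustrative sketch of ontology-mediated field mapping and merging.
ONTOLOGY_MAP = {
    "source_a": {"gene_symbol": "gene", "drug_name": "drug", "effect": "phenotype"},
    "source_b": {"hgnc": "gene", "compound": "drug", "outcome": "phenotype"},
}

def normalize(source, record):
    """Rename source-specific fields to the shared vocabulary."""
    mapping = ONTOLOGY_MAP[source]
    return {mapping[k]: v for k, v in record.items() if k in mapping}

a = {"gene_symbol": "CYP2D6", "drug_name": "codeine", "effect": "poor metabolizer"}
b = {"hgnc": "CYP2D6", "compound": "codeine", "outcome": "reduced analgesia"}

merged = {}
for src, rec in (("source_a", a), ("source_b", b)):
    norm = normalize(src, rec)
    merged.setdefault((norm["gene"], norm["drug"]), []).append(norm["phenotype"])
print(merged)
```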

  3. Systems biology of cancer biomarker detection.

    PubMed

    Mitra, Sanga; Das, Smarajit; Chakrabarti, Jayprokas

    2013-01-01

    Cancer systems biology is an ever-growing area of research due to the explosion of data; how to mine these data and extract useful information is the problem. To gain insight into carcinogenesis, one needs to systematically mine several resources, such as databases, microarray data and next-generation sequences. This review encompasses management and analysis of cancer data, database construction and data deposition, whole transcriptome and genome comparison, analysing results from high-throughput experiments to uncover cellular pathways and molecular interactions, and the design of effective algorithms to identify potential biomarkers. Recent technical advances such as ChIP-on-chip, ChIP-seq and RNA-seq can be applied to get epigenetic information, transformed into a high-throughput endeavour to which systems biology and bioinformatics are making significant inroads. The data from the ENCODE and GENCODE projects, available through the UCSC genome browser, can be considered a benchmark for comparison and meta-analysis. A pipeline for integrating next-generation sequencing data and microarray data, and putting them together with the existing database, is discussed. The understanding of cancer genomics is changing the way we approach cancer diagnosis and treatment. To give a better understanding of how to utilize available resources, we have chosen oral cancer to show how and what kind of analysis can be done. This review is a computational genomic primer that provides a bird's-eye view of the computational and bioinformatics tools currently available to perform integrated genomic and systems biology analyses of several carcinomas.

  4. SolEST database: a "one-stop shop" approach to the study of Solanaceae transcriptomes.

    PubMed

    D'Agostino, Nunzio; Traini, Alessandra; Frusciante, Luigi; Chiusano, Maria Luisa

    2009-11-30

    Since no genome sequences of solanaceous plants have yet been completed, expressed sequence tag (EST) collections represent a reliable tool for broad sampling of Solanaceae transcriptomes, an attractive route for understanding Solanaceae genome functionality and a powerful reference for the structural annotation of emerging Solanaceae genome sequences. We describe the SolEST database http://biosrv.cab.unina.it/solestdb which integrates different EST datasets from both cultivated and wild Solanaceae species and from two species of the genus Coffea. Background as well as processed data contained in the database, extensively linked to external related resources, represent an invaluable source of information for these plant families. Two novel features differentiate SolEST from other resources: i) the option of accessing and then visualizing Solanaceae EST/TC alignments along the emerging tomato and potato genome sequences; ii) the opportunity to compare different Solanaceae assemblies generated by diverse research groups in the attempt to address a common complaint in the SOL community. Different databases have been established worldwide for collecting Solanaceae ESTs and are related in concept, content and utility to the one presented herein. However, the SolEST database has several distinguishing features that make it appealing for the research community and facilitates a "one-stop shop" for the study of Solanaceae transcriptomes.

  15. MMpI: A Wide Range of Available Compounds of Matrix Metalloproteinase Inhibitors

    PubMed Central

    Muvva, Charuvaka; Patra, Sanjukta; Venkatesan, Subramanian

    2016-01-01

    Matrix metalloproteinases (MMPs) are a family of zinc-dependent proteinases involved in the regulation of the extracellular signaling and structural matrix environment of cells and tissues. MMPs are considered promising targets for the treatment of many diseases. Therefore, the creation of a database on the inhibitors of MMP would definitely accelerate research activities in this area, given the implication of MMPs in the above-mentioned diseases and the limitations of the first- and second-generation inhibitors. In this communication, we report the development of a new MMpI database which provides resourceful information for all researchers working in this field. It is a web-accessible, unique resource that contains detailed information on the inhibitors of MMP including small molecules, peptides and MMP Drug Leads. The database contains entries of ~3000 inhibitors including ~72 MMP Drug Leads and ~73 peptide-based inhibitors. This database provides the detailed molecular and structural information which is necessary for drug discovery and development. The MMpI database contains physical properties, 2D and 3D structures (mol2 and pdb format files) of inhibitors of MMP. Other data fields are hyperlinked to PubChem, ChEMBL, BindingDB, DrugBank, PDB, MEROPS and PubMed. The database has an extensive searching facility by MMpI ID, IUPAC name, chemical structure and the title of the research article. The MMP inhibitors provided in the MMpI database are optimized using the Python-based Hierarchical Environment for Integrated Xtallography (Phenix) software. The MMpI Database is unique: it is the only public database that contains and provides complete information on the inhibitors of MMP. Database URL: http://clri.res.in/subramanian/databases/mmpi/index.php. PMID:27509041

  6. Information management systems for pharmacogenomics.

    PubMed

    Thallinger, Gerhard G; Trajanoski, Slave; Stocker, Gernot; Trajanoski, Zlatko

    2002-09-01

    The value of high-throughput genomic research is dramatically enhanced by association with key patient data. These data are generally available but of disparate quality and not typically directly associated. A system that could bring these disparate data sources into a common resource connected with functional genomic data would be tremendously advantageous. However, the integration of clinical data and accurate interpretation of the generated functional genomic data require the development of information management systems capable of effectively capturing the data, as well as tools to make those data accessible to the laboratory scientist or to the clinician. In this review, these challenges and current information technology solutions associated with the management, storage and analysis of high-throughput data are highlighted. It is suggested that the development of a pharmacogenomic data management system which integrates public and proprietary databases, clinical datasets, and data mining tools embedded in a high-performance computing environment should include the following components: parallel processing systems, storage technologies, network technologies, databases and database management systems (DBMS), and application services.

  7. Integration of multiple DICOM Web servers into an enterprise-wide Web-based electronic medical record

    NASA Astrophysics Data System (ADS)

    Stewart, Brent K.; Langer, Steven G.; Martin, Kelly P.

    1999-07-01

    The purpose of this paper is to integrate multiple DICOM image webservers into the currently existing enterprise-wide web-browsable electronic medical record. Over the last six years the University of Washington has created a clinical data repository combining, in a distributed relational database, information from multiple departmental databases (MIND). A character cell-based view of these data, called the Mini Medical Record (MMR), has been available for four years. MINDscape, unlike the text-based MMR, provides a platform-independent, dynamic, web browser view of the MIND database that can be easily linked with medical knowledge resources on the network, like PubMed and the Federated Drug Reference. There are over 10,000 MINDscape user accounts at the University of Washington Academic Medical Centers. The weekday average number of hits to MINDscape is 35,302 and the weekday average number of individual users is 1252. DICOM images from multiple webservers are now being viewed through the MINDscape electronic medical record.

  8. EarthRef.org: Exploring aspects of a Cyber Infrastructure in Earth Science and Education

    NASA Astrophysics Data System (ADS)

    Staudigel, H.; Koppers, A.; Tauxe, L.; Constable, C.; Helly, J.

    2004-12-01

    EarthRef.org is the common host and (co-) developer of a range of earth science databases and IT resources providing a test bed for a Cyberinfrastructure in Earth Science and Education (CIESE). EarthRef.org database efforts include in particular the Geochemical Earth Reference Model (GERM), the Magnetics Information Consortium (MagIC), the Educational Resources for Earth Science Education (ERESE) project, the Seamount Catalog, the Mid-Ocean Ridge Catalog, the Radio-Isotope Geochronology (RiG) initiative for CHRONOS, and the Microbial Observatory for Fe-oxidizing microbes on Loihi Seamount (FeMO; the most recent development). These diverse databases are developed under a single database umbrella and webserver at the San Diego Supercomputer Center. All the databases have similar structures, with consistent metadata concepts, a common database layout, and automated upload wizards. Shared resources include supporting databases like an address book, a reference/publication catalog, and a common digital archive making database development and maintenance cost-effective, while guaranteeing interoperability. The EarthRef.org CIESE provides a common umbrella for synthesis information as well as sample-based data, and it bridges the gap between science and science education in middle and high schools, validating the potential for a system-wide data infrastructure in a CIESE. EarthRef.org experiences have shown that effective communication with the respective communities is a key part of a successful CIESE facilitating both utility and community buy-in. GERM has been particularly successful at developing a metadata scheme for geochemistry and in the development of a new electronic journal (G-cubed) that has made much progress in data publication and linkages between journals and community databases. GERM also has worked, through editors and publishers, towards interfacing databases with the publication process, to accomplish a more scholarly and database-friendly data publication environment, and to interface with the respective science communities. MagIC has held several workshops that have resulted in an integrated data archival environment using metadata that are interchangeable with the geochemical metadata. MagIC archives a wide array of paleo and rock magnetic directional, intensity and magnetic property data as well as integrating computational tools. ERESE brought together librarians, teachers, and scientists to create an educational environment that supports inquiry-driven education and the use of science data. Experiences in EarthRef.org demonstrate the feasibility of an effective, community-wide CIESE for data publication, archival and modeling, as well as the outreach to the educational community.

  9. Introduction to TETHYS—an interdisciplinary GIS database for studying continental collisions

    NASA Astrophysics Data System (ADS)

    Khan, S. D.; Flower, M. F. J.; Sultan, M. I.; Sandvol, E.

    2006-05-01

    The TETHYS GIS database is being developed as a way to integrate relevant geologic, geophysical, geochemical, geochronologic, and remote sensing data bearing on Tethyan continental plate collisions. The project is predicated on a need for actualistic model 'templates' for interpreting the Earth's geologic record. Because of their time-transgressive character, Tethyan collisions offer 'actualistic' models for features such as continental 'escape', collision-induced upper mantle flow magmatism, and marginal basin opening, associated with modern convergent plate margins. Large integrated geochemical and geophysical databases allow for such models to be tested against the geologic record, leading to a better understanding of continental accretion throughout Earth history. The TETHYS database combines digital topographic and geologic information, remote sensing images, sample-based geochemical, geochronologic, and isotopic data (for pre- and post-collision igneous activity), and data for seismic tomography, shear-wave splitting, space geodesy, and information for plate tectonic reconstructions. Here, we report progress on developing such a database and the tools for manipulating and visualizing integrated 2-, 3-, and 4-d data sets with examples of research applications in progress. Based on an Oracle database system, linked with ArcIMS via ArcSDE, the TETHYS project is an evolving resource for researchers, educators, and others interested in studying the role of plate collisions in the process of continental accretion, and will be accessible as a node of the national Geosciences Cyberinfrastructure Network—GEON via the World-Wide Web and ultra-high speed internet2. Interim partial access to the data and metadata is available at: http://geoinfo.geosc.uh.edu/Tethys/ and http://www.esrs.wmich.edu/tethys.htm. We demonstrate the utility of the TETHYS database in building a framework for lithospheric interactions in continental collision and accretion.

  10. The aquatic animals' transcriptome resource for comparative functional analysis.

    PubMed

    Chou, Chih-Hung; Huang, Hsi-Yuan; Huang, Wei-Chih; Hsu, Sheng-Da; Hsiao, Chung-Der; Liu, Chia-Yu; Chen, Yu-Hung; Liu, Yu-Chen; Huang, Wei-Yun; Lee, Meng-Lin; Chen, Yi-Chang; Huang, Hsien-Da

    2018-05-09

    Aquatic animals have great economic and ecological importance. Among them, non-model organisms have been studied regarding eco-toxicity, stress biology, and environmental adaptation. Due to recent advances in next-generation sequencing techniques, large amounts of RNA-seq data for aquatic animals are publicly available. However, currently no comprehensive resource exists for the analysis, unification, and integration of these datasets. This study utilizes computational approaches to build a new resource of transcriptomic maps for aquatic animals. This aquatic animal transcriptome map database, dbATM, provides de novo transcriptome assembly, gene annotation and comparative analysis of more than twenty aquatic organisms without a draft genome. To improve the assembly quality, three computational tools (Trinity, Oases and SOAPdenovo-Trans) were employed to enhance individual transcriptome assembly, and CAP3 and CD-HIT-EST software were then used to merge these three assembled transcriptomes. In addition, functional annotation analysis provides valuable clues to gene characteristics, including full-length transcript coding regions, conserved domains, gene ontology and KEGG pathways. Furthermore, all aquatic animal genes are essential for comparative genomics tasks such as constructing homologous gene groups and blast databases and phylogenetic analysis. In conclusion, we establish a resource for non-model aquatic animals, which are of great economic and ecological importance, and provide transcriptomic information including functional annotation and comparative transcriptome analysis. The database is now publicly accessible through the URL http://dbATM.mbc.nctu.edu.tw/.
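
    The merge step described above can be sketched as a small driver script; it assumes the cap3 and cd-hit-est executables are installed and on the PATH, and the input file names and the 95% identity threshold are placeholders rather than dbATM's actual settings.

```python
# Rough sketch of merging three de novo assemblies and collapsing redundancy.
# Assumes cap3 and cd-hit-est are installed; file names/thresholds are placeholders.
import subprocess

# 1. Concatenate the Trinity, Oases and SOAPdenovo-Trans assemblies.
with open("combined.fasta", "w") as out:
    for path in ("trinity.fasta", "oases.fasta", "soapdenovo_trans.fasta"):
        with open(path) as fasta:
            out.write(fasta.read())

# 2. CAP3 assembles overlapping contigs (writes combined.fasta.cap.contigs, etc.).
subprocess.run(["cap3", "combined.fasta"], check=True)

# 3. CD-HIT-EST removes near-duplicate transcripts at 95% sequence identity.
subprocess.run(
    ["cd-hit-est", "-i", "combined.fasta.cap.contigs",
     "-o", "nonredundant.fasta", "-c", "0.95", "-T", "4", "-M", "8000"],
    check=True,
)
```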

  11. Insect barcode information system.

    PubMed

    Pratheepa, Maria; Jalali, Sushil Kumar; Arokiaraj, Robinson Silvester; Venkatesan, Thiruvengadam; Nagesh, Mandadi; Panda, Madhusmita; Pattar, Sharath

    2014-01-01

    The Insect Barcode Information System, called Insect Barcode Informática (IBIn), is an online database resource developed by the National Bureau of Agriculturally Important Insects, Bangalore. This database provides acquisition, storage, analysis and publication of DNA barcode records of agriculturally important insects, for researchers specifically in India and other countries. It bridges a gap in bioinformatics by integrating molecular, morphological and distribution details of agriculturally important insects. IBIn was developed using PHP/MySQL with a relational database management concept. The database is based on a client-server architecture, where many clients can access data simultaneously. IBIn is freely available online and is user-friendly. IBIn allows registered users to input new information and to search and view information related to DNA barcodes of agriculturally important insects. This paper provides the current status of insect barcoding in India and a brief introduction to the IBIn database. http://www.nabg-nbaii.res.in/barcode.

  12. A Linked Data-Based Collaborative Annotation System for Increasing Learning Achievements

    ERIC Educational Resources Information Center

    Zarzour, Hafed; Sellami, Mokhtar

    2017-01-01

    With the emergence of the Web 2.0, collaborative annotation practices have become more mature in the field of learning. In this context, several recent studies have shown the powerful effects of the integration of annotation mechanism in learning process. However, most of these studies provide poor support for semantically structured resources,…

  13. Bringing it all together: A Web-based Database for Chemical and Biological Data to Support Environmental Toxicology (ACS Fall meeting 8 of 12)

    EPA Science Inventory

    The EPA Comptox Chemistry Dashboard is a web-based application providing access to a set of data resources provided by the National Center of Computational Toxicology. Sitting on a foundation of chemistry data for ~750,000 chemical substances the application integrates bioassay s...

  14. Electronic Resources in a Next-Generation Catalog: The Case of WorldCat Local

    ERIC Educational Resources Information Center

    Shadle, Steve

    2009-01-01

    In April 2007, the University of Washington Libraries debuted WorldCat Local (WCL), a localized version of the WorldCat database that interoperates with a library's integrated library system and fulfillment services to provide a single-search interface for a library's physical and electronic content. This brief will describe how WCL incorporates a…

  15. Introducing the Phytophthora database: an integrated resource for detecting, monitoring, and managing Phytophthora diseases

    Treesearch

    Kelly L. Ivors; Frank Martin; Michael Coffey; Izabela Makalowska; David M. Geiser; Seogchan Kang

    2008-01-01

    Its virulence and ability to spread rapidly throughout the world by various means establishes Phytophthora as one of the most important groups of plant pathogens. Discoveries of interspecific hybridization among Phytophthora species in nature, which could yield novel pathogens, further underscore the threat posed by members of this genus. The ability...

  16. Teaching Information Literacy Using Electronic Resources for Grades 6-12. Professional Growth Series.

    ERIC Educational Resources Information Center

    Anderson, Mary Alice, Ed.

    This notebook is a compilation of 53 lesson plans for grades 6-12, written by various authors and focusing on the integration of technology into the curriculum. Lesson plans include topics such as online catalog searching, electronic encyclopedias, CD-ROM databases, exploring the Internet, creating a computer slide show, desktop publishing, and…

  17. An object-oriented programming system for the integration of internet-based bioinformatics resources.

    PubMed

    Beveridge, Allan

    2006-01-01

    The Internet consists of a vast inhomogeneous reservoir of data. Developing software that can integrate a wide variety of different data sources is a major challenge that must be addressed for the realisation of the full potential of the Internet as a scientific research tool. This article presents a semi-automated object-oriented programming system for integrating web-based resources. We demonstrate that the current Internet standards (HTML, CGI [common gateway interface], Java, etc.) can be exploited to develop a data retrieval system that scans existing web interfaces and then uses a set of rules to generate new Java code that can automatically retrieve data from the Web. The validity of the software has been demonstrated by testing it on several biological databases. We also examine the current limitations of the Internet and discuss the need for the development of universal standards for web-based data.

  18. Manual Gene Ontology annotation workflow at the Mouse Genome Informatics Database

    PubMed Central

    Drabkin, Harold J.; Blake, Judith A.

    2012-01-01

    The Mouse Genome Database, the Gene Expression Database and the Mouse Tumor Biology database are integrated components of the Mouse Genome Informatics (MGI) resource (http://www.informatics.jax.org). The MGI system presents both a consensus view and an experimental view of the knowledge concerning the genetics and genomics of the laboratory mouse. From genotype to phenotype, this information resource integrates information about genes, sequences, maps, expression analyses, alleles, strains and mutant phenotypes. Comparative mammalian data are also presented particularly in regards to the use of the mouse as a model for the investigation of molecular and genetic components of human diseases. These data are collected from literature curation as well as downloads of large datasets (SwissProt, LocusLink, etc.). MGI is one of the founding members of the Gene Ontology (GO) and uses the GO for functional annotation of genes. Here, we discuss the workflow associated with manual GO annotation at MGI, from literature collection to display of the annotations. Peer-reviewed literature is collected mostly from a set of journals available electronically. Selected articles are entered into a master bibliography and indexed to one of eight areas of interest such as ‘GO’ or ‘homology’ or ‘phenotype’. Each article is then either indexed to a gene already contained in the database or funneled through a separate nomenclature database to add genes. The master bibliography and associated indexing provide information for various curator-reports such as ‘papers selected for GO that refer to genes with NO GO annotation’. Once indexed, curators who have expertise in appropriate disciplines enter pertinent information. MGI makes use of several controlled vocabularies that ensure uniform data encoding, enable robust analysis and support the construction of complex queries. These vocabularies range from pick-lists to structured vocabularies such as the GO. All data associations are supported with statements of evidence as well as access to source publications. PMID:23110975

  19. Manual Gene Ontology annotation workflow at the Mouse Genome Informatics Database.

    PubMed

    Drabkin, Harold J; Blake, Judith A

    2012-01-01

    The Mouse Genome Database, the Gene Expression Database and the Mouse Tumor Biology database are integrated components of the Mouse Genome Informatics (MGI) resource (http://www.informatics.jax.org). The MGI system presents both a consensus view and an experimental view of the knowledge concerning the genetics and genomics of the laboratory mouse. From genotype to phenotype, this information resource integrates information about genes, sequences, maps, expression analyses, alleles, strains and mutant phenotypes. Comparative mammalian data are also presented particularly in regards to the use of the mouse as a model for the investigation of molecular and genetic components of human diseases. These data are collected from literature curation as well as downloads of large datasets (SwissProt, LocusLink, etc.). MGI is one of the founding members of the Gene Ontology (GO) and uses the GO for functional annotation of genes. Here, we discuss the workflow associated with manual GO annotation at MGI, from literature collection to display of the annotations. Peer-reviewed literature is collected mostly from a set of journals available electronically. Selected articles are entered into a master bibliography and indexed to one of eight areas of interest such as 'GO' or 'homology' or 'phenotype'. Each article is then either indexed to a gene already contained in the database or funneled through a separate nomenclature database to add genes. The master bibliography and associated indexing provide information for various curator-reports such as 'papers selected for GO that refer to genes with NO GO annotation'. Once indexed, curators who have expertise in appropriate disciplines enter pertinent information. MGI makes use of several controlled vocabularies that ensure uniform data encoding, enable robust analysis and support the construction of complex queries. These vocabularies range from pick-lists to structured vocabularies such as the GO. All data associations are supported with statements of evidence as well as access to source publications.

  20. On the Future of Thermochemical Databases, the Development of Solution Models and the Practical Use of Computational Thermodynamics in Volcanology, Geochemistry and Petrology: Can Innovations of Modern Data Science Democratize an Oligarchy?

    NASA Astrophysics Data System (ADS)

    Ghiorso, M. S.

    2014-12-01

    Computational thermodynamics (CT) has now become an essential tool of petrologic and geochemical research. CT is the basis for the construction of phase diagrams, the application of geothermometers and geobarometers, the equilibrium speciation of solutions, the construction of pseudosections, and calculations of mass transfer between minerals, melts and fluids, and it provides a means of estimating materials properties for the evaluation of constitutive relations in fluid dynamical simulations. The practical application of CT to Earth science problems requires data: data on the thermochemical properties and the equation of state of relevant materials, and data on the relative stability and partitioning of chemical elements between phases as a function of temperature and pressure. These data must be evaluated and synthesized into a self-consistent collection of theoretical models and model parameters that is colloquially known as a thermodynamic database. Quantitative outcomes derived from CT rely on the existence, maintenance and integrity of thermodynamic databases. Unfortunately, the community is reliant on too few such databases, developed by a small number of research groups, and mostly under circumstances where refinement and updates to the database lag behind or are unresponsive to need. Given the increasing level of reliance on CT calculations, what is required is a paradigm shift in the way thermodynamic databases are developed, maintained and disseminated. They must become community resources, with flexible and assessable software interfaces that permit easy modification, while at the same time maintaining theoretical integrity and fidelity to the underlying experimental observations. Advances in computational and data science give us the tools and resources to address this problem, allowing CT results to be obtained at the speed of thought and permitting geochemical and petrological intuition to play a key role in model development and calibration.
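
    As a concrete illustration of the kind of calculation a thermodynamic database supports, the sketch below evaluates a reaction's Gibbs energy and equilibrium constant from tabulated standard-state enthalpies and entropies. The species and numbers are invented placeholders, not entries from any published database.

    ```python
    # Minimal sketch of a computational-thermodynamics step: evaluating
    # delta-G and an equilibrium constant from tabulated standard-state data.
    # The numbers below are invented placeholders, not real database entries.
    import math

    R = 8.314  # gas constant, J mol^-1 K^-1

    # Hypothetical standard enthalpies of formation (J/mol) and entropies (J/mol/K).
    DB = {
        "A": {"dHf": -100_000.0, "S": 50.0},
        "B": {"dHf": -250_000.0, "S": 80.0},
    }

    def delta_g_reaction(products, reactants, T):
        """delta-G = delta-H - T * delta-S for a reaction written as
        stoichiometric {species: coefficient} dictionaries."""
        dH = sum(n * DB[s]["dHf"] for s, n in products.items()) - \
             sum(n * DB[s]["dHf"] for s, n in reactants.items())
        dS = sum(n * DB[s]["S"] for s, n in products.items()) - \
             sum(n * DB[s]["S"] for s, n in reactants.items())
        return dH - T * dS

    T = 1000.0  # K
    dG = delta_g_reaction({"B": 1}, {"A": 1}, T)
    K = math.exp(-dG / (R * T))   # equilibrium constant
    print(f"dG = {dG:.1f} J/mol, K = {K:.3e}")
    ```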

  1. IDESSA: An Integrative Decision Support System for Sustainable Rangeland Management in Southern African Savannas

    NASA Astrophysics Data System (ADS)

    Meyer, Hanna; Authmann, Christian; Dreber, Niels; Hess, Bastian; Kellner, Klaus; Morgenthal, Theunis; Nauss, Thomas; Seeger, Bernhard; Tsvuura, Zivanai; Wiegand, Kerstin

    2017-04-01

    Bush encroachment is a syndrome of land degradation that occurs in many savannas, including those of southern Africa. The increase in density, cover or biomass of woody vegetation often has negative effects on a range of ecosystem functions and services, and these effects are difficult to reverse. However, despite its importance, neither the causes of bush encroachment nor the consequences of different resource-management strategies to combat or mitigate related shifts in savanna states are fully understood. The project "IDESSA" (An Integrative Decision Support System for Sustainable Rangeland Management in Southern African Savannas) aims to improve the understanding of the complex interplay between land use, climate patterns and vegetation dynamics and to implement an integrative monitoring and decision-support system for the sustainable management of different savanna types. For this purpose, IDESSA follows an innovative approach that integrates local knowledge, botanical surveys, remote-sensing- and machine-learning-based time series of atmospheric and land-cover dynamics, spatially explicit simulation modeling and analytical database management. The integration of the heterogeneous data will be implemented in a user-oriented database infrastructure and scientific workflow system. Accessible via web-based interfaces, this database and analysis system will allow scientists to manage and analyze monitoring data and scenario computations, as well as allow stakeholders (e.g., land users, policy makers) to retrieve current ecosystem information and seasonal outlooks. We present the concept of the project and show preliminary results of the realization steps towards the integrative savanna-management and decision-support system.

  2. IMGT/3Dstructure-DB and IMGT/StructuralQuery, a database and a tool for immunoglobulin, T cell receptor and MHC structural data

    PubMed Central

    Kaas, Quentin; Ruiz, Manuel; Lefranc, Marie-Paule

    2004-01-01

    IMGT/3Dstructure-DB and IMGT/StructuralQuery are a novel 3D structure database and a new tool for immunological proteins. They are part of IMGT, the international ImMunoGenetics information system®, a high-quality integrated knowledge resource specializing in immunoglobulins (IG), T cell receptors (TR), major histocompatibility complex (MHC) and related proteins of the immune system (RPI) of human and other vertebrate species, which consists of databases, Web resources and interactive on-line tools. IMGT/3Dstructure-DB data are described according to the IMGT Scientific chart rules based on the IMGT-ONTOLOGY concepts. IMGT/3Dstructure-DB provides IMGT gene and allele identification of IG, TR and MHC proteins with known 3D structures, domain delimitations, amino acid positions according to the IMGT unique numbering and renumbered coordinate flat files. Moreover, IMGT/3Dstructure-DB provides 2D graphical representations (or Collier de Perles) and results of contact analysis. The IMGT/StructuralQuery tool allows this database to be searched on specific structural characteristics. IMGT/3Dstructure-DB and IMGT/StructuralQuery are freely available at http://imgt.cines.fr. PMID:14681396

  3. SZDB: A Database for Schizophrenia Genetic Research

    PubMed Central

    Wu, Yong; Yao, Yong-Gang

    2017-01-01

    Schizophrenia (SZ) is a debilitating brain disorder with a complex genetic architecture. Genetic studies, especially recent genome-wide association studies (GWAS), have identified multiple variants (loci) conferring risk for SZ. However, how to efficiently extract meaningful biological information from the bulk genetic findings of SZ remains a major challenge. There is a pressing need to integrate multiple layers of data from various sources, e.g., genetic findings from GWAS, copy number variations (CNVs), association and linkage studies, gene expression, protein–protein interaction (PPI), co-expression, expression quantitative trait loci (eQTL), and Encyclopedia of DNA Elements (ENCODE) data, to provide a comprehensive resource that facilitates the translation of genetic findings into SZ molecular diagnosis and mechanism studies. Here we developed the SZDB database (http://www.szdb.org/), a comprehensive resource for SZ research. SZ genetic data, gene expression data, network-based data, brain eQTL data, and SNP function annotation information were systematically extracted, curated and deposited in SZDB. In-depth analyses and systematic integration of these multiple types of data identified top prioritized SZ genes and enriched pathways. We further showed that genes implicated in SZ are highly co-expressed in human brain and that proteins encoded by the prioritized SZ risk genes show significant interactions. The user-friendly SZDB provides high-confidence candidate variants and genes for further functional characterization. More importantly, SZDB provides convenient online tools for data search and browsing, data integration, and customized data analyses. PMID:27451428

  4. PedAM: a database for Pediatric Disease Annotation and Medicine.

    PubMed

    Jia, Jinmeng; An, Zhongxin; Ming, Yue; Guo, Yongli; Li, Wei; Li, Xin; Liang, Yunxiang; Guo, Dongming; Tai, Jun; Chen, Geng; Jin, Yaqiong; Liu, Zhimei; Ni, Xin; Shi, Tieliu

    2018-01-04

    A significant number of children around the world suffer from the consequences of misdiagnosis and ineffective treatment of various diseases. To facilitate precision medicine in pediatrics, a database, the Pediatric Disease Annotations & Medicines (PedAM), has been built to standardize and classify pediatric diseases. PedAM integrates both biomedical resources and clinical data from Electronic Medical Records to support the development of computational tools that enable robust data analysis and integration. It also uses disease-manifestation (D-M) information integrated from existing biomedical ontologies as prior knowledge to automatically recognize text-mined, D-M-specific syntactic patterns from 774 514 full-text articles and 8 848 796 abstracts in MEDLINE. Additionally, disease connections based on phenotypes or genes can be visualized on the PedAM web pages. Currently, PedAM contains 8528 standardized pediatric disease terms (4542 unique disease concepts and 3986 synonyms) with eight annotation fields for each disease, including definition, synonyms, gene, symptom, cross-reference (Xref), human phenotypes and the corresponding phenotypes in the mouse. The PedAM database is freely accessible at http://www.unimd.org/pedam/. © The Author(s) 2017. Published by Oxford University Press on behalf of Nucleic Acids Research.
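
    A toy version of the pattern-based disease-manifestation (D-M) extraction described above is sketched below. The term lists, pattern and sentences are invented for illustration and are far simpler than the syntactic patterns the PedAM pipeline mines from MEDLINE.

    ```python
    # Toy sketch of pattern-based disease-manifestation (D-M) extraction.
    # The patterns and sentences are invented placeholders; the real system
    # mines syntactic patterns from full-text articles and MEDLINE abstracts.
    import re

    # Very small dictionaries standing in for ontology-derived term lists.
    DISEASES = ["Kawasaki disease", "cystic fibrosis"]
    MANIFESTATIONS = ["fever", "coronary artery aneurysm", "chronic cough"]

    def build_pattern(diseases, manifestations):
        d = "|".join(map(re.escape, diseases))
        m = "|".join(map(re.escape, manifestations))
        # One simple "D is characterized by / presents with M" style pattern.
        return re.compile(rf"({d})\s+(?:is characterized by|presents with)\s+({m})", re.I)

    PATTERN = build_pattern(DISEASES, MANIFESTATIONS)

    text = ("Kawasaki disease is characterized by fever and, in severe cases, "
            "coronary artery aneurysm. Cystic fibrosis presents with chronic cough.")

    for disease, manifestation in PATTERN.findall(text):
        print(f"D-M pair: {disease} -> {manifestation}")
    ```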

  5. Integrated Water Resources Planning and Management in Arid/Semi-arid Regions: Data, Modeling, and Assessment

    NASA Astrophysics Data System (ADS)

    Gupta, H.; Liu, Y.; Wagener, T.; Durcik, M.; Duffy, C.; Springer, E.

    2005-12-01

    Water resources in arid and semi-arid regions are highly sensitive to climate variability and change. As the demand for water continues to increase due to economic and population growth, planning and management of available water resources under climate uncertainties becomes increasingly critical in order to achieve basin-scale water sustainability (i.e., to ensure a long-term balance between supply and demand of water). The tremendous complexity of the interactions between the natural hydrologic system and the human environment means that modeling is the only available mechanism for properly integrating new knowledge into the decision-making process. Basin-scale integrated models have the potential to allow us to study the feedback processes between the physical and human systems (including institutional, engineering, and behavioral components); and an integrated assessment of the potential second- and higher-order effects of political and management decisions can aid in the selection of a rational water-resources policy. Data and information, especially hydrological and water-use data, are critical to the integrated modeling and assessment for water resources management of any region. To this end we are in the process of developing a multi-resolution integrated modeling and assessment framework for the south-western USA, which can be used to generate simulations of the probable effects of human actions while taking into account the uncertainties brought about by future climatic variability and change. Data are being collected (including the development of a hydro-geospatial database) and used in support of the modeling and assessment activities. This paper will present a blueprint of the modeling framework, describe achievements so far and discuss the science questions which still require answers with a particular emphasis on issues related to dry regions.

  6. Decision support system for health care resources allocation

    PubMed Central

    Sebaa, Abderrazak; Nouicer, Amina; Tari, AbdelKamel; Tarik, Ramtani; Abdellah, Ouhab

    2017-01-01

    Background: A study of healthcare resources can improve decisions regarding the allotment and mobilization of medical resources and better guide future investment in the health sector. Aim: The aim of this work was to design and implement a decision support system to improve medical resource allocation in the Bejaia region. Methods: To conduct the retrospective cohort study, we integrated existing clinical databases from health sector institutions of the Bejaia department (an Algerian department) to collect information about patients from January 2015 through December 2015. Data integration was performed in a data warehouse using the multi-dimensional model and OLAP cube. During implementation, we used Microsoft SQL Server 2012 and Microsoft Excel 2010. Results: A medical decision support platform was introduced and implemented during the planning stages, allowing the management of different medical orientations; it provides better apportionment and allotment of medical resources and ensures that the allocation of health care resources has optimal effects on improving health. Conclusion: In this study, we designed and implemented a decision support system to improve health care in the Bejaia department, in particular to assist in selecting the optimal location of health centers and hospitals, the specialty of each health center, the medical equipment and the medical staff. PMID:28848645

  7. Decision support system for health care resources allocation.

    PubMed

    Sebaa, Abderrazak; Nouicer, Amina; Tari, AbdelKamel; Tarik, Ramtani; Abdellah, Ouhab

    2017-06-01

    A study of healthcare resources can improve decisions regarding the allotment and mobilization of medical resources and better guide future investment in the health sector. The aim of this work was to design and implement a decision support system to improve medical resource allocation in the Bejaia region. To conduct the retrospective cohort study, we integrated existing clinical databases from health sector institutions of the Bejaia department (an Algerian department) to collect information about patients from January 2015 through December 2015. Data integration was performed in a data warehouse using the multi-dimensional model and OLAP cube. During implementation, we used Microsoft SQL Server 2012 and Microsoft Excel 2010. A medical decision support platform was introduced and implemented during the planning stages, allowing the management of different medical orientations; it provides better apportionment and allotment of medical resources and ensures that the allocation of health care resources has optimal effects on improving health. In this study, we designed and implemented a decision support system to improve health care in the Bejaia department, in particular to assist in selecting the optimal location of health centers and hospitals, the specialty of each health center, the medical equipment and the medical staff.
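
    The two records above name a SQL Server data warehouse with an OLAP cube; as a tool-agnostic sketch of the same cube-style aggregation, the snippet below pivots hypothetical patient counts by facility and specialty with pandas. The field names and numbers are invented, not data from the Bejaia study.

    ```python
    # Tool-agnostic sketch of a cube-style aggregation like the one the
    # platform performs in its data warehouse. Records and field names
    # are invented placeholders, not data from the Bejaia study.
    import pandas as pd

    records = pd.DataFrame([
        {"facility": "Hospital A", "specialty": "cardiology", "patients": 120},
        {"facility": "Hospital A", "specialty": "pediatrics", "patients": 80},
        {"facility": "Center B",   "specialty": "cardiology", "patients": 45},
        {"facility": "Center B",   "specialty": "pediatrics", "patients": 60},
    ])

    # Facility x specialty "cube" slice with row/column totals.
    cube = pd.pivot_table(records, values="patients",
                          index="facility", columns="specialty",
                          aggfunc="sum", margins=True, margins_name="Total")
    print(cube)
    ```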

  8. PubChem BioAssay: 2017 update

    PubMed Central

    Wang, Yanli; Bryant, Stephen H.; Cheng, Tiejun; Wang, Jiyao; Gindulyte, Asta; Shoemaker, Benjamin A.; Thiessen, Paul A.; He, Siqian; Zhang, Jian

    2017-01-01

    PubChem's BioAssay database (https://pubchem.ncbi.nlm.nih.gov) has served as a public repository for small-molecule and RNAi screening data since 2004, providing open access to its data content for the community. PubChem accepts data submissions from researchers worldwide in academia, industry and government agencies. PubChem also collaborates with other chemical biology database stakeholders through data exchange. With over a decade of development effort, it has become an important information resource supporting drug discovery and chemical biology research. To facilitate data discovery, PubChem is integrated with all other databases at NCBI. In this work, we provide an update for the PubChem BioAssay database describing several recent developments, including added sources of research data, a redesigned BioAssay record page, a new BioAssay classification browser and new features in the Upload system that facilitate data sharing. PMID:27899599

  9. A computational platform to maintain and migrate manual functional annotations for BioCyc databases.

    PubMed

    Walsh, Jesse R; Sen, Taner Z; Dickerson, Julie A

    2014-10-12

    BioCyc databases are an important resource for information on biological pathways and genomic data. Such databases represent the accumulation of biological data, some of which has been manually curated from the literature. An essential feature of these databases is continuing data integration as new knowledge is discovered. As functional annotations are improved, scalable methods are needed for curators to manage annotations without detailed knowledge of the specific design of the BioCyc database. We have developed CycTools, a software tool that allows curators to maintain functional annotations in a model organism database. This tool builds on existing software to improve and simplify the import of user-provided annotation data into BioCyc databases. Additionally, CycTools automatically resolves synonyms and alternate identifiers contained within the database into the appropriate internal identifiers. Automating steps in the manual data entry process can improve curation efforts for major biological databases. The functionality of CycTools is demonstrated by transferring GO term annotations from MaizeCyc to matching proteins in CornCyc, both maize metabolic pathway databases available at MaizeGDB, and by creating strain-specific databases for metabolic engineering.
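
    One step that a tool like CycTools automates, resolving synonyms and alternate identifiers to internal database identifiers before an import, can be sketched in a few lines. The lookup table and annotation rows below are invented placeholders, not CycTools code or MaizeCyc/CornCyc content.

    ```python
    # Minimal sketch of synonym / alternate-identifier resolution prior to
    # an annotation import. The lookup table and rows are invented examples,
    # not CycTools code or MaizeCyc/CornCyc content.
    SYNONYM_TO_INTERNAL = {
        "zmCYP71": "GENE-0001",
        "GRMZM2G000001": "GENE-0001",   # alternate accession, same gene
        "pdk1": "GENE-0002",
    }

    annotations = [
        ("GRMZM2G000001", "GO:0016705"),
        ("pdk1", "GO:0004740"),
        ("unknownX", "GO:0008150"),
    ]

    resolved, unresolved = [], []
    for external_id, go_term in annotations:
        internal = SYNONYM_TO_INTERNAL.get(external_id)
        if internal is None:
            unresolved.append((external_id, go_term))   # flagged for curator review
        else:
            resolved.append((internal, go_term))

    print("import:", resolved)
    print("needs review:", unresolved)
    ```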

  10. BμG@Sbase—a microbial gene expression and comparative genomic database

    PubMed Central

    Witney, Adam A.; Waldron, Denise E.; Brooks, Lucy A.; Tyler, Richard H.; Withers, Michael; Stoker, Neil G.; Wren, Brendan W.; Butcher, Philip D.; Hinds, Jason

    2012-01-01

    The reducing cost of high-throughput functional genomic technologies is creating a deluge of high-volume, complex data, placing the burden on bioinformatics resources and tool development. The Bacterial Microarray Group at St George's (BμG@S) has been at the forefront of bacterial microarray design and analysis for over a decade and, while serving as a hub of a global network of microbial research groups, has developed BμG@Sbase, a microbial gene expression and comparative genomic database. BμG@Sbase (http://bugs.sgul.ac.uk/bugsbase/) is a web-browsable, expertly curated, MIAME-compliant database that stores comprehensive experimental annotation and multiple raw and analysed data formats. Consistent annotation is enabled through a structured set of web forms, which guide the user through the process following a set of best practices and controlled vocabulary. The database currently contains 86 expertly curated publicly available data sets (with a further 124 not yet published) and full annotation information for 59 bacterial microarray designs. The data can be browsed and queried using an explorer-like interface, integrating intuitive tree diagrams to present complex experimental details clearly and concisely. Furthermore, the modular design of the database will provide a robust platform for integrating other data types beyond microarrays in a more systems-analysis-based future. PMID:21948792

  11. BμG@Sbase--a microbial gene expression and comparative genomic database.

    PubMed

    Witney, Adam A; Waldron, Denise E; Brooks, Lucy A; Tyler, Richard H; Withers, Michael; Stoker, Neil G; Wren, Brendan W; Butcher, Philip D; Hinds, Jason

    2012-01-01

    The reducing cost of high-throughput functional genomic technologies is creating a deluge of high-volume, complex data, placing the burden on bioinformatics resources and tool development. The Bacterial Microarray Group at St George's (BμG@S) has been at the forefront of bacterial microarray design and analysis for over a decade and, while serving as a hub of a global network of microbial research groups, has developed BμG@Sbase, a microbial gene expression and comparative genomic database. BμG@Sbase (http://bugs.sgul.ac.uk/bugsbase/) is a web-browsable, expertly curated, MIAME-compliant database that stores comprehensive experimental annotation and multiple raw and analysed data formats. Consistent annotation is enabled through a structured set of web forms, which guide the user through the process following a set of best practices and controlled vocabulary. The database currently contains 86 expertly curated publicly available data sets (with a further 124 not yet published) and full annotation information for 59 bacterial microarray designs. The data can be browsed and queried using an explorer-like interface, integrating intuitive tree diagrams to present complex experimental details clearly and concisely. Furthermore, the modular design of the database will provide a robust platform for integrating other data types beyond microarrays in a more systems-analysis-based future.

  12. Coupling computer-interpretable guidelines with a drug-database through a web-based system – The PRESGUID project

    PubMed Central

    Dufour, Jean-Charles; Fieschi, Dominique; Fieschi, Marius

    2004-01-01

    Background: Clinical Practice Guidelines (CPGs) available today are not extensively used due to a lack of proper integration into clinical settings and knowledge-related information resources, and a lack of decision support at the point of care in a particular clinical context. Objective: The PRESGUID project (PREScription and GUIDelines) aims to improve the assistance provided by guidelines. The project proposes an online service enabling physicians to consult computerized CPGs linked to drug databases for easier integration into the healthcare process. Methods: Computable CPGs are structured as decision trees and coded in XML format. Recommendations related to drug classes are tagged with ATC codes. We use a mapping module to enhance the coupling of computerized guidelines with a drug database, which contains detailed information about each usable specific medication. In this way, therapeutic recommendations are backed by current, up-to-date information from the database. Results: Two authoritative CPGs, originally diffused as static textual documents, have been implemented to validate the computerization process and to illustrate the usefulness of the resulting automated CPGs and their coupling with a drug database. We discuss the advantages of this approach for practitioners and the implications for both guideline developers and drug database providers. Other CPGs will be implemented and evaluated in real conditions by clinicians working in different health institutions. PMID:15053828
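
    A toy version of the coupling described above, walking an XML-coded recommendation tagged with an ATC class and looking it up in a drug table, might look like the sketch below. The XML layout, ATC code and drug entries are invented and do not reproduce the PRESGUID formats or any drug database schema.

    ```python
    # Toy sketch of coupling an XML-coded guideline recommendation, tagged
    # with an ATC class, to a drug database. The XML layout, ATC code and
    # drug table are invented and do not reproduce the PRESGUID formats.
    import xml.etree.ElementTree as ET

    GUIDELINE_XML = """
    <guideline name="hypertension-example">
      <decision test="age &gt;= 60">
        <recommendation atc="C08CA">Consider a dihydropyridine calcium-channel blocker.</recommendation>
      </decision>
    </guideline>
    """

    # Hypothetical drug database keyed by ATC class prefix.
    DRUG_DB = {
        "C08CA": [{"name": "amlodipine", "forms": ["5 mg", "10 mg"]}],
    }

    root = ET.fromstring(GUIDELINE_XML)
    for rec in root.iter("recommendation"):
        atc = rec.get("atc")
        drugs = DRUG_DB.get(atc, [])
        print(rec.text.strip())
        for drug in drugs:
            print(f"  -> {drug['name']} ({', '.join(drug['forms'])}) [ATC {atc}]")
    ```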

  13. Caregiver Integration During Discharge Planning for Older Adults to Reduce Resource Use: A Metaanalysis.

    PubMed

    Rodakowski, Juleen; Rocco, Philip B; Ortiz, Maqui; Folb, Barbara; Schulz, Richard; Morton, Sally C; Leathers, Sally Caine; Hu, Lu; James, A Everette

    2017-08-01

    To determine the effect of integrating informal caregivers into discharge planning on postdischarge cost and resource use in older adults. A systematic review and metaanalysis of randomized controlled trials that examine the effect of discharge planning with caregiver integration begun before discharge on healthcare cost and resource use outcomes. MEDLINE, EMBASE, and the Cochrane Library databases were searched for all English-language articles published between 1990 and April 2016. Hospital or skilled nursing facility. Older adults with informal caregivers discharged to a community setting. Readmission rates, length of and time to postdischarge rehospitalizations, and costs of postdischarge care. Of 10,715 abstracts identified, 15 studies met the inclusion criteria. Eleven studies provided sufficient detail to calculate readmission rates for treatment and control participants. Discharge planning interventions with caregiver integration were associated with 25% fewer readmissions at 90 days (relative risk (RR) = 0.75, 95% confidence interval (CI) = 0.62-0.91) and 24% fewer readmissions at 180 days (RR = 0.76, 95% CI = 0.64-0.90). The majority of studies reported statistically significantly shorter time to readmission, shorter rehospitalization, and lower costs of postdischarge care for discharge planning interventions with caregiver integration. For older adults discharged to a community setting, the integration of caregivers into the discharge planning process reduces the risk of hospital readmission. © 2017, Copyright the Authors Journal compilation © 2017, The American Geriatrics Society.
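
    To make the reported effect sizes concrete, the sketch below shows how a relative risk and its 95% confidence interval are computed from a 2x2 table of readmission counts. The counts are invented for illustration; they are not the pooled data behind the results quoted above.

    ```python
    # How a relative risk (RR) and 95% CI are computed from a 2x2 table of
    # readmission counts. The counts are invented for illustration; they are
    # not the pooled data behind the RR = 0.75 (0.62-0.91) result above.
    import math

    # events (readmitted) / totals in each arm
    a, n1 = 90, 600    # caregiver-integrated discharge planning
    c, n0 = 120, 600   # usual care

    rr = (a / n1) / (c / n0)
    se_log_rr = math.sqrt(1/a - 1/n1 + 1/c - 1/n0)
    lo = math.exp(math.log(rr) - 1.96 * se_log_rr)
    hi = math.exp(math.log(rr) + 1.96 * se_log_rr)
    print(f"RR = {rr:.2f}, 95% CI {lo:.2f}-{hi:.2f}")
    ```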

  14. Study of water-table behaviour for the Indian Punjab using GIS.

    PubMed

    Kaur, Samanpreet; Aggarwal, Rajan; Soni, Ashwani

    2011-01-01

    The state of Punjab (India) has witnessed a spectacular increase in agricultural production in the last few decades. This has been possible due to high use of fertilizers, good-quality seeds and increased use of water resources. This increased demand for water resources has resulted in extensive use of groundwater in the central districts of the state and of surface water (canals) in South-West Punjab, where groundwater is generally of poor quality. The state has been facing the twin problems of water-table decline and rise in different parts. Efficient management relies on a comprehensive database and regular monitoring of the resource. GIS is one of the important tools for integrating and analyzing spatial information from different sources or disciplines. It helps to integrate, analyze and represent spatial information and databases for any resource, which can readily be used for planning resource development, environmental protection and scientific research and investigation. Geographical Information Systems (GIS) have been used for a variety of groundwater studies. Groundwater-level change maps are useful in determining the areas of greatest change in storage in regional systems. In this study, an attempt has been made to assess the long-term groundwater behaviour of the state using GIS to visually and spatially analyze water-level data obtained from state and central agencies. The data were analysed for water-table depth classes of 0-3 m, 3-10 m, 10-20 m and beyond 20 m. The study revealed that the per cent area with water-table depth > 10 m was 20% in 1998 and had increased to 58% by 2006; 10 m is the critical limit for shifting from centrifugal to submersible pumps.
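
    The depth classification used in the study can be illustrated with a short binning sketch; the well readings below are invented values, not the Punjab monitoring data, and area weighting is ignored for simplicity.

    ```python
    # Sketch of the depth classification used in the study (0-3 m, 3-10 m,
    # 10-20 m, >20 m) and a "share deeper than 10 m" summary. The readings
    # below are invented, not the Punjab monitoring data; no area weighting.
    def depth_class(depth_m: float) -> str:
        if depth_m < 3:
            return "0-3 m"
        if depth_m < 10:
            return "3-10 m"
        if depth_m < 20:
            return "10-20 m"
        return ">20 m"

    readings_2006 = [2.5, 8.0, 12.4, 15.9, 22.3, 9.1, 11.7, 25.0, 6.3, 18.2]
    classes = [depth_class(d) for d in readings_2006]

    deeper_than_10 = sum(d >= 10 for d in readings_2006) / len(readings_2006)
    print({c: classes.count(c) for c in ("0-3 m", "3-10 m", "10-20 m", ">20 m")})
    print(f"share of wells with water table deeper than 10 m: {deeper_than_10:.0%}")
    ```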

  15. A public database of macromolecular diffraction experiments.

    PubMed

    Grabowski, Marek; Langner, Karol M; Cymborowski, Marcin; Porebski, Przemyslaw J; Sroka, Piotr; Zheng, Heping; Cooper, David R; Zimmerman, Matthew D; Elsliger, Marc André; Burley, Stephen K; Minor, Wladek

    2016-11-01

    The low reproducibility of published experimental results in many scientific disciplines has recently garnered negative attention in scientific journals and the general media. Public transparency, including the availability of 'raw' experimental data, will help to address growing concerns regarding scientific integrity. Macromolecular X-ray crystallography has led the way in requiring the public dissemination of atomic coordinates and a wealth of experimental data, making the field one of the most reproducible in the biological sciences. However, there remains no mandate for public disclosure of the original diffraction data. The Integrated Resource for Reproducibility in Macromolecular Crystallography (IRRMC) has been developed to archive raw data from diffraction experiments and, equally importantly, to provide related metadata. Currently, the database of our resource contains data from 2920 macromolecular diffraction experiments (5767 data sets), accounting for around 3% of all depositions in the Protein Data Bank (PDB), with their corresponding partially curated metadata. IRRMC utilizes distributed storage implemented using a federated architecture of many independent storage servers, which provides both scalability and sustainability. The resource, which is accessible via the web portal at http://www.proteindiffraction.org, can be searched using various criteria. All data are available for unrestricted access and download. The resource serves as a proof of concept and demonstrates the feasibility of archiving raw diffraction data and associated metadata from X-ray crystallographic studies of biological macromolecules. The goal is to expand this resource and include data sets that failed to yield X-ray structures in order to facilitate collaborative efforts that will improve protein structure-determination methods and to ensure the availability of 'orphan' data left behind for various reasons by individual investigators and/or extinct structural genomics projects.

  16. On the Development of Speech Resources for the Mixtec Language

    PubMed Central

    2013-01-01

    The Mixtec language is one of the main native languages in Mexico. In general, due to urbanization, discrimination, and limited attempts to promote the culture, the native languages are disappearing. Most of the information available about the Mixtec language is in written form, as in dictionaries which, although including examples of how to pronounce the Mixtec words, are not as reliable as listening to the correct pronunciation from a native speaker. Formal acoustic resources, such as speech corpora, are almost non-existent for Mixtec, and no speech technologies are known to have been developed for it. This paper presents the development of the following resources for the Mixtec language: (1) a speech database of traditional narratives of the Mixtec culture spoken by a native speaker (labelled at the phonetic and orthographic levels by means of spectral analysis) and (2) a native speaker-adaptive automatic speech recognition (ASR) system (trained with the speech database) integrated with a Mixtec-to-Spanish/Spanish-to-Mixtec text translator. The speech database, although small and limited to a single variant, was reliable enough to build the multiuser speech application, which achieved a mean recognition/translation performance of up to 94.36% in experiments with non-native speakers (the target users). PMID:23710134

  17. Exploring the Social Ecological Model Based on National Student Achievements: Extracting Educational Leaders' Role

    ERIC Educational Resources Information Center

    Shapira-Lishchinsky, Orly; Ben-Amram, Miri

    2018-01-01

    The purpose of this paper is to reexamine the effect of internal school factors such as school violence and class size, and external school factors such as family socio-economic resources on student math achievements, based on the social ecological model, eliciting an integrative approach. Data were collected from an Israeli national database,…

  18. From Telecommunications to Networking: The MELVYL Online Union Catalog and the Development of Intercampus Networks at the University of California.

    ERIC Educational Resources Information Center

    Lynch, Clifford A.

    1989-01-01

    Reviews the history of the network that supports the MELVYL online union catalog, describes current technological and policy issues, and discusses the role the network plays in integrating local automation, the union catalog, access to resource databases, and other initiatives. Sidebars by Mark Needleman discuss the TCP/IP protocol suite, internet…

  19. TheHiveDB image data management and analysis framework.

    PubMed

    Muehlboeck, J-Sebastian; Westman, Eric; Simmons, Andrew

    2014-01-06

    The hive database system (theHiveDB) is a web-based brain imaging database, collaboration, and activity system which has been designed as an imaging workflow management system capable of handling cross-sectional and longitudinal multi-center studies. It can be used to organize and integrate existing data from heterogeneous projects as well as data from ongoing studies. It has been conceived to guide and assist the researcher throughout the entire research process, integrating all relevant types of data across modalities (e.g., brain imaging, clinical, and genetic data). TheHiveDB is a modern activity and resource management system capable of scheduling image processing on both private compute resources and the cloud. The activity component supports common image archival and management tasks as well as established pipeline processing (e.g., Freesurfer for extraction of scalar measures from magnetic resonance images). Furthermore, via theHiveDB activity system algorithm developers may grant access to virtual machines hosting versioned releases of their tools to collaborators and the imaging community. The application of theHiveDB is illustrated with a brief use case based on organizing, processing, and analyzing data from the publicly available Alzheimer Disease Neuroimaging Initiative.

  20. Target-Pathogen: a structural bioinformatic approach to prioritize drug targets in pathogens.

    PubMed

    Sosa, Ezequiel J; Burguener, Germán; Lanzarotti, Esteban; Defelipe, Lucas; Radusky, Leandro; Pardo, Agustín M; Marti, Marcelo; Turjanski, Adrián G; Fernández Do Porto, Darío

    2018-01-04

    Available genomic data for pathogens have created new opportunities for drug discovery and development to fight them, including new resistant and multiresistant strains. In particular, structural data must be integrated with both gene information and experimental results. In this sense, there is a lack of an online resource that allows genome-wide data consolidation from diverse sources together with thorough bioinformatic analysis, allowing easy filtering and scoring for fast target selection for drug discovery. Here, we present the Target-Pathogen database (http://target.sbg.qb.fcen.uba.ar/patho), designed and developed as an online resource that allows the integration and weighting of protein information such as function, metabolic role, off-targeting, structural properties including druggability, essentiality and omic experiments, to facilitate the identification and prioritization of candidate drug targets in pathogens. We include in the database 10 genomes of some of the most relevant microorganisms for human health (Mycobacterium tuberculosis, Mycobacterium leprae, Klebsiella pneumoniae, Plasmodium vivax, Toxoplasma gondii, Leishmania major, Wolbachia bancrofti, Trypanosoma brucei, Shigella dysenteriae and Schistosoma mansoni) and show its applicability. New genomes can be uploaded upon request. © The Author(s) 2017. Published by Oxford University Press on behalf of Nucleic Acids Research.
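
    A skeletal version of the kind of weighted, filterable target scoring such a resource exposes is sketched below. The property names, weights and protein rows are invented placeholders, not Target-Pathogen's actual scoring scheme.

    ```python
    # Skeletal sketch of weighted drug-target prioritization. The property
    # names, weights and protein rows are invented placeholders, not the
    # actual Target-Pathogen scoring scheme.
    PROTEINS = [
        {"id": "Rv0001", "druggability": 0.8, "essential": 1, "human_offtarget": 0, "metabolic_chokepoint": 1},
        {"id": "Rv0002", "druggability": 0.4, "essential": 1, "human_offtarget": 1, "metabolic_chokepoint": 0},
        {"id": "Rv0003", "druggability": 0.9, "essential": 0, "human_offtarget": 0, "metabolic_chokepoint": 1},
    ]

    WEIGHTS = {"druggability": 2.0, "essential": 1.5, "metabolic_chokepoint": 1.0}
    PENALTIES = {"human_offtarget": 2.0}   # similarity to a human protein counts against

    def score(p: dict) -> float:
        s = sum(w * p[k] for k, w in WEIGHTS.items())
        s -= sum(w * p[k] for k, w in PENALTIES.items())
        return s

    ranked = sorted(PROTEINS, key=score, reverse=True)
    for p in ranked:
        print(f"{p['id']}: {score(p):.2f}")
    ```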

  1. TheHiveDB image data management and analysis framework

    PubMed Central

    Muehlboeck, J-Sebastian; Westman, Eric; Simmons, Andrew

    2014-01-01

    The hive database system (theHiveDB) is a web-based brain imaging database, collaboration, and activity system which has been designed as an imaging workflow management system capable of handling cross-sectional and longitudinal multi-center studies. It can be used to organize and integrate existing data from heterogeneous projects as well as data from ongoing studies. It has been conceived to guide and assist the researcher throughout the entire research process, integrating all relevant types of data across modalities (e.g., brain imaging, clinical, and genetic data). TheHiveDB is a modern activity and resource management system capable of scheduling image processing on both private compute resources and the cloud. The activity component supports common image archival and management tasks as well as established pipeline processing (e.g., Freesurfer for extraction of scalar measures from magnetic resonance images). Furthermore, via theHiveDB activity system algorithm developers may grant access to virtual machines hosting versioned releases of their tools to collaborators and the imaging community. The application of theHiveDB is illustrated with a brief use case based on organizing, processing, and analyzing data from the publicly available Alzheimer Disease Neuroimaging Initiative. PMID:24432000

  2. Save medical personnel's time by improved user interfaces.

    PubMed

    Kindler, H

    1997-01-01

    Common objectives in the industrial countries are the improvement of quality of care, clinical effectiveness, and cost control. Cost control, in particular, has been addressed through the introduction of case-mix systems for reimbursement by social-security institutions. More data are required to enable quality improvement and increases in clinical effectiveness, and for juridical reasons. At first glance, this documentation effort appears to contradict cost reduction. However, integrated services for resource management based on better documentation should help to reduce costs. The clerical effort for documentation should be decreased by providing a co-operative working environment for healthcare professionals that applies sophisticated human-computer interface technology. Additional services, e.g., automatic report generation, increase the efficiency of healthcare personnel. Modelling the medical workflow forms an essential prerequisite for integrated resource management services and for co-operative user interfaces. A user interface aware of the workflow provides intelligent assistance by offering the appropriate tools at the right moment. Nowadays there is a trend toward client/server systems with relational or object-oriented databases as the repository. The workflows used for controlling purposes and to steer the user interfaces must be represented in the repository.

  3. Power system modeling and optimization methods vis-a-vis integrated resource planning (IRP)

    NASA Astrophysics Data System (ADS)

    Arsali, Mohammad H.

    1998-12-01

    The state-of-the-art restructuring of power industries is changing the fundamental nature of the retail electricity business. As a result, the so-called Integrated Resource Planning (IRP) strategies implemented by electric utilities are also undergoing modification. Such modifications stem from the need to minimize revenue requirements and maximize electrical system reliability vis-a-vis capacity additions (viewed as potential investments). IRP modifications also provide service-design bases to meet customer needs and support profitability. The purpose of this research, as presented in this dissertation, is to propose procedures for optimal IRP intended to expand the generation facilities of a power system over an extended period of time. Relevant topics addressed in this research towards IRP optimization are as follows: (1) historical perspective and evolutionary aspects of power system production-costing models and optimization techniques; (2) a survey of major U.S. electric utilities adopting IRP under a changing socioeconomic environment; (3) a new technique designated the Segmentation Method for production costing via IRP optimization; (4) construction of a fuzzy relational database of a typical electric power utility system for IRP purposes; (5) a genetic algorithm based approach for IRP optimization using the fuzzy relational database.

  4. Distribution System Upgrade Unit Cost Database

    DOE Data Explorer

    Horowitz, Kelsey

    2017-11-30

    This database contains unit cost information for different components that may be used to integrate distributed photovoltaic (D-PV) systems onto distribution systems. Some of these upgrades and costs may also apply to the integration of other distributed energy resources (DER). Which components are required, and how many of each, is system-specific and should be determined by analyzing the effects of distributed PV at a given penetration level on the circuit of interest, in combination with engineering assessments of the efficacy of different solutions to increase the ability of the circuit to host additional PV as desired. The current state of the distribution system should always be considered in these types of analysis. The data in this database were collected from a variety of utilities, PV developers, technology vendors, and published research reports. Where possible, we have included information on the source of each data point and relevant notes. In some cases where the data provided are sensitive or proprietary, we were not able to specify the source, but provide other information that may be useful to the user (e.g. year, location where equipment was installed). NREL has carefully reviewed these sources prior to inclusion in this database. Additional information about the database, data sources, and assumptions is included in the "Unit_cost_database_guide.doc" file included in this submission. This guide provides important information on what costs are included in each entry. Please refer to this guide before using the unit cost database for any purpose.

  5. The Mouse Genome Database (MGD): facilitating mouse as a model for human biology and disease.

    PubMed

    Eppig, Janan T; Blake, Judith A; Bult, Carol J; Kadin, James A; Richardson, Joel E

    2015-01-01

    The Mouse Genome Database (MGD, http://www.informatics.jax.org) serves the international biomedical research community as the central resource for integrated genomic, genetic and biological data on the laboratory mouse. To facilitate use of mouse as a model in translational studies, MGD maintains a core of high-quality curated data and integrates experimentally and computationally generated data sets. MGD maintains a unified catalog of genes and genome features, including functional RNAs, QTL and phenotypic loci. MGD curates and provides functional and phenotype annotations for mouse genes using the Gene Ontology and Mammalian Phenotype Ontology. MGD integrates phenotype data and associates mouse genotypes to human diseases, providing critical mouse-human relationships and access to repositories holding mouse models. MGD is the authoritative source of nomenclature for genes, genome features, alleles and strains following guidelines of the International Committee on Standardized Genetic Nomenclature for Mice. A new addition to MGD, the Human-Mouse: Disease Connection, allows users to explore gene-phenotype-disease relationships between human and mouse. MGD has also updated search paradigms for phenotypic allele attributes, incorporated incidental mutation data, added a module for display and exploration of genes and microRNA interactions and adopted the JBrowse genome browser. MGD resources are freely available to the scientific community. © The Author(s) 2014. Published by Oxford University Press on behalf of Nucleic Acids Research.

  6. A Dynamic Integration Method for Borderland Database using OSM data

    NASA Astrophysics Data System (ADS)

    Zhou, X.-G.; Jiang, Y.; Zhou, K.-X.; Zeng, L.

    2013-11-01

    Spatial data are fundamental to borderland analysis of geography, natural resources, demography, politics, economy, and culture. As the spatial region used in borderland research usually covers the borderland regions of several neighboring countries, the data are difficult for a single research institution or government to obtain. VGI has proven to be a very successful means of acquiring timely and detailed global spatial data at very low cost. Therefore, VGI is a reasonable source of borderland spatial data. OpenStreetMap (OSM) is known as the most successful VGI resource. But the OSM data model is far different from traditional authoritative geographic information, so the OSM data need to be converted to the scientist's customized data model. With the real world changing fast, the converted data also need to be updated. Therefore, a dynamic integration method for borderland data is presented in this paper. In this method, a machine study mechanism is used to convert the OSM data model to the user data model; a method for selecting the objects changed in the research area over a given period from the OSM whole-world daily diff file is presented, and a change-only information file in the designed form is produced automatically. Based on the rules and algorithms mentioned above, we enabled the automatic (or semi-automatic) integration and updating of the borderland database by programming. The developed system was intensively tested.
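
    One step of the update mechanism, selecting changed objects that fall inside the study region from an OsmChange (.osc) diff file, can be sketched as follows. The bounding box and the tiny diff fragment are invented examples, not real OSM daily-diff content.

    ```python
    # Sketch of selecting changed objects inside a study region from an
    # OsmChange (.osc) diff. The bounding box and the tiny diff fragment
    # below are invented examples, not real OSM daily-diff content.
    import xml.etree.ElementTree as ET

    OSC = """
    <osmChange version="0.6">
      <modify>
        <node id="101" lat="-25.80" lon="28.20"/>
        <node id="102" lat="10.00" lon="5.00"/>
      </modify>
      <create>
        <node id="103" lat="-25.75" lon="28.35"/>
      </create>
    </osmChange>
    """

    # Hypothetical borderland study region: (min_lat, min_lon, max_lat, max_lon)
    BBOX = (-26.5, 27.5, -25.0, 29.0)

    def in_bbox(lat, lon, bbox):
        min_lat, min_lon, max_lat, max_lon = bbox
        return min_lat <= lat <= max_lat and min_lon <= lon <= max_lon

    changes = []
    root = ET.fromstring(OSC)
    for action in root:                       # <create>, <modify> or <delete>
        for node in action.findall("node"):
            lat, lon = float(node.get("lat")), float(node.get("lon"))
            if in_bbox(lat, lon, BBOX):
                changes.append((action.tag, node.get("id"), lat, lon))

    print(changes)   # change-only records for the region of interest
    ```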

  7. Toward a public toxicogenomics capability for supporting predictive toxicology: survey of current resources and chemical indexing of experiments in GEO and ArrayExpress.

    PubMed

    Williams-Devane, ClarLynda R; Wolf, Maritja A; Richard, Ann M

    2009-06-01

    A publicly available toxicogenomics capability for supporting predictive toxicology and meta-analysis depends on availability of gene expression data for chemical treatment scenarios, the ability to locate and aggregate such information by chemical, and broad data coverage within chemical, genomics, and toxicological information domains. This capability also depends on common genomics standards, protocol description, and functional linkages of diverse public Internet data resources. We present a survey of public genomics resources from these vantage points and conclude that, despite progress in many areas, the current state of the majority of public microarray databases is inadequate for supporting these objectives, particularly with regard to chemical indexing. To begin to address these inadequacies, we focus chemical annotation efforts on experimental content contained in the two primary public genomic resources: ArrayExpress and Gene Expression Omnibus. Automated scripts and extensive manual review were employed to transform free-text experiment descriptions into a standardized, chemically indexed inventory of experiments in both resources. These files, which include top-level summary annotations, allow for identification of current chemical-associated experimental content, as well as chemical-exposure-related (or "Treatment") content of greatest potential value to toxicogenomics investigation. With these chemical-index files, it is possible for the first time to assess the breadth and overlap of chemical study space represented in these databases, and to begin to assess the sufficiency of data with shared protocols for chemical similarity inferences. Chemical indexing of public genomics databases is a first important step toward integrating chemical, toxicological and genomics data into predictive toxicology.
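
    The scripted part of the chemical-indexing pass described above can be illustrated with a small sketch that pulls CAS-registry-number-like tokens and dictionary-matched chemical names out of free-text experiment descriptions. The descriptions and the tiny name dictionary are invented, and the real effort combined such scripts with extensive manual review.

    ```python
    # Toy sketch of chemically indexing free-text experiment descriptions:
    # pull CAS-number-like tokens and match a small name dictionary. The
    # descriptions and dictionary are invented; the real effort combined
    # scripts with extensive manual review.
    import re

    CAS_PATTERN = re.compile(r"\b\d{2,7}-\d{2}-\d\b")       # CAS registry number shape
    NAME_TO_CAS = {"bisphenol a": "80-05-7", "atrazine": "1912-24-9"}

    experiments = [
        "Liver gene expression in rats exposed to Bisphenol A (80-05-7) for 14 days.",
        "Time course of zebrafish embryos treated with atrazine.",
        "Control vs. heat-shock comparison, no chemical treatment.",
    ]

    for i, text in enumerate(experiments, 1):
        cas_hits = set(CAS_PATTERN.findall(text))
        for name, cas in NAME_TO_CAS.items():
            if name in text.lower():
                cas_hits.add(cas)
        label = ", ".join(sorted(cas_hits)) if cas_hits else "no chemical index"
        print(f"experiment {i}: {label}")
    ```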

  8. HIVsirDB: a database of HIV inhibiting siRNAs.

    PubMed

    Tyagi, Atul; Ahmed, Firoz; Thakur, Nishant; Sharma, Arun; Raghava, Gajendra P S; Kumar, Manoj

    2011-01-01

    Human immunodeficiency virus (HIV) is responsible for millions of deaths every year. The current treatment involves the use of multiple antiretroviral agents that may harm patients due to their toxic nature. RNA interference (RNAi), which uses short interfering RNA (siRNA/shRNA) to silence HIV genes, is a potent candidate for the future treatment of HIV. In this study, attempts have been made to create HIVsirDB, a database of siRNAs responsible for silencing HIV genes. HIVsirDB is a manually curated database of HIV-inhibiting siRNAs that provides comprehensive information about each siRNA or shRNA. Information was collected and compiled from the literature and public resources. This database contains around 750 siRNAs, including 75 partially complementary siRNAs that differ from their target sites by one or more bases, and over 100 escape mutant sequences. The HIVsirDB structure contains sixteen fields including siRNA sequence, HIV strain, targeted genome region, efficacy and conservation of target sequences. To assist users, several tools have been integrated into this database, including: (i) siRNAmap, for mapping siRNAs onto a target sequence; (ii) HIVsirblast, for BLAST searches against the database; and (iii) siRNAalign, for aligning siRNAs. HIVsirDB is a freely accessible database of siRNAs that can silence or degrade HIV genes. It covers 26 types of HIV strains and 28 cell types. This database will be very useful for developing models to predict the efficacy of HIV-inhibiting siRNAs. In summary, this is a useful resource for researchers working in the field of siRNA-based HIV therapy. The HIVsirDB database is accessible at http://crdd.osdd.net/raghava/hivsir/.
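
    The mapping step that a tool such as siRNAmap performs, locating an siRNA's target site on a sequence while tolerating a few mismatches, can be sketched as follows. The sequences are invented and far shorter than real HIV genes, and strand handling is deliberately simplified.

    ```python
    # Simplified sketch of mapping an siRNA onto a target sequence while
    # tolerating mismatches (the database notes partially complementary
    # siRNAs differing by one or more bases). Sequences are invented and
    # far shorter than real HIV genes; strand handling is simplified.
    COMPLEMENT = str.maketrans("ACGU", "UGCA")

    def reverse_complement(rna: str) -> str:
        return rna.translate(COMPLEMENT)[::-1]

    def map_sirna(sirna_guide: str, target: str, max_mismatches: int = 2):
        """Slide the sequence complementary to the guide along the target,
        reporting positions with at most `max_mismatches` mismatches."""
        probe = reverse_complement(sirna_guide)
        hits = []
        for i in range(len(target) - len(probe) + 1):
            window = target[i:i + len(probe)]
            mismatches = sum(a != b for a, b in zip(window, probe))
            if mismatches <= max_mismatches:
                hits.append((i, mismatches))
        return hits

    target = "GGAAGGCCAGAUCUUCCCUAAAAAAUUAGCCUGUCUCUCAGU"  # invented RNA stretch
    guide  = "UUAGGGAAGAUCUGGCCUUC"                        # invented 20-nt guide
    print(map_sirna(guide, target))
    ```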

  9. Secondary Analysis and Integration of Existing Data to Elucidate the Genetic Architecture of Cancer Risk and Related Outcomes, R21 | Informatics Technology for Cancer Research (ITCR)

    Cancer.gov

    This funding opportunity announcement (FOA) encourages applications that propose to conduct secondary data analysis and integration of existing datasets and database resources, with the ultimate aim to elucidate the genetic architecture of cancer risk and related outcomes. The goal of this initiative is to address key scientific questions relevant to cancer epidemiology by supporting the analysis of existing genetic or genomic datasets, possibly in combination with environmental, outcomes, behavioral, lifestyle, and molecular profiles data.

  10. Secondary Analysis and Integration of Existing Data to Elucidate the Genetic Architecture of Cancer Risk and Related Outcomes, R01 | Informatics Technology for Cancer Research (ITCR)

    Cancer.gov

    This funding opportunity announcement (FOA) encourages applications that propose to conduct secondary data analysis and integration of existing datasets and database resources, with the ultimate aim to elucidate the genetic architecture of cancer risk and related outcomes. The goal of this initiative is to address key scientific questions relevant to cancer epidemiology by supporting the analysis of existing genetic or genomic datasets, possibly in combination with environmental, outcomes, behavioral, lifestyle, and molecular profiles data.

  11. Point of care use of a personal digital assistant for patient consultation management: experience of an intravenous resource nurse team in a major Canadian teaching hospital.

    PubMed

    Bosma, Laine; Balen, Robert M; Davidson, Erin; Jewesson, Peter J

    2003-01-01

    The development and integration of a personal digital assistant (PDA)-based point-of-care database into an intravenous resource nurse (IVRN) consultation service for the purposes of consultation management and service characterization are described. The IVRN team provides a consultation service 7 days a week in this 1000-bed tertiary adult care teaching hospital. No simple, reliable method for documenting IVRN patient care activity and facilitating IVRN-initiated patient follow-up evaluation was available. Implementation of a PDA database with exportability of data to statistical analysis software was undertaken in July 2001. A Palm IIIXE PDA was purchased and a three-table, 13-field database was developed using HanDBase software. During the 7-month period of data collection, the IVRN team recorded 4868 consultations for 40 patient care areas. Full analysis of service characteristics was conducted using SPSS 10.0 software. Team members adopted the new technology with few problems, and the authors now can efficiently track and analyze the services provided by their IVRN team.

  12. MetNetAPI: A flexible method to access and manipulate biological network data from MetNet

    PubMed Central

    2010-01-01

    Background: Convenient programmatic access to different biological databases allows automated integration of scientific knowledge. Many databases support a function to download files or data snapshots, or a webservice that offers "live" data. However, the functionality that a database offers cannot be represented in a static data download file, and webservices may consume considerable computational resources from the host server. Results: MetNetAPI is a versatile Application Programming Interface (API) to the MetNetDB database. It abstracts, captures and retains operations away from a biological network repository and website. A range of database functions, previously only available online, can be immediately (and independently from the website) applied to a dataset of interest. Data is available in four layers: molecular entities, localized entities (linked to a specific organelle), interactions, and pathways. Navigation between these layers is intuitive (e.g. one can request the molecular entities in a pathway, as well as request in what pathways a specific entity participates). Data retrieval can be customized: Network objects allow the construction of new and integration of existing pathways and interactions, which can be uploaded back to our server. In contrast to webservices, the computational demand on the host server is limited to processing data-related queries only. Conclusions: An API provides several advantages to a systems biology software platform. MetNetAPI illustrates an interface with a central repository of data that represents the complex interrelationships of a metabolic and regulatory network. As an alternative to data-dumps and webservices, it allows access to a current and "live" database and exposes analytical functions to application developers. Yet it only requires limited resources on the server-side (thin server/fat client setup). The API is available for Java, Microsoft.NET and R programming environments and offers flexible query and broad data-retrieval methods. Data retrieval can be customized to client needs and the API offers a framework to construct and manipulate user-defined networks. The design principles can be used as a template to build programmable interfaces for other biological databases. The API software and tutorials are available at http://www.metnetonline.org/api. PMID:21083943

  13. Database integration in a multimedia-modeling environment

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Dorow, Kevin E.

    2002-09-02

    Integration of data from disparate remote sources has direct applicability to modeling, which can support Brownfield assessments. To accomplish this task, a data integration framework needs to be established. A key element in this framework is the metadata that creates the relationship between the pieces of information that are important in the multimedia modeling environment and the information that is stored in the remote data source. The design philosophy is to allow modelers and database owners to collaborate by defining this metadata in such a way that allows interaction between their components. The main parts of this framework include tools to facilitate metadata definition, database extraction plan creation, automated extraction plan execution / data retrieval, and a central clearing house for metadata and modeling / database resources. Cross-platform compatibility (using Java) and standard communications protocols (http / https) allow these parts to run in a wide variety of computing environments (Local Area Networks, Internet, etc.), and, therefore, this framework provides many benefits. Because of the specific data relationships described in the metadata, the amount of data that have to be transferred is kept to a minimum (only the data that fulfill a specific request are provided as opposed to transferring the complete contents of a data source). This allows for real-time data extraction from the actual source. Also, the framework sets up collaborative responsibilities such that the different types of participants have control over the areas in which they have domain knowledge-the modelers are responsible for defining the data relevant to their models, while the database owners are responsible for mapping the contents of the database using the metadata definitions. Finally, the data extraction mechanism allows for the ability to control access to the data and what data are made available.
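
    The following is a minimal, hypothetical sketch (in Python, not the framework's actual code) of the metadata idea described above: a mapping from model variables to remote columns, from which a minimal extraction plan is built so that only the data fulfilling a specific request are transferred.

      # Illustrative sketch only: map model variables to remote table/column
      # pairs and build one minimal query per remote table.

      METADATA = {
          # model variable          ->  (remote table, remote column)
          "hydraulic_conductivity": ("site_soils", "k_sat"),
          "contaminant_conc":       ("sample_results", "concentration"),
      }

      def build_extraction_plan(requested_vars, site_id):
          """Return one SQL statement per remote table, selecting only the
          columns the model actually asked for."""
          by_table = {}
          for var in requested_vars:
              table, column = METADATA[var]
              by_table.setdefault(table, []).append(column)
          return [f"SELECT {', '.join(cols)} FROM {table} WHERE site_id = {site_id!r}"
                  for table, cols in by_table.items()]

      print(build_extraction_plan(["hydraulic_conductivity"], "BF-001"))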

  14. Information-seeking behavior of basic science researchers: implications for library services.

    PubMed

    Haines, Laura L; Light, Jeanene; O'Malley, Donna; Delwiche, Frances A

    2010-01-01

    This study examined the information-seeking behaviors of basic science researchers to inform the development of customized library services. A qualitative study using semi-structured interviews was conducted on a sample of basic science researchers employed at a university medical school. The basic science researchers used a variety of information resources ranging from popular Internet search engines to highly technical databases. They generally relied on basic keyword searching, using the simplest interface of a database or search engine. They were highly collegial, interacting primarily with coworkers in their laboratories and colleagues employed at other institutions. They made little use of traditional library services and instead performed many traditional library functions internally. Although the basic science researchers expressed a positive attitude toward the library, they did not view its resources or services as integral to their work. To maximize their use by researchers, library resources must be accessible via departmental websites. Use of library services may be increased by cultivating relationships with key departmental administrative personnel. Despite their self-sufficiency, subjects expressed a desire for centralized information about ongoing research on campus and shared resources, suggesting a role for the library in creating and managing an institutional repository.

  15. Information-seeking behavior of basic science researchers: implications for library services

    PubMed Central

    Haines, Laura L.; Light, Jeanene; O'Malley, Donna; Delwiche, Frances A.

    2010-01-01

    Objectives: This study examined the information-seeking behaviors of basic science researchers to inform the development of customized library services. Methods: A qualitative study using semi-structured interviews was conducted on a sample of basic science researchers employed at a university medical school. Results: The basic science researchers used a variety of information resources ranging from popular Internet search engines to highly technical databases. They generally relied on basic keyword searching, using the simplest interface of a database or search engine. They were highly collegial, interacting primarily with coworkers in their laboratories and colleagues employed at other institutions. They made little use of traditional library services and instead performed many traditional library functions internally. Conclusions: Although the basic science researchers expressed a positive attitude toward the library, they did not view its resources or services as integral to their work. To maximize their use by researchers, library resources must be accessible via departmental websites. Use of library services may be increased by cultivating relationships with key departmental administrative personnel. Despite their self-sufficiency, subjects expressed a desire for centralized information about ongoing research on campus and shared resources, suggesting a role for the library in creating and managing an institutional repository. PMID:20098658

  16. EMMA—mouse mutant resources for the international scientific community

    PubMed Central

    Wilkinson, Phil; Sengerova, Jitka; Matteoni, Raffaele; Chen, Chao-Kung; Soulat, Gaetan; Ureta-Vidal, Abel; Fessele, Sabine; Hagn, Michael; Massimi, Marzia; Pickford, Karen; Butler, Richard H.; Marschall, Susan; Mallon, Ann-Marie; Pickard, Amanda; Raspa, Marcello; Scavizzi, Ferdinando; Fray, Martin; Larrigaldie, Vanessa; Leyritz, Johan; Birney, Ewan; Tocchini-Valentini, Glauco P.; Brown, Steve; Herault, Yann; Montoliu, Lluis; de Angelis, Martin Hrabé; Smedley, Damian

    2010-01-01

    The laboratory mouse is the premier animal model for studying human disease and thousands of mutants have been identified or produced, most recently through gene-specific mutagenesis approaches. High throughput strategies by the International Knockout Mouse Consortium (IKMC) are producing mutants for all protein coding genes. Generating a knock-out line involves huge monetary and time costs so capture of both the data describing each mutant alongside archiving of the line for distribution to future researchers is critical. The European Mouse Mutant Archive (EMMA) is a leading international network infrastructure for archiving and worldwide provision of mouse mutant strains. It operates in collaboration with the other members of the Federation of International Mouse Resources (FIMRe), EMMA being the European component. Additionally EMMA is one of four repositories involved in the IKMC, and therefore the current figure of 1700 archived lines will rise markedly. The EMMA database gathers and curates extensive data on each line and presents it through a user-friendly website. A BioMart interface allows advanced searching including integrated querying with other resources e.g. Ensembl. Other resources are able to display EMMA data by accessing our Distributed Annotation System server. EMMA database access is publicly available at http://www.emmanet.org. PMID:19783817

  17. Update on Genomic Databases and Resources at the National Center for Biotechnology Information.

    PubMed

    Tatusova, Tatiana

    2016-01-01

    The National Center for Biotechnology Information (NCBI), as a primary public repository of genomic sequence data, collects and maintains enormous amounts of heterogeneous data. Data for genomes, genes, gene expression, gene variation, gene families, proteins, and protein domains are integrated with analytical, search, and retrieval resources through the NCBI website. The text-based search and retrieval system provides a fast and easy way to navigate across diverse biological databases. Comparative genome analysis tools lead to further understanding of evolutionary processes, quickening the pace of discovery. Recent technological innovations have ignited an explosion in genome sequencing that has fundamentally changed our understanding of the biology of living organisms. This huge increase in DNA sequence data presents new challenges for the information management system and the visualization tools. New strategies have been designed to bring order to this genome sequence shockwave and improve the usability of associated data.
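
    As a hedged illustration of programmatic access to NCBI's text-based search and retrieval system, the sketch below uses Biopython's Entrez module; Biopython is not mentioned in the abstract, and the example requires network access.

      # Minimal sketch, assuming Biopython is installed and the network is
      # reachable: search NCBI's text-based retrieval system, then fetch summaries.
      from Bio import Entrez

      Entrez.email = "you@example.org"   # NCBI asks callers to identify themselves

      handle = Entrez.esearch(db="nucleotide",
                              term="Escherichia coli[Organism] AND complete genome",
                              retmax=5)
      record = Entrez.read(handle)
      handle.close()
      print(record["IdList"])            # UIDs matching the text query

      handle = Entrez.esummary(db="nucleotide", id=",".join(record["IdList"]))
      for docsum in Entrez.read(handle):
          print(docsum["Id"], docsum["Title"])
      handle.close()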

  18. Integrating the IA2 Astronomical Archive in the VO: The VO-Dance Engine

    NASA Astrophysics Data System (ADS)

    Molinaro, M.; Laurino, O.; Smareglia, R.

    2012-09-01

    Virtual Observatory (VO) protocols and standards are maturing, and the astronomical community expects astrophysical data to be easily reachable. Data centers therefore have to intensify their efforts to provide the data they manage not only through proprietary portals and services but also through interoperable resources developed on the basis of the IVOA (International Virtual Observatory Alliance) recommendations. Here we present the work and ideas developed at the IA2 (Italian Astronomical Archive) data center hosted by the INAF-OATs (Italian Institute for Astrophysics - Trieste Astronomical Observatory) to reach this goal. The core component is VO-Dance (written in Java), an application that translates the content of existing database and archive structures into VO-compliant resources. This application, in turn, relies on a database (potentially DBMS independent) to store the translation-layer information of each resource and auxiliary content (UCDs, field names, authorizations, policies, etc.). The final component is an administrative interface (currently developed using the Django Python framework) that allows data center administrators to set up and maintain resources. Because the deployment is platform independent and both the database and the administrative interface are highly customizable, the package, once stable and easily distributable, can also be used by individual astronomers or groups to set up their own resources from their public datasets.
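
    A minimal sketch of what a translation-layer record could look like is shown below; the column names, VO field names and UCDs are examples chosen for illustration, not VO-Dance's actual schema.

      # Illustrative translation-layer records, loosely modelled on the idea
      # described in the abstract (VO-Dance itself is Java and stores this in
      # a DBMS); the mappings below are examples only.

      TRANSLATION_LAYER = [
          # archive column   VO field name   UCD                           datatype
          ("ra_deg",         "RA",           "pos.eq.ra;meta.main",        "double"),
          ("dec_deg",        "DEC",          "pos.eq.dec;meta.main",       "double"),
          ("exp_time",       "t_exptime",    "time.duration;obs.exposure", "double"),
      ]

      def to_vo_record(db_row):
          """Rename archive columns to their VO counterparts."""
          return {vo_name: db_row[col] for col, vo_name, _ucd, _dtype in TRANSLATION_LAYER}

      print(to_vo_record({"ra_deg": 83.63, "dec_deg": 22.01, "exp_time": 300.0}))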

  19. PAGER 2.0: an update to the pathway, annotated-list and gene-signature electronic repository for Human Network Biology

    PubMed Central

    Yue, Zongliang; Zheng, Qi; Neylon, Michael T; Yoo, Minjae; Shin, Jimin; Zhao, Zhiying; Tan, Aik Choon

    2018-01-01

    Abstract Integrative Gene-set, Network and Pathway Analysis (GNPA) is a powerful data analysis approach developed to help interpret high-throughput omics data. In PAGER 1.0, we demonstrated that researchers can gain unbiased and reproducible biological insights with the introduction of PAGs (Pathways, Annotated-lists and Gene-signatures) as the basic data representation elements. In PAGER 2.0, we improve the utility of integrative GNPA by significantly expanding the coverage of PAGs and PAG-to-PAG relationships in the database, defining a new metric to quantify PAG data qualities, and developing new software features to simplify online integrative GNPA. Specifically, we included 84 282 PAGs spanning 24 different data sources that cover human diseases, published gene-expression signatures, drug–gene, miRNA–gene interactions, pathways and tissue-specific gene expressions. We introduced a new normalized Cohesion Coefficient (nCoCo) score to assess the biological relevance of genes inside a PAG, and RP-score to rank genes and assign gene-specific weights inside a PAG. The companion web interface contains numerous features to help users query and navigate the database content. The database content can be freely downloaded and is compatible with third-party Gene Set Enrichment Analysis tools. We expect PAGER 2.0 to become a major resource in integrative GNPA. PAGER 2.0 is available at http://discovery.informatics.uab.edu/PAGER/. PMID:29126216
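
    Since the abstract notes that the downloadable content is compatible with third-party Gene Set Enrichment Analysis tools, the sketch below shows how PAG-like gene sets could be written to the widely used tab-delimited GMT format; the example PAGs are invented, not PAGER 2.0 content.

      # Minimal sketch: serialise PAG-like gene sets to GMT (name, description,
      # then member genes, tab-separated). The example PAGs are made up; real
      # PAGER 2.0 content would be downloaded from the site.

      pags = {
          "EXAMPLE_PAG_1": {"description": "example signature", "genes": ["TP53", "MDM2"]},
          "EXAMPLE_PAG_2": {"description": "example pathway",   "genes": ["EGFR", "KRAS", "BRAF"]},
      }

      with open("pager_example.gmt", "w") as out:
          for name, pag in pags.items():
              out.write("\t".join([name, pag["description"], *pag["genes"]]) + "\n")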

  20. Influenza Virus Database (IVDB): an integrated information resource and analysis platform for influenza virus research.

    PubMed

    Chang, Suhua; Zhang, Jiajie; Liao, Xiaoyun; Zhu, Xinxing; Wang, Dahai; Zhu, Jiang; Feng, Tao; Zhu, Baoli; Gao, George F; Wang, Jian; Yang, Huanming; Yu, Jun; Wang, Jing

    2007-01-01

    Frequent outbreaks of highly pathogenic avian influenza and the increasing data available for comparative analysis require a central database specialized in influenza viruses (IVs). We have established the Influenza Virus Database (IVDB) to integrate information and create an analysis platform for genetic, genomic, and phylogenetic studies of the virus. IVDB hosts complete genome sequences of influenza A virus generated by Beijing Institute of Genomics (BIG) and curates all other published IV sequences after expert annotation. Our Q-Filter system classifies and ranks all nucleotide sequences into seven categories according to sequence content and integrity. IVDB provides a series of tools and viewers for comparative analysis of the viral genomes, genes, genetic polymorphisms and phylogenetic relationships. A search system has been developed for users to retrieve a combination of different data types by setting search options. To facilitate analysis of global viral transmission and evolution, the IV Sequence Distribution Tool (IVDT) has been developed to display the worldwide geographic distribution of chosen viral genotypes and to couple genomic data with epidemiological data. The BLAST, multiple sequence alignment and phylogenetic analysis tools were integrated for online data analysis. Furthermore, IVDB offers instant access to pre-computed alignments and polymorphisms of IV genes and proteins, and presents the results as SNP distribution plots and minor allele distributions. IVDB is publicly available at http://influenza.genomics.org.cn.
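
    The seven Q-Filter categories are not enumerated in the abstract, so the following is only a generic sketch of classifying a sequence by content and integrity; the thresholds and labels are assumptions rather than IVDB's rules.

      # Generic sketch of content/integrity checks on a viral segment sequence;
      # category names and cut-offs below are invented for illustration.

      def classify(seq, expected_length):
          seq = seq.upper()
          ambiguous = sum(base not in "ACGT" for base in seq)
          if ambiguous / len(seq) > 0.05:
              return "low-quality (ambiguous bases)"
          if len(seq) < 0.9 * expected_length:
              return "partial segment"
          return "near-complete segment"

      print(classify("ACGT" * 400, expected_length=1700))   # -> near-complete segment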

  1. dbWGFP: a database and web server of human whole-genome single nucleotide variants and their functional predictions.

    PubMed

    Wu, Jiaxin; Wu, Mengmeng; Li, Lianshuo; Liu, Zhuo; Zeng, Wanwen; Jiang, Rui

    2016-01-01

    The recent advancement of the next generation sequencing technology has enabled the fast and low-cost detection of all genetic variants spreading across the entire human genome, making the application of whole-genome sequencing a tendency in the study of disease-causing genetic variants. Nevertheless, there still lacks a repository that collects predictions of functionally damaging effects of human genetic variants, though it has been well recognized that such predictions play a central role in the analysis of whole-genome sequencing data. To fill this gap, we developed a database named dbWGFP (a database and web server of human whole-genome single nucleotide variants and their functional predictions) that contains functional predictions and annotations of nearly 8.58 billion possible human whole-genome single nucleotide variants. Specifically, this database integrates 48 functional predictions calculated by 17 popular computational methods and 44 valuable annotations obtained from various data sources. Standalone software, user-friendly query services and free downloads of this database are available at http://bioinfo.au.tsinghua.edu.cn/dbwgfp. dbWGFP provides a valuable resource for the analysis of whole-genome sequencing, exome sequencing and SNP array data, thereby complementing existing data sources and computational resources in deciphering genetic bases of human inherited diseases. © The Author(s) 2016. Published by Oxford University Press.
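
    A hedged sketch of filtering such variant predictions is shown below; the column names and scores are assumptions and may not match the real dbWGFP download format, where a bulk file would be read with pandas instead of the toy table used here.

      # Sketch only: toy stand-in for a dbWGFP slice with hypothetical columns.
      import pandas as pd

      variants = pd.DataFrame({
          "chrom": ["1", "1", "2"],
          "pos": [12345, 67890, 13579],
          "ref": ["A", "G", "C"],
          "alt": ["T", "C", "G"],
          "method_a_score": [0.98, 0.12, 0.95],
          "method_b_score": [0.91, 0.40, 0.97],
      })

      # Keep variants that two (hypothetical) prediction methods agree are damaging.
      consensus = variants[(variants["method_a_score"] > 0.9) &
                           (variants["method_b_score"] > 0.9)]
      print(consensus[["chrom", "pos", "ref", "alt"]])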

  2. ACToR Chemical Structure processing using Open Source ...

    EPA Pesticide Factsheets

    ACToR (Aggregated Computational Toxicology Resource) is a centralized database repository developed by the National Center for Computational Toxicology (NCCT) at the U.S. Environmental Protection Agency (EPA). Free and open source tools were used to compile toxicity data from over 1,950 public sources. ACToR contains chemical structure information and toxicological data for over 558,000 unique chemicals. The database primarily includes data from NCCT research programs, in vivo toxicity data from ToxRef, human exposure data from ExpoCast, high-throughput screening data from ToxCast and high quality chemical structure information from the EPA DSSTox program. The DSSTox database is a chemical structure inventory for the NCCT programs and currently has about 16,000 unique structures. Included are also data from PubChem, ChemSpider, USDA, FDA, NIH and several other public data sources. ACToR has been a resource to various international and national research groups. Most of our recent efforts on ACToR are focused on improving the structural identifiers and Physico-Chemical properties of the chemicals in the database. Organizing this huge collection of data and improving the chemical structure quality of the database has posed some major challenges. Workflows have been developed to process structures, calculate chemical properties and identify relationships between CAS numbers. The Structure processing workflow integrates web services (PubChem and NIH NCI Cactus) to d
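
    The abstract above mentions workflows that integrate the PubChem and NIH NCI Cactus web services for structure processing. As a hedged illustration, the sketch below resolves a CAS number to a SMILES string via the Cactus resolver; the URL pattern follows Cactus's public documentation but should be verified, and network access is required.

      # Minimal sketch of a structure-resolution call such a workflow might make;
      # verify the Cactus endpoint pattern before relying on it.
      import urllib.request

      def cas_to_smiles(identifier):
          url = f"https://cactus.nci.nih.gov/chemical/structure/{identifier}/smiles"
          with urllib.request.urlopen(url, timeout=30) as resp:
              return resp.read().decode().strip()

      print(cas_to_smiles("50-00-0"))   # formaldehyde CAS number -> SMILES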

  3. Inconsistencies in the red blood cell membrane proteome analysis: generation of a database for research and diagnostic applications

    PubMed Central

    Hegedűs, Tamás; Chaubey, Pururawa Mayank; Várady, György; Szabó, Edit; Sarankó, Hajnalka; Hofstetter, Lia; Roschitzki, Bernd; Sarkadi, Balázs

    2015-01-01

    Based on recent results, the determination of the easily accessible red blood cell (RBC) membrane proteins may provide new diagnostic possibilities for assessing mutations, polymorphisms or regulatory alterations in diseases. However, the analysis of the current mass spectrometry-based proteomics datasets and other major databases indicates inconsistencies—the results show large scattering and only a limited overlap for the identified RBC membrane proteins. Here, we applied membrane-specific proteomics studies in human RBC, compared these results with the data in the literature, and generated a comprehensive and expandable database using all available data sources. The integrated web database now refers to proteomic, genetic and medical databases as well, and contains an unexpected large number of validated membrane proteins previously thought to be specific for other tissues and/or related to major human diseases. Since the determination of protein expression in RBC provides a method to indicate pathological alterations, our database should facilitate the development of RBC membrane biomarker platforms and provide a unique resource to aid related further research and diagnostics. Database URL: http://rbcc.hegelab.org PMID:26078478

  4. Collaborative Resource Allocation

    NASA Technical Reports Server (NTRS)

    Wang, Yeou-Fang; Wax, Allan; Lam, Raymond; Baldwin, John; Borden, Chester

    2007-01-01

    Collaborative Resource Allocation Networking Environment (CRANE) Version 0.5 is a prototype created to prove the newest concept of using a distributed environment to schedule Deep Space Network (DSN) antenna times in a collaborative fashion. This program is for all space-flight and terrestrial science project users and DSN schedulers to perform scheduling activities and conflict resolution, both synchronously and asynchronously. Project schedulers can, for the first time, participate directly in scheduling their tracking times into the official DSN schedule, and negotiate directly with other projects in an integrated scheduling system. A master schedule covers long-range, mid-range, near-real-time, and real-time scheduling time frames all in one, rather than the current method of separate functions that are supported by different processes and tools. CRANE also provides private workspaces (both dynamic and static), data sharing, scenario management, user control, rapid messaging (based on Java Message Service), data/time synchronization, workflow management, notification (including emails), conflict checking, and a linkage to a schedule generation engine. The data structure with corresponding database design combines object trees with multiple associated mortal instances and relational database to provide unprecedented traceability and simplify the existing DSN XML schedule representation. These technologies are used to provide traceability, schedule negotiation, conflict resolution, and load forecasting from real-time operations to long-range loading analysis up to 20 years in the future. CRANE includes a database, a stored procedure layer, an agent-based middle tier, a Web service wrapper, a Windows Integrated Analysis Environment (IAE), a Java application, and a Web page interface.
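
    As a toy illustration of the conflict checking such a scheduling system performs, the sketch below flags requests for the same antenna with overlapping time windows; the data structures and antenna names are examples, not CRANE's actual design.

      # Toy conflict check: two requests conflict if they target the same
      # antenna and their time windows overlap. All data below is invented.
      from datetime import datetime

      requests = [
          ("DSS-14", "Project A", datetime(2024, 1, 1, 10), datetime(2024, 1, 1, 14)),
          ("DSS-14", "Project B", datetime(2024, 1, 1, 13), datetime(2024, 1, 1, 16)),
          ("DSS-43", "Project C", datetime(2024, 1, 1, 10), datetime(2024, 1, 1, 12)),
      ]

      def conflicts(reqs):
          for i, (ant1, p1, s1, e1) in enumerate(reqs):
              for ant2, p2, s2, e2 in reqs[i + 1:]:
                  if ant1 == ant2 and s1 < e2 and s2 < e1:   # overlapping windows
                      yield (ant1, p1, p2)

      print(list(conflicts(requests)))   # -> [('DSS-14', 'Project A', 'Project B')]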

  5. A Toolkit for ARB to Integrate Custom Databases and Externally Built Phylogenies

    DOE PAGES

    Essinger, Steven D.; Reichenberger, Erin; Morrison, Calvin; ...

    2015-01-21

    Researchers are perpetually amassing biological sequence data. The computational approaches employed by ecologists for organizing this data (e.g. alignment, phylogeny, etc.) typically scale nonlinearly in execution time with the size of the dataset. This often serves as a bottleneck for processing experimental data since many molecular studies are characterized by massive datasets. To keep up with experimental data demands, ecologists are forced to choose between continually upgrading expensive in-house computer hardware or outsourcing the most demanding computations to the cloud. Outsourcing is attractive since it is the least expensive option, but does not necessarily allow direct user interaction with the data for exploratory analysis. Desktop analytical tools such as ARB are indispensable for this purpose, but they do not necessarily offer a convenient solution for the coordination and integration of datasets between local and outsourced destinations. Therefore, researchers are currently left with an undesirable tradeoff between computational throughput and analytical capability. To mitigate this tradeoff we introduce a software package to leverage the utility of the interactive exploratory tools offered by ARB with the computational throughput of cloud-based resources. Our pipeline serves as middleware between the desktop and the cloud allowing researchers to form local custom databases containing sequences and metadata from multiple resources and a method for linking data outsourced for computation back to the local database. Furthermore, a tutorial implementation of the toolkit is provided in the supporting information, S1 Tutorial.

  6. A Toolkit for ARB to Integrate Custom Databases and Externally Built Phylogenies

    PubMed Central

    Essinger, Steven D.; Reichenberger, Erin; Morrison, Calvin; Blackwood, Christopher B.; Rosen, Gail L.

    2015-01-01

    Researchers are perpetually amassing biological sequence data. The computational approaches employed by ecologists for organizing this data (e.g. alignment, phylogeny, etc.) typically scale nonlinearly in execution time with the size of the dataset. This often serves as a bottleneck for processing experimental data since many molecular studies are characterized by massive datasets. To keep up with experimental data demands, ecologists are forced to choose between continually upgrading expensive in-house computer hardware or outsourcing the most demanding computations to the cloud. Outsourcing is attractive since it is the least expensive option, but does not necessarily allow direct user interaction with the data for exploratory analysis. Desktop analytical tools such as ARB are indispensable for this purpose, but they do not necessarily offer a convenient solution for the coordination and integration of datasets between local and outsourced destinations. Therefore, researchers are currently left with an undesirable tradeoff between computational throughput and analytical capability. To mitigate this tradeoff we introduce a software package to leverage the utility of the interactive exploratory tools offered by ARB with the computational throughput of cloud-based resources. Our pipeline serves as middleware between the desktop and the cloud allowing researchers to form local custom databases containing sequences and metadata from multiple resources and a method for linking data outsourced for computation back to the local database. A tutorial implementation of the toolkit is provided in the supporting information, S1 Tutorial. Availability: http://www.ece.drexel.edu/gailr/EESI/tutorial.php. PMID:25607539

  7. CycADS: an annotation database system to ease the development and update of BioCyc databases

    PubMed Central

    Vellozo, Augusto F.; Véron, Amélie S.; Baa-Puyoulet, Patrice; Huerta-Cepas, Jaime; Cottret, Ludovic; Febvay, Gérard; Calevro, Federica; Rahbé, Yvan; Douglas, Angela E.; Gabaldón, Toni; Sagot, Marie-France; Charles, Hubert; Colella, Stefano

    2011-01-01

    In recent years, genomes from an increasing number of organisms have been sequenced, but their annotation remains a time-consuming process. The BioCyc databases offer a framework for the integrated analysis of metabolic networks. The Pathway tool software suite allows the automated construction of a database starting from an annotated genome, but it requires prior integration of all annotations into a specific summary file or into a GenBank file. To allow the easy creation and update of a BioCyc database starting from the multiple genome annotation resources available over time, we have developed an ad hoc data management system that we called Cyc Annotation Database System (CycADS). CycADS is centred on a specific database model and on a set of Java programs to import, filter and export relevant information. Data from GenBank and other annotation sources (including for example: KAAS, PRIAM, Blast2GO and PhylomeDB) are collected into a database to be subsequently filtered and extracted to generate a complete annotation file. This file is then used to build an enriched BioCyc database using the PathoLogic program of Pathway Tools. The CycADS pipeline for annotation management was used to build the AcypiCyc database for the pea aphid (Acyrthosiphon pisum) whose genome was recently sequenced. The AcypiCyc database webpage includes also, for comparative analyses, two other metabolic reconstruction BioCyc databases generated using CycADS: TricaCyc for Tribolium castaneum and DromeCyc for Drosophila melanogaster. Linked to its flexible design, CycADS offers a powerful software tool for the generation and regular updating of enriched BioCyc databases. The CycADS system is particularly suited for metabolic gene annotation and network reconstruction in newly sequenced genomes. Because of the uniform annotation used for metabolic network reconstruction, CycADS is particularly useful for comparative analysis of the metabolism of different organisms. Database URL: http://www.cycadsys.org PMID:21474551
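
    A minimal sketch of the consolidation step, merging per-gene annotations from several sources before export, is given below; the data and field names are invented, and the real CycADS implementation is a set of Java programs backed by a database.

      # Illustrative consolidation step (not CycADS's actual code): merge
      # per-gene annotations coming from several sources into one record per gene.
      from collections import defaultdict

      annotations = [                      # (gene_id, source, EC number) - toy data
          ("ACYPI000001", "KAAS",     "2.7.1.1"),
          ("ACYPI000001", "PRIAM",    "2.7.1.1"),
          ("ACYPI000002", "Blast2GO", "1.1.1.1"),
      ]

      merged = defaultdict(dict)
      for gene, source, ec in annotations:
          merged[gene].setdefault("ec_numbers", set()).add(ec)
          merged[gene].setdefault("sources", set()).add(source)

      for gene, record in merged.items():
          print(gene, sorted(record["ec_numbers"]), sorted(record["sources"]))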

  8. An update on the Enzyme Portal: an integrative approach for exploring enzyme knowledge

    PubMed Central

    Onwubiko, J.; Zaru, R.; Rosanoff, S.; Antunes, R.; Bingley, M.; Watkins, X.; O'Donovan, C.; Martin, M. J.

    2017-01-01

    Abstract Enzymes are a key part of life processes and are increasingly important for various areas of research such as medicine, biotechnology, bioprocessing and drug research. The goal of the Enzyme Portal is to provide an interface to all European Bioinformatics Institute (EMBL-EBI) data about enzymes (de Matos, P., et al., (2013), BMC Bioinformatics, 14 (1), 103). These data include enzyme function, sequence features and family classification, protein structure, reactions, pathways, small molecules, diseases and the associated literature. The sources of enzyme data are: the UniProt Knowledgebase (UniProtKB) (UniProt Consortium, 2015), the Protein Data Bank in Europe (PDBe) (Valenkar, S., et al., Nucleic Acids Res. 2016; 44, D385–D395), Rhea—a database of enzyme-catalysed reactions (Morgat, A., et al., Nucleic Acids Res. 2015; 43, D459-D464), Reactome—a database of biochemical pathways (Fabregat, A., et al., Nucleic Acids Res. 2016; 44, D481–D487), IntEnz—a resource with enzyme nomenclature information (Fleischmann, A., et al., Nucleic Acids Res. 2004; 32, D434–D437) and ChEBI (Hastings, J., et al., Nucleic Acids Res. 2013) and ChEMBL (Bento, A. P., et al., Nucleic Acids Res. 2014; 42, 1083–1090)—resources which contain information about small-molecule chemistry and bioactivity. This article describes the redesign of Enzyme Portal and the increased functionality added to maximise integration and interpretation of these data. Use case examples of the Enzyme Portal and the versatile workflows it supports are illustrated. We welcome the suggestion of new resources for integration. PMID:28158609
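
    As a hedged illustration of reaching one of the listed sources programmatically, the sketch below queries UniProtKB's REST search for enzymes with a given EC number; the endpoint and parameter names reflect UniProt's current REST interface as commonly documented and should be verified before use.

      # Minimal sketch, network access required; endpoint and field names are
      # assumptions based on UniProt's REST documentation.
      import urllib.parse, urllib.request

      params = urllib.parse.urlencode({
          "query": "ec:1.1.1.1 AND organism_id:9606 AND reviewed:true",
          "fields": "accession,protein_name",
          "format": "tsv",
          "size": 10,
      })
      url = "https://rest.uniprot.org/uniprotkb/search?" + params
      with urllib.request.urlopen(url, timeout=30) as resp:
          print(resp.read().decode())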

  9. An integrative strategy to identify the entire protein coding potential of prokaryotic genomes by proteogenomics.

    PubMed

    Omasits, Ulrich; Varadarajan, Adithi R; Schmid, Michael; Goetze, Sandra; Melidis, Damianos; Bourqui, Marc; Nikolayeva, Olga; Québatte, Maxime; Patrignani, Andrea; Dehio, Christoph; Frey, Juerg E; Robinson, Mark D; Wollscheid, Bernd; Ahrens, Christian H

    2017-12-01

    Accurate annotation of all protein-coding sequences (CDSs) is an essential prerequisite to fully exploit the rapidly growing repertoire of completely sequenced prokaryotic genomes. However, large discrepancies among the number of CDSs annotated by different resources, missed functional short open reading frames (sORFs), and overprediction of spurious ORFs represent serious limitations. Our strategy toward accurate and complete genome annotation consolidates CDSs from multiple reference annotation resources, ab initio gene prediction algorithms and in silico ORFs (a modified six-frame translation considering alternative start codons) in an integrated proteogenomics database (iPtgxDB) that covers the entire protein-coding potential of a prokaryotic genome. By extending the PeptideClassifier concept of unambiguous peptides for prokaryotes, close to 95% of the identifiable peptides imply one distinct protein, largely simplifying downstream analysis. Searching a comprehensive Bartonella henselae proteomics data set against such an iPtgxDB allowed us to unambiguously identify novel ORFs uniquely predicted by each resource, including lipoproteins, differentially expressed and membrane-localized proteins, novel start sites and wrongly annotated pseudogenes. Most novelties were confirmed by targeted, parallel reaction monitoring mass spectrometry, including unique ORFs and single amino acid variations (SAAVs) identified in a re-sequenced laboratory strain that are not present in its reference genome. We demonstrate the general applicability of our strategy for genomes with varying GC content and distinct taxonomic origin. We release iPtgxDBs for B. henselae , Bradyrhizobium diazoefficiens and Escherichia coli and the software to generate both proteogenomics search databases and integrated annotation files that can be viewed in a genome browser for any prokaryote. © 2017 Omasits et al.; Published by Cold Spring Harbor Laboratory Press.
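
    A small sketch of the "modified six-frame translation considering alternative start codons" idea is shown below using Biopython; the start-codon set, translation table and minimum length are assumptions, not the iPtgxDB defaults.

      # Sketch of a six-frame translation that also accepts the alternative
      # bacterial start codons GTG and TTG; codon set and min_aa are assumptions.
      from Bio.Seq import Seq

      STARTS = {"ATG", "GTG", "TTG"}

      def in_silico_orfs(dna, min_aa=30):
          dna = Seq(dna)
          for strand in (dna, dna.reverse_complement()):
              s = str(strand)
              for frame in range(3):
                  for i in range(frame, len(s) - 2, 3):
                      if s[i:i + 3] in STARTS:
                          sub = s[i:]
                          sub = sub[:len(sub) - len(sub) % 3]        # whole codons only
                          protein = str(Seq(sub).translate(table=11, to_stop=True))
                          if len(protein) >= min_aa:
                              yield "M" + protein[1:]                # conventional Met start

      print(sum(1 for _ in in_silico_orfs("ATG" + "GCT" * 40 + "TAA")))   # -> 1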

  10. An update on the Enzyme Portal: an integrative approach for exploring enzyme knowledge.

    PubMed

    Pundir, S; Onwubiko, J; Zaru, R; Rosanoff, S; Antunes, R; Bingley, M; Watkins, X; O'Donovan, C; Martin, M J

    2017-03-01

    Enzymes are a key part of life processes and are increasingly important for various areas of research such as medicine, biotechnology, bioprocessing and drug research. The goal of the Enzyme Portal is to provide an interface to all European Bioinformatics Institute (EMBL-EBI) data about enzymes (de Matos, P., et al. , (2013), BMC Bioinformatics , (1), 103). These data include enzyme function, sequence features and family classification, protein structure, reactions, pathways, small molecules, diseases and the associated literature. The sources of enzyme data are: the UniProt Knowledgebase (UniProtKB) (UniProt Consortium, 2015), the Protein Data Bank in Europe (PDBe), (Valenkar, S., et al ., Nucleic Acids Res. 2016; , D385-D395) Rhea-a database of enzyme-catalysed reactions (Morgat, A., et al .,  Nucleic Acids Res.  2015; , D459-D464), Reactome-a database of biochemical pathways (Fabregat, A., et al ., Nucleic Acids Res. 2016;  , D481-D487), IntEnz-a resource with enzyme nomenclature information (Fleischmann, A., et al ., Nucleic Acids Res.  2004 , D434-D437) and ChEBI (Hastings, J., et al .,  Nucleic Acids Res. 2013) and ChEMBL (Bento, A. P., et al ., Nucleic Acids Res.  2014 , 1083-1090)-resources which contain information about small-molecule chemistry and bioactivity. This article describes the redesign of Enzyme Portal and the increased functionality added to maximise integration and interpretation of these data. Use case examples of the Enzyme Portal and the versatile workflows its supports are illustrated. We welcome the suggestion of new resources for integration. © The Author 2017. Published by Oxford University Press. All rights reserved. For permissions, please e-mail: journals.permissions@oup.com

  11. A GIS-Enabled, Michigan-Specific, Hierarchical Groundwater Modeling and Visualization System

    NASA Astrophysics Data System (ADS)

    Liu, Q.; Li, S.; Mandle, R.; Simard, A.; Fisher, B.; Brown, E.; Ross, S.

    2005-12-01

    Efficient management of groundwater resources relies on a comprehensive database that represents the characteristics of the natural groundwater system as well as analysis and modeling tools to describe the impacts of decision alternatives. Many agencies in Michigan have spent several years compiling expensive and comprehensive surface water and groundwater inventories and other related spatial data that describe their respective areas of responsibility. However, most often this wealth of descriptive data has only been utilized for basic mapping purposes. The benefits from analyzing these data, using GIS analysis functions or externally developed analysis models or programs, has yet to be systematically realized. In this talk, we present a comprehensive software environment that allows Michigan groundwater resources managers and frontline professionals to make more effective use of the available data and improve their ability to manage and protect groundwater resources, address potential conflicts, design cleanup schemes, and prioritize investigation activities. In particular, we take advantage of the Interactive Ground Water (IGW) modeling system and convert it to a customized software environment specifically for analyzing, modeling, and visualizing the Michigan statewide groundwater database. The resulting Michigan IGW modeling system (IGW-M) is completely window-based, fully interactive, and seamlessly integrated with a GIS mapping engine. The system operates in real-time (on the fly) providing dynamic, hierarchical mapping, modeling, spatial analysis, and visualization. Specifically, IGW-M allows water resources and environmental professionals in Michigan to: * Access and utilize the extensive data from the statewide groundwater database, interactively manipulate GIS objects, and display and query the associated data and attributes; * Analyze and model the statewide groundwater database, interactively convert GIS objects into numerical model features, automatically extract data and attributes, and simulate unsteady groundwater flow and contaminant transport in response to water and land management decisions; * Visualize and map model simulations and predictions with data from the statewide groundwater database in a seamless interactive environment. IGW-M has the potential to significantly improve the productivity of Michigan groundwater management investigations. It changes the role of engineers and scientists in modeling and analyzing the statewide groundwater database from heavily physical to cognitive problem-solving and decision-making tasks. The seamless real-time integration, real-time visual interaction, and real-time processing capability allows a user to focus on critical management issues, conflicts, and constraints, to quickly and iteratively examine conceptual approximations, management and planning scenarios, and site characterization assumptions, to identify dominant processes, to evaluate data worth and sensitivity, and to guide further data-collection activities. We illustrate the power and effectiveness of the M-IGW modeling and visualization system with a real case study and a real-time, live demonstration.

  12. The Modular Modeling System (MMS): A modeling framework for water- and environmental-resources management

    USGS Publications Warehouse

    Leavesley, G.H.; Markstrom, S.L.; Viger, R.J.

    2004-01-01

    The interdisciplinary nature and increasing complexity of water- and environmental-resource problems require the use of modeling approaches that can incorporate knowledge from a broad range of scientific disciplines. The large number of distributed hydrological and ecosystem models currently available are composed of a variety of different conceptualizations of the associated processes they simulate. Assessment of the capabilities of these distributed models requires evaluation of the conceptualizations of the individual processes, and the identification of which conceptualizations are most appropriate for various combinations of criteria, such as problem objectives, data constraints, and spatial and temporal scales of application. With this knowledge, "optimal" models for specific sets of criteria can be created and applied. The U.S. Geological Survey (USGS) Modular Modeling System (MMS) is an integrated system of computer software that has been developed to provide these model development and application capabilities. MMS supports the integration of models and tools at a variety of levels of modular design. These include individual process models, tightly coupled models, loosely coupled models, and fully-integrated decision support systems. A variety of visualization and statistical tools are also provided. MMS has been coupled with the Bureau of Reclamation (BOR) object-oriented reservoir and river-system modeling framework, RiverWare, under a joint USGS-BOR program called the Watershed and River System Management Program. MMS and RiverWare are linked using a shared relational database. The resulting database-centered decision support system provides tools for evaluating and applying optimal resource-allocation and management strategies to complex, operational decisions on multipurpose reservoir systems and watersheds. Management issues being addressed include efficiency of water-resources management, environmental concerns such as meeting flow needs for endangered species, and optimizing operations within the constraints of multiple objectives such as power generation, irrigation, and water conservation. This decision support system approach is being developed, tested, and implemented in the Gunni-son, Yakima, San Juan, Rio Grande, and Truckee River basins of the western United States. Copyright ASCE 2004.

  13. [Establishment of a comprehensive database for laryngeal cancer related genes and the miRNAs].

    PubMed

    Li, Mengjiao; E, Qimin; Liu, Jialin; Huang, Tingting; Liang, Chuanyu

    2015-09-01

    By collecting and analyzing the laryngeal cancer related genes and the miRNAs, to build a comprehensive laryngeal cancer-related gene database, which differs from the current biological information database with complex and clumsy structure and focuses on the theme of gene and miRNA, and it could make the research and teaching more convenient and efficient. Based on the B/S architecture, using Apache as a Web server, MySQL as coding language of database design and PHP as coding language of web design, a comprehensive database for laryngeal cancer-related genes was established, providing with the gene tables, protein tables, miRNA tables and clinical information tables of the patients with laryngeal cancer. The established database containsed 207 laryngeal cancer related genes, 243 proteins, 26 miRNAs, and their particular information such as mutations, methylations, diversified expressions, and the empirical references of laryngeal cancer relevant molecules. The database could be accessed and operated via the Internet, by which browsing and retrieval of the information were performed. The database were maintained and updated regularly. The database for laryngeal cancer related genes is resource-integrated and user-friendly, providing a genetic information query tool for the study of laryngeal cancer.
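
    A minimal sketch of a comparable relational layout, using Python's built-in sqlite3 instead of the MySQL/PHP stack described, is shown below; all table and column names are invented for illustration.

      # Sketch only: the production system uses MySQL/PHP; these columns are invented.
      import sqlite3

      conn = sqlite3.connect(":memory:")
      conn.executescript("""
          CREATE TABLE gene    (gene_id INTEGER PRIMARY KEY, symbol TEXT, mutation TEXT,
                                methylation TEXT, expression_change TEXT, reference TEXT);
          CREATE TABLE protein (protein_id INTEGER PRIMARY KEY,
                                gene_id INTEGER REFERENCES gene, name TEXT);
          CREATE TABLE mirna   (mirna_id INTEGER PRIMARY KEY, name TEXT,
                                target_gene_id INTEGER REFERENCES gene);
          CREATE TABLE patient (patient_id INTEGER PRIMARY KEY, stage TEXT, outcome TEXT);
      """)
      conn.execute("INSERT INTO gene (symbol, mutation) VALUES (?, ?)", ("TP53", "missense"))
      print(conn.execute("SELECT symbol, mutation FROM gene").fetchall())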

  14. MEIMAN: Database exploring Medicinal and Edible insects of Manipur

    PubMed Central

    Shantibala, Tourangbam; Lokeshwari, Rajkumari; Thingnam, Gourshyam; Somkuwar, Bharat Gopalrao

    2012-01-01

    We have developed MEIMAN, a unique database on medicinal and edible insects of Manipur which comprises 51 insects species collected through extensive survey and questionnaire for two years. MEIMAN provides integrated access to insect species thorough sophisticated web interface which has following capabilities a) Graphical interface of seasonality, b) Method of preparation, c) Form of use - edible and medicinal, d) habitat, e) medicinal uses, f) commercial importance and g) economic status. This database will be useful for scientific validations and updating of traditional wisdom in bioprospecting aspects. It will be useful in analyzing the insect biodiversity for the development of virgin resources and their industrialization. Further, the features will be suited for detailed investigation on potential medicinal and edible insects that make MEIMAN a powerful tool for sustainable management. Availability The database is available for free at www.ibsd.gov.in/meiman PMID:22715305

  15. Global catalogue of microorganisms (gcm): a comprehensive database and information retrieval, analysis, and visualization system for microbial resources

    PubMed Central

    2013-01-01

    Background Throughout the long history of industrial and academic research, many microbes have been isolated, characterized and preserved (whenever possible) in culture collections. With the steady accumulation of biodiversity observational data as well as microbial sequencing data, bio-resource centers have to function as data and information repositories to serve academia, industry, and regulators on behalf of and for the general public. Hence, the World Data Centre for Microorganisms (WDCM) started to take its responsibility for constructing an effective information environment that would promote and sustain microbial research data activities, and bridge the gaps currently present within and outside the microbiology communities. Description Strain catalogue information was collected from collections by online submission. We developed tools for automatic extraction of strain numbers and species names from various sources, including GenBank, PubMed, and SwissProt. These new tools connect strain catalogue information with the corresponding nucleotide and protein sequences, as well as to genome sequences and references citing a particular strain. All information has been processed and compiled in order to create a comprehensive database of microbial resources, named the Global Catalogue of Microorganisms (GCM). The current version of GCM contains information on over 273,933 strains, including 43,436 bacterial, fungal and archaeal species from 52 collections in 25 countries and regions. A number of online analysis and statistical tools have been integrated, together with advanced search functions, which should greatly facilitate the exploration of the content of GCM. Conclusion A comprehensive, dynamic database of microbial resources has been created, which unveils the resources preserved in culture collections, especially those whose informatics infrastructures are still under development. This should foster cumulative research and facilitate the activities of microbiologists worldwide, who work in both public and industrial research centres. This database is available from http://gcm.wfcc.info. PMID:24377417
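
    As a toy illustration of the strain-number extraction mentioned above, the following regex-based sketch pulls collection prefixes and numbers from free text; the pattern is far simpler than the real GCM tools, and the collection prefixes are only examples.

      # Toy sketch of strain-number extraction from free text.
      import re

      STRAIN_PATTERN = re.compile(r"\b(ATCC|DSM|JCM|NBRC)[\s-]*(\d{1,6})\b")

      text = ("Escherichia coli ATCC 25922 and Bacillus subtilis DSM 10 were "
              "obtained from culture collections.")
      print(STRAIN_PATTERN.findall(text))   # -> [('ATCC', '25922'), ('DSM', '10')]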

  16. An integrative computational approach for prioritization of genomic variants

    DOE PAGES

    Dubchak, Inna; Balasubramanian, Sandhya; Wang, Sheng; ...

    2014-12-15

    An essential step in the discovery of molecular mechanisms contributing to disease phenotypes and efficient experimental planning is the development of weighted hypotheses that estimate the functional effects of sequence variants discovered by high-throughput genomics. With the increasing specialization of the bioinformatics resources, creating analytical workflows that seamlessly integrate data and bioinformatics tools developed by multiple groups becomes inevitable. Here we present a case study of a use of the distributed analytical environment integrating four complementary specialized resources, namely the Lynx platform, VISTA RViewer, the Developmental Brain Disorders Database (DBDB), and the RaptorX server, for the identification of high-confidence candidate genes contributing to pathogenesis of spina bifida. The analysis resulted in prediction and validation of deleterious mutations in the SLC19A placental transporter in mothers of the affected children that causes narrowing of the outlet channel and therefore leads to the reduced folate permeation rate. The described approach also enabled correct identification of several genes, previously shown to contribute to pathogenesis of spina bifida, and suggestion of additional genes for experimental validations. This study demonstrates that the seamless integration of bioinformatics resources enables fast and efficient prioritization and characterization of genomic factors and molecular networks contributing to the phenotypes of interest.

  17. Prairie Resources

    Science.gov Websites

    Prairie Resources for Students: Plant Database, Butterfly Info, Insect Database, Frog Info, Bird Database, Online Prairie Data.

  18. Publishing Linked Open Data for Physical Samples - Lessons Learned

    NASA Astrophysics Data System (ADS)

    Ji, P.; Arko, R. A.; Lehnert, K.; Bristol, S.

    2016-12-01

    Most data and information about physical samples and associated sampling features currently reside in relational databases. Integrating common concepts from various databases has motivated us to publish Linked Open Data for collections of physical samples, using Semantic Web technologies including the Resource Description Framework (RDF), RDF Query Language (SPARQL), and Web Ontology Language (OWL). The goal of our work is threefold: To evaluate and select ontologies in different granularities for common concepts; to establish best practices and develop a generic methodology for publishing physical sample data stored in relational database as Linked Open Data; and to reuse standard community vocabularies from the International Commission on Stratigraphy (ICS), Global Volcanism Program (GVP), General Bathymetric Chart of the Oceans (GEBCO), and others. Our work leverages developments in the EarthCube GeoLink project and the Interdisciplinary Earth Data Alliance (IEDA) facility for modeling and extracting physical sample data stored in relational databases. Reusing ontologies developed by GeoLink and IEDA has facilitated discovery and integration of data and information across multiple collections including the USGS National Geochemical Database (NGDB), System for Earth Sample Registration (SESAR), and Index to Marine & Lacustrine Geological Samples (IMLGS). We have evaluated, tested, and deployed Linked Open Data tools including Morph, Virtuoso Server, LodView, LodLive, and YASGUI for converting, storing, representing, and querying data in a knowledge base (RDF triplestore). Using persistent identifiers such as Open Researcher & Contributor IDs (ORCIDs) and International Geo Sample Numbers (IGSNs) at the record level makes it possible for other repositories to link related resources such as persons, datasets, documents, expeditions, awards, etc. to samples, features, and collections. This work is supported by the EarthCube "GeoLink" project (NSF# ICER14-40221 and others) and the "USGS-IEDA Partnership to Support a Data Lifecycle Framework and Tools" project (USGS# G13AC00381).
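
    A hedged sketch of querying such a triplestore from Python with the SPARQLWrapper package follows; the endpoint URL and predicate names are placeholders, not the project's published vocabulary.

      # Sketch using SPARQLWrapper; endpoint and predicates are placeholders.
      from SPARQLWrapper import SPARQLWrapper, JSON

      sparql = SPARQLWrapper("https://example.org/sparql")      # placeholder endpoint
      sparql.setQuery("""
          SELECT ?sample ?igsn WHERE {
              ?sample <http://example.org/vocab/hasIGSN> ?igsn .
          } LIMIT 10
      """)
      sparql.setReturnFormat(JSON)
      for row in sparql.query().convert()["results"]["bindings"]:
          print(row["sample"]["value"], row["igsn"]["value"])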

  19. EPA Facility Registry Service (FRS): Facility Interests Dataset - Intranet

    EPA Pesticide Factsheets

    This web feature service consists of location and facility identification information from EPA's Facility Registry Service (FRS) for all sites that are available in the FRS individual feature layers. The layers comprise the FRS major program databases, including:Assessment Cleanup and Redevelopment Exchange System (ACRES) : brownfields sites ; Air Facility System (AFS) : stationary sources of air pollution ; Air Quality System (AQS) : ambient air pollution data from monitoring stations; Bureau of Indian Affairs (BIA) : schools data on Indian land; Base Realignment and Closure (BRAC) facilities; Clean Air Markets Division Business System (CAMDBS) : market-based air pollution control programs; Comprehensive Environmental Response, Compensation, and Liability Information System (CERCLIS) : hazardous waste sites; Integrated Compliance Information System (ICIS) : integrated enforcement and compliance information; National Compliance Database (NCDB) : Federal Insecticide, Fungicide, and Rodenticide Act (FIFRA) and the Toxic Substances Control Act (TSCA); National Pollutant Discharge Elimination System (NPDES) module of ICIS : NPDES surface water permits; Radiation Information Database (RADINFO) : radiation and radioactivity facilities; RACT/BACT/LAER Clearinghouse (RBLC) : best available air pollution technology requirements; Resource Conservation and Recovery Act Information System (RCRAInfo) : tracks generators, transporters, treaters, storers, and disposers of haz

  20. EPA Facility Registry Service (FRS): Facility Interests Dataset - Intranet Download

    EPA Pesticide Factsheets

    This downloadable data package consists of location and facility identification information from EPA's Facility Registry Service (FRS) for all sites that are available in the FRS individual feature layers. The layers comprise the FRS major program databases, including:Assessment Cleanup and Redevelopment Exchange System (ACRES) : brownfields sites ; Air Facility System (AFS) : stationary sources of air pollution ; Air Quality System (AQS) : ambient air pollution data from monitoring stations; Bureau of Indian Affairs (BIA) : schools data on Indian land; Base Realignment and Closure (BRAC) facilities; Clean Air Markets Division Business System (CAMDBS) : market-based air pollution control programs; Comprehensive Environmental Response, Compensation, and Liability Information System (CERCLIS) : hazardous waste sites; Integrated Compliance Information System (ICIS) : integrated enforcement and compliance information; National Compliance Database (NCDB) : Federal Insecticide, Fungicide, and Rodenticide Act (FIFRA) and the Toxic Substances Control Act (TSCA); National Pollutant Discharge Elimination System (NPDES) module of ICIS : NPDES surface water permits; Radiation Information Database (RADINFO) : radiation and radioactivity facilities; RACT/BACT/LAER Clearinghouse (RBLC) : best available air pollution technology requirements; Resource Conservation and Recovery Act Information System (RCRAInfo) : tracks generators, transporters, treaters, storers, and disposers

  1. The Zebrafish Model Organism Database: new support for human disease models, mutation details, gene expression phenotypes and searching

    PubMed Central

    Howe, Douglas G.; Bradford, Yvonne M.; Eagle, Anne; Fashena, David; Frazer, Ken; Kalita, Patrick; Mani, Prita; Martin, Ryan; Moxon, Sierra Taylor; Paddock, Holly; Pich, Christian; Ramachandran, Sridhar; Ruzicka, Leyla; Schaper, Kevin; Shao, Xiang; Singer, Amy; Toro, Sabrina; Van Slyke, Ceri; Westerfield, Monte

    2017-01-01

    The Zebrafish Model Organism Database (ZFIN; http://zfin.org) is the central resource for zebrafish (Danio rerio) genetic, genomic, phenotypic and developmental data. ZFIN curators provide expert manual curation and integration of comprehensive data involving zebrafish genes, mutants, transgenic constructs and lines, phenotypes, genotypes, gene expressions, morpholinos, TALENs, CRISPRs, antibodies, anatomical structures, models of human disease and publications. We integrate curated, directly submitted, and collaboratively generated data, making these available to zebrafish research community. Among the vertebrate model organisms, zebrafish are superbly suited for rapid generation of sequence-targeted mutant lines, characterization of phenotypes including gene expression patterns, and generation of human disease models. The recent rapid adoption of zebrafish as human disease models is making management of these data particularly important to both the research and clinical communities. Here, we describe recent enhancements to ZFIN including use of the zebrafish experimental conditions ontology, ‘Fish’ records in the ZFIN database, support for gene expression phenotypes, models of human disease, mutation details at the DNA, RNA and protein levels, and updates to the ZFIN single box search. PMID:27899582

  2. EPA Facility Registry Service (FRS): Facility Interests Dataset Download

    EPA Pesticide Factsheets

    This downloadable data package consists of location and facility identification information from EPA's Facility Registry Service (FRS) for all sites that are available in the FRS individual feature layers. The layers comprise the FRS major program databases, including:Assessment Cleanup and Redevelopment Exchange System (ACRES) : brownfields sites ; Air Facility System (AFS) : stationary sources of air pollution ; Air Quality System (AQS) : ambient air pollution data from monitoring stations; Bureau of Indian Affairs (BIA) : schools data on Indian land; Base Realignment and Closure (BRAC) facilities; Clean Air Markets Division Business System (CAMDBS) : market-based air pollution control programs; Comprehensive Environmental Response, Compensation, and Liability Information System (CERCLIS) : hazardous waste sites; Integrated Compliance Information System (ICIS) : integrated enforcement and compliance information; National Compliance Database (NCDB) : Federal Insecticide, Fungicide, and Rodenticide Act (FIFRA) and the Toxic Substances Control Act (TSCA); National Pollutant Discharge Elimination System (NPDES) module of ICIS : NPDES surface water permits; Radiation Information Database (RADINFO) : radiation and radioactivity facilities; RACT/BACT/LAER Clearinghouse (RBLC) : best available air pollution technology requirements; Resource Conservation and Recovery Act Information System (RCRAInfo) : tracks generators, transporters, treaters, storers, and disposers

  3. EPA Facility Registry Service (FRS): Facility Interests Dataset

    EPA Pesticide Factsheets

    This web feature service consists of location and facility identification information from EPA's Facility Registry Service (FRS) for all sites that are available in the FRS individual feature layers. The layers comprise the FRS major program databases, including:Assessment Cleanup and Redevelopment Exchange System (ACRES) : brownfields sites ; Air Facility System (AFS) : stationary sources of air pollution ; Air Quality System (AQS) : ambient air pollution data from monitoring stations; Bureau of Indian Affairs (BIA) : schools data on Indian land; Base Realignment and Closure (BRAC) facilities; Clean Air Markets Division Business System (CAMDBS) : market-based air pollution control programs; Comprehensive Environmental Response, Compensation, and Liability Information System (CERCLIS) : hazardous waste sites; Integrated Compliance Information System (ICIS) : integrated enforcement and compliance information; National Compliance Database (NCDB) : Federal Insecticide, Fungicide, and Rodenticide Act (FIFRA) and the Toxic Substances Control Act (TSCA); National Pollutant Discharge Elimination System (NPDES) module of ICIS : NPDES surface water permits; Radiation Information Database (RADINFO) : radiation and radioactivity facilities; RACT/BACT/LAER Clearinghouse (RBLC) : best available air pollution technology requirements; Resource Conservation and Recovery Act Information System (RCRAInfo) : tracks generators, transporters, treaters, storers, and disposers of haz

  4. An Accessible Proteogenomics Informatics Resource for Cancer Researchers.

    PubMed

    Chambers, Matthew C; Jagtap, Pratik D; Johnson, James E; McGowan, Thomas; Kumar, Praveen; Onsongo, Getiria; Guerrero, Candace R; Barsnes, Harald; Vaudel, Marc; Martens, Lennart; Grüning, Björn; Cooke, Ira R; Heydarian, Mohammad; Reddy, Karen L; Griffin, Timothy J

    2017-11-01

    Proteogenomics has emerged as a valuable approach in cancer research that integrates genomic and transcriptomic data with mass spectrometry-based proteomics data to directly identify expressed, variant protein sequences that may have functional roles in cancer. This approach is computationally intensive, requiring integration of disparate software tools into sophisticated workflows, challenging its adoption by nonexpert bench scientists. To address this need, we have developed an extensible, Galaxy-based resource aimed at providing more researchers access to, and training in, proteogenomic informatics. Our resource brings together software from several leading research groups to address two foundational aspects of proteogenomics: (i) generation of customized, annotated protein sequence databases from RNA-Seq data; and (ii) accurate matching of tandem mass spectrometry data to putative variants, followed by filtering to confirm their novelty. Directions for accessing software tools and workflows, along with instructional documentation, can be found at z.umn.edu/canresgithub. Cancer Res; 77(21); e43-46. ©2017 American Association for Cancer Research (AACR).
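
    A minimal sketch of aspect (i) above, building a customized protein search database by appending variant protein sequences (for example, derived from RNA-Seq variant calls) to a reference FASTA, is shown below. The identifiers and sequences are hypothetical placeholders, and the snippet illustrates only the underlying idea rather than the Galaxy workflow itself.

      # Illustrative sketch: append hypothetical variant protein entries to a
      # reference FASTA so that a search engine can match spectra against both
      # canonical and variant peptides. All names and sequences are placeholders.

      REFERENCE_FASTA = ">ref|PROT1|example reference protein\nMSKGEELFTGVVPILVELDGDVNG\n"

      variant_proteins = {
          # hypothetical variant identifier -> variant protein sequence
          "PROT1_variant1": "MSKGEELFTGVVPILVELDGDVNH",
      }

      def build_custom_db(reference: str, variants: dict) -> str:
          """Return a FASTA string holding the reference plus tagged variant entries."""
          entries = [reference.rstrip("\n")]
          for name, seq in variants.items():
              entries.append(f">VARIANT|{name}\n{seq}")   # tag variant entries
          return "\n".join(entries) + "\n"

      if __name__ == "__main__":
          print(build_custom_db(REFERENCE_FASTA, variant_proteins))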

  5. genenames.org: the HGNC resources in 2011

    PubMed Central

    Seal, Ruth L.; Gordon, Susan M.; Lush, Michael J.; Wright, Mathew W.; Bruford, Elspeth A.

    2011-01-01

    The HUGO Gene Nomenclature Committee (HGNC) aims to assign a unique gene symbol and name to every human gene. The HGNC database currently contains almost 30 000 approved gene symbols, over 19 000 of which represent protein-coding genes. The public website, www.genenames.org, displays all approved nomenclature within Symbol Reports that contain data curated by HGNC editors and links to related genomic, phenotypic and proteomic information. Here we describe improvements to our resources, including a new Quick Gene Search, a new List Search, an integrated HGNC BioMart and a new Statistics and Downloads facility. PMID:20929869

  6. The CompTox Chemistry Dashboard: a community data resource for environmental chemistry.

    PubMed

    Williams, Antony J; Grulke, Christopher M; Edwards, Jeff; McEachran, Andrew D; Mansouri, Kamel; Baker, Nancy C; Patlewicz, Grace; Shah, Imran; Wambaugh, John F; Judson, Richard S; Richard, Ann M

    2017-11-28

    Despite an abundance of online databases providing access to chemical data, there is increasing demand for high-quality, structure-curated, open data to meet the various needs of the environmental sciences and computational toxicology communities. The U.S. Environmental Protection Agency's (EPA) web-based CompTox Chemistry Dashboard is addressing these needs by integrating diverse types of relevant domain data through a cheminformatics layer, built upon a database of curated substances linked to chemical structures. These data include physicochemical, environmental fate and transport, exposure, usage, in vivo toxicity, and in vitro bioassay data, surfaced through an integration hub with link-outs to additional EPA data and public domain online resources. Batch searching allows for direct chemical identifier (ID) mapping and downloading of multiple data streams in several different formats. This facilitates fast access to available structure, property, toxicity, and bioassay data for collections of chemicals (hundreds to thousands at a time). Advanced search capabilities are available to support, for example, non-targeted analysis and identification of chemicals using mass spectrometry. The contents of the chemistry database, presently containing ~ 760,000 substances, are available as public domain data for download. The chemistry content underpinning the Dashboard has been aggregated over the past 15 years by both manual and auto-curation techniques within EPA's DSSTox project. DSSTox chemical content is subject to strict quality controls to enforce consistency among chemical substance-structure identifiers, as well as list curation review to ensure accurate linkages of DSSTox substances to chemical lists and associated data. The Dashboard, publicly launched in April 2016, has expanded considerably in content and user traffic over the past year. It is continuously evolving with the growth of DSSTox into high-interest or data-rich domains of interest to EPA, such as chemicals on the Toxic Substances Control Act listing, while providing the user community with a flexible and dynamic web-based platform for integration, processing, visualization and delivery of data and resources. The Dashboard provides support for a broad array of research and regulatory programs across the worldwide community of toxicologists and environmental scientists.
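
    As an illustration of how a batch-search export might be post-processed once downloaded, the short sketch below maps submitted identifiers to DSSTox substance identifiers with pandas. The file layout, column names and DTXSID values are assumptions made for the example, not the Dashboard's documented export schema.

      # Post-processing a hypothetical batch-search export. Column names
      # ("INPUT", "DTXSID", "PREFERRED_NAME", "AVERAGE_MASS") and the identifier
      # values are illustrative assumptions only.
      import io
      import pandas as pd

      EXPORT = io.StringIO(
          "INPUT,DTXSID,PREFERRED_NAME,AVERAGE_MASS\n"
          "chemical a,DTXSID0000001,Example Chemical A,228.291\n"
          "50-00-0,DTXSID0000002,Example Chemical B,30.026\n"
          "unknownium,,,\n"
      )

      df = pd.read_csv(EXPORT)
      resolved = df.dropna(subset=["DTXSID"])        # keep rows that mapped to a substance
      id_map = dict(zip(resolved["INPUT"], resolved["DTXSID"]))
      print(f"resolved {len(id_map)} of {len(df)} identifiers")
      print(id_map)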

  7. A Brief Review of RNA–Protein Interaction Database Resources

    PubMed Central

    Yi, Ying; Zhao, Yue; Huang, Yan; Wang, Dong

    2017-01-01

    RNA–Protein interactions play critical roles in various biological processes. By collecting and analyzing the RNA–Protein interactions and binding sites from experiments and predictions, RNA–Protein interaction databases have become an essential resource for the exploration of the transcriptional and post-transcriptional regulatory network. Here, we briefly review several widely used RNA–Protein interaction database resources developed in recent years to provide a guide to these databases. The content and major functions of each database are presented. These brief descriptions help users quickly choose the database containing the information they are interested in. In short, these RNA–Protein interaction database resources are continually updated, and together they reflect the ongoing effort to identify and analyze the large number of RNA–Protein interactions. PMID:29657278

  8. Technologies and standards in the information systems of the soil-geographic database of Russia

    NASA Astrophysics Data System (ADS)

    Golozubov, O. M.; Rozhkov, V. A.; Alyabina, I. O.; Ivanov, A. V.; Kolesnikova, V. M.; Shoba, S. A.

    2015-01-01

    The achievements, problems, and challenges of the modern stage of the development of the Soil-Geographic Database of Russia (SGDBR) and the history of this project are outlined. The structure of the information system of the SGDBR as an internet-based resource to collect data on soil profiles and to integrate the geographic and attribute databases on the same platform is described. The pilot project in Rostov oblast illustrates the inclusion of regional information in the SGDBR and its application for solving practical problems. For the first time in Russia, the GeoRSS standard based on the structured hypertext representation of the geographic and attribute information has been applied in the state system for the agromonitoring of agricultural lands in Rostov oblast and information exchange through the internet.
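
    GeoRSS-Simple attaches coordinates to ordinary feed entries, which is what makes the structured exchange described above possible. The fragment below parses a small, invented Atom feed carrying one soil-profile record; only the GeoRSS namespace and the georss:point element are part of the standard, while the feed content itself is a made-up example.

      # Read GeoRSS-Simple points from a hypothetical feed of soil-profile records.
      import xml.etree.ElementTree as ET

      FEED = """<?xml version="1.0"?>
      <feed xmlns="http://www.w3.org/2005/Atom"
            xmlns:georss="http://www.georss.org/georss">
        <entry>
          <title>Soil profile RO-042, Rostov oblast</title>
          <georss:point>47.2357 39.7015</georss:point>
        </entry>
      </feed>"""

      NS = {"atom": "http://www.w3.org/2005/Atom",
            "georss": "http://www.georss.org/georss"}

      root = ET.fromstring(FEED)
      for entry in root.findall("atom:entry", NS):
          title = entry.findtext("atom:title", namespaces=NS)
          lat, lon = map(float, entry.findtext("georss:point", namespaces=NS).split())
          print(f"{title}: lat={lat}, lon={lon}")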

  9. Design of Knowledge Bases for Plant Gene Regulatory Networks.

    PubMed

    Mukundi, Eric; Gomez-Cano, Fabio; Ouma, Wilberforce Zachary; Grotewold, Erich

    2017-01-01

    Developing a knowledge base that contains all the information necessary for the researcher studying gene regulation in a particular organism can be accomplished in four stages. This begins with defining the data scope. We describe here the necessary information and resources, and outline the methods for obtaining data. The second stage consists of designing the schema, which involves defining the entire arrangement of the database in a systematic plan. The third stage is the implementation, defined by actualization of the database by using software according to a predefined schema. The final stage is development, where the database is made available to users in a web-accessible system. The result is a knowledgebase that integrates all the information pertaining to gene regulation, and which is easily expandable and transferable.
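
    For the schema-design stage, a minimal relational sketch might look like the following: one table of genes (flagging transcription factors) and one table of regulatory interactions with their supporting evidence. The table and column names, and the example Arabidopsis identifiers, are illustrative assumptions rather than a prescribed design.

      # Hypothetical minimal schema for a gene-regulation knowledge base.
      import sqlite3

      schema = """
      CREATE TABLE gene (
          gene_id   TEXT PRIMARY KEY,
          symbol    TEXT NOT NULL,
          is_tf     INTEGER DEFAULT 0          -- 1 if the gene encodes a transcription factor
      );
      CREATE TABLE interaction (
          tf_id     TEXT REFERENCES gene(gene_id),
          target_id TEXT REFERENCES gene(gene_id),
          evidence  TEXT,                      -- e.g. ChIP-seq, yeast one-hybrid
          pubmed_id TEXT,                      -- left NULL in this sketch
          PRIMARY KEY (tf_id, target_id, evidence)
      );
      """

      con = sqlite3.connect(":memory:")
      con.executescript(schema)
      # Example Arabidopsis identifiers, used here only as sample rows.
      con.execute("INSERT INTO gene VALUES ('AT1G01060', 'LHY', 1)")
      con.execute("INSERT INTO gene VALUES ('AT2G46830', 'CCA1', 1)")
      con.execute("INSERT INTO interaction VALUES ('AT1G01060', 'AT2G46830', 'ChIP-seq', NULL)")
      print(con.execute("SELECT COUNT(*) FROM interaction").fetchone()[0], "interaction stored")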

  10. The Genomes OnLine Database (GOLD) v.5: a metadata management system based on a four level (meta)genome project classification

    PubMed Central

    Reddy, T.B.K.; Thomas, Alex D.; Stamatis, Dimitri; Bertsch, Jon; Isbandi, Michelle; Jansson, Jakob; Mallajosyula, Jyothi; Pagani, Ioanna; Lobos, Elizabeth A.; Kyrpides, Nikos C.

    2015-01-01

    The Genomes OnLine Database (GOLD; http://www.genomesonline.org) is a comprehensive online resource to catalog and monitor genetic studies worldwide. GOLD provides up-to-date status on complete and ongoing sequencing projects along with a broad array of curated metadata. Here we report version 5 (v.5) of the database. The newly designed database schema and web user interface supports several new features including the implementation of a four level (meta)genome project classification system and a simplified intuitive web interface to access reports and launch search tools. The database currently hosts information for about 19 200 studies, 56 000 Biosamples, 56 000 sequencing projects and 39 400 analysis projects. More than just a catalog of worldwide genome projects, GOLD is a manually curated, quality-controlled metadata warehouse. The problems encountered in integrating disparate and varying quality data into GOLD are briefly highlighted. GOLD fully supports and follows the Genomic Standards Consortium (GSC) Minimum Information standards. PMID:25348402

  11. MethBank: a database integrating next-generation sequencing single-base-resolution DNA methylation programming data.

    PubMed

    Zou, Dong; Sun, Shixiang; Li, Rujiao; Liu, Jiang; Zhang, Jing; Zhang, Zhang

    2015-01-01

    DNA methylation plays crucial roles during embryonic development. Here we present MethBank (http://dnamethylome.org), a DNA methylome programming database that integrates the genome-wide single-base nucleotide methylomes of gametes and early embryos in different model organisms. Unlike extant relevant databases, MethBank incorporates the whole-genome single-base-resolution methylomes of gametes and early embryos at multiple developmental stages in zebrafish and mouse. MethBank allows users to retrieve methylation levels, differentially methylated regions, CpG islands, gene expression profiles and genetic polymorphisms for a specific gene or genomic region. Moreover, it offers a methylome browser that is capable of visualizing high-resolution DNA methylation profiles as well as other related data in an interactive manner, and is thus of great help to users investigating methylation patterns and changes in gametes and early embryos at different developmental stages. Ongoing efforts are focused on incorporation of methylomes and related data from other organisms. Together, MethBank features integration and visualization of high-resolution DNA methylation data as well as other related data, enabling identification of potential DNA methylation signatures at different developmental stages and providing an important resource for epigenetic and developmental studies. © The Author(s) 2014. Published by Oxford University Press on behalf of Nucleic Acids Research.

  12. An Integrative Review of Pain Resource Nurse Programs.

    PubMed

    Crawford, Cecelia L; Boller, Jan; Jadalla, Ahlam; Cuenca, Emma

    2016-01-01

    Mismanaged pain challenges health care systems. In the early 1990s, pain resource nurse programs were developed by Ferrell and colleagues. Variations of the model have existed for more than 20 years. While results of these programs have been disseminated, conclusive evidence has not been examined via a synthesis of the literature. A structured systematic search using multiple databases was conducted for research studies published 2005-2012. The search identified 11 studies on effective use of a pain resource nurse and/or a pain resource nurse program. The results revealed wide variations existing in program design, research methodology, practice settings, and reported outcomes. Overall, the strength of the evidence on pain resource nurse programs was determined to range from low to moderate quality for making generalizable conclusions. However, 4 key elements were identified as integral to effective pain resource nurse programs and useful for the program design and development: leadership commitment and active involvement in embedding a culture of effective pain management throughout the organization; addressing staff-related and organization-related challenges and barriers to pain management; a combination of strategies to overcome these barriers; and collaborative multidisciplinary teamwork and communication. Specific recommendations are provided for program implementation. Although the evidence was inconclusive, useful information exists to create the design of effective pain resource nurse programs. Collaborative multisite studies on the long-term effects of pain resource nurse programs are recommended.

  13. IntNetDB v1.0: an integrated protein-protein interaction network database generated by a probabilistic model

    PubMed Central

    Xia, Kai; Dong, Dong; Han, Jing-Dong J

    2006-01-01

    Background Although protein-protein interaction (PPI) networks have been explored by various experimental methods, the maps so built are still limited in coverage and accuracy. To further expand the PPI network and to extract more accurate information from existing maps, studies have been carried out to integrate various types of functional relationship data. A frequently updated database of computationally analyzed potential PPIs to provide biological researchers with rapid and easy access to analyze original data as a biological network is still lacking. Results By applying a probabilistic model, we integrated 27 heterogeneous genomic, proteomic and functional annotation datasets to predict PPI networks in human. In addition to previously studied data types, we show that phenotypic distances and genetic interactions can also be integrated to predict PPIs. We further built an easy-to-use, updatable integrated PPI database, the Integrated Network Database (IntNetDB) online, to provide automatic prediction and visualization of PPI network among genes of interest. The networks can be visualized in SVG (Scalable Vector Graphics) format for zooming in or out. IntNetDB also provides a tool to extract topologically highly connected network neighborhoods from a specific network for further exploration and research. Using the MCODE (Molecular Complex Detections) algorithm, 190 such neighborhoods were detected among all the predicted interactions. The predicted PPIs can also be mapped to worm, fly and mouse interologs. Conclusion IntNetDB includes 180,010 predicted protein-protein interactions among 9,901 human proteins and represents a useful resource for the research community. Our study has increased prediction coverage by five-fold. IntNetDB also provides easy-to-use network visualization and analysis tools that allow biological researchers unfamiliar with computational biology to access and analyze data over the internet. The web interface of IntNetDB is freely accessible at . Visualization requires Mozilla version 1.8 (or higher) or Internet Explorer with installation of SVGviewer. PMID:17112386
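
    The abstract does not spell out the probabilistic model, but a common way to combine heterogeneous evidence into a single interaction score is a naive-Bayes-style product of likelihood ratios, sketched below with invented numbers. This is a generic illustration of the idea, not IntNetDB's published model.

      # Generic naive-Bayes-style evidence integration for a candidate protein pair.
      # The likelihood ratios and prior odds are invented for illustration.
      from math import prod

      evidence_lr = {
          "coexpression":         2.5,
          "shared_phenotype":     1.8,
          "genetic_interaction":  3.0,
          "domain_cooccurrence":  1.2,
      }

      prior_odds = 1 / 600            # assumed prior odds that two proteins interact

      posterior_odds = prior_odds * prod(evidence_lr.values())
      posterior_prob = posterior_odds / (1 + posterior_odds)
      print(f"combined LR = {prod(evidence_lr.values()):.2f}, "
            f"posterior P(interaction) = {posterior_prob:.3f}")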

  14. [Exploiture and application of an internet-based Computation Platform for Integrative Pharmacology of Traditional Chinese Medicine].

    PubMed

    Xu, Hai-Yu; Liu, Zhen-Ming; Fu, Yan; Zhang, Yan-Qiong; Yu, Jian-Jun; Guo, Fei-Fei; Tang, Shi-Huan; Lv, Chuan-Yu; Su, Jin; Cui, Ru-Yi; Yang, Hong-Jun

    2017-09-01

    Recently, integrative pharmacology (IP) has become a pivotal paradigm for the modernization of traditional Chinese medicine (TCM) and for combinatorial drug discovery. IP is an interdisciplinary science that establishes the in vitro and in vivo correlation between the absorption, distribution, metabolism, and excretion/pharmacokinetic (ADME/PK) profiles of TCM and the molecular networks of disease by integrating knowledge across multiple disciplines and research stages. In the present study, an internet-based Computation Platform for IP of TCM (TCM-IP, www.tcmip.cn) is established to promote the development of this emerging discipline. Large-scale TCM data collections are an important resource for TCM-IP, including the Chinese Medicine Formula Database, the Chinese Medical Herbs Database, the Chemical Database of Chinese Medicine, and the Target Database for Disease and Symptoms, among others. Meanwhile, data mining and bioinformatics approaches are critical technologies for TCM-IP, including identification of TCM constituents, ADME prediction, target prediction for TCM constituents, and network construction and analysis. Furthermore, network beautification and individualized design are employed to meet users' requirements. We firmly believe that TCM-IP is a very useful tool for identifying the active constituents of TCM and their potential molecular mechanisms of therapeutic action, and that it will be widely applied in quality evaluation, clinical repositioning, scientific discovery based on original thinking, prescription compatibility, and new TCM drug development. Copyright© by the Chinese Pharmaceutical Association.

  15. The Center for Integrated Molecular Brain Imaging (Cimbi) database.

    PubMed

    Knudsen, Gitte M; Jensen, Peter S; Erritzoe, David; Baaré, William F C; Ettrup, Anders; Fisher, Patrick M; Gillings, Nic; Hansen, Hanne D; Hansen, Lars Kai; Hasselbalch, Steen G; Henningsson, Susanne; Herth, Matthias M; Holst, Klaus K; Iversen, Pernille; Kessing, Lars V; Macoveanu, Julian; Madsen, Kathrine Skak; Mortensen, Erik L; Nielsen, Finn Årup; Paulson, Olaf B; Siebner, Hartwig R; Stenbæk, Dea S; Svarer, Claus; Jernigan, Terry L; Strother, Stephen C; Frokjaer, Vibe G

    2016-01-01

    We here describe a multimodality neuroimaging database containing data from healthy volunteers and patients, acquired within the Lundbeck Foundation Center for Integrated Molecular Brain Imaging (Cimbi) in Copenhagen, Denmark. The data are of particular relevance for neurobiological research questions related to the serotonergic transmitter system, with normative data on the serotonergic subtype receptors 5-HT1A, 5-HT1B, 5-HT2A, and 5-HT4 and the 5-HT transporter (5-HTT), but can easily serve other purposes. The Cimbi database and Cimbi biobank were formally established in 2008 with the purpose of storing the wealth of Cimbi-acquired data in a highly structured and standardized manner, in accordance with the regulations issued by the Danish Data Protection Agency, as well as to provide a quality-controlled resource for future hypothesis-generating and hypothesis-driven studies. The Cimbi database currently comprises a total of 1100 PET and 1000 structural and functional MRI scans and it holds a multitude of additional data, such as genetic and biochemical data, and scores from 17 self-reported questionnaires and from 11 neuropsychological paper/computer tests. The database-associated Cimbi biobank currently contains blood and in some instances saliva samples from about 500 healthy volunteers and 300 patients with e.g., major depression, dementia, substance abuse, obesity, and impulsive aggression. Data continue to be added to the Cimbi database and biobank. Copyright © 2015. Published by Elsevier Inc.

  16. NREL: Renewable Resource Data Center - Biomass Resource Publications

    Science.gov Websites

    For a comprehensive list of NREL biomass resource publications, explore NREL's Publications Database.

  17. Unlimited Thirst for Genome Sequencing, Data Interpretation, and Database Usage in Genomic Era: The Road towards Fast-Track Crop Plant Improvement

    PubMed Central

    Govindaraj, Mahalingam

    2015-01-01

    The number of sequenced crop genomes and associated genomic resources is growing rapidly with the advent of inexpensive next-generation sequencing methods. Databases have become an integral part of all aspects of science research, including basic and applied plant and animal sciences. The importance of databases keeps increasing as the volume of datasets from direct and indirect genomics, as well as other omics approaches, continues to expand. The databases and associated web portals provide at a minimum a uniform set of tools and automated analysis across a wide range of crop plant genomes. This paper reviews some basic terms and considerations in dealing with crop plant database utilization in the advancing genomic era. The utilization of databases for variation analysis with other comparative genomics tools and data interpretation platforms is well described. The major focus of this review is to provide knowledge on platforms and databases for genome-based investigations of agriculturally important crop plants. The utilization of these databases in applied crop improvement programs is still being achieved widely; otherwise, the end for sequencing is not far away. PMID:25874133

  18. Integrated database for identifying candidate genes for Aspergillus flavus resistance in maize

    PubMed Central

    2010-01-01

    Background Aspergillus flavus Link:Fr, an opportunistic fungus that produces aflatoxin, is pathogenic to maize and other oilseed crops. Aflatoxin is a potent carcinogen, and its presence markedly reduces the value of grain. Understanding and enhancing host resistance to A. flavus infection and/or subsequent aflatoxin accumulation is generally considered an efficient means of reducing grain losses to aflatoxin. Different proteomic, genomic and genetic studies of maize (Zea mays L.) have generated large data sets with the goal of identifying genes responsible for conferring resistance to A. flavus, or aflatoxin. Results In order to maximize the usage of different data sets in new studies, including association mapping, we have constructed a relational database with a web interface integrating the results of gene expression, proteomic (both gel-based and shotgun), Quantitative Trait Loci (QTL) genetic mapping studies, and sequence data from the literature to facilitate selection of candidate genes for continued investigation. The Corn Fungal Resistance Associated Sequences Database (CFRAS-DB) (http://agbase.msstate.edu/) was created with the main goal of identifying genes important to aflatoxin resistance. CFRAS-DB is implemented using MySQL as the relational database management system running on a Linux server, using an Apache web server, and Perl CGI scripts as the web interface. The database and the associated web-based interface allow researchers to examine many lines of evidence (e.g. microarray, proteomics, QTL studies, SNP data) to assess the potential role of a gene or group of genes in the response of different maize lines to A. flavus infection and subsequent production of aflatoxin by the fungus. Conclusions CFRAS-DB provides the first opportunity to integrate data pertaining to the problem of A. flavus and aflatoxin resistance in maize in one resource and to support queries across different datasets. The web-based interface gives researchers different query options for mining the database across different types of experiments. The database is publicly available at http://agbase.msstate.edu. PMID:20946609
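
    The kind of cross-dataset query such a database supports can be illustrated with a small, self-contained example: find genes that are both differentially expressed after A. flavus infection and located within an aflatoxin-resistance QTL interval. The table layout, identifiers and values below are invented for the sketch (the production system is MySQL-backed, not SQLite).

      # Hypothetical cross-evidence query joining expression results to QTL intervals.
      import sqlite3

      con = sqlite3.connect(":memory:")
      con.executescript("""
      CREATE TABLE expression (gene_id TEXT, log2_fc REAL, p_value REAL);
      CREATE TABLE gene_location (gene_id TEXT, chromosome INTEGER, position_bp INTEGER);
      CREATE TABLE qtl (qtl_name TEXT, chromosome INTEGER, start_bp INTEGER, end_bp INTEGER);

      INSERT INTO expression VALUES ('EXAMPLE_GENE_1', 2.4, 0.001);
      INSERT INTO gene_location VALUES ('EXAMPLE_GENE_1', 4, 152000000);
      INSERT INTO qtl VALUES ('qAF4.1', 4, 150000000, 160000000);
      """)

      query = """
      SELECT e.gene_id, e.log2_fc, q.qtl_name
      FROM expression e
      JOIN gene_location g ON g.gene_id = e.gene_id
      JOIN qtl q ON q.chromosome = g.chromosome
                AND g.position_bp BETWEEN q.start_bp AND q.end_bp
      WHERE e.p_value < 0.05;
      """
      for row in con.execute(query):
          print(row)    # genes supported by both expression and QTL evidence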

  19. Integrated database for identifying candidate genes for Aspergillus flavus resistance in maize.

    PubMed

    Kelley, Rowena Y; Gresham, Cathy; Harper, Jonathan; Bridges, Susan M; Warburton, Marilyn L; Hawkins, Leigh K; Pechanova, Olga; Peethambaran, Bela; Pechan, Tibor; Luthe, Dawn S; Mylroie, J E; Ankala, Arunkanth; Ozkan, Seval; Henry, W B; Williams, W P

    2010-10-07

    Aspergillus flavus Link:Fr, an opportunistic fungus that produces aflatoxin, is pathogenic to maize and other oilseed crops. Aflatoxin is a potent carcinogen, and its presence markedly reduces the value of grain. Understanding and enhancing host resistance to A. flavus infection and/or subsequent aflatoxin accumulation is generally considered an efficient means of reducing grain losses to aflatoxin. Different proteomic, genomic and genetic studies of maize (Zea mays L.) have generated large data sets with the goal of identifying genes responsible for conferring resistance to A. flavus, or aflatoxin. In order to maximize the usage of different data sets in new studies, including association mapping, we have constructed a relational database with a web interface integrating the results of gene expression, proteomic (both gel-based and shotgun), Quantitative Trait Loci (QTL) genetic mapping studies, and sequence data from the literature to facilitate selection of candidate genes for continued investigation. The Corn Fungal Resistance Associated Sequences Database (CFRAS-DB) (http://agbase.msstate.edu/) was created with the main goal of identifying genes important to aflatoxin resistance. CFRAS-DB is implemented using MySQL as the relational database management system running on a Linux server, using an Apache web server, and Perl CGI scripts as the web interface. The database and the associated web-based interface allow researchers to examine many lines of evidence (e.g. microarray, proteomics, QTL studies, SNP data) to assess the potential role of a gene or group of genes in the response of different maize lines to A. flavus infection and subsequent production of aflatoxin by the fungus. CFRAS-DB provides the first opportunity to integrate data pertaining to the problem of A. flavus and aflatoxin resistance in maize in one resource and to support queries across different datasets. The web-based interface gives researchers different query options for mining the database across different types of experiments. The database is publicly available at http://agbase.msstate.edu.

  20. The Cancer Epidemiology Descriptive Cohort Database: A Tool to Support Population-Based Interdisciplinary Research

    PubMed Central

    Kennedy, Amy E.; Khoury, Muin J.; Ioannidis, John P.A.; Brotzman, Michelle; Miller, Amy; Lane, Crystal; Lai, Gabriel Y.; Rogers, Scott D.; Harvey, Chinonye; Elena, Joanne W.; Seminara, Daniela

    2017-01-01

    Background We report on the establishment of a web-based Cancer Epidemiology Descriptive Cohort Database (CEDCD). The CEDCD’s goals are to enhance awareness of resources, facilitate interdisciplinary research collaborations, and support existing cohorts for the study of cancer-related outcomes. Methods Comprehensive descriptive data were collected from large cohorts established to study cancer as primary outcome using a newly developed questionnaire. These included an inventory of baseline and follow-up data, biospecimens, genomics, policies, and protocols. Additional descriptive data extracted from publicly available sources were also collected. This information was entered in a searchable and publicly accessible database. We summarized the descriptive data across cohorts and reported the characteristics of this resource. Results As of December 2015, the CEDCD includes data from 46 cohorts representing more than 6.5 million individuals (29% ethnic/racial minorities). Overall, 78% of the cohorts have collected blood at least once, 57% at multiple time points, and 46% collected tissue samples. Genotyping has been performed by 67% of the cohorts, while 46% have performed whole-genome or exome sequencing in subsets of enrolled individuals. Information on medical conditions other than cancer has been collected in more than 50% of the cohorts. More than 600,000 incident cancer cases and more than 40,000 prevalent cases are reported, with 24 cancer sites represented. Conclusions The CEDCD assembles detailed descriptive information on a large number of cancer cohorts in a searchable database. Impact Information from the CEDCD may assist the interdisciplinary research community by facilitating identification of well-established population resources and large-scale collaborative and integrative research. PMID:27439404

  1. Preliminary United States-Mexico border watershed analysis, twin cities area of Nogales, Arizona and Nogales, Sonora

    USGS Publications Warehouse

    Brady, Laura Margaret; Gray, Floyd; Castaneda, Mario; Bultman, Mark; Bolm, Karen Sue

    2002-01-01

    The United States - Mexico border area faces the challenge of integrating aspects of its binational physical boundaries to form a unified or, at least, compatible natural resource management plan. Specified geospatial components such as stream drainages, mineral occurrences, vegetation, wildlife, and land-use can be analyzed in terms of their overlapping impacts upon one another. Watersheds have been utilized as a basic unit in resource analysis because they contain components that are interrelated and can be viewed as a single interactive ecological system. In developing and analyzing critical regional natural resource databases, the Environmental Protection Agency (EPA) and other federal and non-governmental agencies have adopted a "watershed by watershed" approach to dealing with such complicated issues as ecosystem health, natural resource use, urban growth, and pollutant transport within hydrologic systems. These watersheds can facilitate the delineation of both large scale and locally important hydrologic systems and urban management parameters necessary for sustainable, diversified land-use. The twin border cities area of Nogales, Sonora and Nogales, Arizona, provides the ideal setting to demonstrate the utility and application of a complete, cross-border, geographic information systems (GIS) based, watershed analysis in the characterization of a wide range of natural resource as well as urban features and their interactions. In addition to the delineation of a unified, cross-border watershed, the database contains sewer/water line locations and status, well locations, geology, hydrology, topography, soils, geomorphology, and vegetation data, as well as remotely sensed imagery. This report is preliminary and part of an ongoing project to develop a GIS database that will be widely accessible to the general public, researchers, and the local land management community with a broad range of application and utility.

  2. Mediator infrastructure for information integration and semantic data integration environment for biomedical research.

    PubMed

    Grethe, Jeffrey S; Ross, Edward; Little, David; Sanders, Brian; Gupta, Amarnath; Astakhov, Vadim

    2009-01-01

    This paper presents current progress in the development of a semantic data integration environment that is part of the Biomedical Informatics Research Network (BIRN; http://www.nbirn.net) project. BIRN is sponsored by the National Center for Research Resources (NCRR), a component of the National Institutes of Health (NIH). A goal is the development of a cyberinfrastructure for biomedical research that supports advanced data acquisition, data storage, data management, data integration, data mining, data visualization, and other computing and information processing services over the Internet. Each participating institution maintains storage of its experimental or computationally derived data. A mediator-based data integration system performs semantic integration over these databases to enable researchers to perform analyses based on larger and broader datasets than would be available from any single institution's data. This paper describes a recent revision of the system architecture, implementation, and capabilities of this semantically based data integration environment for BIRN.

  3. SGDB: a database of synthetic genes re-designed for optimizing protein over-expression.

    PubMed

    Wu, Gang; Zheng, Yuanpu; Qureshi, Imran; Zin, Htar Thant; Beck, Tyler; Bulka, Blazej; Freeland, Stephen J

    2007-01-01

    Here we present the Synthetic Gene Database (SGDB): a relational database that houses sequences and associated experimental information on synthetic (artificially engineered) genes from all peer-reviewed studies published to date. At present, the database comprises information from more than 200 published experiments. This resource not only provides reference material to guide experimentalists in designing new genes that improve protein expression, but also offers a dataset for analysis by bioinformaticians who seek to test ideas regarding the underlying factors that influence gene expression. The SGDB was built on the MySQL database management system. We also offer an XML schema for standardized data description of synthetic genes. Users can access the database at http://www.evolvingcode.net/codon/sgdb/index.php, or batch-download all information as XML files. Moreover, users may visually compare the coding sequences of a synthetic gene and its natural counterpart with an integrated web tool at http://www.evolvingcode.net/codon/sgdb/aligner.php, and discuss questions, findings and related information on an associated e-forum at http://www.evolvingcode.net/forum/viewforum.php?f=27.
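
    To illustrate what consuming such an XML export might look like, the snippet below parses one synthetic-gene record with the standard library. The element names are assumptions made for the example; the authoritative structure is defined by the published XML schema.

      # Parse a hypothetical SGDB-style XML record (element names are illustrative).
      import xml.etree.ElementTree as ET

      RECORD = """<syntheticGene>
        <name>GFP_codon_optimized_Ecoli</name>
        <hostOrganism>Escherichia coli</hostOrganism>
        <expressionChange fold="4.2"/>
        <codingSequence>ATGAGTAAAGGAGAAGAACTTTTCACTGGA</codingSequence>
      </syntheticGene>"""

      gene = ET.fromstring(RECORD)
      print(gene.findtext("name"))
      print(gene.findtext("hostOrganism"))
      print(float(gene.find("expressionChange").get("fold")), "fold expression change")
      print(len(gene.findtext("codingSequence")), "nt of coding sequence")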

  4. Genotator: a disease-agnostic tool for genetic annotation of disease.

    PubMed

    Wall, Dennis P; Pivovarov, Rimma; Tong, Mark; Jung, Jae-Yoon; Fusaro, Vincent A; DeLuca, Todd F; Tonellato, Peter J

    2010-10-29

    Disease-specific genetic information has been increasing at rapid rates as a consequence of recent improvements and massive cost reductions in sequencing technologies. Numerous systems designed to capture and organize this mounting sea of genetic data have emerged, but these resources differ dramatically in their disease coverage and genetic depth. With few exceptions, researchers must manually search a variety of sites to assemble a complete set of genetic evidence for a particular disease of interest, a process that is both time-consuming and error-prone. We designed a real-time aggregation tool that provides both comprehensive coverage and reliable gene-to-disease rankings for any disease. Our tool, called Genotator, automatically integrates data from 11 externally accessible clinical genetics resources and uses these data in a straightforward formula to rank genes in order of disease relevance. We tested the accuracy of coverage of Genotator in three separate diseases for which there exist specialty curated databases: Autism Spectrum Disorder, Parkinson's Disease, and Alzheimer Disease. Genotator is freely available at http://genotator.hms.harvard.edu. Genotator demonstrated that most of the 11 selected databases contain unique information about the genetic composition of disease, with 2514 genes found in only one of the 11 databases. These findings confirm that the integration of these databases provides a more complete picture than would be possible from any one database alone. Genotator successfully identified at least 75% of the top ranked genes for all three of our use cases, including a 90% concordance with the top 40 ranked candidates for Alzheimer Disease. As a meta-query engine, Genotator provides high coverage of both historical genetic research as well as recent advances in the genetic understanding of specific diseases. As such, Genotator provides a real-time aggregation of ranked data that remains current with the pace of research in the disease fields. Genotator's algorithm appropriately transforms query terms to match the input requirements of each targeted database and accurately resolves named synonyms to ensure full coverage of the genetic results with official nomenclature. Genotator generates an Excel-style output that is consistent across disease queries and readily importable to other applications.
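
    Since the abstract does not give the ranking formula, the sketch below shows one simple stand-in: rank genes by how many independent resources report them for the queried disease. The resource names and gene lists are invented, and this is not Genotator's actual scoring scheme.

      # Rank genes by the number of resources that report them (illustrative only).
      from collections import Counter

      # hypothetical query results: resource name -> genes returned for a disease
      hits_by_resource = {
          "resource_A": {"SHANK3", "NLGN3", "NRXN1"},
          "resource_B": {"SHANK3", "CNTNAP2"},
          "resource_C": {"SHANK3", "NRXN1", "MECP2"},
      }

      score = Counter()
      for genes in hits_by_resource.values():
          score.update(genes)                      # +1 per resource reporting the gene

      for gene, n in score.most_common():
          print(f"{gene}\t{n} of {len(hits_by_resource)} resources")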

  5. XML-based information system for planetary sciences

    NASA Astrophysics Data System (ADS)

    Carraro, F.; Fonte, S.; Turrini, D.

    2009-04-01

    EuroPlaNet (EPN in the following) has been developed by the planetological community under the "Sixth Framework Programme" (FP6 in the following), the European programme devoted to the improvement of European research efforts through the creation of an internal market for science and technology. The goal of the EPN programme is the creation of a European network aimed at the diffusion of data produced by space missions dedicated to the study of the Solar System. A special place within the EPN programme is held by I.D.I.S. (Integrated and Distributed Information Service). The main goal of IDIS is to offer the planetary science community user-friendly access to the data and information produced by the various types of research activities, i.e. Earth-based observations, space observations, modeling, theory and laboratory experiments. During the FP6 programme, IDIS development consisted in the creation of a series of thematic nodes, each of them specialized in a specific scientific domain, and a technical coordination node. The four thematic nodes are the Atmosphere node, the Plasma node, the Interiors & Surfaces node and the Small Bodies & Dust node. The main task of the nodes has been the building up of selected scientific cases related to the scientific domain of each node. The second task of the EPN nodes has been the creation of a catalogue of resources related to their main scientific theme. Both these efforts have been used as the basis for the development of the main IDIS goal, i.e. the integrated distributed service. An XML-based data model has been developed to describe resources using metadata and to store the metadata within an XML-based database called eXist. A search engine has then been developed in order to allow users to search for resources within the database. Users can select the resource type and can insert one or more values, or choose a value from a list, depending on the selected resource. The system searches for all the resources containing the inserted values within the resource descriptions. An important facility of the IDIS search system is its multi-node search capability, made possible by the capacity of eXist to run queries on remote databases: the system shows all resources that satisfy the search criteria on the local node, reports how many resources are found on remote nodes, and gives a link to open the results page on the remote nodes. During FP7, the development of the IDIS system will have the main goal of making the service Virtual Observatory compliant.
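
    A hedged sketch of how such XML resource descriptions might be queried through eXist's REST interface is given below. The host, collection path, and metadata element names are placeholders invented for the example, not the actual IDIS layout.

      # Query resource metadata in an eXist-db collection via its REST interface.
      # The URL, collection and element names are assumptions for this sketch.
      import requests

      EXIST_URL = "http://localhost:8080/exist/rest/db/idis"   # placeholder host and collection
      XQUERY = """
      for $r in //resource[type = 'observation']
      return <hit>{$r/title/text()}</hit>
      """

      response = requests.get(EXIST_URL, params={"_query": XQUERY}, timeout=30)
      response.raise_for_status()
      print(response.text)      # eXist wraps matching <hit> elements in its result envelope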

  6. MalaCards: an integrated compendium for diseases and their annotation

    PubMed Central

    Rappaport, Noa; Nativ, Noam; Stelzer, Gil; Twik, Michal; Guan-Golan, Yaron; Iny Stein, Tsippi; Bahir, Iris; Belinky, Frida; Morrey, C. Paul; Safran, Marilyn; Lancet, Doron

    2013-01-01

    Comprehensive disease classification, integration and annotation are crucial for biomedical discovery. At present, disease compilation is incomplete, heterogeneous and often lacking systematic inquiry mechanisms. We introduce MalaCards, an integrated database of human maladies and their annotations, modeled on the architecture and strategy of the GeneCards database of human genes. MalaCards mines and merges 44 data sources to generate a computerized card for each of 16 919 human diseases. Each MalaCard contains disease-specific prioritized annotations, as well as inter-disease connections, empowered by the GeneCards relational database, its searches and GeneDecks set analyses. First, we generate a disease list from 15 ranked sources, using disease-name unification heuristics. Next, we use four schemes to populate MalaCards sections: (i) directly interrogating disease resources, to establish integrated disease names, synonyms, summaries, drugs/therapeutics, clinical features, genetic tests and anatomical context; (ii) searching GeneCards for related publications, and for associated genes with corresponding relevance scores; (iii) analyzing disease-associated gene sets in GeneDecks to yield affiliated pathways, phenotypes, compounds and GO terms, sorted by a composite relevance score and presented with GeneCards links; and (iv) searching within MalaCards itself, e.g. for additional related diseases and anatomical context. The latter forms the basis for the construction of a disease network, based on shared MalaCards annotations, embodying associations based on etiology, clinical features and clinical conditions. This broadly disposed network has a power-law degree distribution, suggesting that this might be an inherent property of such networks. Work in progress includes hierarchical malady classification, ontological mapping and disease set analyses, striving to make MalaCards an even more effective tool for biomedical research. Database URL: http://www.malacards.org/ PMID:23584832

  7. U.S. Army Research Laboratory (ARL) multimodal signatures database

    NASA Astrophysics Data System (ADS)

    Bennett, Kelly

    2008-04-01

    The U.S. Army Research Laboratory (ARL) Multimodal Signatures Database (MMSDB) is a centralized collection of sensor data of various modalities that are co-located and co-registered. The signatures include ground and air vehicles, personnel, mortar, artillery, small arms gunfire from potential sniper weapons, explosives, and many other high value targets. This data is made available to Department of Defense (DoD) and DoD contractors, Intel agencies, other government agencies (OGA), and academia for use in developing target detection, tracking, and classification algorithms and systems to protect our Soldiers. A platform independent Web interface disseminates the signatures to researchers and engineers within the scientific community. Hierarchical Data Format 5 (HDF5) signature models provide an excellent solution for the sharing of complex multimodal signature data for algorithmic development and database requirements. Many open source tools for viewing and plotting HDF5 signatures are available over the Web. Seamless integration of HDF5 signatures is possible in both proprietary computational environments, such as MATLAB, and Free and Open Source Software (FOSS) computational environments, such as Octave and Python, for performing signal processing, analysis, and algorithm development. Future developments include extending the Web interface into a portal system for accessing ARL algorithms and signatures, High Performance Computing (HPC) resources, and integrating existing database and signature architectures into sensor networking environments.
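
    The snippet below shows the kind of HDF5 access the abstract refers to, using h5py to write a tiny stand-in signature file and then walk and read it back. The group, dataset and attribute names are hypothetical; real MMSDB files define their own layout.

      import h5py
      import numpy as np

      # Create a tiny stand-in file so the sketch is self-contained.
      with h5py.File("signature_example.h5", "w") as f:
          wave = f.create_dataset("acoustic/waveform",
                                  data=np.sin(np.linspace(0, 6.28, 1000)))
          wave.attrs["sample_rate_hz"] = 1000

      # Walk the file, print every dataset, then read one channel back.
      with h5py.File("signature_example.h5", "r") as f:
          f.visititems(lambda name, obj: print(name, obj.shape, obj.dtype)
                       if isinstance(obj, h5py.Dataset) else None)
          samples = f["acoustic/waveform"][...]
          rate = f["acoustic/waveform"].attrs["sample_rate_hz"]
          print(f"{samples.size} samples at {rate} Hz")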

  8. CPLA 1.0: an integrated database of protein lysine acetylation.

    PubMed

    Liu, Zexian; Cao, Jun; Gao, Xinjiao; Zhou, Yanhong; Wen, Longping; Yang, Xiangjiao; Yao, Xuebiao; Ren, Jian; Xue, Yu

    2011-01-01

    As a reversible post-translational modification (PTM) discovered decades ago, protein lysine acetylation was known for its regulation of transcription through the modification of histones. Recent studies discovered that lysine acetylation targets broad substrates and especially plays an essential role in cellular metabolic regulation. Although acetylation is comparable with other major PTMs such as phosphorylation, an integrated resource still remains to be developed. In this work, we present the compendium of protein lysine acetylation (CPLA) database for lysine acetylated substrates with their sites. From the scientific literature, we manually collected 7151 experimentally identified acetylation sites in 3311 targets. We statistically studied the regulatory roles of lysine acetylation by analyzing the Gene Ontology (GO) and InterPro annotations. Combined with protein-protein interaction information, we systematically discovered a potential human lysine acetylation network (HLAN) among histone acetyltransferases (HATs), substrates and histone deacetylases (HDACs). In particular, there are 1862 triplet relationships of HAT-substrate-HDAC retrieved from the HLAN, at least 13 of which were previously experimentally verified. The online services of the CPLA database were implemented in PHP + MySQL + JavaScript, while the local packages were developed in Java 1.5 (J2SE 5.0). The CPLA database is freely available for all users at: http://cpla.biocuckoo.org.

  9. CPLA 1.0: an integrated database of protein lysine acetylation

    PubMed Central

    Liu, Zexian; Cao, Jun; Gao, Xinjiao; Zhou, Yanhong; Wen, Longping; Yang, Xiangjiao; Yao, Xuebiao; Ren, Jian; Xue, Yu

    2011-01-01

    As a reversible post-translational modification (PTM) discovered decades ago, protein lysine acetylation was known for its regulation of transcription through the modification of histones. Recent studies discovered that lysine acetylation targets broad substrates and especially plays an essential role in cellular metabolic regulation. Although acetylation is comparable with other major PTMs such as phosphorylation, an integrated resource still remains to be developed. In this work, we present the compendium of protein lysine acetylation (CPLA) database for lysine acetylated substrates with their sites. From the scientific literature, we manually collected 7151 experimentally identified acetylation sites in 3311 targets. We statistically studied the regulatory roles of lysine acetylation by analyzing the Gene Ontology (GO) and InterPro annotations. Combined with protein–protein interaction information, we systematically discovered a potential human lysine acetylation network (HLAN) among histone acetyltransferases (HATs), substrates and histone deacetylases (HDACs). In particular, there are 1862 triplet relationships of HAT-substrate-HDAC retrieved from the HLAN, at least 13 of which were previously experimentally verified. The online services of the CPLA database were implemented in PHP + MySQL + JavaScript, while the local packages were developed in Java 1.5 (J2SE 5.0). The CPLA database is freely available for all users at: http://cpla.biocuckoo.org. PMID:21059677

  10. search GenBank: interactive orchestration and ad-hoc choreography of Web services in the exploration of the biomedical resources of the National Center For Biotechnology Information

    PubMed Central

    2013-01-01

    Background Due to the growing number of biomedical entries in data repositories of the National Center for Biotechnology Information (NCBI), it is difficult to collect, manage and process all of these entries in one place by third-party software developers without significant investment in hardware and software infrastructure, its maintenance and administration. Web services allow development of software applications that integrate in one place the functionality and processing logic of distributed software components, without integrating the components themselves and without integrating the resources to which they have access. This is achieved by appropriate orchestration or choreography of available Web services and their shared functions. After the successful application of Web services in the business sector, this technology can now be used to build composite software tools that are oriented towards biomedical data processing. Results We have developed a new tool for efficient and dynamic data exploration in GenBank and other NCBI databases. A dedicated search GenBank system makes use of NCBI Web services and a package of Entrez Programming Utilities (eUtils) in order to provide extended searching capabilities in NCBI data repositories. In search GenBank users can use one of the three exploration paths: simple data searching based on the specified user’s query, advanced data searching based on the specified user’s query, and advanced data exploration with the use of macros. search GenBank orchestrates calls of particular tools available through the NCBI Web service providing requested functionality, while users interactively browse selected records in search GenBank and traverse between NCBI databases using available links. On the other hand, by building macros in the advanced data exploration mode, users create choreographies of eUtils calls, which can lead to the automatic discovery of related data in the specified databases. Conclusions search GenBank extends standard capabilities of the NCBI Entrez search engine in querying biomedical databases. The possibility of creating and saving macros in the search GenBank is a unique feature and has a great potential. The potential will further grow in the future with the increasing density of networks of relationships between data stored in particular databases. search GenBank is available for public use at http://sgb.biotools.pl/. PMID:23452691
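
    Behind interfaces like this one sit chains of Entrez Programming Utilities calls. The minimal sketch below performs such a chain manually, an ESearch followed by an EFetch against the public eUtils endpoints; the query term is only an example, and the sketch does not reproduce search GenBank's own orchestration logic.

      # Manually chain two eUtils calls: ESearch (find record IDs) then EFetch (retrieve FASTA).
      import requests

      EUTILS = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils"

      search = requests.get(f"{EUTILS}/esearch.fcgi",
                            params={"db": "nucleotide",
                                    "term": "Danio rerio shha mRNA",
                                    "retmax": 3, "retmode": "json"},
                            timeout=30).json()
      ids = search["esearchresult"]["idlist"]

      fasta = requests.get(f"{EUTILS}/efetch.fcgi",
                           params={"db": "nucleotide", "id": ",".join(ids),
                                   "rettype": "fasta", "retmode": "text"},
                           timeout=30).text
      print(fasta[:500])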

  11. search GenBank: interactive orchestration and ad-hoc choreography of Web services in the exploration of the biomedical resources of the National Center For Biotechnology Information.

    PubMed

    Mrozek, Dariusz; Małysiak-Mrozek, Bożena; Siążnik, Artur

    2013-03-01

    Due to the growing number of biomedical entries in data repositories of the National Center for Biotechnology Information (NCBI), it is difficult to collect, manage and process all of these entries in one place by third-party software developers without significant investment in hardware and software infrastructure, its maintenance and administration. Web services allow development of software applications that integrate in one place the functionality and processing logic of distributed software components, without integrating the components themselves and without integrating the resources to which they have access. This is achieved by appropriate orchestration or choreography of available Web services and their shared functions. After the successful application of Web services in the business sector, this technology can now be used to build composite software tools that are oriented towards biomedical data processing. We have developed a new tool for efficient and dynamic data exploration in GenBank and other NCBI databases. A dedicated search GenBank system makes use of NCBI Web services and a package of Entrez Programming Utilities (eUtils) in order to provide extended searching capabilities in NCBI data repositories. In search GenBank users can use one of the three exploration paths: simple data searching based on the specified user's query, advanced data searching based on the specified user's query, and advanced data exploration with the use of macros. search GenBank orchestrates calls of particular tools available through the NCBI Web service providing requested functionality, while users interactively browse selected records in search GenBank and traverse between NCBI databases using available links. On the other hand, by building macros in the advanced data exploration mode, users create choreographies of eUtils calls, which can lead to the automatic discovery of related data in the specified databases. search GenBank extends standard capabilities of the NCBI Entrez search engine in querying biomedical databases. The possibility of creating and saving macros in the search GenBank is a unique feature and has a great potential. The potential will further grow in the future with the increasing density of networks of relationships between data stored in particular databases. search GenBank is available for public use at http://sgb.biotools.pl/.

  12. SNPchiMp v.3: integrating and standardizing single nucleotide polymorphism data for livestock species.

    PubMed

    Nicolazzi, Ezequiel L; Caprera, Andrea; Nazzicari, Nelson; Cozzi, Paolo; Strozzi, Francesco; Lawley, Cindy; Pirani, Ali; Soans, Chandrasen; Brew, Fiona; Jorjani, Hossein; Evans, Gary; Simpson, Barry; Tosser-Klopp, Gwenola; Brauning, Rudiger; Williams, John L; Stella, Alessandra

    2015-04-10

    In recent years, the use of genomic information in livestock species for genetic improvement, association studies and many other fields has become routine. In order to accommodate different market requirements in terms of genotyping cost, manufacturers of single nucleotide polymorphism (SNP) arrays, private companies and international consortia have developed a large number of arrays with different content and different SNP density. The number of currently available SNP arrays differs among species: ranging from one for goats to more than ten for cattle, and the number of arrays available is increasing rapidly. However, there is limited or no effort to standardize and integrate array- specific (e.g. SNP IDs, allele coding) and species-specific (i.e. past and current assemblies) SNP information. Here we present SNPchiMp v.3, a solution to these issues for the six major livestock species (cow, pig, horse, sheep, goat and chicken). Original data was collected directly from SNP array producers and specific international genome consortia, and stored in a MySQL database. The database was then linked to an open-access web tool and to public databases. SNPchiMp v.3 ensures fast access to the database (retrieving within/across SNP array data) and the possibility of annotating SNP array data in a user-friendly fashion. This platform allows easy integration and standardization, and it is aimed at both industry and research. It also enables users to easily link the information available from the array producer with data in public databases, without the need of additional bioinformatics tools or pipelines. In recognition of the open-access use of Ensembl resources, SNPchiMp v.3 was officially credited as an Ensembl E!mpowered tool. Availability at http://bioinformatics.tecnoparco.org/SNPchimp.

  13. Semantic Web repositories for genomics data using the eXframe platform.

    PubMed

    Merrill, Emily; Corlosquet, Stéphane; Ciccarese, Paolo; Clark, Tim; Das, Sudeshna

    2014-01-01

    With the advent of inexpensive assay technologies, there has been an unprecedented growth in genomics data as well as the number of databases in which it is stored. In these databases, sample annotation using ontologies and controlled vocabularies is becoming more common. However, the annotation is rarely available as Linked Data, in a machine-readable format, or for standardized queries using SPARQL. This makes large-scale reuse, or integration with other knowledge bases very difficult. To address this challenge, we have developed the second generation of our eXframe platform, a reusable framework for creating online repositories of genomics experiments. This second generation model now publishes Semantic Web data. To accomplish this, we created an experiment model that covers provenance, citations, external links, assays, biomaterials used in the experiment, and the data collected during the process. The elements of our model are mapped to classes and properties from various established biomedical ontologies. Resource Description Framework (RDF) data is automatically produced using these mappings and indexed in an RDF store with a built-in Sparql Protocol and RDF Query Language (SPARQL) endpoint. Using the open-source eXframe software, institutions and laboratories can create Semantic Web repositories of their experiments, integrate it with heterogeneous resources and make it interoperable with the vast Semantic Web of biomedical knowledge.
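
    The sketch below illustrates the kind of query such a Semantic Web repository enables, using rdflib on a tiny local graph that links an experiment to an assay and its biomaterial. The namespace and predicate names are placeholders, not eXframe's actual ontology mappings; against a live repository the same SPARQL would be sent to its endpoint instead.

      # Build a tiny placeholder graph and run a SPARQL query over it with rdflib.
      from rdflib import Graph, Namespace, Literal

      EX = Namespace("http://example.org/exframe/")
      g = Graph()
      g.add((EX.exp1, EX.hasAssay, EX.assay1))
      g.add((EX.assay1, EX.usedBiomaterial, EX.sampleA))
      g.add((EX.sampleA, EX.label, Literal("wild-type retina, day 5")))

      results = g.query("""
      PREFIX ex: <http://example.org/exframe/>
      SELECT ?assay ?label WHERE {
        ?exp ex:hasAssay ?assay .
        ?assay ex:usedBiomaterial ?sample .
        ?sample ex:label ?label .
      }""")
      for assay, label in results:
          print(assay, label)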

  14. The Eimeria Transcript DB: an integrated resource for annotated transcripts of protozoan parasites of the genus Eimeria

    PubMed Central

    Rangel, Luiz Thibério; Novaes, Jeniffer; Durham, Alan M.; Madeira, Alda Maria B. N.; Gruber, Arthur

    2013-01-01

    Parasites of the genus Eimeria infect a wide range of vertebrate hosts, including chickens. We have recently reported a comparative analysis of the transcriptomes of Eimeria acervulina, Eimeria maxima and Eimeria tenella, integrating ORESTES data produced by our group and publicly available Expressed Sequence Tags (ESTs). All cDNA reads have been assembled, and the reconstructed transcripts have been submitted to a comprehensive functional annotation pipeline. Additional studies included orthology assignment across apicomplexan parasites and clustering analyses of gene expression profiles among different developmental stages of the parasites. To make all this body of information publicly available, we constructed the Eimeria Transcript Database (EimeriaTDB), a web repository that provides access to sequence data, annotation and comparative analyses. Here, we describe the web interface, available sequence data sets and query tools implemented on the site. The main goal of this work is to offer a public repository of sequence and functional annotation data of reconstructed transcripts of parasites of the genus Eimeria. We believe that EimeriaTDB will represent a valuable and complementary resource for the Eimeria scientific community and for those researchers interested in comparative genomics of apicomplexan parasites. Database URL: http://www.coccidia.icb.usp.br/eimeriatdb/ PMID:23411718

  15. microRNAs Databases: Developmental Methodologies, Structural and Functional Annotations.

    PubMed

    Singh, Nagendra Kumar

    2017-09-01

    microRNA (miRNA) is an endogenous and evolutionarily conserved non-coding RNA involved in post-transcriptional regulation, acting as a gene repressor and directing mRNA cleavage through formation of the RNA-induced silencing complex (RISC). In the RISC, miRNA base-pairs with its target mRNA in complex with Argonaute proteins, causing gene repression or endonucleolytic cleavage of the mRNA; dysregulation of this process underlies many diseases and syndromes. After the discovery of the miRNAs lin-4 and let-7, large numbers of miRNAs involved in various biological and metabolic processes were identified by low-throughput and high-throughput experimental techniques together with computational approaches. miRNAs are important non-coding RNAs for understanding the complex biological phenomena of organisms because they control gene regulation. This paper reviews miRNA databases, with their structural and functional annotations, developed by various researchers. These databases contain structural and functional information on animal, plant and virus miRNAs, including miRNA-associated diseases, stress resistance in plants, the biological processes in which miRNAs take part, the effects of miRNA interactions on drugs and the environment, the effects of variants on miRNAs, miRNA gene expression analyses, and miRNA sequences and structures. The review focuses on the developmental methodology of miRNA databases, such as the computational tools and methods used to extract miRNA annotation from different resources or through experiments. It also discusses the user interface design of each database along with its current entries and annotations (pathways, gene ontology, disease ontology, etc.). An integrated schematic diagram of the database construction process is drawn, along with tabular and graphical comparisons of the types of entries in different databases. The aim of this paper is to present the importance of miRNA-related resources in a single place.

  16. BioCarian: search engine for exploratory searches in heterogeneous biological databases.

    PubMed

    Zaki, Nazar; Tennakoon, Chandana

    2017-10-02

    A large number of biological databases are publicly available to scientists on the web, and many private databases are generated in the course of research projects. These databases come in a wide variety of formats. Web standards have evolved in recent years, and semantic web technologies are now available to interconnect diverse and heterogeneous sources of data. Integration and querying of biological databases can therefore be facilitated by semantic web techniques: heterogeneous databases can be converted into the Resource Description Framework (RDF) and queried using the SPARQL language. Searching for exact matches in these databases is trivial, but exploratory searches need customized solutions, especially when multiple databases are involved; this process is cumbersome and time consuming for those without a sufficient background in computer science. In this context, a search engine facilitating exploratory searches of databases would be of great help to the scientific community. We present BioCarian, an efficient and user-friendly search engine for performing exploratory searches on biological databases. The search engine is an interface for SPARQL queries over RDF databases. We note that many databases can be converted to tabular form, so we first convert the tabular databases to RDF. The search engine provides a graphical interface based on facets to explore the converted databases. The facet interface is more advanced than conventional facets: it allows complex queries to be constructed and has additional features, such as ranking of facet values based on several criteria, visually indicating the relevance of a facet value, and presenting the most important facet values when a large number of choices are available. Advanced users can run SPARQL queries directly on the databases and thereby incorporate federated searches of SPARQL endpoints. We used the search engine to do an exploratory search on previously published viral integration data and were able to deduce the main conclusions of the original publication. BioCarian is accessible via http://www.biocarian.com. We have developed a search engine to explore RDF databases that can be used by both novice and advanced users.
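
    As a conceptual sketch of the tabular-to-RDF conversion step described above, the Python fragment below turns rows of a hypothetical CSV file into RDF triples with rdflib. The namespace, file name, and column-to-property mapping are invented for illustration and are not BioCarian's actual conversion code.

        import csv
        from rdflib import Graph, Literal, Namespace
        from rdflib.namespace import RDF

        EX = Namespace("http://example.org/biodata/")  # invented namespace
        g = Graph()

        with open("integration_sites.csv", newline="") as handle:  # hypothetical input table
            for row in csv.DictReader(handle):
                site = EX[f"site/{row['site_id']}"]
                g.add((site, RDF.type, EX.IntegrationSite))
                g.add((site, EX.chromosome, Literal(row["chromosome"])))
                g.add((site, EX.position, Literal(int(row["position"]))))
                g.add((site, EX.nearestGene, Literal(row["gene"])))

        print(g.serialize(format="turtle"))  # serialized triples, ready to load into an RDF store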

  17. ALDB: a domestic-animal long noncoding RNA database.

    PubMed

    Li, Aimin; Zhang, Junying; Zhou, Zhongyin; Wang, Lei; Liu, Yujuan; Liu, Yajun

    2015-01-01

    Long noncoding RNAs (lncRNAs) have attracted significant attention in recent years due to their important roles in many biological processes. Domestic animals constitute a unique resource for understanding the genetic basis of phenotypic variation and are ideal models relevant to diverse areas of biomedical research. With improving sequencing technologies, numerous domestic-animal lncRNAs are now available. Thus, there is an immediate need for a database resource that can assist researchers to store, organize, analyze and visualize domestic-animal lncRNAs. The domestic-animal lncRNA database, named ALDB, is the first comprehensive database with a focus on the domestic-animal lncRNAs. It currently archives 12,103 pig intergenic lncRNAs (lincRNAs), 8,923 chicken lincRNAs and 8,250 cow lincRNAs. In addition to the annotations of lincRNAs, it offers related data that is not available yet in existing lncRNA databases (lncRNAdb and NONCODE), such as genome-wide expression profiles and animal quantitative trait loci (QTLs) of domestic animals. Moreover, a collection of interfaces and applications, such as the Basic Local Alignment Search Tool (BLAST), the Generic Genome Browser (GBrowse) and flexible search functionalities, are available to help users effectively explore, analyze and download data related to domestic-animal lncRNAs. ALDB enables the exploration and comparative analysis of lncRNAs in domestic animals. A user-friendly web interface, integrated information and tools make it valuable to researchers in their studies. ALDB is freely available from http://res.xaut.edu.cn/aldb/index.jsp.

  18. Minimal Information for Neural Electromagnetic Ontologies (MINEMO): A standards-compliant method for analysis and integration of event-related potentials (ERP) data

    PubMed Central

    Frishkoff, Gwen; Sydes, Jason; Mueller, Kurt; Frank, Robert; Curran, Tim; Connolly, John; Kilborn, Kerry; Molfese, Dennis; Perfetti, Charles; Malony, Allen

    2011-01-01

    We present MINEMO (Minimal Information for Neural ElectroMagnetic Ontologies), a checklist for the description of event-related potentials (ERP) studies. MINEMO extends MINI (Minimal Information for Neuroscience Investigations) to the ERP domain. Checklist terms are explicated in NEMO, a formal ontology that is designed to support ERP data sharing and integration. MINEMO is also linked to an ERP database and web application (the NEMO portal). Users upload their data and enter MINEMO information through the portal. The database then stores these entries in RDF (Resource Description Framework), along with summary metrics, i.e., spatial and temporal metadata. Together these spatial, temporal, and functional metadata provide a complete description of ERP data and the context in which these data were acquired. The RDF files then serve as inputs to ontology-based labeling and meta-analysis. Our ultimate goal is to represent ERPs using a rich semantic structure, so results can be queried at multiple levels, to stimulate novel hypotheses and to promote a high-level, integrative account of ERP results across diverse study methods and paradigms. PMID:22180824

  19. Gramene 2016: comparative plant genomics and pathway resources

    PubMed Central

    Tello-Ruiz, Marcela K.; Stein, Joshua; Wei, Sharon; Preece, Justin; Olson, Andrew; Naithani, Sushma; Amarasinghe, Vindhya; Dharmawardhana, Palitha; Jiao, Yinping; Mulvaney, Joseph; Kumari, Sunita; Chougule, Kapeel; Elser, Justin; Wang, Bo; Thomason, James; Bolser, Daniel M.; Kerhornou, Arnaud; Walts, Brandon; Fonseca, Nuno A.; Huerta, Laura; Keays, Maria; Tang, Y. Amy; Parkinson, Helen; Fabregat, Antonio; McKay, Sheldon; Weiser, Joel; D'Eustachio, Peter; Stein, Lincoln; Petryszak, Robert; Kersey, Paul J.; Jaiswal, Pankaj; Ware, Doreen

    2016-01-01

    Gramene (http://www.gramene.org) is an online resource for comparative functional genomics in crops and model plant species. Its two main frameworks are genomes (collaboration with Ensembl Plants) and pathways (The Plant Reactome and archival BioCyc databases). Since our last NAR update, the database website adopted a new Drupal management platform. The genomes section features 39 fully assembled reference genomes that are integrated using ontology-based annotation and comparative analyses, and accessed through both visual and programmatic interfaces. Additional community data, such as genetic variation, expression and methylation, are also mapped for a subset of genomes. The Plant Reactome pathway portal (http://plantreactome.gramene.org) provides a reference resource for analyzing plant metabolic and regulatory pathways. In addition to ∼200 curated rice reference pathways, the portal hosts gene homology-based pathway projections for 33 plant species. Both the genome and pathway browsers interface with the EMBL-EBI's Expression Atlas to enable the projection of baseline and differential expression data from curated expression studies in plants. Gramene's archive website (http://archive.gramene.org) continues to provide previously reported resources on comparative maps, markers and QTL. To further aid our users, we have also introduced a live monthly educational webinar series and a Gramene YouTube channel carrying video tutorials. PMID:26553803

  20. New tools and methods for direct programmatic access to the dbSNP relational database.

    PubMed

    Saccone, Scott F; Quan, Jiaxi; Mehta, Gaurang; Bolze, Raphael; Thomas, Prasanth; Deelman, Ewa; Tischfield, Jay A; Rice, John P

    2011-01-01

    Genome-wide association studies often incorporate information from public biological databases in order to provide a biological reference for interpreting the results. The dbSNP database is an extensive source of information on single nucleotide polymorphisms (SNPs) for many different organisms, including humans. We have developed free software that will download and install a local MySQL implementation of the dbSNP relational database for a specified organism. We have also designed a system for classifying dbSNP tables in terms of common tasks we wish to accomplish using the database. For each task we have designed a small set of custom tables that facilitate task-related queries and provide entity-relationship diagrams for each task composed from the relevant dbSNP tables. In order to expose these concepts and methods to a wider audience we have developed web tools for querying the database and browsing documentation on the tables and columns to clarify the relevant relational structure. All web tools and software are freely available to the public at http://cgsmd.isi.edu/dbsnpq. Resources such as these for programmatically querying biological databases are essential for viably integrating biological information into genetic association experiments on a genome-wide scale.
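
    To illustrate the task-oriented idea described above, the following Python sketch derives a small, query-friendly task table from a hypothetical local MySQL mirror of dbSNP. The source table, column names, and credentials are simplified placeholders rather than the real dbSNP schema or the authors' tools.

        import mysql.connector  # assumes the mysql-connector-python package is installed

        conn = mysql.connector.connect(
            host="localhost", user="reader", password="secret", database="dbsnp_local"
        )
        cur = conn.cursor()

        # One task: "where is each SNP on the current assembly?" collapsed into one flat table.
        cur.execute(
            """
            CREATE TABLE IF NOT EXISTS task_snp_position AS
            SELECT rs_id, chromosome, position, assembly_version
            FROM snp_contig_location            -- placeholder for the relevant dbSNP tables
            WHERE assembly_version = 'GRCh38'
            """
        )
        conn.commit()

        cur.execute("SELECT rs_id, chromosome, position FROM task_snp_position LIMIT 5")
        for rs_id, chromosome, position in cur.fetchall():
            print(rs_id, chromosome, position)

        cur.close()
        conn.close()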

  1. XML-based approaches for the integration of heterogeneous bio-molecular data.

    PubMed

    Mesiti, Marco; Jiménez-Ruiz, Ernesto; Sanz, Ismael; Berlanga-Llavori, Rafael; Perlasca, Paolo; Valentini, Giorgio; Manset, David

    2009-10-15

    Today's public database infrastructure spans a very large collection of heterogeneous biological data, opening new opportunities for molecular biology, biomedical and bioinformatics research, but also raising new problems for their integration and computational processing. In this paper we survey the most interesting and novel approaches for the representation, integration and management of different kinds of biological data by exploiting XML and the related recommendations and approaches. Moreover, we present new and interesting cutting-edge approaches for the appropriate management of heterogeneous biological data represented through XML. XML has succeeded in the integration of heterogeneous biomolecular information and has established itself as the syntactic glue for biological data sources. Nevertheless, a large variety of XML-based data formats have been proposed, making effective integration of bioinformatics data schemes difficult. The adoption of a few semantically rich standard formats is urgently needed to achieve seamless integration of the current biological resources.
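
    As a minimal illustration of the kind of XML-based integration surveyed here, the Python fragment below parses two records about the same protein expressed in different, invented XML dialects and merges them on a shared identifier.

        import xml.etree.ElementTree as ET

        # Two invented XML dialects describing the same protein.
        SOURCE_A = '<protein accession="P12345"><name>Example kinase</name></protein>'
        SOURCE_B = '<entry id="P12345"><structure pdb="1ABC"/><go term="GO:0004672"/></entry>'

        record = {}

        a = ET.fromstring(SOURCE_A)
        record["accession"] = a.get("accession")
        record["name"] = a.findtext("name")

        b = ET.fromstring(SOURCE_B)
        if b.get("id") == record["accession"]:  # join the two sources on the shared identifier
            record["pdb"] = b.find("structure").get("pdb")
            record["go_terms"] = [g.get("term") for g in b.findall("go")]

        print(record)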

  2. Productive extension of semantic memory in school-aged children: Relations with reading comprehension and deployment of cognitive resources.

    PubMed

    Bauer, Patricia J; Blue, Shala N; Xu, Aoxiang; Esposito, Alena G

    2016-07-01

    We investigated 7- to 10-year-old children's productive extension of semantic memory through self-generation of new factual knowledge derived through integration of separate yet related facts learned through instruction or through reading. In Experiment 1, an experimenter read the to-be-integrated facts. Children successfully learned and integrated the information and used it to further extend their semantic knowledge, as evidenced by high levels of correct responses in open-ended and forced-choice testing. In Experiment 2, on half of the trials, the to-be-integrated facts were read by an experimenter (as in Experiment 1) and on half of the trials, children read the facts themselves. Self-generation performance was high in both conditions (experimenter- and self-read); in both conditions, self-generation of new semantic knowledge was related to an independent measure of children's reading comprehension. In Experiment 3, the way children deployed cognitive resources during reading was predictive of their subsequent recall of newly learned information derived through integration. These findings indicate self-generation of new semantic knowledge through integration in school-age children as well as relations between this productive means of extension of semantic memory and cognitive processes engaged during reading. (PsycINFO Database Record (c) 2016 APA, all rights reserved).

  3. Integrative medicine in head and neck cancer

    PubMed Central

    Matovina, Chloe; Birkeland, Andrew C.; Zick, Suzanna; Shuman, Andrew G.

    2017-01-01

    Objective Complementary and alternative medicine (CAM), or integrative medicine, has become increasingly popular among patients with head and neck cancer. Despite its increasing prevalence, many patients feel uncomfortable discussing such therapies with their physicians, and many physicians are unaware and underequipped to evaluate or discuss their use with patients. The aim of this manuscript is to use recent data to outline the decision-making inherent to integrative medicine utilization among patients with head and neck cancer, to discuss the ethical implications inherent to balancing integrative and conventional approaches to treatment, and to highlight available resources to enhance head and neck cancer providers’ understanding of integrative medicine. Data Sources Randomized controlled trials involving integrative medicine or CAM treatment for cancer patients. Review Methods Trials were drawn from a systematic PubMed database search categorized into cancer prevention, treatment, and symptom management. Conclusions Integrative medicine is gaining popularity for the management of cancer and is most commonly used for symptom management. A number of randomized controlled trials provide data to support integrative therapies, yet physicians who treat head and neck cancer may be faced with ethical dilemmas and practical barriers surrounding incorporation of integrative medicine. Implications for Practice In the management of head and neck cancer, there is an increasing demand for awareness of, dialogue about, and research evaluating integrative medicine therapies. It is important for otolaryngologists to become aware of integrative therapy options, their risks and benefits, and resources for further information in order to effectively counsel their patients. PMID:27729559

  4. Integrative Medicine in Head and Neck Cancer.

    PubMed

    Matovina, Chloe; Birkeland, Andrew C; Zick, Suzanna; Shuman, Andrew G

    2017-02-01

    Objective Complementary and alternative medicine, or integrative medicine, has become increasingly popular among patients with head and neck cancer. Despite its increasing prevalence, many patients feel uncomfortable discussing such therapies with their physicians, and many physicians are unaware and underequipped to evaluate or discuss their use with patients. The aim of this article is to use recent data to outline the decision making inherent to integrative medicine utilization among patients with head and neck cancer, to discuss the ethical implications inherent to balancing integrative and conventional approaches to treatment, and to highlight available resources to enhance head and neck cancer providers' understanding of integrative medicine. Data Sources Randomized controlled trials involving integrative medicine or complementary and alternative medicine treatment for cancer patients. Review Methods Trials were drawn from a systematic PubMed database search categorized into cancer prevention, treatment, and symptom management. Conclusions Integrative medicine is gaining popularity for the management of cancer and is most commonly used for symptom management. A number of randomized controlled trials provide data to support integrative therapies, yet physicians who treat head and neck cancer may be faced with ethical dilemmas and practical barriers surrounding incorporation of integrative medicine. Implications for Practice In the management of head and neck cancer, there is an increasing demand for awareness of, dialogue about, and research evaluating integrative medicine therapies. It is important for otolaryngologists to become aware of integrative therapy options, their risks and benefits, and resources for further information to effectively counsel their patients.

  5. The Computing and Data Grid Approach: Infrastructure for Distributed Science Applications

    NASA Technical Reports Server (NTRS)

    Johnston, William E.

    2002-01-01

    With the advent of Grids - infrastructure for using and managing widely distributed computing and data resources in the science environment - there is now an opportunity to provide a standard, large-scale computing, data, instrument, and collaboration environment for science that spans many different projects and provides the required infrastructure and services in a relatively uniform and supportable way. Grid technology has evolved over the past several years to provide the services and infrastructure needed for building 'virtual' systems and organizations. We argue that Grid technology provides an excellent basis for the creation of integrated environments that can combine the resources needed to support the large-scale science projects located at multiple laboratories and universities. We present some science case studies that indicate that a paradigm shift in the process of science will come about as a result of Grids providing transparent and secure access to advanced and integrated information and technology infrastructure: powerful computing systems, large-scale data archives, scientific instruments, and collaboration tools. These changes will be in the form of services that can be integrated with the user's work environment, and that enable uniform and highly capable access to these computers, data, and instruments, regardless of the location or exact nature of these resources. These services will integrate transient-use resources like computing systems, scientific instruments, and data caches (e.g., as they are needed to perform a simulation or analyze data from a single experiment); persistent-use resources, such as databases, data catalogues, and archives; and collaborators, whose involvement will continue for the lifetime of a project or longer. While we largely address large-scale science in this paper, Grids, particularly when combined with Web Services, will address a broad spectrum of science scenarios, both large and small scale.

  6. Medical Optimization Network for Space Telemedicine Resources

    NASA Technical Reports Server (NTRS)

    Rubin, D.; Shah, R. V.; Kerstman, E. L.; Reyes, D.; Mulcahy, R.; Antonsen, E.

    2017-01-01

    INTRODUCTION: Long-duration missions beyond low Earth orbit introduce new constraints to the space medical system. Beyond the traditional limitations in mass, power, and volume, consideration must be given to other factors such as the inability to evacuate to Earth, communication delays, and limitations in clinical skillsets. As NASA develops the medical system for an exploration mission, it must be able to evaluate the trade space of which resources will be most important. The Medical Optimization Network for Space Telemedicine Resources (MONSTR) was developed over the past year for this reason, and is now a system for managing data pertaining to medical resources and their relative importance when addressing medical conditions. METHODS: The MONSTR web application with a Microsoft SQL database backend was developed and made accessible to Tableau v9.3 for analysis and visualization. The database was initially populated with a list of medical conditions of concern for an exploration mission taken from the Integrated Medical Model (IMM), a probabilistic model designed to quantify in-flight medical risk. A team of physicians working within the Exploration Medical Capability Element of NASA's Human Research Program compiled a list of diagnostic and treatment medical resources required to address best- and worst-case scenarios of each medical condition using a terrestrial standard of care and entered these data into the system. This list included both tangible resources (e.g. medical equipment, medications) and intangible resources (e.g. clinical skills required to perform a procedure). The physician team then assigned criticality values to each instance of a resource, representing the importance of that resource to diagnosing or treating its associated condition(s). Medical condition probabilities of occurrence during a Mars mission were pulled from the IMM and imported into the MONSTR database for use within a resource criticality-weighting algorithm. DISCUSSION: The MONSTR tool is a novel approach to assessing the relative value of individual resources needed for the diagnosis and treatment of medical conditions. Future work will add resources for prevention and long-term care of these conditions. Once data collection is complete, MONSTR will provide the operational and research communities at NASA with information to support informed decisions regarding areas of research investment, future crew training, and medical supplies manifested as part of any exploration medical system.
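
    The abstract refers to a resource criticality-weighting algorithm that combines physician-assigned criticality values with condition probabilities from the IMM. The Python fragment below is a speculative sketch of one way such a weighting could be computed; the aggregation rule, the condition names, and all numbers are invented for illustration and are not taken from MONSTR.

        # Condition probabilities (per mission) and physician-assigned criticalities, all invented.
        condition_probability = {"condition A": 0.02, "condition B": 0.05}

        criticality = {  # (condition, resource) -> criticality on a 0-1 scale
            ("condition A", "ultrasound"): 0.9,
            ("condition A", "analgesics"): 0.8,
            ("condition B", "analgesics"): 0.6,
            ("condition B", "antibiotics"): 0.9,
        }

        # Score each resource by summing its criticality weighted by the condition probability.
        scores = {}
        for (condition, resource), weight in criticality.items():
            scores[resource] = scores.get(resource, 0.0) + weight * condition_probability[condition]

        for resource, score in sorted(scores.items(), key=lambda item: item[1], reverse=True):
            print(f"{resource}: {score:.3f}")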

  7. Abstract - Cooperative Research and Development Agreement between Environmental Defense Fund and National Energy Technology Laboratory

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Rose, Kelly K.; Zavala-Zraiza, Daniel

    Here, we summarize an effort to develop a global oil and gas infrastructure (GOGI) taxonomy and geodatabase, using a combination of big data computing, custom search and data integration algorithms, and expert-driven spatio-temporal analytics to identify, access, and evaluate open oil and gas data resources and uncertainty trends worldwide. This approach leveraged custom National Energy Technology Laboratory (NETL) tools and capabilities in collaboration with Environmental Defense Fund (EDF) and Carbon Limits subject matter expertise, to identify over 380 datasets and integrate more than 4.8 million features into the GOGI database. In addition to acquisition of open oil and gas infrastructure data, information was collected and analyzed to assess the spatial, temporal, and source quality of these resources, and estimate their completeness relative to the top 40 hydrocarbon producing and consuming countries.

  8. [Playful strategies for data collection with child cancer patients: an integrative review].

    PubMed

    Sposito, Amanda Mota Pacciulio; de Sparapani, Valéria Cássia; Pfeifer, Luzia Iara; de Lima, Regina Aparecida Garcia; Nascimento, Lucila Castanheira

    2013-09-01

    Children are the best sources of information on their experiences and opinions, and qualitative studies have favored the development and application of techniques that facilitate their self-expression and their approach to the researcher. Through an integrative literature review, the objective of this research was to identify playful resources used in qualitative research data collection with child cancer patients, and their forms of application. Systematized searches of electronic databases and a virtual library were undertaken which, combined with a non-systematized sample, totaled 15 studies spanning the period from 2000 to 2010. Drawing, toys, puppets, photography, and creativity and sensitivity dynamics were identified; whether or not they were combined with interviews, these resources were shown to directly or indirectly facilitate data collection, broadening the interaction with the children and allowing fuller expression of their feelings. The advantages and limitations of using these resources are presented, thus contributing to the planning of research with children.

  9. [Ethical problems experienced by nurses in primary health care: integrative literature review].

    PubMed

    Nora, Carlise Rigon Dalla; Zoboli, Elma Lourdes Campos Pavone; Vieira, Margarida

    2015-03-01

    The aim of this study is to identify ethical problems experienced by nurses in primary health care and resources for coping based on publications on the subject. An integrative literature review was performed between the months of October and November 2013, using the databases: BDTD, CINAHL, LILACS, MEDLINE, Biblioteca Cochrane, PubMed, RCAAP and SciELO. Articles, dissertations and theses published in Portuguese, English and Spanish were included, totalling 31 studies published from 1992 to 2013. This analysis resulted in four categories: ethical problems in the relationship between team members, ethical problems in the relationship with the user, ethical problems in health services management and resources for coping with ethical problems. Results showed that nurses need to be prepared to face ethical problems, emphasizing the importance of ethics education during the education process before and during professional practice to enhance the development of ethical sensitivity and competence for problem resolution.

  10. OntoCAT -- simple ontology search and integration in Java, R and REST/JavaScript

    PubMed Central

    2011-01-01

    Background Ontologies have become an essential asset in the bioinformatics toolbox and a number of ontology access resources are now available, for example, the EBI Ontology Lookup Service (OLS) and the NCBO BioPortal. However, these resources differ substantially in mode, ease of access, and ontology content. This makes it relatively difficult to access each ontology source separately and map its contents to research data, and much of this effort is replicated across different research groups. Results OntoCAT provides a seamless programming interface to query heterogeneous ontology resources including OLS and BioPortal, as well as user-specified local OWL and OBO files. Each resource is wrapped behind easy-to-learn Java, Bioconductor/R and REST web service commands, enabling reuse and integration of ontology software efforts despite variation in technologies. It is also available as a stand-alone MOLGENIS database and a Google App Engine application. Conclusions OntoCAT provides a robust, configurable solution for accessing ontology terms specified locally and from remote services, is available as a stand-alone tool and has been tested thoroughly in the ArrayExpress, MOLGENIS, EFO and Gen2Phen phenotype use cases. Availability: http://www.ontocat.org PMID:21619703

  11. OntoCAT--simple ontology search and integration in Java, R and REST/JavaScript.

    PubMed

    Adamusiak, Tomasz; Burdett, Tony; Kurbatova, Natalja; Joeri van der Velde, K; Abeygunawardena, Niran; Antonakaki, Despoina; Kapushesky, Misha; Parkinson, Helen; Swertz, Morris A

    2011-05-29

    Ontologies have become an essential asset in the bioinformatics toolbox and a number of ontology access resources are now available, for example, the EBI Ontology Lookup Service (OLS) and the NCBO BioPortal. However, these resources differ substantially in mode, ease of access, and ontology content. This makes it relatively difficult to access each ontology source separately and map its contents to research data, and much of this effort is replicated across different research groups. OntoCAT provides a seamless programming interface to query heterogeneous ontology resources including OLS and BioPortal, as well as user-specified local OWL and OBO files. Each resource is wrapped behind easy-to-learn Java, Bioconductor/R and REST web service commands, enabling reuse and integration of ontology software efforts despite variation in technologies. It is also available as a stand-alone MOLGENIS database and a Google App Engine application. OntoCAT provides a robust, configurable solution for accessing ontology terms specified locally and from remote services, is available as a stand-alone tool and has been tested thoroughly in the ArrayExpress, MOLGENIS, EFO and Gen2Phen phenotype use cases. http://www.ontocat.org.
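
    As a rough illustration of the wrapper pattern described above (one programming interface in front of several ontology sources), the Python sketch below defines a common search interface and a composite that fans a query out to every registered source. The class and method names are invented for this sketch and do not correspond to OntoCAT's actual Java, R, or REST API.

        from __future__ import annotations
        from abc import ABC, abstractmethod

        class OntologySource(ABC):
            @abstractmethod
            def search(self, term: str) -> list[str]:
                """Return labels of ontology terms matching the query string."""

        class LocalOboSource(OntologySource):
            def __init__(self, terms: dict[str, str]):
                self.terms = terms  # term id -> label, e.g. parsed from a local OBO file

            def search(self, term: str) -> list[str]:
                return [label for label in self.terms.values() if term.lower() in label.lower()]

        class CompositeSource(OntologySource):
            """Query every registered source and merge the results."""

            def __init__(self, sources: list[OntologySource]):
                self.sources = sources

            def search(self, term: str) -> list[str]:
                hits: list[str] = []
                for source in self.sources:
                    hits.extend(source.search(term))
                return sorted(set(hits))

        local = LocalOboSource({"X:0000400": "diabetes mellitus", "X:0000408": "disease"})
        print(CompositeSource([local]).search("diabetes"))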

  12. MAPI: a software framework for distributed biomedical applications

    PubMed Central

    2013-01-01

    Background The amount of web-based resources (databases, tools etc.) in biomedicine has increased, but the integrated usage of those resources is complex due to differences in access protocols and data formats. However, distributed data processing is becoming inevitable in several domains, in particular in biomedicine, where researchers face rapidly increasing data sizes. This big data is difficult to process locally because of the large processing, memory and storage capacity required. Results This manuscript describes a framework, called MAPI, which provides a uniform representation of resources available over the Internet, in particular for Web Services. The framework enhances their interoperability and collaborative use by enabling a uniform and remote access. The framework functionality is organized in modules that can be combined and configured in different ways to fulfil concrete development requirements. Conclusions The framework has been tested in the biomedical application domain where it has been a base for developing several clients that are able to integrate different web resources. The MAPI binaries and documentation are freely available at http://www.bitlab-es.com/mapi under the Creative Commons Attribution-No Derivative Works 2.5 Spain License. The MAPI source code is available by request (GPL v3 license). PMID:23311574

  13. Techniques to Access Databases and Integrate Data for Hydrologic Modeling

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Whelan, Gene; Tenney, Nathan D.; Pelton, Mitchell A.

    2009-06-17

    This document addresses techniques to access and integrate data for defining site-specific conditions and behaviors associated with ground-water and surface-water radionuclide transport applicable to U.S. Nuclear Regulatory Commission reviews. Environmental models typically require input data from multiple internal and external sources that may include, but are not limited to, stream and rainfall gage data, meteorological data, hydrogeological data, habitat data, and biological data. These data may be retrieved from a variety of organizations (e.g., federal, state, and regional) and source types (e.g., HTTP, FTP, and databases). Available data sources relevant to hydrologic analyses for reactor licensing are identified and reviewed. The data sources described can be useful to define model inputs and parameters, including site features (e.g., watershed boundaries, stream locations, reservoirs, site topography), site properties (e.g., surface conditions, subsurface hydraulic properties, water quality), and site boundary conditions, input forcings, and extreme events (e.g., stream discharge, lake levels, precipitation, recharge, flood and drought characteristics). Available software tools for accessing established databases, retrieving the data, and integrating it with models were identified and reviewed. The emphasis in this review was on existing software products with minimal required modifications to enable their use with the FRAMES modeling framework. The ability of four of these tools to access and retrieve the identified data sources was reviewed. These four software tools were the Hydrologic Data Acquisition and Processing System (HDAPS), Integrated Water Resources Modeling System (IWRMS) External Data Harvester, Data for Environmental Modeling Environmental Data Download Tool (D4EM EDDT), and the FRAMES Internet Database Tools. The IWRMS External Data Harvester and the D4EM EDDT were identified as the most promising tools based on their ability to access and retrieve the required data, and their ability to integrate the data into environmental models using the FRAMES environment.

  14. Central Colorado Assessment Project (CCAP)-Geochemical data for rock, sediment, soil, and concentrate sample media

    USGS Publications Warehouse

    Granitto, Matthew; DeWitt, Ed H.; Klein, Terry L.

    2010-01-01

    This database was initiated, designed, and populated to collect and integrate geochemical data from central Colorado in order to facilitate geologic mapping, petrologic studies, mineral resource assessment, definition of geochemical baseline values and statistics, environmental impact assessment, and medical geology. The Microsoft Access database serves as a geochemical data warehouse in support of the Central Colorado Assessment Project (CCAP) and contains data tables describing historical and new quantitative and qualitative geochemical analyses determined by 70 analytical laboratory and field methods for 47,478 rock, sediment, soil, and heavy-mineral concentrate samples. Most samples were collected by U.S. Geological Survey (USGS) personnel and analyzed either in the analytical laboratories of the USGS or by contract with commercial analytical laboratories. These data represent analyses of samples collected as part of various USGS programs and projects. In addition, geochemical data from 7,470 sediment and soil samples collected and analyzed under the Atomic Energy Commission National Uranium Resource Evaluation (NURE) Hydrogeochemical and Stream Sediment Reconnaissance (HSSR) program (henceforth called NURE) have been included in this database. In addition to data from 2,377 samples collected and analyzed under CCAP, this dataset includes archived geochemical data originally entered into the in-house Rock Analysis Storage System (RASS) database (used by the USGS from the mid-1960s through the late 1980s) and the in-house PLUTO database (used by the USGS from the mid-1970s through the mid-1990s). All of these data are maintained in the Oracle-based National Geochemical Database (NGDB). Retrievals from the NGDB and from the NURE database were used to generate most of this dataset. In addition, USGS data that have been excluded previously from the NGDB because the data predate earliest USGS geochemical databases, or were once excluded for programmatic reasons, have been included in the CCAP Geochemical Database and are planned to be added to the NGDB.

  15. The Research of Spatial-Temporal Analysis and Decision-Making Assistant System for Disabled Person Affairs Based on Mapworld

    NASA Astrophysics Data System (ADS)

    Zhang, J. H.; Yang, J.; Sun, Y. S.

    2015-06-01

    This system combines the Mapworld platform with the informationization of disabled person affairs, using basic information on disabled persons as its central framework. Based on the disabled person population database, the affairs management system and the statistical account system, the data were effectively integrated and a unified information resource database was built. Through data analysis and mining, the system provides powerful data support for decision making, affairs management and public service. It finally realizes the rationalization, normalization and scientization of disabled person affairs management. It also makes significant contributions to the great-leap-forward development of the informationization of the China Disabled Persons' Federation.

  16. Soil and Land Resources Information System (SLISYS-Tarim) for Sustainable Management of River Oases along the Tarim River, China

    NASA Astrophysics Data System (ADS)

    Othmanli, Hussein; Zhao, Chengyi; Stahr, Karl

    2017-04-01

    The Tarim River Basin is the largest continental basin in China. The region has an extreme continental desert climate characterized by little rainfall (<50 mm/a) and high potential evaporation (>3000 mm/a). Climate change is severely affecting the basin, causing soil salinization, water shortage, and declining crop production. Therefore, a Soil and Land Resources Information System (SLISYS-Tarim) for the regional simulation of crop yield production in the basin was developed. SLISYS-Tarim consists of a database and an agro-ecological simulation model, EPIC (Environmental Policy Integrated Climate). The database comprises relational tables with information about soils, terrain conditions, land use, and climate. The soil data include information from 50 soil profiles that were dug, analyzed, described and classified in order to characterize the soils of the region. DEM data were integrated with geological maps to build a digital terrain structure. Remote sensing data from Landsat images were applied for soil mapping and for land use and land cover classification. An additional database with climate data, land management and crop information was also linked to the system. Construction of the SLISYS-Tarim database was accomplished by integrating and overlaying the recommended thematic maps within a geographic information system (GIS) environment to meet the data standard of the global and national SOTER digital database. This database provides appropriate input and output data for crop modelling with the EPIC model at various scales in the Tarim Basin. The EPIC model was run to simulate cotton production under a constructed scenario characterizing current management practices, soil properties and climate conditions. For the EPIC model calibration, some parameters were adjusted so that the modeled cotton yield fits the measured yield at the field scale. The validation of the modeling results was achieved in a later step based on remote sensing data. The simulated cotton yield varied according to field management, soil type and salinity level, where soil salinity was the main limiting factor. Furthermore, the calibrated and validated EPIC model was run under several scenarios of climate conditions and land management practices to estimate the effect of climate change on cotton production and the sustainability of agricultural systems in the basin. The application of SLISYS-Tarim showed that this database can be a suitable framework for storage and retrieval of soil and terrain data at various scales. Simulation with the EPIC model can assess the impact of climate change and management strategies. Therefore, SLISYS-Tarim can be a good tool for regional planning and can serve decision support systems at regional and national scales.

  17. THE CELL CENTERED DATABASE PROJECT: AN UPDATE ON BUILDING COMMUNITY RESOURCES FOR MANAGING AND SHARING 3D IMAGING DATA

    PubMed Central

    Martone, Maryann E.; Tran, Joshua; Wong, Willy W.; Sargis, Joy; Fong, Lisa; Larson, Stephen; Lamont, Stephan P.; Gupta, Amarnath; Ellisman, Mark H.

    2008-01-01

    Databases have become integral parts of data management, dissemination and mining in biology. At the Second Annual Conference on Electron Tomography, held in Amsterdam in 2001, we proposed that electron tomography data should be shared in a manner analogous to structural data at the protein and sequence scales. At that time, we outlined our progress in creating a database to bring together cell level imaging data across scales, The Cell Centered Database (CCDB). The CCDB was formally launched in 2002 as an on-line repository of high-resolution 3D light and electron microscopic reconstructions of cells and subcellular structures. It contains 2D, 3D and 4D structural and protein distribution information from confocal, multiphoton and electron microscopy, including correlated light and electron microscopy. Many of the data sets are derived from electron tomography of cells and tissues. In the five years since its debut, we have moved the CCDB from a prototype to a stable resource and expanded the scope of the project to include data management and knowledge engineering. Here we provide an update on the CCDB and how it is used by the scientific community. We also describe our work in developing additional knowledge tools, e.g., ontologies, for annotation and query of electron microscopic data. PMID:18054501

  18. CancerLectinDB: a database of lectins relevant to cancer.

    PubMed

    Damodaran, Deepa; Jeyakani, Justin; Chauhan, Alok; Kumar, Nirmal; Chandra, Nagasuma R; Surolia, Avadhesha

    2008-04-01

    The role of lectins in mediating cancer metastasis, apoptosis as well as various other signaling events has been well established in the past few years. Data on various aspects of the role of lectins in cancer is being accumulated at a rapid pace. The data on lectins available in the literature is so diverse, that it becomes difficult and time-consuming, if not impossible to comprehend the advances in various areas and obtain the maximum benefit. Not only do the lectins vary significantly in their individual functional roles, but they are also diverse in their sequences, structures, binding site architectures, quaternary structures, carbohydrate affinities and specificities as well as their potential applications. An organization of these seemingly independent data into a common framework is essential in order to achieve effective use of all the data towards understanding the roles of different lectins in different aspects of cancer and any resulting applications. An integrated knowledge base (CancerLectinDB) together with appropriate analytical tools has therefore been developed for lectins relevant for any aspect of cancer, by collating and integrating diverse data. This database is unique in terms of providing sequence, structural, and functional annotations for lectins from all known sources in cancer and is expected to be a useful addition to the number of glycan related resources now available to the community. The database has been implemented using MySQL on a Linux platform and web-enabled using Perl-CGI and Java tools. Data for individual lectins pertain to taxonomic, biochemical, domain architecture, molecular sequence and structural details as well as carbohydrate specificities. Extensive links have also been provided for relevant bioinformatics resources and analytical tools. Availability of diverse data integrated into a common framework is expected to be of high value for various studies on lectin cancer biology. CancerLectinDB can be accessed through http://proline.physics.iisc.ernet.in/cancerdb .

  19. ARIANE: integration of information databases within a hospital intranet.

    PubMed

    Joubert, M; Aymard, S; Fieschi, D; Volot, F; Staccini, P; Robert, J J; Fieschi, M

    1998-05-01

    Large information systems handle massive volumes of data stored in heterogeneous sources. Each server has its own model of representation of concepts with regard to its aims. One of the main problems end-users encounter when accessing different servers is matching their own viewpoint on biomedical concepts with the various representations made in the database servers. The aim of the ARIANE project is to provide end-users with easy-to-use and natural means to access and query heterogeneous information databases. The objectives of this research work are to build a conceptual interface by means of Internet technology inside an enterprise intranet and to propose a method for realizing it. This method is based on the knowledge sources provided by the Unified Medical Language System (UMLS) project of the US National Library of Medicine. Experiments concern queries to three different information servers: PubMed, a Medline server of the NLM; Thériaque, a French database on drugs implemented in the hospital intranet; and a Web site dedicated to Internet resources in gastroenterology and nutrition, located at the Faculty of Medicine of Nice (France). Access to each of these servers differs according to the kind of information delivered and the technology used to query it. With the health care professional workstation in mind, the authors introduced quality criteria into the ARIANE project in order to provide a homogeneous and efficient way to build a query system that can be integrated into existing information systems and that can itself integrate existing and new information sources.

  20. Androgen-responsive gene database: integrated knowledge on androgen-responsive genes.

    PubMed

    Jiang, Mei; Ma, Yunsheng; Chen, Congcong; Fu, Xuping; Yang, Shu; Li, Xia; Yu, Guohua; Mao, Yumin; Xie, Yi; Li, Yao

    2009-11-01

    Androgen signaling plays an important role in many biological processes. Androgen Responsive Gene Database (ARGDB) is devoted to providing integrated knowledge on androgen-controlled genes. Gene records were collected on the basis of PubMed literature collections. More than 6000 abstracts and 950 original publications were manually screened, leading to 1785 human genes, 993 mouse genes, and 583 rat genes finally included in the database. All the collected genes were experimentally proved to be regulated by androgen at the expression level or to contain androgen-responsive regions. For each gene important details of the androgen regulation experiments were collected from references, such as expression change, androgen-responsive sequence, response time, tissue/cell type, experimental method, ligand identity, and androgen amount, which will facilitate further evaluation by researchers. Furthermore, the database was integrated with multiple annotation resources, including National Center for Biotechnology Information, Gene Ontology, and Kyoto Encyclopedia of Genes and Genomes pathway, to reveal the biological characteristics and significance of androgen-regulated genes. The ARGDB web site is mainly composed of the Browse, Search, Element Scan, and Submission modules. It is user friendly and freely accessible at http://argdb.fudan.edu.cn. Preliminary analysis of the collected data was performed. Many disease pathways, such as prostate carcinogenesis, were found to be enriched in androgen-regulated genes. The discovered androgen-response motifs were similar to those in previous reports. The analysis results are displayed in the web site. In conclusion, ARGDB provides a unified gateway to storage, retrieval, and update of information on androgen-regulated genes.

  1. Colorado Late Cenozoic Fault and Fold Database and Internet Map Server: User-friendly technology for complex information

    USGS Publications Warehouse

    Morgan, K.S.; Pattyn, G.J.; Morgan, M.L.

    2005-01-01

    Internet mapping applications for geologic data allow simultaneous data delivery and collection, enabling quick data modification while efficiently supplying the end user with information. Utilizing Web-based technologies, the Colorado Geological Survey's Colorado Late Cenozoic Fault and Fold Database was transformed from a monothematic, nonspatial Microsoft Access database into a complex information set incorporating multiple data sources. The resulting user-friendly format supports easy analysis and browsing. The core of the application is the Microsoft Access database, which contains information compiled from available literature about faults and folds that are known or suspected to have moved during the late Cenozoic. The database contains nonspatial fields such as structure type, age, and rate of movement. Geographic locations of the fault and fold traces were compiled from previous studies at 1:250,000 scale to form a spatial database containing information such as length and strike. Integration of the two databases allowed both spatial and nonspatial information to be presented on the Internet as a single dataset (http://geosurvey.state.co.us/pubs/ceno/). The user-friendly interface enables users to view and query the data in an integrated manner, thus providing multiple ways to locate desired information. Retaining the digital data format also allows continuous data updating and quick delivery of newly acquired information. This dataset is a valuable resource to anyone interested in earthquake hazards and the activity of faults and folds in Colorado. Additional geologic hazard layers and imagery may aid in decision support and hazard evaluation. The up-to-date and customizable maps are invaluable tools for researchers or the public.

  2. Information literacy in science writing: how students find, identify, and use scientific literature

    NASA Astrophysics Data System (ADS)

    Klucevsek, Kristin M.; Brungard, Allison B.

    2016-11-01

    For undergraduate students to achieve science literacy, they must first develop information literacy skills. These skills align with Information Literacy Standards and include determining appropriate databases, distinguishing among resource types, and citing resources ethically. To effectively improve information literacy and science literacy, we must identify how students interact with authentic scientific texts. In this case study, we addressed this aim by embedding a science librarian into a science writing course, where students wrote a literature review on a research topic of their choice. Library instruction was further integrated through the use of an online guide and outside assistance. To evaluate the evolution of information literacy in our students and provide evidence of student practices, we used task-scaffolded writing assessments, a reflection, and surveys. We found that students improved their ability and confidence in finding research articles using discipline-specific databases as well as their ability to distinguish primary from secondary research articles. We also identified ways students improperly used and cited resources in their writing assignments. While our results reveal a better understanding of how students find and approach scientific research articles, additional research is needed to develop effective strategies to improve long-term information literacy in the sciences.

  3. Characteristics of Resources Represented in the OCLC CORC Database.

    ERIC Educational Resources Information Center

    Connell, Tschera Harkness; Prabha, Chandra

    2002-01-01

    Examines the characteristics of Web resources in Online Computer Library Center's (OCLC) Cooperative Online Resource Catalog (CORC) in terms of subject matter, source of content, publication patterns, and units of information chosen for representation in the database. Suggests that the ability to successfully use a database depends on…

  4. The SBOL Stack: A Platform for Storing, Publishing, and Sharing Synthetic Biology Designs.

    PubMed

    Madsen, Curtis; McLaughlin, James Alastair; Mısırlı, Göksel; Pocock, Matthew; Flanagan, Keith; Hallinan, Jennifer; Wipat, Anil

    2016-06-17

    Recently, synthetic biologists have developed the Synthetic Biology Open Language (SBOL), a data exchange standard for descriptions of genetic parts, devices, modules, and systems. The goals of this standard are to allow scientists to exchange designs of biological parts and systems, to facilitate the storage of genetic designs in repositories, and to facilitate the description of genetic designs in publications. In order to achieve these goals, the development of an infrastructure to store, retrieve, and exchange SBOL data is necessary. To address this problem, we have developed the SBOL Stack, a Resource Description Framework (RDF) database specifically designed for the storage, integration, and publication of SBOL data. This database allows users to define a library of synthetic parts and designs as a service, to share SBOL data with collaborators, and to store designs of biological systems locally. The database also allows external data sources to be integrated by mapping them to the SBOL data model. The SBOL Stack includes two Web interfaces: the SBOL Stack API and SynBioHub. While the former is designed for developers, the latter allows users to upload new SBOL biological designs, download SBOL documents, search by keyword, and visualize SBOL data. Since the SBOL Stack is based on semantic Web technology, the inherent distributed querying functionality of RDF databases can be used to allow different SBOL stack databases to be queried simultaneously, and therefore, data can be shared between different institutes, centers, or other users.
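
    The abstract above notes that the RDF basis of the SBOL Stack allows several instances to be queried together. The Python fragment below sketches what such a federated SPARQL query could look like using the standard SERVICE keyword; both endpoint URLs are placeholders, and the query is a simplified use of the SBOL 2 vocabulary rather than an example taken from the SBOL Stack documentation.

        import requests

        LOCAL_ENDPOINT = "https://stack-a.example.org/sparql"  # placeholder local instance

        QUERY = """
        PREFIX sbol: <http://sbols.org/v2#>
        SELECT ?part ?displayId
        WHERE {
          { ?part a sbol:ComponentDefinition ; sbol:displayId ?displayId . }
          UNION
          { SERVICE <https://stack-b.example.org/sparql> {   # placeholder remote instance
              ?part a sbol:ComponentDefinition ; sbol:displayId ?displayId .
            } }
        }
        LIMIT 20
        """

        resp = requests.post(
            LOCAL_ENDPOINT,
            data={"query": QUERY},
            headers={"Accept": "application/sparql-results+json"},
            timeout=30,
        )
        resp.raise_for_status()
        for row in resp.json()["results"]["bindings"]:
            print(row["displayId"]["value"], row["part"]["value"])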

  5. Integration of Sustainable Practices into Standard Army MILCON Designs

    DTIC Science & Technology

    2011-09-01

    Sustainable Installations Regional Resource Assessment (SIRRA™) web-based database analysis tool output, ERDC-CERL, 2010. ... Roy, Sujoy, B. L. Chen, E. ... Water issues white paper, ERDC/CERL TR-11-27. ... streams. This policy states that in the absence of other flow limits as established by the ... Note that flush quality in some efficient toilets is undermined by non-flushable recyclable toilet paper that builds up in the

  6. UMass at TREC 2002: Cross Language and Novelty Tracks

    DTIC Science & Technology

    2002-01-01

    resources – stemmers, dictionaries, machine translation, and an acronym database. We found that proper names were extremely important in this year’s queries ... data by manually annotating 48 additional topics. 1. Cross Language Track: We submitted one monolingual run and four cross-language runs. For the ... monolingual run, the technology was essentially the same as the system we used for TREC 2001. For the cross-language run, we integrated some new

  7. The Planetary Virtual Observatory and Laboratory (PVOL) and its integration into the Virtual European Solar and Planetary Access (VESPA)

    NASA Astrophysics Data System (ADS)

    Hueso, R.; Juaristi, J.; Legarreta, J.; Sánchez-Lavega, A.; Rojas, J. F.; Erard, S.; Cecconi, B.; Le Sidaner, Pierre

    2018-01-01

    Since 2003 the Planetary Virtual Observatory and Laboratory (PVOL) has been storing and serving publicly through its web site a large database of amateur observations of the Giant Planets (Hueso et al., 2010a). These images are used for scientific research of the atmospheric dynamics and cloud structure on these planets and constitute a powerful resource to address time variable phenomena in their atmospheres. Advances over the last decade in observation techniques, and a wider recognition by professional astronomers of the quality of amateur observations, have resulted in the need to upgrade this database. We here present major advances in the PVOL database, which has evolved into a full virtual planetary observatory encompassing also observations of Mercury, Venus, Mars, the Moon and the Galilean satellites. Besides the new objects, the images can be tagged and the database allows simple and complex searches over the data. The new web service: PVOL2 is available online in http://pvol2.ehu.eus/.

  8. SeqDepot: streamlined database of biological sequences and precomputed features.

    PubMed

    Ulrich, Luke E; Zhulin, Igor B

    2014-01-15

    Assembling and/or producing integrated knowledge of sequence features continues to be an onerous and redundant task despite a large number of existing resources. We have developed SeqDepot, a novel database that focuses solely on two primary goals: (i) assimilating known primary sequences with predicted feature data and (ii) providing the most simple and straightforward means to procure and readily use this information. Access to >28.5 million sequences and 300 million features is provided through a well-documented and flexible RESTful interface that supports fetching specific data subsets, bulk queries, visualization and searching by MD5 digests or external database identifiers. We have also developed an HTML5/JavaScript web application exemplifying how to interact with SeqDepot and Perl/Python scripts for use with local processing pipelines. Freely available on the web at http://seqdepot.net/. REST access via http://seqdepot.net/api/v1. Database files and scripts may be downloaded from http://seqdepot.net/download.
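
    The abstract documents the REST root http://seqdepot.net/api/v1 and lookup by MD5 digest. The fragment below is a tentative Python sketch of such a lookup; the resource path and the identifier encoding are assumptions about the interface, not taken from SeqDepot's documentation.

        import hashlib
        import requests

        API_ROOT = "http://seqdepot.net/api/v1"  # REST root given in the abstract

        sequence = "MKTAYIAKQRQISFVKSHFSRQLEERLGLIEVQ"  # example protein sequence
        md5_hex = hashlib.md5(sequence.encode("ascii")).hexdigest()

        # The resource path and the hex encoding of the digest below are assumptions.
        resp = requests.get(f"{API_ROOT}/aseqs/{md5_hex}", timeout=30)
        if resp.ok:
            record = resp.json()
            print(sorted(record.keys()))  # inspect whatever fields the service actually returns
        else:
            print("no record found or request rejected:", resp.status_code)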

  9. PeptideDepot: flexible relational database for visual analysis of quantitative proteomic data and integration of existing protein information.

    PubMed

    Yu, Kebing; Salomon, Arthur R

    2009-12-01

    Recently, dramatic progress has been achieved in expanding the sensitivity, resolution, mass accuracy, and scan rate of mass spectrometers able to fragment and identify peptides through MS/MS. Unfortunately, this enhanced ability to acquire proteomic data has not been accompanied by a concomitant increase in the availability of flexible tools allowing users to rapidly assimilate, explore, and analyze this data and adapt to various experimental workflows with minimal user intervention. Here we fill this critical gap by providing a flexible relational database called PeptideDepot for organization of expansive proteomic data sets, collation of proteomic data with available protein information resources, and visual comparison of multiple quantitative proteomic experiments. Our software design, built upon the synergistic combination of a MySQL database for safe warehousing of proteomic data with a FileMaker-driven graphical user interface for flexible adaptation to diverse workflows, enables proteomic end-users to directly tailor the presentation of proteomic data to the unique analysis requirements of the individual proteomics lab. PeptideDepot may be deployed as an independent software tool or integrated directly with our high throughput autonomous proteomic pipeline used in the automated acquisition and post-acquisition analysis of proteomic data.

  10. Development of Korean Rare Disease Knowledge Base

    PubMed Central

    Seo, Heewon; Kim, Dokyoon; Chae, Jong-Hee; Kang, Hee Gyung; Lim, Byung Chan; Cheong, Hae Il

    2012-01-01

    Objectives Rare disease research requires a broad range of disease-related information for the discovery of causes of genetic disorders, which are maladies caused by abnormalities in genes or chromosomes. The rarity of cases makes it difficult for researchers to elucidate a definite cause. This knowledge base will be a major resource not only for clinicians, but also for the general public, who are unable to find consistent information on rare diseases in a single location. Methods We designed a compact database schema for faster querying; its structure is optimized to store heterogeneous data sources. Clinicians at Seoul National University Hospital (SNUH) then reviewed and revised those resources. Additionally, we integrated other sources to capture genomic resources and clinical trials in detail in the Korean Rare Disease Knowledge base (KRDK). Results As a result, we have developed a Web-based knowledge base, KRDK, suitable for the study of Mendelian diseases that commonly occur among Koreans. This knowledge base comprises a disease summary and review, causal gene list, laboratory and clinic directory, patient registry, and so on. Furthermore, a database for analyzing and providing access to human biological information and a clinical trial management system are integrated into KRDK. Conclusions We expect that KRDK, the first rare disease knowledge base in Korea, may contribute to collaborative research and be a reliable reference for application to clinical trials. Additionally, this knowledge base supports querying of drug information, so that visitors can search for a list of rare diseases related to specific drugs. Visitors can access KRDK via http://www.snubi.org/software/raredisease/. PMID:23346478

  11. PLMD: An updated data resource of protein lysine modifications.

    PubMed

    Xu, Haodong; Zhou, Jiaqi; Lin, Shaofeng; Deng, Wankun; Zhang, Ying; Xue, Yu

    2017-05-20

    Post-translational modifications (PTMs) occurring at protein lysine residues, or protein lysine modifications (PLMs), play critical roles in regulating biological processes. Due to the explosive expansion in the number of PLM substrates and the discovery of novel PLM types, we have greatly updated our previous studies and present a much more integrative protein lysine modification database (PLMD). In PLMD, we collected and integrated a total of 284,780 modification events in 53,501 proteins across 176 eukaryotes and prokaryotes for up to 20 types of PLMs, including ubiquitination, acetylation, sumoylation, methylation, succinylation, malonylation, glutarylation, glycation, formylation, hydroxylation, butyrylation, propionylation, crotonylation, pupylation, neddylation, 2-hydroxyisobutyrylation, phosphoglycerylation, carboxylation, lipoylation and biotinylation. Using this data set, a motif-based analysis was performed for each PLM type, and the results demonstrated that different PLM types preferentially recognize distinct sequence motifs. Moreover, various PLMs synergistically orchestrate specific cellular biological processes through mutual crosstalk, and we found a total of 65,297 PLM events involved in 90 types of PLM co-occurrence on the same lysine residues. Finally, various options are provided for accessing the data, and original references and other annotations are also presented for each PLM substrate. Taken together, we anticipate that the PLMD database can serve as a useful resource for further research on PLMs. PLMD 3.0 was implemented in PHP + MySQL and is freely available at http://plmd.biocuckoo.org. Copyright © 2017 Institute of Genetics and Developmental Biology, Chinese Academy of Sciences, and Genetics Society of China. Published by Elsevier Ltd. All rights reserved.
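
    The motif-based analysis mentioned above typically starts by extracting fixed-length sequence windows centered on each modified lysine. The sketch below shows that preprocessing step in generic form; the window size and padding convention are illustrative choices, not the procedure PLMD itself used.

    ```python
    def lysine_windows(sequence: str, modified_positions, flank: int = 7):
        """Extract +/-flank residue windows around modified lysines (1-based positions).

        Positions near the sequence ends are padded with '-' so every window has
        the same length, as is typical input for sequence-logo or motif-enrichment
        tools. A generic sketch, not the PLMD pipeline itself.
        """
        windows = []
        for pos in modified_positions:
            i = pos - 1
            if sequence[i] != "K":
                continue  # skip annotations that do not fall on a lysine
            left = sequence[max(0, i - flank):i].rjust(flank, "-")
            right = sequence[i + 1:i + 1 + flank].ljust(flank, "-")
            windows.append(left + "K" + right)
        return windows

    print(lysine_windows("MSKGEELFTGVVPILVELDGKVNGHKFSVSG", [3, 21, 26]))
    ```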

  12. The Requirements and Design of the Rapid Prototyping Capabilities System

    NASA Astrophysics Data System (ADS)

    Haupt, T. A.; Moorhead, R.; O'Hara, C.; Anantharaj, V.

    2006-12-01

    The Rapid Prototyping Capabilities (RPC) system will provide the capability to rapidly evaluate innovative methods of linking science observations. To this end, the RPC will provide the capability to integrate the software components and tools needed to evaluate the use of a wide variety of current and future NASA sensors, numerical models, research results, model outputs, and knowledge, collectively referred to as "resources". It is assumed that the resources are geographically distributed, and thus RPC will provide support for the location transparency of the resources. The RPC system must provide support for: (1) discovery, semantic understanding, and secure access and transport mechanisms for data products available from the known data providers; (2) data assimilation and geo-processing tools for all data transformations needed to match given data products to the model input requirements; (3) model management, including catalogs of models and model metadata, and mechanisms for creating environments for model execution; and (4) tools for model output analysis and model benchmarking. The challenge involves developing a cyberinfrastructure, a coordinated aggregate of software, hardware and other technologies necessary to facilitate RPC experiments, as well as human expertise, to provide an integrated, "end-to-end" platform to support the RPC objectives. Such aggregation is to be achieved through a horizontal integration of loosely coupled services. The cyberinfrastructure comprises several software layers. At the bottom, the Grid fabric encompasses network protocols, optical networks, computational resources, storage devices, and sensors. At the top, applications use workload managers to coordinate their access to physical resources. Applications are not tightly bound to a single physical resource. Instead, they bind dynamically to resources (i.e., they are provisioned) via a common grid infrastructure layer. For the RPC system, the cyberinfrastructure must support organizing computations (or "data transformations" in general) into complex workflows with resource discovery, automatic resource allocation, monitoring and provenance preservation, as well as aggregating heterogeneous, distributed data into knowledge databases. Such service orchestration is the responsibility of the "collective services" layer. For RPC, this layer will be based on the Java Business Integration (JBI, [JSR-208]) specification, a standards-based integration platform that combines messaging, web services, data transformation, and intelligent routing to reliably connect and coordinate the interaction of significant numbers of diverse applications (plug-in components) across organizational boundaries. The JBI concept is a new approach to integration that can provide the underpinnings for loosely coupled, highly distributed integration networks that can scale beyond the limits of currently used hub-and-spoke brokers. This presentation discusses the requirements, design and early prototype of the NASA-sponsored RPC system under development at Mississippi State University, demonstrating the integration of data provisioning mechanisms, data transformation tools and computational models into a single interoperable system enabling rapid execution of RPC experiments.

  13. MIMIC II: a massive temporal ICU patient database to support research in intelligent patient monitoring

    NASA Technical Reports Server (NTRS)

    Saeed, M.; Lieu, C.; Raber, G.; Mark, R. G.

    2002-01-01

    Development and evaluation of Intensive Care Unit (ICU) decision-support systems would be greatly facilitated by the availability of a large-scale ICU patient database. Following our previous efforts with the MIMIC (Multi-parameter Intelligent Monitoring for Intensive Care) Database, we have leveraged advances in networking and storage technologies to develop a far more massive temporal database, MIMIC II. MIMIC II is an ongoing effort: data is continuously and prospectively archived from all ICU patients in our hospital. MIMIC II now consists of over 800 ICU patient records including over 120 gigabytes of data and is growing. A customized archiving system was used to store continuously up to four waveforms and 30 different parameters from ICU patient monitors. An integrated user-friendly relational database was developed for browsing of patients' clinical information (lab results, fluid balance, medications, nurses' progress notes). Based upon its unprecedented size and scope, MIMIC II will prove to be an important resource for intelligent patient monitoring research, and will support efforts in medical data mining and knowledge-discovery.
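
    The browsing interface described above ultimately reduces to relational queries over time-stamped clinical tables. The self-contained sketch below shows the flavor of such a query; the table and column names are hypothetical and do not reflect the actual MIMIC II schema.

    ```python
    import sqlite3

    # Minimal, self-contained illustration of the kind of relational query a
    # browsing interface over ICU data might issue. Table and column names are
    # hypothetical; the real MIMIC II schema is different and far richer.
    conn = sqlite3.connect(":memory:")
    conn.execute("""CREATE TABLE lab_events (
        patient_id INTEGER, chart_time TEXT, test_name TEXT, value REAL)""")
    conn.executemany(
        "INSERT INTO lab_events VALUES (?, ?, ?, ?)",
        [(101, "2002-01-01 06:00", "lactate", 2.8),
         (101, "2002-01-01 12:00", "lactate", 1.9),
         (102, "2002-01-01 07:30", "creatinine", 1.4)],
    )

    # Trend of a single analyte for one patient, ordered in time.
    rows = conn.execute(
        "SELECT chart_time, value FROM lab_events "
        "WHERE patient_id = ? AND test_name = ? ORDER BY chart_time",
        (101, "lactate"),
    ).fetchall()
    print(rows)
    ```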

  14. The Genomes OnLine Database (GOLD) v.5: a metadata management system based on a four level (meta)genome project classification

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Reddy, Tatiparthi B. K.; Thomas, Alex D.; Stamatis, Dimitri

    The Genomes OnLine Database (GOLD; http://www.genomesonline.org) is a comprehensive online resource to catalog and monitor genetic studies worldwide. GOLD provides up-to-date status on complete and ongoing sequencing projects along with a broad array of curated metadata. Within this paper, we report version 5 (v.5) of the database. The newly designed database schema and web user interface support several new features, including the implementation of a four-level (meta)genome project classification system and a simplified, intuitive web interface to access reports and launch search tools. The database currently hosts information for about 19 200 studies, 56 000 Biosamples, 56 000 sequencing projects and 39 400 analysis projects. More than just a catalog of worldwide genome projects, GOLD is a manually curated, quality-controlled metadata warehouse. The problems encountered in integrating disparate and varying quality data into GOLD are briefly highlighted. Lastly, GOLD fully supports and follows the Genomic Standards Consortium (GSC) Minimum Information standards.
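
    To make the four-level project classification concrete, the sketch below models the Study → Biosample → Sequencing Project → Analysis Project hierarchy as nested records. Field names and identifiers are invented for illustration and are not GOLD's actual schema.

    ```python
    from dataclasses import dataclass, field
    from typing import List

    # Illustrative model of GOLD's four-level classification
    # (Study -> Biosample -> Sequencing Project -> Analysis Project).
    # Field names and identifiers are assumptions for the sketch only.

    @dataclass
    class AnalysisProject:
        gold_id: str
        analysis_type: str  # e.g. "Metagenome Analysis"

    @dataclass
    class SequencingProject:
        gold_id: str
        status: str  # e.g. "Complete", "Incomplete"
        analysis_projects: List[AnalysisProject] = field(default_factory=list)

    @dataclass
    class Biosample:
        gold_id: str
        habitat: str
        sequencing_projects: List[SequencingProject] = field(default_factory=list)

    @dataclass
    class Study:
        gold_id: str
        name: str
        biosamples: List[Biosample] = field(default_factory=list)

    study = Study("Gs0000001", "Example soil survey", [
        Biosample("Gb0000001", "soil", [
            SequencingProject("Gp0000001", "Complete", [
                AnalysisProject("Ga0000001", "Metagenome Analysis")])])])
    print(study.name, len(study.biosamples))
    ```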

  15. The Cancer Epidemiology Descriptive Cohort Database: A Tool to Support Population-Based Interdisciplinary Research.

    PubMed

    Kennedy, Amy E; Khoury, Muin J; Ioannidis, John P A; Brotzman, Michelle; Miller, Amy; Lane, Crystal; Lai, Gabriel Y; Rogers, Scott D; Harvey, Chinonye; Elena, Joanne W; Seminara, Daniela

    2016-10-01

    We report on the establishment of a web-based Cancer Epidemiology Descriptive Cohort Database (CEDCD). The CEDCD's goals are to enhance awareness of resources, facilitate interdisciplinary research collaborations, and support existing cohorts for the study of cancer-related outcomes. Comprehensive descriptive data were collected from large cohorts established to study cancer as primary outcome using a newly developed questionnaire. These included an inventory of baseline and follow-up data, biospecimens, genomics, policies, and protocols. Additional descriptive data extracted from publicly available sources were also collected. This information was entered in a searchable and publicly accessible database. We summarized the descriptive data across cohorts and reported the characteristics of this resource. As of December 2015, the CEDCD includes data from 46 cohorts representing more than 6.5 million individuals (29% ethnic/racial minorities). Overall, 78% of the cohorts have collected blood at least once, 57% at multiple time points, and 46% collected tissue samples. Genotyping has been performed by 67% of the cohorts, while 46% have performed whole-genome or exome sequencing in subsets of enrolled individuals. Information on medical conditions other than cancer has been collected in more than 50% of the cohorts. More than 600,000 incident cancer cases and more than 40,000 prevalent cases are reported, with 24 cancer sites represented. The CEDCD assembles detailed descriptive information on a large number of cancer cohorts in a searchable database. Information from the CEDCD may assist the interdisciplinary research community by facilitating identification of well-established population resources and large-scale collaborative and integrative research. Cancer Epidemiol Biomarkers Prev; 25(10); 1392-401. ©2016 AACR. ©2016 American Association for Cancer Research.

  16. Specification of parameters for development of a spatial database for drought monitoring and famine early warning in the African Sahel

    NASA Technical Reports Server (NTRS)

    Rochon, Gilbert L.

    1989-01-01

    Parameters were described for a spatial database to facilitate drought monitoring and famine early warning in the African Sahel. The proposed system, referred to as the African Drought and Famine Information System (ADFIS), is ultimately recommended for implementation with the NASA/FEMA Spatial Analysis and Modeling System (SAMS), a GIS/Dynamic Modeling software package currently under development. SAMS is derived from FEMA's Integrated Emergency Management Information System (IEMIS) and the Pacific Northwest Laboratory's/Engineering Topographic Laboratory's Airland Battlefield Environment (ALBE) GIS. SAMS is primarily intended for disaster planning and resource management applications within developing countries. Sources of data for the system would include the Developing Economics Branch of the U.S. Dept. of Agriculture, the World Bank, Tulane University School of Public Health and Tropical Medicine's Famine Early Warning Systems (FEWS) Project, USAID's Foreign Disaster Assistance Section, the World Resources Institute, the World Meteorological Institute, the USGS, the UNFAO, UNICEF, and the United Nations Disaster Relief Organization (UNDRO). Satellite imagery would include decadal AVHRR imagery and Normalized Difference Vegetation Index (NDVI) values from 1981 to the present for the African continent and selected Landsat scenes for the Sudan pilot study. The system is initially conceived for the MicroVAX 2/GPX, running VMS. To facilitate comparative analysis, a global time-series database (1950 to 1987) is included for a basic set of 125 socio-economic variables per country per year. A more detailed database for the Sahelian countries includes soil type, water resources, agricultural production, agricultural imports and exports, food aid, and consumption. A pilot dataset for the Sudan, with over 2,500 variables from the World Bank's ANDREX system, also includes epidemiological data on the incidence of kwashiorkor, marasmus, other nutritional deficiencies, and synergistically related infectious diseases.
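
    The NDVI values mentioned above are derived from the red and near-infrared channels as (NIR - Red)/(NIR + Red), so that dense, healthy vegetation scores high and drought-stressed or bare surfaces score near zero. A minimal sketch, with illustrative reflectance values:

    ```python
    def ndvi(nir: float, red: float) -> float:
        """Normalized Difference Vegetation Index from near-infrared and red reflectance."""
        denom = nir + red
        return (nir - red) / denom if denom else 0.0

    # A healthy-vegetation pixel reflects strongly in NIR relative to red.
    print(ndvi(nir=0.45, red=0.08))   # ~0.70, dense vegetation
    print(ndvi(nir=0.22, red=0.18))   # ~0.10, sparse cover typical of drought stress
    ```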

  17. Discovering, Indexing and Interlinking Information Resources

    PubMed Central

    Celli, Fabrizio; Keizer, Johannes; Jaques, Yves; Konstantopoulos, Stasinos; Vudragović, Dušan

    2015-01-01

    The social media revolution is having a dramatic effect on the world of scientific publication. Scientists now publish their research interests, theories and outcomes across numerous channels, including personal blogs and other thematic web spaces where ideas, activities and partial results are discussed. Accordingly, information systems that facilitate access to scientific literature must learn to cope with this valuable and varied data, evolving to make this research easily discoverable and available to end users. In this paper we describe the incremental process of discovering web resources in the domain of agricultural science and technology. Making use of Linked Open Data methodologies, we interlink a wide array of custom-crawled resources with the AGRIS bibliographic database in order to enrich the user experience of the AGRIS website. We also discuss the SemaGrow Stack, a query federation and data integration infrastructure used to estimate the semantic distance between crawled web resources and AGRIS. PMID:26834982

  18. Mathematics and online learning experiences: a gateway site for engineering students

    NASA Astrophysics Data System (ADS)

    Masouros, Spyridon D.; Alpay, Esat

    2010-03-01

    This paper focuses on the preliminary design of a multifaceted computer-based mathematics resource for undergraduate and pre-entry engineering students. Online maths resources, while attractive in their flexibility of delivery, have seen variable interest from students and teachers alike. Through student surveys and wide consultations, guidelines have been developed for effectively collating and integrating learning, support, application and diagnostic tools to produce an Engineer's Mathematics Gateway. Specific recommendations include: the development of a shared database of engineering discipline-specific problems and examples; the identification of, and resource development for, troublesome mathematics topics which encompass ideas of threshold concepts and mastery components; the use of motivational and promotional material to raise student interest in learning mathematics in an engineering context; the use of general and lecture-specific concept maps and matrices to identify the needs and relevance of mathematics to engineering topics; and further exploration of the facilitation of peer-based learning through online resources.

  19. Development of an Intelligent Monitoring System for Geological Carbon Sequestration (GCS) Systems

    NASA Astrophysics Data System (ADS)

    Sun, A. Y.; Jeong, H.; Xu, W.; Hovorka, S. D.; Zhu, T.; Templeton, T.; Arctur, D. K.

    2016-12-01

    To provide stakeholders timely evidence that GCS repositories are operating safely and efficiently requires integrated monitoring to assess the performance of the storage reservoir as the CO2 plume moves within it. GCS projects are therefore data intensive, owing to the proliferation of digital instrumentation and smart-sensing technologies. GCS projects are also resource intensive, often requiring multidisciplinary teams performing different monitoring, verification, and accounting (MVA) tasks throughout the lifecycle of a project to ensure secure containment of injected CO2. How can an anomaly detected by one sensor be correlated with events observed by other devices to verify a leakage incident? How can resources be optimally allocated for task-oriented monitoring if reservoir integrity is in question? These are issues that warrant further investigation before real integration can take place. In this work, we are building a web-based data integration, assimilation, and learning framework for geologic carbon sequestration projects (DIAL-GCS). DIAL-GCS will be an intelligent monitoring system (IMS) for automating GCS closed-loop management by leveraging recent developments in high-throughput database, complex event processing, data assimilation, and machine learning technologies. Results will be demonstrated using realistic data and models derived from a GCS site.

  20. Improvements to PATRIC, the all-bacterial Bioinformatics Database and Analysis Resource Center

    DOE PAGES

    Davis, James J.; Brettin, Thomas; Dietrich, Emily M.; ...

    2016-11-28

    Here, the Pathosystems Resource Integration Center (PATRIC) is the bacterial Bioinformatics Resource Center. Recent changes to PATRIC include a redesign of the web interface and some new services that provide users with a platform that takes them from raw reads to an integrated analysis experience. The redesigned interface allows researchers direct access to tools and data, and the emphasis has changed to user-created genome-groups, with detailed summaries and views of the data that researchers have selected. Perhaps the biggest change has been the enhanced capability for researchers to analyze their private data and compare it to the available public data. Researchers can assemble their raw sequence reads and annotate the contigs using RASTtk. PATRIC also provides services for RNA-Seq, variation, model reconstruction and differential expression analysis, all delivered through an updated private workspace. Private data can be compared by 'virtual integration' to any of PATRIC's public data. The number of genomes available for comparison in PATRIC has expanded to over 80 000, with a special emphasis on genomes with antimicrobial resistance data. PATRIC uses this data to improve both subsystem annotation and k-mer classification, and tags new genomes as having signatures that indicate susceptibility or resistance to specific antibiotics.
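
    The k-mer classification mentioned above amounts to checking a genome's k-mers against curated signature sets. The sketch below shows that matching step in miniature; the k-mer length, signature set and scoring are stand-ins for illustration, not PATRIC's actual classifier.

    ```python
    def kmers(seq: str, k: int = 8):
        """Yield all overlapping k-mers of a sequence."""
        seq = seq.upper()
        for i in range(len(seq) - k + 1):
            yield seq[i:i + k]

    def count_signature_hits(genome_seq: str, signature_kmers: set, k: int = 8) -> int:
        """Count how many k-mers of a sequence appear in a precomputed signature set.

        A real classifier (such as the one PATRIC describes) would use curated,
        much larger signature sets and score thresholds; this only sketches the idea.
        """
        return sum(1 for kmer in kmers(genome_seq, k) if kmer in signature_kmers)

    signatures = {"ATGGCTAA", "GCTAACGT"}   # hypothetical resistance-associated k-mers
    print(count_signature_hits("TTATGGCTAACGTTT", signatures))   # 2
    ```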

  1. IntegratedMap: a Web interface for integrating genetic map data.

    PubMed

    Yang, Hongyu; Wang, Hongyu; Gingle, Alan R

    2005-05-01

    IntegratedMap is a Web application and database schema for storing and interactively displaying genetic map data. Its Web interface includes a menu for direct chromosome/linkage group selection, a search form for selection based on mapped object location and linkage group displays. An overview display provides convenient access to the full range of mapped and anchored object types with genetic locus details, such as numbers, types and names of mapped/anchored objects displayed in a compact scrollable list box that automatically updates based on selected map location and object type. Also, multilinkage group and localized map views are available along with links that can be configured for integration with other Web resources. IntegratedMap is implemented in C#/ASP.NET and the package, including a MySQL schema creation script, is available from http://cggc.agtec.uga.edu/Data/download.asp

  2. Putative Microsatellite DNA Marker-Based Wheat Genomic Resource for Varietal Improvement and Management.

    PubMed

    Jaiswal, Sarika; Sheoran, Sonia; Arora, Vasu; Angadi, Ulavappa B; Iquebal, Mir A; Raghav, Nishu; Aneja, Bharti; Kumar, Deepender; Singh, Rajender; Sharma, Pradeep; Singh, G P; Rai, Anil; Tiwari, Ratan; Kumar, Dinesh

    2017-01-01

    Wheat fulfills 20% of the global caloric requirement. The world will need 60% more wheat for a population of 9 billion by 2050, but climate change with increasing temperature is projected to affect wheat productivity adversely. Trait improvement and management of wheat germplasm require genomic resources. Simple Sequence Repeats (SSRs), being highly polymorphic and ubiquitously distributed in the genome, can be a marker of choice, but there is no structured marker database with options to generate primer pairs for genotyping at a desired chromosome/physical location. Markers previously associated with different wheat traits are also not available in any database. Limitations of in vitro SSR discovery can be overcome by genome-wide in silico mining of SSRs. The Triticum aestivum SSR database (TaSSRDb) is an integrated online database with a three-tier architecture, developed using PHP and MySQL and accessible at http://webtom.cabgrid.res.in/wheatssr/. For genotyping, the Primer3 standalone code computes primers on user request. Chromosome-wise SSR calling for all three sub-genomes, along with a choice of motif types, is provided in addition to primer generation for a desired marker. We report here a database of the highest number of SSRs (476,169) from the complex, hexaploid wheat genome (~17 Gb), along with 268 previously reported SSR markers associated with 11 traits. The highest (116.93 SSRs/Mb) and lowest (74.57 SSRs/Mb) SSR densities were found on chromosomes 2D and 3A, respectively. To obtain homozygous loci, e-PCR was performed, and 30 such loci were randomly selected for PCR validation in a panel of 18 wheat Advance Varietal Trial (AVT) lines. TaSSRDb can be a valuable genomic resource for linkage mapping, gene/QTL (quantitative trait locus) discovery, diversity analysis, traceability and variety identification. Variety-specific profiling and differentiation can supplement DUS (Distinctiveness, Uniformity, and Stability) testing, EDV (Essentially Derived Variety)/IV (Initial Variety) disputes, seed purity and hybrid wheat testing. All of these are required for germplasm management as well as for efforts to improve wheat productivity.
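
    Genome-wide in silico SSR mining of the kind described above can be reduced, at its core, to scanning sequences for perfect tandem repeats of short motifs. The sketch below does this with a regular expression; the motif lengths and minimum repeat count are illustrative thresholds, not the criteria used to build TaSSRDb.

    ```python
    import re

    def find_ssrs(seq: str, min_repeats: int = 5, motif_lengths=(2, 3, 4, 5, 6)):
        """Locate perfect simple sequence repeats (SSRs) with a regex scan.

        Returns (start_index, motif, repeat_count) tuples. The motif lengths and
        minimum repeat count are illustrative thresholds, not the exact criteria
        used to build TaSSRDb.
        """
        seq = seq.upper()
        hits = []
        for m in motif_lengths:
            pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (m, min_repeats - 1))
            for match in pattern.finditer(seq):
                hits.append((match.start(), match.group(1), len(match.group(0)) // m))
        return hits

    print(find_ssrs("TTGAGAGAGAGAGACCCATGATGATGATGATGGG"))
    # [(2, 'GA', 6), (17, 'ATG', 5)]
    ```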

  3. HPIDB 2.0: a curated database for host–pathogen interactions

    PubMed Central

    Ammari, Mais G.; Gresham, Cathy R.; McCarthy, Fiona M.; Nanduri, Bindu

    2016-01-01

    Identification and analysis of host–pathogen interactions (HPI) is essential to study infectious diseases. However, HPI data are sparse in existing molecular interaction databases, especially for agricultural host–pathogen systems. Therefore, resources that annotate, predict and display the HPI that underpin infectious diseases are critical for developing novel intervention strategies. HPIDB 2.0 (http://www.agbase.msstate.edu/hpi/main.html) is a resource for HPI data, and contains 45,238 manually curated entries in the current release. Since the first description of the database in 2010, multiple enhancements to HPIDB data and interface services were made that are described here. Notably, HPIDB 2.0 now provides targeted biocuration of molecular interaction data. As a member of the International Molecular Exchange consortium, annotations provided by HPIDB 2.0 curators meet community standards to provide detailed contextual experimental information and facilitate data sharing. Moreover, HPIDB 2.0 provides access to rapidly available community annotations that capture minimum molecular interaction information to address immediate researcher needs for HPI network analysis. In addition to curation, HPIDB 2.0 integrates HPI from existing external sources and contains tools to infer additional HPI where annotated data are scarce. Compared to other interaction databases, our data collection approach ensures HPIDB 2.0 users access the most comprehensive HPI data from a wide range of pathogens and their hosts (594 pathogen and 70 host species, as of February 2016). Improvements also include enhanced search capacity, addition of Gene Ontology functional information, and implementation of network visualization. The changes made to HPIDB 2.0 content and interface ensure that users, especially agricultural researchers, are able to easily access and analyse high quality, comprehensive HPI data. All HPIDB 2.0 data are updated regularly, are publicly available for direct download, and are disseminated to other molecular interaction resources. Database URL: http://www.agbase.msstate.edu/hpi/main.html PMID:27374121

  4. Putative Microsatellite DNA Marker-Based Wheat Genomic Resource for Varietal Improvement and Management

    PubMed Central

    Jaiswal, Sarika; Sheoran, Sonia; Arora, Vasu; Angadi, Ulavappa B.; Iquebal, Mir A.; Raghav, Nishu; Aneja, Bharti; Kumar, Deepender; Singh, Rajender; Sharma, Pradeep; Singh, G. P.; Rai, Anil; Tiwari, Ratan; Kumar, Dinesh

    2017-01-01

    Wheat fulfills 20% of the global caloric requirement. The world will need 60% more wheat for a population of 9 billion by 2050, but climate change with increasing temperature is projected to affect wheat productivity adversely. Trait improvement and management of wheat germplasm require genomic resources. Simple Sequence Repeats (SSRs), being highly polymorphic and ubiquitously distributed in the genome, can be a marker of choice, but there is no structured marker database with options to generate primer pairs for genotyping at a desired chromosome/physical location. Markers previously associated with different wheat traits are also not available in any database. Limitations of in vitro SSR discovery can be overcome by genome-wide in silico mining of SSRs. The Triticum aestivum SSR database (TaSSRDb) is an integrated online database with a three-tier architecture, developed using PHP and MySQL and accessible at http://webtom.cabgrid.res.in/wheatssr/. For genotyping, the Primer3 standalone code computes primers on user request. Chromosome-wise SSR calling for all three sub-genomes, along with a choice of motif types, is provided in addition to primer generation for a desired marker. We report here a database of the highest number of SSRs (476,169) from the complex, hexaploid wheat genome (~17 Gb), along with 268 previously reported SSR markers associated with 11 traits. The highest (116.93 SSRs/Mb) and lowest (74.57 SSRs/Mb) SSR densities were found on chromosomes 2D and 3A, respectively. To obtain homozygous loci, e-PCR was performed, and 30 such loci were randomly selected for PCR validation in a panel of 18 wheat Advance Varietal Trial (AVT) lines. TaSSRDb can be a valuable genomic resource for linkage mapping, gene/QTL (quantitative trait locus) discovery, diversity analysis, traceability and variety identification. Variety-specific profiling and differentiation can supplement DUS (Distinctiveness, Uniformity, and Stability) testing, EDV (Essentially Derived Variety)/IV (Initial Variety) disputes, seed purity and hybrid wheat testing. All of these are required for germplasm management as well as for efforts to improve wheat productivity. PMID:29234333

  5. RatMap--rat genome tools and data.

    PubMed

    Petersen, Greta; Johnson, Per; Andersson, Lars; Klinga-Levan, Karin; Gómez-Fabre, Pedro M; Ståhl, Fredrik

    2005-01-01

    The rat genome database RatMap (http://ratmap.org or http://ratmap.gen.gu.se) has been one of the main resources for rat genome information since 1994. The database is maintained by CMB-Genetics at Goteborg University in Sweden and provides information on rat genes, polymorphic rat DNA-markers and rat quantitative trait loci (QTLs), all curated at RatMap. The database is under the supervision of the Rat Gene and Nomenclature Committee (RGNC); thus much attention is paid to rat gene nomenclature. RatMap presents information on rat idiograms, karyotypes and provides a unified presentation of the rat genome sequence and integrated rat linkage maps. A set of tools is also available to facilitate the identification and characterization of rat QTLs, as well as the estimation of exon/intron number and sizes in individual rat genes. Furthermore, comparative gene maps of rat in regard to mouse and human are provided.

  6. RatMap—rat genome tools and data

    PubMed Central

    Petersen, Greta; Johnson, Per; Andersson, Lars; Klinga-Levan, Karin; Gómez-Fabre, Pedro M.; Ståhl, Fredrik

    2005-01-01

    The rat genome database RatMap (http://ratmap.org or http://ratmap.gen.gu.se) has been one of the main resources for rat genome information since 1994. The database is maintained by CMB–Genetics at Göteborg University in Sweden and provides information on rat genes, polymorphic rat DNA-markers and rat quantitative trait loci (QTLs), all curated at RatMap. The database is under the supervision of the Rat Gene and Nomenclature Committee (RGNC); thus much attention is paid to rat gene nomenclature. RatMap presents information on rat idiograms, karyotypes and provides a unified presentation of the rat genome sequence and integrated rat linkage maps. A set of tools is also available to facilitate the identification and characterization of rat QTLs, as well as the estimation of exon/intron number and sizes in individual rat genes. Furthermore, comparative gene maps of rat in regard to mouse and human are provided. PMID:15608244

  7. Gene regulation knowledge commons: community action takes care of DNA binding transcription factors

    PubMed Central

    Tripathi, Sushil; Vercruysse, Steven; Chawla, Konika; Christie, Karen R.; Blake, Judith A.; Huntley, Rachael P.; Orchard, Sandra; Hermjakob, Henning; Thommesen, Liv; Lægreid, Astrid; Kuiper, Martin

    2016-01-01

    A large gap remains between the amount of knowledge in scientific literature and the fraction that gets curated into standardized databases, despite many curation initiatives. Yet the availability of comprehensive knowledge in databases is crucial for exploiting existing background knowledge, both for designing follow-up experiments and for interpreting new experimental data. Structured resources also underpin the computational integration and modeling of regulatory pathways, which further aids our understanding of regulatory dynamics. We argue how cooperation between the scientific community and professional curators can increase the capacity of capturing precise knowledge from literature. We demonstrate this with a project in which we mobilize biological domain experts who curate large amounts of DNA binding transcription factors, and show that they, although new to the field of curation, can make valuable contributions by harvesting reported knowledge from scientific papers. Such community curation can enhance the scientific epistemic process. Database URL: http://www.tfcheckpoint.org PMID:27270715

  8. PGMapper: a web-based tool linking phenotype to genes.

    PubMed

    Xiong, Qing; Qiu, Yuhui; Gu, Weikuan

    2008-04-01

    With the availability of whole genome sequence in many species, linkage analysis, positional cloning and microarrays are gradually becoming powerful tools for investigating the links between phenotype and genotype or genes. However, in these methods, causative genes underlying a quantitative trait locus, or a disease, are usually located within a large genomic region or a large set of genes. Examining the function of every gene is very time consuming and requires retrieving and integrating information from multiple databases or genome resources. PGMapper is a software tool for automatically matching phenotype to genes from a defined genome region or a group of given genes by combining the mapping information from the Ensembl database and gene function information from the OMIM and PubMed databases. PGMapper is currently available for candidate gene search of human, mouse, rat, zebrafish and 12 other species. Available online at http://www.genediscovery.org/pgmapper/index.jsp.
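
    At its simplest, matching a phenotype to candidate genes as described above means screening each gene's aggregated annotation text for phenotype keywords. The sketch below illustrates that idea with made-up records; it is not PGMapper's actual scoring scheme.

    ```python
    def match_phenotype(genes, keyword: str):
        """Return gene symbols whose aggregated annotation text mentions the keyword.

        'genes' is a list of dicts with 'symbol' and 'annotation' keys; in a real
        pipeline the annotation text would be assembled from OMIM/PubMed records.
        Purely illustrative of the matching idea, not PGMapper's algorithm.
        """
        keyword = keyword.lower()
        return [g["symbol"] for g in genes if keyword in g["annotation"].lower()]

    candidates = [
        {"symbol": "GENE_A", "annotation": "Associated with hereditary deafness in mouse models."},
        {"symbol": "GENE_B", "annotation": "Involved in lipid metabolism."},
    ]
    print(match_phenotype(candidates, "deafness"))   # ['GENE_A']
    ```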

  9. MicRhoDE: a curated database for the analysis of microbial rhodopsin diversity and evolution

    PubMed Central

    Boeuf, Dominique; Audic, Stéphane; Brillet-Guéguen, Loraine; Caron, Christophe; Jeanthon, Christian

    2015-01-01

    Microbial rhodopsins are a diverse group of photoactive transmembrane proteins found in all three domains of life and in viruses. Today, microbial rhodopsin research is a flourishing research field in which new understandings of rhodopsin diversity, function and evolution are contributing to broader microbiological and molecular knowledge. Here, we describe MicRhoDE, a comprehensive, high-quality and freely accessible database that facilitates analysis of the diversity and evolution of microbial rhodopsins. Rhodopsin sequences isolated from a vast array of marine and terrestrial environments were manually collected and curated. To each rhodopsin sequence are associated related metadata, including predicted spectral tuning of the protein, putative activity and function, taxonomy for sequences that can be linked to a 16S rRNA gene, sampling date and location, and supporting literature. The database currently covers 7857 aligned sequences from more than 450 environmental samples or organisms. Based on a robust phylogenetic analysis, we introduce an operational classification system with multiple phylogenetic levels ranging from superclusters to species-level operational taxonomic units. An integrated pipeline for online sequence alignment and phylogenetic tree construction is also provided. With a user-friendly interface and integrated online bioinformatics tools, this unique resource should be highly valuable for upcoming studies of the biogeography, diversity, distribution and evolution of microbial rhodopsins. Database URL: http://micrhode.sb-roscoff.fr. PMID:26286928

  10. MicRhoDE: a curated database for the analysis of microbial rhodopsin diversity and evolution.

    PubMed

    Boeuf, Dominique; Audic, Stéphane; Brillet-Guéguen, Loraine; Caron, Christophe; Jeanthon, Christian

    2015-01-01

    Microbial rhodopsins are a diverse group of photoactive transmembrane proteins found in all three domains of life and in viruses. Today, microbial rhodopsin research is a flourishing research field in which new understandings of rhodopsin diversity, function and evolution are contributing to broader microbiological and molecular knowledge. Here, we describe MicRhoDE, a comprehensive, high-quality and freely accessible database that facilitates analysis of the diversity and evolution of microbial rhodopsins. Rhodopsin sequences isolated from a vast array of marine and terrestrial environments were manually collected and curated. To each rhodopsin sequence are associated related metadata, including predicted spectral tuning of the protein, putative activity and function, taxonomy for sequences that can be linked to a 16S rRNA gene, sampling date and location, and supporting literature. The database currently covers 7857 aligned sequences from more than 450 environmental samples or organisms. Based on a robust phylogenetic analysis, we introduce an operational classification system with multiple phylogenetic levels ranging from superclusters to species-level operational taxonomic units. An integrated pipeline for online sequence alignment and phylogenetic tree construction is also provided. With a user-friendly interface and integrated online bioinformatics tools, this unique resource should be highly valuable for upcoming studies of the biogeography, diversity, distribution and evolution of microbial rhodopsins. Database URL: http://micrhode.sb-roscoff.fr. © The Author(s) 2015. Published by Oxford University Press.

  11. Analysis and visualization of Arabidopsis thaliana GWAS using web 2.0 technologies.

    PubMed

    Huang, Yu S; Horton, Matthew; Vilhjálmsson, Bjarni J; Seren, Umit; Meng, Dazhe; Meyer, Christopher; Ali Amer, Muhammad; Borevitz, Justin O; Bergelson, Joy; Nordborg, Magnus

    2011-01-01

    With large-scale genomic data becoming the norm in biological studies, the storing, integrating, viewing and searching of such data have become a major challenge. In this article, we describe the development of an Arabidopsis thaliana database that hosts the geographic information and genetic polymorphism data for over 6000 accessions and genome-wide association study (GWAS) results for 107 phenotypes representing the largest collection of Arabidopsis polymorphism data and GWAS results to date. Taking advantage of a series of the latest web 2.0 technologies, such as Ajax (Asynchronous JavaScript and XML), GWT (Google-Web-Toolkit), MVC (Model-View-Controller) web framework and Object Relationship Mapper, we have created a web-based application (web app) for the database, that offers an integrated and dynamic view of geographic information, genetic polymorphism and GWAS results. Essential search functionalities are incorporated into the web app to aid reverse genetics research. The database and its web app have proven to be a valuable resource to the Arabidopsis community. The whole framework serves as an example of how biological data, especially GWAS, can be presented and accessed through the web. In the end, we illustrate the potential to gain new insights through the web app by two examples, showcasing how it can be used to facilitate forward and reverse genetics research. Database URL: http://arabidopsis.usc.edu/

  12. Information as Resources; A View toward the 21st Century - Let's Construct Databases by Ourselves -

    NASA Astrophysics Data System (ADS)

    Ohmi, Akira

    A highly developed information-oriented society based on “Information Network Technology” will be realized in the 21st century. In enterprises, fundamental research will be regarded as increasingly important, and the effective use of information as a resource will be indispensable. From the viewpoint of the international distribution of information, Japan has been criticized for insufficiently offering its science and technology information to overseas countries; however, in the steel industry, for example, many in-house technical journals have been offered overseas in English editions. Recently, several information firms have also started translating Japanese information into English and providing it overseas. However, some problems remain to be taken into consideration: 1. the information is not integrated; 2. there is no coordination among the firms; 3. others. The author therefore proposes the communal use of a machine translation system and the construction of a database for overseas users that integrates the firms’ work while preserving each one’s individuality.

  13. ZikaBase: An integrated ZIKV- Human Interactome Map database.

    PubMed

    Gurumayum, Sanathoi; Brahma, Rahul; Naorem, Leimarembi Devi; Muthaiyan, Mathavan; Gopal, Jeyakodi; Venkatesan, Amouda

    2018-01-15

    The re-emergence of ZIKV has caused infections in more than 1.5 million people. The molecular mechanism and pathogenesis of ZIKV are not well explored due to the unavailability of adequate models and the lack, until now, of publicly accessible resources providing information on the ZIKV-Human protein interactome map. This study attempts to curate ZIKV-Human interacting proteins from the published literature and RNA-Seq data. Eleven directly interacting genes and 12 associated genes were retrieved from the literature, and 3742 Differentially Expressed Genes (DEGs) were obtained from RNA-Seq analysis. These genes have been analyzed to construct the ZIKV-Human Interactome Map. The importance of the study is illustrated by the enrichment analysis, which shows that the directly interacting and associated genes are enriched in viral entry into host cells. ZIKV infection also modulates 32% of signaling and 27% of immune system pathways. The integrated database, ZikaBase, has been developed to help the virology research community and is accessible at https://test5.bicpu.edu.in. Copyright © 2017 Elsevier Inc. All rights reserved.
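
    The 3742 DEGs mentioned above come from an RNA-Seq differential expression analysis; calling DEGs conventionally reduces to thresholding fold change and adjusted p-values. The sketch below shows that filtering step with conventional cutoffs, which may differ from the ones the ZikaBase authors used.

    ```python
    def call_degs(results, lfc_cutoff: float = 1.0, padj_cutoff: float = 0.05):
        """Select differentially expressed genes from (gene, log2_fold_change, adjusted_p) rows.

        The cutoffs are conventional defaults shown for illustration only;
        the ZikaBase analysis may have used different thresholds.
        """
        return [gene for gene, lfc, padj in results
                if abs(lfc) >= lfc_cutoff and padj is not None and padj < padj_cutoff]

    example = [("AXL", 2.3, 0.001), ("ACTB", 0.1, 0.80),
               ("IFIT1", 3.8, 1e-6), ("GAPDH", -0.2, None)]
    print(call_degs(example))   # ['AXL', 'IFIT1']
    ```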

  14. Integrated Approach to Reconstruction of Microbial Regulatory Networks

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Rodionov, Dmitry A; Novichkov, Pavel S

    2013-11-04

    This project had the goal of developing an integrated bioinformatics platform for genome-scale inference and visualization of transcriptional regulatory networks (TRNs) in bacterial genomes. The work was done at the Sanford-Burnham Medical Research Institute (SBMRI, P.I. D.A. Rodionov) and Lawrence Berkeley National Laboratory (LBNL, co-P.I. P.S. Novichkov). The developed computational resources include: (1) the RegPredict web platform for TRN inference and regulon reconstruction in microbial genomes, and (2) the RegPrecise database for collection, visualization and comparative analysis of transcriptional regulons reconstructed by comparative genomics. These analytical resources were selected as key components in the DOE Systems Biology KnowledgeBase (SBKB). The high-quality data accumulated in RegPrecise will provide essential datasets of reference regulons in diverse microbes to enable automatic reconstruction of draft TRNs in newly sequenced genomes. We outline our progress toward the three aims of this grant proposal, which were: develop an integrated platform for genome-scale regulon reconstruction; infer regulatory annotations in several groups of bacteria and build reference collections of microbial regulons; and develop a KnowledgeBase on microbial transcriptional regulation.

  15. E-MSD: an integrated data resource for bioinformatics.

    PubMed

    Golovin, A; Oldfield, T J; Tate, J G; Velankar, S; Barton, G J; Boutselakis, H; Dimitropoulos, D; Fillon, J; Hussain, A; Ionides, J M C; John, M; Keller, P A; Krissinel, E; McNeil, P; Naim, A; Newman, R; Pajon, A; Pineda, J; Rachedi, A; Copeland, J; Sitnov, A; Sobhany, S; Suarez-Uruena, A; Swaminathan, G J; Tagari, M; Tromm, S; Vranken, W; Henrick, K

    2004-01-01

    The Macromolecular Structure Database (MSD) group (http://www.ebi.ac.uk/msd/) continues to enhance the quality and consistency of macromolecular structure data in the Protein Data Bank (PDB) and to work towards the integration of various bioinformatics data resources. We have implemented a simple form-based interface that allows users to query the MSD directly. The MSD 'atlas pages' show all of the information in the MSD for a particular PDB entry. The group has designed new search interfaces aimed at specific areas of interest, such as the environment of ligands and the secondary structures of proteins. We have also implemented a novel search interface that begins to integrate separate MSD search services in a single graphical tool. We have worked closely with collaborators to build a new visualization tool that can present both structure and sequence data in a unified interface, and this data viewer is now used throughout the MSD services for the visualization and presentation of search results. Examples showcasing the functionality and power of these tools are available from tutorial webpages (http://www.ebi.ac.uk/msd-srv/docs/roadshow_tutorial/).

  16. E-MSD: an integrated data resource for bioinformatics

    PubMed Central

    Golovin, A.; Oldfield, T. J.; Tate, J. G.; Velankar, S.; Barton, G. J.; Boutselakis, H.; Dimitropoulos, D.; Fillon, J.; Hussain, A.; Ionides, J. M. C.; John, M.; Keller, P. A.; Krissinel, E.; McNeil, P.; Naim, A.; Newman, R.; Pajon, A.; Pineda, J.; Rachedi, A.; Copeland, J.; Sitnov, A.; Sobhany, S.; Suarez-Uruena, A.; Swaminathan, G. J.; Tagari, M.; Tromm, S.; Vranken, W.; Henrick, K.

    2004-01-01

    The Macromolecular Structure Database (MSD) group (http://www.ebi.ac.uk/msd/) continues to enhance the quality and consistency of macromolecular structure data in the Protein Data Bank (PDB) and to work towards the integration of various bioinformatics data resources. We have implemented a simple form-based interface that allows users to query the MSD directly. The MSD ‘atlas pages’ show all of the information in the MSD for a particular PDB entry. The group has designed new search interfaces aimed at specific areas of interest, such as the environment of ligands and the secondary structures of proteins. We have also implemented a novel search interface that begins to integrate separate MSD search services in a single graphical tool. We have worked closely with collaborators to build a new visualization tool that can present both structure and sequence data in a unified interface, and this data viewer is now used throughout the MSD services for the visualization and presentation of search results. Examples showcasing the functionality and power of these tools are available from tutorial webpages (http://www.ebi.ac.uk/msd-srv/docs/roadshow_tutorial/). PMID:14681397

  17. EPA Facility Registry Service (FRS): RCRA

    EPA Pesticide Factsheets

    This web feature service contains location and facility identification information from EPA's Facility Registry Service (FRS) for the subset of hazardous waste facilities that link to the Resource Conservation and Recovery Act Information System (RCRAInfo). EPA's comprehensive information system in support of the Resource Conservation and Recovery Act (RCRA) of 1976 and the Hazardous and Solid Waste Amendments (HSWA) of 1984, RCRAInfo tracks many types of information about generators, transporters, treaters, storers, and disposers of hazardous waste. FRS identifies and geospatially locates facilities, sites or places subject to environmental regulations or of environmental interest. Using vigorous verification and data management procedures, FRS integrates facility data from EPA's national program systems, other federal agencies, and State and tribal master facility records and provides EPA with a centrally managed, single source of comprehensive and authoritative information on facilities. This data set contains the subset of FRS integrated facilities that link to RCRAInfo hazardous waste facilities once the RCRAInfo data has been integrated into the FRS database. Additional information on FRS is available at the EPA website https://www.epa.gov/enviro/facility-registry-service-frs

  18. The salinity tolerant poplar database (STPD): a comprehensive database for studying tree salt-tolerant adaption and poplar genomics.

    PubMed

    Ma, Yazhen; Xu, Ting; Wan, Dongshi; Ma, Tao; Shi, Sheng; Liu, Jianquan; Hu, Quanjun

    2015-03-17

    Soil salinity is a significant factor that impairs plant growth and agricultural productivity, and numerous efforts are underway to enhance salt tolerance of economically important plants. Populus species are widely cultivated for diverse uses. Especially, they grow in different habitats, from salty soil to mesophytic environment, and are therefore used as a model genus for elucidating physiological and molecular mechanisms of stress tolerance in woody plants. The Salinity Tolerant Poplar Database (STPD) is an integrative database for salt-tolerant poplar genome biology. Currently the STPD contains Populus euphratica genome and its related genetic resources. P. euphratica, with a preference of the salty habitats, has become a valuable genetic resource for the exploitation of tolerance characteristics in trees. This database contains curated data including genomic sequence, genes and gene functional information, non-coding RNA sequences, transposable elements, simple sequence repeats and single nucleotide polymorphisms information of P. euphratica, gene expression data between P. euphratica and Populus tomentosa, and whole-genome alignments between Populus trichocarpa, P. euphratica and Salix suchowensis. The STPD provides useful searching and data mining tools, including GBrowse genome browser, BLAST servers and genome alignments viewer, which can be used to browse genome regions, identify similar sequences and visualize genome alignments. Datasets within the STPD can also be downloaded to perform local searches. A new Salinity Tolerant Poplar Database has been developed to assist studies of salt tolerance in trees and poplar genomics. The database will be continuously updated to incorporate new genome-wide data of related poplar species. This database will serve as an infrastructure for researches on the molecular function of genes, comparative genomics, and evolution in closely related species as well as promote advances in molecular breeding within Populus. The STPD can be accessed at http://me.lzu.edu.cn/stpd/ .

  19. The Integrated Medical Model: A Probabilistic Simulation Model for Predicting In-Flight Medical Risks

    NASA Technical Reports Server (NTRS)

    Keenan, Alexandra; Young, Millennia; Saile, Lynn; Boley, Lynn; Walton, Marlei; Kerstman, Eric; Shah, Ronak; Goodenow, Debra A.; Myers, Jerry G.

    2015-01-01

    The Integrated Medical Model (IMM) is a probabilistic model that uses simulation to predict mission medical risk. Given a specific mission and crew scenario, medical events are simulated using Monte Carlo methodology to provide estimates of resource utilization, probability of evacuation, probability of loss of crew, and the amount of mission time lost due to illness. Mission and crew scenarios are defined by mission length, extravehicular activity (EVA) schedule, and crew characteristics including: sex, coronary artery calcium score, contacts, dental crowns, history of abdominal surgery, and EVA eligibility. The Integrated Medical Evidence Database (iMED) houses the model inputs for one hundred medical conditions using in-flight, analog, and terrestrial medical data. Inputs include incidence, event durations, resource utilization, and crew functional impairment. Severity of conditions is addressed by defining statistical distributions on the dichotomized best and worst-case scenarios for each condition. The outcome distributions for conditions are bounded by the treatment extremes of the fully treated scenario in which all required resources are available and the untreated scenario in which no required resources are available. Upon occurrence of a simulated medical event, treatment availability is assessed, and outcomes are generated depending on the status of the affected crewmember at the time of onset, including any pre-existing functional impairments or ongoing treatment of concurrent conditions. The main IMM outcomes, including probability of evacuation and loss of crew life, time lost due to medical events, and resource utilization, are useful in informing mission planning decisions. To date, the IMM has been used to assess mission-specific risks with and without certain crewmember characteristics, to determine the impact of eliminating certain resources from the mission medical kit, and to design medical kits that maximally benefit crew health while meeting mass and volume constraints.
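
    As a toy illustration of the Monte Carlo approach described above, the sketch below samples medical events per simulated mission from per-condition occurrence probabilities and accumulates mission time lost and evacuation outcomes. Every number in it is invented for the example; none are actual IMM inputs.

    ```python
    import random

    # Toy sketch of Monte Carlo mission simulation in the spirit of the abstract above.
    # Each condition: (P(occurs during mission), days of mission time lost, P(evacuation | occurrence)).
    # All values are fabricated for illustration only.
    CONDITIONS = {
        "back pain":        (0.60, 0.5, 0.00),
        "kidney stone":     (0.05, 2.0, 0.30),
        "dental emergency": (0.10, 1.0, 0.05),
    }

    def simulate_mission(rng: random.Random):
        """Simulate one mission; return (mission days lost, whether an evacuation occurred)."""
        time_lost, evacuated = 0.0, False
        for p_event, days_lost, p_evac in CONDITIONS.values():
            if rng.random() < p_event:
                time_lost += days_lost
                evacuated = evacuated or rng.random() < p_evac
        return time_lost, evacuated

    def run(n_trials: int = 10000, seed: int = 1):
        rng = random.Random(seed)
        results = [simulate_mission(rng) for _ in range(n_trials)]
        mean_time_lost = sum(t for t, _ in results) / n_trials
        p_evacuation = sum(1 for _, e in results if e) / n_trials
        return mean_time_lost, p_evacuation

    print(run())   # roughly 0.5 days lost on average, evacuation probability near 2%
    ```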

  20. The Integrated Medical Model: A Probabilistic Simulation Model Predicting In-Flight Medical Risks

    NASA Technical Reports Server (NTRS)

    Keenan, Alexandra; Young, Millennia; Saile, Lynn; Boley, Lynn; Walton, Marlei; Kerstman, Eric; Shah, Ronak; Goodenow, Debra A.; Myers, Jerry G., Jr.

    2015-01-01

    The Integrated Medical Model (IMM) is a probabilistic model that uses simulation to predict mission medical risk. Given a specific mission and crew scenario, medical events are simulated using Monte Carlo methodology to provide estimates of resource utilization, probability of evacuation, probability of loss of crew, and the amount of mission time lost due to illness. Mission and crew scenarios are defined by mission length, extravehicular activity (EVA) schedule, and crew characteristics including: sex, coronary artery calcium score, contacts, dental crowns, history of abdominal surgery, and EVA eligibility. The Integrated Medical Evidence Database (iMED) houses the model inputs for one hundred medical conditions using in-flight, analog, and terrestrial medical data. Inputs include incidence, event durations, resource utilization, and crew functional impairment. Severity of conditions is addressed by defining statistical distributions on the dichotomized best and worst-case scenarios for each condition. The outcome distributions for conditions are bounded by the treatment extremes of the fully treated scenario in which all required resources are available and the untreated scenario in which no required resources are available. Upon occurrence of a simulated medical event, treatment availability is assessed, and outcomes are generated depending on the status of the affected crewmember at the time of onset, including any pre-existing functional impairments or ongoing treatment of concurrent conditions. The main IMM outcomes, including probability of evacuation and loss of crew life, time lost due to medical events, and resource utilization, are useful in informing mission planning decisions. To date, the IMM has been used to assess mission-specific risks with and without certain crewmember characteristics, to determine the impact of eliminating certain resources from the mission medical kit, and to design medical kits that maximally benefit crew health while meeting mass and volume constraints.

  1. Database resources of the National Center for Biotechnology Information: 2002 update

    PubMed Central

    Wheeler, David L.; Church, Deanna M.; Lash, Alex E.; Leipe, Detlef D.; Madden, Thomas L.; Pontius, Joan U.; Schuler, Gregory D.; Schriml, Lynn M.; Tatusova, Tatiana A.; Wagner, Lukas; Rapp, Barbara A.

    2002-01-01

    In addition to maintaining the GenBank nucleic acid sequence database, the National Center for Biotechnology Information (NCBI) provides data analysis and retrieval resources that operate on the data in GenBank and a variety of other biological data made available through NCBI’s web site. NCBI data retrieval resources include Entrez, PubMed, LocusLink and the Taxonomy Browser. Data analysis resources include BLAST, Electronic PCR, OrfFinder, RefSeq, UniGene, HomoloGene, Database of Single Nucleotide Polymorphisms (dbSNP), Human Genome Sequencing, Human MapViewer, Human–Mouse Homology Map, Cancer Chromosome Aberration Project (CCAP), Entrez Genomes, Clusters of Orthologous Groups (COGs) database, Retroviral Genotyping Tools, SAGEmap, Gene Expression Omnibus (GEO), Online Mendelian Inheritance in Man (OMIM), the Molecular Modeling Database (MMDB) and the Conserved Domain Database (CDD). Augmenting many of the web applications are custom implementations of the BLAST program optimized to search specialized data sets. All of the resources can be accessed through the NCBI home page at http://www.ncbi.nlm.nih.gov. PMID:11752242
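
    The NCBI resources above are also reachable programmatically. The sketch below uses the present-day NCBI E-utilities ESearch endpoint (not part of this 2002 abstract) to run an Entrez search against PubMed; the query term is an arbitrary example.

```python
import json
import urllib.parse
import urllib.request

# Query PubMed through the NCBI E-utilities ESearch endpoint. The E-utilities
# interface post-dates the abstract above and is used here only to illustrate
# programmatic access to an Entrez database.
BASE = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"

params = {
    "db": "pubmed",
    "term": "integrated medical model[Title]",  # example query term
    "retmode": "json",
    "retmax": "5",
}
url = BASE + "?" + urllib.parse.urlencode(params)

with urllib.request.urlopen(url) as resp:
    result = json.load(resp)

# Print the matching PubMed IDs (if any).
print(result["esearchresult"]["idlist"])
```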

  2. ESTree db: a Tool for Peach Functional Genomics

    PubMed Central

    Lazzari, Barbara; Caprera, Andrea; Vecchietti, Alberto; Stella, Alessandra; Milanesi, Luciano; Pozzi, Carlo

    2005-01-01

    Background The ESTree db represents a collection of Prunus persica expressed sequence tags (ESTs) and is intended as a resource for peach functional genomics. A total of 6,155 successful EST sequences were obtained from four in-house prepared cDNA libraries from Prunus persica mesocarps at different developmental stages. Another 12,475 peach EST sequences were downloaded from public databases and added to the ESTree db. An automated pipeline was prepared to process EST sequences using public software integrated by in-house developed Perl scripts and data were collected in a MySQL database. A PHP-based web interface was developed to query the database. Results The ESTree db version as of April 2005 encompasses 18,630 sequences representing eight libraries. Contig assembly was performed with CAP3. Putative single nucleotide polymorphism (SNP) detection was performed with the AutoSNP program and a search engine was implemented to retrieve results. All the sequences and all the contig consensus sequences were annotated both with blastx against the GenBank nr db and with GOblet against the viridiplantae section of the Gene Ontology db. Links to NiceZyme (Expasy) and to the KEGG metabolic pathways were provided. A local BLAST utility is available. A text search utility allows querying and browsing the database. Statistics were provided on Gene Ontology occurrences to assign sequences to Gene Ontology categories. Conclusion The resulting database is a comprehensive resource of data and links related to peach EST sequences. The Sequence Report and Contig Report pages work as the web interface core structures, giving quick access to data related to each sequence/contig. PMID:16351742

  3. ESTree db: a tool for peach functional genomics.

    PubMed

    Lazzari, Barbara; Caprera, Andrea; Vecchietti, Alberto; Stella, Alessandra; Milanesi, Luciano; Pozzi, Carlo

    2005-12-01

    The ESTree db http://www.itb.cnr.it/estree/ represents a collection of Prunus persica expressed sequence tags (ESTs) and is intended as a resource for peach functional genomics. A total of 6,155 successful EST sequences were obtained from four in-house prepared cDNA libraries from Prunus persica mesocarps at different developmental stages. Another 12,475 peach EST sequences were downloaded from public databases and added to the ESTree db. An automated pipeline was prepared to process EST sequences using public software integrated by in-house developed Perl scripts and data were collected in a MySQL database. A PHP-based web interface was developed to query the database. The ESTree db version as of April 2005 encompasses 18,630 sequences representing eight libraries. Contig assembly was performed with CAP3. Putative single nucleotide polymorphism (SNP) detection was performed with the AutoSNP program and a search engine was implemented to retrieve results. All the sequences and all the contig consensus sequences were annotated both with blastx against the GenBank nr db and with GOblet against the viridiplantae section of the Gene Ontology db. Links to NiceZyme (Expasy) and to the KEGG metabolic pathways were provided. A local BLAST utility is available. A text search utility allows querying and browsing the database. Statistics were provided on Gene Ontology occurrences to assign sequences to Gene Ontology categories. The resulting database is a comprehensive resource of data and links related to peach EST sequences. The Sequence Report and Contig Report pages work as the web interface core structures, giving quick access to data related to each sequence/contig.

  4. Distributed spatial information integration based on web service

    NASA Astrophysics Data System (ADS)

    Tong, Hengjian; Zhang, Yun; Shao, Zhenfeng

    2008-10-01

    Spatial information systems and spatial information in different geographic locations usually belong to different organizations. They are distributed and often heterogeneous and independent from each other. This leads to the fact that many isolated spatial information islands are formed, reducing the efficiency of information utilization. In order to address this issue, we present a method for effective spatial information integration based on web service. The method applies asynchronous invocation of web service and dynamic invocation of web service to implement distributed, parallel execution of web map services. All isolated information islands are connected by the dispatcher of web service and its registration database to form a uniform collaborative system. According to the web service registration database, the dispatcher of web services can dynamically invoke each web map service through an asynchronous delegating mechanism. All of the web map services can be executed at the same time. When each web map service is done, an image will be returned to the dispatcher. After all of the web services are done, all images are transparently overlaid together in the dispatcher. Thus, users can browse and analyze the integrated spatial information. Experiments demonstrate that the utilization rate of spatial information resources is significantly raised through the proposed method of distributed spatial information integration.
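
    As an illustration of the parallel invocation and transparent overlay described above, here is a minimal sketch using a thread pool and Pillow for compositing. The service URLs are placeholders, and the paper's dispatcher and registration database are not reproduced.

```python
import io
import urllib.request
from concurrent.futures import ThreadPoolExecutor

from PIL import Image  # Pillow, assumed installed, for the transparent overlay step

# Placeholder WMS-style endpoints -- in the paper these would come from the
# web-service registration database consulted by the dispatcher.
MAP_SERVICE_URLS = [
    "https://example.org/wms/roads?bbox=...&format=image/png",
    "https://example.org/wms/rivers?bbox=...&format=image/png",
    "https://example.org/wms/landuse?bbox=...&format=image/png",
]

def fetch_layer(url: str) -> Image.Image:
    """Invoke one map service and return its rendered layer as an RGBA image."""
    with urllib.request.urlopen(url) as resp:
        return Image.open(io.BytesIO(resp.read())).convert("RGBA")

def dispatch_and_overlay(urls: list[str]) -> Image.Image:
    """Call all registered map services concurrently, then composite the results.

    The layers are assumed to share the same bounding box and pixel size.
    """
    with ThreadPoolExecutor(max_workers=len(urls)) as pool:
        layers = list(pool.map(fetch_layer, urls))      # parallel execution
    base = layers[0]
    for layer in layers[1:]:
        base = Image.alpha_composite(base, layer)        # transparent overlay
    return base

if __name__ == "__main__":
    combined = dispatch_and_overlay(MAP_SERVICE_URLS)
    combined.save("integrated_map.png")
```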

  5. Distributed spatial information integration based on web service

    NASA Astrophysics Data System (ADS)

    Tong, Hengjian; Zhang, Yun; Shao, Zhenfeng

    2009-10-01

    Spatial information systems and spatial information in different geographic locations usually belong to different organizations. They are distributed and often heterogeneous and independent from each other. This leads to the fact that many isolated spatial information islands are formed, reducing the efficiency of information utilization. In order to address this issue, we present a method for effective spatial information integration based on web service. The method applies asynchronous invocation of web service and dynamic invocation of web service to implement distributed, parallel execution of web map services. All isolated information islands are connected by the dispatcher of web service and its registration database to form a uniform collaborative system. According to the web service registration database, the dispatcher of web services can dynamically invoke each web map service through an asynchronous delegating mechanism. All of the web map services can be executed at the same time. When each web map service is done, an image will be returned to the dispatcher. After all of the web services are done, all images are transparently overlaid together in the dispatcher. Thus, users can browse and analyze the integrated spatial information. Experiments demonstrate that the utilization rate of spatial information resources is significantly raised through the proposed method of distributed spatial information integration.

  6. Databases, Repositories and Other Data Resources in Structural Biology

    PubMed Central

    Zheng, Heping; Porebski, Przemyslaw J.; Grabowski, Marek; Cooper, David R.; Minor, Wladek

    2017-01-01

    Structural biology, like many other areas of modern science, produces an enormous amount of primary, derived, and “meta” data with a high demand on data storage and manipulations. Primary data comes from various steps of sample preparation, diffraction experiments, and functional studies. These data are not only used to obtain tangible results, like macromolecular structural models, but also to enrich and guide our analysis and interpretation of existing biomedical studies. Herein we define several categories of data resources, (a) Archives, (b) Repositories, (c) “Databases” and (d) Advanced Information Systems, that can accommodate primary, derived, or reference data. Data resources may be used either as web portals or internally by structural biology software. To be useful, each resource must be maintained, curated, and be integrated with other resources. Ideally, the system of interconnected resources should evolve toward comprehensive “hubs” or Advanced Information Systems. Such systems, encompassing the PDB and UniProt, are indispensable not only for structural biology, but for many related fields of science. The categories of data resources described herein are applicable well beyond our usual scientific endeavors. PMID:28573593

  7. Strategic Integration of Multiple Bioinformatics Resources for System Level Analysis of Biological Networks.

    PubMed

    D'Souza, Mark; Sulakhe, Dinanath; Wang, Sheng; Xie, Bing; Hashemifar, Somaye; Taylor, Andrew; Dubchak, Inna; Conrad Gilliam, T; Maltsev, Natalia

    2017-01-01

    Recent technological advances in genomics allow the production of biological data at unprecedented tera- and petabyte scales. Efficient mining of these vast and complex datasets for the needs of biomedical research critically depends on a seamless integration of the clinical, genomic, and experimental information with prior knowledge about genotype-phenotype relationships. Such experimental data accumulated in publicly available databases should be accessible to a variety of algorithms and analytical pipelines that drive computational analysis and data mining. We present an integrated computational platform Lynx (Sulakhe et al., Nucleic Acids Res 44:D882-D887, 2016) (http://lynx.cri.uchicago.edu), a web-based database and knowledge extraction engine. It provides advanced search capabilities and a variety of algorithms for enrichment analysis and network-based gene prioritization. It gives public access to the Lynx integrated knowledge base (LynxKB) and its analytical tools via user-friendly web services and interfaces. The Lynx service-oriented architecture supports annotation and analysis of high-throughput experimental data. Lynx tools assist the user in extracting meaningful knowledge from LynxKB and experimental data, and in the generation of weighted hypotheses regarding the genes and molecular mechanisms contributing to human phenotypes or conditions of interest. The goal of this integrated platform is to support the end-to-end analytical needs of various translational projects.

  8. [Bio-Resources and Database for Preemptive Medicine].

    PubMed

    Saito, Kuniaki

    2016-05-01

    Establishing a primary defense for the improvement of individual quality of life by epidemiology and various clinical studies applying bio-resources/database analysis is very important. Furthermore, recent studies on understanding the epigenetic regulatory mechanisms of developmental origins of health and diseases are attracting increasing interest. Therefore, the storing of not only bio-fluid (i.e., blood, urine) but also certain tissues (i.e., placenta, cord) is very important for research. The Resource Center for Health Science (RECHS) and Bio-databases Institute of Reproductive and Developmental Medicine (BIRD) have established Bio-bank and initiated a project based on the development and utilization of bio-resources/database, comprising personal health records (PHR), such as health/medical records including individual records of daily diet and exercise, physically consolidated with bio-resources, taken from the same individuals. These Bio-Resources/Database projects are very important for the establishment of preemptive medicine and understanding the mechanisms of the developmental origins of health and diseases.

  9. An Integrated Intranet and Dynamic Database Application for the Security Manager at Naval Postgraduate School

    DTIC Science & Technology

    2002-09-01

    Visual Basic for Applications (VBA) 6.0, as macros may not be supported in future versions of Access. Access 2000 offers Internet-related features for ... security features from Microsoft's SQL Server. [1] System Requirements: Access 2000 is a resource-intensive application, as are all Office 2000 ... [1] Modules: functions and procedures written in the Visual Basic for Applications (VBA) programming language. The capabilities of modules ...

  10. Semantic Web repositories for genomics data using the eXframe platform

    PubMed Central

    2014-01-01

    Background With the advent of inexpensive assay technologies, there has been an unprecedented growth in genomics data as well as the number of databases in which it is stored. In these databases, sample annotation using ontologies and controlled vocabularies is becoming more common. However, the annotation is rarely available as Linked Data, in a machine-readable format, or for standardized queries using SPARQL. This makes large-scale reuse, or integration with other knowledge bases very difficult. Methods To address this challenge, we have developed the second generation of our eXframe platform, a reusable framework for creating online repositories of genomics experiments. This second generation model now publishes Semantic Web data. To accomplish this, we created an experiment model that covers provenance, citations, external links, assays, biomaterials used in the experiment, and the data collected during the process. The elements of our model are mapped to classes and properties from various established biomedical ontologies. Resource Description Framework (RDF) data is automatically produced using these mappings and indexed in an RDF store with a built-in Sparql Protocol and RDF Query Language (SPARQL) endpoint. Conclusions Using the open-source eXframe software, institutions and laboratories can create Semantic Web repositories of their experiments, integrate it with heterogeneous resources and make it interoperable with the vast Semantic Web of biomedical knowledge. PMID:25093072
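
    A minimal sketch of querying such a SPARQL endpoint from a script, assuming the SPARQLWrapper package; the endpoint URL and the vocabulary used in the query are hypothetical, since the abstract does not specify them.

```python
from SPARQLWrapper import SPARQLWrapper, JSON  # assumed installed

# Hypothetical endpoint and vocabulary: the actual eXframe endpoint URL and the
# exact classes/properties of its experiment model are not given in the abstract.
ENDPOINT = "https://example.org/exframe/sparql"

QUERY = """
PREFIX dcterms: <http://purl.org/dc/terms/>
SELECT ?experiment ?title
WHERE {
  ?experiment dcterms:title ?title .
}
LIMIT 10
"""

sparql = SPARQLWrapper(ENDPOINT)
sparql.setQuery(QUERY)
sparql.setReturnFormat(JSON)

results = sparql.query().convert()
for row in results["results"]["bindings"]:
    print(row["experiment"]["value"], "-", row["title"]["value"])
```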

  11. e-GRASP: an integrated evolutionary and GRASP resource for exploring disease associations.

    PubMed

    Karim, Sajjad; NourEldin, Hend Fakhri; Abusamra, Heba; Salem, Nada; Alhathli, Elham; Dudley, Joel; Sanderford, Max; Scheinfeldt, Laura B; Chaudhary, Adeel G; Al-Qahtani, Mohammed H; Kumar, Sudhir

    2016-10-17

    Genome-wide association studies (GWAS) have become a mainstay of biological research concerned with discovering genetic variation linked to phenotypic traits and diseases. Both discrete and continuous traits can be analyzed in GWAS to discover associations between single nucleotide polymorphisms (SNPs) and traits of interest. Associations are typically determined by estimating the significance of the statistical relationship between genetic loci and the given trait. However, the prioritization of bona fide, reproducible genetic associations from GWAS results remains a central challenge in identifying genomic loci underlying common complex diseases. Evolutionary-aware meta-analysis of the growing GWAS literature is one way to address this challenge and to advance from association to causation in the discovery of genotype-phenotype relationships. We have created an evolutionary GWAS resource to enable in-depth query and exploration of published GWAS results. This resource uses the publicly available GWAS results annotated in the GRASP2 database. The GRASP2 database includes results from 2082 studies, 177 broad phenotype categories, and ~8.87 million SNP-phenotype associations. For each SNP in e-GRASP, we present information from the GRASP2 database for convenience as well as evolutionary information (e.g., rate and timespan). Users can, therefore, identify not only SNPs with highly significant phenotype-association P-values, but also SNPs that are highly replicated and/or occur at evolutionarily conserved sites that are likely to be functionally important. Additionally, we provide an evolutionary-adjusted SNP association ranking (E-rank) that uses cross-species evolutionary conservation scores and population allele frequencies to transform P-values in an effort to enhance the discovery of SNPs with a greater probability of biologically meaningful disease associations. By adding an evolutionary dimension to the GWAS results available in the GRASP2 database, our e-GRASP resource will enable a more effective exploration of SNPs not only by the statistical significance of trait associations, but also by the number of studies in which associations have been replicated, and the evolutionary context of the associated mutations. Therefore, e-GRASP will be a valuable resource for aiding researchers in the identification of bona fide, reproducible genetic associations from GWAS results. This resource is freely available at http://www.mypeg.info/egrasp.

  12. MSeqDR: A Centralized Knowledge Repository and Bioinformatics Web Resource to Facilitate Genomic Investigations in Mitochondrial Disease.

    PubMed

    Shen, Lishuang; Diroma, Maria Angela; Gonzalez, Michael; Navarro-Gomez, Daniel; Leipzig, Jeremy; Lott, Marie T; van Oven, Mannis; Wallace, Douglas C; Muraresku, Colleen Clarke; Zolkipli-Cunningham, Zarazuela; Chinnery, Patrick F; Attimonelli, Marcella; Zuchner, Stephan; Falk, Marni J; Gai, Xiaowu

    2016-06-01

    MSeqDR is the Mitochondrial Disease Sequence Data Resource, a centralized and comprehensive genome and phenome bioinformatics resource built by the mitochondrial disease community to facilitate clinical diagnosis and research investigations of individual patient phenotypes, genomes, genes, and variants. A central Web portal (https://mseqdr.org) integrates community knowledge from expert-curated databases with genomic and phenotype data shared by clinicians and researchers. MSeqDR also functions as a centralized application server for Web-based tools to analyze data across both mitochondrial and nuclear DNA, including investigator-driven whole exome or genome dataset analyses through MSeqDR-Genesis. MSeqDR-GBrowse genome browser supports interactive genomic data exploration and visualization with custom tracks relevant to mtDNA variation and mitochondrial disease. MSeqDR-LSDB is a locus-specific database that currently manages 178 mitochondrial diseases, 1,363 genes associated with mitochondrial biology or disease, and 3,711 pathogenic variants in those genes. MSeqDR Disease Portal allows hierarchical tree-style disease exploration to evaluate their unique descriptions, phenotypes, and causative variants. Automated genomic data submission tools are provided that capture ClinVar compliant variant annotations. PhenoTips will be used for phenotypic data submission on deidentified patients using human phenotype ontology terminology. The development of a dynamic informed patient consent process to guide data access is underway to realize the full potential of these resources. © 2016 WILEY PERIODICALS, INC.

  13. MSeqDR: A Centralized Knowledge Repository and Bioinformatics Web Resource to Facilitate Genomic Investigations in Mitochondrial Disease

    PubMed Central

    Shen, Lishuang; Diroma, Maria Angela; Gonzalez, Michael; Navarro-Gomez, Daniel; Leipzig, Jeremy; Lott, Marie T.; van Oven, Mannis; Wallace, Douglas C.; Muraresku, Colleen Clarke; Zolkipli-Cunningham, Zarazuela; Chinnery, Patrick F.; Attimonelli, Marcella; Zuchner, Stephan

    2016-01-01

    MSeqDR is the Mitochondrial Disease Sequence Data Resource, a centralized and comprehensive genome and phenome bioinformatics resource built by the mitochondrial disease community to facilitate clinical diagnosis and research investigations of individual patient phenotypes, genomes, genes, and variants. A central Web portal (https://mseqdr.org) integrates community knowledge from expert-curated databases with genomic and phenotype data shared by clinicians and researchers. MSeqDR also functions as a centralized application server for Web-based tools to analyze data across both mitochondrial and nuclear DNA, including investigator-driven whole exome or genome dataset analyses through MSeqDR-Genesis. MSeqDR-GBrowse supports interactive genomic data exploration and visualization with custom tracks relevant to mtDNA variation and disease. MSeqDR-LSDB is a locus specific database that currently manages 178 mitochondrial diseases, 1,363 genes associated with mitochondrial biology or disease, and 3,711 pathogenic variants in those genes. MSeqDR Disease Portal allows hierarchical tree-style disease exploration to evaluate their unique descriptions, phenotypes, and causative variants. Automated genomic data submission tools are provided that capture ClinVar-compliant variant annotations. PhenoTips is used for phenotypic data submission on de-identified patients using human phenotype ontology terminology. Development of a dynamic informed patient consent process to guide data access is underway to realize the full potential of these resources. PMID:26919060

  14. New tools and methods for direct programmatic access to the dbSNP relational database

    PubMed Central

    Saccone, Scott F.; Quan, Jiaxi; Mehta, Gaurang; Bolze, Raphael; Thomas, Prasanth; Deelman, Ewa; Tischfield, Jay A.; Rice, John P.

    2011-01-01

    Genome-wide association studies often incorporate information from public biological databases in order to provide a biological reference for interpreting the results. The dbSNP database is an extensive source of information on single nucleotide polymorphisms (SNPs) for many different organisms, including humans. We have developed free software that will download and install a local MySQL implementation of the dbSNP relational database for a specified organism. We have also designed a system for classifying dbSNP tables in terms of common tasks we wish to accomplish using the database. For each task we have designed a small set of custom tables that facilitate task-related queries and provide entity-relationship diagrams for each task composed from the relevant dbSNP tables. In order to expose these concepts and methods to a wider audience we have developed web tools for querying the database and browsing documentation on the tables and columns to clarify the relevant relational structure. All web tools and software are freely available to the public at http://cgsmd.isi.edu/dbsnpq. Resources such as these for programmatically querying biological databases are essential for viably integrating biological information into genetic association experiments on a genome-wide scale. PMID:21037260
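
    A minimal sketch of querying a locally installed dbSNP mirror of the kind described above, assuming MySQL Connector/Python; the database name, credentials, and the task-table and column names are illustrative placeholders rather than the actual dbSNP schema, which the project's web tools document.

```python
import mysql.connector  # MySQL Connector/Python, assumed installed

# Connect to a locally installed dbSNP mirror. Credentials, database name, and
# the table/column names in the query are illustrative placeholders.
conn = mysql.connector.connect(
    host="localhost",
    user="dbsnp_reader",
    password="********",
    database="dbsnp_human",
)

cursor = conn.cursor()
cursor.execute(
    """
    SELECT snp_id, chr, pos
    FROM snp_chr_pos              -- illustrative task-oriented table name
    WHERE chr = %s AND pos BETWEEN %s AND %s
    """,
    ("1", 1_000_000, 1_050_000),   # example region on chromosome 1
)

for snp_id, chrom, pos in cursor.fetchall():
    print(f"rs{snp_id}\tchr{chrom}:{pos}")

cursor.close()
conn.close()
```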

  15. Lessons Learned Implementing DOORS in a Citrix Environment

    NASA Technical Reports Server (NTRS)

    Bussman, Marie

    2005-01-01

    NASA's James Webb Space Telescope (JWST) Project is a large multi-national project with geographically dispersed contractors that all need access to the Project's requirements database. Initially, the project utilized multiple DOORS databases with the built-in partitions feature to exchange modules amongst the various contractor sites. As the requirements databases matured, the use of partitions became extremely difficult. There have been many issues such as incompatible versions of DOORS, an inefficient mechanism for sharing modules, security concerns, performance issues, and inconsistent document import and export formats. Deployment of the client software with limited IT resources available was also an issue. The solution chosen by JWST was to integrate the use of a Citrix environment with the DOORS database to address most of the project concerns. The use of the Citrix solution allowed a single Requirements database in a secure environment via a web interface. The Citrix environment allows JWST to upgrade to the most current version of DOORS without having to coordinate multiple sites and user upgrades. The single requirements database eliminates a multitude of Configuration Management concerns and facilitates the standardization of documentation formats. This paper discusses the obstacles and the lessons learned throughout the installation, implementation, usage and deployment process of a centralized DOORS database solution.

  16. A hybrid human and machine resource curation pipeline for the Neuroscience Information Framework.

    PubMed

    Bandrowski, A E; Cachat, J; Li, Y; Müller, H M; Sternberg, P W; Ciccarese, P; Clark, T; Marenco, L; Wang, R; Astakhov, V; Grethe, J S; Martone, M E

    2012-01-01

    The breadth of information resources available to researchers on the Internet continues to expand, particularly in light of recently implemented data-sharing policies required by funding agencies. However, the nature of dense, multifaceted neuroscience data and the design of contemporary search engine systems make efficient, reliable and relevant discovery of such information a significant challenge. This challenge is specifically pertinent for online databases, whose dynamic content is 'hidden' from search engines. The Neuroscience Information Framework (NIF; http://www.neuinfo.org) was funded by the NIH Blueprint for Neuroscience Research to address the problem of finding and utilizing neuroscience-relevant resources such as software tools, data sets, experimental animals and antibodies across the Internet. From the outset, NIF sought to provide an accounting of available resources, while developing technical solutions to finding, accessing and utilizing them. The curators, therefore, are tasked with identifying and registering resources, examining data, writing configuration files to index and display data and keeping the contents current. In the initial phases of the project, all aspects of the registration and curation processes were manual. However, as the number of resources grew, manual curation became impractical. This report describes our experiences and successes with developing automated resource discovery and semiautomated type characterization with text-mining scripts that facilitate curation team efforts to discover, integrate and display new content. We also describe the DISCO framework, a suite of automated web services that significantly reduce manual curation efforts to periodically check for resource updates. Lastly, we discuss DOMEO, a semi-automated annotation tool that improves the discovery and curation of resources that are not necessarily website-based (i.e. reagents, software tools). Although the ultimate goal of automation was to reduce the workload of the curators, it has resulted in valuable analytic by-products that address accessibility, use and citation of resources that can now be shared with resource owners and the larger scientific community. DATABASE URL: http://neuinfo.org.

  17. Design of Community Resource Inventories as a Component of Scalable Earth Science Infrastructure: Experience of the Earthcube CINERGI Project

    NASA Astrophysics Data System (ADS)

    Zaslavsky, I.; Richard, S. M.; Valentine, D. W., Jr.; Grethe, J. S.; Hsu, L.; Malik, T.; Bermudez, L. E.; Gupta, A.; Lehnert, K. A.; Whitenack, T.; Ozyurt, I. B.; Condit, C.; Calderon, R.; Musil, L.

    2014-12-01

    EarthCube is envisioned as a cyberinfrastructure that fosters new, transformational geoscience by enabling sharing, understanding and scientifically-sound and efficient re-use of formerly unconnected data resources, software, models, repositories, and computational power. Its purpose is to enable science enterprise and workforce development via an extensible and adaptable collaboration and resource integration framework. A key component of this vision is development of comprehensive inventories supporting resource discovery and re-use across geoscience domains. The goal of the EarthCube CINERGI (Community Inventory of EarthCube Resources for Geoscience Interoperability) project is to create a methodology and assemble a large inventory of high-quality information resources with standard metadata descriptions and traceable provenance. The inventory is compiled from metadata catalogs maintained by geoscience data facilities, as well as from user contributions. The latter mechanism relies on community resource viewers: online applications that support update and curation of metadata records. Once harvested into CINERGI, metadata records from domain catalogs and community resource viewers are loaded into a staging database implemented in MongoDB, and validated for compliance with the ISO 19139 metadata schema. Several types of metadata defects detected by the validation engine are automatically corrected with the help of several information extractors or flagged for manual curation. The metadata harvesting, validation and processing components generate provenance statements using W3C PROV notation, which are stored in a Neo4J database. The curated metadata, along with the provenance information, is then re-published and accessed programmatically and via a CINERGI online application. This presentation focuses on the role of resource inventories in a scalable and adaptable information infrastructure, and on the CINERGI metadata pipeline and its implementation challenges. Key project components are described at the project's website (http://workspace.earthcube.org/cinergi), which also provides access to the initial resource inventory, the inventory metadata model, metadata entry forms and a collection of the community resource viewers.
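
    A minimal sketch of emitting a W3C PROV statement for one harvested metadata record, in the spirit of the pipeline above, assuming the third-party prov package; the namespace, identifiers and activity names are invented placeholders.

```python
from prov.model import ProvDocument  # third-party "prov" package, assumed installed

# Build a small PROV document describing how one curated record was produced.
# Namespace, record identifiers, and the activity name are hypothetical.
doc = ProvDocument()
doc.add_namespace("cinergi", "http://example.org/cinergi/")

record   = doc.entity("cinergi:metadata-record-42")        # curated record
original = doc.entity("cinergi:source-catalog-record-42")  # harvested source record
harvest  = doc.activity("cinergi:harvest-and-validate")    # pipeline step

doc.wasGeneratedBy(record, harvest)     # the curated record came from this activity
doc.wasDerivedFrom(record, original)    # ...and is derived from the source record

print(doc.get_provn())                  # serialize as human-readable PROV-N
```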

  18. A hybrid human and machine resource curation pipeline for the Neuroscience Information Framework

    PubMed Central

    Bandrowski, A. E.; Cachat, J.; Li, Y.; Müller, H. M.; Sternberg, P. W.; Ciccarese, P.; Clark, T.; Marenco, L.; Wang, R.; Astakhov, V.; Grethe, J. S.; Martone, M. E.

    2012-01-01

    The breadth of information resources available to researchers on the Internet continues to expand, particularly in light of recently implemented data-sharing policies required by funding agencies. However, the nature of dense, multifaceted neuroscience data and the design of contemporary search engine systems make efficient, reliable and relevant discovery of such information a significant challenge. This challenge is specifically pertinent for online databases, whose dynamic content is ‘hidden’ from search engines. The Neuroscience Information Framework (NIF; http://www.neuinfo.org) was funded by the NIH Blueprint for Neuroscience Research to address the problem of finding and utilizing neuroscience-relevant resources such as software tools, data sets, experimental animals and antibodies across the Internet. From the outset, NIF sought to provide an accounting of available resources, while developing technical solutions to finding, accessing and utilizing them. The curators, therefore, are tasked with identifying and registering resources, examining data, writing configuration files to index and display data and keeping the contents current. In the initial phases of the project, all aspects of the registration and curation processes were manual. However, as the number of resources grew, manual curation became impractical. This report describes our experiences and successes with developing automated resource discovery and semiautomated type characterization with text-mining scripts that facilitate curation team efforts to discover, integrate and display new content. We also describe the DISCO framework, a suite of automated web services that significantly reduce manual curation efforts to periodically check for resource updates. Lastly, we discuss DOMEO, a semi-automated annotation tool that improves the discovery and curation of resources that are not necessarily website-based (i.e. reagents, software tools). Although the ultimate goal of automation was to reduce the workload of the curators, it has resulted in valuable analytic by-products that address accessibility, use and citation of resources that can now be shared with resource owners and the larger scientific community. Database URL: http://neuinfo.org PMID:22434839

  19. PlasmoGEM, a database supporting a community resource for large-scale experimental genetics in malaria parasites.

    PubMed

    Schwach, Frank; Bushell, Ellen; Gomes, Ana Rita; Anar, Burcu; Girling, Gareth; Herd, Colin; Rayner, Julian C; Billker, Oliver

    2015-01-01

    The Plasmodium Genetic Modification (PlasmoGEM) database (http://plasmogem.sanger.ac.uk) provides access to a resource of modular, versatile and adaptable vectors for genome modification of Plasmodium spp. parasites. PlasmoGEM currently consists of >2000 plasmids designed to modify the genome of Plasmodium berghei, a malaria parasite of rodents, which can be requested by non-profit research organisations free of charge. PlasmoGEM vectors are designed with long homology arms for efficient genome integration and carry gene specific barcodes to identify individual mutants. They can be used for a wide array of applications, including protein localisation, gene interaction studies and high-throughput genetic screens. The vector production pipeline is supported by a custom software suite that automates both the vector design process and quality control by full-length sequencing of the finished vectors. The PlasmoGEM web interface allows users to search a database of finished knock-out and gene tagging vectors, view details of their designs, download vector sequence in different formats and view available quality control data as well as suggested genotyping strategies. We also make gDNA library clones and intermediate vectors available for researchers to produce vectors for themselves. © The Author(s) 2014. Published by Oxford University Press on behalf of Nucleic Acids Research.

  20. The Human Phenotype Ontology project: linking molecular biology and disease through phenotype data

    PubMed Central

    Köhler, Sebastian; Doelken, Sandra C.; Mungall, Christopher J.; Bauer, Sebastian; Firth, Helen V.; Bailleul-Forestier, Isabelle; Black, Graeme C. M.; Brown, Danielle L.; Brudno, Michael; Campbell, Jennifer; FitzPatrick, David R.; Eppig, Janan T.; Jackson, Andrew P.; Freson, Kathleen; Girdea, Marta; Helbig, Ingo; Hurst, Jane A.; Jähn, Johanna; Jackson, Laird G.; Kelly, Anne M.; Ledbetter, David H.; Mansour, Sahar; Martin, Christa L.; Moss, Celia; Mumford, Andrew; Ouwehand, Willem H.; Park, Soo-Mi; Riggs, Erin Rooney; Scott, Richard H.; Sisodiya, Sanjay; Vooren, Steven Van; Wapner, Ronald J.; Wilkie, Andrew O. M.; Wright, Caroline F.; Vulto-van Silfhout, Anneke T.; de Leeuw, Nicole; de Vries, Bert B. A.; Washingthon, Nicole L.; Smith, Cynthia L.; Westerfield, Monte; Schofield, Paul; Ruef, Barbara J.; Gkoutos, Georgios V.; Haendel, Melissa; Smedley, Damian; Lewis, Suzanna E.; Robinson, Peter N.

    2014-01-01

    The Human Phenotype Ontology (HPO) project, available at http://www.human-phenotype-ontology.org, provides a structured, comprehensive and well-defined set of 10,088 classes (terms) describing human phenotypic abnormalities and 13,326 subclass relations between the HPO classes. In addition we have developed logical definitions for 46% of all HPO classes using terms from ontologies for anatomy, cell types, function, embryology, pathology and other domains. This allows interoperability with several resources, especially those containing phenotype information on model organisms such as mouse and zebrafish. Here we describe the updated HPO database, which provides annotations of 7,278 human hereditary syndromes listed in OMIM, Orphanet and DECIPHER to classes of the HPO. Various meta-attributes such as frequency, references and negations are associated with each annotation. Several large-scale projects worldwide utilize the HPO for describing phenotype information in their datasets. We have therefore generated equivalence mappings to other phenotype vocabularies such as LDDB, Orphanet, MedDRA, UMLS and phenoDB, allowing integration of existing datasets and interoperability with multiple biomedical resources. We have created various ways to access the HPO database content using flat files, a MySQL database, and Web-based tools. All data and documentation on the HPO project can be found online. PMID:24217912
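
    The HPO content is distributed as flat files, as noted above; a minimal sketch of reading classes and is_a subclass relations from the OBO flat file follows. Only the local file name is an assumption; the stanza format ([Term], id:, name:, is_a:) is standard OBO.

```python
from collections import defaultdict

def parse_hpo_obo(path: str):
    """Collect HP class names and their is_a parents from an OBO flat file."""
    names, parents = {}, defaultdict(list)
    current_id = None
    with open(path, encoding="utf-8") as fh:
        for line in fh:
            line = line.strip()
            if line == "[Term]":
                current_id = None                       # start of a new stanza
            elif line.startswith("id: HP:"):
                current_id = line.split("id: ", 1)[1]
            elif line.startswith("name: ") and current_id:
                names[current_id] = line.split("name: ", 1)[1]
            elif line.startswith("is_a: ") and current_id:
                # "is_a: HP:0000118 ! Phenotypic abnormality" -> keep the ID only
                parent = line.split("is_a: ", 1)[1].split(" !")[0]
                parents[current_id].append(parent)
    return names, parents

if __name__ == "__main__":
    names, parents = parse_hpo_obo("hp.obo")   # local file name assumed
    print(f"{len(names)} classes, {sum(map(len, parents.values()))} is_a relations")
```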

  1. Lnc2Cancer: a manually curated database of experimentally supported lncRNAs associated with various human cancers

    PubMed Central

    Ning, Shangwei; Zhang, Jizhou; Wang, Peng; Zhi, Hui; Wang, Jianjian; Liu, Yue; Gao, Yue; Guo, Maoni; Yue, Ming; Wang, Lihua; Li, Xia

    2016-01-01

    Lnc2Cancer (http://www.bio-bigdata.net/lnc2cancer) is a manually curated database of cancer-associated long non-coding RNAs (lncRNAs) with experimental support that aims to provide a high-quality and integrated resource for exploring lncRNA deregulation in various human cancers. LncRNAs represent a large category of functional RNA molecules that play a significant role in human cancers. A curated collection and summary of deregulated lncRNAs in cancer is essential to thoroughly understand the mechanisms and functions of lncRNAs. Here, we developed the Lnc2Cancer database, which contains 1057 manually curated associations between 531 lncRNAs and 86 human cancers. Each association includes lncRNA and cancer name, the lncRNA expression pattern, experimental techniques, a brief functional description, the original reference and additional annotation information. Lnc2Cancer provides a user-friendly interface to conveniently browse, retrieve and download data. Lnc2Cancer also offers a submission page for researchers to submit newly validated lncRNA-cancer associations. With the rapidly increasing interest in lncRNAs, Lnc2Cancer will significantly improve our understanding of lncRNA deregulation in cancer and has the potential to be a timely and valuable resource. PMID:26481356

  2. Reference manual for data base on Nevada water-rights permits

    USGS Publications Warehouse

    Cartier, K.D.; Bauer, E.M.; Farnham, J.L.

    1995-01-01

    The U.S. Geological Survey and Nevada Division of Water Resources have cooperatively developed and implemented a data-base system for managing water-rights permit information for the State of Nevada. The Water-Rights Permit data base is part of an integrated system of computer data bases using the Ingres Relational Data-Base Management System, which allows efficient storage and access to water information from the State Engineer's office. The data base contains a main table, three ancillary tables, and five lookup tables, as well as a menu-driven system for entering, updating, and reporting on the data. This reference guide outlines the general functions of the system and provides a brief description of data tables and data-entry screens.

  3. An improved global wind resource estimate for integrated assessment models

    DOE PAGES

    Eurek, Kelly; Sullivan, Patrick; Gleason, Michael; ...

    2017-11-25

    This study summarizes initial steps toward improving the robustness and accuracy of global renewable resource and techno-economic assessments for use in integrated assessment models. We outline a method to construct country-level wind resource supply curves, delineated by resource quality and other parameters. Using mesoscale reanalysis data, we generate estimates for wind quality, both terrestrial and offshore, across the globe. Because not all land or water area is suitable for development, appropriate database layers provide exclusions to reduce the total resource to its technical potential. We expand upon estimates from related studies by: using a globally consistent data source of uniquely detailed wind speed characterizations; assuming a non-constant coefficient of performance for adjusting power curves for altitude; categorizing the distance from resource sites to the electric power grid; and characterizing offshore exclusions on the basis of sea ice concentrations. The product, then, is technical potential by country, classified by resource quality as determined by net capacity factor. Additional classification dimensions are available, including distance to transmission networks for terrestrial wind and distance to shore and water depth for offshore. We estimate a total global wind generation potential of 560 PWh for terrestrial wind with 90% of resource classified as low-to-mid quality, and 315 PWh for offshore wind with 67% classified as mid-to-high quality. These estimates are based on 3.5 MW composite wind turbines with 90 m hub heights, 0.95 availability, 90% array efficiency, and 5 MW/km2 deployment density in non-excluded areas. We compare the underlying technical assumptions and results with other global assessments.
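
    A small worked example of how the quoted assumptions (5 MW/km2 deployment density, 0.95 availability, 90% array efficiency) translate land area and capacity factor into annual technical potential; the land area and gross capacity factor below are made-up inputs, not values from the study.

```python
# Back-of-the-envelope technical-potential calculation using the study's
# quoted turbine assumptions; the area and gross capacity factor are examples.
HOURS_PER_YEAR = 8760

def annual_generation_twh(area_km2: float,
                          gross_capacity_factor: float,
                          density_mw_per_km2: float = 5.0,
                          availability: float = 0.95,
                          array_efficiency: float = 0.90) -> float:
    """Annual technical potential (TWh) for a developable land area."""
    installed_mw = area_km2 * density_mw_per_km2
    net_cf = gross_capacity_factor * availability * array_efficiency
    return installed_mw * net_cf * HOURS_PER_YEAR / 1e6   # MWh -> TWh

# Example: 10,000 km^2 of non-excluded land with a 0.40 gross capacity factor.
print(f"{annual_generation_twh(10_000, 0.40):.1f} TWh/yr")
```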

  4. An improved global wind resource estimate for integrated assessment models

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Eurek, Kelly; Sullivan, Patrick; Gleason, Michael

    This study summarizes initial steps toward improving the robustness and accuracy of global renewable resource and techno-economic assessments for use in integrated assessment models. We outline a method to construct country-level wind resource supply curves, delineated by resource quality and other parameters. Using mesoscale reanalysis data, we generate estimates for wind quality, both terrestrial and offshore, across the globe. Because not all land or water area is suitable for development, appropriate database layers provide exclusions to reduce the total resource to its technical potential. We expand upon estimates from related studies by: using a globally consistent data source of uniquely detailed wind speed characterizations; assuming a non-constant coefficient of performance for adjusting power curves for altitude; categorizing the distance from resource sites to the electric power grid; and characterizing offshore exclusions on the basis of sea ice concentrations. The product, then, is technical potential by country, classified by resource quality as determined by net capacity factor. Additional classification dimensions are available, including distance to transmission networks for terrestrial wind and distance to shore and water depth for offshore. We estimate a total global wind generation potential of 560 PWh for terrestrial wind with 90% of resource classified as low-to-mid quality, and 315 PWh for offshore wind with 67% classified as mid-to-high quality. These estimates are based on 3.5 MW composite wind turbines with 90 m hub heights, 0.95 availability, 90% array efficiency, and 5 MW/km2 deployment density in non-excluded areas. We compare the underlying technical assumptions and results with other global assessments.

  5. Integration of Multidisciplinary Sensory Data:

    PubMed Central

    Miller, Perry L.; Nadkarni, Prakash; Singer, Michael; Marenco, Luis; Hines, Michael; Shepherd, Gordon

    2001-01-01

    The paper provides an overview of neuroinformatics research at Yale University being performed as part of the national Human Brain Project. This research is exploring the integration of multidisciplinary sensory data, using the olfactory system as a model domain. The neuroinformatics activities fall into three main areas: 1) building databases and related tools that support experimental olfactory research at Yale and can also serve as resources for the field as a whole, 2) using computer models (molecular models and neuronal models) to help understand data being collected experimentally and to help guide further laboratory experiments, 3) performing basic neuroinformatics research to develop new informatics technologies, including a flexible data model (EAV/CR, entity-attribute-value with classes and relationships) designed to facilitate the integration of diverse heterogeneous data within a single unifying framework. PMID:11141511
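
    The EAV/CR data model mentioned above generalizes entity-attribute-value storage with classes and relationships; a minimal in-memory sketch of that pattern follows. The class and attribute names are invented examples, not the Yale schema.

```python
from dataclasses import dataclass, field

@dataclass
class Entity:
    entity_id: int
    entity_class: str                                            # the "classes" in EAV/CR
    values: dict[str, object] = field(default_factory=dict)      # attribute -> value
    relations: list[tuple[str, int]] = field(default_factory=list)  # (relation, target id)

store: dict[int, Entity] = {}

def add_value(entity_id: int, entity_class: str, attribute: str, value: object) -> None:
    """Store one attribute-value pair; new attributes need no schema change."""
    ent = store.setdefault(entity_id, Entity(entity_id, entity_class))
    ent.values[attribute] = value

def relate(source_id: int, relation: str, target_id: int) -> None:
    """Record a typed relationship between two entities (the "R" in EAV/CR)."""
    store[source_id].relations.append((relation, target_id))

# Two heterogeneous records described with the same generic machinery.
add_value(1, "OlfactoryReceptor", "gene_symbol", "OR1A1")
add_value(2, "OdorStimulus", "compound", "isoamyl acetate")
relate(1, "responds_to", 2)
print(store[1])
```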

  6. Geo-Semantic Framework for Integrating Long-Tail Data and Model Resources for Advancing Earth System Science

    NASA Astrophysics Data System (ADS)

    Elag, M.; Kumar, P.

    2014-12-01

    Often, scientists and small research groups collect data that address targeted issues and have limited geographic or temporal range. A large number of such collections together constitute a large database that is of immense value to Earth Science studies. The complexity of integrating these data includes heterogeneity in dimensions, coordinate systems, scales, variables, providers, users and contexts. They have been defined as long-tail data. Similarly, we use "long-tail models" to characterize a heterogeneous collection of models and/or modules developed for targeted problems by individuals and small groups, which together provide a large valuable collection. The complexity of integrating across these models includes differing variable names and units for the same concept, model runs at different time steps and spatial resolution, use of differing naming and reference conventions, etc. The ability to "integrate long-tail models and data" will provide an opportunity for the interoperability and reusability of communities' resources, where not only can models be combined in a workflow, but each model will be able to discover and (re)use data in an application-specific context of space, time and questions. This capability is essential to represent, understand, predict, and manage heterogeneous and interconnected processes and activities by harnessing the complex, heterogeneous, and extensive set of distributed resources. Because of the staggering production rate of long-tail models and data resulting from the advances in computational, sensing, and information technologies, an important challenge arises: how can geoinformatics bring together these resources seamlessly, given the inherent complexity among model and data resources that span various domains? We will present a semantic-based framework to support integration of "long-tail" models and data. This builds on existing technologies including: (i) SEAD (Sustainable Environmental Actionable Data), which supports curation and preservation of long-tail data during its life-cycle; (ii) BrownDog, which enhances the machine interpretability of large unstructured and uncurated data; and (iii) CSDMS (Community Surface Dynamics Modeling System), which "componentizes" models by providing a plug-and-play environment for model integration.

  7. Community Engagement to Drive Best Practices and Scientific Advancement

    NASA Astrophysics Data System (ADS)

    Goring, S. J.; Williams, J. W.; Uhen, M. D.; McClennen, M.; Jenkins, J.; Peters, S. E.; Grimm, E. C.; Anderson, M.; Fils, D.; Lehnert, K.; Carter, M.

    2016-12-01

    The development of databases, data models, and tools around Earth Science data requires constant feedback from user communities. Users must be engaged in all aspects of data upload and access, curation and governance, and, particularly, in highlighting future opportunities for scientific discovery using the data resources. A challenge for data repositories, many of which have evolved organically and independently, is moving from Systems of Record - data silos with only limited input and output options - to Systems of Engagement that respond to users and interact with other user communities and data repositories across the geosciences and beyond. The Cyber4Paleo Community Development Workshop (http://cyber4paleo.github.io), held June 20-21 in Boulder, CO, was organized by the EarthCube Research Coordination Network C4P (Cyber4Paleo) to bring together disciplinary researchers and principals within data collectives in an effort to drive scientific applications of the collective data resources. C4P focuses on coordinating data and user groups within the allied paleogeoscientific disciplines. Over the course of two days, researchers developed research projects that examined standards for 210Pb dating in the published literature, a framework for implementing a common geological time scale across resources, the continued development of underlying data resources, tools to integrate climate and occupation data from paleoecological resources, and the implementation of harmonizing standards across databases. Scientific outcomes of the workshop serve to underpin our understanding of the interrelations between paleoecological data and geophysical components of the Earth System at short and long time scales. These tools enhance our ability to understand connections between and among proxies across space and time, serve as outreach tools for training and education, and, importantly, help to define and improve best practices within the databases by engaging directly with user communities to fill unanticipated needs.

  8. A comparative study of six European databases of medically oriented Web resources.

    PubMed

    Abad García, Francisca; González Teruel, Aurora; Bayo Calduch, Patricia; de Ramón Frias, Rosa; Castillo Blasco, Lourdes

    2005-10-01

    The paper describes six European medically oriented databases of Web resources, pertaining to five quality-controlled subject gateways, and compares their performance. The characteristics, coverage, procedure for selecting Web resources, record structure, searching possibilities, and existence of user assistance were described for each database. Performance indicators for each database were obtained by means of searches carried out using the key words, "myocardial infarction." Most of the databases originated in the 1990s in an academic or library context and include all types of Web resources of an international nature. Five databases use Medical Subject Headings. The number of fields per record varies between three and nineteen. The language of the search interfaces is mostly English, and some of them allow searches in other languages. In some databases, the search can be extended to Pubmed. Organizing Medical Networked Information, Catalogue et Index des Sites Médicaux Francophones, and Diseases, Disorders and Related Topics produced the best results. The usefulness of these databases as quick reference resources is clear. In addition, their lack of content overlap means that, for the user, they complement each other. Their continued survival faces three challenges: the instability of the Internet, maintenance costs, and lack of use in spite of their potential usefulness.

  9. MNDR v2.0: an updated resource of ncRNA–disease associations in mammals

    PubMed Central

    Cui, Tianyu; Zhang, Lin; Huang, Yan; Yi, Ying; Tan, Puwen; Zhao, Yue; Hu, Yongfei

    2018-01-01

    Abstract Accumulating evidence suggests that diverse non-coding RNAs (ncRNAs) are involved in the progression of a wide variety of diseases. In recent years, abundant ncRNA–disease associations have been found and predicted according to experiments and prediction algorithms. Diverse ncRNA–disease associations are scattered over many resources and mammals, whereas a global view of diverse ncRNA–disease associations is not available for any mammals. Hence, we have updated the MNDR v2.0 database (www.rna-society.org/mndr/) by integrating experimental and prediction associations from manual literature curation and other resources under one common framework. The new developments in MNDR v2.0 include (i) an over 220-fold increase in ncRNA–disease associations compared with the previous version (including lncRNA, miRNA, piRNA, snoRNA and more than 1400 diseases); (ii) integrating experimental and prediction evidence from 14 resources and prediction algorithms for each ncRNA–disease association; (iii) mapping disease names to the Disease Ontology and Medical Subject Headings (MeSH); (iv) providing a confidence score for each ncRNA–disease association and (v) an increase of species coverage to six mammals. Finally, MNDR v2.0 intends to provide the scientific community with a resource for efficient browsing and extraction of the associations between diverse ncRNAs and diseases, including >260 000 ncRNA–disease associations. PMID:29106639

  10. Ontology based heterogeneous materials database integration and semantic query

    NASA Astrophysics Data System (ADS)

    Zhao, Shuai; Qian, Quan

    2017-10-01

    Materials digital data, high-throughput experiments and high-throughput computations are regarded as three key pillars of materials genome initiatives. With the fast growth of materials data, the integration and sharing of data have become urgent needs and a growing focus of materials informatics. Due to the lack of semantic description, it is difficult to integrate data deeply at the semantic level when adopting conventional heterogeneous database integration approaches such as federated databases or data warehouses. In this paper, a semantic integration method is proposed that creates a semantic ontology by extracting the database schema semi-automatically. Other heterogeneous databases are integrated into the ontology by means of relational algebra and the rooted graph. Based on the integrated ontology, semantic queries can be performed using SPARQL. In the experiments, two well-known first-principles computation databases, OQMD and the Materials Project, are used as the integration targets, which shows the availability and effectiveness of our method.
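
    A minimal sketch of the kind of semantic query such an integrated ontology enables, assuming rdflib; the ontology file, namespace and property names are hypothetical, since the paper's vocabulary is not given here.

```python
from rdflib import Graph  # assumed installed

# Load the integrated ontology. File name and vocabulary are hypothetical;
# in the paper the ontology is built from the source database schemas.
g = Graph()
g.parse("materials_integrated.ttl", format="turtle")

QUERY = """
PREFIX mat: <http://example.org/materials#>
SELECT ?entry ?formula ?bandgap
WHERE {
  ?entry mat:formula ?formula ;
         mat:bandGap ?bandgap .
  FILTER (?bandgap > 2.0)
}
LIMIT 10
"""

# Run the semantic query across whatever sources were mapped into the ontology.
for entry, formula, bandgap in g.query(QUERY):
    print(entry, formula, bandgap)
```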

  11. BioM2MetDisease: a manually curated database for associations between microRNAs, metabolites, small molecules and metabolic diseases

    PubMed Central

    Xu, Yanjun; Yang, Haixiu; Wu, Tan; Dong, Qun; Sun, Zeguo; Shang, Desi; Li, Feng; Xu, Yingqi; Su, Fei; Liu, Siyao

    2017-01-01

    Abstract BioM2MetDisease is a manually curated database that aims to provide a comprehensive and experimentally supported resource of associations between metabolic diseases and various biomolecules. Recently, metabolic diseases such as diabetes have become one of the leading threats to people’s health. Metabolic diseases are associated with alterations of multiple types of biomolecules such as miRNAs and metabolites. An integrated, high-quality data source that collects metabolic disease-associated biomolecules is essential for exploring the underlying molecular mechanisms and discovering novel therapeutics. Here, we developed the BioM2MetDisease database, which currently documents 2681 entries of relationships between 1147 biomolecules (miRNAs, metabolites and small molecules/drugs) and 78 metabolic diseases across 14 species. Each entry includes biomolecule category, species, biomolecule name, disease name, dysregulation pattern, experimental technique, a brief description of the metabolic disease-biomolecule relationship, the reference, additional annotation information, etc. BioM2MetDisease provides a user-friendly interface to explore and retrieve all data conveniently. A submission page is also offered for researchers to submit new associations between biomolecules and metabolic diseases. BioM2MetDisease provides a comprehensive resource for studying the biomolecules that act in metabolic diseases, and it is helpful for understanding the molecular mechanisms and developing novel therapeutics for metabolic diseases. Database URL: http://www.bio-bigdata.com/BioM2MetDisease/ PMID:28605773

  12. Workflow and web application for annotating NCBI BioProject transcriptome data

    PubMed Central

    Vera Alvarez, Roberto; Medeiros Vidal, Newton; Garzón-Martínez, Gina A.; Barrero, Luz S.; Landsman, David

    2017-01-01

    Abstract The volume of transcriptome data is growing exponentially due to rapid improvement of experimental technologies. In response, large central resources such as those of the National Center for Biotechnology Information (NCBI) are continually adapting their computational infrastructure to accommodate this large influx of data. New and specialized databases, such as Transcriptome Shotgun Assembly Sequence Database (TSA) and Sequence Read Archive (SRA), have been created to aid the development and expansion of centralized repositories. Although the central resource databases are under continual development, they do not include automatic pipelines to increase annotation of newly deposited data. Therefore, third-party applications are required to achieve that aim. Here, we present an automatic workflow and web application for the annotation of transcriptome data. The workflow creates secondary data such as sequencing reads and BLAST alignments, which are available through the web application. They are based on freely available bioinformatics tools and scripts developed in-house. The interactive web application provides a search engine and several browser utilities. Graphical views of transcript alignments are available through SeqViewer, an embedded tool developed by NCBI for viewing biological sequence data. The web application is tightly integrated with other NCBI web applications and tools to extend the functionality of data processing and interconnectivity. We present a case study for the species Physalis peruviana with data generated from BioProject ID 67621. Database URL: http://www.ncbi.nlm.nih.gov/projects/physalis/ PMID:28605765

  13. NONATObase: a database for Polychaeta (Annelida) from the Southwestern Atlantic Ocean.

    PubMed

    Pagliosa, Paulo R; Doria, João G; Misturini, Dairana; Otegui, Mariana B P; Oortman, Mariana S; Weis, Wilson A; Faroni-Perez, Larisse; Alves, Alexandre P; Camargo, Maurício G; Amaral, A Cecília Z; Marques, Antonio C; Lana, Paulo C

    2014-01-01

    Networks can greatly advance data-sharing attitudes by providing organized and useful data sets on marine biodiversity in a friendly and shared scientific environment. NONATObase, the interactive database on polychaetes presented herein, will provide new macroecological and taxonomic insights into the Southwestern Atlantic region. The database was developed by the NONATO network, a team of South American researchers, who integrated available information on polychaetes from between 5°N and 80°S in the Atlantic Ocean and near the Antarctic. The guiding principle of the database is to keep free and open access to data based on partnerships. Its architecture consists of a relational database built on MySQL and a PHP framework. Its web application allows access to the data from three different directions: species (qualitative data), abundance (quantitative data) and data set (reference data). The database has built-in functionality, such as filtering data on user-defined taxonomic levels and on characteristics of the site, sample, sampler and mesh size used. Considering that there are still many taxonomic issues related to the poorly known regional fauna, a scientific committee was created to work out consistent solutions to current misidentifications and the equivocal taxonomic status of some species. Expertise from this committee will be incorporated into NONATObase continually. The use of quantitative data was made possible by standardization of the sample unit. All data, maps of distribution and references from a data set or a specified query can be visualized and exported to data formats commonly used in statistical analysis or reference manager software. The NONATO network has launched NONATObase, a valuable resource for marine ecologists and taxonomists. The database is expected to grow in functionality as it comes into use, particularly regarding the challenges of dealing with molecular genetic data and tools to assess the effects of global environmental change. Database URL: http://nonatobase.ufsc.br/.

  14. NONATObase: a database for Polychaeta (Annelida) from the Southwestern Atlantic Ocean

    PubMed Central

    Pagliosa, Paulo R.; Doria, João G.; Misturini, Dairana; Otegui, Mariana B. P.; Oortman, Mariana S.; Weis, Wilson A.; Faroni-Perez, Larisse; Alves, Alexandre P.; Camargo, Maurício G.; Amaral, A. Cecília Z.; Marques, Antonio C.; Lana, Paulo C.

    2014-01-01

    Networks can greatly advance data-sharing attitudes by providing organized and useful data sets on marine biodiversity in a friendly and shared scientific environment. NONATObase, the interactive database on polychaetes presented herein, will provide new macroecological and taxonomic insights into the Southwestern Atlantic region. The database was developed by the NONATO network, a team of South American researchers, who integrated available information on polychaetes from between 5°N and 80°S in the Atlantic Ocean and near the Antarctic. The guiding principle of the database is to keep free and open access to data based on partnerships. Its architecture consists of a relational database built on MySQL and a PHP framework. Its web application allows access to the data from three different directions: species (qualitative data), abundance (quantitative data) and data set (reference data). The database has built-in functionality, such as filtering data on user-defined taxonomic levels and on characteristics of the site, sample, sampler and mesh size used. Considering that there are still many taxonomic issues related to the poorly known regional fauna, a scientific committee was created to work out consistent solutions to current misidentifications and the equivocal taxonomic status of some species. Expertise from this committee will be incorporated into NONATObase continually. The use of quantitative data was made possible by standardization of the sample unit. All data, maps of distribution and references from a data set or a specified query can be visualized and exported to data formats commonly used in statistical analysis or reference manager software. The NONATO network has launched NONATObase, a valuable resource for marine ecologists and taxonomists. The database is expected to grow in functionality as it comes into use, particularly regarding the challenges of dealing with molecular genetic data and tools to assess the effects of global environmental change. Database URL: http://nonatobase.ufsc.br/ PMID:24573879
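
    NONATObase itself runs on MySQL and PHP; purely as an illustration of the three access directions described above (species, abundance and data set), the sketch below models them as relational tables with Python's built-in sqlite3. The table and column names are assumptions, not the project's schema.

      # Illustrative relational sketch of the species / abundance / data-set views.
      # Uses sqlite3 as a stand-in for the MySQL backend; schema names are hypothetical.
      import sqlite3

      conn = sqlite3.connect(":memory:")
      conn.executescript("""
          CREATE TABLE dataset  (id INTEGER PRIMARY KEY, reference TEXT, sampler TEXT, mesh_mm REAL);
          CREATE TABLE species  (id INTEGER PRIMARY KEY, family TEXT, name TEXT);
          CREATE TABLE abundance(dataset_id INTEGER REFERENCES dataset(id),
                                 species_id INTEGER REFERENCES species(id),
                                 site TEXT, count_per_m2 REAL);
      """)
      conn.execute("INSERT INTO dataset VALUES (1, 'Survey 2010', 'van Veen grab', 0.5)")
      conn.execute("INSERT INTO species VALUES (1, 'Nereididae', 'Laeonereis acuta')")
      conn.execute("INSERT INTO abundance VALUES (1, 1, 'Station A', 120.0)")

      # Quantitative query filtered by a user-defined taxonomic level (family) and mesh size.
      rows = conn.execute("""
          SELECT s.name, a.site, a.count_per_m2
          FROM abundance a
          JOIN species s ON s.id = a.species_id
          JOIN dataset d ON d.id = a.dataset_id
          WHERE s.family = ? AND d.mesh_mm <= ?
      """, ("Nereididae", 0.5)).fetchall()
      print(rows)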

  15. A design for the geoinformatics system

    NASA Astrophysics Data System (ADS)

    Allison, M. L.

    2002-12-01

    Informatics integrates and applies information technologies with scientific and technical disciplines. A geoinformatics system targets the spatially based sciences. The system is not a master database, but will collect pertinent information from disparate databases distributed around the world. Seamless interoperability of databases promises quantum leaps in productivity not only for scientific researchers but also for many areas of society including business and government. The system will incorporate: acquisition of analog and digital legacy data; efficient information and data retrieval mechanisms (via data mining and web services); accessibility to and application of visualization, analysis, and modeling capabilities; online workspace, software, and tutorials; GIS; integration with online scientific journal aggregates and digital libraries; access to real time data collection and dissemination; user-defined automatic notification and quality control filtering for selection of new resources; and application to field techniques such as mapping. In practical terms, such a system will provide the ability to gather data over the Web from a variety of distributed sources, regardless of computer operating systems, database formats, and servers. Search engines will gather data about any geographic location, above, on, or below ground, covering any geologic time, and at any scale or detail. A distributed network of digital geolibraries can archive permanent copies of databases at risk of being discontinued and those that continue to be maintained by the data authors. The geoinformatics system will generate results from widely distributed sources to function as a dynamic data network. Instead of posting a variety of pre-made tables, charts, or maps based on static databases, the interactive dynamic system creates these products on the fly, each time an inquiry is made, using the latest information in the appropriate databases. Thus, in the dynamic system, a map generated today may differ from one created yesterday and one to be created tomorrow, because the databases used to make it are constantly (and sometimes automatically) being updated.
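
    The abstract describes products built on the fly from distributed databases rather than from static copies; the conceptual sketch below mimics that pattern in Python with two stand-in source functions merged at query time, so repeated calls reflect whatever the sources currently hold. All source names and record fields are invented for illustration.

      # Conceptual sketch of on-the-fly aggregation from distributed sources.
      # The two "fetch_*" functions stand in for live web-service queries; in a real
      # system they would call remote APIs, so each run reflects the latest data.
      from typing import Dict, List

      def fetch_borehole_records(region: str) -> List[Dict]:
          return [{"source": "state_survey", "region": region, "depth_m": 120.0}]

      def fetch_seismic_records(region: str) -> List[Dict]:
          return [{"source": "university_archive", "region": region, "magnitude": 3.2}]

      def dynamic_query(region: str) -> List[Dict]:
          """Gather records from every registered source at the moment of the query."""
          sources = (fetch_borehole_records, fetch_seismic_records)
          merged: List[Dict] = []
          for fetch in sources:
              merged.extend(fetch(region))
          return merged

      print(dynamic_query("Rio Grande rift"))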

  16. Database resources of the National Center for Biotechnology Information

    PubMed Central

    Wheeler, David L.; Barrett, Tanya; Benson, Dennis A.; Bryant, Stephen H.; Canese, Kathi; Chetvernin, Vyacheslav; Church, Deanna M.; DiCuccio, Michael; Edgar, Ron; Federhen, Scott; Feolo, Michael; Geer, Lewis Y.; Helmberg, Wolfgang; Kapustin, Yuri; Khovayko, Oleg; Landsman, David; Lipman, David J.; Madden, Thomas L.; Maglott, Donna R.; Miller, Vadim; Ostell, James; Pruitt, Kim D.; Schuler, Gregory D.; Shumway, Martin; Sequeira, Edwin; Sherry, Steven T.; Sirotkin, Karl; Souvorov, Alexandre; Starchenko, Grigory; Tatusov, Roman L.; Tatusova, Tatiana A.; Wagner, Lukas; Yaschenko, Eugene

    2008-01-01

    In addition to maintaining the GenBank(R) nucleic acid sequence database, the National Center for Biotechnology Information (NCBI) provides analysis and retrieval resources for the data in GenBank and other biological data available through NCBI's web site. NCBI resources include Entrez, the Entrez Programming Utilities, My NCBI, PubMed, PubMed Central, Entrez Gene, the NCBI Taxonomy Browser, BLAST, BLAST Link, Electronic PCR, OrfFinder, Spidey, Splign, RefSeq, UniGene, HomoloGene, ProtEST, dbMHC, dbSNP, Cancer Chromosomes, Entrez Genome, Genome Project and related tools, the Trace, Assembly, and Short Read Archives, the Map Viewer, Model Maker, Evidence Viewer, Clusters of Orthologous Groups, Influenza Viral Resources, HIV-1/Human Protein Interaction Database, Gene Expression Omnibus, Entrez Probe, GENSAT, Database of Genotype and Phenotype, Online Mendelian Inheritance in Man, Online Mendelian Inheritance in Animals, the Molecular Modeling Database, the Conserved Domain Database, the Conserved Domain Architecture Retrieval Tool and the PubChem suite of small molecule databases. Augmenting the web applications are custom implementations of the BLAST program optimized to search specialized data sets. These resources can be accessed through the NCBI home page at www.ncbi.nlm.nih.gov. PMID:18045790

  17. Functional integration of automated system databases by means of artificial intelligence

    NASA Astrophysics Data System (ADS)

    Dubovoi, Volodymyr M.; Nikitenko, Olena D.; Kalimoldayev, Maksat; Kotyra, Andrzej; Gromaszek, Konrad; Iskakova, Aigul

    2017-08-01

    The paper presents approaches to the functional integration of automated system databases by means of artificial intelligence. The use of databases in systems that implement functions with fuzzy logic is analyzed, and requirements for the normalization of such databases are defined. The question of data equivalence under uncertainty, and of collisions arising when databases are functionally integrated, is considered, and a model for revealing their possible occurrence is devised. The paper also presents a method for evaluating the normalization of the integrated database.
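
    The paper's own models are not reproduced in the abstract; as a loose illustration of checking data equivalence under uncertainty during integration, the sketch below scores candidate record pairs with a string-similarity ratio and flags possible collisions. The threshold and field names are arbitrary assumptions.

      # Loose illustration: flagging possible collisions between records from two
      # databases by fuzzy matching of a key field. Threshold and fields are arbitrary.
      from difflib import SequenceMatcher

      def similarity(a: str, b: str) -> float:
          return SequenceMatcher(None, a.lower(), b.lower()).ratio()

      def find_possible_collisions(db_a, db_b, threshold=0.85):
          """Yield pairs whose names are similar but not identical (potential conflicts)."""
          for rec_a in db_a:
              for rec_b in db_b:
                  score = similarity(rec_a["name"], rec_b["name"])
                  if threshold <= score < 1.0:
                      yield rec_a, rec_b, round(score, 2)

      db_a = [{"name": "Pressure sensor P-101"}]
      db_b = [{"name": "Pressure sensor P101"}]
      for a, b, s in find_possible_collisions(db_a, db_b):
          print(a["name"], "<->", b["name"], "similarity:", s)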

  18. Using Commercially available Tools for multi-faceted health assessment: Data Integration Lessons Learned

    PubMed Central

    Wilamowska, Katarzyna; Le, Thai; Demiris, George; Thompson, Hilaire

    2013-01-01

    Health monitoring data collected from multiple available intake devices provide a rich resource to support older adult health and wellness. Though large amounts of data can be collected, there is currently a lack of understanding on integration of these various data sources using commercially available products. This article describes an inexpensive approach to integrating data from multiple sources from a recently completed pilot project that assessed older adult wellness, and demonstrates challenges and benefits in pursuing data integration using commercially available products. The data in this project were sourced from a) electronically captured participant intake surveys, and existing commercial software output for b) vital signs and c) cognitive function. All the software used for data integration in this project was freeware and was chosen because of its ease of comprehension by novice database users. The methods and results of this approach provide a model for researchers with similar data integration needs to easily replicate this effort at a low cost. PMID:23728444

  19. Freely Accessible Chemical Database Resources of Compounds for in Silico Drug Discovery.

    PubMed

    Yang, JingFang; Wang, Di; Jia, Chenyang; Wang, Mengyao; Hao, GeFei; Yang, GuangFu

    2018-05-07

    In silico drug discovery has proved to be a solidly established component of early drug discovery. However, the task is hampered by the limited quantity and quality of the compound databases available for screening. To overcome these obstacles, freely accessible chemical database resources have bloomed in recent years. Nevertheless, choosing appropriate tools to work with these freely accessible databases is crucial. To the best of our knowledge, this is the first systematic review of the issue. The advantages and drawbacks of chemical databases are analyzed and summarized for the six categories of freely accessible chemical databases collected from the literature in this review. Suggestions on how, and under which conditions, the use of these databases is reasonable are provided. Tools and procedures for building 3D-structure chemical libraries are also introduced. In this review, we describe the freely accessible chemical database resources available for in silico drug discovery. In particular, the chemical information available for building chemical databases is an attractive resource for drug design that can help alleviate experimental pressure.
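
    The review surveys tools for building 3D-structure libraries without prescribing code; the sketch below shows one common route using the open-source RDKit toolkit (conformer embedding and force-field optimization from a SMILES string). RDKit is chosen here as a familiar example and is an assumption, not necessarily one of the tools covered by the review.

      # Sketch: generating a 3D conformer for a screening-library entry with RDKit.
      # RDKit is used as an illustrative open-source toolkit, not as the review's choice.
      from rdkit import Chem
      from rdkit.Chem import AllChem

      def smiles_to_3d_molblock(smiles: str) -> str:
          """Parse a SMILES string, embed a 3D conformer, minimize it, return a MOL block."""
          mol = Chem.MolFromSmiles(smiles)
          if mol is None:
              raise ValueError(f"Could not parse SMILES: {smiles}")
          mol = Chem.AddHs(mol)                       # explicit hydrogens for 3D geometry
          AllChem.EmbedMolecule(mol, randomSeed=42)   # distance-geometry embedding
          AllChem.MMFFOptimizeMolecule(mol)           # MMFF94 force-field minimization
          return Chem.MolToMolBlock(mol)

      if __name__ == "__main__":
          print(smiles_to_3d_molblock("CC(=O)Oc1ccccc1C(=O)O"))  # aspirin as a test molecule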

  20. DWARF – a data warehouse system for analyzing protein families

    PubMed Central

    Fischer, Markus; Thai, Quan K; Grieb, Melanie; Pleiss, Jürgen

    2006-01-01

    Background The emerging field of integrative bioinformatics provides the tools to organize and systematically analyze vast amounts of highly diverse biological data and thus allows researchers to gain a novel understanding of complex biological systems. The data warehouse DWARF applies integrative bioinformatics approaches to the analysis of large protein families. Description The data warehouse system DWARF integrates data on sequence, structure, and functional annotation for protein fold families. The underlying relational data model consists of three major sections representing entities related to the protein (biochemical function, source organism, classification into homologous families and superfamilies), the protein sequence (position-specific annotation, mutant information), and the protein structure (secondary structure information, superimposed tertiary structure). Tools for extracting, transforming and loading data from publicly available resources (ExPDB, GenBank, DSSP) are provided to populate the database. The data can be accessed by an interface for searching and browsing, and by analysis tools that operate on annotation, sequence, or structure. We applied DWARF to the family of α/β-hydrolases to host the Lipase Engineering Database. Release 2.3 contains 6138 sequences and 167 experimentally determined protein structures, which are assigned to 37 superfamilies and 103 homologous families. Conclusion DWARF has been designed for constructing databases of large structurally related protein families and for evaluating their sequence-structure-function relationships by a systematic analysis of sequence, structure and functional annotation. It has been applied to predict biochemical properties from sequence, and serves as a valuable tool for protein engineering. PMID:17094801
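
    As a hedged illustration of the three-section data model described above (protein, sequence and structure entities), the sketch below expresses it with Python dataclasses; the attribute names and example values are assumptions, not the DWARF schema itself.

      # Illustrative (hypothetical) rendering of a three-section protein-family data model.
      from dataclasses import dataclass, field
      from typing import List

      @dataclass
      class SequenceRecord:
          accession: str
          residues: str
          annotations: List[str] = field(default_factory=list)   # position-specific notes

      @dataclass
      class StructureRecord:
          pdb_id: str
          secondary_structure: str       # e.g. a DSSP-style string

      @dataclass
      class ProteinEntry:
          name: str
          organism: str
          superfamily: str
          homologous_family: str
          sequence: SequenceRecord
          structures: List[StructureRecord] = field(default_factory=list)

      entry = ProteinEntry(
          name="Example lipase",
          organism="Example organism",
          superfamily="SF-01",            # placeholder identifier
          homologous_family="SF-01.02",   # placeholder identifier
          sequence=SequenceRecord("P00000", "MKLV"),
          structures=[StructureRecord("1ABC", "HHHEEE")],
      )
      print(entry.name, entry.superfamily)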

  1. Development and implementation of a psychotherapy tracking database in primary care.

    PubMed

    Craner, Julia R; Sawchuk, Craig N; Mack, John D; LeRoy, Michelle A

    2017-06-01

    Although there is a rapid increase in the integration of behavioral health services in primary care, few studies have evaluated the effectiveness of these services in real-world clinical settings, in part due to the difficulty of translating traditional mental health research designs to this setting. Accordingly, innovative approaches are needed to fit the unique challenges of conducting research in primary care. The development and implementation of one such approach is described in this article. A continuously populating database for psychotherapy services was implemented across 5 primary care clinics in a large health system to assess several levels of patient care, including service utilization, symptomatic outcomes, and session-by-session use of psychotherapy principles by providers. Each phase of implementation revealed challenges, including clinician time, dissemination to clinics with different resources, and fidelity of data collection strategy across providers, as well as benefits, including the generation of useful data to inform clinical care, program development, and empirical research. The feasible and sustainable implementation of data collection for routine clinical practice in primary care has the potential to fuel the evidence base around integrated care. The current project describes the development of an innovative approach that, with further empirical study and refinement, could enable health care professionals and systems to understand their population and clinical process in a way that addresses essential gaps in the integrated care literature.

  2. Integrating evidence-based teaching into clinical practice should improve outcomes.

    PubMed

    Richards, Derek

    2005-01-01

    Sources used were Medline, Embase, the Education Resources Information Centre, the Cochrane Controlled Trials Register, the Cochrane Database of Systematic Reviews, the Database of Abstracts of Reviews of Effects, the Health Technology Assessment database, Best Evidence, Best Evidence Medical Education and the Science Citation Index, along with the reference lists of known systematic reviews. Studies were chosen for inclusion if they evaluated the effects of postgraduate evidence-based medicine (EBM) or critical appraisal teaching in comparison with a control group or with a baseline before teaching, using a measure of participants' learning achievements or patients' health gains as outcomes. Articles were graded as either level 1 (randomised controlled trials; RCT) or level 2 (non-randomised studies that had either a comparison with a control group or a before-and-after comparison without a control group). Learning achievement was assessed separately for knowledge, critical appraisal skills, attitudes and behaviour. Because of obvious heterogeneity in the features of the individual studies, their quality and the assessment tools used, a meta-analysis could not be carried out. Conclusions were weighted by methodological quality. Twenty-three relevant studies were identified, comprising four RCT, seven non-RCT, and 12 before-and-after comparison studies. Eighteen studies (including two RCT) evaluated a standalone teaching method and five studies (including two RCT) evaluated a clinically integrated teaching method. Standalone teaching improved knowledge but not skills, attitudes or behaviour. Clinically integrated teaching improved knowledge, skills, attitudes and behaviour. Teaching of EBM should be moved from classrooms to clinical practice to achieve improvements in substantial outcomes.

  3. Database resources of the National Center for Biotechnology Information

    PubMed Central

    Wheeler, David L.; Church, Deanna M.; Lash, Alex E.; Leipe, Detlef D.; Madden, Thomas L.; Pontius, Joan U.; Schuler, Gregory D.; Schriml, Lynn M.; Tatusova, Tatiana A.; Wagner, Lukas; Rapp, Barbara A.

    2001-01-01

    In addition to maintaining the GenBank® nucleic acid sequence database, the National Center for Biotechnology Information (NCBI) provides data analysis and retrieval resources that operate on the data in GenBank and a variety of other biological data made available through NCBI’s Web site. NCBI data retrieval resources include Entrez, PubMed, LocusLink and the Taxonomy Browser. Data analysis resources include BLAST, Electronic PCR, OrfFinder, RefSeq, UniGene, HomoloGene, Database of Single Nucleotide Polymorphisms (dbSNP), Human Genome Sequencing, Human MapViewer, GeneMap’99, Human–Mouse Homology Map, Cancer Chromosome Aberration Project (CCAP), Entrez Genomes, Clusters of Orthologous Groups (COGs) database, Retroviral Genotyping Tools, Cancer Genome Anatomy Project (CGAP), SAGEmap, Gene Expression Omnibus (GEO), Online Mendelian Inheritance in Man (OMIM), the Molecular Modeling Database (MMDB) and the Conserved Domain Database (CDD). Augmenting many of the Web applications are custom implementations of the BLAST program optimized to search specialized data sets. All of the resources can be accessed through the NCBI home page at: http://www.ncbi.nlm.nih.gov. PMID:11125038

  4. NASA Remote Sensing Observations for Water Resource and Infrastructure Management

    NASA Astrophysics Data System (ADS)

    Granger, S. L.; Armstrong, L.; Farr, T.; Geller, G.; Heath, E.; Hyon, J.; Lavoie, S.; McDonald, K.; Realmuto, V.; Stough, T.; Szana, K.

    2008-12-01

    Decision support tools employed by water resource and infrastructure managers often utilize data products obtained from local sources or national/regional databases of historic surveys and observations. Incorporation of data from these sources can be laborious and time consuming as new products must be identified, cleaned and archived for each new study site. Adding remote sensing observations to the list of sources holds promise for a timely, consistent, global product to aid decision support at regional and global scales by providing global observations of geophysical parameters including soil moisture, precipitation, atmospheric temperature, derived evapotranspiration, and snow extent needed for hydrologic models and decision support tools. However, issues such as spatial and temporal resolution arise when attempting to integrate remote sensing observations into existing decision support tools. We are working to overcome these and other challenges through partnerships with water resource managers, tool developers and other stakeholders. We are developing a new data processing framework, enabled by a core GIS server, to seamlessly pull together observations from disparate sources for synthesis into information products and visualizations useful to the water resources community. A case study approach is being taken to develop the system by working closely with water infrastructure and resource managers to integrate remote observations into infrastructure, hydrologic and water resource decision tools. We present the results of a case study utilizing observations from the PALS aircraft instrument as a proxy for NASA's upcoming Soil Moisture Active Passive (SMAP) mission and an existing commercial decision support tool.

  5. ReMatch: a web-based tool to construct, store and share stoichiometric metabolic models with carbon maps for metabolic flux analysis.

    PubMed

    Pitkänen, Esa; Akerlund, Arto; Rantanen, Ari; Jouhten, Paula; Ukkonen, Esko

    2008-08-25

    ReMatch is a web-based, user-friendly tool that constructs stoichiometric network models for metabolic flux analysis, integrating user-developed models into a database collected from several comprehensive metabolic data resources, including KEGG, MetaCyc and ChEBI. In particular, ReMatch augments the metabolic reactions of the model with carbon mappings to facilitate (13)C metabolic flux analysis. The construction of a network model consisting of biochemical reactions is the first step in most metabolic modelling tasks. This model construction can be a tedious task, as the required information is usually scattered across many separate databases whose interoperability is suboptimal, owing to the heterogeneous naming conventions for metabolites in different databases. Another, particularly severe, data integration problem is faced in (13)C metabolic flux analysis, where mappings of carbon atoms from substrates into products are required for the model. ReMatch has been developed to solve the above data integration problems. First, ReMatch matches the imported user-developed model against the internal ReMatch database while considering a comprehensive metabolite name thesaurus. This, together with wild card support, allows the user to specify the model quickly without having to look the names up manually. Second, ReMatch is able to augment reactions of the model with carbon mappings, obtained either from the internal database or given by the user with an easy-to-use tool. The constructed models can be exported into 13C-FLUX and SBML file formats. Further, a stoichiometric matrix and visualizations of the network model can be generated. The constructed models of metabolic networks can optionally be made available to the other users of ReMatch. Thus, ReMatch provides a common repository of metabolic network models with carbon mappings for the needs of the metabolic flux analysis community. ReMatch is freely available for academic use at http://www.cs.helsinki.fi/group/sysfys/software/rematch/.
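
    ReMatch exports a stoichiometric matrix among other products; the snippet below is a generic sketch (not ReMatch code) of how such a matrix can be assembled from reaction definitions with NumPy, using made-up metabolite and reaction names.

      # Generic sketch: building a stoichiometric matrix S (metabolites x reactions)
      # from reaction definitions. Metabolite and reaction names are invented examples.
      import numpy as np

      reactions = {
          "v1": {"glucose": -1, "g6p": +1},   # glucose -> glucose-6-phosphate
          "v2": {"g6p": -1, "f6p": +1},       # glucose-6-phosphate -> fructose-6-phosphate
      }

      metabolites = sorted({m for coeffs in reactions.values() for m in coeffs})
      rxn_names = sorted(reactions)

      S = np.zeros((len(metabolites), len(rxn_names)))
      for j, rxn in enumerate(rxn_names):
          for met, coeff in reactions[rxn].items():
              S[metabolites.index(met), j] = coeff

      print(metabolites)   # row order
      print(rxn_names)     # column order
      print(S)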

  6. Advancements in web-database applications for rabies surveillance.

    PubMed

    Rees, Erin E; Gendron, Bruno; Lelièvre, Frédérick; Coté, Nathalie; Bélanger, Denise

    2011-08-02

    Protection of public health from rabies is informed by the analysis of surveillance data from human and animal populations. In Canada, public health, agricultural and wildlife agencies at the provincial and federal level are responsible for rabies disease control, and this has led to multiple agency-specific data repositories. Aggregation of agency-specific data into one database application would enable more comprehensive data analyses and effective communication among participating agencies. In Québec, RageDB was developed to house surveillance data for the raccoon rabies variant, representing the next generation of web-based database applications that provide a key resource for the protection of public health. RageDB incorporates data from, and grants access to, all agencies responsible for the surveillance of raccoon rabies in Québec. Technological advancements that RageDB brings to rabies surveillance databases include (1) automatic integration of multi-agency data and diagnostic results on a daily basis; (2) a web-based data editing interface that enables authorized users to add, edit and extract data; and (3) an interactive dashboard that helps visualize data simply and efficiently in table, chart and cartographic formats. Furthermore, RageDB stores data from citizens who voluntarily report sightings of rabies-suspect animals. We also discuss how sightings data can indicate public perception of the risk of raccoon rabies and thus aid in directing the allocation of disease control resources for protecting public health. RageDB provides an example of the evolution of spatio-temporal database applications for the storage, analysis and communication of disease surveillance data. The database was fast and inexpensive to develop by using open-source technologies, simple and efficient design strategies, and shared web hosting. The database increases communication among agencies collaborating to protect human health from raccoon rabies. Furthermore, health agencies have real-time access to a wide assortment of data documenting new developments in the raccoon rabies epidemic, and this enables a more timely and appropriate response.

  7. Advancements in web-database applications for rabies surveillance

    PubMed Central

    2011-01-01

    Background Protection of public health from rabies is informed by the analysis of surveillance data from human and animal populations. In Canada, public health, agricultural and wildlife agencies at the provincial and federal level are responsible for rabies disease control, and this has led to multiple agency-specific data repositories. Aggregation of agency-specific data into one database application would enable more comprehensive data analyses and effective communication among participating agencies. In Québec, RageDB was developed to house surveillance data for the raccoon rabies variant, representing the next generation of web-based database applications that provide a key resource for the protection of public health. Results RageDB incorporates data from, and grants access to, all agencies responsible for the surveillance of raccoon rabies in Québec. Technological advancements that RageDB brings to rabies surveillance databases include 1) automatic integration of multi-agency data and diagnostic results on a daily basis; 2) a web-based data editing interface that enables authorized users to add, edit and extract data; and 3) an interactive dashboard that helps visualize data simply and efficiently in table, chart and cartographic formats. Furthermore, RageDB stores data from citizens who voluntarily report sightings of rabies-suspect animals. We also discuss how sightings data can indicate public perception of the risk of raccoon rabies and thus aid in directing the allocation of disease control resources for protecting public health. Conclusions RageDB provides an example of the evolution of spatio-temporal database applications for the storage, analysis and communication of disease surveillance data. The database was fast and inexpensive to develop by using open-source technologies, simple and efficient design strategies, and shared web hosting. The database increases communication among agencies collaborating to protect human health from raccoon rabies. Furthermore, health agencies have real-time access to a wide assortment of data documenting new developments in the raccoon rabies epidemic, and this enables a more timely and appropriate response. PMID:21810215
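
    RageDB's own ingest code is not shown in the abstract; the sketch below illustrates the general idea of a daily job that merges agency-specific CSV exports into one table. The file names, column names and the SQLite backend are all assumed for the example.

      # Illustrative daily-integration job: merge agency CSV exports into one table.
      # File names, columns and the SQLite backend are assumptions for the sketch.
      import csv
      import sqlite3

      AGENCY_FILES = ["public_health.csv", "agriculture.csv", "wildlife.csv"]  # placeholders

      def ingest_daily(db_path: str = "surveillance.db") -> None:
          conn = sqlite3.connect(db_path)
          conn.execute("""CREATE TABLE IF NOT EXISTS submissions (
                              agency TEXT, sample_id TEXT, species TEXT,
                              collected_on TEXT, diagnostic_result TEXT)""")
          for path in AGENCY_FILES:
              try:
                  with open(path, newline="") as handle:
                      for row in csv.DictReader(handle):
                          conn.execute(
                              "INSERT INTO submissions VALUES (?, ?, ?, ?, ?)",
                              (row.get("agency"), row.get("sample_id"), row.get("species"),
                               row.get("collected_on"), row.get("diagnostic_result")))
              except FileNotFoundError:
                  continue   # an agency may have no export on a given day
          conn.commit()
          conn.close()

      if __name__ == "__main__":
          ingest_daily()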

  8. The NASA Science Internet: An integrated approach to networking

    NASA Technical Reports Server (NTRS)

    Rounds, Fred

    1991-01-01

    An integrated approach to building a networking infrastructure is an absolute necessity for meeting the multidisciplinary science networking requirements of the Office of Space Science and Applications (OSSA) science community. These networking requirements include communication connectivity between computational resources, databases, and library systems, as well as to other scientists and researchers around the world. A consolidated networking approach allows strategic use of the existing science networking within the Federal government, and it provides networking capability that takes into consideration national and international trends towards multivendor and multiprotocol service. It also offers a practical vehicle for optimizing costs and maximizing performance. Finally, and perhaps most important to the development of high-speed computing, an integrated network constitutes a focus for phasing into the National Research and Education Network (NREN). The NASA Science Internet (NSI) program, established in mid-1988, is structured to provide just such an integrated network. A description of the NSI is presented.

  9. Towards BioDBcore: a community-defined information specification for biological databases

    PubMed Central

    Gaudet, Pascale; Bairoch, Amos; Field, Dawn; Sansone, Susanna-Assunta; Taylor, Chris; Attwood, Teresa K.; Bateman, Alex; Blake, Judith A.; Bult, Carol J.; Cherry, J. Michael; Chisholm, Rex L.; Cochrane, Guy; Cook, Charles E.; Eppig, Janan T.; Galperin, Michael Y.; Gentleman, Robert; Goble, Carole A.; Gojobori, Takashi; Hancock, John M.; Howe, Douglas G.; Imanishi, Tadashi; Kelso, Janet; Landsman, David; Lewis, Suzanna E.; Mizrachi, Ilene Karsch; Orchard, Sandra; Ouellette, B. F. Francis; Ranganathan, Shoba; Richardson, Lorna; Rocca-Serra, Philippe; Schofield, Paul N.; Smedley, Damian; Southan, Christopher; Tan, Tin Wee; Tatusova, Tatiana; Whetzel, Patricia L.; White, Owen; Yamasaki, Chisato

    2011-01-01

    The present article proposes the adoption of a community-defined, uniform, generic description of the core attributes of biological databases, BioDBCore. The goals of these attributes are to provide a general overview of the database landscape, to encourage consistency and interoperability between resources and to promote the use of semantic and syntactic standards. BioDBCore will make it easier for users to evaluate the scope and relevance of available resources. This new resource will increase the collective impact of the information present in biological databases. PMID:21097465

  10. Towards BioDBcore: a community-defined information specification for biological databases

    PubMed Central

    Gaudet, Pascale; Bairoch, Amos; Field, Dawn; Sansone, Susanna-Assunta; Taylor, Chris; Attwood, Teresa K.; Bateman, Alex; Blake, Judith A.; Bult, Carol J.; Cherry, J. Michael; Chisholm, Rex L.; Cochrane, Guy; Cook, Charles E.; Eppig, Janan T.; Galperin, Michael Y.; Gentleman, Robert; Goble, Carole A.; Gojobori, Takashi; Hancock, John M.; Howe, Douglas G.; Imanishi, Tadashi; Kelso, Janet; Landsman, David; Lewis, Suzanna E.; Karsch Mizrachi, Ilene; Orchard, Sandra; Ouellette, B.F. Francis; Ranganathan, Shoba; Richardson, Lorna; Rocca-Serra, Philippe; Schofield, Paul N.; Smedley, Damian; Southan, Christopher; Tan, Tin W.; Tatusova, Tatiana; Whetzel, Patricia L.; White, Owen; Yamasaki, Chisato

    2011-01-01

    The present article proposes the adoption of a community-defined, uniform, generic description of the core attributes of biological databases, BioDBCore. The goals of these attributes are to provide a general overview of the database landscape, to encourage consistency and interoperability between resources; and to promote the use of semantic and syntactic standards. BioDBCore will make it easier for users to evaluate the scope and relevance of available resources. This new resource will increase the collective impact of the information present in biological databases. PMID:21205783
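
    The BioDBCore attributes themselves are defined by the community effort described above rather than listed here; the snippet below is only a schematic example of recording core descriptive attributes of a database as structured, machine-readable metadata, with the particular field names assumed for illustration.

      # Schematic example of machine-readable core attributes for a biological database.
      # The exact BioDBCore attribute list is defined by the community; these field
      # names and values are illustrative stand-ins.
      import json

      database_description = {
          "name": "ExampleDB",
          "url": "http://example.org/exampledb",           # placeholder URL
          "contact": "curators@example.org",
          "data_types": ["protein sequences", "functional annotation"],
          "taxonomic_coverage": ["Homo sapiens", "Mus musculus"],
          "standards_used": ["OBO ontologies", "FASTA"],
          "update_frequency": "quarterly",
          "license": "CC BY 4.0",
      }

      print(json.dumps(database_description, indent=2))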

  11. Network portal: a database for storage, analysis and visualization of biological networks

    PubMed Central

    Turkarslan, Serdar; Wurtmann, Elisabeth J.; Wu, Wei-Ju; Jiang, Ning; Bare, J. Christopher; Foley, Karen; Reiss, David J.; Novichkov, Pavel; Baliga, Nitin S.

    2014-01-01

    The ease of generating high-throughput data has enabled investigations into organismal complexity at the systems level through the inference of networks of interactions among the various cellular components (genes, RNAs, proteins and metabolites). The wider scientific community, however, currently has limited access to tools for network inference, visualization and analysis because these tasks often require advanced computational knowledge and expensive computing resources. We have designed the network portal (http://networks.systemsbiology.net) to serve as a modular database for the integration of user uploaded and public data, with inference algorithms and tools for the storage, visualization and analysis of biological networks. The portal is fully integrated into the Gaggle framework to seamlessly exchange data with desktop and web applications and to allow the user to create, save and modify workspaces, and it includes social networking capabilities for collaborative projects. While the current release of the database contains networks for 13 prokaryotic organisms from diverse phylogenetic clades (4678 co-regulated gene modules, 3466 regulators and 9291 cis-regulatory motifs), it will be rapidly populated with prokaryotic and eukaryotic organisms as relevant data become available in public repositories and through user input. The modular architecture, simple data formats and open API support community development of the portal. PMID:24271392

  12. Human Disease Insight: An integrated knowledge-based platform for disease-gene-drug information.

    PubMed

    Tasleem, Munazzah; Ishrat, Romana; Islam, Asimul; Ahmad, Faizan; Hassan, Md Imtaiyaz

    2016-01-01

    The scope of the Human Disease Insight (HDI) database is not limited to researchers or physicians as it also provides basic information to non-professionals and creates disease awareness, thereby reducing the chances of patient suffering due to ignorance. HDI is a knowledge-based resource providing information on human diseases to both scientists and the general public. Here, our mission is to provide a comprehensive human disease database containing most of the available useful information, with extensive cross-referencing. HDI is a knowledge management system that acts as a central hub to access information about human diseases and associated drugs and genes. In addition, HDI contains well-classified bioinformatics tools with helpful descriptions. These integrated bioinformatics tools enable researchers to annotate disease-specific genes and perform protein analysis, search for biomarkers and identify potential vaccine candidates. Eventually, these tools will facilitate the analysis of disease-associated data. The HDI provides two types of search capabilities and includes provisions for downloading, uploading and searching disease/gene/drug-related information. The logistical design of the HDI allows for regular updating. The database is designed to work best with Mozilla Firefox and Google Chrome and is freely accessible at http://humandiseaseinsight.com.

  13. PGSB PlantsDB: updates to the database framework for comparative plant genome research.

    PubMed

    Spannagl, Manuel; Nussbaumer, Thomas; Bader, Kai C; Martis, Mihaela M; Seidel, Michael; Kugler, Karl G; Gundlach, Heidrun; Mayer, Klaus F X

    2016-01-04

    PGSB (Plant Genome and Systems Biology: formerly MIPS) PlantsDB (http://pgsb.helmholtz-muenchen.de/plant/index.jsp) is a database framework for the comparative analysis and visualization of plant genome data. The resource has been updated with new data sets and types as well as specialized tools and interfaces to address user demands for intuitive access to complex plant genome data. In its latest incarnation, we have re-worked both the layout and navigation structure and implemented new keyword search options and a new BLAST sequence search functionality. Actively involved in corresponding sequencing consortia, PlantsDB has dedicated special efforts to the integration and visualization of complex Triticeae genome data, especially for barley, wheat and rye. We enhanced CrowsNest, a tool to visualize syntenic relationships between genomes, with data from the wheat sub-genome progenitor Aegilops tauschii and added functionality to the PGSB RNASeqExpressionBrowser. GenomeZipper results were integrated for the genomes of barley, rye, wheat and perennial ryegrass, and interactive access is granted through PlantsDB interfaces. Data exchange and cross-linking between PlantsDB and other plant genome databases is stimulated by the transPLANT project (http://transplantdb.eu/).

  14. PeptideDepot: Flexible Relational Database for Visual Analysis of Quantitative Proteomic Data and Integration of Existing Protein Information

    PubMed Central

    Yu, Kebing; Salomon, Arthur R.

    2010-01-01

    Recently, dramatic progress has been achieved in expanding the sensitivity, resolution, mass accuracy, and scan rate of mass spectrometers able to fragment and identify peptides through tandem mass spectrometry (MS/MS). Unfortunately, this enhanced ability to acquire proteomic data has not been accompanied by a concomitant increase in the availability of flexible tools allowing users to rapidly assimilate, explore, and analyze this data and adapt to a variety of experimental workflows with minimal user intervention. Here we fill this critical gap by providing a flexible relational database called PeptideDepot for organization of expansive proteomic data sets, collation of proteomic data with available protein information resources, and visual comparison of multiple quantitative proteomic experiments. Our software design, built upon the synergistic combination of a MySQL database for safe warehousing of proteomic data with a FileMaker-driven graphical user interface for flexible adaptation to diverse workflows, enables proteomic end-users to directly tailor the presentation of proteomic data to the unique analysis requirements of the individual proteomics lab. PeptideDepot may be deployed as an independent software tool or integrated directly with our High Throughput Autonomous Proteomic Pipeline (HTAPP) used in the automated acquisition and post-acquisition analysis of proteomic data. PMID:19834895

  15. Making your database available through Wikipedia: the pros and cons.

    PubMed

    Finn, Robert D; Gardner, Paul P; Bateman, Alex

    2012-01-01

    Wikipedia, the online encyclopedia, is the most famous wiki in use today. It contains over 3.7 million pages of content, many of them on scientific subjects, that include peer-reviewed citations yet are written in an accessible manner and generally reflect the consensus opinion of the community. In this, the 19th Annual Database Issue of Nucleic Acids Research, there are 11 articles that describe the use of a wiki in relation to a biological database. In this commentary, we discuss how biological databases can be integrated with Wikipedia, thereby utilising the pre-existing infrastructure, tools and, above all, the large community of authors (or Wikipedians). The limitations on the content that can be included in Wikipedia are highlighted, with examples drawn from articles found in this issue and other wiki-based resources, indicating why other wiki solutions are necessary. We discuss the merits of using open wikis, like Wikipedia, versus other models, with particular reference to potential vandalism. Finally, we raise the question of the future role of dedicated database biocurators in the context of the thousands of crowdsourced community annotations that are now being stored in wikis.

  16. Making your database available through Wikipedia: the pros and cons

    PubMed Central

    Finn, Robert D.; Gardner, Paul P.; Bateman, Alex

    2012-01-01

    Wikipedia, the online encyclopedia, is the most famous wiki in use today. It contains over 3.7 million pages of content, many of them on scientific subjects, that include peer-reviewed citations yet are written in an accessible manner and generally reflect the consensus opinion of the community. In this, the 19th Annual Database Issue of Nucleic Acids Research, there are 11 articles that describe the use of a wiki in relation to a biological database. In this commentary, we discuss how biological databases can be integrated with Wikipedia, thereby utilising the pre-existing infrastructure, tools and, above all, the large community of authors (or Wikipedians). The limitations on the content that can be included in Wikipedia are highlighted, with examples drawn from articles found in this issue and other wiki-based resources, indicating why other wiki solutions are necessary. We discuss the merits of using open wikis, like Wikipedia, versus other models, with particular reference to potential vandalism. Finally, we raise the question of the future role of dedicated database biocurators in the context of the thousands of crowdsourced community annotations that are now being stored in wikis. PMID:22144683

  17. The 3rd DBCLS BioHackathon: improving life science data integration with Semantic Web technologies.

    PubMed

    Katayama, Toshiaki; Wilkinson, Mark D; Micklem, Gos; Kawashima, Shuichi; Yamaguchi, Atsuko; Nakao, Mitsuteru; Yamamoto, Yasunori; Okamoto, Shinobu; Oouchida, Kenta; Chun, Hong-Woo; Aerts, Jan; Afzal, Hammad; Antezana, Erick; Arakawa, Kazuharu; Aranda, Bruno; Belleau, Francois; Bolleman, Jerven; Bonnal, Raoul Jp; Chapman, Brad; Cock, Peter Ja; Eriksson, Tore; Gordon, Paul Mk; Goto, Naohisa; Hayashi, Kazuhiro; Horn, Heiko; Ishiwata, Ryosuke; Kaminuma, Eli; Kasprzyk, Arek; Kawaji, Hideya; Kido, Nobuhiro; Kim, Young Joo; Kinjo, Akira R; Konishi, Fumikazu; Kwon, Kyung-Hoon; Labarga, Alberto; Lamprecht, Anna-Lena; Lin, Yu; Lindenbaum, Pierre; McCarthy, Luke; Morita, Hideyuki; Murakami, Katsuhiko; Nagao, Koji; Nishida, Kozo; Nishimura, Kunihiro; Nishizawa, Tatsuya; Ogishima, Soichi; Ono, Keiichiro; Oshita, Kazuki; Park, Keun-Joon; Prins, Pjotr; Saito, Taro L; Samwald, Matthias; Satagopam, Venkata P; Shigemoto, Yasumasa; Smith, Richard; Splendiani, Andrea; Sugawara, Hideaki; Taylor, James; Vos, Rutger A; Withers, David; Yamasaki, Chisato; Zmasek, Christian M; Kawamoto, Shoko; Okubo, Kosaku; Asai, Kiyoshi; Takagi, Toshihisa

    2013-02-11

    BioHackathon 2010 was the third in a series of meetings hosted by the Database Center for Life Sciences (DBCLS) in Tokyo, Japan. The overall goal of the BioHackathon series is to improve the quality and accessibility of life science research data on the Web by bringing together representatives from public databases, analytical tool providers, and cyber-infrastructure researchers to jointly tackle important challenges in the area of in silico biological research. The theme of BioHackathon 2010 was the 'Semantic Web', and all attendees gathered with the shared goal of producing Semantic Web data from their respective resources, and/or consuming or interacting with those data using their tools and interfaces. We discussed topics including guidelines for designing semantic data and the interoperability of resources, and we developed tools and clients for analysis and visualization. We provide a meeting report from BioHackathon 2010, in which we describe the discussions, decisions, and breakthroughs made as we moved towards compliance with Semantic Web technologies - from source provider, through middleware, to the end-consumer.

  18. The 3rd DBCLS BioHackathon: improving life science data integration with Semantic Web technologies

    PubMed Central

    2013-01-01

    Background BioHackathon 2010 was the third in a series of meetings hosted by the Database Center for Life Sciences (DBCLS) in Tokyo, Japan. The overall goal of the BioHackathon series is to improve the quality and accessibility of life science research data on the Web by bringing together representatives from public databases, analytical tool providers, and cyber-infrastructure researchers to jointly tackle important challenges in the area of in silico biological research. Results The theme of BioHackathon 2010 was the 'Semantic Web', and all attendees gathered with the shared goal of producing Semantic Web data from their respective resources, and/or consuming or interacting with those data using their tools and interfaces. We discussed topics including guidelines for designing semantic data and the interoperability of resources, and we developed tools and clients for analysis and visualization. Conclusion We provide a meeting report from BioHackathon 2010, in which we describe the discussions, decisions, and breakthroughs made as we moved towards compliance with Semantic Web technologies - from source provider, through middleware, to the end-consumer. PMID:23398680
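
    No code accompanies the meeting report here; as a small, generic illustration of producing Semantic Web data from a resource, the sketch below uses the rdflib library to emit a few triples about a database record in Turtle. The namespace and predicate names are invented for the example.

      # Minimal sketch: exposing a database record as RDF triples in Turtle with rdflib.
      # The namespace and predicates are invented for illustration only.
      from rdflib import Graph, Literal, Namespace, URIRef
      from rdflib.namespace import RDF, RDFS

      EX = Namespace("http://example.org/lifescidb/")   # placeholder namespace

      g = Graph()
      g.bind("ex", EX)

      record = URIRef(EX["gene/0001"])
      g.add((record, RDF.type, EX.GeneRecord))
      g.add((record, RDFS.label, Literal("example gene record")))
      g.add((record, EX.organism, Literal("Homo sapiens")))

      print(g.serialize(format="turtle"))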

  19. Durand Neighbourhood Heritage Inventory: Toward a Digital Citywide Survey Approach to Heritage Planning in Hamilton

    NASA Astrophysics Data System (ADS)

    Angel, V.; Garvey, A.; Sydor, M.

    2017-08-01

    In the face of changing economies and patterns of development, the definition of heritage is diversifying, and the role of inventories in local heritage planning is coming to the fore. The Durand neighbourhood is a layered and complex area located in inner-city Hamilton, Ontario, Canada, and the second subject area in a set of pilot inventory studies to develop a new city-wide inventory strategy for the City of Hamilton. This paper presents an innovative digital workflow developed to undertake the Durand Built Heritage Inventory project. An online database was developed to be at the centre of all processes, including digital documentation, record management, analysis and variable outputs. Digital tools were employed for survey work in the field and analytical work in the office, resulting in a GIS-based dataset that can be integrated into Hamilton's larger municipal planning system. Together with digital mapping and digitized historical resources, the Durand database has been leveraged to produce both digital and static outputs to shape recommendations for the protection of Hamilton's heritage resources.

  20. UTRdb and UTRsite: a collection of sequences and regulatory motifs of the untranslated regions of eukaryotic mRNAs

    PubMed Central

    Mignone, Flavio; Grillo, Giorgio; Licciulli, Flavio; Iacono, Michele; Liuni, Sabino; Kersey, Paul J.; Duarte, Jorge; Saccone, Cecilia; Pesole, Graziano

    2005-01-01

    The 5′ and 3′ untranslated regions of eukaryotic mRNAs play crucial roles in the post-transcriptional regulation of gene expression through the modulation of nucleo-cytoplasmic mRNA transport, translation efficiency, subcellular localization and message stability. UTRdb is a curated database of 5′ and 3′ untranslated sequences of eukaryotic mRNAs, derived from several sources of primary data. Experimentally validated functional motifs are annotated (and also collated as the UTRsite database) and cross-links to genomic and protein data are provided. The integration of UTRdb with genomic and protein data has allowed the implementation of a powerful retrieval resource for the selection and extraction of UTR subsets based on their genomic coordinates and/or features of the protein encoded by the relevant mRNA (e.g. GO term, PFAM domain, etc.). All internet resources implemented for retrieval and functional analysis of 5′ and 3′ untranslated regions of eukaryotic mRNAs are accessible at http://www.ba.itb.cnr.it/UTR/. PMID:15608165
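
    The retrieval system described above selects UTR subsets by genomic coordinates or by features of the encoded protein; the Python sketch below imitates that kind of filter over a small in-memory list, with all record fields and values invented for illustration.

      # Illustrative filter over UTR-like records by genomic coordinates and GO term.
      # All fields and values are invented; this is not UTRdb code or data.
      utr_records = [
          {"id": "5UTR_0001", "chrom": "chr1", "start": 10500, "end": 10720,
           "side": "5prime", "go_terms": ["GO:0006412"]},
          {"id": "3UTR_0002", "chrom": "chr2", "start": 45010, "end": 45900,
           "side": "3prime", "go_terms": ["GO:0008380"]},
      ]

      def select_utrs(records, chrom=None, start=None, end=None, go_term=None):
          """Return records overlapping the given region and/or annotated with the GO term."""
          hits = []
          for rec in records:
              if chrom and rec["chrom"] != chrom:
                  continue
              if start is not None and end is not None:
                  if rec["end"] < start or rec["start"] > end:
                      continue
              if go_term and go_term not in rec["go_terms"]:
                  continue
              hits.append(rec)
          return hits

      print(select_utrs(utr_records, chrom="chr1", start=10000, end=11000))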
