Sample records for source database developer

  1. Mobile Source Observation Database (MSOD)

    EPA Pesticide Factsheets

    The Mobile Source Observation Database (MSOD) is a relational database developed by the Assessment and Standards Division (ASD) of the U.S. EPA Office of Transportation and Air Quality (formerly the Office of Mobile Sources).

  2. Freshwater Biological Traits Database (Data Sources)

    EPA Science Inventory

    When EPA released the final report, Freshwater Biological Traits Database, it referenced numerous data sources that are included below. The Traits Database report covers the development of a database of freshwater biological traits with additional traits that are relevan...

  3. Development Of New Databases For Tsunami Hazard Analysis In California

    NASA Astrophysics Data System (ADS)

    Wilson, R. I.; Barberopoulou, A.; Borrero, J. C.; Bryant, W. A.; Dengler, L. A.; Goltz, J. D.; Legg, M.; McGuire, T.; Miller, K. M.; Real, C. R.; Synolakis, C.; Uslu, B.

    2009-12-01

    The California Geological Survey (CGS) has partnered with other tsunami specialists to produce two statewide databases to facilitate the evaluation of tsunami hazard products for both emergency response and land-use planning and development. A robust, State-run tsunami deposit database is being developed that complements and expands on existing databases from the National Geophysical Data Center (global) and the USGS (Cascadia). Whereas these existing databases focus on references or individual tsunami layers, the new State-maintained database concentrates on the location and contents of individual borings/trenches that sample tsunami deposits. These data provide an important observational benchmark for evaluating the results of tsunami inundation modeling. CGS is collaborating with and sharing the database entry form with other states to encourage its continued development beyond California's coastline so that historic tsunami deposits can be evaluated on a regional basis. CGS is also developing an internet-based tsunami source scenario database and forum where tsunami source experts and hydrodynamic modelers can discuss the validity of tsunami sources and their contribution to hazard assessments for California and other coastal areas bordering the Pacific Ocean. The database includes all distant and local tsunami sources relevant to California, starting with the forty scenarios evaluated during the creation of the recently completed statewide series of tsunami inundation maps for emergency response planning. Factors germane to probabilistic tsunami hazard analyses (PTHA), such as event histories and recurrence intervals, are also addressed in the database and discussed in the forum. Discussions with other tsunami source experts will help CGS determine what additional scenarios should be considered in PTHA for assessing the feasibility of generating products of value to local land-use planning and development.

  4. Mobile Source Observation Database (MSOD)

    EPA Pesticide Factsheets

    The Mobile Source Observation Database (MSOD) is a relational database being developed by the Assessment and Standards Division (ASD) of the US Environmental Protection Agency Office of Transportation and Air Quality (formerly the Office of Mobile Sources). The MSOD contains emission test data from in-use mobile air-pollution sources such as cars, trucks, and engines from trucks and nonroad vehicles. Data in the database were collected from 1982 to the present. The data are intended to be representative of in-use vehicle emissions in the United States.

  5. The STEP (Safety and Toxicity of Excipients for Paediatrics) database: part 2 - the pilot version.

    PubMed

    Salunke, Smita; Brandys, Barbara; Giacoia, George; Tuleu, Catherine

    2013-11-30

    The screening and careful selection of excipients is a critical step in paediatric formulation development, as certain excipients acceptable in adult formulations may not be appropriate for paediatric use. While there is extensive toxicity data that could help in better understanding and highlighting the gaps in toxicity studies, the data are often scattered across information sources and saddled with incompatible data types and formats. This paper is the second in a series that presents an update on the Safety and Toxicity of Excipients for Paediatrics ("STEP") database being developed by the EU-US PFIs, and describes the architecture, data fields, and functions of the database. The STEP database is a user-designed resource that compiles the safety and toxicity data of excipients that are scattered over various sources and presents them in one freely accessible source. Currently, the pilot database holds data from over 2000 references covering 10 excipients, presenting preclinical, clinical, and regulatory information and toxicological reviews, with references and source links. The STEP database allows searching "FOR" excipients and "BY" excipients. This dual nature of the STEP database, in which toxicity and safety information can be searched in both directions, makes it unique among existing sources. If the pilot is successful, the aim is to increase the number of excipients in the existing database so that a database large enough to be of practical research use will be available. It is anticipated that this source will prove to be a useful platform for data management and data exchange of excipient safety information. Copyright © 2013 Elsevier B.V. All rights reserved.

  6. Rapid development of entity-based data models for bioinformatics with persistence object-oriented design and structured interfaces.

    PubMed

    Ezra Tsur, Elishai

    2017-01-01

    Databases are imperative for research in bioinformatics and computational biology. Current challenges in database design include data heterogeneity and context-dependent interconnections between data entities. These challenges drove the development of unified data interfaces and specialized databases. The curation of specialized databases is an ever-growing challenge due to the introduction of new data sources and the emergence of new relational connections between established datasets. Here, an open-source framework for the curation of specialized databases is proposed. The framework supports user-designed models of data encapsulation, object persistency and structured interfaces to local and external data sources such as MalaCards, Biomodels and the National Centre for Biotechnology Information (NCBI) databases. The proposed framework was implemented using Java as the development environment, EclipseLink as the data persistency agent and Apache Derby as the database manager. Syntactic analysis was based on J3D, jsoup, Apache Commons and w3c.dom open libraries. Finally, the construction of a specialized database for aneurysm-associated vascular diseases is demonstrated. This database contains 3-dimensional geometries of aneurysms, patients' clinical information, articles, biological models, related diseases and our recently published model of aneurysms' risk of rupture. The framework is available at: http://nbel-lab.com.
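
    The framework itself is Java-based (EclipseLink/Derby); as a rough, language-agnostic illustration of the entity-plus-persistence idea it describes, the sketch below maps a hypothetical user-defined entity to a local relational store in Python. All names and values are invented.

    ```python
    # Illustration only (not the authors' Java/EclipseLink code): a user-defined
    # entity class is mapped to a local relational store and persisted.
    import sqlite3
    from dataclasses import dataclass

    @dataclass
    class Aneurysm:                 # hypothetical entity from the demonstrated database
        patient_id: str
        volume_mm3: float
        related_disease: str

    def save(conn, entity):
        # Persist one entity instance as a row in its backing table.
        conn.execute(
            "INSERT INTO aneurysm (patient_id, volume_mm3, related_disease) VALUES (?, ?, ?)",
            (entity.patient_id, entity.volume_mm3, entity.related_disease),
        )

    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE aneurysm (patient_id TEXT, volume_mm3 REAL, related_disease TEXT)")
    save(conn, Aneurysm("P001", 812.5, "polycystic kidney disease"))
    print(conn.execute("SELECT * FROM aneurysm").fetchall())
    ```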

  7. An Update on Electronic Information Sources.

    ERIC Educational Resources Information Center

    Ackerman, Katherine

    1987-01-01

    This review of new developments and products in online services discusses trends in travel-related services; full-text databases; statistical source databases; an emphasis on regional and international business news; and user-friendly systems. (Author/CLB)

  8. Development of a data entry auditing protocol and quality assurance for a tissue bank database.

    PubMed

    Khushi, Matloob; Carpenter, Jane E; Balleine, Rosemary L; Clarke, Christine L

    2012-03-01

    Human transcription error is an acknowledged risk when extracting information from paper records for entry into a database. For a tissue bank, it is critical that accurate data are provided to researchers with approved access to tissue bank material. The challenges of tissue bank data collection include manual extraction of data from complex medical reports that are accessed from a number of sources and that differ in style and layout. As a quality assurance measure, the Breast Cancer Tissue Bank (http://www.abctb.org.au) has implemented an auditing protocol and, in order to efficiently execute the process, has developed an open source database plug-in tool (eAuditor) to assist in auditing of data held in our tissue bank database. Using eAuditor, we have identified that human entry errors range from 0.01% when entering donors' clinical follow-up details, to 0.53% when entering pathological details, highlighting the importance of an audit protocol tool such as eAuditor in a tissue bank database. eAuditor was developed and tested on the Caisis open source clinical-research database; however, it can be integrated into other databases where similar functionality is required.
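
    As a rough illustration of the kind of metric eAuditor reports (not its implementation), an audit error rate is simply erroneous fields divided by audited fields; the counts below are invented but chosen to reproduce the rates quoted above.

    ```python
    # Invented audit counts; rate per category = errors / fields checked.
    audited = {
        "clinical_follow_up": {"fields_checked": 120_000, "errors": 12},
        "pathology":          {"fields_checked": 45_000,  "errors": 239},
    }

    for category, counts in audited.items():
        rate = 100.0 * counts["errors"] / counts["fields_checked"]
        print(f"{category}: {rate:.2f}% data entry error rate")
    ```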

  9. DEVELOPMENT AND APPLICATION OF THE DORIAN (DOSE-RESPONSE INFORMATION ANALYSIS) SYSTEM

    EPA Science Inventory

    • Migration of ArrayTrack from the proprietary Oracle database to an open source Postgres database.
    • Making the public version of the ebKB available with provisions for soliciting input from collaborators and outside users.
    • Continued development ...

    • The Longhorn Array Database (LAD): An Open-Source, MIAME compliant implementation of the Stanford Microarray Database (SMD)

      PubMed Central

      Killion, Patrick J; Sherlock, Gavin; Iyer, Vishwanath R

      2003-01-01

      Background The power of microarray analysis can be realized only if data is systematically archived and linked to biological annotations as well as analysis algorithms. Description The Longhorn Array Database (LAD) is a MIAME compliant microarray database that operates on PostgreSQL and Linux. It is a fully open source version of the Stanford Microarray Database (SMD), one of the largest microarray databases. LAD is available at Conclusions Our development of LAD provides a simple, free, open, reliable and proven solution for storage and analysis of two-color microarray data. PMID:12930545

    • Resources | Division of Cancer Prevention

      Cancer.gov

      Manual of Operations Version 3, 12/13/2012 (PDF, 162KB) Database Sources Consortium for Functional Glycomics databases Design Studies Related to the Development of Distributed, Web-based European Carbohydrate Databases (EUROCarbDB)

    • Oak Ridge Reservation Environmental Protection Rad Neshaps Radionuclide Inventory Web Database and Rad Neshaps Source and Dose Database

      DOE PAGES

      Scofield, Patricia A.; Smith, Linda Lenell; Johnson, David N.

      2017-07-01

      The U.S. Environmental Protection Agency promulgated national emission standards for emissions of radionuclides other than radon from US Department of Energy facilities in Chapter 40 of the Code of Federal Regulations (CFR) 61, Subpart H. This regulatory standard limits the annual effective dose that any member of the public can receive from Department of Energy facilities to 0.1 mSv. As defined in the preamble of the final rule, all of the facilities on the Oak Ridge Reservation, i.e., the Y-12 National Security Complex, Oak Ridge National Laboratory, East Tennessee Technology Park, and any other U.S. Department of Energy operations on Oak Ridge Reservation, combined, must meet the annual dose limit of 0.1 mSv. At Oak Ridge National Laboratory, there are monitored sources and numerous unmonitored sources. To maintain radiological source and inventory information for these unmonitored sources, e.g., laboratory hoods, equipment exhausts, and room exhausts not currently venting to monitored stacks on the Oak Ridge National Laboratory campus, the Environmental Protection Rad NESHAPs Inventory Web Database was developed. This database is updated annually and is used to compile emissions data for the annual Radionuclide National Emission Standards for Hazardous Air Pollutants (Rad NESHAPs) report required by 40 CFR 61.94. It also provides supporting documentation for facility compliance audits. In addition, a Rad NESHAPs source and dose database was developed to import the source and dose summary data from Clean Air Act Assessment Package-1988 computer model files. As a result, this database provides Oak Ridge Reservation and facility-specific source inventory; doses associated with each source and facility; and total doses for the Oak Ridge Reservation dose.

    • Oak Ridge Reservation Environmental Protection Rad Neshaps Radionuclide Inventory Web Database and Rad Neshaps Source and Dose Database.

      PubMed

      Scofield, Patricia A; Smith, Linda L; Johnson, David N

      2017-07-01

      The U.S. Environmental Protection Agency promulgated national emission standards for emissions of radionuclides other than radon from US Department of Energy facilities in Chapter 40 of the Code of Federal Regulations (CFR) 61, Subpart H. This regulatory standard limits the annual effective dose that any member of the public can receive from Department of Energy facilities to 0.1 mSv. As defined in the preamble of the final rule, all of the facilities on the Oak Ridge Reservation, i.e., the Y-12 National Security Complex, Oak Ridge National Laboratory, East Tennessee Technology Park, and any other U.S. Department of Energy operations on Oak Ridge Reservation, combined, must meet the annual dose limit of 0.1 mSv. At Oak Ridge National Laboratory, there are monitored sources and numerous unmonitored sources. To maintain radiological source and inventory information for these unmonitored sources, e.g., laboratory hoods, equipment exhausts, and room exhausts not currently venting to monitored stacks on the Oak Ridge National Laboratory campus, the Environmental Protection Rad NESHAPs Inventory Web Database was developed. This database is updated annually and is used to compile emissions data for the annual Radionuclide National Emission Standards for Hazardous Air Pollutants (Rad NESHAPs) report required by 40 CFR 61.94. It also provides supporting documentation for facility compliance audits. In addition, a Rad NESHAPs source and dose database was developed to import the source and dose summary data from Clean Air Act Assessment Package-1988 computer model files. This database provides Oak Ridge Reservation and facility-specific source inventory; doses associated with each source and facility; and total doses for the Oak Ridge Reservation dose.

    • Oak Ridge Reservation Environmental Protection Rad Neshaps Radionuclide Inventory Web Database and Rad Neshaps Source and Dose Database

      DOE Office of Scientific and Technical Information (OSTI.GOV)

      Scofield, Patricia A.; Smith, Linda Lenell; Johnson, David N.

      The U.S. Environmental Protection Agency promulgated national emission standards for emissions of radionuclides other than radon from US Department of Energy facilities in Chapter 40 of the Code of Federal Regulations (CFR) 61, Subpart H. This regulatory standard limits the annual effective dose that any member of the public can receive from Department of Energy facilities to 0.1 mSv. As defined in the preamble of the final rule, all of the facilities on the Oak Ridge Reservation, i.e., the Y-12 National Security Complex, Oak Ridge National Laboratory, East Tennessee Technology Park, and any other U.S. Department of Energy operations on Oak Ridge Reservation, combined, must meet the annual dose limit of 0.1 mSv. At Oak Ridge National Laboratory, there are monitored sources and numerous unmonitored sources. To maintain radiological source and inventory information for these unmonitored sources, e.g., laboratory hoods, equipment exhausts, and room exhausts not currently venting to monitored stacks on the Oak Ridge National Laboratory campus, the Environmental Protection Rad NESHAPs Inventory Web Database was developed. This database is updated annually and is used to compile emissions data for the annual Radionuclide National Emission Standards for Hazardous Air Pollutants (Rad NESHAPs) report required by 40 CFR 61.94. It also provides supporting documentation for facility compliance audits. In addition, a Rad NESHAPs source and dose database was developed to import the source and dose summary data from Clean Air Act Assessment Package-1988 computer model files. As a result, this database provides Oak Ridge Reservation and facility-specific source inventory; doses associated with each source and facility; and total doses for the Oak Ridge Reservation dose.

    • OrChem - An open source chemistry search engine for Oracle(R).

      PubMed

      Rijnbeek, Mark; Steinbeck, Christoph

      2009-10-22

      Registration, indexing and searching of chemical structures in relational databases is one of the core areas of cheminformatics. However, little detail has been published on the inner workings of search engines and their development has been mostly closed-source. We decided to develop an open source chemistry extension for Oracle, the de facto database platform in the commercial world. Here we present OrChem, an extension for the Oracle 11G database that adds registration and indexing of chemical structures to support fast substructure and similarity searching. The cheminformatics functionality is provided by the Chemistry Development Kit. OrChem provides similarity searching with response times in the order of seconds for databases with millions of compounds, depending on a given similarity cut-off. For substructure searching, it can make use of multiple processor cores on today's powerful database servers to provide fast response times in equally large data sets. OrChem is free software and can be redistributed and/or modified under the terms of the GNU Lesser General Public License as published by the Free Software Foundation. All software is available via http://orchem.sourceforge.net.
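
      OrChem itself runs inside Oracle and draws on the Chemistry Development Kit; the sketch below only illustrates the similarity-search principle it describes, comparing binary fingerprints with the Tanimoto coefficient against a cut-off. Compound identifiers and fingerprints are invented.

      ```python
      def tanimoto(fp_a, fp_b):
          """Tanimoto coefficient of two fingerprints given as sets of 'on' bits."""
          shared = len(fp_a & fp_b)
          union = len(fp_a) + len(fp_b) - shared
          return shared / union if union else 0.0

      database = {                        # invented compound fingerprints
          "CHEM-1": {1, 4, 9, 17, 33},
          "CHEM-2": {1, 4, 9, 17, 40, 41},
          "CHEM-3": {2, 5, 8},
      }
      query, cutoff = {1, 4, 9, 17, 33, 40}, 0.6

      hits = {}
      for compound_id, fingerprint in database.items():
          score = tanimoto(query, fingerprint)
          if score >= cutoff:             # keep only compounds above the cut-off
              hits[compound_id] = round(score, 2)
      print(hits)
      ```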

    • Database Entity Persistence with Hibernate for the Network Connectivity Analysis Model

      DTIC Science & Technology

      2014-04-01

      time savings in the Java coding development process. Appendices A and B describe address setup procedures for installing the MySQL database...development environment is required:
      • The open source MySQL Database Management System (DBMS) from Oracle, which is a Java Database Connectivity (JDBC...compliant DBMS
      • MySQL JDBC Driver library that comes as a plug-in with the Netbeans distribution
      • The latest Java Development Kit with the latest

    • Comprehensive Routing Security Development and Deployment for the Internet

      DTIC Science & Technology

      2015-02-01

      feature enhancement and bug fixes.
      • MySQL: MySQL is a widely used and popular open source database package. It was chosen for database support in the...
      RPSTIR depends on several other open source packages.
      • MySQL: MySQL is used for the local RPKI database cache.
      • OpenSSL: OpenSSL is used for...cryptographic libraries for X.509 certificates.
      • ODBC mySql Connector: ODBC (Open Database Connectivity) is a standard programming interface (API) for

    • Implementing a Dynamic Database-Driven Course Using LAMP

      ERIC Educational Resources Information Center

      Laverty, Joseph Packy; Wood, David; Turchek, John

      2011-01-01

      This paper documents the formulation of a database driven open source architecture web development course. The design of a web-based curriculum faces many challenges: a) relative emphasis of client and server-side technologies, b) choice of a server-side language, and c) the cost and efficient delivery of a dynamic web development, database-driven…

    • Implementation of the Regulatory Authority Information System in Egypt

      DOE Office of Scientific and Technical Information (OSTI.GOV)

      Carson, S.D.; Schetnan, R.; Hasan, A.

      2006-07-01

      As part of the implementation of a bar-code-based system to track radioactive sealed sources (RSS) in Egypt, the Regulatory Authority Information System Personal Digital Assistant (RAIS PDA) Application was developed to extend the functionality of the International Atomic Energy Agency's (IAEA's) RAIS database by allowing users to download RSS data from the database to a portable PDA equipped with a bar-code scanner. [1, 4] The system allows users in the field to verify radioactive sealed source data, gather radioactive sealed source audit information, and upload that data to the RAIS database. This paper describes the development of the RAIS PDA Application, its features, and how it will be implemented in Egypt. (authors)

    • OrChem - An open source chemistry search engine for Oracle®

      PubMed Central

      2009-01-01

      Background Registration, indexing and searching of chemical structures in relational databases is one of the core areas of cheminformatics. However, little detail has been published on the inner workings of search engines and their development has been mostly closed-source. We decided to develop an open source chemistry extension for Oracle, the de facto database platform in the commercial world. Results Here we present OrChem, an extension for the Oracle 11G database that adds registration and indexing of chemical structures to support fast substructure and similarity searching. The cheminformatics functionality is provided by the Chemistry Development Kit. OrChem provides similarity searching with response times in the order of seconds for databases with millions of compounds, depending on a given similarity cut-off. For substructure searching, it can make use of multiple processor cores on today's powerful database servers to provide fast response times in equally large data sets. Availability OrChem is free software and can be redistributed and/or modified under the terms of the GNU Lesser General Public License as published by the Free Software Foundation. All software is available via http://orchem.sourceforge.net. PMID:20298521

  1. Open source hardware and software platform for robotics and artificial intelligence applications

    NASA Astrophysics Data System (ADS)

    Liang, S. Ng; Tan, K. O.; Lai Clement, T. H.; Ng, S. K.; Mohammed, A. H. Ali; Mailah, Musa; Azhar Yussof, Wan; Hamedon, Zamzuri; Yussof, Zulkifli

    2016-02-01

    Recent developments in open source hardware and software platforms (Android, Arduino, Linux, OpenCV etc.) have enabled rapid development of previously expensive and sophisticated systems within a lower budget and with flatter learning curves for developers. Using these platforms, we designed and developed a Java-based 3D robotic simulation system, with a graph database, which is integrated in online and offline modes with an Android-Arduino based rubbish-picking remote control car. The combination of the open source hardware and software system created a flexible and expandable platform for further developments in the future, both in the software and hardware areas, in particular in combination with a graph database for artificial intelligence, as well as more sophisticated hardware, such as legged or humanoid robots.

  2. International Data on Radiological Sources

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Martha Finck; Margaret Goldberg

    2010-07-01

    The mission of radiological dispersal device (RDD) nuclear forensics is to identify the provenance of nuclear and radiological materials used in RDDs and to aid law enforcement in tracking nuclear materials and routes. The application of databases to radiological forensics is to match RDD source material to a source model in the database, provide guidance regarding a possible second device, and aid the FBI by providing a short list of manufacturers and distributors, and ultimately the last legal owner of the source. The Argonne/Idaho National Laboratory RDD attribution database is a powerful technical tool in radiological forensics. The database (1267 unique vendors) includes all sealed sources and devices registered in the U.S., is complemented by data from the IAEA Catalogue, and is supported by rigorous in-lab characterization of selected sealed sources regarding physical form, radiochemical composition, and age-dating profiles. Close working relationships with global partners in the commercial sealed sources industry provide invaluable technical information and expertise in the development of signature profiles. These profiles are critical to the down-selection of potential candidates in either pre- or post-event RDD attribution. The down-selection process includes a match between an interdicted (or detonated) source and a model in the database linked to one or more manufacturers and distributors.

  3. Identifying the effective evidence sources to use in developing Clinical Guidelines for Acute Stroke Management: lived experiences of the search specialist and project manager.

    PubMed

    Parkhill, Anne; Hill, Kelvin

    2009-03-01

    The Australian National Stroke Foundation appointed a search specialist to find the best available evidence for the second edition of its Clinical Guidelines for Acute Stroke Management. To identify the relative effectiveness of differing evidence sources for the guideline update. We searched and reviewed references from five valid evidence sources for clinical and economic questions: (i) electronic databases; (ii) reference lists of relevant systematic reviews, guidelines, and/or primary studies; (iii) table of contents of a number of key journals for the last 6 months; (iv) internet/grey literature; and (v) experts. Reference sources were recorded, quantified, and analysed. In the clinical portion of the guidelines document, there was a greater use of previous knowledge and sources other than electronic databases for evidence, while there was a greater use of electronic databases for the economic section. The results confirmed that searchers need to be aware of the context and range of sources for evidence searches. For best available evidence, searchers cannot rely solely on electronic databases and need to encompass many different media and sources.

  4. Modernizing the MagIC Paleomagnetic and Rock Magnetic Database Technology Stack to Encourage Code Reuse and Reproducible Science

    NASA Astrophysics Data System (ADS)

    Minnett, R.; Koppers, A. A. P.; Jarboe, N.; Jonestrask, L.; Tauxe, L.; Constable, C.

    2016-12-01

    The Magnetics Information Consortium (https://earthref.org/MagIC/) develops and maintains a database and web application for supporting the paleo-, geo-, and rock magnetic scientific community. Historically, this objective has been met with an Oracle database and a Perl web application at the San Diego Supercomputer Center (SDSC). The Oracle Enterprise Cluster at SDSC, however, was decommissioned in July of 2016 and the cost for MagIC to continue using Oracle became prohibitive. This provided MagIC with a unique opportunity to reexamine the entire technology stack and data model. MagIC has developed an open-source web application using the Meteor (http://meteor.com) framework and a MongoDB database. The simplicity of the open-source full-stack framework that Meteor provides has improved MagIC's development pace and the increased flexibility of the data schema in MongoDB encouraged the reorganization of the MagIC Data Model. As a result of incorporating actively developed open-source projects into the technology stack, MagIC has benefited from their vibrant software development communities. This has translated into a more modern web application that has significantly improved the user experience for the paleo-, geo-, and rock magnetic scientific community.
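
    As a loose illustration of the schema flexibility credited to MongoDB above (not MagIC's code), the sketch below stores documents with differing fields in one collection and queries them with plain Python standing in for the database; all field names and values are hypothetical.

    ```python
    # Documents in one collection need not share identical columns, so a new
    # measurement field can appear without restructuring existing records.
    measurements = [
        {"contribution": 101, "site": "HAW-01", "dec": 3.2, "inc": 45.1},
        {"contribution": 102, "site": "ORE-07", "dec": 12.8, "inc": 61.0,
         "anisotropy_tensor": [1.01, 0.99, 1.00]},   # extra field, no migration needed
    ]

    def find(collection, **criteria):
        """Minimal MongoDB-style query: return documents matching all criteria."""
        return [doc for doc in collection
                if all(doc.get(key) == value for key, value in criteria.items())]

    print(find(measurements, site="ORE-07"))
    ```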

  5. EPA’s Drinking Water Treatability Database: A Tool for All Drinking Water Professionals

    EPA Science Inventory

    The Drinking Water Treatability Database (TDB) is being developed by the USEPA Office of Research and Development to allow drinking water professionals and others to access referenced information gathered from thousands of literature sources and assembled on one site. Currently, ...

  6. New Zealand's National Landslide Database

    NASA Astrophysics Data System (ADS)

    Rosser, B.; Dellow, S.; Haubrook, S.; Glassey, P.

    2016-12-01

    Since 1780, landslides have caused an average of about 3 deaths a year in New Zealand and have cost the economy an average of at least NZ$250M/a (0.1% GDP). To understand the risk posed by landslide hazards to society, a thorough knowledge of where, when and why different types of landslides occur is vital. The main objective for establishing the database was to provide a centralised national-scale, publically available database to collate landslide information that could be used for landslide hazard and risk assessment. Design of a national landslide database for New Zealand required consideration of both existing landslide data stored in a variety of digital formats, and future data, yet to be collected. Pre-existing databases were developed and populated with data reflecting the needs of the landslide or hazard project, and the database structures of the time. Bringing these data into a single unified database required a new structure capable of storing and delivering data at a variety of scales and accuracy and with different attributes. A "unified data model" was developed to enable the database to hold old and new landslide data irrespective of scale and method of capture. The database contains information on landslide locations and where available: 1) the timing of landslides and the events that may have triggered them; 2) the type of landslide movement; 3) the volume and area; 4) the source and debris tail; and 5) the impacts caused by the landslide. Information from a variety of sources including aerial photographs (and other remotely sensed data), field reconnaissance and media accounts has been collated and is presented for each landslide along with metadata describing the data sources and quality. There are currently nearly 19,000 landslide records in the database that include point locations, polygons of landslide source and deposit areas, and linear features. Several large datasets are awaiting upload which will bring the total number of landslides to over 100,000. The geo-spatial database is publicly available via the Internet. Software components, including the underlying database (PostGIS), Web Map Server (GeoServer) and web application use open-source software. The hope is that others will add relevant information to the database as well as download the data contained in it.

  7. CROPPER: a metagene creator resource for cross-platform and cross-species compendium studies.

    PubMed

    Paananen, Jussi; Storvik, Markus; Wong, Garry

    2006-09-22

    Current genomic research methods provide researchers with enormous amounts of data. Combining data from different high-throughput research technologies commonly available in biological databases can lead to novel findings and increase research efficiency. However, combining data from different heterogeneous sources is often a very arduous task. These sources can be different microarray technology platforms, genomic databases, or experiments performed on various species. Our aim was to develop a software program that could facilitate the combining of data from heterogeneous sources, and thus allow researchers to perform genomic cross-platform/cross-species studies and to use existing experimental data for compendium studies. We have developed a web-based software resource, called CROPPER that uses the latest genomic information concerning different data identifiers and orthologous genes from the Ensembl database. CROPPER can be used to combine genomic data from different heterogeneous sources, allowing researchers to perform cross-platform/cross-species compendium studies without the need for complex computational tools or the requirement of setting up one's own in-house database. We also present an example of a simple cross-platform/cross-species compendium study based on publicly available Parkinson's disease data derived from different sources. CROPPER is a user-friendly and freely available web-based software resource that can be successfully used for cross-species/cross-platform compendium studies.
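
    The following sketch (not CROPPER's code) illustrates the underlying idea: platform-specific identifiers are mapped to a shared gene key, and species are crossed through an ortholog table, so heterogeneous measurements can be grouped under one metagene. All identifiers and values are invented.

    ```python
    # Identifier maps: platform probe -> gene key, and mouse gene -> human ortholog.
    probe_to_gene = {"1007_s_at": "ENSG0001", "121_at": "ENSG0002"}
    ortholog = {"ENSMUSG0009": "ENSG0001"}

    human_expr = {"1007_s_at": 7.4, "121_at": 5.1}   # microarray platform A (human)
    mouse_expr = {"ENSMUSG0009": 6.9}                # another platform/species (mouse)

    combined = {}
    for probe, value in human_expr.items():
        combined.setdefault(probe_to_gene[probe], []).append(("human", value))
    for gene, value in mouse_expr.items():
        combined.setdefault(ortholog.get(gene, gene), []).append(("mouse", value))

    print(combined)   # measurements grouped under one "metagene" key per gene
    ```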

  8. SPECIATE 4.0: SPECIATION DATABASE DEVELOPMENT DOCUMENTATION--FINAL REPORT

    EPA Science Inventory

    SPECIATE is the U.S. EPA's repository of total organic compounds (TOC) and particulate matter (PM) speciation profiles of air pollution sources. This report documents how EPA developed the SPECIATE 4.0 database that replaces the prior version, SPECIATE 3.2. SPECIATE 4.0 includes ...

  9. CardioTF, a database of deconstructing transcriptional circuits in the heart system

    PubMed Central

    2016-01-01

    Background: Information on cardiovascular gene transcription is fragmented and far behind the present requirements of the systems biology field. To create a comprehensive source of data for cardiovascular gene regulation and to facilitate a deeper understanding of genomic data, the CardioTF database was constructed. The purpose of this database is to collate information on cardiovascular transcription factors (TFs), position weight matrices (PWMs), and enhancer sequences discovered using the ChIP-seq method. Methods: The Naïve-Bayes algorithm was used to classify literature and identify all PubMed abstracts on cardiovascular development. The natural language learning tool GNAT was then used to identify corresponding gene names embedded within these abstracts. Local Perl scripts were used to integrate and dump data from public databases into the MariaDB management system (MySQL). In-house R scripts were written to analyze and visualize the results. Results: Known cardiovascular TFs from humans and human homologs from fly, Ciona, zebrafish, frog, chicken, and mouse were identified and deposited in the database. PWMs from Jaspar, hPDI, and UniPROBE databases were deposited in the database and can be retrieved using their corresponding TF names. Gene enhancer regions from various sources of ChIP-seq data were deposited into the database and were able to be visualized by graphical output. Besides biocuration, mouse homologs of the 81 core cardiac TFs were selected using a Naïve-Bayes approach and then by intersecting four independent data sources: RNA profiling, expert annotation, PubMed abstracts and phenotype. Discussion: The CardioTF database can be used as a portal to construct transcriptional network of cardiac development. Availability and Implementation: Database URL: http://www.cardiosignal.org/database/cardiotf.html. PMID:27635320

  10. CardioTF, a database of deconstructing transcriptional circuits in the heart system.

    PubMed

    Zhen, Yisong

    2016-01-01

    Information on cardiovascular gene transcription is fragmented and far behind the present requirements of the systems biology field. To create a comprehensive source of data for cardiovascular gene regulation and to facilitate a deeper understanding of genomic data, the CardioTF database was constructed. The purpose of this database is to collate information on cardiovascular transcription factors (TFs), position weight matrices (PWMs), and enhancer sequences discovered using the ChIP-seq method. The Naïve-Bayes algorithm was used to classify literature and identify all PubMed abstracts on cardiovascular development. The natural language learning tool GNAT was then used to identify corresponding gene names embedded within these abstracts. Local Perl scripts were used to integrate and dump data from public databases into the MariaDB management system (MySQL). In-house R scripts were written to analyze and visualize the results. Known cardiovascular TFs from humans and human homologs from fly, Ciona, zebrafish, frog, chicken, and mouse were identified and deposited in the database. PWMs from Jaspar, hPDI, and UniPROBE databases were deposited in the database and can be retrieved using their corresponding TF names. Gene enhancer regions from various sources of ChIP-seq data were deposited into the database and were able to be visualized by graphical output. Besides biocuration, mouse homologs of the 81 core cardiac TFs were selected using a Naïve-Bayes approach and then by intersecting four independent data sources: RNA profiling, expert annotation, PubMed abstracts and phenotype. The CardioTF database can be used as a portal to construct transcriptional network of cardiac development. Database URL: http://www.cardiosignal.org/database/cardiotf.html.
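
    The selection of core TFs by intersecting four independent evidence sources reduces, conceptually, to a set intersection; the sketch below uses invented gene symbols purely for illustration.

    ```python
    # Candidates supported by each independent evidence source (hypothetical).
    rna_profiling   = {"Gata4", "Nkx2-5", "Tbx5", "Mef2c", "Srf"}
    expert_curation = {"Gata4", "Nkx2-5", "Tbx5", "Hand2"}
    pubmed_mining   = {"Gata4", "Nkx2-5", "Tbx5", "Mef2c"}
    phenotype_data  = {"Gata4", "Nkx2-5", "Tbx5", "Mef2c", "Hand2"}

    # Core set = candidates supported by all four sources.
    core_tfs = rna_profiling & expert_curation & pubmed_mining & phenotype_data
    print(sorted(core_tfs))
    ```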

  11. Developing a Large Lexical Database for Information Retrieval, Parsing, and Text Generation Systems.

    ERIC Educational Resources Information Center

    Conlon, Sumali Pin-Ngern; And Others

    1993-01-01

    Important characteristics of lexical databases and their applications in information retrieval and natural language processing are explained. An ongoing project using various machine-readable sources to build a lexical database is described, and detailed designs of individual entries with examples are included. (Contains 66 references.) (EAM)

  12. An experiment in big data: storage, querying and visualisation of data taken from the Liverpool Telescope's wide field cameras

    NASA Astrophysics Data System (ADS)

    Barnsley, R. M.; Steele, Iain A.; Smith, R. J.; Mawson, Neil R.

    2014-07-01

    The Small Telescopes Installed at the Liverpool Telescope (STILT) project has been in operation since March 2009, collecting data with three wide field unfiltered cameras: SkycamA, SkycamT and SkycamZ. To process the data, a pipeline was developed to automate source extraction, catalogue cross-matching, photometric calibration and database storage. In this paper, modifications and further developments to this pipeline will be discussed, including a complete refactor of the pipeline's codebase into Python, migration of the back-end database technology from MySQL to PostgreSQL, and changing the catalogue used for source cross-matching from USNO-B1 to APASS. In addition to this, details will be given relating to the development of a preliminary front-end to the source extracted database which will allow a user to perform common queries such as cone searches and light curve comparisons of catalogue and non-catalogue matched objects. Some next steps and future ideas for the project will also be presented.
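
    A cone search of the kind mentioned for the front-end can be sketched as follows; the catalogue rows are invented, and a production version would run as an indexed PostgreSQL query rather than a Python loop.

    ```python
    # Return catalogue sources within an angular radius of a target position.
    from math import radians, degrees, sin, cos, asin, sqrt

    def angular_sep_deg(ra1, dec1, ra2, dec2):
        """Angular separation of two sky positions in degrees (haversine form)."""
        ra1, dec1, ra2, dec2 = map(radians, (ra1, dec1, ra2, dec2))
        a = sin((dec2 - dec1) / 2) ** 2 + cos(dec1) * cos(dec2) * sin((ra2 - ra1) / 2) ** 2
        return degrees(2 * asin(sqrt(a)))

    catalogue = [("src-1", 150.02, 2.21), ("src-2", 150.90, 2.19), ("src-3", 150.05, 2.18)]
    target_ra, target_dec, radius = 150.0, 2.2, 0.1   # all in degrees

    hits = [name for name, ra, dec in catalogue
            if angular_sep_deg(ra, dec, target_ra, target_dec) <= radius]
    print(hits)   # sources falling inside the cone
    ```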

  13. ACToR Chemical Structure processing using Open Source ...

    EPA Pesticide Factsheets

    ACToR (Aggregated Computational Toxicology Resource) is a centralized database repository developed by the National Center for Computational Toxicology (NCCT) at the U.S. Environmental Protection Agency (EPA). Free and open source tools were used to compile toxicity data from over 1,950 public sources. ACToR contains chemical structure information and toxicological data for over 558,000 unique chemicals. The database primarily includes data from NCCT research programs, in vivo toxicity data from ToxRef, human exposure data from ExpoCast, high-throughput screening data from ToxCast and high quality chemical structure information from the EPA DSSTox program. The DSSTox database is a chemical structure inventory for the NCCT programs and currently has about 16,000 unique structures. Included are also data from PubChem, ChemSpider, USDA, FDA, NIH and several other public data sources. ACToR has been a resource to various international and national research groups. Most of our recent efforts on ACToR are focused on improving the structural identifiers and Physico-Chemical properties of the chemicals in the database. Organizing this huge collection of data and improving the chemical structure quality of the database has posed some major challenges. Workflows have been developed to process structures, calculate chemical properties and identify relationships between CAS numbers. The Structure processing workflow integrates web services (PubChem and NIH NCI Cactus) to d

  14. Implementation of the FAA research and development electromagnetic database

    NASA Technical Reports Server (NTRS)

    Mcdowall, R. L.; Grush, D. J.; Cook, D. M.; Glynn, M. S.

    1991-01-01

    The Idaho National Engineering Laboratory (INEL) has been assisting the FAA in developing a database of information about lightning. The FAA Research and Development Electromagnetic Database (FRED) will ultimately contain data from a variety of airborne and ground-based lightning research projects. An outline of the data currently available in FRED is presented. The data sources which the FAA intends to incorporate into FRED are listed. In addition, it describes how the researchers may access and use the FRED menu system.

  15. Building a Database for a Quantitative Model

    NASA Technical Reports Server (NTRS)

    Kahn, C. Joseph; Kleinhammer, Roger

    2014-01-01

    A database can greatly benefit a quantitative analysis. The defining characteristic of a quantitative risk, or reliability, model is the use of failure estimate data. Models can easily contain a thousand Basic Events, relying on hundreds of individual data sources. Obviously, entering so much data by hand will eventually lead to errors. Less obviously, entering data this way does not aid in linking the Basic Events to their data sources. The best way to organize large amounts of data on a computer is with a database. But a model does not require a large, enterprise-level database with dedicated developers and administrators. A database built in Excel can be quite sufficient. A simple spreadsheet database can link every Basic Event to the individual data source selected for it. This database can also contain the manipulations appropriate for how the data are used in the model. These manipulations include stressing factors based on use and maintenance cycles, dormancy, unique failure modes, the modeling of multiple items as a single "Super component" Basic Event, and Bayesian updating based on flight and testing experience. A simple, unique metadata field in both the model and database provides a link from any Basic Event in the model to its data source and all relevant calculations. The credibility of the entire model often rests on the credibility and traceability of the data.
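
    A minimal sketch of that linkage idea (not the authors' Excel workbook): a unique metadata key appears both on the model's Basic Event and on the database row holding its chosen data source, so every failure estimate stays traceable. Keys, sources, and rates below are invented.

    ```python
    # Database side: one row per Basic Event key, holding the chosen data source
    # and any manipulations (here, a single stress factor) applied to it.
    data_sources = {
        "BE-VALVE-017": {"source": "NPRD-2016 p.3-41", "base_rate": 2.1e-6,
                         "k_factor": 1.5},          # stress factor for duty cycle
    }

    # Model side: Basic Events carry the same unique metadata key.
    model_basic_events = [{"name": "Isolation valve fails to close", "key": "BE-VALVE-017"}]

    for event in model_basic_events:
        rec = data_sources[event["key"]]
        applied_rate = rec["base_rate"] * rec["k_factor"]
        print(f'{event["name"]}: {applied_rate:.2e} per hour, from {rec["source"]}')
    ```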

  16. Development of a statewide trauma registry using multiple linked sources of data.

    PubMed Central

    Clark, D. E.

    1993-01-01

    In order to develop a cost-effective method of injury surveillance and trauma system evaluation in a rural state, computer programs were written linking records from two major hospital trauma registries, a statewide trauma tracking study, hospital discharge abstracts, death certificates, and ambulance run reports. A general-purpose database management system, programming language, and operating system were used. Data from 1991 appeared to be successfully linked using only indirect identifying information. Familiarity with local geography and the idiosyncrasies of each data source were helpful in programming for effective matching of records. For each individual case identified in this way, data from all available sources were then merged and imported into a standard database format. This inexpensive, population-based approach, maintaining flexibility for end-users with some database training, may be adaptable for other regions. There is a need for further improvement and simplification of the record-linkage process for this and similar purposes. PMID:8130556
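
    A minimal sketch of deterministic record linkage on indirect identifiers, in the spirit of the approach described (not the original programs); field names and records are hypothetical.

    ```python
    # Records from two sources are treated as the same case when date of birth,
    # sex, and incident date all agree; matched records are then merged.
    hospital_registry = [
        {"dob": "1958-03-07", "sex": "M", "incident": "1991-06-14", "iss": 22},
    ]
    ambulance_runs = [
        {"dob": "1958-03-07", "sex": "M", "incident": "1991-06-14", "scene_time_min": 18},
        {"dob": "1972-11-30", "sex": "F", "incident": "1991-08-02", "scene_time_min": 9},
    ]

    def link_key(rec):
        # Indirect identifiers only: no name or ID number is used.
        return (rec["dob"], rec["sex"], rec["incident"])

    runs_by_key = {link_key(r): r for r in ambulance_runs}
    for case in hospital_registry:
        match = runs_by_key.get(link_key(case))
        if match:                         # merge fields from both sources
            print({**case, **match})
    ```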

  17. Comparing Global Influence: China’s and U.S. Diplomacy, Foreign Aid, Trade, and Investment in the Developing World

    DTIC Science & Technology

    2008-08-15

      [Figure residue from the report: charts of U.S. and China's exports of goods to the world, 1995-2007, in $billion. Source: United Nations, COMTRADE Database.]

  18. [Exploration and construction of the full-text database of acupuncture literature in the Republic of China].

    PubMed

    Fei, Lin; Zhao, Jing; Leng, Jiahao; Zhang, Shujian

    2017-10-12

    The ALIPORC database is a specialized full-text database of acupuncture literature from the Republic of China period. Its construction began in 2015 and is still being completed, focusing on acupuncture-related books, articles and advertising documents written or published during the Republic of China. The database aims to achieve shared access to the acupuncture medical literature of the Republic of China through diverse retrieval approaches and accurate content presentation, to support exchange among scholars, to reduce the paper damage caused by page-turning, and to simplify retrieval of rare literature. The authors explain the database in terms of its sources, characteristics and current state of construction, and discuss improving the efficiency and integrity of the database and deepening the exploration of acupuncture literature of the Republic of China.

  19. Healthcare Databases in Thailand and Japan: Potential Sources for Health Technology Assessment Research.

    PubMed

    Saokaew, Surasak; Sugimoto, Takashi; Kamae, Isao; Pratoomsoot, Chayanin; Chaiyakunapruk, Nathorn

    2015-01-01

    Health technology assessment (HTA) has been continuously used for value-based healthcare decisions over the last decade. Healthcare databases represent an important source of information for HTA, which has seen a surge in use in Western countries. Although HTA agencies have been established in the Asia-Pacific region, application and understanding of healthcare databases for HTA is rather limited. Thus, we reviewed existing databases to assess their potential for HTA in Thailand, where HTA has been used officially, and Japan, where HTA is going to be officially introduced. Existing healthcare databases in Thailand and Japan were compiled and reviewed. Databases' characteristics, e.g. name of database, host, scope/objective, time/sample size, design, data collection method, population/sample, and variables, were described. Databases were assessed for their potential HTA use in terms of safety/efficacy/effectiveness, social/ethical, organization/professional, economic, and epidemiological domains. The request route for each database was also provided. Forty databases (20 from Thailand and 20 from Japan) were included. These comprised national censuses, surveys, registries, administrative data, and claims databases. All databases were potentially usable for epidemiological studies. In addition, data on mortality, morbidity, disability, adverse events, quality of life, service/technology utilization, length of stay, and economics were also found in some databases. However, access to patient-level data was limited since information about the databases was not available from public sources. Our findings have shown that existing databases provide valuable information for HTA research with limitations on accessibility. Mutual dialogue on healthcare database development and usage for HTA within the Asia-Pacific region is needed.

  20. Development of a One-Stop Data Search and Discovery Engine using Ontologies for Semantic Mappings (HydroSeek)

    NASA Astrophysics Data System (ADS)

    Piasecki, M.; Beran, B.

    2007-12-01

    Search engines have changed the way we see the Internet. The ability to find the information by just typing in keywords was a big contribution to the overall web experience. While the conventional search engine methodology worked well for textual documents, locating scientific data remains a problem since they are stored in databases not readily accessible by search engine bots. Considering different temporal, spatial and thematic coverage of different databases, especially for interdisciplinary research it is typically necessary to work with multiple data sources. These sources can be federal agencies which generally offer national coverage or regional sources which cover a smaller area with higher detail. However for a given geographic area of interest there often exists more than one database with relevant data. Thus being able to query multiple databases simultaneously is a desirable feature that would be tremendously useful for scientists. Development of such a search engine requires dealing with various heterogeneity issues. In scientific databases, systems often impose controlled vocabularies which ensure that they are generally homogeneous within themselves but are semantically heterogeneous when moving between different databases. This defines the boundaries of possible semantic related problems making it easier to solve than with the conventional search engines that deal with free text. We have developed a search engine that enables querying multiple data sources simultaneously and returns data in a standardized output despite the aforementioned heterogeneity issues between the underlying systems. This application relies mainly on metadata catalogs or indexing databases, ontologies and webservices with virtual globe and AJAX technologies for the graphical user interface. Users can trigger a search of dozens of different parameters over hundreds of thousands of stations from multiple agencies by providing a keyword, a spatial extent, i.e. a bounding box, and a temporal bracket. As part of this development we have also added an environment that allows users to do some of the semantic tagging, i.e. the linkage of a variable name (which can be anything they desire) to defined concepts in the ontology structure which in turn provides the backbone of the search engine.
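
    The semantic-mapping step can be sketched as follows (illustrative only, not the HydroSeek code): each source's local variable code is tagged to a shared ontology concept, so one keyword expands into the right code for each underlying database. The source names and codes shown are placeholders.

    ```python
    # Controlled-vocabulary codes per source, each tagged to a shared concept.
    concept_map = {
        "agency_A": {"00060": "streamflow", "00010": "water temperature"},
        "agency_B": {"FLOW": "streamflow", "TEMP_H2O": "water temperature"},
    }

    def variables_for(keyword):
        """Return, per source, the local variable codes tagged to the keyword."""
        return {source: [code for code, concept in codes.items() if concept == keyword]
                for source, codes in concept_map.items()}

    print(variables_for("streamflow"))   # which code to request from each source
    ```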

  1. Quantifying Data Quality for Clinical Trials Using Electronic Data Capture

    PubMed Central

    Nahm, Meredith L.; Pieper, Carl F.; Cunningham, Maureen M.

    2008-01-01

    Background Historically, only partial assessments of data quality have been performed in clinical trials, for which the most common method of measuring database error rates has been to compare the case report form (CRF) to database entries and count discrepancies. Importantly, errors arising from medical record abstraction and transcription are rarely evaluated as part of such quality assessments. Electronic Data Capture (EDC) technology has had a further impact, as paper CRFs typically leveraged for quality measurement are not used in EDC processes. Methods and Principal Findings The National Institute on Drug Abuse Treatment Clinical Trials Network has developed, implemented, and evaluated methodology for holistically assessing data quality on EDC trials. We characterize the average source-to-database error rate (14.3 errors per 10,000 fields) for the first year of use of the new evaluation method. This error rate was significantly lower than the average of published error rates for source-to-database audits, and was similar to CRF-to-database error rates reported in the published literature. We attribute this largely to an absence of medical record abstraction on the trials we examined, and to an outpatient setting characterized by less acute patient conditions. Conclusions Historically, medical record abstraction is the most significant source of error by an order of magnitude, and should be measured and managed during the course of clinical trials. Source-to-database error rates are highly dependent on the amount of structured data collection in the clinical setting and on the complexity of the medical record, dependencies that should be considered when developing data quality benchmarks. PMID:18725958

  2. DISTRIBUTED STRUCTURE-SEARCHABLE TOXICITY (DSSTOX) DATABASE NETWORK: MAKING PUBLIC TOXICITY DATA RESOURCES MORE ACCESSIBLE AND USABLE FOR DATA EXPLORATION AND SAR DEVELOPMENT

    EPA Science Inventory


    Distributed Structure-Searchable Toxicity (DSSTox) Database Network: Making Public Toxicity Data Resources More Accessible and Usable for Data Exploration and SAR Development

    Many sources of public toxicity data are not currently linked to chemical structure, are not ...

  3. ARACHNID: A prototype object-oriented database tool for distributed systems

    NASA Technical Reports Server (NTRS)

    Younger, Herbert; Oreilly, John; Frogner, Bjorn

    1994-01-01

    This paper discusses the results of a Phase 2 SBIR project sponsored by NASA and performed by MIMD Systems, Inc. A major objective of this project was to develop specific concepts for improved performance in accessing large databases. An object-oriented and distributed approach was used for the general design, while a geographical decomposition was used as a specific solution. The resulting software framework is called ARACHNID. The Faint Source Catalog developed by NASA was the initial database testbed. This is a database of many giga-bytes, where an order of magnitude improvement in query speed is being sought. This database contains faint infrared point sources obtained from telescope measurements of the sky. A geographical decomposition of this database is an attractive approach to dividing it into pieces. Each piece can then be searched on individual processors with only a weak data linkage between the processors being required. As a further demonstration of the concepts implemented in ARACHNID, a tourist information system is discussed. This version of ARACHNID is the commercial result of the project. It is a distributed, networked, database application where speed, maintenance, and reliability are important considerations. This paper focuses on the design concepts and technologies that form the basis for ARACHNID.
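
    A minimal sketch of the geographical-decomposition idea (not the ARACHNID code): sources are bucketed into declination zones so a positional query only scans the partitions its range overlaps. Zone size and catalogue entries are invented.

    ```python
    from collections import defaultdict

    ZONE_HEIGHT_DEG = 10.0
    partitions = defaultdict(list)        # zone index -> sources in that zone

    def zone(dec_deg):
        # Map a declination (-90..+90) to a zone index.
        return int((dec_deg + 90.0) // ZONE_HEIGHT_DEG)

    for name, ra, dec in [("a", 10.0, -5.0), ("b", 200.0, 42.0), ("c", 11.0, -3.0)]:
        partitions[zone(dec)].append((name, ra, dec))

    def query(dec_min, dec_max):
        """Scan only the zones overlapping [dec_min, dec_max]."""
        hits = []
        for z in range(zone(dec_min), zone(dec_max) + 1):
            hits += [s for s in partitions[z] if dec_min <= s[2] <= dec_max]
        return hits

    print(query(-6.0, 0.0))   # touches one zone instead of the whole catalogue
    ```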

  4. FishTraits Database

    USGS Publications Warehouse

    Angermeier, Paul L.; Frimpong, Emmanuel A.

    2009-01-01

    The need for integrated and widely accessible sources of species traits data to facilitate studies of ecology, conservation, and management has motivated development of traits databases for various taxa. In spite of the increasing number of traits-based analyses of freshwater fishes in the United States, no consolidated database of traits of this group exists publicly, and much useful information on these species is documented only in obscure sources. The largely inaccessible and unconsolidated traits information makes large-scale analysis involving many fishes and/or traits particularly challenging. FishTraits is a database of >100 traits for 809 (731 native and 78 exotic) fish species found in freshwaters of the conterminous United States, including 37 native families and 145 native genera. The database contains information on four major categories of traits: (1) trophic ecology, (2) body size and reproductive ecology (life history), (3) habitat associations, and (4) salinity and temperature tolerances. Information on geographic distribution and conservation status is also included. Together, we refer to the traits, distribution, and conservation status information as attributes. Descriptions of attributes are available here. Many sources were consulted to compile attributes, including state and regional species accounts and other databases.

  5. [Status of libraries and databases for natural products abroad].

    PubMed

    Zhao, Li-Mei; Tan, Ning-Hua

    2015-01-01

    Because natural products are one of the important sources for drug discovery, libraries and databases of natural products are significant for natural product research and development. At present, most compound libraries abroad consist of synthetic or combinatorial synthetic molecules, making natural products difficult to access; and because information on natural products is scattered across sources with differing standards, it is difficult to construct convenient, comprehensive and large-scale databases for natural products. This paper reviews the status of currently accessible libraries and databases for natural products abroad and provides some important information for the development of natural product libraries and databases.

  6. Study of an External Neutron Source for an Accelerator-Driven System using the PHITS Code

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Sugawara, Takanori; Iwasaki, Tomohiko; Chiba, Takashi

    A code system for the Accelerator Driven System (ADS) has been under development for analyzing dynamic behaviors of a subcritical core coupled with an accelerator. This code system named DSE (Dynamics calculation code system for a Subcritical system with an External neutron source) consists of an accelerator part and a reactor part. The accelerator part employs a database, which is calculated by using PHITS, for investigating the effect related to the accelerator such as the changes of beam energy, beam diameter, void generation, and target level. This analysis method using the database may introduce some errors into dynamics calculations since the neutron source data derived from the database has some errors in fitting or interpolating procedures. In this study, the effects of various events are investigated to confirm that the method based on the database is appropriate.
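
    A hedged sketch of the database approach described above (all values invented): the dynamics calculation looks the external neutron source up in a pre-computed table and interpolates between tabulated beam energies, which is where the fitting/interpolation errors under investigation arise.

    ```python
    # Pre-computed table: (beam energy in MeV, neutron yield per proton), invented.
    tabulated = [(600.0, 14.2), (800.0, 19.1), (1000.0, 23.7)]

    def source_yield(energy_mev):
        """Piecewise-linear interpolation over the tabulated neutron yields."""
        for (e0, y0), (e1, y1) in zip(tabulated, tabulated[1:]):
            if e0 <= energy_mev <= e1:
                return y0 + (y1 - y0) * (energy_mev - e0) / (e1 - e0)
        raise ValueError("energy outside tabulated range")

    print(source_yield(700.0))   # interpolated value, not a direct PHITS calculation
    ```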

  7. New data sources and derived products for the SRER digital spatial database

    Treesearch

    Craig Wissler; Deborah Angell

    2003-01-01

    The Santa Rita Experimental Range (SRER) digital database was developed to automate and preserve ecological data and increase their accessibility. The digital data holdings include a spatial database that is used to integrate ecological data in a known reference system and to support spatial analyses. Recently, the Advanced Resource Technology (ART) facility has added...

  8. Implementation of the FAA research and development electromagnetic database

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    McDowall, R.L.; Grush, D.J.; Cook, D.M.

    1991-01-01

    The Idaho National Engineering Laboratory (INEL) has been assisting the Federal Aviation Administration (FAA) in developing a database of information about lightning. The FAA Research and Development Electromagnetic Database (FRED) will ultimately contain data from a variety of airborne and ground-based lightning research projects. This paper contains an outline of the data currently available in FRED. It also lists the data sources which the FAA intends to incorporate into FRED. In addition, it describes how the researcher may access and use the FRED menu system. 2 refs., 12 figs.

  9. Healthcare Databases in Thailand and Japan: Potential Sources for Health Technology Assessment Research

    PubMed Central

    Saokaew, Surasak; Sugimoto, Takashi; Kamae, Isao; Pratoomsoot, Chayanin; Chaiyakunapruk, Nathorn

    2015-01-01

    Background Health technology assessment (HTA) has been continuously used for value-based healthcare decisions over the last decade. Healthcare databases represent an important source of information for HTA, which has seen a surge in use in Western countries. Although HTA agencies have been established in the Asia-Pacific region, application and understanding of healthcare databases for HTA is rather limited. Thus, we reviewed existing databases to assess their potential for HTA in Thailand, where HTA has been used officially, and Japan, where HTA is going to be officially introduced. Method Existing healthcare databases in Thailand and Japan were compiled and reviewed. Database characteristics, e.g., name of database, host, scope/objective, time/sample size, design, data collection method, population/sample, and variables, were described. Databases were assessed for their potential HTA use in terms of safety/efficacy/effectiveness, social/ethical, organization/professional, economic, and epidemiological domains. The request route for each database was also provided. Results Forty databases (20 from Thailand and 20 from Japan) were included. These comprised national censuses, surveys, registries, administrative data, and claims databases. All databases could potentially be used for epidemiological studies. In addition, data on mortality, morbidity, disability, adverse events, quality of life, service/technology utilization, length of stay, and economics were also found in some databases. However, access to patient-level data was limited since information about the databases was not available from public sources. Conclusion Our findings show that existing databases provide valuable information for HTA research, with limitations on accessibility. Mutual dialogue on healthcare database development and usage for HTA within the Asia-Pacific region is needed. PMID:26560127

  10. The National Landslide Database and GIS for Great Britain: construction, development, data acquisition, application and communication

    NASA Astrophysics Data System (ADS)

    Pennington, Catherine; Dashwood, Claire; Freeborough, Katy

    2014-05-01

    The National Landslide Database has been developed by the British Geological Survey (BGS) and is the focus for national geohazard research for landslides in Great Britain. The history and structure of the geospatial database and associated Geographical Information System (GIS) are explained, along with the future developments of the database and its applications. The database is the most extensive source of information on landslides in Great Britain with over 16,500 records of landslide events, each documented as fully as possible. Data are gathered through a range of procedures, including: incorporation of other databases; automated trawling of current and historical scientific literature and media reports; new field- and desk-based mapping technologies with digital data capture, and crowd-sourcing information through social media and other online resources. This information is invaluable for the investigation, prevention and mitigation of areas of unstable ground in accordance with Government planning policy guidelines. The national landslide susceptibility map (GeoSure) and a national landslide domain map currently under development rely heavily on the information contained within the landslide database. Assessing susceptibility to landsliding requires knowledge of the distribution of failures and an understanding of causative factors and their spatial distribution, whilst understanding the frequency and types of landsliding present is integral to modelling how rainfall will influence the stability of a region. Communication of landslide data through the Natural Hazard Partnership (NHP) contributes to national hazard mitigation and disaster risk reduction with respect to weather and climate. Daily reports of landslide potential are published by BGS through the NHP and data collected for the National Landslide Database is used widely for the creation of these assessments. The National Landslide Database is freely available via an online GIS and is used by a variety of stakeholders for research purposes.

  11. Vehicle noise source heights & sub-source spectra

    DOT National Transportation Integrated Search

    1996-12-01

    This report describes a turn-key system that was developed and implemented to collect the vehicle source height database for incorporation into the new Traffic Noise Model (TNM). A total of 2500 individual vehicle pass-bys were measured with this sys...

  12. The Multiple-Institution Database for Investigating Engineering Longitudinal Development: An Experiential Case Study of Data Sharing and Reuse

    ERIC Educational Resources Information Center

    Ohland, Matthew W.; Long, Russell A.

    2016-01-01

    Sharing longitudinal student record data and merging data from different sources is critical to addressing important questions being asked of higher education. The Multiple-Institution Database for Investigating Engineering Longitudinal Development (MIDFIELD) is a multi-institution, longitudinal, student record level dataset that is used to answer…

  13. Generation of a Database of Laboratory Laser-Induced Breakdown Spectroscopy (LIBS) Spectra and Associated Analysis Software

    NASA Astrophysics Data System (ADS)

    Anderson, R. B.; Clegg, S. M.; Graff, T.; Morris, R. V.; Laura, J.

    2015-06-01

    We describe plans to generate a database of LIBS spectra of planetary analog materials and develop free, open-source software to enable the planetary community to analyze LIBS (and other spectral) data.

  14. THE ART OF DATA MINING THE MINEFIELDS OF TOXICITY ...

    EPA Pesticide Factsheets

    Toxicity databases have a special role in predictive toxicology, providing ready access to historical information throughout the workflow of discovery, development, and product safety processes in drug development as well as in review by regulatory agencies. To provide accurate information within a hypotheses-building environment, the content of the databases needs to be rigorously modeled using standards and controlled vocabulary. The utilitarian purposes of databases widely vary, ranging from a source for (Q)SAR datasets for modelers to a basis for

  15. Scale out databases for CERN use cases

    NASA Astrophysics Data System (ADS)

    Baranowski, Zbigniew; Grzybek, Maciej; Canali, Luca; Lanza Garcia, Daniel; Surdy, Kacper

    2015-12-01

    Data generation rates are expected to grow very fast for some database workloads going into LHC run 2 and beyond. In particular this is expected for data coming from controls, logging and monitoring systems. Storing, administering and accessing big data sets in a relational database system can quickly become a very hard technical challenge, as the size of the active data set and the number of concurrent users increase. Scale-out database technologies are a rapidly developing set of solutions for deploying and managing very large data warehouses on commodity hardware and with open source software. In this paper we will describe the architecture and tests on database systems based on Hadoop and the Cloudera Impala engine. We will discuss the results of our tests, including tests of data loading and integration with existing data sources and in particular with relational databases. We will report on query performance tests done with various data sets of interest at CERN, notably data from the accelerator log database.
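
    As a rough illustration of the kind of access pattern evaluated in the paper, the sketch below issues an aggregate query against an Impala engine from Python. It assumes the impyla client package, and the hostname, table, and column names are hypothetical rather than taken from the CERN systems described above.

```python
# Minimal sketch of querying a scale-out (Hadoop + Impala) store from Python.
# Assumes the impyla package; the host, table, and columns are hypothetical.
from impala.dbapi import connect

conn = connect(host="impala-gateway.example.org", port=21050)
cur = conn.cursor()

# Aggregate a large logging data set server-side instead of pulling raw rows.
cur.execute(
    """
    SELECT device_id, AVG(value) AS mean_value
    FROM accelerator_log
    WHERE ts BETWEEN '2015-01-01' AND '2015-01-31'
    GROUP BY device_id
    ORDER BY mean_value DESC
    LIMIT 10
    """
)
for device_id, mean_value in cur.fetchall():
    print(device_id, mean_value)
conn.close()
```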

  16. Intrinsic Radiation Source Generation with the ISC Package: Data Comparisons and Benchmarking

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Solomon, Clell J. Jr.

    The characterization of radioactive emissions from unstable isotopes (intrinsic radiation) is necessary for shielding and radiological-dose calculations from radioactive materials. While most radiation transport codes, e.g., MCNP [X-5 Monte Carlo Team, 2003], provide the capability to input user prescribed source definitions, such as radioactive emissions, they do not provide the capability to calculate the correct radioactive-source definition given the material compositions. Special modifications to MCNP have been developed in the past to allow the user to specify an intrinsic source, but these modifications have not been implemented into the primary source base [Estes et al., 1988]. To facilitate the description of the intrinsic radiation source from a material with a specific composition, the Intrinsic Source Constructor library (LIBISC) and MCNP Intrinsic Source Constructor (MISC) utility have been written. The combination of LIBISC and MISC will be herein referred to as the ISC package. LIBISC is a statically linkable C++ library that provides the necessary functionality to construct the intrinsic-radiation source generated by a material. Furthermore, LIBISC provides the ability to use different particle-emission databases, radioactive-decay databases, and natural-abundance databases, allowing the user flexibility in the specification of the source if one database is preferred over others. LIBISC also provides functionality for aging materials and producing a thick-target bremsstrahlung photon source approximation from the electron emissions. The MISC utility links to LIBISC and facilitates the description of intrinsic-radiation sources in a format directly usable with the MCNP transport code. Through a series of input keywords and arguments the MISC user can specify the material, age the material if desired, and produce a source description of the radioactive emissions from the material in an MCNP readable format. Further details of using the MISC utility can be obtained from the user guide [Solomon, 2012]. The remainder of this report presents a discussion of the databases available to LIBISC and MISC, a discussion of the models employed by LIBISC, a comparison of the thick-target bremsstrahlung model employed, a benchmark comparison to plutonium and depleted-uranium spheres, and a comparison of the available particle-emission databases.

  17. The database on transgenic luminescent microorganisms as an instrument of studying a microbial component of closed ecosystems

    NASA Astrophysics Data System (ADS)

    Boyandin, A. N.; Lankin, Y. P.; Kargatova, T. V.; Popova, L. Y.; Pechurkin, N. S.

    Luminescent transgenic microorganisms are widely used to study the functioning of microbial communities, including closed ones. Bioluminescence is highly sensitive to the effects of different environmental factors, and integration of lux genes into different metabolic pathways allows many aspects of microbial life to be studied, with measurements carried out in situ. Much information is available on applications of bioluminescent bacteria in different research areas, but to use these data effectively they must be summarized and accumulated in a common source. Therefore, an information system on the characteristics of transgenic microorganisms with cloned lux genes was created, and the database and associated client software were developed. The database structure includes information on the common characteristics of cloned lux genes, their sources and properties, on the regulation of gene expression in bacterial cells, and on the dependence of bioluminescence on biotic, abiotic, and anthropogenic environmental factors. The database can also store descriptions of changes in bacterial populations in response to environmental changes. It allows bibliographic information to be stored and used, along with links to the web sites of world collections of microorganisms. Internet publishing software has been developed to provide access to the database through the Internet.

  18. Pan European Phenological database (PEP725): a single point of access for European data

    NASA Astrophysics Data System (ADS)

    Templ, Barbara; Koch, Elisabeth; Bolmgren, Kjell; Ungersböck, Markus; Paul, Anita; Scheifinger, Helfried; Rutishauser, This; Busto, Montserrat; Chmielewski, Frank-M.; Hájková, Lenka; Hodzić, Sabina; Kaspar, Frank; Pietragalla, Barbara; Romero-Fresneda, Ramiro; Tolvanen, Anne; Vučetič, Višnja; Zimmermann, Kirsten; Zust, Ana

    2018-06-01

    The Pan European Phenology (PEP) project is a European infrastructure to promote and facilitate phenological research, education, and environmental monitoring. The main objective is to maintain and develop a Pan European Phenological database (PEP725) with open, unrestricted data access for science and education. PEP725 is the successor of the database developed through the COST action 725 "Establishing a European phenological data platform for climatological applications", working as a single access point for European-wide plant phenological data. So far, 32 European meteorological services and project partners from across Europe have joined and supplied data collected by volunteers from 1868 to the present for the PEP725 database. Most of the partners actively provide data on a regular basis. The database presently holds almost 12 million records, about 46 growing stages and 265 plant species (including cultivars), and can be accessed via http://www.pep725.eu/. Users of the PEP725 database have studied a diversity of topics ranging from climate change impacts, plant physiological questions, phenological modeling, and remote sensing of vegetation to ecosystem productivity.

  19. Sources and Trends of Nitrogen Loading to New England Estuaries

    EPA Science Inventory

    A database of nitrogen (N) loading components to estuaries of the conterminous United States has been developed through application of regional SPARROW models. The original SPARROW models predict average detrended loads by source based on average flow conditions and 2002 source t...

  20. Distribution System Upgrade Unit Cost Database

    DOE Data Explorer

    Horowitz, Kelsey

    2017-11-30

    This database contains unit cost information for different components that may be used to integrate distributed photovoltaic (D-PV) systems onto distribution systems. Some of these upgrades and costs may also apply to integration of other distributed energy resources (DER). Which components are required, and how many of each, is system-specific and should be determined by analyzing the effects of distributed PV at a given penetration level on the circuit of interest in combination with engineering assessments on the efficacy of different solutions to increase the ability of the circuit to host additional PV as desired. The current state of the distribution system should always be considered in these types of analysis. The data in this database was collected from a variety of utilities, PV developers, technology vendors, and published research reports. Where possible, we have included information on the source of each data point and relevant notes. In some cases where data provided is sensitive or proprietary, we were not able to specify the source, but provide other information that may be useful to the user (e.g. year, location where equipment was installed). NREL has carefully reviewed these sources prior to inclusion in this database. Additional information about the database, data sources, and assumptions is included in the "Unit_cost_database_guide.doc" file included in this submission. This guide provides important information on what costs are included in each entry. Please refer to this guide before using the unit cost database for any purpose.

  1. Databases and Networking for Development. The Organization of Information in Europe in the Field of Policy and Planning for Developing Countries.

    ERIC Educational Resources Information Center

    Lindsay, John

    This work suggests that better organization of existing sources of information available in Europe and better application of these sources to training can result in improved understanding of how information systems work, and it provides an annotated list of some of these sources. The guide opens with an introduction to public policy and urban…

  2. Databases for LDEF results

    NASA Technical Reports Server (NTRS)

    Bohnhoff-Hlavacek, Gail

    1992-01-01

    One of the objectives of the team supporting the LDEF Systems and Materials Special Investigative Groups is to develop databases of experimental findings. These databases identify the hardware flown, summarize results and conclusions, and provide a system for acknowledging investigators, tracing sources of data, and future design suggestions. To date, databases covering the optical experiments and thermal control materials (chromic acid anodized aluminum, silverized Teflon blankets, and paints) have been developed at Boeing. We used the Filemaker Pro software, the database manager for the Macintosh computer produced by the Claris Corporation. It is a flat, text-retrievable database that provides access to the data via an intuitive user interface, without tedious programming. Though this software is available only for the Macintosh computer at this time, copies of the databases can be saved to a format that is readable on a personal computer as well. Further, the data can be exported to more powerful relational databases. This paper describes the capabilities and use of the LDEF databases and explains how to get copies of the database for your own research.

  3. EPA’s SPECIATE 4.4 Database: Development and Uses

    EPA Science Inventory

    SPECIATE is the U.S. Environmental Protection Agency's (EPA) repository of volatile organic gas and particulate matter (PM) speciation profiles for air pollution sources. EPA released SPECIATE 4.4 in early 2014 and, in total, the SPECIATE 4.4 database includes 5,728 PM, VOC, total...

  5. Developing a Near Real-time System for Earthquake Slip Distribution Inversion

    NASA Astrophysics Data System (ADS)

    Zhao, Li; Hsieh, Ming-Che; Luo, Yan; Ji, Chen

    2016-04-01

    Advances in observational and computational seismology in the past two decades have enabled completely automatic and real-time determinations of the focal mechanisms of earthquake point sources. However, seismic radiations from moderate and large earthquakes often exhibit strong finite-source directivity effect, which is critically important for accurate ground motion estimations and earthquake damage assessments. Therefore, an effective procedure to determine earthquake rupture processes in near real-time is in high demand for hazard mitigation and risk assessment purposes. In this study, we develop an efficient waveform inversion approach for the purpose of solving for finite-fault models in 3D structure. Full slip distribution inversions are carried out based on the identified fault planes in the point-source solutions. To ensure efficiency in calculating 3D synthetics during slip distribution inversions, a database of strain Green tensors (SGT) is established for 3D structural model with realistic surface topography. The SGT database enables rapid calculations of accurate synthetic seismograms for waveform inversion on a regular desktop or even a laptop PC. We demonstrate our source inversion approach using two moderate earthquakes (Mw~6.0) in Taiwan and in mainland China. Our results show that 3D velocity model provides better waveform fitting with more spatially concentrated slip distributions. Our source inversion technique based on the SGT database is effective for semi-automatic, near real-time determinations of finite-source solutions for seismic hazard mitigation purposes.

  6. Dietary intake and main sources of plant lignans in five European countries

    PubMed Central

    Tetens, Inge; Turrini, Aida; Tapanainen, Heli; Christensen, Tue; Lampe, Johanna W.; Fagt, Sisse; Håkansson, Niclas; Lundquist, Annamari; Hallund, Jesper; Valsta, Liisa M.

    2013-01-01

    Background Dietary intakes of plant lignans have been hypothesized to be inversely associated with the risk of developing cardiovascular disease and cancer. Earlier studies were based on a Finnish lignan database (Fineli®) with two lignan precursors, secoisolariciresinol (SECO) and matairesinol (MAT). More recently, a Dutch database, including SECO and MAT and the newly recognized lignan precursors lariciresinol (LARI) and pinoresinol (PINO), was compiled. The objective was to re-estimate and re-evaluate plant lignan intakes and to identify the main sources of plant lignans in five European countries using the Finnish and Dutch lignan databases, respectively. Methods Forty-two food groups known to contribute to the total lignan intake were selected and attributed a value for SECO and MAT from the Finnish lignan database (Fineli®) or for SECO, MAT, LARI, and PINO from the Dutch database. Total intake of lignans was estimated from food consumption data for adult men and women (19–79 years) from Denmark, Finland, Italy, Sweden, and the United Kingdom, and the contribution of aggregated food groups was calculated using the Dutch lignan database. Results Mean dietary lignan intakes estimated using the Dutch database ranged from 1 to 2 mg/day, which was approximately four-fold higher than the intakes estimated from the Fineli® database. When LARI and PINO were included in the estimation of the total lignan intakes, cereals, grain products, vegetables, fruit and berries were the most important dietary sources of lignans. Conclusion Total lignan intake was approximately four-fold higher according to the Dutch lignan database, which includes the lignan precursors LARI and PINO, compared to estimates based on the Finnish database based only on SECO and MAT. The main sources of lignans according to the Dutch database in the five countries studied were cereals and grain products, vegetables, fruit, berries, and beverages. PMID:23766759

  7. The open-source movement: an introduction for forestry professionals

    Treesearch

    Patrick Proctor; Paul C. Van Deusen; Linda S. Heath; Jeffrey H. Gove

    2005-01-01

    In recent years, the open-source movement has yielded a generous and powerful suite of software and utilities that rivals those developed by many commercial software companies. Open-source programs are available for many scientific needs: operating systems, databases, statistical analysis, Geographic Information System applications, and object-oriented programming....

  8. An international database of radionuclide concentration ratios for wildlife: development and uses.

    PubMed

    Copplestone, D; Beresford, N A; Brown, J E; Yankovich, T

    2013-12-01

    A key element of most systems for assessing the impact of radionuclides on the environment is a means to estimate the transfer of radionuclides to organisms. To facilitate this, an international wildlife transfer database has been developed to provide an online, searchable compilation of transfer parameters in the form of equilibrium-based whole-organism to media concentration ratios. This paper describes the derivation of the wildlife transfer database, the key data sources it contains and highlights the applications for the data. Copyright © 2013 Elsevier Ltd. All rights reserved.
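
    For reference, the equilibrium concentration ratio compiled in such a database is commonly defined as the whole-organism activity concentration divided by the activity concentration in the reference medium (soil or water); in LaTeX notation:

```latex
\mathrm{CR} = \frac{C_{\text{organism}}}{C_{\text{medium}}}
% e.g. activity concentration in the whole organism (Bq kg^{-1} fresh weight)
% divided by the activity concentration in soil (Bq kg^{-1} dry weight) or water (Bq L^{-1})
```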

  9. ELISA-BASE: An Integrated Bioinformatics Tool for Analyzing and Tracking ELISA Microarray Data

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    White, Amanda M.; Collett, James L.; Seurynck-Servoss, Shannon L.

    ELISA-BASE is an open-source database for capturing, organizing and analyzing protein enzyme-linked immunosorbent assay (ELISA) microarray data. ELISA-BASE is an extension of the BioArray Software Environment (BASE) database system, which was developed for DNA microarrays. In order to make BASE suitable for protein microarray experiments, we developed several plugins for importing and analyzing quantitative ELISA microarray data. Most notably, our Protein Microarray Analysis Tool (ProMAT) for processing quantitative ELISA data is now available as a plugin to the database.

  10. US EPA's SPECIATE 4.4 Database: Development and Uses

    EPA Science Inventory

    SPECIATE is the U.S. Environmental Protection Agency’s (EPA) repository of volatile organic gas and particulate matter (PM) speciation profiles of air pollution sources. EPA released SPECIATE 4.4 in early 2014 and, in total, the SPECIATE 4.4 database includes 5,728 PM, volatile o...

  11. Development of a Consumer Product Ingredient Database for Chemical ExposureScreening and Prioritization

    EPA Science Inventory

    Consumer products are a primary source of chemical exposures, yet little structured information is available on the chemical ingredients of these products and the concentrations at which ingredients are present. To address this data gap, we created a database of chemicals in cons...

  12. A structured vocabulary for indexing dietary supplements in databases in the United States

    PubMed Central

    Saldanha, Leila G; Dwyer, Johanna T; Holden, Joanne M; Ireland, Jayne D.; Andrews, Karen W; Bailey, Regan L; Gahche, Jaime J.; Hardy, Constance J; Møller, Anders; Pilch, Susan M.; Roseland, Janet M

    2011-01-01

    Food composition databases are critical to assess and plan dietary intakes. Dietary supplement databases are also needed because dietary supplements make significant contributions to total nutrient intakes. However, no uniform system exists for classifying dietary supplement products and indexing their ingredients in such databases. Differing approaches to classifying these products make it difficult to retrieve or link information effectively. A consistent approach to classifying information within food composition databases led to the development of LanguaL™, a structured vocabulary. LanguaL™ is being adapted as an interface tool for classifying and retrieving product information in dietary supplement databases. This paper outlines proposed changes to the LanguaL™ thesaurus for indexing dietary supplement products and ingredients in databases. The choice of 12 of the original 14 LanguaL™ facets pertinent to dietary supplements, modifications to their scopes, and applications are described. The 12 chosen facets are: Product Type; Source; Part of Source; Physical State, Shape or Form; Ingredients; Preservation Method, Packing Medium, Container or Wrapping; Contact Surface; Consumer Group/Dietary Use/Label Claim; Geographic Places and Regions; and Adjunct Characteristics of food. PMID:22611303

  13. Method applied to the background analysis of energy data to be considered for the European Reference Life Cycle Database (ELCD).

    PubMed

    Fazio, Simone; Garraín, Daniel; Mathieux, Fabrice; De la Rúa, Cristina; Recchioni, Marco; Lechón, Yolanda

    2015-01-01

    Under the framework of the European Platform on Life Cycle Assessment, the European Reference Life-Cycle Database (ELCD - developed by the Joint Research Centre of the European Commission), provides core Life Cycle Inventory (LCI) data from front-running EU-level business associations and other sources. The ELCD contains energy-related data on power and fuels. This study describes the methods to be used for the quality analysis of energy data for European markets (available in third-party LC databases and from authoritative sources) that are, or could be, used in the context of the ELCD. The methodology was developed and tested on the energy datasets most relevant for the EU context, derived from GaBi (the reference database used to derive datasets for the ELCD), Ecoinvent, E3 and Gemis. The criteria for the database selection were based on the availability of EU-related data, the inclusion of comprehensive datasets on energy products and services, and the general approval of the LCA community. The proposed approach was based on the quality indicators developed within the International Reference Life Cycle Data System (ILCD) Handbook, further refined to facilitate their use in the analysis of energy systems. The overall Data Quality Rating (DQR) of the energy datasets can be calculated by summing up the quality rating (ranging from 1 to 5, where 1 represents very good, and 5 very poor quality) of each of the quality criteria indicators, divided by the total number of indicators considered. The quality of each dataset can be estimated for each indicator, and then compared with the different databases/sources. The results can be used to highlight the weaknesses of each dataset and can be used to guide further improvements to enhance the data quality with regard to the established criteria. This paper describes the application of the methodology to two exemplary datasets, in order to show the potential of the methodological approach. The analysis helps LCA practitioners to evaluate the usefulness of the ELCD datasets for their purposes, and dataset developers and reviewers to derive information that will help improve the overall DQR of databases.
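
    The overall Data Quality Rating described above can be written compactly: with n quality-criteria indicators each scored from 1 (very good) to 5 (very poor), the dataset's rating is their simple average, as stated in the abstract.

```latex
\mathrm{DQR} = \frac{1}{n}\sum_{i=1}^{n} r_i , \qquad r_i \in \{1,\dots,5\}
% Example: indicator ratings 2, 1, 3, 2 over n = 4 indicators give DQR = 8/4 = 2.
```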

  14. DOE technology information management system database study report

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Widing, M.A.; Blodgett, D.W.; Braun, M.D.

    1994-11-01

    To support the missions of the US Department of Energy (DOE) Special Technologies Program, Argonne National Laboratory is defining the requirements for an automated software system that will search electronic databases on technology. This report examines the work done and results to date. Argonne studied existing commercial and government sources of technology databases in five general areas: on-line services, patent database sources, government sources, aerospace technology sources, and general technology sources. First, it conducted a preliminary investigation of these sources to obtain information on the content, cost, frequency of updates, and other aspects of their databases. The Laboratory then performed detailed examinations of at least one source in each area. On this basis, Argonne recommended which databases should be incorporated in DOE's Technology Information Management System.

  15. Data dictionary and discussion for the midnite mine GIS database. Report of investigations/1996

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Peters, D.C.; Smith, M.A.; Ferderer, D.A.

    1996-01-18

    A geographic information system (GIS) database has been developed by the U.S. Bureau of Mines (USBM) for the Midnite Mine and surroundings in northeastern Washington State (Stevens County) on the Spokane Indian Reservation. The GIS database was compiled to serve as a repository and source of historical and research information on the mine site. The database also will be used by the Bureau of Land Management and the Bureau of Indian Affairs (as well as others) for environmental assessment and reclamation planning for future remediation and reclamation of the site. This report describes the data in the GIS database and their characteristics. The report also discusses known backgrounds on the data sets and any special considerations encountered by the USBM in developing the database.

  16. KaBOB: ontology-based semantic integration of biomedical databases.

    PubMed

    Livingston, Kevin M; Bada, Michael; Baumgartner, William A; Hunter, Lawrence E

    2015-04-23

    The ability to query many independent biological databases using a common ontology-based semantic model would facilitate deeper integration and more effective utilization of these diverse and rapidly growing resources. Despite ongoing work moving toward shared data formats and linked identifiers, significant problems persist in semantic data integration in order to establish shared identity and shared meaning across heterogeneous biomedical data sources. We present five processes for semantic data integration that, when applied collectively, solve seven key problems. These processes include making explicit the differences between biomedical concepts and database records, aggregating sets of identifiers denoting the same biomedical concepts across data sources, and using declaratively represented forward-chaining rules to take information that is variably represented in source databases and integrating it into a consistent biomedical representation. We demonstrate these processes and solutions by presenting KaBOB (the Knowledge Base Of Biomedicine), a knowledge base of semantically integrated data from 18 prominent biomedical databases using common representations grounded in Open Biomedical Ontologies. An instance of KaBOB with data about humans and seven major model organisms can be built using on the order of 500 million RDF triples. All source code for building KaBOB is available under an open-source license. KaBOB is an integrated knowledge base of biomedical data representationally based in prominent, actively maintained Open Biomedical Ontologies, thus enabling queries of the underlying data in terms of biomedical concepts (e.g., genes and gene products, interactions and processes) rather than features of source-specific data schemas or file formats. KaBOB resolves many of the issues that routinely plague biomedical researchers intending to work with data from multiple data sources and provides a platform for ongoing data integration and development and for formal reasoning over a wealth of integrated biomedical data.
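
    To give a feel for what "querying the underlying data in terms of biomedical concepts" looks like in practice, the sketch below loads an RDF graph and runs a SPARQL query with rdflib. The file name, prefixes, and IRIs are placeholders, not KaBOB's actual namespaces or schema.

```python
# Hedged sketch: concept-level SPARQL over an ontology-grounded RDF knowledge base.
# The data file and IRIs below are hypothetical placeholders, not KaBOB's own.
from rdflib import Graph

g = Graph()
g.parse("kabob_subset.ttl", format="turtle")   # a local extract of the knowledge base

query = """
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX ex:   <http://example.org/biomedicine/>

SELECT ?gene ?label
WHERE {
    ?gene a ex:Gene ;
          rdfs:label ?label ;
          ex:participatesIn ex:ApoptoticProcess .
}
LIMIT 10
"""

for gene, label in g.query(query):
    print(gene, label)
```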

  17. Atlas - a data warehouse for integrative bioinformatics.

    PubMed

    Shah, Sohrab P; Huang, Yong; Xu, Tao; Yuen, Macaire M S; Ling, John; Ouellette, B F Francis

    2005-02-21

    We present a biological data warehouse called Atlas that locally stores and integrates biological sequences, molecular interactions, homology information, functional annotations of genes, and biological ontologies. The goal of the system is to provide data, as well as a software infrastructure for bioinformatics research and development. The Atlas system is based on relational data models that we developed for each of the source data types. Data stored within these relational models are managed through Structured Query Language (SQL) calls that are implemented in a set of Application Programming Interfaces (APIs). The APIs include three languages: C++, Java, and Perl. The methods in these API libraries are used to construct a set of loader applications, which parse and load the source datasets into the Atlas database, and a set of toolbox applications which facilitate data retrieval. Atlas stores and integrates local instances of GenBank, RefSeq, UniProt, Human Protein Reference Database (HPRD), Biomolecular Interaction Network Database (BIND), Database of Interacting Proteins (DIP), Molecular Interactions Database (MINT), IntAct, NCBI Taxonomy, Gene Ontology (GO), Online Mendelian Inheritance in Man (OMIM), LocusLink, Entrez Gene and HomoloGene. The retrieval APIs and toolbox applications are critical components that offer end-users flexible, easy, integrated access to this data. We present use cases that use Atlas to integrate these sources for genome annotation, inference of molecular interactions across species, and gene-disease associations. The Atlas biological data warehouse serves as data infrastructure for bioinformatics research and development. It forms the backbone of the research activities in our laboratory and facilitates the integration of disparate, heterogeneous biological sources of data enabling new scientific inferences. Atlas achieves integration of diverse data sets at two levels. First, Atlas stores data of similar types using common data models, enforcing the relationships between data types. Second, integration is achieved through a combination of APIs, ontology, and tools. The Atlas software is freely available under the GNU General Public License at: http://bioinformatics.ubc.ca/atlas/

  18. Atlas – a data warehouse for integrative bioinformatics

    PubMed Central

    Shah, Sohrab P; Huang, Yong; Xu, Tao; Yuen, Macaire MS; Ling, John; Ouellette, BF Francis

    2005-01-01

    Background We present a biological data warehouse called Atlas that locally stores and integrates biological sequences, molecular interactions, homology information, functional annotations of genes, and biological ontologies. The goal of the system is to provide data, as well as a software infrastructure for bioinformatics research and development. Description The Atlas system is based on relational data models that we developed for each of the source data types. Data stored within these relational models are managed through Structured Query Language (SQL) calls that are implemented in a set of Application Programming Interfaces (APIs). The APIs include three languages: C++, Java, and Perl. The methods in these API libraries are used to construct a set of loader applications, which parse and load the source datasets into the Atlas database, and a set of toolbox applications which facilitate data retrieval. Atlas stores and integrates local instances of GenBank, RefSeq, UniProt, Human Protein Reference Database (HPRD), Biomolecular Interaction Network Database (BIND), Database of Interacting Proteins (DIP), Molecular Interactions Database (MINT), IntAct, NCBI Taxonomy, Gene Ontology (GO), Online Mendelian Inheritance in Man (OMIM), LocusLink, Entrez Gene and HomoloGene. The retrieval APIs and toolbox applications are critical components that offer end-users flexible, easy, integrated access to this data. We present use cases that use Atlas to integrate these sources for genome annotation, inference of molecular interactions across species, and gene-disease associations. Conclusion The Atlas biological data warehouse serves as data infrastructure for bioinformatics research and development. It forms the backbone of the research activities in our laboratory and facilitates the integration of disparate, heterogeneous biological sources of data enabling new scientific inferences. Atlas achieves integration of diverse data sets at two levels. First, Atlas stores data of similar types using common data models, enforcing the relationships between data types. Second, integration is achieved through a combination of APIs, ontology, and tools. The Atlas software is freely available under the GNU General Public License at: PMID:15723693

  19. Improved Dust Forecast Products for Southwest Asia Forecasters through Dust Source Database Advancements

    NASA Astrophysics Data System (ADS)

    Brooks, G. R.

    2011-12-01

    Dust storm forecasting is a critical part of military theater operations in Afghanistan and Iraq as well as other strategic areas of the globe. The Air Force Weather Agency (AFWA) has been using the Dust Transport Application (DTA) as a forecasting tool since 2001. Initially developed by The Johns Hopkins University Applied Physics Laboratory (JHUAPL), output products include dust concentration and reduction of visibility due to dust. The performance of the products depends on several factors including the underlying dust source database, treatment of soil moisture, parameterization of dust processes, and validity of the input atmospheric model data. Over many years of analysis, seasonal dust forecast biases of the DTA have been observed and documented. As these products are unique and indispensable for U.S. and NATO forces, amendments were required to provide the best forecasts possible. One of the quickest ways to scientifically address the dust concentration biases noted over time was to analyze the weaknesses in, and adjust, the dust source database. Dust source database strengths and weaknesses, the satellite analysis and adjustment process, and tests which confirmed the resulting improvements in the final dust concentration and visibility products will be shown.

  20. Great Basin paleontological database

    USGS Publications Warehouse

    Zhang, N.; Blodgett, R.B.; Hofstra, A.H.

    2008-01-01

    The U.S. Geological Survey has constructed a paleontological database for the Great Basin physiographic province that can be served over the World Wide Web for data entry, queries, displays, and retrievals. It is similar to the web-database solution that we constructed for Alaskan paleontological data (www.alaskafossil.org). The first phase of this effort was to compile a paleontological bibliography for Nevada and portions of adjacent states in the Great Basin, which has recently been completed. In addition, we are also compiling paleontological reports (known as E&R reports) of the U.S. Geological Survey, which are another extensive source of legacy data for this region. Initial population of the database benefited from a recently published conodont data set and is otherwise focused on Devonian and Mississippian localities because strata of this age host important sedimentary exhalative (sedex) Au, Zn, and barite resources and enormous Carlin-type Au deposits. In addition, these strata are the most important petroleum source rocks in the region, and record the transition from extension to contraction associated with the Antler orogeny, the Alamo meteorite impact, and biotic crises associated with global oceanic anoxic events. The finished product will provide an invaluable tool for future geologic mapping, paleontological research, and mineral resource investigations in the Great Basin, making paleontological data acquired over nearly the past 150 yr readily available over the World Wide Web. A description of the structure of the database and the web interface developed for this effort are provided herein. This database is being used as a model for a National Paleontological Database (which we are currently developing for the U.S. Geological Survey) as well as for other paleontological databases now being developed in other parts of the globe. © 2008 Geological Society of America.

  1. Sources of Free and Open Source Spatial Data for Natural Disasters and Principles for Use in Developing Country Contexts

    NASA Astrophysics Data System (ADS)

    Taylor, Faith E.; Malamud, Bruce D.; Millington, James D. A.

    2016-04-01

    Access to reliable spatial and quantitative datasets (e.g., infrastructure maps, historical observations, environmental variables) at regional and site specific scales can be a limiting factor for understanding hazards and risks in developing country settings. Here we present a 'living database' of >75 freely available data sources relevant to hazard and risk in Africa (and more globally). Data sources include national scientific foundations, non-governmental bodies, crowd-sourced efforts, academic projects, special interest groups and others. The database is available at http://tinyurl.com/africa-datasets and is continually being updated, particularly in the context of broader natural hazards research we are doing in the context of Malawi and Kenya. For each data source, we review the spatiotemporal resolution and extent and make our own assessments of reliability and usability of datasets. Although such freely available datasets are sometimes presented as a panacea to improving our understanding of hazards and risk in developing countries, there are both pitfalls and opportunities unique to using this type of freely available data. These include factors such as resolution, homogeneity, uncertainty, access to metadata and training for usage. Based on our experience, use in the field and grey/peer-review literature, we present a suggested set of guidelines for using these free and open source data in developing country contexts.

  2. U.S. Army Research Laboratory (ARL) multimodal signatures database

    NASA Astrophysics Data System (ADS)

    Bennett, Kelly

    2008-04-01

    The U.S. Army Research Laboratory (ARL) Multimodal Signatures Database (MMSDB) is a centralized collection of sensor data of various modalities that are co-located and co-registered. The signatures include ground and air vehicles, personnel, mortar, artillery, small arms gunfire from potential sniper weapons, explosives, and many other high value targets. This data is made available to Department of Defense (DoD) and DoD contractors, Intel agencies, other government agencies (OGA), and academia for use in developing target detection, tracking, and classification algorithms and systems to protect our Soldiers. A platform independent Web interface disseminates the signatures to researchers and engineers within the scientific community. Hierarchical Data Format 5 (HDF5) signature models provide an excellent solution for the sharing of complex multimodal signature data for algorithmic development and database requirements. Many open source tools for viewing and plotting HDF5 signatures are available over the Web. Seamless integration of HDF5 signatures is possible in both proprietary computational environments, such as MATLAB, and Free and Open Source Software (FOSS) computational environments, such as Octave and Python, for performing signal processing, analysis, and algorithm development. Future developments include extending the Web interface into a portal system for accessing ARL algorithms and signatures, High Performance Computing (HPC) resources, and integrating existing database and signature architectures into sensor networking environments.
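
    Because the signatures are distributed as HDF5 files readable from open-source environments, a few lines of Python suffice to inspect one; the file name, group layout, and dataset names below are hypothetical, since the actual MMSDB schema is not described in the abstract.

```python
# Hedged sketch: reading a multimodal signature stored in HDF5 with h5py.
# File, group, and dataset names are invented for illustration only.
import h5py

with h5py.File("example_signature.h5", "r") as f:
    f.visit(print)                            # list the groups/datasets in the file
    acoustic = f["acoustic/waveform"][...]    # load one modality as a NumPy array
    fs = f["acoustic"].attrs.get("sample_rate_hz")

print(acoustic.shape, fs)
```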

  3. SPECIATE Version 4.4 Database Development Documentation

    EPA Science Inventory

    SPECIATE is the U.S. Environmental Protection Agency’s (EPA) repository of volatile organic gas and particulate matter (PM) speciation profiles of air pollution sources. Some of the many uses of these source profiles include: (1) creating speciated emissions inventories for regi...

  4. SPECIATE 4.2: speciation Database Development Documentation

    EPA Science Inventory

    SPECIATE is the U.S. Environmental Protection Agency's (EPA) repository of volatile organic gas and particulate matter (PM) speciation profiles of air pollution sources. Among the many uses of speciation data, these source profiles are used to: (1) create speciated emissions inve...

  5. Charting a Path to Location Intelligence for STD Control.

    PubMed

    Gerber, Todd M; Du, Ping; Armstrong-Brown, Janelle; McNutt, Louise-Anne; Coles, F Bruce

    2009-01-01

    This article describes the New York State Department of Health's GeoDatabase project, which developed new methods and techniques for designing and building a geocoding and mapping data repository for sexually transmitted disease (STD) control. The GeoDatabase development was supported through the Centers for Disease Control and Prevention's Outcome Assessment through Systems of Integrated Surveillance workgroup. The design and operation of the GeoDatabase relied upon commercial-off-the-shelf tools that other public health programs may also use for disease-control systems. This article provides a blueprint of the structure and software used to build the GeoDatabase and integrate location data from multiple data sources into the everyday activities of STD control programs.

  6. Development of a consumer product ingredient database for chemical exposure screening and prioritization.

    PubMed

    Goldsmith, M-R; Grulke, C M; Brooks, R D; Transue, T R; Tan, Y M; Frame, A; Egeghy, P P; Edwards, R; Chang, D T; Tornero-Velez, R; Isaacs, K; Wang, A; Johnson, J; Holm, K; Reich, M; Mitchell, J; Vallero, D A; Phillips, L; Phillips, M; Wambaugh, J F; Judson, R S; Buckley, T J; Dary, C C

    2014-03-01

    Consumer products are a primary source of chemical exposures, yet little structured information is available on the chemical ingredients of these products and the concentrations at which ingredients are present. To address this data gap, we created a database of chemicals in consumer products using product Material Safety Data Sheets (MSDSs) publicly provided by a large retailer. The resulting database represents 1797 unique chemicals mapped to 8921 consumer products and a hierarchy of 353 consumer product "use categories" within a total of 15 top-level categories. We examine the utility of this database and discuss ways in which it will support (i) exposure screening and prioritization, (ii) generic or framework formulations for several indoor/consumer product exposure modeling initiatives, (iii) candidate chemical selection for monitoring near field exposure from proximal sources, and (iv) as activity tracers or ubiquitous exposure sources using "chemical space" map analyses. Chemicals present at high concentrations and across multiple consumer products and use categories that hold high exposure potential are identified. Our database is publicly available to serve regulators, retailers, manufacturers, and the public for predictive screening of chemicals in new and existing consumer products on the basis of exposure and risk. Published by Elsevier Ltd.

  7. Multi-source and ontology-based retrieval engine for maize mutant phenotypes

    PubMed Central

    Green, Jason M.; Harnsomburana, Jaturon; Schaeffer, Mary L.; Lawrence, Carolyn J.; Shyu, Chi-Ren

    2011-01-01

    Model Organism Databases, including the various plant genome databases, collect and enable access to massive amounts of heterogeneous information, including sequence data, gene product information, images of mutant phenotypes, etc., as well as textual descriptions of many of these entities. While a variety of basic browsing and search capabilities are available to allow researchers to query and peruse the names and attributes of phenotypic data, next-generation search mechanisms that allow querying and ranking of text descriptions are much less common. In addition, the plant community needs an innovative way to leverage the existing links in these databases to search groups of text descriptions simultaneously. Furthermore, though much time and effort have been afforded to the development of plant-related ontologies, the knowledge embedded in these ontologies remains largely unused in available plant search mechanisms. Addressing these issues, we have developed a unique search engine for mutant phenotypes from MaizeGDB. This advanced search mechanism integrates various text description sources in MaizeGDB to aid a user in retrieving desired mutant phenotype information. Currently, descriptions of mutant phenotypes, loci and gene products are utilized collectively for each search, though expansion of the search mechanism to include other sources is straightforward. The retrieval engine, to our knowledge, is the first engine to exploit the content and structure of available domain ontologies, currently the Plant and Gene Ontologies, to expand and enrich retrieval results in major plant genomic databases. Database URL: http://www.PhenomicsWorld.org/QBTA.php PMID:21558151

  8. Human health risk assessment database, "the NHSRC toxicity value database": supporting the risk assessment process at US EPA's National Homeland Security Research Center.

    PubMed

    Moudgal, Chandrika J; Garrahan, Kevin; Brady-Roberts, Eletha; Gavrelis, Naida; Arbogast, Michelle; Dun, Sarah

    2008-11-15

    The toxicity value database of the United States Environmental Protection Agency's (EPA) National Homeland Security Research Center has been in development since 2004. The toxicity value database includes a compilation of agent property, toxicity, dose-response, and health effects data for 96 agents: 84 chemical and radiological agents and 12 biotoxins. The database is populated with multiple toxicity benchmark values and agent property information from secondary sources, with web links to the secondary sources, where available. A selected set of primary literature citations and associated dose-response data are also included. The toxicity value database offers a powerful means to quickly and efficiently gather pertinent toxicity and dose-response data for a number of agents that are of concern to the nation's security. This database, in conjunction with other tools, will play an important role in understanding human health risks, and will provide a means for risk assessors and managers to make quick and informed decisions on the potential health risks and determine appropriate responses (e.g., cleanup) to agent release. A final, stand-alone MS Access working version of the toxicity value database was completed in November 2007.

  9. MODEL-BASED HYDROACOUSTIC BLOCKAGE ASSESSMENT AND DEVELOPMENT OF AN EXPLOSIVE SOURCE DATABASE

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Matzel, E; Ramirez, A; Harben, P

    2005-07-11

    We are continuing the development of the Hydroacoustic Blockage Assessment Tool (HABAT), which is designed for use by analysts to predict which hydroacoustic monitoring stations can be used in discrimination analysis for any particular event. The research involves two approaches: (1) model-based assessment of blockage, and (2) ground-truth data-based assessment of blockage. The tool presents the analyst with a map of the world and plots raypath blockages from stations to sources. The analyst inputs source locations and blockage criteria, and the tool returns a list of blockage status from all source locations to all hydroacoustic stations. We are currently using the tool in an assessment of blockage criteria for simple direct-path arrivals. Hydroacoustic data, predominantly from earthquake sources, are read in and assessed for blockage at all available stations. Several measures are taken. First, can the event be observed at a station above background noise? Second, can we establish a backazimuth from the station to the source? Third, how large is the decibel drop at one station relative to other stations? These observational results are then compared with model estimates to identify the best set of blockage criteria and used to create a set of blockage maps for each station. The model-based estimates are currently limited by the coarse bathymetry of existing databases and by the limitations inherent in the raytrace method. In collaboration with BBN Inc., the Hydroacoustic Coverage Assessment Model (HydroCAM), which generates the blockage files that serve as input to HABAT, is being extended to include high-resolution bathymetry databases in key areas that increase model-based blockage assessment reliability. An important aspect of this capability is to eventually include reflected T-phases where they reliably occur and to identify the associated reflectors. To assess how well any given hydroacoustic discriminant works in separating earthquake and in-water explosion populations, it is necessary to have both a database of reference earthquake events and a database of reference in-water explosive events. Although reference earthquake events are readily available, explosive reference events are not. Consequently, building an in-water explosion reference database requires the compilation of events from many sources spanning a long period of time. We have developed a database of small implosive and explosive reference events from the 2003 Indian Ocean Cruise data. These events were recorded at some or all of the IMS Indian Ocean hydroacoustic stations: Diego Garcia, Cape Leeuwin, and Crozet Island. We have also reviewed many historical large in-water explosions and identified five that have adequate source information and can be positively associated with the hydrophone recordings. The five events are: Cannikin, Longshot, CHASE-3, CHASE-5, and IITRI-1. Of these, the first two are nuclear tests on land but near water. The latter three are in-water conventional explosive events with yields from ten to hundreds of tons TNT equivalent. The objective of this research is to enhance discrimination capabilities for events located in the world's oceans. Two research and development efforts are needed to achieve this: (1) improvement in discrimination algorithms and their joint statistical application to events, and (2) development of an automated and accurate blockage prediction capability that will identify all stations and phases (direct and reflected) from a given event that will have adequate signal to be used in a discrimination analysis. The strategy for improving blockage prediction in the world's oceans is to improve model-based prediction of blockage and to develop a ground-truth database of reference events to assess blockage. Currently, research is focused on the development of a blockage assessment software tool. The tool is envisioned to develop into a sophisticated and unifying package that optimally and automatically assesses both model- and data-based blockage predictions in all ocean basins, for all NDC stations, and accounting for reflected phases (Pulli et al., 2000). Currently, we have focused our efforts on the Diego Garcia, Cape Leeuwin, and Crozet Island hydroacoustic stations in the Indian Ocean.
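    The three observational measures listed above (signal above background noise, a recoverable backazimuth, and the relative decibel drop) amount to a per-station blockage screen. Below is a toy sketch of such a screen; the thresholds and field names are invented for illustration and are not the HABAT criteria.

```python
# Toy sketch of a per-station blockage screen based on the three measures named
# above: signal above background noise, backazimuth recoverable, and relative
# level drop. Thresholds and field names are invented for illustration.
from dataclasses import dataclass

@dataclass
class StationObservation:
    station: str
    snr_db: float              # signal-to-noise ratio at the station
    backazimuth_ok: bool       # could a backazimuth to the source be formed?
    level_db: float            # received level, for cross-station comparison

def screen_blockage(observations, snr_min=6.0, max_drop_db=20.0):
    """Return {station: 'blocked' | 'unblocked'} for one event."""
    best_level = max(o.level_db for o in observations)
    status = {}
    for o in observations:
        unblocked = (o.snr_db >= snr_min
                     and o.backazimuth_ok
                     and (best_level - o.level_db) <= max_drop_db)
        status[o.station] = "unblocked" if unblocked else "blocked"
    return status

event = [StationObservation("Diego Garcia", 14.0, True, -35.0),
         StationObservation("Cape Leeuwin", 3.0, False, -60.0),
         StationObservation("Crozet Island", 9.0, True, -50.0)]
print(screen_blockage(event))
```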

  10. Development of SRS.php, a Simple Object Access Protocol-based library for data acquisition from integrated biological databases.

    PubMed

    Barbosa-Silva, A; Pafilis, E; Ortega, J M; Schneider, R

    2007-12-11

    Data integration has become an important task for biological database providers. The current model for data exchange among different sources simplifies the manner in which distinct information is accessed by users. The evolution of data representation from HTML to XML enabled programs, instead of humans, to interact with biological databases. We present here SRS.php, a PHP library that can interact with the data integration system Sequence Retrieval System (SRS). The library has been written using SOAP definitions and permits programmatic communication with SRS through web services. Interactions are performed by invoking the methods described in the WSDL and exchanging XML messages. The current functions available in the library have been built to access specific data stored in any of the 90 different databases (such as UNIPROT, KEGG and GO) using the same query syntax format. The inclusion of the described functions in the source of scripts written in PHP enables them to act as web service clients to the SRS server. The functions permit one to query the whole content of any SRS database, to list specific records in these databases, to get specific fields from the records, and to link any record between any pair of linked databases. The case study presented exemplifies the use of the library to retrieve information about registries of a Plant Defense Mechanisms database. The Plant Defense Mechanisms database is currently being developed, and SRS.php is proposed as the means of data acquisition for the warehousing tasks related to its setup and maintenance.
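    The SOAP/WSDL access pattern described in this abstract translates directly to other languages. Below is a minimal sketch using the Python `zeep` SOAP client; the WSDL URL, service method names, and query string are hypothetical placeholders standing in for the query/list/field-retrieval functions described above, not the actual SRS.php interface.

```python
# Minimal sketch of a SOAP/WSDL client for an SRS-style service.
# The WSDL URL, method names, and query syntax below are hypothetical
# placeholders illustrating the pattern described in the abstract.
from zeep import Client

wsdl_url = "http://example.org/srs/wsdl"  # hypothetical endpoint
client = Client(wsdl_url)

# Query a database using an SRS-style query string (illustrative only).
hits = client.service.queryDatabase(database="UNIPROT",
                                    query="[UNIPROT-Description: defensin*]")

# Retrieve selected fields for each matching record.
for record_id in hits:
    fields = client.service.getEntryFields(database="UNIPROT",
                                           entry=record_id,
                                           fields=["AccNumber", "Description"])
    print(record_id, fields)
```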

  11. Column Store for GWAC: A High-cadence, High-density, Large-scale Astronomical Light Curve Pipeline and Distributed Shared-nothing Database

    NASA Astrophysics Data System (ADS)

    Wan, Meng; Wu, Chao; Wang, Jing; Qiu, Yulei; Xin, Liping; Mullender, Sjoerd; Mühleisen, Hannes; Scheers, Bart; Zhang, Ying; Nes, Niels; Kersten, Martin; Huang, Yongpan; Deng, Jinsong; Wei, Jianyan

    2016-11-01

    The ground-based wide-angle camera array (GWAC), a part of the SVOM space mission, will search for various types of optical transients by continuously imaging a field of view (FOV) of 5,000 degrees² every 15 s. Each exposure consists of 36 × 4k × 4k pixels, typically resulting in 36 × ~175,600 extracted sources. For a modern time-domain astronomy project like GWAC, which produces massive amounts of data with a high cadence, it is challenging to search for short timescale transients in both real-time and archived data, and to build long-term light curves for variable sources. Here, we develop a high-cadence, high-density light curve pipeline (HCHDLP) to process the GWAC data in real-time, and design a distributed shared-nothing database to manage the massive amount of archived data which will be used to generate a source catalog with more than 100 billion records during 10 years of operation. First, we develop HCHDLP based on the column-store DBMS of MonetDB, taking advantage of MonetDB’s high performance when applied to massive data processing. To realize the real-time functionality of HCHDLP, we optimize the pipeline in its source association function, including both time and space complexity from outside the database (SQL semantic) and inside (RANGE-JOIN implementation), as well as in its strategy of building complex light curves. The optimized source association function is accelerated by three orders of magnitude. Second, we build a distributed database using a two-level time partitioning strategy via the MERGE TABLE and REMOTE TABLE technology of MonetDB. Intensive tests validate that our database architecture is able to achieve both linear scalability in response time and concurrent access by multiple users. In summary, our studies provide guidance for a solution to GWAC in real-time data processing and management of massive data.
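    The two-level partitioning described above relies on MonetDB's MERGE TABLE (partition union) and REMOTE TABLE (shared-nothing distribution) features. A rough sketch of how such a partitioned catalog might be assembled from Python with the `pymonetdb` driver follows; the table names, columns, and node addresses are illustrative assumptions, not the GWAC schema.

```python
# Sketch: building a two-level partitioned light-curve store in MonetDB
# using MERGE TABLE (partition union) and REMOTE TABLE (shared-nothing nodes).
# Table/column names and node URIs are assumptions for illustration.
import pymonetdb

conn = pymonetdb.connect(username="monetdb", password="monetdb",
                         hostname="head-node", database="gwac")
cur = conn.cursor()

# Parent table: a union of per-night partitions.
cur.execute("""
    CREATE MERGE TABLE lightcurve (
        source_id BIGINT, mjd DOUBLE, mag REAL, mag_err REAL
    )
""")

# A partition that physically lives on a worker node (distribution level).
cur.execute("""
    CREATE REMOTE TABLE lightcurve_20161101 (
        source_id BIGINT, mjd DOUBLE, mag REAL, mag_err REAL
    ) ON 'mapi:monetdb://worker-01:50000/gwac'
""")

# Attach the remote partition to the merge table (time-partitioning level).
cur.execute("ALTER TABLE lightcurve ADD TABLE lightcurve_20161101")
conn.commit()
```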

  12. Data Sources for Trait Databases: Comparing the Phenomic Content of Monographs and Evolutionary Matrices.

    PubMed

    Dececchi, T Alex; Mabee, Paula M; Blackburn, David C

    2016-01-01

    Databases of organismal traits that aggregate information from one or multiple sources can be leveraged for large-scale analyses in biology. Yet the differences among these data streams and how well they capture trait diversity have never been explored. We present the first analysis of the differences between phenotypes captured in free text of descriptive publications ('monographs') and those used in phylogenetic analyses ('matrices'). We focus our analysis on osteological phenotypes of the limbs of four extinct vertebrate taxa critical to our understanding of the fin-to-limb transition. We find that there is low overlap between the anatomical entities used in these two sources of phenotype data, indicating that phenotypes represented in matrices are not simply a subset of those found in monographic descriptions. Perhaps as expected, compared to characters found in matrices, phenotypes in monographs tend to emphasize descriptive and positional morphology, be somewhat more complex, and relate to fewer additional taxa. While based on a small set of focal taxa, these qualitative and quantitative data suggest that either source of phenotypes alone will result in incomplete knowledge of variation for a given taxon. As a broader community develops to use and expand databases characterizing organismal trait diversity, it is important to recognize the limitations of the data sources and develop strategies to more fully characterize variation both within species and across the tree of life.

  13. Data Sources for Trait Databases: Comparing the Phenomic Content of Monographs and Evolutionary Matrices

    PubMed Central

    Dececchi, T. Alex; Mabee, Paula M.; Blackburn, David C.

    2016-01-01

    Databases of organismal traits that aggregate information from one or multiple sources can be leveraged for large-scale analyses in biology. Yet the differences among these data streams and how well they capture trait diversity have never been explored. We present the first analysis of the differences between phenotypes captured in free text of descriptive publications (‘monographs’) and those used in phylogenetic analyses (‘matrices’). We focus our analysis on osteological phenotypes of the limbs of four extinct vertebrate taxa critical to our understanding of the fin-to-limb transition. We find that there is low overlap between the anatomical entities used in these two sources of phenotype data, indicating that phenotypes represented in matrices are not simply a subset of those found in monographic descriptions. Perhaps as expected, compared to characters found in matrices, phenotypes in monographs tend to emphasize descriptive and positional morphology, be somewhat more complex, and relate to fewer additional taxa. While based on a small set of focal taxa, these qualitative and quantitative data suggest that either source of phenotypes alone will result in incomplete knowledge of variation for a given taxon. As a broader community develops to use and expand databases characterizing organismal trait diversity, it is important to recognize the limitations of the data sources and develop strategies to more fully characterize variation both within species and across the tree of life. PMID:27191170

  14. Development of Databases on Iodine in Foods and Dietary Supplements

    PubMed Central

    Ershow, Abby G.; Skeaff, Sheila A.; Merkel, Joyce M.; Pehrsson, Pamela R.

    2018-01-01

    Iodine is an essential micronutrient required for normal growth and neurodevelopment; thus, an adequate intake of iodine is particularly important for pregnant and lactating women, and throughout childhood. Low levels of iodine in the soil and groundwater are common in many parts of the world, often leading to diets that are low in iodine. Widespread salt iodization has eradicated severe iodine deficiency, but mild-to-moderate deficiency is still prevalent even in many developed countries. To understand patterns of iodine intake and to develop strategies for improving intake, it is important to characterize all sources of dietary iodine, and national databases on the iodine content of major dietary contributors (including foods, beverages, water, salts, and supplements) provide a key information resource. This paper discusses the importance of well-constructed databases on the iodine content of foods, beverages, and dietary supplements; the availability of iodine databases worldwide; and factors related to variability in iodine content that should be considered when developing such databases. We also describe current efforts in iodine database development in the United States, the use of iodine composition data to develop food fortification policies in New Zealand, and how iodine content databases might be used when considering the iodine intake and status of individuals and populations. PMID:29342090

  15. IMPROVING EMISSIONS ESTIMATES WITH COMPUTATIONAL INTELLIGENCE, DATABASE EXPANSION, AND COMPREHENSIVE VALIDATION

    EPA Science Inventory

    The report discusses an EPA investigation of techniques to improve methods for estimating volatile organic compound (VOC) emissions from area sources. Using the automobile refinishing industry for a detailed area source case study, an emission estimation method is being developed...

  16. The Development and Uses of EPA's SPECIATE Database

    EPA Science Inventory

    SPECIATE is the U.S. Environmental Protection Agency's (EPA) repository of volatile organic compounds (VOC) and particulate matter (PM) speciation profiles of air pollution sources. These source profiles can be used to (l) provide input to chemical mass balance (CMB) receptor mod...

  17. International patent analysis of water source heat pump based on orbit database

    NASA Astrophysics Data System (ADS)

    Li, Na

    2018-02-01

    Using the Orbit database, this paper analysed the international patents of the water source heat pump (WSHP) industry with patent analysis methods such as analysis of publication tendency, geographical distribution, technology leaders and top assignees. It is found that the beginning of the 21st century was a period of rapid growth in WSHP patent applications. Germany and the United States carried out research and development on WSHP early on, but Japan and China have now become important countries for patent applications. China has been developing rapidly in recent years, but its patents are concentrated in universities and urgently need to be transferred. Through an objective analysis, this paper aims to provide appropriate decision references for the development of the domestic WSHP industry.

  18. Initiation of a Database of CEUS Ground Motions for NGA East

    NASA Astrophysics Data System (ADS)

    Cramer, C. H.

    2007-12-01

    The Nuclear Regulatory Commission has funded the first stage of development of a database of central and eastern US (CEUS) broadband and accelerograph records, along the lines of the existing Next Generation Attenuation (NGA) database for active tectonic areas. This database will form the foundation of an NGA East project for the development of CEUS ground-motion prediction equations that include the effects of soils. This initial effort covers the development of a database design and the beginning of data collection to populate the database. It also includes some processing for important source parameters (Brune corner frequency and stress drop) and site parameters (kappa, Vs30). Besides collecting appropriate earthquake recordings and information, existing information about site conditions at recording sites will also be gathered, including geology and geotechnical information. The long-range goal of the database development is to complete the database and make it available in 2010. The database design is centered on CEUS ground motion information needs but is built on the Pacific Earthquake Engineering Research Center's (PEER) NGA experience. Documentation from the PEER NGA website was reviewed and relevant fields incorporated into the CEUS database design. CEUS database tables include ones for earthquake, station, component, record, and references. As was done for NGA, a CEUS ground-motion flat file of key information will be extracted from the CEUS database for use in attenuation relation development. A short report on the CEUS database and several initial design-definition files are available at https://umdrive.memphis.edu:443/xythoswfs/webui/_xy-7843974_docstore1. Comments and suggestions on the database design can be sent to the author. More details will be presented in a poster at the meeting.
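    The abstract names the relational tables (earthquake, station, component, record, references) and the flat file derived from them. A hedged sketch of that structure is shown below using SQLite for brevity; the field names are illustrative guesses, not the actual CEUS/NGA East schema.

```python
# Sketch of the table structure named in the abstract (earthquake, station,
# component, record, reference); field names are illustrative guesses,
# not the actual CEUS/NGA East schema.
import sqlite3

con = sqlite3.connect("ceus_gm.db")
con.executescript("""
CREATE TABLE earthquake (eq_id INTEGER PRIMARY KEY, origin_time TEXT,
                         magnitude REAL, stress_drop REAL, corner_freq REAL);
CREATE TABLE station   (sta_id INTEGER PRIMARY KEY, code TEXT, vs30 REAL,
                        kappa REAL, site_geology TEXT);
CREATE TABLE component (comp_id INTEGER PRIMARY KEY,
                        sta_id INTEGER REFERENCES station, orientation TEXT);
CREATE TABLE record    (rec_id INTEGER PRIMARY KEY,
                        eq_id INTEGER REFERENCES earthquake,
                        comp_id INTEGER REFERENCES component,
                        pga REAL, file_path TEXT);
CREATE TABLE reference (ref_id INTEGER PRIMARY KEY, citation TEXT);
""")

# The "flat file" for attenuation-relation work is then a denormalized join.
flat = con.execute("""
    SELECT e.magnitude, s.vs30, r.pga
    FROM record r JOIN earthquake e USING (eq_id)
                  JOIN component c USING (comp_id)
                  JOIN station s  USING (sta_id)
""").fetchall()
```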

  19. Defense Against National Vulnerabilities in Public Data

    DTIC Science & Technology

    2017-02-28

    ingestion of subscription-based precision data sources (Business Intelligence Databases, Monster, others). ... Flexible data architecture that allows for ... Architecture Objective: Develop a data acquisition architecture that can successfully ingest 1,000,000 records per hour from up to 100 different open ... data sources. ... Developed and operate a data acquisition architecture comprised of the four following major components: ... Robust website ...

  20. NeMedPlant: a database of therapeutic applications and chemical constituents of medicinal plants from north-east region of India

    PubMed Central

    Meetei, Potshangbam Angamba; Singh, Pankaj; Nongdam, Potshangbam; Prabhu, N Prakash; Rathore, RS; Vindal, Vaibhav

    2012-01-01

    The North-East region of India is one of the twelve mega-biodiversity regions, containing many rare and endangered species. A curated database of medicinal and aromatic plants from the region, called NeMedPlant, has been developed. The database contains traditional, scientific and medicinal information about plants and their active constituents, obtained from scholarly literature and local sources. The database is cross-linked with major biochemical databases and analytical tools. The integrated database provides a resource for investigations into hitherto unexplored medicinal plants and serves to speed up the discovery of natural product-based drugs. Availability: The database is available for free at http://bif.uohyd.ac.in/nemedplant/ or http://202.41.85.11/nemedplant/ PMID:22419844

  1. Greenhouse Gas Mitigation Options Database and Tool - Data ...

    EPA Pesticide Factsheets

    Industry and electricity production facilities generate over 50 percent of greenhouse gas (GHG) emissions in the United States. There is a growing consensus among scientists that the primary cause of climate change is anthropogenic greenhouse gas (GHG) emissions. Reducing GHG emissions from these sources is a key part of the United States’ strategy to reduce the impacts of these global-warming emissions. As a result of the recent focus on GHG emissions, the U.S. Environmental Protection Agency (EPA) and state agencies are implementing policies and programs to quantify and regulate GHG emissions from key emitting sources in the United States. These policies and programs have generated a need for a reliable source of information regarding GHG mitigation options for both industry and regulators. In response to this need, EPA developed a comprehensive GHG mitigation options database (GMOD) that was compiled based on information from industry, government research agencies, and academia. The GMOD and Tool (GMODT) is a comprehensive data repository and analytical tool being developed by EPA to evaluate alternative GHG mitigation options for several high-emitting industry sectors, including electric power plants, cement plants, refineries, landfills and other industrial sources of GHGs. The data are collected from credible sources including peer-reviewed journals, reports, and other government and academic data sources, which include performance, applicability, develop...

  2. Desiderata for a Computer-Assisted Audit Tool for Clinical Data Source Verification Audits

    PubMed Central

    Duda, Stephany N.; Wehbe, Firas H.; Gadd, Cynthia S.

    2013-01-01

    Clinical data auditing often requires validating the contents of clinical research databases against source documents available in health care settings. Currently available data audit software, however, does not provide features necessary to compare the contents of such databases to source data in paper medical records. This work enumerates the primary weaknesses of using paper forms for clinical data audits and identifies the shortcomings of existing data audit software, as informed by the experiences of an audit team evaluating data quality for an international research consortium. The authors propose a set of attributes to guide the development of a computer-assisted clinical data audit tool to simplify and standardize the audit process. PMID:20841814

  3. Development of a database system for operational use in the selection of titanium alloys

    NASA Astrophysics Data System (ADS)

    Han, Yuan-Fei; Zeng, Wei-Dong; Sun, Yu; Zhao, Yong-Qing

    2011-08-01

    The selection of titanium alloys has become a complex decision-making task due to the growing number of titanium alloys being created and utilized, with each having its own characteristics, advantages, and limitations. In choosing the most appropriate titanium alloy, it is essential to offer a reasonable and intelligent service to technical engineers. One possible solution to this problem is to develop a database system (DS) to help retrieve rational proposals from different databases and information sources and analyze them to provide useful and explicit information. For this purpose, a design strategy based on fuzzy set theory is proposed, and a distributed database system is developed. Through ranking of the candidate titanium alloys, the most suitable material is determined. It is found that the selection results are in good agreement with the practical situation.
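    As a heavily simplified stand-in for the fuzzy-set ranking idea described above, the sketch below normalizes each property to a crude 0-1 "membership", weights it, and ranks the candidates. The properties, weights, and alloy values are invented examples; this is not the authors' fuzzy formulation.

```python
# Heavily simplified ranking sketch standing in for the fuzzy-set selection idea:
# normalize each property to [0, 1] (a crude "membership"), weight, and rank.
# Properties, weights, and values are invented examples, not real alloy data.
candidates = {
    "Alloy A": {"strength": 950, "density": 4.43, "cost": 30},
    "Alloy B": {"strength": 900, "density": 4.51, "cost": 22},
    "Alloy C": {"strength": 1050, "density": 4.62, "cost": 45},
}
weights = {"strength": 0.5, "density": 0.2, "cost": 0.3}
higher_is_better = {"strength": True, "density": False, "cost": False}

def normalize(prop, value):
    values = [c[prop] for c in candidates.values()]
    lo, hi = min(values), max(values)
    score = (value - lo) / (hi - lo) if hi > lo else 1.0
    return score if higher_is_better[prop] else 1.0 - score

ranking = sorted(
    ((sum(weights[p] * normalize(p, props[p]) for p in weights), name)
     for name, props in candidates.items()),
    reverse=True)
for score, name in ranking:
    print(f"{name}: {score:.2f}")
```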

  4. [Construction and application of special analysis database of geoherbs based on 3S technology].

    PubMed

    Guo, Lan-ping; Huang, Lu-qi; Lv, Dong-mei; Shao, Ai-juan; Wang, Jian

    2007-09-01

    In this paper, the structure, data sources, and data codes of "the spatial analysis database of geoherbs" based on 3S technology are introduced, and the essential functions of the database, such as data management, remote sensing, spatial interpolation, spatial statistics, spatial analysis and development, are described. Finally, two examples of database usage are given: one is the classification and calculation of the NDVI index of remote sensing images in the geoherbal area of Atractylodes lancea; the other is an adaptation analysis of A. lancea. These indicate that "the spatial analysis database of geoherbs" has bright prospects in the spatial analysis of geoherbs.

  5. Online Databases in Physics.

    ERIC Educational Resources Information Center

    Sievert, MaryEllen C.; Verbeck, Alison F.

    1984-01-01

    This overview of 47 online sources for physics information available in the United States--including sub-field databases, transdisciplinary databases, and multidisciplinary databases-- notes content, print source, language, time coverage, and databank. Two discipline-specific databases (SPIN and PHYSICS BRIEFS) are also discussed. (EJS)

  6. Feasibility and utility of applications of the common data model to multiple, disparate observational health databases

    PubMed Central

    Makadia, Rupa; Matcho, Amy; Ma, Qianli; Knoll, Chris; Schuemie, Martijn; DeFalco, Frank J; Londhe, Ajit; Zhu, Vivienne; Ryan, Patrick B

    2015-01-01

    Objectives To evaluate the utility of applying the Observational Medical Outcomes Partnership (OMOP) Common Data Model (CDM) across multiple observational databases within an organization and to apply standardized analytics tools for conducting observational research. Materials and methods Six deidentified patient-level datasets were transformed to the OMOP CDM. We evaluated the extent of information loss that occurred through the standardization process. We developed a standardized analytic tool to replicate the cohort construction process from a published epidemiology protocol and applied the analysis to all 6 databases to assess time-to-execution and comparability of results. Results Transformation to the CDM resulted in minimal information loss across all 6 databases. Patients and observations were excluded due to identified data quality issues in the source system; 96% to 99% of condition records and 90% to 99% of drug records were successfully mapped into the CDM using the standard vocabulary. The full cohort replication and descriptive baseline summary was executed for 2 cohorts in 6 databases in less than 1 hour. Discussion The standardization process improved data quality, increased efficiency, and facilitated cross-database comparisons to support a more systematic approach to observational research. Comparisons across data sources showed consistency in the impact of inclusion criteria using the protocol, and identified differences in patient characteristics and coding practices across databases. Conclusion Standardizing data structure (through a CDM), content (through a standard vocabulary with source code mappings), and analytics can enable an institution to apply a network-based approach to observational research across multiple, disparate observational health databases. PMID:25670757

  7. FCDD: A Database for Fruit Crops Diseases.

    PubMed

    Chauhan, Rupal; Jasrai, Yogesh; Pandya, Himanshu; Chaudhari, Suman; Samota, Chand Mal

    2014-01-01

    The Fruit Crops Diseases Database (FCDD) requires a number of biotechnology and bioinformatics tools. The FCDD is a unique bioinformatics resource that compiles detailed information on 162 fruit crop diseases, including disease type, causal organism, images, symptoms and their control. The FCDD contains 171 phytochemicals from 25 fruits, their 2D images and their 20 possible sequences. This information has been manually extracted and manually verified from numerous sources, including other electronic databases, textbooks and scientific journals. FCDD is fully searchable and supports extensive text search. The main focus of the FCDD is on providing possible information on fruit crop diseases, which will help in the discovery of potential drugs from one of the common bioresources: fruits. The database was developed using MySQL. The database interface is developed in PHP, HTML and JAVA. FCDD is freely available at http://www.fruitcropsdd.com/

  8. The Role of Free/Libre and Open Source Software in Learning Health Systems.

    PubMed

    Paton, C; Karopka, T

    2017-08-01

    Objective: To give an overview of the role of Free/Libre and Open Source Software (FLOSS) in the context of secondary use of patient data to enable Learning Health Systems (LHSs). Methods: We conducted an environmental scan of the academic and grey literature utilising the MedFLOSS database of open source systems in healthcare to inform a discussion of the role of open source in developing LHSs that reuse patient data for research and quality improvement. Results: A wide range of FLOSS is identified that contributes to the information technology (IT) infrastructure of LHSs including operating systems, databases, frameworks, interoperability software, and mobile and web apps. The recent literature around the development and use of key clinical data management tools is also reviewed. Conclusions: FLOSS already plays a critical role in modern health IT infrastructure for the collection, storage, and analysis of patient data. The nature of FLOSS systems to be collaborative, modular, and modifiable may make open source approaches appropriate for building the digital infrastructure for a LHS. Georg Thieme Verlag KG Stuttgart.

  9. NASA Cold Land Processes Experiment (CLPX 2002/03): Ground-based and near-surface meteorological observations

    Treesearch

    Kelly Elder; Don Cline; Angus Goodbody; Paul Houser; Glen E. Liston; Larry Mahrt; Nick Rutter

    2009-01-01

    A short-term meteorological database has been developed for the Cold Land Processes Experiment (CLPX). This database includes meteorological observations from stations designed and deployed exclusively for CLPX as well as observations available from other sources located in the small regional study area (SRSA) in north-central Colorado. The measured weather parameters...

  10. An Improved Model for Operational Specification of the Electron Density Structure up to Geosynchronous Heights

    DTIC Science & Technology

    2010-07-01

    http://www.iono.noa.gr/ElectronDensity/EDProfile.php The web service has been developed with the following open source tools: a) PHP, for the... MySQL for the database, which was based on the enhancement of the DIAS database. Below we present some screen shots to demonstrate the functionality

  11. Omics databases on kidney disease: where they can be found and how to benefit from them.

    PubMed

    Papadopoulos, Theofilos; Krochmal, Magdalena; Cisek, Katryna; Fernandes, Marco; Husi, Holger; Stevens, Robert; Bascands, Jean-Loup; Schanstra, Joost P; Klein, Julie

    2016-06-01

    In the recent decades, the evolution of omics technologies has led to advances in all biological fields, creating a demand for effective storage, management and exchange of rapidly generated data and research discoveries. To address this need, the development of databases of experimental outputs has become a common part of scientific practice in order to serve as knowledge sources and data-sharing platforms, providing information about genes, transcripts, proteins or metabolites. In this review, we present omics databases available currently, with a special focus on their application in kidney research and possibly in clinical practice. Databases are divided into two categories: general databases with a broad information scope and kidney-specific databases distinctively concentrated on kidney pathologies. In research, databases can be used as a rich source of information about pathophysiological mechanisms and molecular targets. In the future, databases will support clinicians with their decisions, providing better and faster diagnoses and setting the direction towards more preventive, personalized medicine. We also provide a test case demonstrating the potential of biological databases in comparing multi-omics datasets and generating new hypotheses to answer a critical and common diagnostic problem in nephrology practice. In the future, employment of databases combined with data integration and data mining should provide powerful insights into unlocking the mysteries of kidney disease, leading to a potential impact on pharmacological intervention and therapeutic disease management.

  12. NOAA Propagation Database Value in Tsunami Forecast Guidance

    NASA Astrophysics Data System (ADS)

    Eble, M. C.; Wright, L. M.

    2016-02-01

    The National Oceanic and Atmospheric Administration (NOAA) Center for Tsunami Research (NCTR) has developed a tsunami forecasting capability that combines a graphical user interface with data ingestion and numerical models to produce estimates of tsunami wave arrival times, amplitudes, current or water flow rates, and flooding at specific coastal communities. The capability integrates several key components: deep-ocean observations of tsunamis in real-time, a basin-wide pre-computed propagation database of water level and flow velocities based on potential pre-defined seismic unit sources, an inversion or fitting algorithm to refine the tsunami source based on the observations during an event, and tsunami forecast models. As tsunami waves propagate across the ocean, observations from the deep ocean are automatically ingested into the application in real-time to better define the source of the tsunami itself. Since passage of tsunami waves over a deep ocean reporting site is not immediate, we explore the value of the NOAA propagation database in providing placeholder forecasts in advance of deep ocean observations. The propagation database consists of water elevations and flow velocities pre-computed for 50 × 100 km unit sources in a continuous series along all known ocean subduction zones. The 2011 Japan Tohoku tsunami is presented as the case study.
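    The inversion step described above combines pre-computed unit-source propagation results linearly and fits that combination to deep-ocean observations. A minimal sketch of such a least-squares fit is given below; the arrays are synthetic placeholders standing in for the propagation database and an observed record, not NOAA data.

```python
# Sketch: fitting a tsunami source as a linear combination of pre-computed
# unit-source propagation results (least squares against a deep-ocean record).
# The arrays here are synthetic placeholders, not NOAA database values.
import numpy as np

n_times, n_unit_sources = 500, 4
rng = np.random.default_rng(0)

# G[:, j] = water-level time series at a deep-ocean site for unit source j,
# as stored in the pre-computed propagation database.
G = rng.normal(size=(n_times, n_unit_sources))

# Observed de-tided water level at the same site during an event.
true_weights = np.array([0.0, 2.5, 1.0, 0.0])
obs = G @ true_weights + 0.01 * rng.normal(size=n_times)

# Non-negative weights could be enforced with scipy.optimize.nnls;
# plain least squares is shown here for brevity.
weights, *_ = np.linalg.lstsq(G, obs, rcond=None)
print("estimated unit-source weights:", np.round(weights, 2))

# The forecast at any coastal site is then the same weighted combination of
# that site's pre-computed unit-source responses.
```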

  13. SPECIATE 4.3: Addendum to SPECIATE 4.2--Speciation database development documentation

    EPA Science Inventory

    SPECIATE is the U.S. Environmental Protection Agency's (EPA) repository of volatile organic gas and particulate matter (PM) speciation profiles of air pollution sources. Among the many uses of speciation data, these source profiles are used to: (1) create speciated emissions inve...

  14. ACToR Chemical Structure processing using Open Source ChemInformatics Libraries (FutureToxII)

    EPA Science Inventory

    ACToR (Aggregated Computational Toxicology Resource) is a centralized database repository developed by the National Center for Computational Toxicology (NCCT) at the U.S. Environmental Protection Agency (EPA). Free and open source tools were used to compile toxicity data from ove...

  15. A Methodology for the Development of a Reliability Database for an Advanced Reactor Probabilistic Risk Assessment

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Grabaskas, Dave; Brunett, Acacia J.; Bucknor, Matthew

    GE Hitachi Nuclear Energy (GEH) and Argonne National Laboratory are currently engaged in a joint effort to modernize and develop probabilistic risk assessment (PRA) techniques for advanced non-light water reactors. At a high level, the primary outcome of this project will be the development of next-generation PRA methodologies that will enable risk-informed prioritization of safety- and reliability-focused research and development, while also identifying gaps that may be resolved through additional research. A subset of this effort is the development of a reliability database (RDB) methodology to determine applicable reliability data for inclusion in the quantification of the PRA. The RDB method developed during this project seeks to satisfy the requirements of the Data Analysis element of the ASME/ANS Non-LWR PRA standard. The RDB methodology utilizes a relevancy test to examine reliability data and determine whether it is appropriate to include as part of the reliability database for the PRA. The relevancy test compares three component properties to establish the level of similarity to components examined as part of the PRA. These properties include the component function, the component failure modes, and the environment/boundary conditions of the component. The relevancy test is used to gauge the quality of data found in a variety of sources, such as advanced reactor-specific databases, non-advanced reactor nuclear databases, and non-nuclear databases. The RDB also establishes the integration of expert judgment or separate reliability analysis with past reliability data. This paper provides details on the RDB methodology, and includes an example application of the RDB methodology for determining the reliability of the intermediate heat exchanger of a sodium fast reactor. The example explores a variety of reliability data sources, and assesses their applicability for the PRA of interest through the use of the relevancy test.
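    The relevancy test described above compares a candidate data source against the PRA component on three properties (function, failure modes, environment/boundary conditions). A toy illustration of such a screening check is sketched below; the scoring rule and threshold are invented for illustration and are not the GEH/Argonne methodology.

```python
# Toy sketch of a three-property relevancy screen for reliability data,
# loosely following the properties named in the abstract. The equal weights
# shown here are invented for illustration only.
from dataclasses import dataclass

@dataclass
class ComponentProfile:
    function: str
    failure_modes: frozenset
    environment: str

def relevancy(candidate: ComponentProfile, target: ComponentProfile) -> float:
    """Return a 0-1 similarity used to decide whether candidate data enter the RDB."""
    function_match = 1.0 if candidate.function == target.function else 0.0
    if target.failure_modes:
        mode_overlap = len(candidate.failure_modes & target.failure_modes) / len(target.failure_modes)
    else:
        mode_overlap = 0.0
    env_match = 1.0 if candidate.environment == target.environment else 0.0
    return (function_match + mode_overlap + env_match) / 3.0

ihx_target = ComponentProfile("heat transfer", frozenset({"tube leak", "plugging"}), "sodium")
candidate = ComponentProfile("heat transfer", frozenset({"tube leak"}), "water/steam")
print("relevancy score:", round(relevancy(candidate, ihx_target), 2))  # 0.5
```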

  16. An open-source, mobile-friendly search engine for public medical knowledge.

    PubMed

    Samwald, Matthias; Hanbury, Allan

    2014-01-01

    The World Wide Web has become an important source of information for medical practitioners. To complement the capabilities of currently available web search engines we developed FindMeEvidence, an open-source, mobile-friendly medical search engine. In a preliminary evaluation, the quality of results from FindMeEvidence proved to be competitive with those from TRIP Database, an established, closed-source search engine for evidence-based medicine.

  17. Development of a Consumer Product Ingredient Database for ...

    EPA Pesticide Factsheets

    Consumer products are a primary source of chemical exposures, yet little structured information is available on the chemical ingredients of these products and the concentrations at which ingredients are present. To address this data gap, we created a database of chemicals in consumer products using product Material Safety Data Sheets (MSDSs) publicly provided by a large retailer. The resulting database represents 1797 unique chemicals mapped to 8921 consumer products and a hierarchy of 353 consumer product “use categories” within a total of 15 top-level categories. We examine the utility of this database and discuss ways in which it will support (i) exposure screening and prioritization, (ii) generic or framework formulations for several indoor/consumer product exposure modeling initiatives, (iii) candidate chemical selection for monitoring near field exposure from proximal sources, and (iv) as activity tracers or ubiquitous exposure sources using “chemical space” map analyses. Chemicals present at high concentrations and across multiple consumer products and use categories that hold high exposure potential are identified. Our database is publicly available to serve regulators, retailers, manufacturers, and the public for predictive screening of chemicals in new and existing consumer products on the basis of exposure and risk. The National Exposure Research Laboratory’s (NERL’s) Human Exposure and Atmospheric Sciences Division (HEASD) conducts resear

  18. Critically evaluated/distributed database of IRAS LRS spectra

    NASA Technical Reports Server (NTRS)

    Stencel, R. E.

    1993-01-01

    Accomplishments under this grant effort include: successful scientific utilization of the IRAS Low Resolution Spectrometer (LRS) database of over 150,000 scans of 7-23 micron spectra for over 50,000 celestial sources; publication in refereed journal of an additional 486 critically evaluated spectra of sources brighter than 20 Jy, completing the LRS ATLAS (Olnon and Raimond 1986 A&A) uniformly to that level, and production of an additional 1,830 critically evaluated spectra of sources brighter than 10 Jy; creation and maintenance of on-line, remotely accessible LRS spectra of over 7500 sources; cooperation with Astrophysics Data System personnel for transitioning this LRS database to the ADS access system after funding for this project expires; and publication of research highlights, which include a systematic variation of the shapes of LRS silicate features among stars of differing IRAS broad-band colors, maser characteristics and light curve asymmetries, all correlated with the chemical and physical development and processing of solid phase materials, and preliminary evidence for silicate profile variations in individual stars as a function of visual light curve phase.

  19. The Space Systems Environmental Test Facility Database (SSETFD), Website Development Status

    NASA Technical Reports Server (NTRS)

    Snyder, James M.

    2008-01-01

    The Aerospace Corporation has been developing a database of U.S. environmental test laboratory capabilities utilized by the space systems hardware development community. To date, 19 sites have been visited by The Aerospace Corporation and verbal agreements reached to include their capability descriptions in the database. A website is being developed to make this database accessible by all interested government, civil, university and industry personnel. The website will be accessible by all interested in learning more about the extensive collective capability that the US based space industry has to offer. The Environments, Test & Assessment Department within The Aerospace Corporation will be responsible for overall coordination and maintenance of the database. Several US government agencies are interested in utilizing this database to assist in the source selection process for future spacecraft programs. This paper introduces the website by providing an overview of its development, location and search capabilities. It will show how the aerospace community can apply this new tool as a way to increase the utilization of existing lab facilities, and as a starting point for capital expenditure/upgrade trade studies. The long term result is expected to be increased utilization of existing laboratory capability and reduced overall development cost of space systems hardware. Finally, the paper will present the process for adding new participants, and how the database will be maintained.

  20. Overview of Historical Earthquake Document Database in Japan and Future Development

    NASA Astrophysics Data System (ADS)

    Nishiyama, A.; Satake, K.

    2014-12-01

    In Japan, damage and disasters from historical large earthquakes have been documented and preserved. Compilation of historical earthquake documents started in the early 20th century, and 33 volumes of historical document source books (about 27,000 pages) have been published. However, these source books are not effectively utilized by researchers due to the contamination of low-reliability historical records and the difficulty of keyword searching by characters and dates. To overcome these problems and to promote historical earthquake studies in Japan, construction of a text database started in the 21st century. For historical earthquakes from the beginning of the 7th century to the early 17th century, the "Online Database of Historical Documents in Japanese Earthquakes and Eruptions in the Ancient and Medieval Ages" (Ishibashi, 2009) has already been constructed. They investigated the source books or original texts of historical literature, emended the descriptions, and assigned the reliability of each historical document on the basis of its written age. Another database compiled the historical documents for seven damaging earthquakes that occurred along the Sea of Japan coast in Honshu, central Japan, in the Edo period (from the beginning of the 17th century to the middle of the 19th century) and constructed a text database and a seismic intensity database. These are now published on the web (written only in Japanese). However, only about 9% of the earthquake source books have been digitized so far. Therefore, we plan to digitize all of the remaining historical documents under the research program which started in 2014. The specification of the database will be similar to the previous ones. We also plan to combine this database with a liquefaction traces database, which will be constructed by another research program, by adding the location information described in historical documents. The constructed database would be utilized to estimate the distributions of seismic intensities and tsunami heights.

  1. GMODWeb: a web framework for the generic model organism database

    PubMed Central

    O'Connor, Brian D; Day, Allen; Cain, Scott; Arnaiz, Olivier; Sperling, Linda; Stein, Lincoln D

    2008-01-01

    The Generic Model Organism Database (GMOD) initiative provides species-agnostic data models and software tools for representing curated model organism data. Here we describe GMODWeb, a GMOD project designed to speed the development of model organism database (MOD) websites. Sites created with GMODWeb provide integration with other GMOD tools and allow users to browse and search through a variety of data types. GMODWeb was built using the open source Turnkey web framework and is available from . PMID:18570664

  2. A Brief Assessment of LC2IEDM, MIST and Web Services for use in Naval Tactical Data Management

    DTIC Science & Technology

    2004-07-01

    server software, messaging between the client and server, and a database. The MIST database is implemented in an open source DBMS named PostgreSQL. ... PostgreSQL had its beginnings at the University of California, Berkeley, in 1986 [11]. The development of PostgreSQL has since evolved into a ... contact history from the database.

  3. Documentation for the U.S. Geological Survey Public-Supply Database (PSDB): A database of permitted public-supply wells, surface-water intakes, and systems in the United States

    USGS Publications Warehouse

    Price, Curtis V.; Maupin, Molly A.

    2014-01-01

    The purpose of this report is to document the PSDB and explain the methods used to populate and update the data from the SDWIS, State datasets, and map and geospatial imagery. This report describes 3 data tables and 11 domain tables, including field contents, data sources, and relations between tables. Although the PSDB database is not available to the general public, this information should be useful for others who are developing other database systems to store and analyze public-supply system and facility data.

  4. A comprehensive clinical research database based on CDISC ODM and i2b2.

    PubMed

    Meineke, Frank A; Stäubert, Sebastian; Löbe, Matthias; Winter, Alfred

    2014-01-01

    We present a working approach for a clinical research database as part of an archival information system. The CDISC ODM standard is target for clinical study and research relevant routine data, thus decoupling the data ingest process from the access layer. The presented research database is comprehensive as it covers annotating, mapping and curation of poorly annotated source data. Besides a conventional relational database the medical data warehouse i2b2 serves as main frontend for end-users. The system we developed is suitable to support patient recruitment, cohort identification and quality assurance in daily routine.

  5. EFFECT OF SAMPLING LOCATION ON CONCENTRATION IN LARGE CHAMBER INVESTIGATION OF EMISSIONS FROM MARKERS

    EPA Science Inventory

    Markers were selected for evaluation in this study because (1) they are widely used in schools, offices, and homes; (2) they are a known source of volatile organic compounds (VOCs) in nonoccupational indoor environments; and (3) according to the Source Ranking Database developed ...

  6. KID Project: an internet-based digital video atlas of capsule endoscopy for research purposes.

    PubMed

    Koulaouzidis, Anastasios; Iakovidis, Dimitris K; Yung, Diana E; Rondonotti, Emanuele; Kopylov, Uri; Plevris, John N; Toth, Ervin; Eliakim, Abraham; Wurm Johansson, Gabrielle; Marlicz, Wojciech; Mavrogenis, Georgios; Nemeth, Artur; Thorlacius, Henrik; Tontini, Gian Eugenio

    2017-06-01

    Capsule endoscopy (CE) has revolutionized small-bowel (SB) investigation. Computational methods can enhance diagnostic yield (DY); however, incorporating machine learning algorithms (MLAs) into CE reading is difficult as large amounts of image annotations are required for training. Current databases lack graphic annotations of pathologies and cannot be used. A novel database, KID, aims to provide a reference for research and development of medical decision support systems (MDSS) for CE. Open-source software was used for the KID database. Clinicians contribute anonymized, annotated CE images and videos. Graphic annotations are supported by an open-access annotation tool (Ratsnake). We detail an experiment based on the KID database, examining differences in SB lesion measurement between human readers and an MLA. The Jaccard Index (JI) was used to evaluate similarity between annotations by the MLA and human readers. The MLA performed best in measuring lymphangiectasias with a JI of 81 ± 6 %. The other lesion types were: angioectasias (JI 64 ± 11 %), aphthae (JI 64 ± 8 %), chylous cysts (JI 70 ± 14 %), polypoid lesions (JI 75 ± 21 %), and ulcers (JI 56 ± 9 %). An MLA can perform as well as human readers in the measurement of SB angioectasias in white light (WL). Automated lesion measurement is therefore feasible. KID is currently the only open-source CE database developed specifically to aid development of MDSS. Our experiment demonstrates this potential.
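    The Jaccard Index used above to compare machine and human annotations is the ratio of the intersection to the union of the annotated regions. A minimal pixel-mask version is sketched below; the masks are synthetic examples, not KID data.

```python
# Minimal Jaccard Index between two binary annotation masks, as used above to
# compare lesion annotations; the masks here are synthetic examples.
import numpy as np

def jaccard_index(mask_a: np.ndarray, mask_b: np.ndarray) -> float:
    intersection = np.logical_and(mask_a, mask_b).sum()
    union = np.logical_or(mask_a, mask_b).sum()
    return float(intersection) / union if union else 1.0

reader = np.zeros((100, 100), dtype=bool)
reader[20:60, 20:60] = True          # human reader's lesion outline
algorithm = np.zeros((100, 100), dtype=bool)
algorithm[25:65, 25:65] = True       # algorithm's lesion outline

print(f"JI = {100 * jaccard_index(reader, algorithm):.0f} %")
```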

  7. Biological data integration: wrapping data and tools.

    PubMed

    Lacroix, Zoé

    2002-06-01

    Nowadays, scientific data are inevitably digital and stored in a wide variety of formats in heterogeneous systems. Scientists need to access an integrated view of remote or local heterogeneous data sources with advanced data accessing, analyzing, and visualization tools. Building a digital library for scientific data requires accessing and manipulating data extracted from flat files or databases, documents retrieved from the Web as well as data generated by software. We present an approach to wrapping web data sources, databases, flat files, or data generated by tools through a database view mechanism. Generally, a wrapper has two tasks: it first sends a query to the source to retrieve data and, second, builds the expected output with respect to the virtual structure. Our wrappers are composed of a retrieval component based on an intermediate object view mechanism called search views, mapping the source capabilities to attributes, and an eXtensible Markup Language (XML) engine, respectively, to perform these two tasks. The originality of the approach consists of: 1) a generic view mechanism to access seamlessly data sources with limited capabilities and 2) the ability to wrap data sources as well as the useful specific tools they may provide. Our approach has been developed and demonstrated as part of the multidatabase system supporting queries via uniform object protocol model (OPM) interfaces.
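    The wrapper's two tasks described above (querying the source, then rebuilding the result to match a virtual structure) can be sketched as a small function. The source URL, attribute-to-parameter mapping, and output XML below are invented placeholders, not the authors' OPM-based implementation.

```python
# Sketch of the two-step wrapper pattern described above: (1) query the source,
# (2) rebuild the result with respect to a virtual (here, simple XML) structure.
# The source URL, attribute mapping, and output schema are illustrative only.
import urllib.parse
import urllib.request
import xml.etree.ElementTree as ET

SEARCH_VIEW = {"gene_symbol": "q", "organism": "species"}  # attribute -> source parameter

def wrap_query(attributes: dict) -> str:
    # Task 1: translate virtual attributes into the source's own query parameters.
    params = {SEARCH_VIEW[k]: v for k, v in attributes.items()}
    url = "http://example.org/source/search?" + urllib.parse.urlencode(params)
    with urllib.request.urlopen(url) as resp:      # retrieve raw data
        raw_lines = resp.read().decode().splitlines()

    # Task 2: rebuild the output with respect to the virtual structure.
    root = ET.Element("records")
    for line in raw_lines:
        rec = ET.SubElement(root, "record")
        rec.text = line
    return ET.tostring(root, encoding="unicode")
```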

  8. Can the Millennium Development Goals database be used to measure the effects of globalisation on women's health in Sub-Saharan Africa? A critical analysis.

    PubMed

    Wamala, Sarah; Breman, Anna; Richardson, Matt X; Loewenson, Rene

    2010-03-01

    Africa has had poor returns from integration with world markets in globalisation, has experienced worsening poverty and malnutrition and has high burdens of HIV and communicable disease, with particular burdens on women. It is therefore essential to describe the impact of globalisation on women's health. Indicators such as the Millennium Development Goals (MDGs) are presented as having a major role in measuring this impact, but an assessment of the adequacy of aggregate national indicators used in monitoring the MDGs for this purpose is lacking. The Millennium Development Goals' panel database 2000 to 2006 was used to investigate the association between globalisation and women's health in Sub-Saharan Africa based on various determinants of heath. Out of the 148 countries classified as developing countries, 48 were in Sub-Saharan Africa. Results suggest that developing countries are becoming more integrated with world markets through some lowering of trade barriers. At the same time, women's occupational roles are changing, which could affect their health status. However, it is difficult to measure the impact of globalisation on women's health from the MDG database. First, data on trade liberalization is aggregated at the regional level and does not hold any information on individual countries. Second, too few indicators in the MDG database are disaggregated by sex, making it difficult to separate the effects on women from those on men. The MDG database is not adequate to assess the effects of globalisation on women's health in Sub-Saharan Africa. We recommend that researchers aim to address this research question to find other data sources or turn to case studies. We hope that results from this study will stimulate research on globalisation and health using reliable sources.

  9. Development of New Jersey rates for the NJCMS incident delay model.

    DOT National Transportation Integrated Search

    2012-09-01

    This study developed a working database for calculating incident rates and related delay measures, which contains incident related data collected from various data sources, such as the New Jersey Department of Transportation (NJDOT) Crash Records, Tr...

  10. The NASA Goddard Group's Source Monitoring Database and Program

    NASA Astrophysics Data System (ADS)

    Gipson, John; Le Bail, Karine; Ma, Chopo

    2014-12-01

    Beginning in 2003, the Goddard VLBI group developed a program to purposefully monitor when sources were observed and to increase the observations of "under-observed" sources. The heart of the program consists of a MySQL database that keeps track of, on a session-by-session basis: the number of observations that are scheduled for a source, the number of observations that are successfully correlated, and the number of observations that are used in a session. In addition, there is a table that contains the target number of successful sessions over the last twelve months. Initially this table just contained two categories. Sources in the geodetic catalog had a target of 12 sessions/year; the remaining ICRF-1 defining sources had a target of two sessions/year. All other sources did not have a specific target. As the program evolved, different kinds of sources with different observing targets were added. During the scheduling process, the scheduler has the option of automatically selecting N sources which have not met their target. We discuss the history and present some results of this successful program.
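    The abstract describes a MySQL database tracking scheduled, correlated, and used observations per source per session, plus a yearly-target table from which the scheduler pulls N under-target sources. A hedged sketch of such a query is shown below; the table and column names are guesses for illustration, not the Goddard schema.

```python
# Sketch: select N sources that have not met their 12-month observing target.
# Table and column names are illustrative guesses, not the actual Goddard schema.
import mysql.connector

conn = mysql.connector.connect(host="localhost", user="vlbi",
                               password="secret", database="source_monitor")
cur = conn.cursor()

N = 10
cur.execute("""
    SELECT t.source_name,
           t.target_sessions,
           COUNT(DISTINCT o.session_id) AS successful_sessions
    FROM   target t
    LEFT JOIN observation o
           ON o.source_name = t.source_name
          AND o.used_obs > 0
          AND o.session_date >= DATE_SUB(CURDATE(), INTERVAL 12 MONTH)
    GROUP  BY t.source_name, t.target_sessions
    HAVING successful_sessions < t.target_sessions
    ORDER  BY successful_sessions
    LIMIT  %s
""", (N,))

for source, target, done in cur.fetchall():
    print(f"{source}: {done}/{target} sessions in the last year")
```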

  11. Development of a global land cover characteristics database and IGBP DISCover from 1 km AVHRR data

    USGS Publications Warehouse

    Loveland, Thomas R.; Reed, B.C.; Brown, Jesslyn F.; Ohlen, D.O.; Zhu, Z.; Yang, L.; Merchant, J.W.

    2000-01-01

    Researchers from the U.S. Geological Survey, University of Nebraska-Lincoln and the European Commission's Joint Research Centre, Ispra, Italy produced a 1 km resolution global land cover characteristics database for use in a wide range of continental- to global-scale environmental studies. This database provides a unique view of the broad patterns of the biogeographical and ecoclimatic diversity of the global land surface, and presents a detailed interpretation of the extent of human development. The project was carried out as an International Geosphere-Biosphere Programme, Data and Information Systems (IGBP-DIS) initiative. The IGBP DISCover global land cover product is an integral component of the global land cover database. DISCover includes 17 general land cover classes defined to meet the needs of IGBP core science projects. A formal accuracy assessment of the DISCover data layer will be completed in 1998. The 1 km global land cover database was developed through a continent-by-continent unsupervised classification of 1 km monthly Advanced Very High Resolution Radiometer (AVHRR) Normalized Difference Vegetation Index (NDVI) composites covering 1992-1993. Extensive post-classification stratification was necessary to resolve spectral/temporal confusion between disparate land cover types. The complete global database consists of 961 seasonal land cover regions that capture patterns of land cover, seasonality and relative primary productivity. The seasonal land cover regions were aggregated to produce seven separate land cover data sets used for global environmental modelling and assessment. The data sets include IGBP DISCover, U.S. Geological Survey Anderson System, Simple Biosphere Model, Simple Biosphere Model 2, Biosphere-Atmosphere Transfer Scheme, Olson Ecosystems and Running Global Remote Sensing Land Cover. The database also includes all digital sources that were used in the classification. The complete database can be sourced from the website: http://edcwww.cr.usgs.gov/landdaac/glcc/glcc.html.
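    The database described above was built by unsupervised classification of monthly NDVI composites followed by post-classification stratification. A minimal sketch of that style of clustering (k-means on per-pixel 12-month NDVI profiles) is shown below with synthetic data, as a generic illustration rather than the USGS/IGBP processing chain.

```python
# Minimal sketch of unsupervised classification of monthly NDVI composites:
# k-means clustering of per-pixel 12-month NDVI profiles. Synthetic data only;
# this is a generic illustration, not the USGS/IGBP processing chain.
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(42)
n_pixels, n_months = 10_000, 12

# Fake NDVI seasonal profiles: mixture of "evergreen" and "strongly seasonal" pixels.
seasonal = 0.3 + 0.3 * np.sin(np.linspace(0, 2 * np.pi, n_months))
evergreen = np.full(n_months, 0.7)
labels_true = rng.integers(0, 2, n_pixels)
profiles = np.where(labels_true[:, None] == 0, seasonal, evergreen)
profiles = profiles + 0.05 * rng.normal(size=(n_pixels, n_months))

# Unsupervised classification into candidate seasonal land cover regions.
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(profiles)
print("pixels per cluster:", np.bincount(kmeans.labels_))
# Post-classification stratification / labelling with ancillary data would follow.
```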

  12. Nuclear Science References Database

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Pritychenko, B., E-mail: pritychenko@bnl.gov; Běták, E.; Singh, B.

    2014-06-15

    The Nuclear Science References (NSR) database, together with its associated Web interface, is the world's only comprehensive source of easily accessible low- and intermediate-energy nuclear physics bibliographic information for more than 210,000 articles since the beginning of nuclear science. The weekly-updated NSR database provides essential support for nuclear data evaluation, compilation and research activities. The principles of the database and Web application development and maintenance are described. Examples of nuclear structure, reaction and decay applications are specifically included. The complete NSR database is freely available at the websites of the National Nuclear Data Center (http://www.nndc.bnl.gov/nsr) and the International Atomic Energy Agency (http://www-nds.iaea.org/nsr).

  13. The Ensembl genome database project.

    PubMed

    Hubbard, T; Barker, D; Birney, E; Cameron, G; Chen, Y; Clark, L; Cox, T; Cuff, J; Curwen, V; Down, T; Durbin, R; Eyras, E; Gilbert, J; Hammond, M; Huminiecki, L; Kasprzyk, A; Lehvaslaiho, H; Lijnzaad, P; Melsopp, C; Mongin, E; Pettett, R; Pocock, M; Potter, S; Rust, A; Schmidt, E; Searle, S; Slater, G; Smith, J; Spooner, W; Stabenau, A; Stalker, J; Stupka, E; Ureta-Vidal, A; Vastrik, I; Clamp, M

    2002-01-01

    The Ensembl (http://www.ensembl.org/) database project provides a bioinformatics framework to organise biology around the sequences of large genomes. It is a comprehensive source of stable automatic annotation of the human genome sequence, with confirmed gene predictions that have been integrated with external data sources, and is available as either an interactive web site or as flat files. It is also an open source software engineering project to develop a portable system able to handle very large genomes and associated requirements from sequence analysis to data storage and visualisation. The Ensembl site is one of the leading sources of human genome sequence annotation and provided much of the analysis for publication by the international human genome project of the draft genome. The Ensembl system is being installed around the world in both companies and academic sites on machines ranging from supercomputers to laptops.

  14. Implementing the EuroFIR Document and Data Repositories as accessible resources of food composition information.

    PubMed

    Unwin, Ian; Jansen-van der Vliet, Martine; Westenbrink, Susanne; Presser, Karl; Infanger, Esther; Porubska, Janka; Roe, Mark; Finglas, Paul

    2016-02-15

    The EuroFIR Document and Data Repositories are being developed as accessible collections of source documents, including grey literature, and the food composition data reported in them. These Repositories will contain source information available to food composition database compilers when selecting their nutritional data. The Document Repository was implemented as searchable bibliographic records in the Europe PubMed Central database, which links to the documents online. The Data Repository will contain original data from source documents in the Document Repository. Testing confirmed the FoodCASE food database management system as a suitable tool for the input, documentation and quality assessment of Data Repository information. Data management requirements for the input and documentation of reported analytical results were established, including record identification and method documentation specifications. Document access and data preparation using the Repositories will provide information resources for compilers, eliminating duplicated work and supporting unambiguous referencing of data contributing to their compiled data. Copyright © 2014 Elsevier Ltd. All rights reserved.

  15. Comet: an open-source MS/MS sequence database search tool.

    PubMed

    Eng, Jimmy K; Jahan, Tahmina A; Hoopmann, Michael R

    2013-01-01

    Proteomics research routinely involves identifying peptides and proteins via MS/MS sequence database search. Thus the database search engine is an integral tool in many proteomics research groups. Here, we introduce the Comet search engine to the existing landscape of commercial and open-source database search tools. Comet is open source, freely available, and based on one of the original sequence database search tools that has been widely used for many years. © 2012 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.

  16. Feasibility and utility of applications of the common data model to multiple, disparate observational health databases.

    PubMed

    Voss, Erica A; Makadia, Rupa; Matcho, Amy; Ma, Qianli; Knoll, Chris; Schuemie, Martijn; DeFalco, Frank J; Londhe, Ajit; Zhu, Vivienne; Ryan, Patrick B

    2015-05-01

    To evaluate the utility of applying the Observational Medical Outcomes Partnership (OMOP) Common Data Model (CDM) across multiple observational databases within an organization and to apply standardized analytics tools for conducting observational research. Six deidentified patient-level datasets were transformed to the OMOP CDM. We evaluated the extent of information loss that occurred through the standardization process. We developed a standardized analytic tool to replicate the cohort construction process from a published epidemiology protocol and applied the analysis to all 6 databases to assess time-to-execution and comparability of results. Transformation to the CDM resulted in minimal information loss across all 6 databases. The patients and observations that were excluded were due to data quality issues identified in the source systems; 96% to 99% of condition records and 90% to 99% of drug records were successfully mapped into the CDM using the standard vocabulary. The full cohort replication and descriptive baseline summary was executed for 2 cohorts in 6 databases in less than 1 hour. The standardization process improved data quality, increased efficiency, and facilitated cross-database comparisons to support a more systematic approach to observational research. Comparisons across data sources showed consistency in the impact of the protocol's inclusion criteria and identified differences in patient characteristics and coding practices across databases. Standardizing data structure (through a CDM), content (through a standard vocabulary with source code mappings), and analytics can enable an institution to apply a network-based approach to observational research across multiple, disparate observational health databases. © The Author 2015. Published by Oxford University Press on behalf of the American Medical Informatics Association.
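
    The code-mapping step described above, in which source condition and drug codes are translated into standard vocabulary concepts, can be illustrated with a minimal sketch. This is not the OMOP tooling itself; the concept map, vocabulary names and record layout below are hypothetical and only show how the share of successfully mapped records might be computed.

        from collections import Counter

        # Hypothetical lookup from (vocabulary, source code) to a standard concept_id.
        concept_map = {
            ("ICD9CM", "250.00"): 201826,
            ("ICD9CM", "401.9"): 320128,
        }

        def map_record(vocabulary, source_code):
            """Return the standard concept_id, or 0 when no mapping exists."""
            return concept_map.get((vocabulary, source_code), 0)

        source_records = [("ICD9CM", "250.00"), ("ICD9CM", "401.9"), ("ICD9CM", "999.99")]
        mapped = Counter(map_record(v, c) != 0 for v, c in source_records)
        print(f"mapped: {100 * mapped[True] / len(source_records):.1f}%")  # mapped: 66.7%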

  17. FishTraits: a database of ecological and life-history traits of freshwater fishes of the United States

    USGS Publications Warehouse

    Angermeier, Paul L.; Frimpong, Emmanuel A.

    2011-01-01

    The need for integrated and widely accessible sources of species traits data to facilitate studies of ecology, conservation, and management has motivated development of traits databases for various taxa. In spite of the increasing number of traits-based analyses of freshwater fishes in the United States, no consolidated database of traits of this group exists publicly, and much useful information on these species is documented only in obscure sources. The largely inaccessible and unconsolidated traits information makes large-scale analysis involving many fishes and/or traits particularly challenging. We have compiled a database of > 100 traits for 809 (731 native and 78 nonnative) fish species found in freshwaters of the conterminous United States, including 37 native families and 145 native genera. The database, named FishTraits, contains information on four major categories of traits: (1) trophic ecology; (2) body size, reproductive ecology, and life history; (3) habitat preferences; and (4) salinity and temperature tolerances. Information on geographic distribution and conservation status was also compiled. The database opens up many opportunities for conducting research on fish species traits and constitutes the first step toward establishing a central repository for a continually expanding set of traits of North American fishes.

  18. The development of a composition database of gluten-free products.

    PubMed

    Mazzeo, Teresa; Cauzzi, Silvia; Brighenti, Furio; Pellegrini, Nicoletta

    2015-06-01

    To develop a composition database of a number of foods representative of different categories of gluten-free products in the Italian diet. The database was built using the nutritional composition of the products, taking into consideration both the composition of the ingredients and the nutritional information reported on the product label. The nutrient composition of each ingredient was obtained from two Italian databases (European Institute of Oncology and the National Institute for Food and Nutrition). The study developed a food composition database including a total of sixty foods representative of different categories of gluten-free products sold on the Italian market. The composition of the products included in the database is given in terms of quantity of macro- and micronutrients per 100 g of product as sold, and includes the full range of nutrient data present in traditional databases of gluten-containing foods. As expected, most of the products had a high content of carbohydrates, and some of them can be labelled as a source of fibre (>3 g/100 g). Regarding micronutrients, among the products considered, breads, pizzas and snacks had especially high Na contents (>400-500 mg/100 g). This database provides an initial useful tool for future nutritional surveys on the dietary habits of coeliac people.

  19. Conversion of environmental data to a digital-spatial database, Puget Sound area, Washington

    USGS Publications Warehouse

    Uhrich, M.A.; McGrath, T.S.

    1997-01-01

    Data and maps from the Puget Sound Environmental Atlas, compiled for the U.S. Environmental Protection Agency, the Puget Sound Water Quality Authority, and the U.S. Army Corps of Engineers, have been converted into a digital-spatial database using a geographic information system. Environmental data for the Puget Sound area, collected from sources other than the Puget Sound Environmental Atlas by different Federal, State, and local agencies, also have been converted into this digital-spatial database. Background on the geographic-information-system planning process, the design and implementation of the geographic-information-system database, and the reasons for conversion to this digital-spatial database are included in this report. The Puget Sound Environmental Atlas data layers include information about seabird nesting areas, eelgrass and kelp habitat, marine mammal and fish areas, and shellfish resources and bed certification. Data layers from sources other than the Puget Sound Environmental Atlas include the Puget Sound shoreline, the water-body system, shellfish growing areas, recreational shellfish beaches, sewage-treatment outfalls, upland hydrography, watershed and political boundaries, and geographic names. The sources of data, descriptions of the data layers, and the steps and errors of processing associated with conversion to a digital-spatial database used in development of the Puget Sound Geographic Information System also are included in this report. The appendixes contain data dictionaries for each of the resource layers and error values for the conversion of Puget Sound Environmental Atlas data.

  20. Human Variome Project Quality Assessment Criteria for Variation Databases.

    PubMed

    Vihinen, Mauno; Hancock, John M; Maglott, Donna R; Landrum, Melissa J; Schaafsma, Gerard C P; Taschner, Peter

    2016-06-01

    Numerous databases containing information about DNA, RNA, and protein variations are available. Gene-specific variant databases (locus-specific variation databases, LSDBs) are typically curated and maintained for single genes or groups of genes for a certain disease(s). These databases are widely considered as the most reliable information source for a particular gene/protein/disease, but it should also be made clear that they may have widely varying contents, infrastructure, and quality. Quality is very important to evaluate because these databases may affect health decision-making, research, and clinical practice. The Human Variome Project (HVP) established a Working Group for Variant Database Quality Assessment. The basic principle was to develop a simple system that nevertheless provides a good overview of the quality of a database. The HVP quality evaluation criteria that resulted are divided into four main components: data quality, technical quality, accessibility, and timeliness. This report elaborates on the developed quality criteria and how implementation of the quality scheme can be achieved. Examples are provided for the current status of the quality items in two different databases, BTKbase, an LSDB, and ClinVar, a central archive of submissions about variants and their clinical significance. © 2016 WILEY PERIODICALS, INC.

  1. An integrated database on ticks and tick-borne zoonoses in the tropics and subtropics with special reference to developing and emerging countries.

    PubMed

    Vesco, Umberto; Knap, Nataša; Labruna, Marcelo B; Avšič-Županc, Tatjana; Estrada-Peña, Agustín; Guglielmone, Alberto A; Bechara, Gervasio H; Gueye, Arona; Lakos, Andras; Grindatto, Anna; Conte, Valeria; De Meneghi, Daniele

    2011-05-01

    Tick-borne zoonoses (TBZ) are emerging diseases worldwide. A large amount of information (e.g. case reports, results of epidemiological surveillance, etc.) is dispersed through various reference sources (ISI and non-ISI journals, conference proceedings, technical reports, etc.). An integrated database, derived from the ICTTD-3 project (http://www.icttd.nl), was developed in order to gather TBZ records in the (sub-)tropics, collected both by the authors and by collaborators worldwide. A dedicated website (http://www.tickbornezoonoses.org) was created to promote collaboration and circulate information. Data collected are made freely available to researchers for analysis by spatial methods, integrating mapped ecological factors for predicting TBZ risk. The authors present the assembly process of the TBZ database: the compilation of an updated list of TBZ relevant to the (sub-)tropics, the database design and its structure, the method of bibliographic search, and the assessment of spatial precision of geo-referenced records. At the time of writing, 725 records extracted from 337 publications related to 59 countries in the (sub-)tropics have been entered in the database. TBZ distribution maps were also produced. Imported cases have also been accounted for. The most important datasets with geo-referenced records were those on Spotted Fever Group rickettsiosis in Latin America and Crimean-Congo Haemorrhagic Fever in Africa. The authors stress the need for international collaboration in data collection to update and improve the database. Supervision of entered data remains necessary. Means to foster collaboration are discussed. The paper is also intended to describe the challenges encountered in assembling spatial data from various sources and to help develop similar data collections.

  2. EPA's SPECIATE 4.4 Database: Development and Uses

    EPA Science Inventory

    SPECIATE is the U.S. Environmental Protection Agency’s (EPA) repository of source category-specific particulate matter (PM), volatile organic gas, and other gas speciation profiles of air pollutant emissions. Abt Associates, Inc. developed SPECIATE 4.4 through a collaborat...

  3. Validation of chronic obstructive pulmonary disease (COPD) diagnoses in healthcare databases: a systematic review protocol.

    PubMed

    Rimland, Joseph M; Abraha, Iosief; Luchetta, Maria Laura; Cozzolino, Francesco; Orso, Massimiliano; Cherubini, Antonio; Dell'Aquila, Giuseppina; Chiatti, Carlos; Ambrosio, Giuseppe; Montedori, Alessandro

    2016-06-01

    Healthcare databases are useful sources to investigate the epidemiology of chronic obstructive pulmonary disease (COPD), to assess longitudinal outcomes in patients with COPD, and to develop disease management strategies. However, in order to constitute a reliable source for research, healthcare databases need to be validated. The aim of this protocol is to perform the first systematic review of studies reporting the validation of codes related to COPD diagnoses in healthcare databases. MEDLINE, EMBASE, Web of Science and the Cochrane Library databases will be searched using appropriate search strategies. Studies that evaluated the validity of COPD codes (such as the International Classification of Diseases 9th Revision and 10th Revision system; the Read codes system or the International Classification of Primary Care) in healthcare databases will be included. Inclusion criteria will be: (1) the presence of a reference standard case definition for COPD; (2) the presence of at least one test measure (eg, sensitivity, positive predictive values, etc); and (3) the use of a healthcare database (including administrative claims databases, electronic healthcare databases or COPD registries) as a data source. Pairs of reviewers will independently abstract data using standardised forms and will assess quality using a checklist based on the Standards for Reporting of Diagnostic Accuracy (STARD) criteria. This systematic review protocol has been produced in accordance with the Preferred Reporting Items for Systematic Reviews and Meta-Analyses Protocol (PRISMA-P) 2015 statement. Ethics approval is not required. Results of this study will be submitted to a peer-reviewed journal for publication. The results from this systematic review will be used for outcome research on COPD and will serve as a guide to identify appropriate case definitions of COPD, and reference standards, for researchers involved in validating healthcare databases. CRD42015029204. Published by the BMJ Publishing Group Limited. For permission to use (where not already granted under a licence) please go to http://www.bmj.com/company/products-services/rights-and-licensing/

  4. Argonne Geothermal Geochemical Database v2.0

    DOE Data Explorer

    Harto, Christopher

    2013-05-22

    A database of geochemical data from potential geothermal sources aggregated from multiple sources as of March 2010. The database contains fields for the location, depth, temperature, pH, total dissolved solids concentration, chemical composition, and date of sampling. A separate tab contains data on non-condensible gas compositions. The database contains records for over 50,000 wells, although many entries are incomplete. Current versions of source documentation are listed in the dataset.

  5. Phynx: an open source software solution supporting data management and web-based patient-level data review for drug safety studies in the general practice research database and other health care databases.

    PubMed

    Egbring, Marco; Kullak-Ublick, Gerd A; Russmann, Stefan

    2010-01-01

    To develop a software solution that supports management and clinical review of patient data from electronic medical records databases or claims databases for pharmacoepidemiological drug safety studies. We used open source software to build a data management system and an internet application with a Flex client on a Java application server with a MySQL database backend. The application is hosted on Amazon Elastic Compute Cloud. This solution, named Phynx, supports data management, Web-based display of electronic patient information, and interactive review of patient-level information in the individual clinical context. This system was applied to a dataset from the UK General Practice Research Database (GPRD). Our solution can be set up and customized with limited programming resources, and there is almost no extra cost for software. Access times are short, the displayed information is structured in chronological order and is visually attractive, and selected information such as drug exposure can be blinded. External experts can review patient profiles and save evaluations and comments via a common Web browser. Phynx provides a flexible and economical solution for patient-level review of electronic medical information from databases considering the individual clinical context. It can therefore make an important contribution to an efficient validation of outcome assessment in drug safety database studies.

  6. SIDD: A Semantically Integrated Database towards a Global View of Human Disease

    PubMed Central

    Cheng, Liang; Wang, Guohua; Li, Jie; Zhang, Tianjiao; Xu, Peigang; Wang, Yadong

    2013-01-01

    Background A number of databases have been developed to collect disease-related molecular, phenotypic and environmental features (DR-MPEs), such as genes, non-coding RNAs, genetic variations, drugs, phenotypes and environmental factors. However, each of the current databases focuses on only one or two DR-MPEs. There is an urgent demand to develop an integrated database, which can establish semantic associations among disease-related databases and link them to provide a global view of human disease at the biological level. This database, once developed, will allow researchers to query various DR-MPEs through disease, and to investigate disease mechanisms from different types of data. Methodology To establish an integrated disease-associated database, disease vocabularies used in different databases are mapped to Disease Ontology (DO) through semantic matching. 4,284 and 4,186 disease terms from Medical Subject Headings (MeSH) and Online Mendelian Inheritance in Man (OMIM), respectively, are mapped to DO. Then, the relationships between DR-MPEs and diseases are extracted and merged from different source databases to reduce data redundancy. Conclusions A semantically integrated disease-associated database (SIDD) is developed, which integrates 18 disease-associated databases, for researchers to browse multiple types of DR-MPEs in a single view. A web interface allows easy navigation for querying information through browsing a disease ontology tree or searching a disease term. Furthermore, a network visualization tool using the Cytoscape Web plugin has been implemented in SIDD. It enhances the usability of SIDD when viewing the relationships between diseases and DR-MPEs. The current version of SIDD (Jul 2013) documents 4,465,131 entries relating to 139,365 DR-MPEs and to 3,824 human diseases. The database can be freely accessed from: http://mlg.hit.edu.cn/SIDD. PMID:24146757
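
    The semantic matching of source disease vocabularies to Disease Ontology terms can be sketched as a normalised name and synonym comparison. This is a simplified illustration, not the SIDD pipeline; the synonym sets and source terms below are abbreviated examples.

        # Minimal sketch: map a MeSH/OMIM disease term onto a DO identifier by
        # comparing normalised names and synonyms. Entries are illustrative only.
        def normalise(term):
            return " ".join(term.lower().replace(",", " ").split())

        do_entries = {
            "DOID:1612": {"breast cancer", "malignant neoplasm of breast"},
            "DOID:9352": {"type 2 diabetes mellitus", "non-insulin-dependent diabetes"},
        }

        def map_to_do(source_term):
            target = normalise(source_term)
            for doid, names in do_entries.items():
                if target in {normalise(n) for n in names}:
                    return doid
            return None

        print(map_to_do("Breast Cancer"))                 # DOID:1612
        print(map_to_do("Malignant neoplasm of breast"))  # DOID:1612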

  7. Development of a land-cover characteristics database for the conterminous U.S.

    USGS Publications Warehouse

    Loveland, Thomas R.; Merchant, J.W.; Ohlen, D.O.; Brown, Jesslyn F.

    1991-01-01

    Information regarding the characteristics and spatial distribution of the Earth's land cover is critical to global environmental research. A prototype land-cover database for the conterminous United States designed for use in a variety of global modelling, monitoring, mapping, and analytical endeavors has been created. The resultant database contains multiple layers, including the source AVHRR data, the ancillary data layers, the land-cover regions defined by the research, and translation tables linking the regions to other land classification schema (for example, UNESCO, USGS Anderson System). The land-cover characteristics database can be analyzed, transformed, or aggregated by users to meet a broad spectrum of requirements. -from Authors

  8. Database Organisation in a Web-Enabled Free and Open-Source Software (foss) Environment for Spatio-Temporal Landslide Modelling

    NASA Astrophysics Data System (ADS)

    Das, I.; Oberai, K.; Sarathi Roy, P.

    2012-07-01

    Landslides exhibit themselves in different mass-movement processes and are considered among the most complex natural hazards occurring on the earth's surface. Making a landslide database available online via the WWW (World Wide Web) promotes the spread of landslide information to all stakeholders. The aim of this research is to present a comprehensive database for generating landslide hazard scenarios with the help of available historic records of landslides and geo-environmental factors, and to make them available over the Web using geospatial Free & Open Source Software (FOSS). FOSS reduces the cost of the project drastically, as proprietary software is very costly. Landslide data generated for the period 1982 to 2009 were compiled along the national highway road corridor in the Indian Himalayas. All the geo-environmental datasets along with the landslide susceptibility map were served through a WebGIS client interface. The open source University of Minnesota (UMN) MapServer was used as the GIS server software for developing the web-enabled landslide geospatial database. A PHP/MapScript server-side application serves as the front-end application, and PostgreSQL with the PostGIS extension serves as the backend for the web-enabled landslide spatio-temporal databases. This dynamic virtual visualization process through a web platform brings an insight into the understanding of the landslides and the resulting damage closer to the affected people and the user community. The landslide susceptibility dataset is also made available as an Open Geospatial Consortium (OGC) Web Feature Service (WFS), which can be accessed through any OGC-compliant open source or proprietary GIS software.
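
    Because the susceptibility layer is published as an OGC Web Feature Service, it can be queried with a standard WFS GetFeature request. The sketch below uses the standard WFS keywords, but the endpoint URL and layer name are hypothetical placeholders rather than the project's actual service.

        import requests

        # Hypothetical MapServer WFS endpoint and layer name; the query parameters
        # follow the standard OGC WFS GetFeature keywords.
        params = {
            "service": "WFS",
            "version": "1.1.0",
            "request": "GetFeature",
            "typeName": "landslide:susceptibility",
            "maxFeatures": 10,
        }
        response = requests.get("http://example.org/cgi-bin/mapserv", params=params)
        print(response.status_code, response.headers.get("Content-Type"))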

  9. Seeds in Chernobyl: the database on proteome response on radioactive environment

    PubMed Central

    Klubicová, Katarína; Vesel, Martin; Rashydov, Namik M.; Hajduch, Martin

    2012-01-01

    Two serious nuclear accidents during the last quarter century (Chernobyl, 1986 and Fukushima, 2011) contaminated large agricultural areas with radioactivity. The database "Seeds in Chernobyl" (http://www.chernobylproteomics.sav.sk) contains information about the abundances of hundreds of proteins from an on-going investigation of mature and developing seeds harvested from plants grown in the radioactive Chernobyl area. This database provides a useful source of information concerning the response of the seed proteome to a permanently increased level of ionizing radiation in a user-friendly format. PMID:23087698

  10. PubChem BioAssay: 2017 update

    PubMed Central

    Wang, Yanli; Bryant, Stephen H.; Cheng, Tiejun; Wang, Jiyao; Gindulyte, Asta; Shoemaker, Benjamin A.; Thiessen, Paul A.; He, Siqian; Zhang, Jian

    2017-01-01

    PubChem's BioAssay database (https://pubchem.ncbi.nlm.nih.gov) has served as a public repository for small-molecule and RNAi screening data since 2004, providing open access to its data content for the community. PubChem accepts data submissions from researchers worldwide in academia, industry and government agencies. PubChem also collaborates with other chemical biology database stakeholders through data exchange. With over a decade's development effort, it has become an important information resource supporting drug discovery and chemical biology research. To facilitate data discovery, PubChem is integrated with all other databases at NCBI. In this work, we provide an update for the PubChem BioAssay database describing several recent developments, including added sources of research data, a redesigned BioAssay record page, a new BioAssay classification browser and new features in the Upload system that facilitate data sharing. PMID:27899599

  11. KID Project: an internet-based digital video atlas of capsule endoscopy for research purposes

    PubMed Central

    Koulaouzidis, Anastasios; Iakovidis, Dimitris K.; Yung, Diana E.; Rondonotti, Emanuele; Kopylov, Uri; Plevris, John N.; Toth, Ervin; Eliakim, Abraham; Wurm Johansson, Gabrielle; Marlicz, Wojciech; Mavrogenis, Georgios; Nemeth, Artur; Thorlacius, Henrik; Tontini, Gian Eugenio

    2017-01-01

    Background and aims  Capsule endoscopy (CE) has revolutionized small-bowel (SB) investigation. Computational methods can enhance diagnostic yield (DY); however, incorporating machine learning algorithms (MLAs) into CE reading is difficult as large amounts of image annotations are required for training. Current databases lack graphic annotations of pathologies and cannot be used. A novel database, KID, aims to provide a reference for research and development of medical decision support systems (MDSS) for CE. Methods  Open-source software was used for the KID database. Clinicians contribute anonymized, annotated CE images and videos. Graphic annotations are supported by an open-access annotation tool (Ratsnake). We detail an experiment based on the KID database, examining differences in SB lesion measurement between human readers and a MLA. The Jaccard Index (JI) was used to evaluate similarity between annotations by the MLA and human readers. Results  The MLA performed best in measuring lymphangiectasias with a JI of 81 ± 6 %. The other lesion types were: angioectasias (JI 64 ± 11 %), aphthae (JI 64 ± 8 %), chylous cysts (JI 70 ± 14 %), polypoid lesions (JI 75 ± 21 %), and ulcers (JI 56 ± 9 %). Conclusion  MLA can perform as well as human readers in the measurement of SB angioectasias in white light (WL). Automated lesion measurement is therefore feasible. KID is currently the only open-source CE database developed specifically to aid development of MDSS. Our experiment demonstrates this potential. PMID:28580415
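
    The Jaccard Index used to score agreement between annotations is simply the overlap of two annotated regions divided by their union. A minimal sketch, with illustrative pixel coordinates rather than real CE annotations:

        # Jaccard Index between two lesion annotations represented as sets of
        # annotated pixel coordinates.
        def jaccard_index(annotation_a, annotation_b):
            a, b = set(annotation_a), set(annotation_b)
            return len(a & b) / len(a | b) if (a | b) else 1.0

        reader = {(10, 12), (10, 13), (11, 12), (11, 13)}
        algorithm = {(10, 13), (11, 12), (11, 13), (12, 13)}
        print(f"JI = {100 * jaccard_index(reader, algorithm):.0f}%")  # JI = 60%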

  12. The Cardiac Atlas Project--an imaging database for computational modeling and statistical atlases of the heart.

    PubMed

    Fonseca, Carissa G; Backhaus, Michael; Bluemke, David A; Britten, Randall D; Chung, Jae Do; Cowan, Brett R; Dinov, Ivo D; Finn, J Paul; Hunter, Peter J; Kadish, Alan H; Lee, Daniel C; Lima, Joao A C; Medrano-Gracia, Pau; Shivkumar, Kalyanam; Suinesiaputra, Avan; Tao, Wenchao; Young, Alistair A

    2011-08-15

    Integrative mathematical and statistical models of cardiac anatomy and physiology can play a vital role in understanding cardiac disease phenotype and planning therapeutic strategies. However, the accuracy and predictive power of such models is dependent upon the breadth and depth of noninvasive imaging datasets. The Cardiac Atlas Project (CAP) has established a large-scale database of cardiac imaging examinations and associated clinical data in order to develop a shareable, web-accessible, structural and functional atlas of the normal and pathological heart for clinical, research and educational purposes. A goal of CAP is to facilitate collaborative statistical analysis of regional heart shape and wall motion and characterize cardiac function among and within population groups. Three main open-source software components were developed: (i) a database with web-interface; (ii) a modeling client for 3D + time visualization and parametric description of shape and motion; and (iii) open data formats for semantic characterization of models and annotations. The database was implemented using a three-tier architecture utilizing MySQL, JBoss and Dcm4chee, in compliance with the DICOM standard to provide compatibility with existing clinical networks and devices. Parts of Dcm4chee were extended to access image specific attributes as search parameters. To date, approximately 3000 de-identified cardiac imaging examinations are available in the database. All software components developed by the CAP are open source and are freely available under the Mozilla Public License Version 1.1 (http://www.mozilla.org/MPL/MPL-1.1.txt). http://www.cardiacatlas.org a.young@auckland.ac.nz Supplementary data are available at Bioinformatics online.

  13. Building strategies for tsunami scenarios databases to be used in a tsunami early warning decision support system: an application to western Iberia

    NASA Astrophysics Data System (ADS)

    Tinti, S.; Armigliato, A.; Pagnoni, G.; Zaniboni, F.

    2012-04-01

    One of the most challenging goals that the geo-scientific community has faced since the catastrophic tsunami that occurred in December 2004 in the Indian Ocean is to develop the so-called "next generation" Tsunami Early Warning Systems (TEWS). Indeed, the meaning of "next generation" does not refer to the aim of a TEWS, which obviously remains to detect whether a tsunami has been generated or not by a given source and, in the first case, to send proper warnings and/or alerts in a suitable time to all the countries and communities that can be affected by the tsunami. Instead, "next generation" refers to the development of a Decision Support System (DSS) that, in general terms, relies on 1) an integrated set of seismic, geodetic and marine sensors whose objective is to detect and characterise the possible tsunamigenic sources and to monitor instrumentally the time and space evolution of the generated tsunami, 2) databases of pre-computed numerical tsunami scenarios to be suitably combined based on the information coming from the sensor environment and to be used to forecast the degree of exposure of different coastal places both in the near- and in the far-field, and 3) a proper overall (software) system architecture. The EU-FP7 TRIDEC Project aims at developing such a DSS and has selected two test areas in the Euro-Mediterranean region, namely the western Iberian margin and the eastern Mediterranean (Turkish coasts). In this study, we discuss the strategies that are being adopted in TRIDEC to build the databases of pre-computed tsunami scenarios and we show some applications to the western Iberian margin. In particular, two different databases are being populated, called the "Virtual Scenario Database" (VSDB) and the "Matching Scenario Database" (MSDB). The VSDB contains detailed simulations of a few selected earthquake-generated tsunamis. The cases provided by the members of the VSDB are computed "real events"; in other words, they represent the unknowns that the TRIDEC platform must be able to recognise and match during the early crisis management phase. The MSDB contains a very large number (of the order of thousands) of tsunami simulations performed starting from many different simple earthquake sources of different magnitudes located in the "vicinity" of the virtual scenario earthquake. Examples from both databases are presented.

  14. Data Aggregation System: A system for information retrieval on demand over relational and non-relational distributed data sources

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Ball, G.; Kuznetsov, V.; Evans, D.

    We present the Data Aggregation System, a system for information retrieval and aggregation from heterogeneous sources of relational and non-relational data for the Compact Muon Solenoid experiment on the CERN Large Hadron Collider. The experiment currently has a number of organically-developed data sources, including front-ends to a number of different relational databases and non-database data services which do not share common data structures or APIs (Application Programming Interfaces), and cannot at this stage be readily converged. DAS provides a single interface for querying all these services, a caching layer to speed up access to expensive underlying calls, and the ability to merge records from different data services pertaining to a single primary key.
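
    The merging behaviour described above can be sketched as follows: records returned by different services that share a primary key are folded into one aggregated record per key. The service payloads below are invented for illustration and do not reflect the actual CMS service schemas.

        from collections import defaultdict

        def merge_by_primary_key(primary_key, *service_results):
            """Combine records from several services into one record per key value."""
            merged = defaultdict(dict)
            for records in service_results:
                for record in records:
                    merged[record[primary_key]].update(record)
            return dict(merged)

        catalogue_service = [{"dataset": "/A/B/RAW", "size_gb": 1200}]
        location_service = [{"dataset": "/A/B/RAW", "sites": ["T1_EXAMPLE"]}]
        print(merge_by_primary_key("dataset", catalogue_service, location_service))
        # {'/A/B/RAW': {'dataset': '/A/B/RAW', 'size_gb': 1200, 'sites': ['T1_EXAMPLE']}}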

  15. Recent advances on terrain database correlation testing

    NASA Astrophysics Data System (ADS)

    Sakude, Milton T.; Schiavone, Guy A.; Morelos-Borja, Hector; Martin, Glenn; Cortes, Art

    1998-08-01

    Terrain database correlation is a major requirement for interoperability in distributed simulation. There are numerous situations in which terrain database correlation problems can occur that, in turn, lead to a lack of interoperability in distributed training simulations. Examples are the use of different run-time terrain databases derived from inconsistent source data, the use of different resolutions, and the use of different data models between databases for both terrain and culture data. IST has been developing a suite of software tools, named ZCAP, to address terrain database interoperability issues. In this paper we discuss recent enhancements made to this suite, including improved algorithms for sampling and calculating line-of-sight, an improved method for measuring terrain roughness, and the application of a sparse matrix method to the terrain remediation solution developed at the Visual Systems Lab of the Institute for Simulation and Training. We review the application of some of these new algorithms to the terrain correlation measurement processes. The application of these new algorithms improves our support for very large terrain databases, and provides the capability for performing test replications to estimate the sampling error of the tests. With this set of tools, a user can quantitatively assess the degree of correlation between large terrain databases.

  16. Detailed Uncertainty Analysis of the Ares I A106 Liftoff/Transition Database

    NASA Technical Reports Server (NTRS)

    Hanke, Jeremy L.

    2011-01-01

    The Ares I A106 Liftoff/Transition Force and Moment Aerodynamics Database describes the aerodynamics of the Ares I Crew Launch Vehicle (CLV) from the moment of liftoff through the transition from high to low total angles of attack at low subsonic Mach numbers. The database includes uncertainty estimates that were developed using a detailed uncertainty quantification procedure. The Ares I Aerodynamics Panel developed both the database and the uncertainties from wind tunnel test data acquired in the NASA Langley Research Center's 14- by 22-Foot Subsonic Wind Tunnel Test 591 using a 1.75 percent scale model of the Ares I and the tower assembly. The uncertainty modeling contains three primary uncertainty sources: experimental uncertainty, database modeling uncertainty, and database query interpolation uncertainty. The final database and uncertainty model represent a significant improvement in the quality of the aerodynamic predictions for this regime of flight over the estimates previously used by the Ares Project. In a dispersed case using this database, the maximum possible aerodynamic force pushing the vehicle towards the launch tower assembly saw a 40 percent reduction from the worst-case scenario in previously released data for Ares I.

  17. Emission Database for Global Atmospheric Research (EDGAR).

    ERIC Educational Resources Information Center

    Olivier, J. G. J.; And Others

    1994-01-01

    Presents the objective and methodology chosen for the construction of a global emissions source database called EDGAR and the structural design of the database system. The database estimates, on a regional and grid basis, 1990 annual emissions of greenhouse gases and of ozone-depleting compounds from all known sources. (LZ)

  18. Interactive bibliographical database on color

    NASA Astrophysics Data System (ADS)

    Caivano, Jose L.

    2002-06-01

    The paper describes the methodology and results of a project under development, aimed at the elaboration of an interactive bibliographical database on color in all fields of application: philosophy, psychology, semiotics, education, anthropology, physical and natural sciences, biology, medicine, technology, industry, architecture and design, arts, linguistics, geography, history. The project is initially based upon an already developed bibliography, published in different journals, updated on various occasions, and now available on the Internet, with more than 2,000 entries. The interactive database will expand that bibliography, incorporating hyperlinks and contents (indexes, abstracts, keywords, introductions, or eventually the complete document), and devising mechanisms for information retrieval. The sources to be included are: books, doctoral dissertations, multimedia publications, reference works. The main arrangement will be chronological, but the design of the database will allow rearrangements or selections by different fields: subject, Decimal Classification System, author, language, country, publisher, etc. A further project is to develop another database, including color-specialized journals or newsletters, and articles on color published in international journals, arranged in this case by journal name and date of publication, but also allowing rearrangements or selections by author, subject and keywords.

  19. The CATDAT damaging earthquakes database

    NASA Astrophysics Data System (ADS)

    Daniell, J. E.; Khazai, B.; Wenzel, F.; Vervaeck, A.

    2011-08-01

    The global CATDAT damaging earthquakes and secondary effects (tsunami, fire, landslides, liquefaction and fault rupture) database was developed to validate, remove discrepancies from, and expand greatly upon existing global databases, and to better understand the trends in vulnerability, exposure, and possible future impacts of such historic earthquakes. In the view of the authors, the lack of consistency and the errors in other earthquake loss databases frequently cited and used in analyses were major shortcomings that needed to be improved upon. Over 17 000 sources of information have been utilised, primarily in the last few years, to present data from over 12 200 damaging earthquakes historically, with over 7000 earthquakes since 1900 examined and validated before insertion into the database. Each validated earthquake includes seismological information, building damage, ranges of social losses to account for varying sources (deaths, injuries, homeless, and affected), and economic losses (direct, indirect, aid, and insured). Globally, a slightly increasing trend in economic damage due to earthquakes is not consistent with the greatly increasing exposure. The 1923 Great Kanto earthquake (214 billion USD damage; 2011 HNDECI-adjusted dollars), compared with the 2011 Tohoku (>300 billion USD at the time of writing), 2008 Sichuan and 1995 Kobe earthquakes, shows the increasing concern for economic loss in urban areas, as the trend should be expected to increase. Many economic and social loss values not reported in existing databases have been collected. Historical GDP (Gross Domestic Product), exchange rate, wage information, population, HDI (Human Development Index), and insurance information have been collected globally to form comparisons. This catalogue is the largest known cross-checked global historic damaging earthquake database and should have far-reaching consequences for earthquake loss estimation, socio-economic analysis, and the global reinsurance field.

  20. Integration of relational and textual biomedical sources. A pilot experiment using a semi-automated method for logical schema acquisition.

    PubMed

    García-Remesal, M; Maojo, V; Billhardt, H; Crespo, J

    2010-01-01

    Bringing together structured and text-based sources is an exciting challenge for biomedical informaticians, since most relevant biomedical sources belong to one of these categories. In this paper we evaluate the feasibility of integrating relational and text-based biomedical sources using: i) an original logical schema acquisition method for textual databases developed by the authors, and ii) OntoFusion, a system originally designed by the authors for the integration of relational sources. We conducted an integration experiment involving a test set of seven differently structured sources covering the domain of genetic diseases. We used our logical schema acquisition method to generate schemas for all textual sources. The sources were integrated using the methods and tools provided by OntoFusion. The integration was validated using a test set of 500 queries. A panel of experts answered a questionnaire to evaluate i) the quality of the extracted schemas, ii) the query processing performance of the integrated set of sources, and iii) the relevance of the retrieved results. The results of the survey show that our method extracts coherent and representative logical schemas. Experts' feedback on the performance of the integrated system and the relevance of the retrieved results was also positive. Regarding the validation of the integration, the system successfully provided correct results for all queries in the test set. The results of the experiment suggest that text-based sources including a logical schema can be regarded as equivalent to structured databases. Using our method, previous research and existing tools designed for the integration of structured databases can be reused - possibly subject to minor modifications - to integrate differently structured sources.

  1. MODELING THE DISTRIBUTION OF NONPOINT NITROGEN SOURCES AND SINKS IN THE NEUSE RIVER BASIN OF NORTH CAROLINA, USA

    EPA Science Inventory

    This study quantified nonpoint nitrogen (N) sources and sinks across the 14,582 km2 Neuse River Basin (NRB) located in North Carolina, to provide a tabular database to initialize in-stream N decay models and graphic overlay products for the development of management approaches to...

  2. Open-source tools for data mining.

    PubMed

    Zupan, Blaz; Demsar, Janez

    2008-03-01

    With a growing volume of biomedical databases and repositories, the need to develop a set of tools to address their analysis and support knowledge discovery is becoming acute. The data mining community has developed a substantial set of techniques for computational treatment of these data. In this article, we discuss the evolution of open-source toolboxes that data mining researchers and enthusiasts have developed over the span of a few decades and review several currently available open-source data mining suites. The approaches we review are diverse in data mining methods and user interfaces and also demonstrate that the field and its tools are ready to be fully exploited in biomedical research.

  3. Online Sources of Competitive Intelligence.

    ERIC Educational Resources Information Center

    Wagers, Robert

    1986-01-01

    Presents an approach to using online sources of information for competitor intelligence (i.e., monitoring industry and tracking activities of competitors); identifies principal sources; and suggests some ways of making use of online databases. Types and sources of information and sources and database charts are appended. Eight references are…

  4. Application of China's National Forest Continuous Inventory database.

    PubMed

    Xie, Xiaokui; Wang, Qingli; Dai, Limin; Su, Dongkai; Wang, Xinchuang; Qi, Guang; Ye, Yujing

    2011-12-01

    The maintenance of a timely, reliable and accurate spatial database on current forest ecosystem conditions and changes is essential to characterize and assess forest resources and support sustainable forest management. Information for such a database can be obtained only through a continuous forest inventory. The National Forest Continuous Inventory (NFCI) is the first level of China's three-tiered inventory system. The NFCI is administered by the State Forestry Administration; data are acquired by five inventory institutions around the country. Several important components of the database include land type, forest classification and age class/age group. The NFCI database in China is constructed based on 5-year inventory periods, resulting in some of the data not being timely when reports are issued. To address this problem, a forest growth simulation model has been developed to update the database for years between the periodic inventories. In order to aid in forest plan design and management, a three-dimensional virtual reality system of forest landscapes for selected units in the database (compartment or sub-compartment) has also been developed based on Virtual Reality Modeling Language. In addition, a transparent internet publishing system for a spatial database based on open source WebGIS (UMN MapServer) has been designed and utilized to enhance public understanding and encourage free participation of interested parties in the development, implementation, and planning of sustainable forest management.

  5. Identification and evaluation of fluvial-dominated deltaic (Class 1 oil) reservoirs in Oklahoma. Yearly technical progress report, January 1--December 31, 1993

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Mankin, C.J.; Banken, M.K.

    The Oklahoma Geological Survey (OGS), the Geological Information Systems department, and the School of Petroleum and Geological Engineering at the University of Oklahoma are engaged in a five-year program to identify and address Oklahoma's oil recovery opportunities in fluvial-dominated deltaic (FDD) reservoirs. This program includes the systematic and comprehensive collection, evaluation, and distribution of information on all of Oklahoma's FDD oil reservoirs and the recovery technologies that can be applied to those reservoirs with commercial success. Exhaustive literature searches are being conducted for these plays, both through published sources and through unpublished theses from regional universities. A bibliographic database has been developed to record these literature sources and their related plays. Trend maps are being developed to identify the FDD portions of the relevant reservoirs, through accessing current production databases and through compiling the literature results. A reservoir database system also has been developed to record specific reservoir data elements that are identified through the literature and through public and private data sources. Thus far, the initial demonstration for one has been completed, and a second is nearly complete. All of the information gathered through these efforts will be transferred to the Oklahoma petroleum industry through a series of publications and workshops. Additionally, plans are being developed, and hardware and software resources are being acquired, in preparation for the opening of a publicly-accessible computer users laboratory, one component of the technology transfer program.

  6. HIPdb: a database of experimentally validated HIV inhibiting peptides.

    PubMed

    Qureshi, Abid; Thakur, Nishant; Kumar, Manoj

    2013-01-01

    Besides antiretroviral drugs, peptides have also demonstrated potential to inhibit the Human immunodeficiency virus (HIV). For example, T20 has been discovered to effectively block HIV entry and was approved by the FDA as a novel anti-HIV peptide (AHP). We have collated all experimental information on AHPs on a single platform. HIPdb is a manually curated database of experimentally verified HIV inhibiting peptides targeting various steps or proteins involved in the life cycle of HIV, e.g. fusion, integration and reverse transcription. This database provides experimental information on 981 peptides. These are of varying length, obtained from natural as well as synthetic sources and tested on different cell lines. Important fields included are peptide sequence, length, source, target, cell line, inhibition/IC(50), assay and reference. The database provides user friendly browse, search, sort and filter options. It also contains useful services like BLAST and 'Map' for alignment with user provided sequences. In addition, predicted structure and physicochemical properties of the peptides are also included. The HIPdb database is freely available at http://crdd.osdd.net/servers/hipdb. Comprehensive information in this database will be helpful in selecting/designing effective anti-HIV peptides. Thus it may prove a useful resource to researchers for peptide-based therapeutics development.

  7. NHEXAS PHASE I ARIZONA STUDY--STANDARD OPERATING PROCEDURE FOR DATABASE TREE AND DATA SOURCES (UA-D-41.0)

    EPA Science Inventory

    The purpose of this SOP is to describe the database storage organization, as well as describe the sources of data for each database used during the Arizona NHEXAS project and the "Border" study. Keywords: data; database; organization.

    The National Human Exposure Assessment Sur...

  8. Development of an open source laboratory information management system for 2-D gel electrophoresis-based proteomics workflow

    PubMed Central

    Morisawa, Hiraku; Hirota, Mikako; Toda, Tosifusa

    2006-01-01

    Background In the post-genome era, most research scientists working in the field of proteomics are confronted with difficulties in management of large volumes of data, which they are required to keep in formats suitable for subsequent data mining. Therefore, a well-developed open source laboratory information management system (LIMS) should be available for their proteomics research studies. Results We developed an open source LIMS appropriately customized for 2-D gel electrophoresis-based proteomics workflow. The main features of its design are compactness, flexibility and connectivity to public databases. It supports the handling of data imported from mass spectrometry software and 2-D gel image analysis software. The LIMS is equipped with the same input interface for 2-D gel information as a clickable map on public 2DPAGE databases. The LIMS allows researchers to follow their own experimental procedures by reviewing the illustrations of 2-D gel maps and well layouts on the digestion plates and MS sample plates. Conclusion Our new open source LIMS is now available as a basic model for proteome informatics, and is accessible for further improvement. We hope that many research scientists working in the field of proteomics will evaluate our LIMS and suggest ways in which it can be improved. PMID:17018156

  9. Exploring the single-cell RNA-seq analysis landscape with the scRNA-tools database.

    PubMed

    Zappia, Luke; Phipson, Belinda; Oshlack, Alicia

    2018-06-25

    As single-cell RNA-sequencing (scRNA-seq) datasets have become more widespread, the number of tools designed to analyse these data has dramatically increased. Navigating the vast sea of tools now available is becoming increasingly challenging for researchers. In order to better facilitate the selection of appropriate analysis tools, we have created the scRNA-tools database (www.scRNA-tools.org) to catalogue and curate analysis tools as they become available. Our database collects a range of information on each scRNA-seq analysis tool and categorises them according to the analysis tasks they perform. Exploration of this database gives insights into the areas of rapid development of analysis methods for scRNA-seq data. We see that many tools perform tasks specific to scRNA-seq analysis, particularly clustering and ordering of cells. We also find that the scRNA-seq community embraces an open-source and open-science approach, with most tools available under open-source licenses and preprints being extensively used as a means to describe methods. The scRNA-tools database provides a valuable resource for researchers embarking on scRNA-seq analysis and records the growth of the field over time.

  10. LOVD: easy creation of a locus-specific sequence variation database using an "LSDB-in-a-box" approach.

    PubMed

    Fokkema, Ivo F A C; den Dunnen, Johan T; Taschner, Peter E M

    2005-08-01

    The completion of the human genome project has initiated, as well as provided the basis for, the collection and study of all sequence variation between individuals. Direct access to up-to-date information on sequence variation is currently provided most efficiently through web-based, gene-centered, locus-specific databases (LSDBs). We have developed the Leiden Open (source) Variation Database (LOVD) software approaching the "LSDB-in-a-Box" idea for the easy creation and maintenance of a fully web-based gene sequence variation database. LOVD is platform-independent and uses PHP and MySQL open source software only. The basic gene-centered and modular design of the database follows the recommendations of the Human Genome Variation Society (HGVS) and focuses on the collection and display of DNA sequence variations. With minimal effort, the LOVD platform is extendable with clinical data. The open set-up should both facilitate and promote functional extension with scripts written by the community. The LOVD software is freely available from the Leiden Muscular Dystrophy pages (www.DMD.nl/LOVD/). To promote the use of LOVD, we currently offer curators the possibility to set up an LSDB on our Leiden server. (c) 2005 Wiley-Liss, Inc.

  11. Construction of crystal structure prototype database: methods and applications.

    PubMed

    Su, Chuanxun; Lv, Jian; Li, Quan; Wang, Hui; Zhang, Lijun; Wang, Yanchao; Ma, Yanming

    2017-04-26

    Crystal structure prototype data have become a useful source of information for materials discovery in the fields of crystallography, chemistry, physics, and materials science. This work reports the development of a robust and efficient method for assessing the similarity of structures on the basis of their interatomic distances. Using this method, we proposed a simple and unambiguous definition of crystal structure prototype based on hierarchical clustering theory, and constructed the crystal structure prototype database (CSPD) by filtering the known crystallographic structures in a database. With a similar method, a program, the structure prototype analysis package (SPAP), was developed to remove similar structures from CALYPSO prediction results and to extract predicted low-energy structures for a separate theoretical structure database. A series of statistics describing the distribution of crystal structure prototypes in the CSPD was compiled to provide important insight for structure prediction and high-throughput calculations. Illustrative examples of the application of the proposed database are given, including the generation of initial structures for structure prediction and determination of the prototype structure in databases. These examples demonstrate the CSPD to be a generally applicable and useful tool for materials discovery.
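
    The interatomic-distance comparison at the core of such a method can be sketched, under strong simplifications, as a comparison of sorted distance lists. Real prototype identification also has to handle composition, cell choice and scaling; the coordinates below are illustrative only and this is not the published algorithm.

        import itertools
        import math

        def interatomic_distances(coords):
            """Sorted list of pairwise distances between atomic positions."""
            return sorted(math.dist(a, b) for a, b in itertools.combinations(coords, 2))

        def fingerprint_gap(coords_a, coords_b):
            """Largest difference between corresponding sorted distances (inf if sizes differ)."""
            da, db = interatomic_distances(coords_a), interatomic_distances(coords_b)
            if len(da) != len(db):
                return float("inf")
            return max(abs(x - y) for x, y in zip(da, db))

        structure_a = [(0.0, 0.0, 0.0), (0.5, 0.5, 0.5)]
        structure_b = [(0.0, 0.0, 0.0), (0.52, 0.5, 0.5)]
        print(fingerprint_gap(structure_a, structure_b))  # small gap -> likely the same prototype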

  12. Construction of crystal structure prototype database: methods and applications

    NASA Astrophysics Data System (ADS)

    Su, Chuanxun; Lv, Jian; Li, Quan; Wang, Hui; Zhang, Lijun; Wang, Yanchao; Ma, Yanming

    2017-04-01

    Crystal structure prototype data have become a useful source of information for materials discovery in the fields of crystallography, chemistry, physics, and materials science. This work reports the development of a robust and efficient method for assessing the similarity of structures on the basis of their interatomic distances. Using this method, we proposed a simple and unambiguous definition of crystal structure prototype based on hierarchical clustering theory, and constructed the crystal structure prototype database (CSPD) by filtering the known crystallographic structures in a database. With a similar method, a program, the structure prototype analysis package (SPAP), was developed to remove similar structures from CALYPSO prediction results and to extract predicted low-energy structures for a separate theoretical structure database. A series of statistics describing the distribution of crystal structure prototypes in the CSPD was compiled to provide important insight for structure prediction and high-throughput calculations. Illustrative examples of the application of the proposed database are given, including the generation of initial structures for structure prediction and determination of the prototype structure in databases. These examples demonstrate the CSPD to be a generally applicable and useful tool for materials discovery.

  13. Developing Modern Information Systems and Services: Africa's Challenges for the Future.

    ERIC Educational Resources Information Center

    Chowdhury, G. G.

    1996-01-01

    Discusses the current state of information systems and services in Africa, examines future possibilities, and suggests areas for improvement. Topics include the lack of automation; CD-ROM databases for accessibility to information sources; developing low-cost electronic communication facilities; Internet connectivity; dependence on imported…

  14. [Technical improvement of cohort constitution in administrative health databases: Providing a tool for integration and standardization of data applicable in the French National Health Insurance Database (SNIIRAM)].

    PubMed

    Ferdynus, C; Huiart, L

    2016-09-01

    Administrative health databases such as the French National Health Insurance Database - SNIIRAM - are a major tool to answer numerous public health research questions. However the use of such data requires complex and time-consuming data management. Our objective was to develop and make available a tool to optimize cohort constitution within administrative health databases. We developed a process to extract, transform and load (ETL) data from various heterogeneous sources into a standardized data warehouse. This data warehouse is architected as a star schema corresponding to an i2b2 star schema model. We then evaluated the performance of this ETL using data from a pharmacoepidemiology research project conducted in the SNIIRAM database. The ETL we developed comprises a set of functionalities for creating SAS scripts. Data can be integrated into a standardized data warehouse. As part of the performance assessment of this ETL, we achieved integration of a dataset from the SNIIRAM comprising more than 900 million lines in less than three hours using a desktop computer. This enables patient selection from the standardized data warehouse within seconds of the request. The ETL described in this paper provides a tool which is effective and compatible with all administrative health databases, without requiring complex database servers. This tool should simplify cohort constitution in health databases; the standardization of warehouse data facilitates collaborative work between research teams. Copyright © 2016 Elsevier Masson SAS. All rights reserved.
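
    The target of such an ETL is an i2b2-style star schema, in which rows from heterogeneous source tables are standardised into a single observation-fact table keyed by patient, concept and date. A minimal sketch with hypothetical field names and sample rows (not the actual SNIIRAM layout):

        from datetime import date

        def to_fact(patient_num, concept_cd, start_date):
            """One row of a simplified observation-fact table."""
            return {"patient_num": patient_num, "concept_cd": concept_cd, "start_date": start_date}

        drug_claims = [{"id_patient": 1, "atc": "C10AA05", "dispensing": date(2015, 3, 2)}]
        hospital_stays = [{"id_patient": 1, "icd10": "I21", "admission": date(2015, 4, 1)}]

        facts = (
            [to_fact(r["id_patient"], "ATC:" + r["atc"], r["dispensing"]) for r in drug_claims]
            + [to_fact(r["id_patient"], "ICD10:" + r["icd10"], r["admission"]) for r in hospital_stays]
        )
        print(facts)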

  15. Recent Advances and Coming Attractions in the NASA/IPAC Extragalactic Database

    NASA Astrophysics Data System (ADS)

    Mazzarella, Joseph M.; Baker, Kay; Pan Chan, Hiu; Chen, Xi; Ebert, Rick; Frayer, Cren; Helou, George; Jacobson, Jeffery D.; Lo, Tak M.; Madore, Barry; Ogle, Patrick M.; Pevunova, Olga; Steer, Ian; Schmitz, Marion; Terek, Scott

    2017-01-01

    We review highlights of recent advances and developments underway at the NASA/IPAC Extragalactic Database (NED). Extensive updates have been made to the infrastructure and processes essential for scaling NED for the next steps in its evolution. A major overhaul of the data integration pipeline provides greater modularity and parallelization to increase the rate of source cross-matching and data integration. The new pipeline was used recently to fold in data for nearly 300,000 sources published in over 900 recent journal articles, as well as fundamental parameters for 42 million sources in the Spitzer Enhanced Imaging Products Source List. The latter has added over 360 million photometric measurements at 3.6, 4.5, 5.8, and 8.0 microns (IRAC) and 24 microns (MIPS) to the spectral energy distributions of affected objects in NED. The recent discovery of super-luminous spiral galaxies (Ogle et al. 2016) exemplifies the opportunities for science discovery and data mining available directly from NED’s unique data synthesis, spanning the spectrum from gamma ray through radio frequencies. The number of references in NED has surpassed 103,000. In the coming year, cross-identifications of sources in the 2MASS Point Source Catalog and in the AllWISE Source Catalog with prior objects in the database (including GALEX) will increase the holdings to over a billion distinct objects, providing a rich resource for multi-wavelength analysis. Information about a recent surge in growth of redshift-independent distances in NED is presented at this meeting by Steer et al. (2017). Website updates include a ’simple search’ to perform common queries in a single entry field, an interface to query the image repository with options to sort and filter the initial results, connectivity to the IRSA Finder Chart service, as well as a program interface to query images using the international virtual observatory Simple Image Access protocol. Graphical characterizations of NED content and completeness are being further developed. A brief summary of new science functionality under development is also given. NED is operated by the Jet Propulsion Laboratory, California Institute of Technology, under contract with the National Aeronautics and Space Administration.

  16. Using Social Media to Identify Sources of Healthy Food in Urban Neighborhoods.

    PubMed

    Gomez-Lopez, Iris N; Clarke, Philippa; Hill, Alex B; Romero, Daniel M; Goodspeed, Robert; Berrocal, Veronica J; Vinod Vydiswaran, V G; Veinot, Tiffany C

    2017-06-01

    An established body of research has used secondary data sources (such as proprietary business databases) to demonstrate the importance of the neighborhood food environment for multiple health outcomes. However, documenting food availability using secondary sources in low-income urban neighborhoods can be particularly challenging since small businesses play a crucial role in food availability. These small businesses are typically underrepresented in national databases, which rely on secondary sources to develop data for marketing purposes. Using social media and other crowdsourced data to account for these smaller businesses holds promise, but the quality of these data remains unknown. This paper compares the quality of full-line grocery store information from Yelp, a crowdsourced content service, to a "ground truth" data set (Detroit Food Map) and a commercially-available dataset (Reference USA) for the greater Detroit area. Results suggest that Yelp is more accurate than Reference USA in identifying healthy food stores in urban areas. Researchers investigating the relationship between the nutrition environment and health may consider Yelp as a reliable and valid source for identifying sources of healthy food in urban environments.

  17. Supporting Building Portfolio Investment and Policy Decision Making through an Integrated Building Utility Data Platform

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Aziz, Azizan; Lasternas, Bertrand; Alschuler, Elena

    The American Recovery and Reinvestment Act stimulus funding of 2009 for smart grid projects resulted in the tripling of smart meter deployment. In 2012, the Green Button initiative provided utility customers with access to their real-time energy usage. The availability of finely granular data provides an enormous potential for energy data analytics and energy benchmarking. The sheer volume of time-series utility data from a large number of buildings also poses challenges in data collection, quality control, and database management for rigorous and meaningful analyses. In this paper, we will describe a building portfolio-level data analytics tool for operational optimization, business investment, and policy assessment using 15-minute to monthly interval utility data. The analytics tool is developed on top of the U.S. Department of Energy’s Standard Energy Efficiency Data (SEED) platform, an open source software application that manages energy performance data of large groups of buildings. To support the significantly large volume of granular interval data, we integrated a parallel time-series database with the existing relational database. The time-series database improves on the current utility data input, focusing on real-time data collection, storage, analytics and data quality control. The fully integrated data platform supports APIs for utility apps development by third party software developers. These apps will provide actionable intelligence for building owners and facilities managers. Unlike a commercial system, this platform is an open source platform funded by the U.S. Government, accessible to the public, researchers and other developers, to support initiatives in reducing building energy consumption.
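
    To illustrate the kind of analysis such granular interval data enables (a minimal sketch with synthetic readings and invented column names, not the SEED platform's data model), 15-minute meter readings can be rolled up to monthly consumption per building for benchmarking:

    ```python
    # Minimal sketch: aggregate 15-minute interval meter readings to monthly
    # totals for portfolio benchmarking. Column names and values are invented.
    import numpy as np
    import pandas as pd

    # Two months of synthetic 15-minute electricity readings (kWh per interval).
    idx = pd.date_range("2016-01-01", periods=96 * 60, freq="15min")
    readings = pd.DataFrame({
        "building_id": "B-001",
        "kwh": np.random.default_rng(0).uniform(0.5, 2.0, size=len(idx)),
    }, index=idx)

    # Roll the fine-grained series up to monthly consumption per building.
    monthly = (readings
               .groupby("building_id")
               .resample("MS")["kwh"]
               .sum()
               .rename("monthly_kwh"))
    print(monthly)
    ```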

  18. New tools and methods for direct programmatic access to the dbSNP relational database.

    PubMed

    Saccone, Scott F; Quan, Jiaxi; Mehta, Gaurang; Bolze, Raphael; Thomas, Prasanth; Deelman, Ewa; Tischfield, Jay A; Rice, John P

    2011-01-01

    Genome-wide association studies often incorporate information from public biological databases in order to provide a biological reference for interpreting the results. The dbSNP database is an extensive source of information on single nucleotide polymorphisms (SNPs) for many different organisms, including humans. We have developed free software that will download and install a local MySQL implementation of the dbSNP relational database for a specified organism. We have also designed a system for classifying dbSNP tables in terms of common tasks we wish to accomplish using the database. For each task we have designed a small set of custom tables that facilitate task-related queries and provide entity-relationship diagrams for each task composed from the relevant dbSNP tables. In order to expose these concepts and methods to a wider audience we have developed web tools for querying the database and browsing documentation on the tables and columns to clarify the relevant relational structure. All web tools and software are freely available to the public at http://cgsmd.isi.edu/dbsnpq. Resources such as these for programmatically querying biological databases are essential for viably integrating biological information into genetic association experiments on a genome-wide scale.
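
    A minimal sketch of the kind of programmatic query the paper describes, assuming a local MySQL mirror is already installed; the connection settings and the table and column names here are placeholders, not the real dbSNP schema (see the documentation tools at http://cgsmd.isi.edu/dbsnpq for the actual tables):

    ```python
    # Minimal sketch of programmatic access to a locally installed MySQL
    # mirror of dbSNP. Credentials, database name, and the table/column names
    # in the query are placeholders only.
    import mysql.connector

    conn = mysql.connector.connect(
        host="localhost",
        user="dbsnp_reader",       # placeholder credentials
        password="example",
        database="dbsnp_local",    # placeholder local database name
    )
    cur = conn.cursor()

    # Hypothetical task-oriented table mapping rs numbers to chromosome positions.
    cur.execute(
        "SELECT rs_id, chromosome, position FROM snp_positions WHERE rs_id = %s",
        (334,),
    )
    for rs_id, chrom, pos in cur.fetchall():
        print(f"rs{rs_id}\tchr{chrom}:{pos}")

    cur.close()
    conn.close()
    ```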

  19. Earthquake forecasting studies using radon time series data in Taiwan

    NASA Astrophysics Data System (ADS)

    Walia, Vivek; Kumar, Arvind; Fu, Ching-Chou; Lin, Shih-Jung; Chou, Kuang-Wu; Wen, Kuo-Liang; Chen, Cheng-Hong

    2017-04-01

    For a few decades, a growing number of studies have shown the usefulness of data in the field of seismogeochemistry, interpreted as geochemical precursory signals for impending earthquakes, and radon has been identified as one of the most reliable geochemical precursors. Radon is recognized as a short-term precursor and is being monitored in many countries. This study is aimed at developing an effective earthquake forecasting system by inspecting long-term radon time series data. The data are obtained from a network of radon monitoring stations established along different faults of Taiwan. Continuous time series radon data for earthquake studies have been recorded, and some significant variations associated with strong earthquakes have been observed. The data are also examined to evaluate earthquake precursory signals against environmental factors. An automated real-time database operating system has been developed recently to improve data processing for earthquake precursory studies. In addition, the study is aimed at the appraisal and filtering of these environmental parameters in order to create a real-time database that supports our earthquake precursory study. In recent years, an automatically operating real-time database has been developed using R, an open source programming language, to carry out statistical computation on the data. To integrate the data with our working procedure, we use the popular open source web application stack AMP (Apache, MySQL, and PHP) to create a website that effectively presents and helps us manage the real-time database.

  20. Development of a database system for near-future climate change projections under the Japanese National Project SI-CAT

    NASA Astrophysics Data System (ADS)

    Nakagawa, Y.; Kawahara, S.; Araki, F.; Matsuoka, D.; Ishikawa, Y.; Fujita, M.; Sugimoto, S.; Okada, Y.; Kawazoe, S.; Watanabe, S.; Ishii, M.; Mizuta, R.; Murata, A.; Kawase, H.

    2017-12-01

    Analyses of large ensemble data are quite useful for producing probabilistic projections of climate change effects. Ensemble data of "+2K future climate simulations" are currently produced by the Japanese national project "Social Implementation Program on Climate Change Adaptation Technology (SI-CAT)" as part of the database for Policy Decision making for Future climate change (d4PDF; Mizuta et al. 2016) produced by the Program for Risk Information on Climate Change. Those data consist of global warming simulations and regional downscaling simulations. Considering that those data volumes are too large (a few petabytes) to download to a user's local computer, a user-friendly system is required to search and download data that satisfy user requests. We develop "a database system for near-future climate change projections" under SI-CAT to provide functions for finding the data users need. The database system for near-future climate change projections mainly consists of a relational database, a data download function, and a user interface. The relational database, using PostgreSQL, is a key function among them. Temporally and spatially compressed data are registered in the relational database. As a first step, we develop the relational database for precipitation, temperature, and typhoon track data according to requests by SI-CAT members. The data download function, using the Open-source Project for a Network Data Access Protocol (OPeNDAP), provides a function to download temporally and spatially extracted data based on search results obtained from the relational database. We also develop the web-based user interface for using the relational database and the data download function. A prototype of the database system for near-future climate change projections is currently in operational testing on our local server. The database system for near-future climate change projections will be released on the Data Integration and Analysis System Program (DIAS) in fiscal year 2017. Techniques of the database system for near-future climate change projections might be quite useful for simulation and observational data in other research fields. We report the current status of development and some case studies of the database system for near-future climate change projections.
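
    As a hedged illustration of the OPeNDAP-style access described above (the URL and variable name are placeholders, not the SI-CAT/DIAS endpoints), a client can request only a small temporal and spatial slice of a remote dataset rather than downloading the full archive:

    ```python
    # Minimal sketch of OPeNDAP-style server-side subsetting: only the
    # requested slice of a large remote array is transferred. The URL and the
    # variable name are placeholders, not real endpoints.
    from netCDF4 import Dataset

    url = "http://example.org/opendap/d4pdf/precipitation.nc"  # placeholder URL
    with Dataset(url) as ds:                 # opens the remote dataset lazily
        precip = ds.variables["precip"]      # hypothetical variable name
        # Download only a small time/space window instead of the full archive.
        subset = precip[0:31, 100:120, 200:220]
        print(subset.shape)
    ```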

  1. Design and Implementation of a Three-Tiered Web-Based Inventory Ordering and Tracking System Prototype Using CORBA and Java

    DTIC Science & Technology

    2000-03-01

    languages yet still be able to access the legacy relational databases that businesses have huge investments in. JDBC is a low-level API designed for...consider the return on investment. The system requirements, discussed in Chapter II, are the main source of input to developing the relational...1996. Inprise, Gatekeeper Guide, Inprise Corporation, 1999. Kroenke, D., Database Processing Fundamentals, Design, and Implementation, Sixth Edition

  2. EPA GHG Certification of Medium- and Heavy-Duty Vehicles: Development of Road Grade Profiles Representative of US Controlled Access Highways

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Wood, Eric; Duran, Adam; Burton, Evan

    This report includes a detailed comparison of the TomTom national road grade database relative to a local road grade dataset generated by Southwest Research Institute and a national elevation dataset publicly available from the U.S. Geological Survey. This analysis concluded that the TomTom national road grade database was a suitable source of road grade data for the purposes of this study.

  3. Database for Rapid Dereplication of Known Natural Products Using Data from MS and Fast NMR Experiments.

    PubMed

    Zani, Carlos L; Carroll, Anthony R

    2017-06-23

    The discovery of novel and/or new bioactive natural products from biota sources is often confounded by the reisolation of known natural products. Dereplication strategies that involve the analysis of NMR and MS spectroscopic data to infer structural features present in purified natural products in combination with database searches of these substructures provide an efficient method to rapidly identify known natural products. Unfortunately, this strategy has been hampered by the lack of publicly available and comprehensive natural product databases and open source cheminformatics tools. A new platform, DEREP-NP, has been developed to help solve this problem. DEREP-NP uses the open source cheminformatics program DataWarrior to generate a database containing counts of 65 structural fragments present in 229 358 natural product structures derived from plants, animals, and microorganisms, published before 2013 and freely available in the nonproprietary Universal Natural Products Database (UNPD). By counting the number of times one or more of these structural features occurs in an unknown compound, as deduced from the analysis of its NMR (1H, HSQC, and/or HMBC) and/or MS data, matching structures carrying the same numeric combination of searched structural features can be retrieved from the database. Confirmation that the matching structure is the same compound can then be verified through literature comparison of spectroscopic data. This methodology can be applied to both purified natural products and fractions containing a small number of individual compounds that are often generated as screening libraries. The utility of DEREP-NP has been verified through the analysis of spectra derived from compounds (and fractions containing two or three compounds) isolated from plant, marine invertebrate, and fungal sources. DEREP-NP is freely available at https://github.com/clzani/DEREP-NP and will help to streamline the natural product discovery process.
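
    A toy sketch of the fragment-count matching idea (the fragment names and counts are invented for illustration and are not the actual DEREP-NP fragment set or DataWarrior output):

    ```python
    # Minimal sketch of dereplication by matching counts of structural
    # fragments: an unknown's fragment counts (deduced from NMR/MS) are
    # compared against a table of precomputed counts. Toy data only.
    known_products = {
        "compound_A": {"methyl": 3, "carbonyl": 1, "aromatic_CH": 4},
        "compound_B": {"methyl": 2, "carbonyl": 2, "aromatic_CH": 0},
        "compound_C": {"methyl": 3, "carbonyl": 1, "aromatic_CH": 4},
    }

    def matches(query, database):
        """Return entries whose counts equal the query on every fragment the
        analyst was able to count."""
        hits = []
        for name, counts in database.items():
            if all(counts.get(frag, 0) == n for frag, n in query.items()):
                hits.append(name)
        return hits

    # Counts inferred for an unknown isolate from its NMR and MS data.
    query = {"methyl": 3, "carbonyl": 1}
    print(matches(query, known_products))   # -> ['compound_A', 'compound_C']
    ```

    Candidate hits returned this way would then be confirmed (or rejected) by comparing literature spectroscopic data, as the abstract describes.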

  4. Defense and Development in Sub-Saharan Africa: Codebook.

    DTIC Science & Technology

    1988-03-01

    countries by presenting the different data sources and explaining how they were compiled. The statistics in the database cover 41 African countries for...February 1984, pp. 157-164...Finally, in addition to the economic and military data, some statistics have been compiled that monitor social and...IX. SOCIAL/POLITICAL STATISTICS...SOURCES AND NOTES ON COLLECTION OF DATA

  5. Brain Tumor Database, a free relational database for collection and analysis of brain tumor patient information.

    PubMed

    Bergamino, Maurizio; Hamilton, David J; Castelletti, Lara; Barletta, Laura; Castellan, Lucio

    2015-03-01

    In this study, we describe the development and utilization of a relational database designed to manage the clinical and radiological data of patients with brain tumors. The Brain Tumor Database was implemented using MySQL v.5.0, while the graphical user interface was created using PHP and HTML, thus making it easily accessible through a web browser. This web-based approach allows for multiple institutions to potentially access the database. The BT Database can record brain tumor patient information (e.g. clinical features, anatomical attributes, and radiological characteristics) and be used for clinical and research purposes. Analytic tools to automatically generate statistics and different plots are provided. The BT Database is a free and powerful user-friendly tool with a wide range of possible clinical and research applications in neurology and neurosurgery. The BT Database graphical user interface source code and manual are freely available at http://tumorsdatabase.altervista.org. © The Author(s) 2013.

  6. Cross-Matching Source Observations from the Palomar Transient Factory (PTF)

    NASA Astrophysics Data System (ADS)

    Laher, Russ; Grillmair, C.; Surace, J.; Monkewitz, S.; Jackson, E.

    2009-01-01

    Over the four-year lifetime of the PTF project, approximately 40 billion instances of astronomical-source observations will be extracted from the image data. The instances will correspond to the same astronomical objects being observed at roughly 25-50 different times, and so a very large catalog containing important object-variability information will be the chief PTF product. Organizing astronomical-source catalogs is conventionally done by dividing the catalog into declination zones and sorting by right ascension within each zone (e.g., the USNO-A star catalog), in order to facilitate catalog searches. This method was reincarnated as the "zones" algorithm in a SQL-Server database implementation (Szalay et al., MSR-TR-2004-32), with corrections given by Gray et al. (MSR-TR-2006-52). The primary advantage of this implementation is that all of the work is done entirely on the database server and client/server communication is eliminated. We implemented the methods outlined in Gray et al. for a PostgreSQL database. We programmed the methods as database functions in the PL/pgSQL procedural language. The cross-matching is currently based on source positions, but we intend to extend it to use both positions and positional uncertainties to form a chi-square statistic for optimal thresholding. The database design includes three main tables, plus a handful of internal tables. The Sources table stores the SExtractor source extractions taken at various times; the MergedSources table stores statistics about the astronomical objects, which are the result of cross-matching records in the Sources table; and the Merges table associates cross-matched primary keys in the Sources table with primary keys in the MergedSources table. Besides judicious database indexing, we have also internally partitioned the Sources table by declination zone, in order to speed up the population of Sources records and make the database more manageable. The catalog will be accessible to the public after the proprietary period through IRSA (irsa.ipac.caltech.edu).
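
    The following is a simplified, in-memory sketch of the zone-based cross-matching idea (small-angle separation, no RA wrap-around, no positional uncertainties), not the PL/pgSQL implementation described above:

    ```python
    # Minimal sketch of zone-based positional cross-matching: sources are
    # bucketed into declination zones so each new detection is compared only
    # against objects in its own and neighbouring zones.
    import math
    from collections import defaultdict

    ZONE_HEIGHT_DEG = 0.05            # zone height in declination
    MATCH_RADIUS_DEG = 2.0 / 3600.0   # 2 arcsec match radius

    def zone(dec):
        return int(math.floor(dec / ZONE_HEIGHT_DEG))

    def angsep(ra1, dec1, ra2, dec2):
        """Approximate separation in degrees for small angles."""
        dra = (ra1 - ra2) * math.cos(math.radians(0.5 * (dec1 + dec2)))
        return math.hypot(dra, dec1 - dec2)

    merged = defaultdict(list)   # zone id -> list of (ra, dec, object id)
    next_id = 0

    def crossmatch(ra, dec):
        """Return an existing object id within the match radius, or a new id."""
        global next_id
        z = zone(dec)
        for zz in (z - 1, z, z + 1):                  # same and adjacent zones
            for ora, odec, oid in merged[zz]:
                if angsep(ra, dec, ora, odec) <= MATCH_RADIUS_DEG:
                    return oid
        merged[z].append((ra, dec, next_id))
        next_id += 1
        return next_id - 1

    # Two detections about 0.5 arcsec apart collapse onto one object; a third does not.
    print(crossmatch(150.00000, 2.20000))   # -> 0
    print(crossmatch(150.00010, 2.20010))   # -> 0  (matches the first detection)
    print(crossmatch(151.00000, 2.50000))   # -> 1
    ```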

  7. Generation of large scale urban environments to support advanced sensor and seeker simulation

    NASA Astrophysics Data System (ADS)

    Giuliani, Joseph; Hershey, Daniel; McKeown, David, Jr.; Willis, Carla; Van, Tan

    2009-05-01

    One of the key aspects for the design of a next generation weapon system is the need to operate in cluttered and complex urban environments. Simulation systems rely on accurate representation of these environments and require automated software tools to construct the underlying 3D geometry and associated spectral and material properties that are then formatted for various objective seeker simulation systems. Under an Air Force Small Business Innovative Research (SBIR) contract, we have developed an automated process to generate 3D urban environments with user defined properties. These environments can be composed from a wide variety of source materials, including vector source data, pre-existing 3D models, and digital elevation models, and rapidly organized into a geo-specific visual simulation database. This intermediate representation can be easily inspected in the visible spectrum for content and organization and interactively queried for accuracy. Once the database contains the required contents, it can then be exported into specific synthetic scene generation runtime formats, preserving the relationship between geometry and material properties. To date an exporter for the Irma simulation system developed and maintained by AFRL/Eglin has been created and a second exporter to Real Time Composite Hardbody and Missile Plume (CHAMP) simulation system for real-time use is currently being developed. This process supports significantly more complex target environments than previous approaches to database generation. In this paper we describe the capabilities for content creation for advanced seeker processing algorithms simulation and sensor stimulation, including the overall database compilation process and sample databases produced and exported for the Irma runtime system. We also discuss the addition of object dynamics and viewer dynamics within the visual simulation into the Irma runtime environment.

  8. The Changing Role of a Professional Society Library.

    ERIC Educational Resources Information Center

    Lees, Nigel

    1997-01-01

    Describes developments in the United Kingdom's Royal Society of Chemistry's Library and Information Centre that has changed from a professional and learned society library into a business center. Development of a priced information service, electronic sources of information including online databases and the Internet, and marketing and promotion…

  9. The establishment and use of the point source catalog database of the 2MASS near infrared survey

    NASA Astrophysics Data System (ADS)

    Gao, Y. F.; Shan, H. G.; Cheng, D.

    2003-02-01

    The 2MASS near-infrared survey project is introduced briefly. The 2MASS point source catalog (2MASS PSC) database and its network query system were established using the PHP Hypertext Preprocessor and the MySQL database server. Using the system, one can not only query information on sources listed in the catalog but also draw the related plots. Moreover, after the 2MASS data are diagnosed, some research fields that can benefit from this database are suggested.

  10. Trading Time with Space - Development of subduction zone parameter database for a maximum magnitude correlation assessment

    NASA Astrophysics Data System (ADS)

    Schaefer, Andreas; Wenzel, Friedemann

    2017-04-01

    Subduction zones are generally the sources of the earthquakes with the highest magnitudes. Not only in Japan or Chile, but also in Pakistan, the Solomon Islands, or the Lesser Antilles, subduction zones pose a significant hazard to the people. To understand the behavior of subduction zones, and especially to identify their capability to produce maximum magnitude earthquakes, various physical models have been developed, leading to a large number of datasets, e.g. from geodesy, geomagnetics, structural geology, etc. There have been various studies that utilize these data to compile a subduction zone parameter database, but they mostly concentrate on only the major zones. Here, we compile the largest dataset of subduction zone parameters, both in parameter diversity and in the number of subduction zones considered. In total, more than 70 individual sources have been assessed; the aforementioned parametric data have been combined with seismological data and many more sources, leading to more than 60 individual parameters. Not all parameters have been resolved for each zone, since data completeness depends on the data availability and quality for each source. In addition, the 3D down-dip geometry of a majority of the subduction zones has been resolved using historical earthquake hypocenter data and centroid moment tensors where available, and additionally compared and verified with results from previous studies. With such a database, a statistical study has been undertaken to identify not only correlations between those parameters, to estimate in a parameter-driven way the potential for maximum possible magnitudes, but also similarities between the sources themselves. This identification of similarities leads to a classification system for subduction zones. Here, it could be expected that if two sources share enough common characteristics, other characteristics of interest may be similar as well. This concept technically trades time with space, considering subduction zones where we have likely not yet observed the maximum possible event. However, by identifying sources of the same class, the not-yet-observed temporal behavior can be replaced by spatial similarity among different subduction zones. This database aims to enhance the research and understanding of subduction zones and to quantify their potential to produce mega earthquakes, considering potential strong motion impact on nearby cities and their tsunami potential.

  11. User's guide: Minerals management service outer continental shelf activity database (moad). Final report

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Steiner, C.K.; Causley, M.C.; Yocke, M.A.

    1994-04-01

    The 1990 Clean Air Act Amendments require the Minerals Management Service (MMS) to conduct a research study to assess the potential onshore air quality impact from the development of outer continental shelf (OCS) petroleum resources in the Gulf of Mexico. The need for this study arises from concern about the cumulative impacts of current and future OCS emissions on ozone concentrations in nonattainment areas, particularly in Texas and Louisiana. To make quantitative assessments of these impacts, MMS has commissioned an air quality study which includes as a major component the development of a comprehensive emission inventory for photochemical grid modeling. The emission inventories prepared in this study include both onshore and offshore emissions. All relevant emissions from anthropogenic and biogenic sources are considered, with special attention focused on offshore anthropogenic sources, including OCS oil and gas production facilities, crew and supply vessels and helicopters serving OCS facilities, commercial shipping and fishing, recreational boating, intercoastal barge traffic and other sources located in the adjacent state waters. This document describes the database created during this study that contains the activity information collected for the development of the OCS platform, and crew/supply vessel and helicopter emission inventories.

  12. U.S.-MEXICO BORDER PROGRAM ARIZONA BORDER STUDY--STANDARD OPERATING PROCEDURE FOR DATABASE TREE AND DATA SOURCES (UA-D-41.0)

    EPA Science Inventory

    The purpose of this SOP is to describe the database storage organization, and to describe the sources of data for each database used during the Arizona NHEXAS project and the Border study. Keywords: data; database; organization.

    The U.S.-Mexico Border Program is sponsored by t...

  13. Certifiable database generation for SVS

    NASA Astrophysics Data System (ADS)

    Schiefele, Jens; Damjanovic, Dejan; Kubbat, Wolfgang

    2000-06-01

    In future aircraft cockpits, SVS will be used to display 3D physical and virtual information to pilots. A review of prototype and production Synthetic Vision Displays (SVD) from Euro Telematic, UPS Advanced Technologies, Universal Avionics, VDO-Luftfahrtgeratewerk, and NASA is given. Terrain, obstacle, navigation, and airport data are needed as data sources; Jeppesen-Sanderson, Inc. and Darmstadt Univ. of Technology are currently developing certifiable methods for the acquisition, validation, and processing of terrain, obstacle, and airport databases. The acquired data will be integrated into a High-Quality Database (HQ-DB). This database is the master repository; it contains all information relevant for all types of aviation applications. From the HQ-DB, SVS-relevant data are retrieved, converted, decimated, and adapted into an SVS Real-Time Onboard Database (RTO-DB). The process of data acquisition, verification, and processing will be defined in a way that allows certification within DO-200a and new RTCA/EUROCAE standards for airport and terrain data. The proposed open formats will be established and evaluated for industrial usability. Finally, a NASA-industry cooperation to develop industrial SVS products under the umbrella of the NASA Aviation Safety Program (ASP) is introduced. A key element of the SVS NASA-ASP is the Jeppesen-led task to develop methods for worldwide database generation and certification. Jeppesen will build three airport databases that will be used in flight trials with NASA aircraft.

  14. Recent advances in proteomics of cereals.

    PubMed

    Bansal, Monika; Sharma, Madhu; Kanwar, Priyanka; Goyal, Aakash

    Cereals contribute a major part of human nutrition and are considered an integral source of energy in human diets. With genomic databases already available for cereals such as rice, wheat, barley, and maize, the focus has now moved to proteome analysis. Proteomics studies involve the development of appropriate databases, built on suitable separation and purification protocols and the identification of protein functions, and can confirm functional networks on the basis of data already available from other sources. Tremendous progress has been made in the past decade in generating huge datasets covering interactions among proteins, the protein composition of various organs and organelles, quantitative and qualitative analysis of proteins, and the characterization of their modulation during plant development and biotic and abiotic stresses. Proteomics platforms have been used to identify and improve our understanding of various metabolic pathways. This article gives a brief review of the efforts made by different research groups on comparative descriptive and functional analysis of proteomics applications achieved in cereal science so far.

  15. The International Database of Efficient Appliances (IDEA): A New Resource for Global Efficiency Policy

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Gerke, Brian F; McNeil, Michael A; Tu, Thomas

    A major barrier to effective appliance efficiency program design and evaluation is a lack of data for determination of market baselines and cost-effective energy savings potential. The data gap is particularly acute in developing countries, which may have the greatest savings potential per unit GDP. To address this need, we are developing the International Database of Efficient Appliances (IDEA), which automatically compiles data from a wide variety of online sources to create a unified repository of information on efficiency, price, and features for a wide range of energy-consuming products across global markets. This paper summarizes the database framework and demonstrates the power of IDEA as a resource for appliance efficiency research and policy development. Using IDEA data for refrigerators in China and India, we develop robust cost-effectiveness indicators that allow rapid determination of savings potential within each market, as well as comparison of that potential across markets and appliance types. We discuss implications for future energy efficiency policy development.

  16. Trends in Solar energy Driven Vertical Ground Source Heat Pump Systems in Sweden - An Analysis Based on the Swedish Well Database

    NASA Astrophysics Data System (ADS)

    Juhlin, K.; Gehlin, S.

    2016-12-01

    Sweden is a world leader in developing and using vertical ground source heat pump (GSHP) technology. GSHP systems extract passively stored solar energy in the ground and the Earth's natural geothermal energy. Geothermal energy has been a recognized renewable energy source in Sweden since 2007 and is the third largest renewable energy source in the country today. The Geological Survey of Sweden (SGU) is the authority in Sweden that provides open access geological data on rock, soil and groundwater for the public. All wells drilled must be registered in the SGU Well Database, and it is the well driller's duty to submit registration of drilled wells. Both active and passive geothermal energy systems are in use. Large GSHP systems, with at least 20 boreholes, are active geothermal energy systems. Energy is stored in the ground, which allows both comfort heating and cooling to be extracted. Active systems are therefore relevant for larger properties and industrial buildings. Since 1978 more than 600 000 wells (water wells, GSHP boreholes, etc.) have been registered in the Well Database, with around 20 000 new registrations per year. Of these, an estimated 320 000 wells are registered as GSHP boreholes. The vast majority of these boreholes are single boreholes for single-family houses. The number of properties with registered vertical borehole GSHP installations amounts to approximately 243 000. Of these sites, between 300 and 350 are large GSHP systems with at least 20 boreholes. While the number of new registrations for smaller homes and households has slowed down after the rapid development in the 80s and 90s, the larger installations for commercial and industrial buildings have increased in number over the last ten years. This poster uses data from the SGU Well Database to quantify and analyze the trends in vertical GSHP systems reported between 1978 and 2015 in Sweden, with special focus on large systems. From the new aggregated data, conclusions can be drawn about the development of larger vertical GSHP system installations over the years and their geographical distribution in Sweden.

  17. Structure and needs of global loss databases about natural disaster

    NASA Astrophysics Data System (ADS)

    Steuer, Markus

    2010-05-01

    Global loss databases are used for trend analyses and statistics in scientific projects, in studies for governmental and nongovernmental organizations, and for the insurance and finance industry as well. At the moment three global data sets are established: EM-DAT (CRED), Sigma (Swiss Re) and NatCatSERVICE (Munich Re). Together with the Asian Disaster Reduction Center (ADRC) and the United Nations Development Programme (UNDP), these organizations started a collaborative initiative in 2007 with the aim to agree on and implement a common "Disaster Category Classification and Peril Terminology for Operational Databases". This common classification has been established through several technical meetings and working groups and represents a first and important step in the development of a standardized international classification of disasters and terminology of perils. Concretely, this means setting up a common hierarchy and terminology for all global and regional databases on natural disasters and establishing a common and agreed definition of disaster groups, main types and sub-types of events. Georeferencing, temporal aspects, methodology and sourcing are further issues that have been identified and will be discussed. The implementation of the newly defined structure for global loss databases is already set up for Munich Re NatCatSERVICE. In the following oral session we will show the structure of the global databases as defined and, in addition, give more transparency about the data sets behind published statistics and analyses. The special focus will be on the catastrophe classification from a moderate loss event up to a great natural catastrophe, and on showing the quality of sources and giving inside information about the assessment of overall and insured losses. Keywords: disaster category classification, peril terminology, overall and insured losses, definition

  18. A Chado case study: an ontology-based modular schema for representing genome-associated biological information.

    PubMed

    Mungall, Christopher J; Emmert, David B

    2007-07-01

    A few years ago, FlyBase undertook to design a new database schema to store Drosophila data. It would fully integrate genomic sequence and annotation data with bibliographic, genetic, phenotypic and molecular data from the literature representing a distillation of the first 100 years of research on this major animal model system. In developing this new integrated schema, FlyBase also made a commitment to ensure that its design was generic, extensible and available as open source, so that it could be employed as the core schema of any model organism data repository, thereby avoiding redundant software development and potentially increasing interoperability. Our question was whether we could create a relational database schema that would be successfully reused. Chado is a relational database schema now being used to manage biological knowledge for a wide variety of organisms, from human to pathogens, especially the classes of information that directly or indirectly can be associated with genome sequences or the primary RNA and protein products encoded by a genome. Biological databases that conform to this schema can interoperate with one another, and with application software from the Generic Model Organism Database (GMOD) toolkit. Chado is distinctive because its design is driven by ontologies. The use of ontologies (or controlled vocabularies) is ubiquitous across the schema, as they are used as a means of typing entities. The Chado schema is partitioned into integrated subschemas (modules), each encapsulating a different biological domain, and each described using representations in appropriate ontologies. To illustrate this methodology, we describe here the Chado modules used for describing genomic sequences. GMOD is a collaboration of several model organism database groups, including FlyBase, to develop a set of open-source software for managing model organism data. The Chado schema is freely distributed under the terms of the Artistic License (http://www.opensource.org/licenses/artistic-license.php) from GMOD (www.gmod.org).
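
    A much-simplified sketch of the ontology-typing pattern described here: entities live in one generic table and carry a type_id that points at a controlled-vocabulary term, so "give me all genes" becomes a join rather than a per-type table. Real Chado tables carry many more columns and constraints than this toy version, and the identifiers below are invented.

    ```python
    # Toy illustration of ontology-driven typing: a generic "feature" table
    # whose type_id references a controlled-vocabulary ("cvterm") table.
    import sqlite3

    con = sqlite3.connect(":memory:")
    con.executescript("""
    CREATE TABLE cvterm  (cvterm_id INTEGER PRIMARY KEY, name TEXT UNIQUE);
    CREATE TABLE feature (
        feature_id INTEGER PRIMARY KEY,
        uniquename TEXT,
        type_id    INTEGER REFERENCES cvterm(cvterm_id)
    );
    """)
    con.executemany("INSERT INTO cvterm VALUES (?, ?)",
                    [(1, "gene"), (2, "mRNA"), (3, "exon")])
    con.executemany("INSERT INTO feature VALUES (?, ?, ?)",
                    [(10, "gene-0001", 1), (11, "mrna-0001", 2), (12, "exon-0001", 3)])

    # Typing via the vocabulary means "all genes" is just a join on the term name.
    for (uniquename,) in con.execute("""
            SELECT f.uniquename FROM feature f
            JOIN cvterm t ON t.cvterm_id = f.type_id
            WHERE t.name = 'gene'"""):
        print(uniquename)   # -> gene-0001
    ```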

  19. Development of Human Face Literature Database Using Text Mining Approach: Phase I.

    PubMed

    Kaur, Paramjit; Krishan, Kewal; Sharma, Suresh K

    2018-06-01

    The face is an important part of the human body by which an individual communicates in society. Its importance can be highlighted by the fact that a person deprived of a face cannot sustain in the living world. The number of experiments being performed and research papers being published in the domain of the human face has surged in the past few decades. Several scientific disciplines conducting research on the human face include: Medical Science, Anthropology, Information Technology (Biometrics, Robotics, and Artificial Intelligence, etc.), Psychology, Forensic Science, Neuroscience, etc. This highlights the need to collect and manage data concerning the human face so that free public access can be provided to the scientific community. This can be attained by developing databases and tools on the human face using a bioinformatics approach. The current research emphasizes creating a database of literature data on the human face. The database can be accessed on the basis of specific keywords, journal name, date of publication, author's name, etc. The collected research papers will be stored in the form of a database. Hence, the database will be beneficial to the research community, as comprehensive information dedicated to the human face can be found in one place. Information related to facial morphologic features, facial disorders, facial asymmetry, facial abnormalities, and many other parameters can be extracted from this database. The front end has been developed using Hyper Text Mark-up Language and Cascading Style Sheets. The back end has been developed using the hypertext preprocessor (PHP). JavaScript has been used as the scripting language. MySQL (Structured Query Language) is used for database development as it is the most widely used relational database management system. The XAMPP (X (cross platform), Apache, MySQL, PHP, Perl) open source web application software has been used as the server. The database is still in the developmental phase; this paper discusses the initial steps of its creation and throws light on the work done to date.

  20. Clinical results of HIS, RIS, PACS integration using data integration CASE tools

    NASA Astrophysics Data System (ADS)

    Taira, Ricky K.; Chan, Hing-Ming; Breant, Claudine M.; Huang, Lu J.; Valentino, Daniel J.

    1995-05-01

    Current infrastructure research in PACS is dominated by the development of communication networks (local area networks, teleradiology, ATM networks, etc.), multimedia display workstations, and hierarchical image storage architectures. However, limited work has been performed on developing flexible, extensible, and intelligent information processing architectures for the vast decentralized image and text data repositories prevalent in healthcare environments. Patient information is often distributed among multiple data management systems. Current large-scale efforts to integrate medical information and knowledge sources have been costly, with limited retrieval functionality. Software integration strategies to unify distributed data and knowledge sources are still lacking commercially. Systems heterogeneity (i.e., differences in hardware platforms, communication protocols, database management software, nomenclature, etc.) is at the heart of the problem and is unlikely to be standardized in the near future. In this paper, we demonstrate the use of newly available CASE (computer-aided software engineering) tools to rapidly integrate HIS, RIS, and PACS information systems. The advantages of these tools include fast development time (low-level code is generated from graphical specifications) and easy system maintenance (excellent documentation, easy-to-perform changes, and a centralized code repository in an object-oriented database). The CASE tools are used to develop and manage the 'middleware' in our client-mediator-server architecture for systems integration. Our architecture is scalable and can accommodate heterogeneous databases and communication protocols.

  1. Dietary fibre: challenges in production and use of food composition data.

    PubMed

    Westenbrink, Susanne; Brunt, Kommer; van der Kamp, Jan-Willem

    2013-10-01

    Dietary fibre is a heterogeneous group of components for which several definitions and analytical methods were developed over the past decades, causing confusion among users and producers of dietary fibre data in food composition databases. An overview is given of current definitions and analytical methods. Some of the issues related to maintaining dietary fibre values in food composition databases are discussed. Newly developed AOAC methods (2009.01 or modifications) yield higher dietary fibre values, due to the inclusion of low molecular weight dietary fibre and resistant starch. For food composition databases procedures need to be developed to combine 'classic' and 'new' dietary fibre values since re-analysing all foods on short notice is impossible due to financial restrictions. Standardised value documentation procedures are important to evaluate dietary fibre values from several sources before exchanging and using the data, e.g. for dietary intake research. Copyright © 2012 Elsevier Ltd. All rights reserved.

  2. Cry-Bt identifier: a biological database for PCR detection of Cry genes present in transgenic plants.

    PubMed

    Singh, Vinay Kumar; Ambwani, Sonu; Marla, Soma; Kumar, Anil

    2009-10-23

    We describe the development of a user-friendly tool that assists in the retrieval of information relating to Cry genes in transgenic crops. The tool also helps in the detection of transformed Cry genes from Bacillus thuringiensis present in transgenic plants by providing suitably designed primers for PCR identification of these genes. The tool, designed on a relational database model, enables easy retrieval of information from the database with simple user queries. It also enables users to access related information about Cry genes present in various databases by interacting with different sources (nucleotide sequences, protein sequences, sequence comparison tools, published literature, conserved domains, and evolutionary and structural data). http://insilicogenomics.in/Cry-btIdentifier/welcome.html.

  3. Global building inventory for earthquake loss estimation and risk management

    USGS Publications Warehouse

    Jaiswal, Kishor; Wald, David; Porter, Keith

    2010-01-01

    We develop a global database of building inventories using a taxonomy of global building types for use in near-real-time post-earthquake loss estimation and pre-earthquake risk analysis, for the U.S. Geological Survey’s Prompt Assessment of Global Earthquakes for Response (PAGER) program. The database is available for public use, subject to peer review, scrutiny, and open enhancement. On a country-by-country level, it contains estimates of the distribution of building types categorized by material, lateral force resisting system, and occupancy type (residential or nonresidential, urban or rural). The database draws on and harmonizes numerous sources: (1) UN statistics, (2) UN Habitat’s demographic and health survey (DHS) database, (3) national housing censuses, (4) the World Housing Encyclopedia, and (5) other literature.

  4. Dealing with the Data Deluge: Handling the Multitude Of Chemical Biology Data Sources

    PubMed Central

    Guha, Rajarshi; Nguyen, Dac-Trung; Southall, Noel; Jadhav, Ajit

    2012-01-01

    Over the last 20 years, there has been an explosion in the amount and type of biological and chemical data that has been made publicly available in a variety of online databases. While this means that vast amounts of information can be found online, there is no guarantee that it can be found easily (or at all). A scientist searching for a specific piece of information is faced with a daunting task - many databases have overlapping content, use their own identifiers and, in some cases, have arcane and unintuitive user interfaces. In this overview, a variety of well known data sources for chemical and biological information are highlighted, focusing on those most useful for chemical biology research. The issue of using multiple data sources together and the associated problems such as identifier disambiguation are highlighted. A brief discussion is then provided on Tripod, a recently developed platform that supports the integration of arbitrary data sources, providing users a simple interface to search across a federated collection of resources. PMID:26609498

  5. Transformative Use of an Improved All-Payer Hospital Discharge Data Infrastructure for Community-Based Participatory Research: A Sustainability Pathway

    PubMed Central

    Salemi, Jason L; Salinas-Miranda, Abraham A; Wilson, Roneé E; Salihu, Hamisu M

    2015-01-01

    Objective: To describe the use of a clinically enhanced maternal and child health (MCH) database to strengthen community-engaged research activities, and to support the sustainability of data infrastructure initiatives. Data Sources/Study Setting: Population-based, longitudinal database covering over 2.3 million mother–infant dyads during a 12-year period (1998–2009) in Florida. Setting: A community-based participatory research (CBPR) project in a socioeconomically disadvantaged community in central Tampa, Florida. Study Design: Case study of the use of an enhanced state database for supporting CBPR activities. Principal Findings: A federal data infrastructure award resulted in the creation of an MCH database in which over 92 percent of all birth certificate records for infants born between 1998 and 2009 were linked to maternal and infant hospital encounter-level data. The population-based, longitudinal database was used to supplement data collected from focus groups and community surveys with epidemiological and health care cost data on important MCH disparity issues in the target community. Data were used to facilitate a community-driven, decision-making process in which the most important priorities for intervention were identified. Conclusions: Integrating statewide all-payer, hospital-based databases into CBPR can empower underserved communities with a reliable source of health data, and it can promote the sustainability of newly developed data systems. PMID:25879276

  6. Classification of Antibiotic Resistance Patterns of Indicator Bacteria by Discriminant Analysis: Use in Predicting the Source of Fecal Contamination in Subtropical Waters

    PubMed Central

    Harwood, Valerie J.; Whitlock, John; Withington, Victoria

    2000-01-01

    The antibiotic resistance patterns of fecal streptococci and fecal coliforms isolated from domestic wastewater and animal feces were determined using a battery of antibiotics (amoxicillin, ampicillin, cephalothin, chlortetracycline, oxytetracycline, tetracycline, erythromycin, streptomycin, and vancomycin) at four concentrations each. The sources of animal feces included wild birds, cattle, chickens, dogs, pigs, and raccoons. Antibiotic resistance patterns of fecal streptococci and fecal coliforms from known sources were grouped into two separate databases, and discriminant analysis of these patterns was used to establish the relationship between the antibiotic resistance patterns and the bacterial source. The fecal streptococcus and fecal coliform databases classified isolates from known sources with similar accuracies. The average rate of correct classification for the fecal streptococcus database was 62.3%, and that for the fecal coliform database was 63.9%. The sources of fecal streptococci and fecal coliforms isolated from surface waters were identified by discriminant analysis of their antibiotic resistance patterns. Both databases identified the source of indicator bacteria isolated from surface waters directly impacted by septic tank discharges as human. At sample sites selected for relatively low anthropogenic impact, the dominant sources of indicator bacteria were identified as various animals. The antibiotic resistance analysis technique promises to be a useful tool in assessing sources of fecal contamination in subtropical waters, such as those in Florida. PMID:10966379
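
    A minimal sketch of the statistical idea (synthetic toy resistance profiles, not the study's data): fit a linear discriminant classifier on isolates of known source, report the rate of correct classification, and predict the source of water isolates:

    ```python
    # Minimal sketch: classify isolates by contamination source from their
    # antibiotic-resistance patterns with linear discriminant analysis.
    # The resistance profiles below are synthetic toy data.
    import numpy as np
    from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

    # Each row: growth score of one isolate at several antibiotic concentrations.
    X = np.array([
        [1, 1, 0, 0], [1, 1, 1, 0], [1, 1, 0, 1],   # human-derived isolates
        [0, 0, 1, 1], [0, 1, 1, 1], [0, 0, 0, 1],   # cattle-derived isolates
    ])
    y = ["human", "human", "human", "cattle", "cattle", "cattle"]

    clf = LinearDiscriminantAnalysis().fit(X, y)

    # Average rate of correct classification on the known-source isolates.
    print("correct classification rate:", clf.score(X, y))

    # Predict the likely source of isolates sampled from surface water.
    water_isolates = np.array([[1, 1, 0, 0], [0, 0, 1, 1]])
    print(clf.predict(water_isolates))    # -> ['human' 'cattle']
    ```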

  7. Data integration and warehousing: coordination between newborn screening and related public health programs.

    PubMed

    Therrell, Bradford L

    2003-01-01

    At birth, patient demographic and health information begin to accumulate in varied databases. There are often multiple sources of the same or similar data. New public health programs are often created without considering data linkages. Recently, newborn hearing screening (NHS) programs and immunization programs have virtually ignored the existence of newborn dried blood spot (DBS) newborn screening databases containing similar demographic data, creating data duplication in their 'new' systems. Some progressive public health departments are developing data warehouses of basic, recurrent patient information, and linking these databases to other health program databases where programs and services can benefit from such linkages. Demographic data warehousing saves time (and money) by eliminating duplicative data entry and reducing the chances of data errors. While newborn screening data are usually the first data available, they should not be the only data source considered for early data linkage or for populating a data warehouse. Birth certificate information should also be considered along with other data sources for infants that may not have received newborn screening or who may have been born outside of the jurisdiction and not have birth certificate information locally available. A newborn screening serial number provides a convenient identification number for use in the DBS program and for linking with other systems. As a minimum, data linkages should exist between newborn dried blood spot screening, newborn hearing screening, immunizations, birth certificates and birth defect registries.

  8. A geodata warehouse: Using denormalisation techniques as a tool for delivering spatially enabled integrated geological information to geologists

    NASA Astrophysics Data System (ADS)

    Kingdon, Andrew; Nayembil, Martin L.; Richardson, Anne E.; Smith, A. Graham

    2016-11-01

    New requirements to understand geological properties in three dimensions have led to the development of PropBase, a data structure and delivery tools to meet them. At the BGS, relational database management systems (RDBMS) have facilitated effective data management using normalised, subject-based database designs with business rules in a centralised, vocabulary-controlled architecture. These have delivered effective data storage in a secure environment. However, isolated subject-oriented designs prevented efficient cross-domain querying of datasets. Additionally, the tools provided often did not enable effective data discovery, as they struggled to resolve the complex underlying normalised structures, giving poor data access speeds. Users developed bespoke access tools for structures they did not fully understand, sometimes obtaining incorrect results. Therefore, BGS has developed PropBase, a generic denormalised data structure within an RDBMS to store property data, to facilitate rapid and standardised data discovery and access, incorporating 2D and 3D physical and chemical property data with associated metadata. This includes scripts to populate and synchronise the layer with its data sources through structured input and transcription standards. A core component of the architecture is an optimised query object to deliver geoscience information from a structure equivalent to a data warehouse. This enables optimised query performance to deliver data in multiple standardised formats using a web discovery tool. Semantic interoperability is enforced through vocabularies combined from all data sources, facilitating searching of related terms. PropBase holds 28.1 million spatially enabled property data points from 10 source databases, incorporating over 50 property data types with a vocabulary set that includes 557 property terms. By enabling property data searches across multiple databases, PropBase has facilitated new scientific research previously considered impractical. PropBase is easily extended to incorporate 4D data (time series) and is providing a baseline for new "big data" monitoring projects.
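
    A toy sketch of the denormalisation idea (column names and vocabulary terms are illustrative, not the actual PropBase schema): measurements from different source databases are flattened into one generic property table so a single query spans all of them:

    ```python
    # Minimal sketch of a denormalised property "data warehouse" table:
    # heterogeneous source databases contribute rows to one generic layout.
    import sqlite3

    con = sqlite3.connect(":memory:")
    con.execute("""
    CREATE TABLE propbase (
        source_db TEXT,     -- originating subject database
        easting   REAL,     -- spatial reference (simplified to 2D here)
        northing  REAL,
        depth_m   REAL,
        property  TEXT,     -- controlled-vocabulary property name
        value     REAL,
        unit      TEXT
    )""")
    con.executemany("INSERT INTO propbase VALUES (?, ?, ?, ?, ?, ?, ?)", [
        ("borehole_geochem", 4510.0, 3120.0, 12.5, "arsenic_concentration", 4.2, "mg/kg"),
        ("site_surveys",     4512.0, 3118.0,  0.0, "arsenic_concentration", 3.1, "mg/kg"),
        ("borehole_geochem", 4510.0, 3120.0, 12.5, "ph",                    7.8, ""),
    ])

    # One cross-database query over a bounding box and a single property term.
    rows = con.execute("""
        SELECT source_db, value, unit FROM propbase
        WHERE property = 'arsenic_concentration'
          AND easting BETWEEN 4500 AND 4520 AND northing BETWEEN 3110 AND 3125
    """).fetchall()
    print(rows)
    ```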

  9. Fish Karyome: A karyological information network database of Indian Fishes.

    PubMed

    Nagpure, Naresh Sahebrao; Pathak, Ajey Kumar; Pati, Rameshwar; Singh, Shri Prakash; Singh, Mahender; Sarkar, Uttam Kumar; Kushwaha, Basdeo; Kumar, Ravindra

    2012-01-01

    'Fish Karyome', a database of karyological information on Indian fishes, has been developed; it serves as a central source for karyotype data about Indian fishes compiled from the published literature. Fish Karyome is intended to serve as a liaison tool for researchers and contains karyological information about 171 of the 2438 finfish species reported in India; it is publicly available via the World Wide Web. The database provides information on chromosome number, morphology, sex chromosomes, karyotype formula, cytogenetic markers, etc. Additionally, it provides phenotypic information that includes species name, classification, locality of sample collection, common name, local name, sex, geographical distribution, and IUCN Red List status. Besides, fish and karyotype images and references for the 171 finfish species have been included in the database. Fish Karyome has been developed using SQL Server 2008, a relational database management system, Microsoft's ASP.NET-2008 and Macromedia's FLASH Technology under the Windows 7 operating environment. The system also enables users to input new information and images into the database, and to search and view the information and images of interest using various search options. Fish Karyome has a wide range of applications in species characterization and identification, sex determination, chromosomal mapping, karyo-evolution and systematics of fishes.

  10. [A Terahertz Spectral Database Based on Browser/Server Technique].

    PubMed

    Zhang, Zhuo-yong; Song, Yue

    2015-09-01

    With the solution of key scientific and technical problems and the development of instrumentation, the application of terahertz technology in various fields has received growing attention. Owing to its unique advantages, terahertz technology shows a broad future in fast, non-damaging detection as well as many other fields. Terahertz technology combined with complementary methods can be used to tackle many difficult practical problems that could not be solved before. One of the critical points for the further development of practical terahertz detection methods is a good and reliable terahertz spectral database. We recently developed a browser/server (B/S)-based terahertz spectral database, designing its main structure and functions to meet practical requirements. The database now includes more than 240 items, with spectral information collected from three sources: (1) collection and citation from other terahertz spectral databases published abroad; (2) data collected from the published literature; and (3) spectral data measured in our laboratory. The present paper introduces the basic structure and fundamental functions of the terahertz spectral database developed in our laboratory. One of its key functions is the calculation of optical parameters: parameters such as the absorption coefficient and refractive index can be calculated from input THz time-domain spectra. The other main functions and search methods of the browser/server-based terahertz spectral database are also discussed. The database search system provides users with convenient functions including user registration, inquiry, display of spectral figures and molecular structures, and spectral matching. The system provides an on-line searching function for registered users, who can compare an input THz spectrum with the spectra in the database; according to the resulting correlation coefficient, the search can be performed quickly and conveniently. Our terahertz spectral database can be accessed at http://www.teralibrary.com. The database is based on spectral information so far and will be improved in the future. We hope this terahertz spectral database can provide users with powerful, convenient and highly efficient functions, and promote broader applications of terahertz technology.
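
    The abstract states that the absorption coefficient and refractive index are calculated from input THz time-domain spectra but does not give the formulas. The sketch below uses the standard single-pass, thick-slab THz-TDS approximation, which may differ from the database's exact procedure; all names and the processing steps are illustrative assumptions.

      import numpy as np

      def optical_parameters(t, e_ref, e_sam, d):
          """Estimate refractive index n(f) and absorption coefficient alpha(f)
          from reference and sample THz time-domain waveforms.

          t     : time axis in seconds
          e_ref : reference electric field (no sample)
          e_sam : field transmitted through a sample of thickness d (metres)

          Standard single-pass, thick-slab approximation; not necessarily the
          exact procedure used by the database described above.
          """
          freqs = np.fft.rfftfreq(len(t), d=t[1] - t[0])
          T = np.fft.rfft(e_sam) / np.fft.rfft(e_ref)   # complex transmission
          phase = -np.unwrap(np.angle(T))                # accumulated phase delay
          c = 2.998e8
          with np.errstate(divide="ignore", invalid="ignore"):
              n = 1.0 + c * phase / (2 * np.pi * freqs * d)
              # amplitude term corrected for Fresnel losses at the two interfaces
              alpha = -(2.0 / d) * np.log(np.abs(T) * (n + 1.0) ** 2 / (4.0 * n))
          return freqs, n, alpha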

  11. Mass and Reliability Source (MaRS) Database

    NASA Technical Reports Server (NTRS)

    Valdenegro, Wladimir

    2017-01-01

    The Mass and Reliability Source (MaRS) Database consolidates component mass and reliability data for all Orbital Replacement Units (ORUs) on the International Space Station (ISS) into a single database. It was created to help engineers develop a parametric model that relates hardware mass and reliability. MaRS supplies relevant failure data at the lowest possible component level while providing support for risk, reliability, and logistics analysis. Random-failure data is usually linked to the ORU assembly; MaRS uses this data to identify and display the lowest possible component failure level. As seen in Figure 1, the failure point is identified to the lowest level: Component 2.1. This is useful for efficient planning of spare supplies, supporting long-duration crewed missions, allowing quicker trade studies, and streamlining diagnostic processes. MaRS is composed of information from various databases: MADS (operating hours), VMDB (indentured part lists), and ISS PART (failure data). This information is organized in Microsoft Excel and accessed through a program made in Microsoft Access (Figure 2). The focus of the Fall 2017 internship tour was to identify the components that were the root cause of failure from the given random-failure data, develop a taxonomy for the database, and attach material headings to the component list. Secondary objectives included verifying the integrity of the data in MaRS, eliminating any part discrepancies, and generating documentation for future reference. Due to the nature of the random-failure data, data mining had to be done manually, without the assistance of an automated program, to ensure positive identification.

  12. Database of historically documented springs and spring flow measurements in Texas

    USGS Publications Warehouse

    Heitmuller, Franklin T.; Reece, Brian D.

    2003-01-01

    Springs are naturally occurring features that convey excess ground water to the land surface; they represent a transition from ground water to surface water. Water issues through one opening, multiple openings, or numerous seeps in the rock or soil. The database of this report provides information about springs and spring flow in Texas including spring names, identification numbers, location, and, if available, water source and use. This database does not include every spring in Texas, but is limited to an aggregation of selected digital and hard-copy data of the U.S. Geological Survey (USGS), the Texas Water Development Board (TWDB), and Capitol Environmental Services.

  13. A Mediterranean coastal database for assessing the impacts of sea-level rise and associated hazards

    NASA Astrophysics Data System (ADS)

    Wolff, Claudia; Vafeidis, Athanasios T.; Muis, Sanne; Lincke, Daniel; Satta, Alessio; Lionello, Piero; Jimenez, Jose A.; Conte, Dario; Hinkel, Jochen

    2018-03-01

    We have developed a new coastal database for the Mediterranean basin that is intended for coastal impact and adaptation assessment to sea-level rise and associated hazards on a regional scale. The data structure of the database relies on a linear representation of the coast with associated spatial assessment units. Using information on coastal morphology, human settlements and administrative boundaries, we have divided the Mediterranean coast into 13 900 coastal assessment units. To these units we have spatially attributed 160 parameters on the characteristics of the natural and socio-economic subsystems, such as extreme sea levels, vertical land movement and number of people exposed to sea-level rise and extreme sea levels. The database contains information on current conditions and on plausible future changes that are essential drivers for future impacts, such as sea-level rise rates and socio-economic development. Besides its intended use in risk and impact assessment, we anticipate that the Mediterranean Coastal Database (MCD) constitutes a useful source of information for a wide range of coastal applications.

  14. A geospatial database model for the management of remote sensing datasets at multiple spectral, spatial, and temporal scales

    NASA Astrophysics Data System (ADS)

    Ifimov, Gabriela; Pigeau, Grace; Arroyo-Mora, J. Pablo; Soffer, Raymond; Leblanc, George

    2017-10-01

    In this study the development and implementation of a geospatial database model for the management of multiscale datasets encompassing airborne imagery and associated metadata is presented. To develop the multi-source geospatial database we used a Relational Database Management System (RDBMS) on a Structured Query Language (SQL) server, which was then integrated into ArcGIS and implemented as a geodatabase. The acquired datasets were compiled, standardized, and integrated into the RDBMS, where logical associations between different types of information were linked (e.g. location, date, and instrument). Airborne data at different processing levels (digital numbers through geocorrected reflectance) were implemented in the geospatial database, where the datasets are linked spatially and temporally. An example dataset consisting of airborne hyperspectral imagery, collected for inter- and intra-annual vegetation characterization and detection of potential hydrocarbon seepage events over pipeline areas, is presented. Our work provides a model for the management of airborne imagery, which is a challenging aspect of data management in remote sensing, especially when large volumes of data are collected.
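
    The geodatabase itself is only outlined above (an RDBMS on an SQL server integrated into ArcGIS). Purely as an illustration of how acquisitions, processing levels and imagery can be linked spatially and temporally in a relational model, here is a minimal sketch; the table names, columns and the use of SQLite are assumptions made for the example, not the authors' implementation.

      import sqlite3

      conn = sqlite3.connect(":memory:")
      conn.executescript("""
          -- one row per airborne acquisition (flight / date / instrument)
          CREATE TABLE acquisition (
              acq_id      INTEGER PRIMARY KEY,
              flight_date TEXT,
              instrument  TEXT
          );
          -- one row per image product at a given processing level,
          -- linked back to the acquisition it was derived from
          CREATE TABLE image_product (
              product_id INTEGER PRIMARY KEY,
              acq_id     INTEGER REFERENCES acquisition(acq_id),
              level      TEXT,  -- e.g. 'digital numbers', 'geocorrected reflectance'
              min_x REAL, min_y REAL, max_x REAL, max_y REAL,  -- bounding box
              path       TEXT
          );
      """)

      # Example query: all geocorrected reflectance products from one instrument
      # that intersect a study-area bounding box, ordered by date.
      sql = """
          SELECT a.flight_date, p.path
          FROM image_product p JOIN acquisition a USING (acq_id)
          WHERE a.instrument = ? AND p.level = 'geocorrected reflectance'
            AND p.max_x >= ? AND p.min_x <= ? AND p.max_y >= ? AND p.min_y <= ?
          ORDER BY a.flight_date
      """
      rows = conn.execute(sql, ("sensor-A", 450000, 455000, 110000, 115000)).fetchall()
      print(rows)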

  15. A Mediterranean coastal database for assessing the impacts of sea-level rise and associated hazards

    PubMed Central

    Wolff, Claudia; Vafeidis, Athanasios T.; Muis, Sanne; Lincke, Daniel; Satta, Alessio; Lionello, Piero; Jimenez, Jose A.; Conte, Dario; Hinkel, Jochen

    2018-01-01

    We have developed a new coastal database for the Mediterranean basin that is intended for coastal impact and adaptation assessment to sea-level rise and associated hazards on a regional scale. The data structure of the database relies on a linear representation of the coast with associated spatial assessment units. Using information on coastal morphology, human settlements and administrative boundaries, we have divided the Mediterranean coast into 13 900 coastal assessment units. To these units we have spatially attributed 160 parameters on the characteristics of the natural and socio-economic subsystems, such as extreme sea levels, vertical land movement and number of people exposed to sea-level rise and extreme sea levels. The database contains information on current conditions and on plausible future changes that are essential drivers for future impacts, such as sea-level rise rates and socio-economic development. Besides its intended use in risk and impact assessment, we anticipate that the Mediterranean Coastal Database (MCD) constitutes a useful source of information for a wide range of coastal applications. PMID:29583140

  16. The MIntAct project—IntAct as a common curation platform for 11 molecular interaction databases

    PubMed Central

    Orchard, Sandra; Ammari, Mais; Aranda, Bruno; Breuza, Lionel; Briganti, Leonardo; Broackes-Carter, Fiona; Campbell, Nancy H.; Chavali, Gayatri; Chen, Carol; del-Toro, Noemi; Duesbury, Margaret; Dumousseau, Marine; Galeota, Eugenia; Hinz, Ursula; Iannuccelli, Marta; Jagannathan, Sruthi; Jimenez, Rafael; Khadake, Jyoti; Lagreid, Astrid; Licata, Luana; Lovering, Ruth C.; Meldal, Birgit; Melidoni, Anna N.; Milagros, Mila; Peluso, Daniele; Perfetto, Livia; Porras, Pablo; Raghunath, Arathi; Ricard-Blum, Sylvie; Roechert, Bernd; Stutz, Andre; Tognolli, Michael; van Roey, Kim; Cesareni, Gianni; Hermjakob, Henning

    2014-01-01

    IntAct (freely available at http://www.ebi.ac.uk/intact) is an open-source, open data molecular interaction database populated by data either curated from the literature or from direct data depositions. IntAct has developed a sophisticated web-based curation tool, capable of supporting both IMEx- and MIMIx-level curation. This tool is now utilized by multiple additional curation teams, all of whom annotate data directly into the IntAct database. Members of the IntAct team supply appropriate levels of training, perform quality control on entries and take responsibility for long-term data maintenance. Recently, the MINT and IntAct databases decided to merge their separate efforts to make optimal use of limited developer resources and maximize the curation output. All data manually curated by the MINT curators have been moved into the IntAct database at EMBL-EBI and are merged with the existing IntAct dataset. Both IntAct and MINT are active contributors to the IMEx consortium (http://www.imexconsortium.org). PMID:24234451

  17. National security and national competitiveness: Open source solutions; NASA requirements and capabilities

    NASA Technical Reports Server (NTRS)

    Cotter, Gladys A.

    1993-01-01

    Foreign competitors are challenging the world leadership of the U.S. aerospace industry, and increasingly tight budgets everywhere make international cooperation in aerospace science necessary. The NASA STI Program has as part of its mission to support NASA R&D, and to that end has developed a knowledge base of aerospace-related information known as the NASA Aerospace Database. The NASA STI Program is already involved in international cooperation with NATO/AGARD/TIP, CENDI, ICSU/ICSTI, and the U.S. Japan Committee on STI. With the new more open political climate, the perceived dearth of foreign information in the NASA Aerospace Database, and the development of the ESA database and DELURA, the German databases, the NASA STI Program is responding by sponsoring workshops on foreign acquisitions and by increasing its cooperation with international partners and with other U.S. agencies. The STI Program looks to the future of improved database access through networking and a GUI; new media; optical disk, video, and full text; and a Technology Focus Group that will keep the NASA STI Program current with technology.

  18. PineElm_SSRdb: a microsatellite marker database identified from genomic, chloroplast, mitochondrial and EST sequences of pineapple (Ananas comosus (L.) Merrill).

    PubMed

    Chaudhary, Sakshi; Mishra, Bharat Kumar; Vivek, Thiruvettai; Magadum, Santoshkumar; Yasin, Jeshima Khan

    2016-01-01

    Simple Sequence Repeats, or microsatellites, are resourceful molecular genetic markers. There are only a few reports of SSR identification and development in pineapple. The complete genome sequence of pineapple available in the public domain can be used to develop numerous novel SSRs. Therefore, an attempt was made to identify SSRs from genomic, chloroplast, mitochondrial and EST sequences of pineapple, which will help in deciphering the genetic makeup of its germplasm resources. A total of 359,511 SSRs were identified in pineapple (356,385 from the genome sequence, 45 from the chloroplast sequence, 249 from the mitochondrial sequence and 2,832 from EST sequences). The list of EST-SSR markers and their details are available in the database. PineElm_SSRdb is an open-source database available for non-commercial academic purposes at http://app.bioelm.com/, with a mapping tool that can develop circular maps of selected marker sets. This database will be of immense use to breeders, researchers and graduates working on Ananas spp. and to others working on cross-species transferability of markers, investigating diversity, mapping and DNA fingerprinting.

  19. "Publish or Perish" as citation metrics used to analyze scientific output in the humanities: International case studies in economics, geography, social sciences, philosophy, and history.

    PubMed

    Baneyx, Audrey

    2008-01-01

    Traditionally, the most commonly used source of bibliometric data is the Thomson ISI Web of Knowledge, in particular the (Social) Science Citation Index and the Journal Citation Reports, which provide the yearly Journal Impact Factors. This database used for the evaluation of researchers is not advantageous in the humanities, mainly because books, conference papers, and non-English journals, which are an important part of scientific activity, are not (well) covered. This paper presents the use of an alternative source of data, Google Scholar, and its benefits in calculating citation metrics in the humanities. Because of its broader range of data sources, the use of Google Scholar generally results in more comprehensive citation coverage in the humanities. This presentation compares and analyzes some international case studies with ISI Web of Knowledge and Google Scholar. The fields of economics, geography, social sciences, philosophy, and history are focused on to illustrate the differences of results between these two databases. To search for relevant publications in the Google Scholar database, the use of "Publish or Perish" and of CleanPoP, which the author developed to clean the results, are compared.

  20. New tools and methods for direct programmatic access to the dbSNP relational database

    PubMed Central

    Saccone, Scott F.; Quan, Jiaxi; Mehta, Gaurang; Bolze, Raphael; Thomas, Prasanth; Deelman, Ewa; Tischfield, Jay A.; Rice, John P.

    2011-01-01

    Genome-wide association studies often incorporate information from public biological databases in order to provide a biological reference for interpreting the results. The dbSNP database is an extensive source of information on single nucleotide polymorphisms (SNPs) for many different organisms, including humans. We have developed free software that will download and install a local MySQL implementation of the dbSNP relational database for a specified organism. We have also designed a system for classifying dbSNP tables in terms of common tasks we wish to accomplish using the database. For each task we have designed a small set of custom tables that facilitate task-related queries and provide entity-relationship diagrams for each task composed from the relevant dbSNP tables. In order to expose these concepts and methods to a wider audience we have developed web tools for querying the database and browsing documentation on the tables and columns to clarify the relevant relational structure. All web tools and software are freely available to the public at http://cgsmd.isi.edu/dbsnpq. Resources such as these for programmatically querying biological databases are essential for viably integrating biological information into genetic association experiments on a genome-wide scale. PMID:21037260
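
    As a rough sketch of what "direct programmatic access" to such a local MySQL copy might look like, the snippet below connects with mysql-connector-python and queries a task-oriented table. The connection details, database, table and column names are placeholders invented for the example, not the actual dbSNP schema or the custom task tables described in the paper.

      import mysql.connector  # pip install mysql-connector-python

      # Hypothetical connection to a locally installed dbSNP copy.
      conn = mysql.connector.connect(
          host="localhost", user="dbsnp_user", password="secret",
          database="dbsnp_local")
      cur = conn.cursor()

      # Hypothetical task table mapping rs numbers to genomic positions.
      cur.execute(
          "SELECT rs_id, chromosome, position FROM snp_position "
          "WHERE rs_id IN (%s, %s)",
          (12345, 67890))
      for rs_id, chrom, pos in cur.fetchall():
          print(rs_id, chrom, pos)

      cur.close()
      conn.close()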

  1. CDM analysis

    NASA Technical Reports Server (NTRS)

    Larson, Robert E.; Mcentire, Paul L.; Oreilly, John G.

    1993-01-01

    The C Data Manager (CDM) is an advanced tool for creating an object-oriented database and for processing queries related to objects stored in that database. The CDM source code was purchased and will be modified over the course of the Arachnid project. In this report, the modified CDM is referred to as MCDM. Using MCDM, a detailed series of experiments was designed and conducted on a Sun Sparcstation. The primary results and analysis of the CDM experiment are provided in this report. The experiments involved creating the Long-form Faint Source Catalog (LFSC) database and then analyzing it with respect to following: (1) the relationships between the volume of data and the time required to create a database; (2) the storage requirements of the database files; and (3) the properties of query algorithms. The effort focused on defining, implementing, and analyzing seven experimental scenarios: (1) find all sources by right ascension--RA; (2) find all sources by declination--DEC; (3) find all sources in the right ascension interval--RA1, RA2; (4) find all sources in the declination interval--DEC1, DEC2; (5) find all sources in the rectangle defined by--RA1, RA2, DEC1, DEC2; (6) find all sources that meet certain compound conditions; and (7) analyze a variety of query algorithms. Throughout this document, the numerical results obtained from these scenarios are reported; conclusions are presented at the end of the document.
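
    The seven query scenarios are described above only in words. The toy sketch below shows the flavour of scenarios (3)-(5), interval and rectangle searches over right ascension and declination, using a sorted index; it illustrates the kind of query being timed, not the MCDM object-database implementation itself.

      from bisect import bisect_left, bisect_right

      def build_ra_index(sources):
          """Sort (ra, dec, source_id) tuples once by right ascension and keep
          a parallel RA list for bisecting."""
          srt = sorted(sources, key=lambda s: s[0])
          return srt, [s[0] for s in srt]

      def ra_interval(index, ra1, ra2):
          """Scenario 3: all sources with ra1 <= RA <= ra2."""
          srt, ras = index
          return srt[bisect_left(ras, ra1):bisect_right(ras, ra2)]

      def rectangle(index, ra1, ra2, dec1, dec2):
          """Scenario 5: RA interval first, then filter the survivors on DEC."""
          return [s for s in ra_interval(index, ra1, ra2) if dec1 <= s[1] <= dec2]

      idx = build_ra_index([(10.2, -5.0, "A"), (10.8, 3.2, "B"), (200.1, 44.0, "C")])
      print(rectangle(idx, 10.0, 11.0, -10.0, 10.0))  # [(10.2, -5.0, 'A'), (10.8, 3.2, 'B')]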

  2. Fatigue Crack Growth Database for Damage Tolerance Analysis

    NASA Technical Reports Server (NTRS)

    Forman, R. G.; Shivakumar, V.; Cardinal, J. W.; Williams, L. C.; McKeighan, P. C.

    2005-01-01

    The objective of this project was to begin the process of developing a fatigue crack growth database (FCGD) of metallic materials for use in damage tolerance analysis of aircraft structure. For this initial effort, crack growth rate data in the NASGRO (Registered trademark) database, the United States Air Force Damage Tolerant Design Handbook, and other publicly available sources were examined and used to develop a database that characterizes crack growth behavior for specific applications (materials). The focus of this effort was on materials for general commercial aircraft applications, including large transport airplanes, small transport commuter airplanes, general aviation airplanes, and rotorcraft. The end products of this project are the FCGD software and this report. The specific goal of this effort was to present fatigue crack growth data in three usable formats: (1) NASGRO equation parameters, (2) Walker equation parameters, and (3) tabular data points. The development of this FCGD will begin the process of developing a consistent set of standard fatigue crack growth material properties. It is envisioned that the end product of the process will be a general repository for credible and well-documented fracture properties that may be used as a default standard in damage tolerance analyses.
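
    For reference, commonly cited forms of the two crack-growth-rate relations named above are reproduced below, where da/dN is the crack growth per load cycle, ΔK the stress-intensity-factor range and R the stress ratio. These are standard textbook forms given only as an aid to the reader; the report's exact parameterisation, symbols and fitting constants may differ.

      % Walker equation (one common form)
      \frac{da}{dN} = C\left[\Delta K\,(1-R)^{\gamma-1}\right]^{n}

      % NASGRO equation (commonly quoted form), with f the crack-opening function,
      % \Delta K_{th} the threshold range and K_c the critical stress intensity
      \frac{da}{dN} = C\left[\left(\frac{1-f}{1-R}\right)\Delta K\right]^{n}
                      \frac{\left(1-\Delta K_{th}/\Delta K\right)^{p}}
                           {\left(1-K_{\max}/K_{c}\right)^{q}}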

  3. JEnsembl: a version-aware Java API to Ensembl data systems.

    PubMed

    Paterson, Trevor; Law, Andy

    2012-11-01

    The Ensembl Project provides release-specific Perl APIs for efficient, high-level programmatic access to data stored in various Ensembl database schema. Although Perl scripts are perfectly suited to processing large volumes of text-based data, Perl is not ideal for developing large-scale software applications or for embedding in graphical interfaces. The provision of a novel Java API would facilitate type-safe, modular, object-orientated development of new bioinformatics tools with which to access, analyse and visualize Ensembl data. The JEnsembl API implementation provides basic data retrieval and manipulation functionality from the Core, Compara and Variation databases for all species in Ensembl and EnsemblGenomes, and is a platform for the development of a richer API to Ensembl data sources. The JEnsembl architecture uses a text-based configuration module to provide evolving, versioned mappings from database schema to code objects. A single installation of the JEnsembl API can therefore simultaneously and transparently connect to current and previous database instances (such as those in the public archive), thus facilitating better analysis repeatability and allowing 'through time' comparative analyses to be performed. Project development, released code libraries, Maven repository and documentation are hosted at SourceForge (http://jensembl.sourceforge.net).

  4. Kamine Development Corporation Request for a PSD Innovative Control Technology Waiver

    EPA Pesticide Factsheets

    This document may be of assistance in applying the New Source Review (NSR) air permitting regulations including the Prevention of Significant Deterioration (PSD) requirements. This document is part of the NSR Policy and Guidance Database. Some documents in the database are a scanned or retyped version of a paper photocopy of the original. Although we have taken considerable effort to quality assure the documents, some may contain typographical errors. Contact the office that issued the document if you need a copy of the original.

  5. Computer programs to characterize alloys and predict cyclic life using the total strain version of strainrange partitioning: Tutorial and users manual, version 1.0

    NASA Technical Reports Server (NTRS)

    Saltsman, James F.

    1992-01-01

    This manual presents computer programs for characterizing and predicting fatigue and creep-fatigue resistance of metallic materials in the high-temperature, long-life regime for isothermal and nonisothermal fatigue. The programs use the total strain version of Strainrange Partitioning (TS-SRP). An extensive database has also been developed in a parallel effort. This database is probably the largest source of high-temperature, creep-fatigue test data available in the public domain and can be used with other life prediction methods as well. This users manual, software, and database are all in the public domain and are available through COSMIC (382 East Broad Street, Athens, GA 30602; (404) 542-3265, FAX (404) 542-4807). Two disks accompany this manual. The first disk contains the source code, executable files, and sample output from these programs. The second disk contains the creep-fatigue data in a format compatible with these programs.

  6. California Fault Parameters for the National Seismic Hazard Maps and Working Group on California Earthquake Probabilities 2007

    USGS Publications Warehouse

    Wills, Chris J.; Weldon, Ray J.; Bryant, W.A.

    2008-01-01

    This report describes development of fault parameters for the 2007 update of the National Seismic Hazard Maps and the Working Group on California Earthquake Probabilities (WGCEP, 2007). These reference parameters are contained within a database intended to be a source of values for use by scientists interested in producing either seismic hazard or deformation models to better understand the current seismic hazards in California. These parameters include descriptions of the geometry and rates of movements of faults throughout the state. These values are intended to provide a starting point for development of more sophisticated deformation models which include known rates of movement on faults as well as geodetic measurements of crustal movement and the rates of movements of the tectonic plates. The values will be used in developing the next generation of the time-independent National Seismic Hazard Maps, and the time-dependent seismic hazard calculations being developed for the WGCEP. Due to the multiple uses of this information, development of these parameters has been coordinated between USGS, CGS and SCEC. SCEC provided the database development and editing tools, in consultation with USGS, Golden. This database has been implemented in Oracle and supports electronic access (e.g., for on-the-fly access). A GUI-based application has also been developed to aid in populating the database. Both the continually updated 'living' version of this database, as well as any locked-down official releases (e.g., used in a published model for calculating earthquake probabilities or seismic shaking hazards), are part of the USGS Quaternary Fault and Fold Database http://earthquake.usgs.gov/regional/qfaults/. CGS has been primarily responsible for updating and editing of the fault parameters, with extensive input from USGS and SCEC scientists.

  7. A Web-based open-source database for the distribution of hyperspectral signatures

    NASA Astrophysics Data System (ADS)

    Ferwerda, J. G.; Jones, S. D.; Du, Pei-Jun

    2006-10-01

    With the coming of age of field spectroscopy as a non-destructive means to collect information on the physiology of vegetation, there is a need for storage of signatures and, more importantly, their metadata. Without the proper organisation of metadata, the signatures themselves become of limited use. In order to facilitate re-distribution of data, a database for the storage and distribution of hyperspectral signatures and their metadata was designed. The database was built using open-source software, and can be used by the hyperspectral community to share their data. Data are uploaded through a simple web-based interface. The database recognizes major file formats by ASD, GER and International Spectronics. The database source code is available for download through the hyperspectral.info web domain, and we invite suggestions for additions and modifications to the database, to be submitted through the online forums on the same website.

  8. Preliminary geologic map of the Oat Mountain 7.5' quadrangle, Southern California: a digital database

    USGS Publications Warehouse

    Yerkes, R.F.; Campbell, Russell H.

    1995-01-01

    This database, identified as "Preliminary Geologic Map of the Oat Mountain 7.5' Quadrangle, southern California: A Digital Database," has been approved for release and publication by the Director of the USGS. Although this database has been reviewed and is substantially complete, the USGS reserves the right to revise the data pursuant to further analysis and review. This database is released on condition that neither the USGS nor the U. S. Government may be held liable for any damages resulting from its use. This digital map database is compiled from previously published sources combined with some new mapping and modifications in nomenclature. The geologic map database delineates map units that are identified by general age and lithology following the stratigraphic nomenclature of the U. S. Geological Survey. For detailed descriptions of the units, their stratigraphic relations and sources of geologic mapping consult Yerkes and Campbell (1993). More specific information about the units may be available in the original sources.

  9. The plant phenological online database (PPODB): an online database for long-term phenological data

    NASA Astrophysics Data System (ADS)

    Dierenbach, Jonas; Badeck, Franz-W.; Schaber, Jörg

    2013-09-01

    We present an online database that provides unrestricted and free access to over 16 million plant phenological observations from over 8,000 stations in Central Europe between the years 1880 and 2009. Unique features are (1) a flexible and unrestricted access to a full-fledged database, allowing for a wide range of individual queries and data retrieval, (2) historical data for Germany before 1951 ranging back to 1880, and (3) more than 480 curated long-term time series covering more than 100 years for individual phenological phases and plants combined over Natural Regions in Germany. Time series for single stations or Natural Regions can be accessed through a user-friendly graphical geo-referenced interface. The joint databases made available with the plant phenological database PPODB render accessible an important data source for further analyses of long-term changes in phenology. The database can be accessed via www.ppodb.de .

  10. How to prepare a systematic review of economic evaluations for clinical practice guidelines: database selection and search strategy development (part 2/3).

    PubMed

    Thielen, F W; Van Mastrigt, Gapg; Burgers, L T; Bramer, W M; Majoie, Hjm; Evers, Smaa; Kleijnen, J

    2016-12-01

    This article is part of the series "How to prepare a systematic review of economic evaluations (EEs) for informing evidence-based healthcare decisions", in which a five-step approach is proposed. Areas covered: This paper focuses on the selection of relevant databases and the development of a search strategy for detecting EEs, as well as on how to perform the search and how to extract relevant data from retrieved records. Expert commentary: Thus far, little has been published on how to conduct systematic reviews of EEs. Moreover, reliable sources of information, such as the Health Economic Evaluation Database, have ceased to publish updates. Researchers are thus left without authoritative guidance on how to conduct SR-EEs. Together with van Mastrigt et al. we seek to fill this gap.

  11. "TPSX: Thermal Protection System Expert and Material Property Database"

    NASA Technical Reports Server (NTRS)

    Squire, Thomas H.; Milos, Frank S.; Rasky, Daniel J. (Technical Monitor)

    1997-01-01

    The Thermal Protection Branch at NASA Ames Research Center has developed a computer program for storing, organizing, and accessing information about thermal protection materials. The program, called Thermal Protection Systems Expert and Material Property Database, or TPSX, is available for the Microsoft Windows operating system. An "on-line" version is also accessible on the World Wide Web. TPSX is designed to be a high-quality source for TPS material properties presented in a convenient, easily accessible form for use by engineers and researchers in the field of high-speed vehicle design. Data can be displayed and printed in several formats. An information window displays a brief description of the material with properties at standard pressure and temperature. A spreadsheet window displays complete, detailed property information. Properties which are a function of temperature and/or pressure can be displayed as graphs. In any display the data can be converted from English to SI units with the click of a button. Two material databases are included with TPSX: 1) materials used and/or developed by the Thermal Protection Branch at NASA Ames Research Center, and 2) a database compiled by NASA Johnson Space Center (JSC). The Ames database contains over 60 advanced TPS materials including flexible blankets, rigid ceramic tiles, and ultra-high temperature ceramics. The JSC database contains over 130 insulative and structural materials. The Ames database is periodically updated and expanded as required to include newly developed materials and material property refinements.

  12. The Protein Identifier Cross-Referencing (PICR) service: reconciling protein identifiers across multiple source databases.

    PubMed

    Côté, Richard G; Jones, Philip; Martens, Lennart; Kerrien, Samuel; Reisinger, Florian; Lin, Quan; Leinonen, Rasko; Apweiler, Rolf; Hermjakob, Henning

    2007-10-18

    Each major protein database uses its own conventions when assigning protein identifiers. Resolving the various, potentially unstable, identifiers that refer to identical proteins is a major challenge. This is a common problem when attempting to unify datasets that have been annotated with proteins from multiple data sources or querying data providers with one flavour of protein identifiers when the source database uses another. Partial solutions for protein identifier mapping exist but they are limited to specific species or techniques and to a very small number of databases. As a result, we have not found a solution that is generic enough and broad enough in mapping scope to suit our needs. We have created the Protein Identifier Cross-Reference (PICR) service, a web application that provides interactive and programmatic (SOAP and REST) access to a mapping algorithm that uses the UniProt Archive (UniParc) as a data warehouse to offer protein cross-references based on 100% sequence identity to proteins from over 70 distinct source databases loaded into UniParc. Mappings can be limited by source database, taxonomic ID and activity status in the source database. Users can copy/paste or upload files containing protein identifiers or sequences in FASTA format to obtain mappings using the interactive interface. Search results can be viewed in simple or detailed HTML tables or downloaded as comma-separated values (CSV) or Microsoft Excel (XLS) files suitable for use in a local database or a spreadsheet. Alternatively, a SOAP interface is available to integrate PICR functionality in other applications, as is a lightweight REST interface. We offer a publicly available service that can interactively map protein identifiers and protein sequences to the majority of commonly used protein databases. Programmatic access is available through a standards-compliant SOAP interface or a lightweight REST interface. The PICR interface, documentation and code examples are available at http://www.ebi.ac.uk/Tools/picr.

  13. The Protein Identifier Cross-Referencing (PICR) service: reconciling protein identifiers across multiple source databases

    PubMed Central

    Côté, Richard G; Jones, Philip; Martens, Lennart; Kerrien, Samuel; Reisinger, Florian; Lin, Quan; Leinonen, Rasko; Apweiler, Rolf; Hermjakob, Henning

    2007-01-01

    Background Each major protein database uses its own conventions when assigning protein identifiers. Resolving the various, potentially unstable, identifiers that refer to identical proteins is a major challenge. This is a common problem when attempting to unify datasets that have been annotated with proteins from multiple data sources or querying data providers with one flavour of protein identifiers when the source database uses another. Partial solutions for protein identifier mapping exist but they are limited to specific species or techniques and to a very small number of databases. As a result, we have not found a solution that is generic enough and broad enough in mapping scope to suit our needs. Results We have created the Protein Identifier Cross-Reference (PICR) service, a web application that provides interactive and programmatic (SOAP and REST) access to a mapping algorithm that uses the UniProt Archive (UniParc) as a data warehouse to offer protein cross-references based on 100% sequence identity to proteins from over 70 distinct source databases loaded into UniParc. Mappings can be limited by source database, taxonomic ID and activity status in the source database. Users can copy/paste or upload files containing protein identifiers or sequences in FASTA format to obtain mappings using the interactive interface. Search results can be viewed in simple or detailed HTML tables or downloaded as comma-separated values (CSV) or Microsoft Excel (XLS) files suitable for use in a local database or a spreadsheet. Alternatively, a SOAP interface is available to integrate PICR functionality in other applications, as is a lightweight REST interface. Conclusion We offer a publicly available service that can interactively map protein identifiers and protein sequences to the majority of commonly used protein databases. Programmatic access is available through a standards-compliant SOAP interface or a lightweight REST interface. The PICR interface, documentation and code examples are available at http://www.ebi.ac.uk/Tools/picr. PMID:17945017

  14. A multimedia perioperative record keeper for clinical research.

    PubMed

    Perrino, A C; Luther, M A; Phillips, D B; Levin, F L

    1996-05-01

    To develop a multimedia perioperative recordkeeper that provides: 1. synchronous, real-time acquisition of multimedia data, 2. on-line access to the patient's chart data, and 3. advanced data analysis capabilities through integrated multimedia database and analysis applications. To minimize cost and development time, the system design utilized industry-standard hardware components and graphical software development tools. The system was configured to use a Pentium PC complemented with a variety of hardware interfaces to external data sources. These sources included physiologic monitors with data in digital, analog, video, and audio as well as paper-based formats. The development process was guided by trials in over 80 clinical cases and by critiques from numerous users. As a result of this process, a suite of custom software applications was created to meet the design goals. The Perioperative Data Acquisition application manages data collection from a variety of physiological monitors. The Charter application provides for rapid creation of an electronic medical record from the patient's paper-based chart and investigator's notes. The Multimedia Medical Database application provides a relational database for the organization and management of multimedia data. The Triscreen application provides an integrated data analysis environment with simultaneous, full-motion data display. With recent technological advances in PC power, data acquisition hardware, and software development tools, the clinical researcher now has the ability to collect and examine a more complete perioperative record. It is hoped that the description of the MPR and its development process will assist and encourage others to advance these tools for perioperative research.

  15. Common characteristics of open source software development and applicability for drug discovery: a systematic review.

    PubMed

    Ardal, Christine; Alstadsæter, Annette; Røttingen, John-Arne

    2011-09-28

    Innovation through an open source model has proven to be successful for software development. This success has led many to speculate if open source can be applied to other industries with similar success. We attempt to provide an understanding of open source software development characteristics for researchers, business leaders and government officials who may be interested in utilizing open source innovation in other contexts and with an emphasis on drug discovery. A systematic review was performed by searching relevant, multidisciplinary databases to extract empirical research regarding the common characteristics and barriers of initiating and maintaining an open source software development project. Common characteristics to open source software development pertinent to open source drug discovery were extracted. The characteristics were then grouped into the areas of participant attraction, management of volunteers, control mechanisms, legal framework and physical constraints. Lastly, their applicability to drug discovery was examined. We believe that the open source model is viable for drug discovery, although it is unlikely that it will exactly follow the form used in software development. Hybrids will likely develop that suit the unique characteristics of drug discovery. We suggest potential motivations for organizations to join an open source drug discovery project. We also examine specific differences between software and medicines, specifically how the need for laboratories and physical goods will impact the model as well as the effect of patents.

  16. Federated Access to Heterogeneous Information Resources in the Neuroscience Information Framework (NIF)

    PubMed Central

    Gupta, Amarnath; Bug, William; Marenco, Luis; Qian, Xufei; Condit, Christopher; Rangarajan, Arun; Müller, Hans Michael; Miller, Perry L.; Sanders, Brian; Grethe, Jeffrey S.; Astakhov, Vadim; Shepherd, Gordon; Sternberg, Paul W.; Martone, Maryann E.

    2009-01-01

    The overarching goal of the NIF (Neuroscience Information Framework) project is to be a one-stop shop for neuroscience. This paper provides a technical overview of how the system is designed. The technical goal of the first version of the NIF system was to develop an information system that a neuroscientist can use to locate relevant information from a wide variety of information sources by simple keyword queries. Although the user provides only keywords to retrieve information, the NIF system is designed to treat them as concepts whose meanings are interpreted by the system. Thus, a search for a term should find records containing synonyms of that term. The system is targeted to find information from web pages, publications, databases, web sites built upon databases, XML documents and any other modality in which such information may be published. We have designed a system to achieve this functionality. A central element in the system is an ontology called NIFSTD (for NIF Standard), constructed by amalgamating a number of known and newly developed ontologies. NIFSTD is used by our ontology management module, called OntoQuest, to perform ontology-based search over data sources. The NIF architecture currently provides three different mechanisms for searching heterogeneous data sources, including relational databases, web sites, XML documents and the full text of publications. Version 1.0 of the NIF system is currently in beta test and may be accessed through http://nif.nih.gov. PMID:18958629
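
    The ontology-backed keyword expansion is described above only conceptually. As a toy illustration of the idea (treating a keyword as a concept and matching records that contain any of its synonyms), consider the sketch below; the vocabulary and records are invented and do not represent NIFSTD or OntoQuest.

      # Toy concept-based search: expand a keyword to its ontology synonyms
      # before matching records. Vocabulary and records are made up.
      SYNONYMS = {
          "striatum": {"striatum", "neostriatum", "corpus striatum"},
      }

      def concept_search(keyword, records):
          terms = SYNONYMS.get(keyword.lower(), {keyword.lower()})
          return [r for r in records if any(t in r.lower() for t in terms)]

      records = [
          "Dopamine receptor density in the neostriatum of adult rats",
          "Cerebellar granule cell firing rates",
      ]
      print(concept_search("striatum", records))
      # matches the first record via the synonym 'neostriatum'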

  17. Federated access to heterogeneous information resources in the Neuroscience Information Framework (NIF).

    PubMed

    Gupta, Amarnath; Bug, William; Marenco, Luis; Qian, Xufei; Condit, Christopher; Rangarajan, Arun; Müller, Hans Michael; Miller, Perry L; Sanders, Brian; Grethe, Jeffrey S; Astakhov, Vadim; Shepherd, Gordon; Sternberg, Paul W; Martone, Maryann E

    2008-09-01

    The overarching goal of the NIF (Neuroscience Information Framework) project is to be a one-stop shop for neuroscience. This paper provides a technical overview of how the system is designed. The technical goal of the first version of the NIF system was to develop an information system that a neuroscientist can use to locate relevant information from a wide variety of information sources by simple keyword queries. Although the user provides only keywords to retrieve information, the NIF system is designed to treat them as concepts whose meanings are interpreted by the system. Thus, a search for a term should find records containing synonyms of that term. The system is targeted to find information from web pages, publications, databases, web sites built upon databases, XML documents and any other modality in which such information may be published. We have designed a system to achieve this functionality. A central element in the system is an ontology called NIFSTD (for NIF Standard), constructed by amalgamating a number of known and newly developed ontologies. NIFSTD is used by our ontology management module, called OntoQuest, to perform ontology-based search over data sources. The NIF architecture currently provides three different mechanisms for searching heterogeneous data sources, including relational databases, web sites, XML documents and the full text of publications. Version 1.0 of the NIF system is currently in beta test and may be accessed through http://nif.nih.gov.

  18. Canto: an online tool for community literature curation.

    PubMed

    Rutherford, Kim M; Harris, Midori A; Lock, Antonia; Oliver, Stephen G; Wood, Valerie

    2014-06-15

    Detailed curation of published molecular data is essential for any model organism database. Community curation enables researchers to contribute data from their papers directly to databases, supplementing the activity of professional curators and improving coverage of a growing body of literature. We have developed Canto, a web-based tool that provides an intuitive curation interface for both curators and researchers, to support community curation in the fission yeast database, PomBase. Canto supports curation using OBO ontologies, and can be easily configured for use with any species. Canto code and documentation are available under an Open Source license from http://curation.pombase.org/. Canto is a component of the Generic Model Organism Database (GMOD) project (http://www.gmod.org/). © The Author 2014. Published by Oxford University Press.

  19. ExplorEnz: the primary source of the IUBMB enzyme list

    PubMed Central

    McDonald, Andrew G.; Boyce, Sinéad; Tipton, Keith F.

    2009-01-01

    ExplorEnz is the MySQL database that is used for the curation and dissemination of the International Union of Biochemistry and Molecular Biology (IUBMB) Enzyme Nomenclature. A simple web-based query interface is provided, along with an advanced search engine for more complex Boolean queries. The WWW front-end is accessible at http://www.enzyme-database.org, from where downloads of the database as SQL and XML are also available. An associated form-based curatorial application has been developed to facilitate the curation of enzyme data as well as the internal and public review processes that occur before an enzyme entry is made official. Suggestions for new enzyme entries, or modifications to existing ones, can be made using the forms provided at http://www.enzyme-database.org/forms.php. PMID:18776214

  20. A global building inventory for earthquake loss estimation and risk management

    USGS Publications Warehouse

    Jaiswal, K.; Wald, D.; Porter, K.

    2010-01-01

    We develop a global database of building inventories using a taxonomy of global building types for use in near-real-time post-earthquake loss estimation and pre-earthquake risk analysis, for the U.S. Geological Survey's Prompt Assessment of Global Earthquakes for Response (PAGER) program. The database is available for public use, subject to peer review, scrutiny, and open enhancement. On a country-by-country level, it contains estimates of the distribution of building types categorized by material, lateral force resisting system, and occupancy type (residential or nonresidential, urban or rural). The database draws on and harmonizes numerous sources: (1) UN statistics, (2) UN Habitat's demographic and health survey (DHS) database, (3) national housing censuses, (4) the World Housing Encyclopedia and (5) other literature. © 2010, Earthquake Engineering Research Institute.

  1. BIRS - Bioterrorism Information Retrieval System.

    PubMed

    Tewari, Ashish Kumar; Rashi; Wadhwa, Gulshan; Sharma, Sanjeev Kumar; Jain, Chakresh Kumar

    2013-01-01

    Bioterrorism is the intentional use of pathogenic strains of microbes to spread terror in a population. There is a definite need to promote research into the development of vaccines, therapeutics and diagnostic methods as part of preparedness for any future bioterror attack. BIRS is an open-access database of collective information on the organisms related to bioterrorism. The database architecture utilizes current open-source technologies, viz. PHP ver. 5.3.19, MySQL and an IIS server under the Windows platform. The database stores information on the literature, generic information and unique pathways of about 10 microorganisms involved in bioterrorism. It may serve as a collective repository to accelerate drug discovery and vaccine design against such bioterrorist agents (microbes). The available data have been validated from various online resources and by literature mining in order to provide users with a comprehensive information system. The database is freely available at http://www.bioterrorism.biowaves.org.

  2. Method of identification of patent trends based on descriptions of technical functions

    NASA Astrophysics Data System (ADS)

    Korobkin, D. M.; Fomenkov, S. A.; Golovanchikov, A. B.

    2018-05-01

    The use of the global patent space to determine scientific and technological priorities for the development of technical systems (identifying patent trends) allows one to forecast the direction of technical system development and, accordingly, to select patents on priority technical subjects as a source for updating the technical functions database and the physical effects database. The authors propose an original method that uses as trend terms not individual unigrams or n-grams (as is usual for existing methods and systems), but structured descriptions of technical functions in the form "Subject-Action-Object" (SAO), which in the authors' opinion are the basis of the invention.
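
    The abstract names Subject-Action-Object triples as the trend terms but does not detail their extraction. The sketch below shows one generic way to pull SAO triples from text using spaCy's dependency parse; the model name and the simple nsubj/dobj rules are illustrative assumptions, not the authors' actual pipeline.

      # pip install spacy && python -m spacy download en_core_web_sm
      import spacy

      nlp = spacy.load("en_core_web_sm")

      def extract_sao(text):
          """Return (subject, action, object) triples found in `text`."""
          triples = []
          for token in nlp(text):
              if token.pos_ != "VERB":
                  continue
              subjects = [c for c in token.children if c.dep_ in ("nsubj", "nsubjpass")]
              objects = [c for c in token.children if c.dep_ in ("dobj", "obj")]
              for s in subjects:
                  for o in objects:
                      triples.append((s.lemma_, token.lemma_, o.lemma_))
          return triples

      print(extract_sao("The piston compresses the gas inside the cylinder."))
      # e.g. [('piston', 'compress', 'gas')]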

  3. The Tropical Biominer Project: mining old sources for new drugs.

    PubMed

    Artiguenave, François; Lins, André; Maciel, Wesley Dias; Junior, Antonio Celso Caldeira; Nacif-Coelho, Carla; de Souza Linhares, Maria Margarida Ribeiro; de Oliveira, Guilherme Correa; Barbosa, Luis Humberto Rezende; Lopes, Júlio César Dias; Junior, Claudionor Nunes Coelho

    2005-01-01

    The Tropical Biominer Project is a recent initiative of the Federal University of Minas Gerais (UFMG) and the Oswaldo Cruz Foundation, with the participation of the Biominas Foundation (Belo Horizonte, Minas Gerais, Brazil) and the start-up Homologix. The main objective of the project is to build a new resource for chemogenomics research on chemical compounds, with a strong emphasis on natural molecules. The adopted technologies include searching for information in structured, semi-structured and non-structured documents (the last two from the web) and data-mining tools in order to gather information from different sources. The database supports the development of applications to find new potential treatments for parasitic infections using virtual screening tools. We present here the midpoint of the project: the conception and implementation of the Tropical Biominer Database. This is a federated database designed to store data from different resources. Connected to the database, a web crawler is able to gather information from distinct, patented web sites and store it after automatic classification using data-mining tools. Finally, we demonstrate the value of the approach by formulating new hypotheses on specific targets of a natural compound, violacein, using inferences from a virtual screening procedure.

  4. Magnetic Fields for All: The GPIPS Community Web-Access Portal

    NASA Astrophysics Data System (ADS)

    Carveth, Carol; Clemens, D. P.; Pinnick, A.; Pavel, M.; Jameson, K.; Taylor, B.

    2007-12-01

    The new GPIPS website portal provides community users with an intuitive and powerful interface to query the data products of the Galactic Plane Infrared Polarization Survey. The website, which was built using PHP for the front end and MySQL for the database back end, allows users to issue queries based on galactic or equatorial coordinates, GPIPS-specific identifiers, polarization information, magnitude information, and several other attributes. The returns are presented in HTML tables, with the added option of either downloading or being emailed an ASCII file including the same or more information from the database. Other functionalities of the website include providing details of the status of the Survey (which fields have been observed or are planned to be observed), techniques involved in data collection and analysis, and descriptions of the database contents and names. For this initial launch of the website, users may access the GPIPS polarization point source catalog and the deep coadd photometric point source catalog. Future planned developments include a graphics-based method for querying the database, as well as tools to combine neighboring GPIPS images into larger image files for both polarimetry and photometry. This work is partially supported by NSF grant AST-0607500.

  5. VitisExpDB: a database resource for grape functional genomics.

    PubMed

    Doddapaneni, Harshavardhan; Lin, Hong; Walker, M Andrew; Yao, Jiqiang; Civerolo, Edwin L

    2008-02-28

    The family Vitaceae consists of many different grape species that grow in a range of climatic conditions. In the past few years, several studies have generated functional genomic information on different Vitis species and cultivars, including the European grape vine, Vitis vinifera. Our goal is to develop a comprehensive web data source for Vitaceae. VitisExpDB is an online MySQL-PHP driven relational database that houses annotated EST and gene expression data for V. vinifera and non-vinifera grape species and varieties. Currently, the database stores approximately 320,000 EST sequences derived from 8 species/hybrids, their annotation (BLAST top match) details and Gene Ontology based structured vocabulary. Putative homologs for each EST in other species and varieties along with information on their percent nucleotide identities, phylogenetic relationship and common primers can be retrieved. The database also includes information on probe sequence and annotation features of the high density 60-mer gene expression chip consisting of a non-redundant set of approximately 20,000 ESTs. Finally, the database includes 14 processed global microarray expression profile sets. Data from 12 of these expression profile sets have been mapped onto metabolic pathways. A user-friendly web interface with multiple search indices and extensively hyperlinked result features that permit efficient data retrieval has been developed. Several online bioinformatics tools that interact with the database along with other sequence analysis tools have been added. In addition, users can submit their ESTs to the database. The developed database provides a genomic resource to the grape community for functional analysis of genes in the collection and for grape genome annotation and gene function identification. The VitisExpDB database is available through our website http://cropdisease.ars.usda.gov/vitis_at/main-page.htm.

  6. VitisExpDB: A database resource for grape functional genomics

    PubMed Central

    Doddapaneni, Harshavardhan; Lin, Hong; Walker, M Andrew; Yao, Jiqiang; Civerolo, Edwin L

    2008-01-01

    Background The family Vitaceae consists of many different grape species that grow in a range of climatic conditions. In the past few years, several studies have generated functional genomic information on different Vitis species and cultivars, including the European grape vine, Vitis vinifera. Our goal is to develop a comprehensive web data source for Vitaceae. Description VitisExpDB is an online MySQL-PHP driven relational database that houses annotated EST and gene expression data for V. vinifera and non-vinifera grape species and varieties. Currently, the database stores ~320,000 EST sequences derived from 8 species/hybrids, their annotation (BLAST top match) details and Gene Ontology based structured vocabulary. Putative homologs for each EST in other species and varieties along with information on their percent nucleotide identities, phylogenetic relationship and common primers can be retrieved. The database also includes information on probe sequence and annotation features of the high density 60-mer gene expression chip consisting of a non-redundant set of ~20,000 ESTs. Finally, the database includes 14 processed global microarray expression profile sets. Data from 12 of these expression profile sets have been mapped onto metabolic pathways. A user-friendly web interface with multiple search indices and extensively hyperlinked result features that permit efficient data retrieval has been developed. Several online bioinformatics tools that interact with the database along with other sequence analysis tools have been added. In addition, users can submit their ESTs to the database. Conclusion The developed database provides a genomic resource to the grape community for functional analysis of genes in the collection and for grape genome annotation and gene function identification. The VitisExpDB database is available through our website http://cropdisease.ars.usda.gov/vitis_at/main-page.htm. PMID:18307813

  7. Review and Comparison of the Search Effectiveness and User Interface of Three Major Online Chemical Databases

    ERIC Educational Resources Information Center

    Bharti, Neelam; Leonard, Michelle; Singh, Shailendra

    2016-01-01

    Online chemical databases are the largest source of chemical information and, therefore, the main resource for retrieving results from published journals, books, patents, conference abstracts, and other relevant sources. Various commercial, as well as free, chemical databases are available. SciFinder, Reaxys, and Web of Science are three major…

  8. An image database management system for conducting CAD research

    NASA Astrophysics Data System (ADS)

    Gruszauskas, Nicholas; Drukker, Karen; Giger, Maryellen L.

    2007-03-01

    The development of image databases for CAD research is not a trivial task. The collection and management of images and their related metadata from multiple sources is a time-consuming but necessary process. By standardizing and centralizing the methods by which these data are maintained, one can generate subsets of a larger database that match the specific criteria needed for a particular research project in a quick and efficient manner. A research-oriented management system of this type is highly desirable in a multi-modality CAD research environment. An online, web-based database system for the storage and management of research-specific medical image metadata was designed for use with four modalities of breast imaging: screen-film mammography, full-field digital mammography, breast ultrasound and breast MRI. The system was designed to consolidate data from multiple clinical sources and provide the user with the ability to anonymize the data. Input concerning the type of data to be stored as well as desired searchable parameters was solicited from researchers in each modality. The backbone of the database was created using MySQL. A robust and easy-to-use interface for entering, removing, modifying and searching information in the database was created using HTML and PHP. This standardized system can be accessed using any modern web-browsing software and is fundamental for our various research projects on computer-aided detection, diagnosis, cancer risk assessment, multimodality lesion assessment, and prognosis. Our CAD database system stores large amounts of research-related metadata and successfully generates subsets of cases that match the user's desired search criteria.
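
    A minimal sketch of the kind of metadata store and criteria-based subset query described above, using Python's built-in sqlite3 as a stand-in for the MySQL backbone; the table name, fields and example values are hypothetical, not the system's actual schema.

      # Illustrative image-metadata table and subset query (SQLite stands in
      # for the MySQL backbone described in the abstract).
      import sqlite3

      conn = sqlite3.connect(":memory:")
      conn.execute("""
          CREATE TABLE image_metadata (
              image_id      INTEGER PRIMARY KEY,
              anon_case_id  TEXT NOT NULL,   -- anonymized case identifier
              modality      TEXT NOT NULL,   -- e.g. 'SFM', 'FFDM', 'US', 'MRI'
              finding       TEXT,            -- e.g. 'mass', 'calcification'
              biopsy_proven INTEGER          -- 1 = malignant, 0 = benign
          )""")
      conn.executemany(
          "INSERT INTO image_metadata (anon_case_id, modality, finding, biopsy_proven)"
          " VALUES (?, ?, ?, ?)",
          [("case-001", "US", "mass", 1),
           ("case-002", "FFDM", "calcification", 0),
           ("case-003", "MRI", "mass", 0)])

      # Generate a research subset matching specific criteria.
      subset = conn.execute(
          "SELECT anon_case_id FROM image_metadata"
          " WHERE modality = ? AND finding = ?", ("US", "mass")).fetchall()
      print(subset)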

  9. Konnichi Wa, Nihon (Hello, Japan!): Best Databases for Business, Technology and News.

    ERIC Educational Resources Information Center

    Hoetker, Glenn

    1994-01-01

    Describes online information sources for Japanese business, scientific, and technical developments. Highlights include English language materials versus the need for translation from Japanese; government research; scientific and technical information; patent information; corporate financial information; business information from newswires and…

  10. EVALUATION OF PUBLIC DATABASES AS SOURCES OF DATA FOR LIFE CYCLE ASSESSMENTS

    EPA Science Inventory

    Methods to determine the environmental effects of production systems must encourage a comprehensive evaluation of all "upstream" and "downstream" effects and their interrelationships. This cradle-to-grave approach, called Life Cycle Assessment (LCA), has led to the development...

  11. WASHINGTON DAIRIES

    EPA Science Inventory

    The dairy_wa.zip file is a zip file containing an Arc/Info export file and a text document. Note the DISCLAIM.TXT file as these data are not verified. Map extent: statewide. Input Source: Address database obtained from Wa Dept of Agriculture. Data was originally developed und...

  12. JEnsembl: a version-aware Java API to Ensembl data systems

    PubMed Central

    Paterson, Trevor; Law, Andy

    2012-01-01

    Motivation: The Ensembl Project provides release-specific Perl APIs for efficient high-level programmatic access to data stored in various Ensembl database schema. Although Perl scripts are perfectly suited for processing large volumes of text-based data, Perl is not ideal for developing large-scale software applications nor embedding in graphical interfaces. The provision of a novel Java API would facilitate type-safe, modular, object-orientated development of new Bioinformatics tools with which to access, analyse and visualize Ensembl data. Results: The JEnsembl API implementation provides basic data retrieval and manipulation functionality from the Core, Compara and Variation databases for all species in Ensembl and EnsemblGenomes and is a platform for the development of a richer API to Ensembl datasources. The JEnsembl architecture uses a text-based configuration module to provide evolving, versioned mappings from database schema to code objects. A single installation of the JEnsembl API can therefore simultaneously and transparently connect to current and previous database instances (such as those in the public archive) thus facilitating better analysis repeatability and allowing ‘through time’ comparative analyses to be performed. Availability: Project development, released code libraries, Maven repository and documentation are hosted at SourceForge (http://jensembl.sourceforge.net). Contact: jensembl-develop@lists.sf.net, andy.law@roslin.ed.ac.uk, trevor.paterson@roslin.ed.ac.uk PMID:22945789

  13. A database of natural products and chemical entities from marine habitat

    PubMed Central

    Babu, Padavala Ajay; Puppala, Suma Sree; Aswini, Satyavarapu Lakshmi; Vani, Metta Ramya; Kumar, Chinta Narasimha; Prasanna, Tallapragada

    2008-01-01

    Marine compound database consists of marine natural products and chemical entities, collected from various literature sources, which are known to possess bioactivity against human diseases. The database is constructed using html code. The 12 categories of 182 compounds are provided with the source, compound name, 2-dimensional structure, bioactivity and clinical trial information. The database is freely available online and can be accessed at http://www.progenebio.in/mcdb/index.htm PMID:19238254

  14. Facilitating Science Discoveries from NED Today and in the 2020s

    NASA Astrophysics Data System (ADS)

    Mazzarella, Joseph M.; NED Team

    2018-06-01

    I will review recent developments, work in progress, and major challenges that lie ahead as we enhance the capabilities of the NASA/IPAC Extragalactic Database (NED) to facilitate and accelerate multi-wavelength research on objects beyond our Milky Way galaxy. The recent fusion of data for over 470 million sources from the 2MASS Point Source Catalog and approximately 750 million sources from the AllWISE Source Catalog (next up) with redshifts from the SDSS and other data in NED is increasing the holdings to over a billion distinct objects with cross-identifications, providing a rich resource for multi-wavelength research. Combining data across such large surveys, as well as integrating data from over 110,000 smaller but scientifically important catalogs and journal articles, presents many challenges including the need to update the computing infrastructure and re-tool production and operations on a regular basis. Integration of the Firefly toolkit into the new user interface is ushering in a new phase of interactive data visualization in NED, with features and capabilities familiar to users of IRSA and the emerging LSST science user interface. Graphical characterizations of NED content and estimates of completeness in different sky and spectral regions are also being developed. A newly implemented service that follows the Table Access Protocol (TAP) enables astronomers to issue queries to the NED object directory using the Astronomical Data Query Language (ADQL), a standard shared in common with the NASA mission archives and other virtual observatories around the world. A brief review will be given of new science capabilities under development and planned for 2019-2020, as well as initiatives underway involving deployment of a parallel database, cloud technologies, machine learning, and first steps in bringing analysis capabilities close to the database in collaboration with IRSA. I will close with some questions for the community to consider in helping us plan future science capabilities and directions for NED in the 2020s.
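
    As an illustration of issuing a TAP/ADQL query of this kind from a script, a short Python sketch using the pyvo package; the service URL, table name (NEDTAP.objdir) and column names are assumptions and should be checked against the current NED TAP documentation.

      # Query the NED object directory through TAP with an ADQL statement.
      import pyvo

      tap = pyvo.dal.TAPService("https://ned.ipac.caltech.edu/tap/")  # assumed endpoint
      adql = """
          SELECT TOP 10 prefname, ra, dec, z
          FROM NEDTAP.objdir
          WHERE z IS NOT NULL AND z BETWEEN 0.01 AND 0.05
      """
      result = tap.search(adql)   # synchronous query
      print(result.to_table())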

  15. The Cardiac Safety Research Consortium ECG database.

    PubMed

    Kligfield, Paul; Green, Cynthia L

    2012-01-01

    The Cardiac Safety Research Consortium (CSRC) ECG database was initiated to foster research using anonymized, XML-formatted, digitized ECGs with corresponding descriptive variables from placebo- and positive-control arms of thorough QT studies submitted to the US Food and Drug Administration (FDA) by pharmaceutical sponsors. The database can be expanded to other data that are submitted directly to CSRC from other sources, and currently includes digitized ECGs from patients with genotyped varieties of congenital long-QT syndrome; this congenital long-QT database is also linked to ambulatory electrocardiograms stored in the Telemetric and Holter ECG Warehouse (THEW). Thorough QT data sets are available from CSRC for unblinded development of algorithms for analysis of repolarization and for blinded comparative testing of algorithms developed for the identification of moxifloxacin, as used as a positive control in thorough QT studies. Policies and procedures for access to these data sets are available from CSRC, which has developed tools for statistical analysis of blinded new algorithm performance. A recently approved CSRC project will create a data set for blinded analysis of automated ECG interval measurements, whose initial focus will include comparison of four of the major manufacturers of automated electrocardiographs in the United States. CSRC welcomes application for use of the ECG database for clinical investigation. Copyright © 2012 Elsevier Inc. All rights reserved.

  16. Evaluating Land-Atmosphere Interactions with the North American Soil Moisture Database

    NASA Astrophysics Data System (ADS)

    Giles, S. M.; Quiring, S. M.; Ford, T.; Chavez, N.; Galvan, J.

    2015-12-01

    The North American Soil Moisture Database (NASMD) is a high-quality observational soil moisture database that was developed to study land-atmosphere interactions. It includes over 1,800 monitoring stations in the United States, Canada and Mexico. Soil moisture data are collected from multiple sources, quality controlled and integrated into an online database (soilmoisture.tamu.edu). The period of record varies substantially and only a few of these stations have an observation record extending back into the 1990s. Daily soil moisture observations have been quality controlled using the North American Soil Moisture Database QAQC algorithm. The database is designed to facilitate observationally-driven investigations of land-atmosphere interactions, validation of the accuracy of soil moisture simulations in global land surface models, satellite calibration/validation for SMOS and SMAP, and an improved understanding of how soil moisture influences climate on seasonal to interannual timescales. This paper provides some examples of how the NASMD has been utilized to enhance understanding of land-atmosphere interactions in the U.S. Great Plains.
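
    A toy sketch of a daily quality-control screen in the spirit of the QAQC step mentioned above; the thresholds and flag codes are illustrative assumptions, not the published NASMD algorithm.

      # Flag daily volumetric water content (VWC) values that fall outside a
      # physical range or jump implausibly between consecutive days.
      import numpy as np

      def flag_daily_vwc(vwc, lo=0.0, hi=0.6, max_step=0.3):
          """0 = pass, 1 = out of physical range, 2 = implausible daily jump."""
          vwc = np.asarray(vwc, dtype=float)
          flags = np.zeros(vwc.shape, dtype=int)
          flags[(vwc < lo) | (vwc > hi)] = 1
          step = np.abs(np.diff(vwc, prepend=vwc[0]))
          flags[(flags == 0) & (step > max_step)] = 2
          return flags

      print(flag_daily_vwc([0.21, 0.22, 0.80, 0.23, 0.58]))  # -> [0 0 1 2 2]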

  17. Human grasping database for activities of daily living with depth, color and kinematic data streams.

    PubMed

    Saudabayev, Artur; Rysbek, Zhanibek; Khassenova, Raykhan; Varol, Huseyin Atakan

    2018-05-29

    This paper presents a grasping database collected from multiple human subjects for activities of daily living in unstructured environments. The main strength of this database is the use of three different sensing modalities: color images from a head-mounted action camera, distance data from a depth sensor on the dominant arm and upper body kinematic data acquired from an inertial motion capture suit. 3826 grasps were identified in the data collected during 9-hours of experiments. The grasps were grouped according to a hierarchical taxonomy into 35 different grasp types. The database contains information related to each grasp and associated sensor data acquired from the three sensor modalities. We also provide our data annotation software written in Matlab as an open-source tool. The size of the database is 172 GB. We believe this database can be used as a stepping stone to develop big data and machine learning techniques for grasping and manipulation with potential applications in rehabilitation robotics and intelligent automation.

  18. Comparison of Online Agricultural Information Services.

    ERIC Educational Resources Information Center

    Reneau, Fred; Patterson, Richard

    1984-01-01

    Outlines major online agricultural information services--agricultural databases, databases with agricultural services, educational databases in agriculture--noting services provided, access to the database, and costs. Benefits of online agricultural database sources (availability of agricultural marketing, weather, commodity prices, management…

  19. Optical Characteristics of Astrometric Radio Sources OCARS

    NASA Astrophysics Data System (ADS)

    Malkin, Z.

    2013-04-01

    In this paper, the current status of the catalog of Optical Characteristics of Astrometric Radio Sources (OCARS) is presented. The catalog includes radio sources observed in various astrometric and geodetic VLBI programs in 1979-2012. For these sources the physical object type, redshift and visual or infrared magnitude are given when available. Detailed comments are provided when problems with published data were encountered. Since the first version was created in December 2007, the catalog has been continuously developed and expanded through the inclusion of new radio sources and the addition of new, or correction of old, astrophysical data. Several sources of information are used for OCARS. The main ones are the NASA/IPAC Extragalactic Database (NED) and the SIMBAD astronomical database. In addition, several astronomical journals and the arXiv repository are regularly monitored, so that new data are included in OCARS soon after publication. The redshifts of about 150 sources have been determined from dedicated optical spectroscopic observations. As of October 2012, the OCARS catalog includes 7173 radio sources; 3898 sources have a known redshift, and 4860 sources have a known magnitude. In 2009, it was used as supplementary material to the ICRF2. A list of radio sources with a good observational history but lacking astrophysical information is provided for planning optical observations of the most important astrometric sources. The OCARS catalog is updated on average every few weeks and is available at http://www.gao.spb.ru/english/as/ac_vlbi/ocars.txt.

  20. Molecule database framework: a framework for creating database applications with chemical structure search capability

    PubMed Central

    2013-01-01

    Background Research in organic chemistry generates samples of novel chemicals together with their properties and other related data. The involved scientists must be able to store this data and search it by chemical structure. There are commercial solutions for common needs like chemical registration systems or electronic lab notebooks. However for specific requirements of in-house databases and processes no such solutions exist. Another issue is that commercial solutions have the risk of vendor lock-in and may require an expensive license of a proprietary relational database management system. To speed up and simplify the development for applications that require chemical structure search capabilities, I have developed Molecule Database Framework. The framework abstracts the storing and searching of chemical structures into method calls. Therefore software developers do not require extensive knowledge about chemistry and the underlying database cartridge. This decreases application development time. Results Molecule Database Framework is written in Java and I created it by integrating existing free and open-source tools and frameworks. The core functionality includes: • Support for multi-component compounds (mixtures) • Import and export of SD-files • Optional security (authorization) For chemical structure searching Molecule Database Framework leverages the capabilities of the Bingo Cartridge for PostgreSQL and provides type-safe searching, caching, transactions and optional method level security. Molecule Database Framework supports multi-component chemical compounds (mixtures). Furthermore the design of entity classes and the reasoning behind it are explained. By means of a simple web application I describe how the framework could be used. I then benchmarked this example application to create some basic performance expectations for chemical structure searches and import and export of SD-files. Conclusions By using a simple web application it was shown that Molecule Database Framework successfully abstracts chemical structure searches and SD-File import and export to simple method calls. The framework offers good search performance on a standard laptop without any database tuning. This is also due to the fact that chemical structure searches are paged and cached. Molecule Database Framework is available for download on the projects web page on bitbucket: https://bitbucket.org/kienerj/moleculedatabaseframework. PMID:24325762
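
    The framework itself exposes these operations as Java method calls; as a stand-alone illustration of the underlying idea (registering structures and searching them by substructure), the sketch below uses the open-source RDKit toolkit in Python rather than the framework's own API.

      # Substructure search over a small in-memory registry of SMILES strings.
      from rdkit import Chem

      registry = {
          "sample-1": "CC(=O)Oc1ccccc1C(=O)O",   # aspirin
          "sample-2": "CCO",                      # ethanol
          "sample-3": "c1ccccc1O",                # phenol
      }
      query = Chem.MolFromSmarts("c1ccccc1")      # benzene ring as the query pattern

      hits = [name for name, smiles in registry.items()
              if Chem.MolFromSmiles(smiles).HasSubstructMatch(query)]
      print(hits)   # entries containing an aromatic six-membered ring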

  1. Molecule database framework: a framework for creating database applications with chemical structure search capability.

    PubMed

    Kiener, Joos

    2013-12-11

    Research in organic chemistry generates samples of novel chemicals together with their properties and other related data. The involved scientists must be able to store this data and search it by chemical structure. There are commercial solutions for common needs like chemical registration systems or electronic lab notebooks. However for specific requirements of in-house databases and processes no such solutions exist. Another issue is that commercial solutions have the risk of vendor lock-in and may require an expensive license of a proprietary relational database management system. To speed up and simplify the development for applications that require chemical structure search capabilities, I have developed Molecule Database Framework. The framework abstracts the storing and searching of chemical structures into method calls. Therefore software developers do not require extensive knowledge about chemistry and the underlying database cartridge. This decreases application development time. Molecule Database Framework is written in Java and I created it by integrating existing free and open-source tools and frameworks. The core functionality includes: • Support for multi-component compounds (mixtures) • Import and export of SD-files • Optional security (authorization). For chemical structure searching Molecule Database Framework leverages the capabilities of the Bingo Cartridge for PostgreSQL and provides type-safe searching, caching, transactions and optional method level security. Molecule Database Framework supports multi-component chemical compounds (mixtures). Furthermore the design of entity classes and the reasoning behind it are explained. By means of a simple web application I describe how the framework could be used. I then benchmarked this example application to create some basic performance expectations for chemical structure searches and import and export of SD-files. By using a simple web application it was shown that Molecule Database Framework successfully abstracts chemical structure searches and SD-File import and export to simple method calls. The framework offers good search performance on a standard laptop without any database tuning. This is also due to the fact that chemical structure searches are paged and cached. Molecule Database Framework is available for download on the project's web page on bitbucket: https://bitbucket.org/kienerj/moleculedatabaseframework.

  2. Utilization of open source electronic health record around the world: A systematic review.

    PubMed

    Aminpour, Farzaneh; Sadoughi, Farahnaz; Ahamdi, Maryam

    2014-01-01

    Many projects on developing Electronic Health Record (EHR) systems have been carried out in many countries. The current study was conducted to review the published data on the utilization of open source EHR systems in different countries all over the world. Using free text and keyword search techniques, six bibliographic databases were searched for related articles. The identified papers were screened and reviewed in a series of stages for relevance and validity. The findings showed that open source EHRs have been widely used in resource-limited regions on all continents, especially in Sub-Saharan Africa and South America. This creates opportunities to improve national healthcare, especially in developing countries with minimal financial resources. Open source technology is a solution to the problems of high cost and inflexibility associated with proprietary health information systems.

  3. Freshwater Biological Traits Database (Traits)

    EPA Pesticide Factsheets

    The traits database was compiled for a project on climate change effects on river and stream ecosystems. The traits data, gathered from multiple sources, focused on information published or otherwise well-documented by trustworthy sources.

  4. The 3XMM spectral fit database

    NASA Astrophysics Data System (ADS)

    Georgantopoulos, I.; Corral, A.; Watson, M.; Carrera, F.; Webb, N.; Rosen, S.

    2016-06-01

    I will present the XMMFITCAT database, which is a spectral fit inventory of the sources in the 3XMM catalogue. Spectra are provided by the XMM/SSC for all 3XMM sources which have more than 50 background-subtracted counts per module. This work is funded in the framework of the ESA Prodex project. The 3XMM catalog currently covers 877 sq. degrees and contains about 400,000 unique sources. Spectra are available for over 120,000 sources. Spectral fits have been performed with various spectral models. The results are available in the web page http://xraygroup.astro.noa.gr/ and also at the University of Leicester LEDAS database webpage ledas-www.star.le.ac.uk/. The database description as well as some science results in the joint area with SDSS are presented in two recent papers: Corral et al. 2015, A&A, 576, 61 and Corral et al. 2014, A&A, 569, 71. At least for extragalactic sources, the spectral fits will acquire added value when photometric redshifts become available. In the framework of a new Prodex project we have been funded to derive photometric redshifts for the 3XMM sources using machine learning techniques. I will present the techniques as well as the optical near-IR databases that will be used.
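
    A generic sketch of the kind of machine-learning photometric-redshift estimator mentioned above, using scikit-learn; the synthetic magnitudes and the toy magnitude-redshift relation are placeholders, not 3XMM or SDSS data.

      # Train a random-forest regressor to predict redshift from broad-band magnitudes.
      import numpy as np
      from sklearn.ensemble import RandomForestRegressor
      from sklearn.model_selection import train_test_split

      rng = np.random.default_rng(0)
      mags = rng.uniform(16, 24, size=(500, 5))                      # e.g. u, g, r, i, z
      z_true = 0.05 * (mags[:, 2] - 16) + rng.normal(0, 0.02, 500)   # toy relation

      X_train, X_test, y_train, y_test = train_test_split(mags, z_true, random_state=0)
      model = RandomForestRegressor(n_estimators=200, random_state=0).fit(X_train, y_train)
      rms = float(np.sqrt(np.mean((model.predict(X_test) - y_test) ** 2)))
      print("test-set RMS error:", rms)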

  5. Preliminary geologic map of the Piru 7.5' quadrangle, southern California: a digital database

    USGS Publications Warehouse

    Yerkes, R.F.; Campbell, Russell H.

    1995-01-01

    This Open-File report is a digital geologic map database. This pamphlet serves to introduce and describe the digital data. There is no paper map included in the Open-File report. This digital map database is compiled from previously published sources combined with some new mapping and modifications in nomenclature. The geologic map database delineates map units that are identified by general age and lithology following the stratigraphic nomenclature of the U. S. Geological Survey. For detailed descriptions of the units, their stratigraphic relations and sources of geologic mapping consult Yerkes and Campbell (1995). More specific information about the units may be available in the original sources.

  6. Using CLIPS in a distributed system: The Network Control Center (NCC) expert system

    NASA Technical Reports Server (NTRS)

    Wannemacher, Tom

    1990-01-01

    This paper describes an intelligent troubleshooting system for the Help Desk domain. It was developed on an IBM-compatible 80286 PC using Microsoft C and CLIPS and an AT&T 3B2 minicomputer using the UNIFY database and a combination of shell script, C programs and SQL queries. The two computers are linked by a LAN. The functions of this system are to help non-technical NCC personnel handle trouble calls, to keep a log of problem calls with complete, concise information, and to keep a historical database of problems. The database helps identify hardware and software problem areas and provides a source of new rules for the troubleshooting knowledge base.

  7. A multidisciplinary database for global distribution

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Wolfe, P.J.

    The issue of selenium toxicity in the environment has been documented in the scientific literature for over 50 years. Recent studies reveal a complex connection between selenium and human and animal populations. This article introduces a bibliographic citation database on selenium in the environment developed for global distribution via the Internet by the University of Wyoming Libraries. The database incorporates material from commercial sources, print abstracts, indexes, and U.S. government literature, resulting in a multidisciplinary resource. Relevant disciplines include biology, medicine, veterinary science, botany, chemistry, geology, pollution, aquatic sciences, ecology, and others. It covers the years 1985-1996 for most subject material, with additional years being added as resources permit.

  8. Source attribution using FLEXPART and carbon monoxide emission inventories for the IAGOS In-situ Observation database

    NASA Astrophysics Data System (ADS)

    Fontaine, Alain; Sauvage, Bastien; Pétetin, Hervé; Auby, Antoine; Boulanger, Damien; Thouret, Valerie

    2016-04-01

    Since 1994, the IAGOS program (In-Service Aircraft for a Global Observing System, http://www.iagos.org) and its predecessor MOZAIC have produced in-situ measurements of the atmospheric composition during more than 46000 commercial aircraft flights. In order to help analyze these observations and further understand the processes driving their evolution, we developed a modelling tool, SOFT-IO, that quantifies their source/receptor link. We improved the methodology used by Stohl et al. (2003), based on the FLEXPART plume dispersion model, to simulate the contributions of anthropogenic and biomass burning emissions from the ECCAD database (http://eccad.aeris-data.fr) to the measured carbon monoxide mixing ratio along each IAGOS flight. Thanks to automated processes, contributions are simulated for the last 20 days before observation, separating individual contributions from the different source regions. The main goal is to supply added-value products to the IAGOS database showing the geographical origin and emission type of pollutants. Using this information, it may be possible to link trends in the atmospheric composition to changes in the transport pathways and to the evolution of emissions. This tool could be used for statistical validation as well as for inter-comparisons of emission inventories using large amounts of data, as Lagrangian models are able to bring the global scale emissions down to a smaller scale, where they can be directly compared to the in-situ observations from the IAGOS database.
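
    A conceptual sketch of the source-receptor calculation such a tool performs: a backward-plume sensitivity (residence time) field is multiplied by gridded CO emission fluxes and summed over a source region. The array shapes, units and region mask below are illustrative assumptions, not SOFT-IO itself.

      # Contribution of one source region to the modelled CO enhancement at a receptor.
      import numpy as np

      nlat, nlon = 90, 180
      sensitivity = np.random.rand(nlat, nlon)        # residence time per grid cell (s)
      emissions = np.random.rand(nlat, nlon) * 1e-9   # CO emission flux (kg m-2 s-1)
      region = np.zeros((nlat, nlon), dtype=bool)
      region[30:60, 40:90] = True                     # hypothetical source-region mask

      regional = float(np.sum(sensitivity * emissions * region))
      total = float(np.sum(sensitivity * emissions))
      print(f"regional share of the modelled CO enhancement: {regional / total:.1%}")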

  9. DianaHealth.com, an On-Line Database Containing Appraisals of the Clinical Value and Appropriateness of Healthcare Interventions: Database Development and Retrospective Analysis.

    PubMed

    Bonfill, Xavier; Osorio, Dimelza; Solà, Ivan; Pijoan, Jose Ignacio; Balasso, Valentina; Quintana, Maria Jesús; Puig, Teresa; Bolibar, Ignasi; Urrútia, Gerard; Zamora, Javier; Emparanza, José Ignacio; Gómez de la Cámara, Agustín; Ferreira-González, Ignacio

    2016-01-01

    To describe the development of a novel on-line database aimed to serve as a source of information concerning healthcare interventions appraised for their clinical value and appropriateness by several initiatives worldwide, and to present a retrospective analysis of the appraisals already included in the database. Database development and a retrospective analysis. The database DianaHealth.com is already on-line and it is regularly updated, independent, open access and available in English and Spanish. Initiatives are identified in medical news, in article references, and by contacting experts in the field. We include appraisals in the form of clinical recommendations, expert analyses, conclusions from systematic reviews, and original research that label any health care intervention as low-value or inappropriate. We obtain the information necessary to classify the appraisals according to type of intervention, specialties involved, publication year, authoring initiative, and key words. The database is accessible through a search engine which retrieves a list of appraisals and a link to the website where they were published. DianaHealth.com also provides a brief description of the initiatives and a section where users can report new appraisals or suggest new initiatives. From January 2014 to July 2015, the on-line database included 2940 appraisals from 22 initiatives: eleven campaigns gathering clinical recommendations from scientific societies, five sets of conclusions from literature review, three sets of recommendations from guidelines, two collections of articles on low clinical value in medical journals, and an initiative of our own. We have developed an open access on-line database of appraisals about healthcare interventions considered of low clinical value or inappropriate. DianaHealth.com could help physicians and other stakeholders make better decisions concerning patient care and healthcare systems sustainability. Future efforts should be focused on assessing the impact of these appraisals in the clinical practice.

  10. DianaHealth.com, an On-Line Database Containing Appraisals of the Clinical Value and Appropriateness of Healthcare Interventions: Database Development and Retrospective Analysis

    PubMed Central

    Bonfill, Xavier; Osorio, Dimelza; Solà, Ivan; Pijoan, Jose Ignacio; Balasso, Valentina; Quintana, Maria Jesús; Puig, Teresa; Bolibar, Ignasi; Urrútia, Gerard; Zamora, Javier; Emparanza, José Ignacio; Gómez de la Cámara, Agustín; Ferreira-González, Ignacio

    2016-01-01

    Objective To describe the development of a novel on-line database aimed to serve as a source of information concerning healthcare interventions appraised for their clinical value and appropriateness by several initiatives worldwide, and to present a retrospective analysis of the appraisals already included in the database. Methods and Findings Database development and a retrospective analysis. The database DianaHealth.com is already on-line and it is regularly updated, independent, open access and available in English and Spanish. Initiatives are identified in medical news, in article references, and by contacting experts in the field. We include appraisals in the form of clinical recommendations, expert analyses, conclusions from systematic reviews, and original research that label any health care intervention as low-value or inappropriate. We obtain the information necessary to classify the appraisals according to type of intervention, specialties involved, publication year, authoring initiative, and key words. The database is accessible through a search engine which retrieves a list of appraisals and a link to the website where they were published. DianaHealth.com also provides a brief description of the initiatives and a section where users can report new appraisals or suggest new initiatives. From January 2014 to July 2015, the on-line database included 2940 appraisals from 22 initiatives: eleven campaigns gathering clinical recommendations from scientific societies, five sets of conclusions from literature review, three sets of recommendations from guidelines, two collections of articles on low clinical value in medical journals, and an initiative of our own. Conclusions We have developed an open access on-line database of appraisals about healthcare interventions considered of low clinical value or inappropriate. DianaHealth.com could help physicians and other stakeholders make better decisions concerning patient care and healthcare systems sustainability. Future efforts should be focused on assessing the impact of these appraisals in the clinical practice. PMID:26840451

  11. The XSD-Builder Specification Language—Toward a Semantic View of XML Schema Definition

    NASA Astrophysics Data System (ADS)

    Fong, Joseph; Cheung, San Kuen

    In the present database market, the XML database model is a main structure for forthcoming database systems in the Internet environment. As a conceptual schema of an XML database, the XML model has limitations in presenting data semantics, and system analysts have no toolset for modeling and analyzing an XML system. We apply the XML Tree Model (shown in Figure 2) as a conceptual schema of an XML database to model and analyze the structure of an XML database. It is important not only for visualizing, specifying, and documenting structural models, but also for constructing executable systems. The tree model represents the inter-relationships among elements inside different logical schemas such as XML Schema Definition (XSD), DTD, Schematron, XDR, SOX, and DSD (shown in Figure 1; an explanation of the terms in the figure is given in Table 1). The XSD-Builder consists of the XML Tree Model, a source language, a translator, and the XSD. The source language, called XSD-Source, is mainly intended to provide a user-friendly environment for writing an XSD. The source language is then translated by the XSD-Translator, whose output is an XSD, which is our target and is called the object language.

  12. EPA’s SPECIATE 4.4 Database - Development and Uses

    EPA Science Inventory

    SPECIATE is the EPA's repository of TOG, PM, and Other Gases speciation profiles of air pollution sources. It includes weight fractions of both organic species and PM and provides data in consistent units. Species include metals, ions, elements, and organic and inorganic compound...

  13. PRELIMINARY DATABASE DEVELOPMENT IN TARGET DIETARY SAMPLES AND COMPOSITE DIET SAMPLES

    EPA Science Inventory

    Food may surpass drinking water as the major source of ingestion of total elemental arsenic for the general population. For this reason, accurate assessments of inorganic arsenic intake via food are needed to provide estimates for dietary exposure within future epidemiology stud...

  14. Inconsistencies in the red blood cell membrane proteome analysis: generation of a database for research and diagnostic applications

    PubMed Central

    Hegedűs, Tamás; Chaubey, Pururawa Mayank; Várady, György; Szabó, Edit; Sarankó, Hajnalka; Hofstetter, Lia; Roschitzki, Bernd; Sarkadi, Balázs

    2015-01-01

    Based on recent results, the determination of the easily accessible red blood cell (RBC) membrane proteins may provide new diagnostic possibilities for assessing mutations, polymorphisms or regulatory alterations in diseases. However, the analysis of the current mass spectrometry-based proteomics datasets and other major databases indicates inconsistencies—the results show large scattering and only a limited overlap for the identified RBC membrane proteins. Here, we applied membrane-specific proteomics studies in human RBC, compared these results with the data in the literature, and generated a comprehensive and expandable database using all available data sources. The integrated web database now refers to proteomic, genetic and medical databases as well, and contains an unexpected large number of validated membrane proteins previously thought to be specific for other tissues and/or related to major human diseases. Since the determination of protein expression in RBC provides a method to indicate pathological alterations, our database should facilitate the development of RBC membrane biomarker platforms and provide a unique resource to aid related further research and diagnostics. Database URL: http://rbcc.hegelab.org PMID:26078478

  15. TOPDOM: database of conservatively located domains and motifs in proteins.

    PubMed

    Varga, Julia; Dobson, László; Tusnády, Gábor E

    2016-09-01

    The TOPDOM database, originally created as a collection of domains and motifs located consistently on the same side of the membranes in α-helical transmembrane proteins, has been updated and extended by taking into consideration consistently localized domains and motifs in globular proteins, too. By taking advantage of the recently developed CCTOP algorithm to determine the type of a protein and predict topology in case of transmembrane proteins, and by applying a thorough search for domains and motifs as well as utilizing the most up-to-date version of all source databases, we managed to reach a 6-fold increase in the size of the whole database and a 2-fold increase in the number of transmembrane proteins. The TOPDOM database is available at http://topdom.enzim.hu. The webpage utilizes the common Apache, PHP5 and MySQL software to provide the user interface for accessing and searching the database. The database itself is generated on a high performance computer. Contact: tusnady.gabor@ttk.mta.hu. Supplementary data are available at Bioinformatics online. © The Author 2016. Published by Oxford University Press.

  16. dbWGFP: a database and web server of human whole-genome single nucleotide variants and their functional predictions.

    PubMed

    Wu, Jiaxin; Wu, Mengmeng; Li, Lianshuo; Liu, Zhuo; Zeng, Wanwen; Jiang, Rui

    2016-01-01

    The recent advancement of the next generation sequencing technology has enabled the fast and low-cost detection of all genetic variants spreading across the entire human genome, making the application of whole-genome sequencing a tendency in the study of disease-causing genetic variants. Nevertheless, there still lacks a repository that collects predictions of functionally damaging effects of human genetic variants, though it has been well recognized that such predictions play a central role in the analysis of whole-genome sequencing data. To fill this gap, we developed a database named dbWGFP (a database and web server of human whole-genome single nucleotide variants and their functional predictions) that contains functional predictions and annotations of nearly 8.58 billion possible human whole-genome single nucleotide variants. Specifically, this database integrates 48 functional predictions calculated by 17 popular computational methods and 44 valuable annotations obtained from various data sources. Standalone software, user-friendly query services and free downloads of this database are available at http://bioinfo.au.tsinghua.edu.cn/dbwgfp. dbWGFP provides a valuable resource for the analysis of whole-genome sequencing, exome sequencing and SNP array data, thereby complementing existing data sources and computational resources in deciphering genetic bases of human inherited diseases. © The Author(s) 2016. Published by Oxford University Press.

  17. Assessment and application of national environmental databases and mapping tools at the local level to two community case studies.

    PubMed

    Hammond, Davyda; Conlon, Kathryn; Barzyk, Timothy; Chahine, Teresa; Zartarian, Valerie; Schultz, Brad

    2011-03-01

    Communities are concerned over pollution levels and seek methods to systematically identify and prioritize the environmental stressors in their communities. Geographic information system (GIS) maps of environmental information can be useful tools for communities in their assessment of environmental-pollution-related risks. Databases and mapping tools that supply community-level estimates of ambient concentrations of hazardous pollutants, risk, and potential health impacts can provide relevant information for communities to understand, identify, and prioritize potential exposures and risk from multiple sources. An assessment of existing databases and mapping tools was conducted as part of this study to explore the utility of publicly available databases, and three of these databases were selected for use in a community-level GIS mapping application. Queried data from the U.S. EPA's National-Scale Air Toxics Assessment, Air Quality System, and National Emissions Inventory were mapped at the appropriate spatial and temporal resolutions for identifying risks of exposure to air pollutants in two communities. The maps combine monitored and model-simulated pollutant and health risk estimates, along with local survey results, to assist communities with the identification of potential exposure sources and pollution hot spots. Findings from this case study analysis will provide information to advance the development of new tools to assist communities with environmental risk assessments and hazard prioritization. © 2010 Society for Risk Analysis.

  18. Developing a Cyberinfrastructure for integrated assessments of environmental contaminants.

    PubMed

    Kaur, Taranjit; Singh, Jatinder; Goodale, Wing M; Kramar, David; Nelson, Peter

    2005-03-01

    The objective of this study was to design and implement prototype software for capturing field data and automating the process for reporting and analyzing the distribution of mercury. The four phase process used to design, develop, deploy and evaluate the prototype software is described. Two different development strategies were used: (1) design of a mobile data collection application intended to capture field data in a meaningful format and automate transfer into user databases, followed by (2) a re-engineering of the original software to develop an integrated database environment with improved methods for aggregating and sharing data. Results demonstrated that innovative use of commercially available hardware and software components can lead to the development of an end-to-end digital cyberinfrastructure that captures, records, stores, transmits, compiles and integrates multi-source data as it relates to mercury.

  19. A personal digital assistant application (MobilDent) for dental fieldwork data collection, information management and database handling.

    PubMed

    Forsell, M; Häggström, M; Johansson, O; Sjögren, P

    2008-11-08

    To develop a personal digital assistant (PDA) application for oral health assessment fieldwork, including back-office and database systems (MobilDent). System design, construction and implementation of PDA, back-office and database systems. System requirements for MobilDent were collected, analysed and translated into system functions. User interfaces were implemented and system architecture was outlined. MobilDent was based on a platform with .NET (Microsoft) components, using an SQL Server 2005 (Microsoft) for data storage with the Windows Mobile (Microsoft) operating system. The PDA devices were Dell Axim. System functions and user interfaces were specified for MobilDent. User interfaces for PDA, back-office and database systems were based on .NET programming. The PDA user interface was based on Windows and suited to a PDA display, whereas the back-office interface was designed for a normal-sized computer screen. A synchronisation module (MS ActiveSync, Microsoft) was used to enable download of field data from the PDA to the database. MobilDent is a feasible application for oral health assessment fieldwork, and the oral health assessment database may prove a valuable source for care planning, educational and research purposes. Further development of the MobilDent system will include wireless connectivity with download-on-demand technology.

  20. Web application for detailed real-time database transaction monitoring for CMS condition data

    NASA Astrophysics Data System (ADS)

    de Gruttola, Michele; Di Guida, Salvatore; Innocente, Vincenzo; Pierro, Antonio

    2012-12-01

    In the upcoming LHC era, databases have become an essential part of the experiments collecting data from the LHC, in order to safely store, and consistently retrieve, the large amount of data produced by different sources. In the CMS experiment at CERN, all this information is stored in ORACLE databases hosted on several servers, both inside and outside the CERN network. In this scenario, the task of monitoring different databases is a crucial database administration issue, since different information may be required depending on different users' tasks such as data transfer, inspection, planning and security issues. We present here a web application based on a Python web framework and Python modules for data mining purposes. To customize the GUI we record traces of user interactions that are used to build use case models. In addition the application detects errors in database transactions (for example, to identify any mistake made by a user, an application failure, an unexpected network shutdown or a Structured Query Language (SQL) statement error) and provides warning messages from the different users' perspectives. Finally, in order to fulfill the requirements of the CMS experiment community, and to keep pace with new developments in many Web client tools, our application was further developed, and new features were deployed.
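
    A minimal sketch of transaction-error monitoring in this spirit, using Python's sqlite3 and logging modules in place of the experiment's ORACLE setup; the table and statements are placeholders.

      # Run statements inside a transaction and convert SQL failures into warnings.
      import logging
      import sqlite3

      logging.basicConfig(level=logging.WARNING)
      log = logging.getLogger("db-monitor")

      conn = sqlite3.connect(":memory:")
      conn.execute("CREATE TABLE conditions (tag TEXT, payload BLOB)")

      def monitored_execute(sql, params=()):
          try:
              with conn:                    # commits on success, rolls back on error
                  return conn.execute(sql, params)
          except sqlite3.Error as exc:
              log.warning("transaction failed: %s | statement: %s", exc, sql)
              return None

      monitored_execute("INSERT INTO conditions VALUES (?, ?)", ("beamspot_v1", b"\x00"))
      monitored_execute("INSERT INTO wrong_table VALUES (1)")   # logged as a warning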

  1. Development of an effective dose coefficient database using a computational human phantom and Monte Carlo simulations to evaluate exposure dose for the usage of NORM-added consumer products.

    PubMed

    Yoo, Do Hyeon; Shin, Wook-Geun; Lee, Jaekook; Yeom, Yeon Soo; Kim, Chan Hyeong; Chang, Byung-Uck; Min, Chul Hee

    2017-11-01

    After the Fukushima accident in Japan, the Korean Government implemented the "Act on Protective Action Guidelines Against Radiation in the Natural Environment" to regulate unnecessary radiation exposure to the public. However, despite the law, which came into effect in July 2012, an appropriate method to evaluate the equivalent and effective doses from naturally occurring radioactive material (NORM) in consumer products is not available. The aim of the present study is to develop and validate an effective dose coefficient database enabling the simple and correct evaluation of the effective dose due to the usage of NORM-added consumer products. To construct the database, we used a skin source method with a computational human phantom and Monte Carlo (MC) simulation. For validation, the effective dose obtained from the database using an interpolation method was compared with that from the original MC method. Our results showed similar equivalent doses across the 26 organs, with an average difference of < 5% between the database and the MC calculations. The differences in the effective doses were even smaller, and the results generally show that equivalent and effective doses can be calculated quickly and with sufficient accuracy using the database. Copyright © 2017 Elsevier Ltd. All rights reserved.
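
    A sketch of the interpolation lookup described above: dose coefficients pre-computed at a few energies are interpolated to the energy of interest. The tabulated numbers are placeholders, not values from the published database.

      # Linear interpolation of an effective dose coefficient from a lookup table.
      import numpy as np

      energy_mev = np.array([0.05, 0.10, 0.30, 0.60, 1.00, 1.50])   # photon energy (MeV)
      dose_coeff = np.array([0.8, 1.6, 4.9, 9.2, 14.0, 19.5])       # placeholder coefficients

      def effective_dose_coefficient(e_mev):
          """Interpolate the coefficient at the requested photon energy."""
          return float(np.interp(e_mev, energy_mev, dose_coeff))

      print(effective_dose_coefficient(0.662))   # e.g. the Cs-137 gamma line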

  2. Building a structured monitoring and evaluating system of postmarketing drug use in Shanghai.

    PubMed

    Du, Wenmin; Levine, Mitchell; Wang, Longxing; Zhang, Yaohua; Yi, Chengdong; Wang, Hongmin; Wang, Xiaoyu; Xie, Hongjuan; Xu, Jianglong; Jin, Huilin; Wang, Tongchun; Huang, Gan; Wu, Ye

    2007-01-01

    In order to understand a drug's full profile in the post-marketing environment, information is needed regarding utilization patterns, beneficial effects, ADRs and economic value. China, the most populated country in the world, has the largest number of people who are taking medications. To begin to appreciate the impact of these medications, a multifunctional evaluation and surveillance system was developed, the Shanghai Drug Monitoring and Evaluative System (SDMES). Set up by the Shanghai Center for Adverse Drug Reaction Monitoring in 2001, the SDMES contains three databases: a population health database of middle-aged and elderly persons; hospital patient medical records; and a spontaneous ADR reporting database. Each person has a unique identification and Medicare number, which permits record-linkage within and between these three databases. After more than three years in development, the population health database has comprehensive data for more than 320,000 residents. The hospital database has two years of inpatient medical records from five major hospitals, and will be increasing to 10 hospitals in 2007. The spontaneous reporting ADR database has collected 20,205 cases since 2001 from approximately 295 sources, including hospitals, pharmaceutical companies, drug wholesalers and pharmacies. The SDMES has the potential to become an important national and international pharmacoepidemiology resource for drug evaluation.

  3. Ensembl 2002: accommodating comparative genomics.

    PubMed

    Clamp, M; Andrews, D; Barker, D; Bevan, P; Cameron, G; Chen, Y; Clark, L; Cox, T; Cuff, J; Curwen, V; Down, T; Durbin, R; Eyras, E; Gilbert, J; Hammond, M; Hubbard, T; Kasprzyk, A; Keefe, D; Lehvaslaiho, H; Iyer, V; Melsopp, C; Mongin, E; Pettett, R; Potter, S; Rust, A; Schmidt, E; Searle, S; Slater, G; Smith, J; Spooner, W; Stabenau, A; Stalker, J; Stupka, E; Ureta-Vidal, A; Vastrik, I; Birney, E

    2003-01-01

    The Ensembl (http://www.ensembl.org/) database project provides a bioinformatics framework to organise biology around the sequences of large genomes. It is a comprehensive source of stable automatic annotation of human, mouse and other genome sequences, available as either an interactive web site or as flat files. Ensembl also integrates manually annotated gene structures from external sources where available. As well as being one of the leading sources of genome annotation, Ensembl is an open source software engineering project to develop a portable system able to handle very large genomes and associated requirements. These range from sequence analysis to data storage and visualisation and installations exist around the world in both companies and at academic sites. With both human and mouse genome sequences available and more vertebrate sequences to follow, many of the recent developments in Ensembl have focused on developing automatic comparative genome analysis and visualisation.

  4. Specialist Bibliographic Databases

    PubMed Central

    2016-01-01

    Specialist bibliographic databases offer essential online tools for researchers and authors who work on specific subjects and perform comprehensive and systematic syntheses of evidence. This article presents examples of the established specialist databases, which may be of interest to those engaged in multidisciplinary science communication. Access to most specialist databases is through subscription schemes and membership in professional associations. Several aggregators of information and database vendors, such as EBSCOhost and ProQuest, facilitate advanced searches supported by specialist keyword thesauri. Searches of items through specialist databases are complementary to those through multidisciplinary research platforms, such as PubMed, Web of Science, and Google Scholar. Familiarizing with the functional characteristics of biomedical and nonbiomedical bibliographic search tools is mandatory for researchers, authors, editors, and publishers. The database users are offered updates of the indexed journal lists, abstracts, author profiles, and links to other metadata. Editors and publishers may find particularly useful source selection criteria and apply for coverage of their peer-reviewed journals and grey literature sources. These criteria are aimed at accepting relevant sources with established editorial policies and quality controls. PMID:27134485

  5. Specialist Bibliographic Databases.

    PubMed

    Gasparyan, Armen Yuri; Yessirkepov, Marlen; Voronov, Alexander A; Trukhachev, Vladimir I; Kostyukova, Elena I; Gerasimov, Alexey N; Kitas, George D

    2016-05-01

    Specialist bibliographic databases offer essential online tools for researchers and authors who work on specific subjects and perform comprehensive and systematic syntheses of evidence. This article presents examples of the established specialist databases, which may be of interest to those engaged in multidisciplinary science communication. Access to most specialist databases is through subscription schemes and membership in professional associations. Several aggregators of information and database vendors, such as EBSCOhost and ProQuest, facilitate advanced searches supported by specialist keyword thesauri. Searches of items through specialist databases are complementary to those through multidisciplinary research platforms, such as PubMed, Web of Science, and Google Scholar. Familiarizing with the functional characteristics of biomedical and nonbiomedical bibliographic search tools is mandatory for researchers, authors, editors, and publishers. The database users are offered updates of the indexed journal lists, abstracts, author profiles, and links to other metadata. Editors and publishers may find particularly useful source selection criteria and apply for coverage of their peer-reviewed journals and grey literature sources. These criteria are aimed at accepting relevant sources with established editorial policies and quality controls.

  6. Development of a pseudo/anonymised primary care research database: Proof-of-concept study.

    PubMed

    MacRury, Sandra; Finlayson, Jim; Hussey-Wilson, Susan; Holden, Samantha

    2016-06-01

    General practice records present a comprehensive source of data that could form a variety of anonymised or pseudonymised research databases to aid identification of potential research participants regardless of location. A proof-of-concept study was undertaken to extract data from general practice systems in 15 practices across the region to form pseudonymised and anonymised research data sets. Two feasibility studies and a disease surveillance study compared numbers of potential study participants and accuracy of disease prevalence, respectively. There was a marked reduction in screening time and an increase in the number of potential study participants identified with the research repository compared with conventional methods. Accurate disease prevalence was established and enhanced with the addition of selective text mining. This study confirms the potential for development of a national anonymised research database from general practice records, in addition to improving data collection for local or national audits and epidemiological projects. © The Author(s) 2014.
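
    As an illustration of the pseudonymisation step such a repository relies on, a short sketch in which a keyed hash replaces the patient identifier so records can still be linked without exposing identity; the key handling and identifier format are assumptions, not the project's actual procedure.

      # Deterministic pseudonymisation: the same patient always maps to the same token.
      import hashlib
      import hmac

      SECRET_KEY = b"keep-this-key-outside-the-research-database"   # hypothetical key

      def pseudonymise(patient_id: str) -> str:
          return hmac.new(SECRET_KEY, patient_id.encode("utf-8"), hashlib.sha256).hexdigest()[:16]

      print(pseudonymise("CHI-1234567890"))   # hypothetical identifier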

  7. Structure and software tools of AIDA.

    PubMed

    Duisterhout, J S; Franken, B; Witte, F

    1987-01-01

    AIDA consists of a set of software tools to allow for fast development of easy-to-maintain Medical Information Systems. AIDA supports all aspects of such a system both during development and operation. It contains tools to build and maintain forms for interactive data entry and on-line input validation, a database management system including a data dictionary and a set of run-time routines for database access, and routines for querying the database and output formatting. Unlike an application generator, the user of AIDA may select parts of the tools to fulfill his needs and program other subsystems not developed with AIDA. The AIDA software uses as host language the ANSI-standard programming language MUMPS, an interpreted language embedded in an integrated database and programming environment. This greatly facilitates the portability of AIDA applications. The database facilities supported by AIDA are based on a relational data model. This data model is built on top of the MUMPS database, the so-called global structure. This relational model overcomes the restrictions of the global structure regarding string length. The global structure is especially powerful for sorting purposes. Using MUMPS as a host language allows the user an easy interface between user-defined data validation checks or other user-defined code and the AIDA tools. AIDA has been designed primarily for prototyping and for the construction of Medical Information Systems in a research environment which requires a flexible approach. The prototyping facility of AIDA is terminal-independent and, to a great extent, multi-lingual. Most of these features are table-driven; this allows on-line changes in the use of terminal type and language, but also causes overhead. AIDA has a set of optimizing tools by which it is possible to build faster, but (of course) less flexible, code from these table definitions. By separating the AIDA software into a source and a run-time version, one is able to write implementation-specific code which can be selected and loaded by a special source loader that is part of the AIDA software. This feature is also accessible for maintaining software on different sites and on different installations.

  8. Some thoughts on cartographic and geographic information systems for the 1980's

    USGS Publications Warehouse

    Starr, L.E.; Anderson, Kirk E.

    1981-01-01

    The U.S. Geological Survey is adopting computer techniques to meet the expanding need for cartographic base category data. Digital methods are becoming increasingly important in the mapmaking process, and the demand is growing for physical, social, and economic data. Recognizing these emerging needs, the National Mapping Division began, several years ago, an active program to develop advanced digital methods to support cartographic and geographic data processing. An integrated digital cartographic database would meet the anticipated needs. Such a database would contain data from various sources, and could provide a variety of standard and customized map and digital data file products. This cartographic database soon will be technologically feasible. The present trends in the economics of cartographic and geographic data handling and the growing needs for integrated physical, social, and economic data make such a database virtually mandatory.

  9. Adaptive Neuro-Fuzzy Modeling of UH-60A Pilot Vibration

    NASA Technical Reports Server (NTRS)

    Kottapalli, Sesi; Malki, Heidar A.; Langari, Reza

    2003-01-01

    Adaptive neuro-fuzzy relationships have been developed to model the UH-60A Black Hawk pilot floor vertical vibration. A 200 point database that approximates the entire UH-60A helicopter flight envelope is used for training and testing purposes. The NASA/Army Airloads Program flight test database was the source of the 200 point database. The present study is conducted in two parts. The first part involves level flight conditions and the second part involves the entire (200 point) database including maneuver conditions. The results show that a neuro-fuzzy model can successfully predict the pilot vibration. Also, it is found that the training phase of this neuro-fuzzy model takes only two or three iterations to converge for most cases. Thus, the proposed approach produces a potentially viable model for real-time implementation.
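
    As a rough illustration of the kind of fuzzy-rule evaluation that underlies a neuro-fuzzy model, the following Python sketch combines Gaussian membership functions into a weighted, zero-order Sugeno-style output. The inputs, membership parameters and rule constants are invented for illustration; in an adaptive neuro-fuzzy system they would be tuned against the 200-point flight-test database rather than fixed by hand.

      import numpy as np

      def gaussian_mf(x, centre, sigma):
          """Gaussian membership degree of x in a fuzzy set."""
          return np.exp(-0.5 * ((x - centre) / sigma) ** 2)

      # Two illustrative inputs, e.g. normalised airspeed and gross weight (0..1).
      airspeed, weight = 0.7, 0.4
      x = [airspeed, weight]

      # Each rule: (centre, sigma) per input plus a constant consequent (zero-order Sugeno).
      rules = [
          {"mf": [(0.2, 0.2), (0.3, 0.2)], "consequent": 0.10},   # low speed, light  -> low vibration
          {"mf": [(0.8, 0.2), (0.3, 0.2)], "consequent": 0.35},   # high speed, light -> medium
          {"mf": [(0.8, 0.2), (0.8, 0.2)], "consequent": 0.60},   # high speed, heavy -> high
      ]

      firing = [np.prod([gaussian_mf(xi, c, s) for xi, (c, s) in zip(x, r["mf"])]) for r in rules]
      prediction = np.dot(firing, [r["consequent"] for r in rules]) / np.sum(firing)
      print(f"predicted vibration level (arbitrary units): {prediction:.3f}")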

  10. DEVELOPMENT AND APPLICATION OF A MASS SPECTRA-VOLATILITY DATABASE OF COMBUSTION AND SECONDARY ORGANIC AEROSOL SOURCES FOR THE AERODYNE AEROSOL MASS SPECTROMETER

    EPA Science Inventory

    1. Thermodenuder Development:

    Two TD systems were designed, constructed, and tested at Aerodyne. In this design, the vaporizer consists of a 50 cm long, 1 inch OD stainless steel tube wrapped with three heating tapes and fiberglass insulation and then mounted in a sta...

  11. Influence of a Hospital-Based, Internal Leadership Development Program on Leadership Effectiveness

    ERIC Educational Resources Information Center

    Welch-Carre, Elizabeth

    2017-01-01

    A search on Amazon revealed more than 6,000 books related to leadership development. The Business Source database has more than 700 articles with the word leadership in the title, published between 2005 and 2015. This suggests that leadership is a topic in which many are interested. Clearly, leadership makes a difference in an organization's…

  12. Machine Learning and Decision Support in Critical Care

    PubMed Central

    Johnson, Alistair E. W.; Ghassemi, Mohammad M.; Nemati, Shamim; Niehaus, Katherine E.; Clifton, David A.; Clifford, Gari D.

    2016-01-01

    Clinical data management systems typically provide caregiver teams with useful information, derived from large, sometimes highly heterogeneous, data sources that are often changing dynamically. Over the last decade there has been a significant surge in interest in using these data sources, from simply re-using the standard clinical databases for event prediction or decision support, to including dynamic and patient-specific information into clinical monitoring and prediction problems. However, in most cases, commercial clinical databases have been designed to document clinical activity for reporting, liability and billing reasons, rather than for developing new algorithms. With increasing excitement surrounding “secondary use of medical records” and “Big Data” analytics, it is important to understand the limitations of current databases and what needs to change in order to enter an era of “precision medicine.” This review article covers many of the issues involved in the collection and preprocessing of critical care data. The three challenges in critical care are considered: compartmentalization, corruption, and complexity. A range of applications addressing these issues are covered, including the modernization of static acuity scoring; on-line patient tracking; personalized prediction and risk assessment; artifact detection; state estimation; and incorporation of multimodal data sources such as genomic and free text data. PMID:27765959

  13. Design and deployment of a large brain-image database for clinical and nonclinical research

    NASA Astrophysics Data System (ADS)

    Yang, Guo Liang; Lim, Choie Cheio Tchoyoson; Banukumar, Narayanaswami; Aziz, Aamer; Hui, Francis; Nowinski, Wieslaw L.

    2004-04-01

    An efficient database is an essential component of organizing diverse information on image metadata and patient information for research in medical imaging. This paper describes the design, development and deployment of a large database system serving as a brain image repository that can be used across different platforms in a variety of medical research studies. It forms the infrastructure that links hospitals and institutions together and shares data among them. The database contains patient-, pathology-, image-, research- and management-specific data. The functionalities of the database system include image uploading, storage, indexing, downloading and sharing as well as database querying and management, with security and data anonymization concerns well taken care of. The structure of the database is a multi-tier client-server architecture with a Relational Database Management System, Security Layer, Application Layer and User Interface. An image source adapter has been developed to handle most of the popular image formats. The database has a user interface based on web browsers and is easy to use. We used the Java programming language for its platform independence and extensive libraries. The brain image database can sort data according to clinically relevant information. This can be effectively used in research from the clinicians' point of view. The database is suitable for validation of algorithms on large populations of cases. Medical images for processing could be identified and organized based on information in image metadata. Clinical research in various pathologies can thus be performed with greater efficiency and large image repositories can be managed more effectively. The prototype of the system has been installed in a few hospitals and is working to the satisfaction of the clinicians.
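
    As a minimal sketch of the kind of linked patient/pathology/image-metadata tables described above, the following Python/SQLite example selects image files by diagnosis, as a researcher assembling a study cohort might. The table and column names and the sample rows are invented for illustration and are not the deployed schema.

      import sqlite3

      con = sqlite3.connect(":memory:")
      con.executescript("""
      CREATE TABLE patient   (patient_id INTEGER PRIMARY KEY, age INTEGER, sex TEXT);
      CREATE TABLE pathology (pathology_id INTEGER PRIMARY KEY, patient_id INTEGER REFERENCES patient, diagnosis TEXT);
      CREATE TABLE image     (image_id INTEGER PRIMARY KEY, patient_id INTEGER REFERENCES patient,
                              modality TEXT, file_path TEXT, acquired DATE);
      """)
      con.execute("INSERT INTO patient VALUES (1, 63, 'F')")
      con.execute("INSERT INTO pathology VALUES (1, 1, 'glioma')")
      con.execute("INSERT INTO image VALUES (1, 1, 'MR T1', '/archive/0001.dcm', '2003-11-02')")

      # Identify images for a given pathology, as when selecting cases for algorithm validation.
      rows = con.execute("""
          SELECT i.file_path, i.modality
          FROM image i JOIN pathology p ON p.patient_id = i.patient_id
          WHERE p.diagnosis = 'glioma'
      """).fetchall()
      print(rows)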

  14. SAADA: Astronomical Databases Made Easier

    NASA Astrophysics Data System (ADS)

    Michel, L.; Nguyen, H. N.; Motch, C.

    2005-12-01

    Many astronomers wish to share datasets with their community but do not have enough manpower to develop databases with the functionalities required for high-level scientific applications. The SAADA project aims at automating the creation and deployment process of such databases. A generic but scientifically relevant data model has been designed which allows one to build databases by providing only a limited number of product mapping rules. Databases created by SAADA rely on a relational database supporting JDBC and covered by a Java layer that includes a large amount of generated code. Such databases can simultaneously host spectra, images, source lists and plots. Data are grouped in user-defined collections whose content can be seen as one unique set per data type even if their formats differ. Datasets can be correlated with each other using qualified links. These links help, for example, to handle the nature of a cross-identification (e.g., a distance or a likelihood) or to describe their scientific content (e.g., by associating a spectrum to a catalog entry). The SAADA query engine is based on a language well suited to the data model, which can handle constraints on linked data, in addition to classical astronomical queries. These constraints can be applied on the linked objects (number, class and attributes) and/or on the link qualifier values. Databases created by SAADA are accessed through a rich Web interface or a Java API. We are currently developing an interoperability module implementing VO protocols.
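
    The "qualified link" idea, a cross-identification between, say, a catalogue source and a spectrum that carries qualifiers such as distance and likelihood which queries can constrain, can be sketched in a few lines of Python. The identifiers, qualifier names and thresholds below are illustrative only and are not SAADA's actual data model or query language.

      # Qualified links: each association carries metadata describing the cross-identification.
      sources = {"SRC-17": {"ra": 210.80, "dec": 54.35}}
      spectra = {"SPEC-3": {"instrument": "EPIC-pn"}}

      links = [
          {"from": "SRC-17", "to": "SPEC-3",
           "qualifiers": {"distance_arcsec": 1.8, "likelihood": 0.97}},
      ]

      def linked_spectra(source_id, max_distance, min_likelihood):
          """Return spectra linked to a source, constrained on the link qualifier values."""
          return [l["to"] for l in links
                  if l["from"] == source_id
                  and l["qualifiers"]["distance_arcsec"] <= max_distance
                  and l["qualifiers"]["likelihood"] >= min_likelihood]

      print(linked_spectra("SRC-17", max_distance=2.0, min_likelihood=0.9))   # -> ['SPEC-3']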

  15. Development of the Global Earthquake Model’s neotectonic fault database

    USGS Publications Warehouse

    Christophersen, Annemarie; Litchfield, Nicola; Berryman, Kelvin; Thomas, Richard; Basili, Roberto; Wallace, Laura; Ries, William; Hayes, Gavin P.; Haller, Kathleen M.; Yoshioka, Toshikazu; Koehler, Richard D.; Clark, Dan; Wolfson-Schwehr, Monica; Boettcher, Margaret S.; Villamor, Pilar; Horspool, Nick; Ornthammarath, Teraphan; Zuñiga, Ramon; Langridge, Robert M.; Stirling, Mark W.; Goded, Tatiana; Costa, Carlos; Yeats, Robert

    2015-01-01

    The Global Earthquake Model (GEM) aims to develop uniform, openly available standards, datasets and tools for worldwide seismic risk assessment through global collaboration, transparent communication and adapting state-of-the-art science. GEM Faulted Earth (GFE) is one of GEM’s global hazard module projects. This paper describes GFE’s development of a modern neotectonic fault database and a unique graphical interface for the compilation of new fault data. A key design principle is that of an electronic field notebook for capturing observations a geologist would make about a fault. The database is designed to accommodate abundant as well as sparse fault observations. It features two layers, one for capturing neotectonic fault and fold observations, and the other for calculating potential earthquake fault sources from those observations. In order to test the flexibility of the database structure and to start a global compilation, five preexisting databases have been uploaded to the first layer and two to the second. In addition, the GFE project has characterised the world’s approximately 55,000 km of subduction interfaces in a globally consistent manner as a basis for generating earthquake event sets for inclusion in earthquake hazard and risk modelling. Following the subduction interface fault schema and including the trace attributes of the GFE database schema, the 2500-km-long frontal thrust fault system of the Himalaya has also been characterised. We propose that the database structure be used widely, so that neotectonic fault data can make a more complete and beneficial contribution to seismic hazard and risk characterisation globally.

  16. Common characteristics of open source software development and applicability for drug discovery: a systematic review

    PubMed Central

    2011-01-01

    Background Innovation through an open source model has proven to be successful for software development. This success has led many to speculate if open source can be applied to other industries with similar success. We attempt to provide an understanding of open source software development characteristics for researchers, business leaders and government officials who may be interested in utilizing open source innovation in other contexts, with an emphasis on drug discovery. Methods A systematic review was performed by searching relevant, multidisciplinary databases to extract empirical research regarding the common characteristics and barriers of initiating and maintaining an open source software development project. Results Common characteristics of open source software development pertinent to open source drug discovery were extracted. The characteristics were then grouped into the areas of participant attraction, management of volunteers, control mechanisms, legal framework and physical constraints. Lastly, their applicability to drug discovery was examined. Conclusions We believe that the open source model is viable for drug discovery, although it is unlikely that it will exactly follow the form used in software development. Hybrids will likely develop that suit the unique characteristics of drug discovery. We suggest potential motivations for organizations to join an open source drug discovery project. We also examine specific differences between software and medicines, specifically how the need for laboratories and physical goods will impact the model as well as the effect of patents. PMID:21955914

  17. Clever generation of rich SPARQL queries from annotated relational schema: application to Semantic Web Service creation for biological databases.

    PubMed

    Wollbrett, Julien; Larmande, Pierre; de Lamotte, Frédéric; Ruiz, Manuel

    2013-04-15

    In recent years, a large amount of "-omics" data have been produced. However, these data are stored in many different species-specific databases that are managed by different institutes and laboratories. Biologists often need to find and assemble data from disparate sources to perform certain analyses. Searching for these data and assembling them is a time-consuming task. The Semantic Web helps to facilitate interoperability across databases. A common approach involves the development of wrapper systems that map a relational database schema onto existing domain ontologies. However, few attempts have been made to automate the creation of such wrappers. We developed a framework, named BioSemantic, for the creation of Semantic Web Services that are applicable to relational biological databases. This framework makes use of both Semantic Web and Web Services technologies and can be divided into two main parts: (i) the generation and semi-automatic annotation of an RDF view; and (ii) the automatic generation of SPARQL queries and their integration into Semantic Web Services backbones. We have used our framework to integrate genomic data from different plant databases. BioSemantic is a framework that was designed to speed integration of relational databases. We present how it can be used to speed the development of Semantic Web Services for existing relational biological databases. Currently, it creates and annotates RDF views that enable the automatic generation of SPARQL queries. Web Services are also created and deployed automatically, and the semantic annotations of our Web Services are added automatically using SAWSDL attributes. BioSemantic is downloadable at http://southgreen.cirad.fr/?q=content/Biosemantic.
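
    As an illustration of generating SPARQL from an annotated relational schema, the Python sketch below builds a query string from a small column-to-ontology-property mapping. The mapping, prefix and property identifiers are invented for illustration; they are not BioSemantic's actual annotations or generated output.

      # Illustrative mapping from relational columns to ontology properties (not BioSemantic's real output).
      mapping = {
          "gene.name":       "obo:GENE_0000001",
          "gene.chromosome": "obo:GENE_0000007",
      }

      def build_sparql(selected_columns, filters=None):
          """Generate a SPARQL query over an RDF view whose predicates come from the column mapping."""
          vars_ = {col: "?" + col.replace(".", "_") for col in selected_columns}
          lines = [f"?record {mapping[col]} {var} ." for col, var in vars_.items()]
          if filters:
              lines += [f'FILTER ({vars_[col]} = "{value}")' for col, value in filters.items()]
          body = "\n    ".join(lines)
          return (
              "PREFIX obo: <http://purl.obolibrary.org/obo/>\n"
              f"SELECT {' '.join(vars_.values())}\n"
              "WHERE {\n    " + body + "\n}"
          )

      print(build_sparql(["gene.name", "gene.chromosome"], filters={"gene.chromosome": "3"}))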

  18. Clever generation of rich SPARQL queries from annotated relational schema: application to Semantic Web Service creation for biological databases

    PubMed Central

    2013-01-01

    Background In recent years, a large amount of “-omics” data have been produced. However, these data are stored in many different species-specific databases that are managed by different institutes and laboratories. Biologists often need to find and assemble data from disparate sources to perform certain analyses. Searching for these data and assembling them is a time-consuming task. The Semantic Web helps to facilitate interoperability across databases. A common approach involves the development of wrapper systems that map a relational database schema onto existing domain ontologies. However, few attempts have been made to automate the creation of such wrappers. Results We developed a framework, named BioSemantic, for the creation of Semantic Web Services that are applicable to relational biological databases. This framework makes use of both Semantic Web and Web Services technologies and can be divided into two main parts: (i) the generation and semi-automatic annotation of an RDF view; and (ii) the automatic generation of SPARQL queries and their integration into Semantic Web Services backbones. We have used our framework to integrate genomic data from different plant databases. Conclusions BioSemantic is a framework that was designed to speed integration of relational databases. We present how it can be used to speed the development of Semantic Web Services for existing relational biological databases. Currently, it creates and annotates RDF views that enable the automatic generation of SPARQL queries. Web Services are also created and deployed automatically, and the semantic annotations of our Web Services are added automatically using SAWSDL attributes. BioSemantic is downloadable at http://southgreen.cirad.fr/?q=content/Biosemantic. PMID:23586394

  19. Ground-source heat pump case studies and utility programs

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Lienau, P.J.; Boyd, T.L.; Rogers, R.L.

    1995-04-01

    Ground-source heat pump systems are one of the promising new energy technologies that have shown a rapid increase in usage over the past ten years in the United States. These systems offer substantial benefits to consumers and utilities in energy (kWh) and demand (kW) savings. The purpose of this study was to determine what existing monitored data was available, mainly from electric utilities, on heat pump performance, energy savings and demand reduction for residential, school and commercial building applications. In order to verify the performance, information was collected for 253 case studies, mainly from utilities throughout the United States. The case studies were compiled into a database. The database was organized into general information, system information, ground system information, system performance, and additional information. Information was developed on the status of demand-side management of ground-source heat pump programs for about 60 electric utilities and rural electric cooperatives, covering marketing, incentive programs, barriers to market penetration, number of units installed in the service area, and benefits.

  20. Assessment & Commitment Tracking System (ACTS)

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Bryant, Robert A.; Childs, Teresa A.; Miller, Michael A.

    2004-12-20

    The ACTS computer code provides a centralized tool for planning and scheduling assessments, tracking and managing actions associated with assessments or that result from an event or condition, and "mining" data for reporting and analyzing information for improving performance. The ACTS application is designed to work with the MS SQL database management system. All database interfaces are written in SQL. The following software is used to develop and support the ACTS application: Cold Fusion, HTML, JavaScript, Quest TOAD, Microsoft Visual Source Safe (VSS), HTML Mailer for sending email, Microsoft SQL, and Microsoft Internet Information Server.

  1. Revitalizing the drug pipeline: AntibioticDB, an open access database to aid antibacterial research and development.

    PubMed

    Farrell, L J; Lo, R; Wanford, J J; Jenkins, A; Maxwell, A; Piddock, L J V

    2018-06-11

    The current state of antibiotic discovery, research and development is insufficient to respond to the need for new treatments for drug-resistant bacterial infections. The process has changed over the last decade, with most new agents that are in Phases 1-3, or recently approved, having been discovered in small- and medium-sized enterprises or academia. These agents have then been licensed or sold to large companies for further development with the goal of taking them to market. However, early drug discovery and development, including the possibility of developing previously discontinued agents, would benefit from a database of antibacterial compounds for scrutiny by the developers. This article describes the first free, open-access searchable database of antibacterial compounds, including discontinued agents, drugs under pre-clinical development and those in clinical trials: AntibioticDB (AntibioticDB.com). Data were obtained from publicly available sources. This article summarizes the compounds and drugs in AntibioticDB, including their drug class, mode of action, development status and propensity to select drug-resistant bacteria. AntibioticDB includes compounds currently in pre-clinical development and 834 that have been discontinued and that reached varying stages of development. These may serve as starting points for future research and development.

  2. Biomedical informatics: development of a comprehensive data warehouse for clinical and genomic breast cancer research.

    PubMed

    Hu, Hai; Brzeski, Henry; Hutchins, Joe; Ramaraj, Mohan; Qu, Long; Xiong, Richard; Kalathil, Surendran; Kato, Rand; Tenkillaya, Santhosh; Carney, Jerry; Redd, Rosann; Arkalgudvenkata, Sheshkumar; Shahzad, Kashif; Scott, Richard; Cheng, Hui; Meadow, Stephen; McMichael, John; Sheu, Shwu-Lin; Rosendale, David; Kvecher, Leonid; Ahern, Stephen; Yang, Song; Zhang, Yonghong; Jordan, Rick; Somiari, Stella B; Hooke, Jeffrey; Shriver, Craig D; Somiari, Richard I; Liebman, Michael N

    2004-10-01

    The Windber Research Institute is an integrated high-throughput research center employing clinical, genomic and proteomic platforms to produce terabyte levels of data. We use biomedical informatics technologies to integrate all of these operations. This report includes information on a multi-year, multi-phase hybrid data warehouse project currently under development in the Institute. The purpose of the warehouse is to host the terabyte-level of internal experimentally generated data as well as data from public sources. We have previously reported on the phase I development, which integrated limited internal data sources and selected public databases. Currently, we are completing phase II development, which integrates our internal automated data sources and develops visualization tools to query across these data types. This paper summarizes our clinical and experimental operations, the data warehouse development, and the challenges we have faced. In phase III we plan to federate additional manual internal and public data sources and then to develop and adapt more data analysis and mining tools. We expect that the final implementation of the data warehouse will greatly facilitate biomedical informatics research.

  3. Emissions & Measurements - Black Carbon | Science ...

    EPA Pesticide Factsheets

    Emissions and Measurement (EM) research activities performed within the National Risk Management Research Lab (NRMRL) of EPA's Office of Research and Development (ORD) support measurement and laboratory analysis approaches to accurately characterize source emissions and near-source concentrations of air pollutants. They also support integrated Agency research programs (e.g., source to health outcomes) and the development of databases and inventories that help Federal, state, and local air quality managers and industry implement and comply with air pollution standards. EM research underway in NRMRL supports the Agency's efforts to accurately characterize, analyze, measure and manage sources of air pollution. This pamphlet focuses on the EM research that NRMRL researchers conduct related to black carbon (BC). Black carbon is a pollutant of concern to EPA due to its potential impact on human health and climate change. There are extensive uncertainties in emissions of BC from stationary and mobile sources.

  4. USDA Potato Small RNA Database

    USDA-ARS?s Scientific Manuscript database

    Small RNAs (sRNAs) are now understood to be involved in gene regulation, function and development. High throughput sequencing (HTS) of sRNAs generates large data sets for analyzing the abundance, source and roles for specific sRNAs. These sRNAs result from transcript degradation as well as specific ...

  5. What's New in Software? Hot New Tool: The Hypertext.

    ERIC Educational Resources Information Center

    Hedley, Carolyn N.

    1989-01-01

    This article surveys recent developments in hypertext software, a highly interactive nonsequential reading/writing/database approach to research and teaching that allows paths to be created through related materials including text, graphics, video, and animation sources. Described are uses, advantages, and problems of hypertext. (PB)

  6. Development of a Harmonized Database of Reported and Predicted Consumer Product Ingredient Information

    EPA Science Inventory

    Near-field exposure to chemicals in consumer products has been identified as a significant source of exposure for many chemicals. Quantitative data on product chemical composition and weight fraction is a key parameter for characterizing this exposure. While data on product compo...

  7. Naval sensor data database (NSDD)

    NASA Astrophysics Data System (ADS)

    Robertson, Candace J.; Tubridy, Lisa H.

    1999-08-01

    The Naval Sensor Data database (NSDD) is a multi-year effort to archive, catalogue, and disseminate data from all types of sensors to the mine warfare, signal and image processing, and sensor development communities. The purpose is to improve and accelerate research and technology. Providing performers with the data required to develop and validate improvements in hardware, simulation, and processing will foster advances in sensor and system performance. The NSDD will provide a centralized source of sensor data and its associated ground truth, which will support improved understanding in the areas of signal processing, computer-aided detection and classification, data compression, data fusion, and geo-referencing, as well as sensor and sensor system design.

  8. A systematic review of administrative and clinical databases of infants admitted to neonatal units.

    PubMed

    Statnikov, Yevgeniy; Ibrahim, Buthaina; Modi, Neena

    2017-05-01

    High quality information, increasingly captured in clinical databases, is a useful resource for evaluating and improving newborn care. We conducted a systematic review to identify neonatal databases, and define their characteristics. We followed a preregistered protocol using MeSH terms to search MEDLINE, EMBASE, CINAHL, Web of Science and OVID Maternity and Infant Care Databases for articles identifying patient level databases covering more than one neonatal unit. Full-text articles were reviewed and information extracted on geographical coverage, criteria for inclusion, data source, and maternal and infant characteristics. We identified 82 databases from 2037 publications. Of the country-specific databases there were 39 regional and 39 national. Sixty databases restricted entries to neonatal unit admissions by birth characteristic or insurance cover; 22 had no restrictions. Data were captured specifically for 53 databases; 21 drew on administrative sources and 8 on clinical sources. Two clinical databases hold the largest range of data on patient characteristics: the USA's Pediatrix BabySteps Clinical Data Warehouse and the UK's National Neonatal Research Database. A number of neonatal databases exist that have the potential to contribute to evaluating neonatal care. The majority are created by entering data specifically for the database, duplicating information likely already captured in other administrative and clinical patient records. This repetitive data entry represents an unnecessary burden in an environment where electronic patient records are increasingly used. Standardisation of data items is necessary to facilitate linkage within and between countries.

  9. The Chandra Source Catalog: Storage and Interfaces

    NASA Astrophysics Data System (ADS)

    van Stone, David; Harbo, Peter N.; Tibbetts, Michael S.; Zografou, Panagoula; Evans, Ian N.; Primini, Francis A.; Glotfelty, Kenny J.; Anderson, Craig S.; Bonaventura, Nina R.; Chen, Judy C.; Davis, John E.; Doe, Stephen M.; Evans, Janet D.; Fabbiano, Giuseppina; Galle, Elizabeth C.; Gibbs, Danny G., II; Grier, John D.; Hain, Roger; Hall, Diane M.; He, Xiang Qun (Helen); Houck, John C.; Karovska, Margarita; Kashyap, Vinay L.; Lauer, Jennifer; McCollough, Michael L.; McDowell, Jonathan C.; Miller, Joseph B.; Mitschang, Arik W.; Morgan, Douglas L.; Mossman, Amy E.; Nichols, Joy S.; Nowak, Michael A.; Plummer, David A.; Refsdal, Brian L.; Rots, Arnold H.; Siemiginowska, Aneta L.; Sundheim, Beth A.; Winkelman, Sherry L.

    2009-09-01

    The Chandra Source Catalog (CSC) is part of the Chandra Data Archive (CDA) at the Chandra X-ray Center. The catalog contains source properties and associated data objects such as images, spectra, and lightcurves. The source properties are stored in relational databases and the data objects are stored in files with their metadata stored in databases. The CDA supports different versions of the catalog: multiple fixed release versions and a live database version. There are several interfaces to the catalog: CSCview, a graphical interface for building and submitting queries and for retrieving data objects; a command-line interface for property and source searches using ADQL; and VO-compliant services discoverable through the VO registry. This poster describes the structure of the catalog and provides an overview of the interfaces.

  10. Standardization of search methods for guideline development: an international survey of evidence-based guideline development groups.

    PubMed

    Deurenberg, Rikie; Vlayen, Joan; Guillo, Sylvie; Oliver, Thomas K; Fervers, Beatrice; Burgers, Jako

    2008-03-01

    Effective literature searching is particularly important for clinical practice guideline development. Sophisticated searching and filtering mechanisms are needed to help ensure that all relevant research is reviewed. To assess the methods used for the selection of evidence for guideline development by evidence-based guideline development organizations. A semistructured questionnaire assessing the databases, search filters and evaluation methods used for literature retrieval was distributed to eight major organizations involved in evidence-based guideline development. All of the organizations used search filters as part of guideline development. The MEDLINE database was the primary source accessed for literature retrieval. The OVID or SilverPlatter interfaces were used in preference to the freely accessed PubMed interface. The Cochrane Library, EMBASE, CINAHL and PsycINFO databases were also frequently used by the organizations. All organizations reported the intention to improve and validate their filters for finding literature specifically relevant for guidelines. In the first international survey of its kind, eight major guideline development organizations indicated a strong interest in identifying, improving and standardizing search filters to improve guideline development. It is to be hoped that this will result in the standardization of, and open access to, search filters, an improvement in literature searching outcomes and greater collaboration among guideline development organizations.

  11. Utilization of open source electronic health record around the world: A systematic review

    PubMed Central

    Aminpour, Farzaneh; Sadoughi, Farahnaz; Ahamdi, Maryam

    2014-01-01

    Many projects on developing Electronic Health Record (EHR) systems have been carried out in many countries. The current study was conducted to review the published data on the utilization of open source EHR systems in different countries all over the world. Using free text and keyword search techniques, six bibliographic databases were searched for related articles. The identified papers were screened and reviewed in a series of stages for relevance and validity. The findings showed that open source EHRs have been widely used in resource-limited regions on all continents, especially in Sub-Saharan Africa and South America. This creates opportunities to improve national healthcare, especially in developing countries with minimal financial resources. Open source technology is a solution to overcome the problems of high costs and inflexibility associated with proprietary health information systems. PMID:24672566

  12. Fast in-database cross-matching of high-cadence, high-density source lists with an up-to-date sky model

    NASA Astrophysics Data System (ADS)

    Scheers, B.; Bloemen, S.; Mühleisen, H.; Schellart, P.; van Elteren, A.; Kersten, M.; Groot, P. J.

    2018-04-01

    Upcoming high-cadence, wide-field optical telescopes will image hundreds of thousands of sources per minute. Besides inspecting the near real-time data streams for transient and variability events, the accumulated data archive is a rich laboratory for making complementary scientific discoveries. The goal of this work is to optimise column-oriented database techniques to enable the construction of a full-source and light-curve database for large-scale surveys, that is accessible by the astronomical community. We adopted LOFAR's Transients Pipeline as the baseline and modified it to enable the processing of optical images that have much higher source densities. The pipeline adds new source lists to the archive database, while cross-matching them with the known catalogued sources in order to build a full light-curve archive. We investigated several techniques of indexing and partitioning the largest tables, allowing for faster positional source look-ups in the cross-matching algorithms. We monitored all query run times in long-term pipeline runs where we processed a subset of IPHAS data that have image source density peaks over 170,000 per field of view (500,000 deg⁻²). Our analysis demonstrates that horizontal table partitions one degree wide in declination control the query run times. Usage of an index strategy where the partitions are densely sorted according to source declination yields another improvement. Most queries run in sublinear time and a few (< 20%) run in linear time, because of dependencies on input source-list and result-set size. We observed that, for this logical database partitioning schema, the limiting cadence the pipeline achieved when processing IPHAS data is 25 s.
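
    A much simplified sketch of the declination-partitioning idea is shown below in Python with SQLite standing in for the column-store actually used by the pipeline. The table layout, one-degree zone column and box-shaped (rather than great-circle) match are illustrative only; a production cross-matcher would also refine candidates by angular distance and handle right-ascension wrap-around.

      import math
      import sqlite3

      con = sqlite3.connect(":memory:")
      # decl_zone records the one-degree declination strip a source falls in.
      con.execute("CREATE TABLE catalogue (src_id INTEGER PRIMARY KEY, ra REAL, decl REAL, decl_zone INTEGER)")
      con.execute("CREATE INDEX idx_zone_decl ON catalogue (decl_zone, decl)")

      def insert_source(src_id, ra, decl):
          con.execute("INSERT INTO catalogue VALUES (?, ?, ?, ?)",
                      (src_id, ra, decl, int(math.floor(decl))))

      for i, (ra, decl) in enumerate([(123.401, 45.002), (123.955, 45.500), (201.200, -12.300)]):
          insert_source(i, ra, decl)

      def cross_match(ra, decl, radius_deg=0.001):
          """Candidate matches for a new detection: prune by declination zone, then by a coordinate box."""
          zone_lo, zone_hi = int(math.floor(decl - radius_deg)), int(math.floor(decl + radius_deg))
          return con.execute(
              "SELECT src_id FROM catalogue "
              "WHERE decl_zone BETWEEN ? AND ? AND decl BETWEEN ? AND ? AND ra BETWEEN ? AND ?",
              (zone_lo, zone_hi, decl - radius_deg, decl + radius_deg,
               ra - radius_deg, ra + radius_deg)).fetchall()

      print(cross_match(123.4005, 45.0021))   # -> [(0,)]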

  13. A Global Digital Database and Atlas of Quaternary Dune Fields and Sand Seas

    NASA Astrophysics Data System (ADS)

    Lancaster, N.; Halfen, A. F.

    2012-12-01

    Sand seas and dune fields are globally significant sedimentary deposits, which archive the effects of climate and sea level change on a variety of temporal and spatial scales. Dune systems provide a valuable source of information on past climate conditions, including evidence for periods of aridity and unique data on past wind regimes. Researchers have compiled vast quantities of geomorphic and chronological data from these dune systems for nearly half a century; however, these data remain disconnected, making comparisons of dune systems challenging at global and regional scales. The primary goal of this project is to develop a global digital database of chronologic information for periods of desert sand dune accumulation and stabilization, as well as pertinent stratigraphic and geomorphic information. This database can then be used by scientists to 1) document the history of aeolian processes in arid regions with emphasis on dune systems in low and mid latitude deserts, 2) correlate periods of sand accumulation and stability with other terrestrial and marine paleoclimatic proxies and records, and 3) develop an improved understanding of the response of dune systems to climate change. The database currently resides in Microsoft Access format, which allows searching and filtering of data. The database includes 4 linked tables containing information on the site, chronological control (radiocarbon or luminescence), and the pertinent literature citations. Thus far the database contains information for 838 sites worldwide, comprising 2598 luminescence and radiocarbon ages, though these numbers increase regularly as new data are added. The database is currently available only on request; however, an online GIS database is being developed and will be available in the near future. Data outputs from the online database will include PDF reports and Google Earth formatted data sets for quick viewing of data. Additionally, data will be available in a gridded format for wider use in data-model comparisons. [Figure: sites in the database as of August 2012.]

  14. Tiered Human Integrated Sequence Search Databases for Shotgun Proteomics.

    PubMed

    Deutsch, Eric W; Sun, Zhi; Campbell, David S; Binz, Pierre-Alain; Farrah, Terry; Shteynberg, David; Mendoza, Luis; Omenn, Gilbert S; Moritz, Robert L

    2016-11-04

    The results of analysis of shotgun proteomics mass spectrometry data can be greatly affected by the selection of the reference protein sequence database against which the spectra are matched. For many species there are multiple sources from which somewhat different sequence sets can be obtained. This can lead to confusion about which database is best in which circumstances-a problem especially acute in human sample analysis. All sequence databases are genome-based, with sequences for the predicted gene and their protein translation products compiled. Our goal is to create a set of primary sequence databases that comprise the union of sequences from many of the different available sources and make the result easily available to the community. We have compiled a set of four sequence databases of varying sizes, from a small database consisting of only the ∼20,000 primary isoforms plus contaminants to a very large database that includes almost all nonredundant protein sequences from several sources. This set of tiered, increasingly complete human protein sequence databases suitable for mass spectrometry proteomics sequence database searching is called the Tiered Human Integrated Search Proteome set. In order to evaluate the utility of these databases, we have analyzed two different data sets, one from the HeLa cell line and the other from normal human liver tissue, with each of the four tiers of database complexity. The result is that approximately 0.8%, 1.1%, and 1.5% additional peptides can be identified for Tiers 2, 3, and 4, respectively, as compared with the Tier 1 database, at substantially increasing computational cost. This increase in computational cost may be worth bearing if the identification of sequence variants or the discovery of sequences that are not present in the reviewed knowledge base entries is an important goal of the study. We find that it is useful to search a data set against a simpler database, and then check the uniqueness of the discovered peptides against a more complex database. We have set up an automated system that downloads all the source databases on the first of each month and automatically generates a new set of search databases and makes them available for download at http://www.peptideatlas.org/thisp/ .
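
    The suggested workflow, search against a simpler database and then check the uniqueness of identified peptides against a more complex tier, can be caricatured with a toy Python example. The protein names, sequences and peptides below are invented and bear no relation to the real Tiered Human Integrated Search Proteome databases.

      # Toy sketch of "search the small database, then check uniqueness against the larger tier".
      tier1 = {"PROT_A": "MKWVTFISLLFLFSSAYS", "PROT_B": "GLSDGEWQLVLNVWGK"}
      tier4 = dict(tier1, **{"PROT_A_VARIANT": "MKWVTFISLLFLFSSAYS",   # redundant/variant entries
                             "PROT_C": "VLSPADKTNVKAAWGK"})

      def proteins_containing(peptide, database):
          return [name for name, seq in database.items() if peptide in seq]

      identified_peptides = ["TFISLLFLF", "WQLVLNVWGK"]       # e.g. matched against the Tier 1 search

      for pep in identified_peptides:
          hits_small = proteins_containing(pep, tier1)
          hits_large = proteins_containing(pep, tier4)
          unique = len(hits_large) == 1
          print(f"{pep}: tier1 hits={hits_small}, tier4 hits={hits_large}, unique in larger tier={unique}")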

  15. Tiered Human Integrated Sequence Search Databases for Shotgun Proteomics

    PubMed Central

    Deutsch, Eric W.; Sun, Zhi; Campbell, David S.; Binz, Pierre-Alain; Farrah, Terry; Shteynberg, David; Mendoza, Luis; Omenn, Gilbert S.; Moritz, Robert L.

    2016-01-01

    The results of analysis of shotgun proteomics mass spectrometry data can be greatly affected by the selection of the reference protein sequence database against which the spectra are matched. For many species there are multiple sources from which somewhat different sequence sets can be obtained. This can lead to confusion about which database is best in which circumstances – a problem especially acute in human sample analysis. All sequence databases are genome-based, with sequences for the predicted gene and their protein translation products compiled. Our goal is to create a set of primary sequence databases that comprise the union of sequences from many of the different available sources and make the result easily available to the community. We have compiled a set of four sequence databases of varying sizes, from a small database consisting of only the ~20,000 primary isoforms plus contaminants to a very large database that includes almost all non-redundant protein sequences from several sources. This set of tiered, increasingly complete human protein sequence databases suitable for mass spectrometry proteomics sequence database searching is called the Tiered Human Integrated Search Proteome set. In order to evaluate the utility of these databases, we have analyzed two different data sets, one from the HeLa cell line and the other from normal human liver tissue, with each of the four tiers of database complexity. The result is that approximately 0.8%, 1.1%, and 1.5% additional peptides can be identified for Tiers 2, 3, and 4, respectively, as compared with the Tier 1 database, at substantially increasing computational cost. This increase in computational cost may be worth bearing if the identification of sequence variants or the discovery of sequences that are not present in the reviewed knowledge base entries is an important goal of the study. We find that it is useful to search a data set against a simpler database, and then check the uniqueness of the discovered peptides against a more complex database. We have set up an automated system that downloads all the source databases on the first of each month and automatically generates a new set of search databases and makes them available for download at http://www.peptideatlas.org/thisp/. PMID:27577934

  16. Concierge: Personal Database Software for Managing Digital Research Resources

    PubMed Central

    Sakai, Hiroyuki; Aoyama, Toshihiro; Yamaji, Kazutsuna; Usui, Shiro

    2007-01-01

    This article introduces a desktop application, named Concierge, for managing personal digital research resources. Using simple operations, it enables storage of various types of files and indexes them based on content descriptions. A key feature of the software is a high level of extensibility. By installing optional plug-ins, users can customize and extend the usability of the software based on their needs. In this paper, we also introduce a few optional plug-ins: literature management, electronic laboratory notebook, and XooNIps client plug-ins. XooNIps is a content management system developed to share digital research resources among neuroscience communities. It has been adopted as the standard database system in Japanese neuroinformatics projects. Concierge, therefore, offers comprehensive support from management of personal digital research resources to their sharing in open-access neuroinformatics databases such as XooNIps. This interaction between personal and open-access neuroinformatics databases is expected to enhance the dissemination of digital research resources. Concierge is developed as an open source project; Mac OS X and Windows XP versions have been released at the official site (http://concierge.sourceforge.jp). PMID:18974800

  17. Distributed policy based access to networked heterogeneous ISR data sources

    NASA Astrophysics Data System (ADS)

    Bent, G.; Vyvyan, D.; Wood, David; Zerfos, Petros; Calo, Seraphin

    2010-04-01

    Within a coalition environment, ad hoc Communities of Interest (CoIs) come together, perhaps for only a short time, with different sensors, sensor platforms, data fusion elements, and networks to conduct a task (or set of tasks) with different coalition members taking different roles. In such a coalition, each organization will have its own inherent restrictions on how it will interact with the others. These are usually stated as a set of policies, including security and privacy policies. The capability that we want to enable for a coalition operation is to provide access to information from any coalition partner in conformance with the policies of all. One of the challenges in supporting such ad-hoc coalition operations is that of providing efficient access to distributed sources of data, where the applications requiring the data do not have knowledge of the location of the data within the network. To address this challenge, the International Technology Alliance (ITA) program has been developing the concept of a Dynamic Distributed Federated Database (DDFD), also known as a Gaian Database. This type of database provides a means for accessing data across a network of distributed heterogeneous data sources where access to the information is controlled by a mixture of local and global policies. We describe how a network of disparate ISR elements can be expressed as a DDFD and how this approach enables sensor and other information sources to be discovered autonomously or semi-autonomously, and combined or fused under formally defined local and global policies.

  18. Advanced Modeling and Uncertainty Quantification for Flight Dynamics; Interim Results and Challenges

    NASA Technical Reports Server (NTRS)

    Hyde, David C.; Shweyk, Kamal M.; Brown, Frank; Shah, Gautam

    2014-01-01

    As part of the NASA Vehicle Systems Safety Technologies (VSST), Assuring Safe and Effective Aircraft Control Under Hazardous Conditions (Technical Challenge #3), an effort is underway within Boeing Research and Technology (BR&T) to address Advanced Modeling and Uncertainty Quantification for Flight Dynamics (VSST1-7). The scope of the effort is to develop and evaluate advanced multidisciplinary flight dynamics modeling techniques, including integrated uncertainties, to facilitate higher fidelity response characterization of current and future aircraft configurations approaching and during loss-of-control conditions. This approach is to incorporate multiple flight dynamics modeling methods for aerodynamics, structures, and propulsion, including experimental, computational, and analytical. Also to be included are techniques for data integration and uncertainty characterization and quantification. This research shall introduce new and updated multidisciplinary modeling and simulation technologies designed to improve the ability to characterize airplane response in off-nominal flight conditions. The research shall also introduce new techniques for uncertainty modeling that will provide a unified database model comprised of multiple sources, as well as an uncertainty bounds database for each data source such that a full vehicle uncertainty analysis is possible even when approaching or beyond Loss of Control boundaries. Methodologies developed as part of this research shall be instrumental in predicting and mitigating loss of control precursors and events directly linked to causal and contributing factors, such as stall, failures, damage, or icing. The tasks will include utilizing the BR&T Water Tunnel to collect static and dynamic data to be compared to the GTM extended WT database, characterizing flight dynamics in off-nominal conditions, developing tools for structural load estimation under dynamic conditions, devising methods for integrating various modeling elements into a real-time simulation capability, generating techniques for uncertainty modeling that draw data from multiple modeling sources, and providing a unified database model that includes nominal plus increments for each flight condition. This paper presents status of testing in the BR&T water tunnel and analysis of the resulting data and efforts to characterize these data using alternative modeling methods. Program challenges and issues are also presented.

  19. CycADS: an annotation database system to ease the development and update of BioCyc databases

    PubMed Central

    Vellozo, Augusto F.; Véron, Amélie S.; Baa-Puyoulet, Patrice; Huerta-Cepas, Jaime; Cottret, Ludovic; Febvay, Gérard; Calevro, Federica; Rahbé, Yvan; Douglas, Angela E.; Gabaldón, Toni; Sagot, Marie-France; Charles, Hubert; Colella, Stefano

    2011-01-01

    In recent years, genomes from an increasing number of organisms have been sequenced, but their annotation remains a time-consuming process. The BioCyc databases offer a framework for the integrated analysis of metabolic networks. The Pathway Tools software suite allows the automated construction of a database starting from an annotated genome, but it requires prior integration of all annotations into a specific summary file or into a GenBank file. To allow the easy creation and update of a BioCyc database starting from the multiple genome annotation resources available over time, we have developed an ad hoc data management system that we called the Cyc Annotation Database System (CycADS). CycADS is centred on a specific database model and on a set of Java programs to import, filter and export relevant information. Data from GenBank and other annotation sources (including, for example, KAAS, PRIAM, Blast2GO and PhylomeDB) are collected into a database to be subsequently filtered and extracted to generate a complete annotation file. This file is then used to build an enriched BioCyc database using the PathoLogic program of Pathway Tools. The CycADS pipeline for annotation management was used to build the AcypiCyc database for the pea aphid (Acyrthosiphon pisum), whose genome was recently sequenced. The AcypiCyc database webpage also includes, for comparative analyses, two other metabolic reconstruction BioCyc databases generated using CycADS: TricaCyc for Tribolium castaneum and DromeCyc for Drosophila melanogaster. Owing to its flexible design, CycADS offers a powerful software tool for the generation and regular updating of enriched BioCyc databases. The CycADS system is particularly suited for metabolic gene annotation and network reconstruction in newly sequenced genomes. Because of the uniform annotation used for metabolic network reconstruction, CycADS is particularly useful for comparative analysis of the metabolism of different organisms. Database URL: http://www.cycadsys.org PMID:21474551

  20. Integrated radiologist's workstation enabling the radiologist as an effective clinical consultant

    NASA Astrophysics Data System (ADS)

    McEnery, Kevin W.; Suitor, Charles T.; Hildebrand, Stan; Downs, Rebecca; Thompson, Stephen K.; Shepard, S. Jeff

    2002-05-01

    Since February 2000, radiologists at the M. D. Anderson Cancer Center have accessed clinical information through an internally developed radiologist's clinical interpretation workstation called RadStation. This project provides a fully integrated digital dictation workstation with clinical data review. RadStation enables the radiologist as an effective clinical consultant with access to pertinent sources of clinical information at the time of dictation. Data sources include not only prior radiology reports from the radiology information system (RIS) but also pathology data, laboratory data, histories and physicals, clinic notes, and operative reports. With integrated clinical information access, a radiologist's interpretation not only comments on morphologic findings but can also evaluate study findings in the context of the pertinent clinical presentation and history. Image access is enabled through the integration of an enterprise image archive (Stentor, San Francisco). Database integration is achieved by a combination of real-time HL7 messaging and queries to SQL-based legacy databases. A three-tier system architecture accommodates expanding access to additional databases, including real-time patient schedules as well as patient medications and allergies.
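
    HL7 v2 messages are pipe-delimited segments, so a consumer of a real-time feed must split segments and pick out fields before loading them into clinical databases. The fabricated message and minimal parser below are purely illustrative and are not RadStation's actual interface code; a production system would use a full HL7 library rather than hand-rolled splitting.

      # A fabricated, minimal HL7 v2-style message (segments separated by newlines here for readability).
      raw = (
          "MSH|^~\\&|LIS|HOSP|RIS|HOSP|200202051530||ORU^R01|123|P|2.3\n"
          "PID|1||MRN0042||DOE^JANE\n"
          "OBX|1|TX|PATH^Pathology report||Invasive ductal carcinoma, grade 2\n"
      )

      def parse_segments(message):
          """Split an HL7 v2 message into {segment_name: [field lists, ...]}."""
          parsed = {}
          for line in filter(None, message.splitlines()):
              fields = line.split("|")
              parsed.setdefault(fields[0], []).append(fields)
          return parsed

      msg = parse_segments(raw)
      patient_name = msg["PID"][0][5]   # PID-5: patient name
      finding = msg["OBX"][0][5]        # OBX-5: observation value
      print(patient_name, "->", finding)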

  1. Functional Interaction Network Construction and Analysis for Disease Discovery.

    PubMed

    Wu, Guanming; Haw, Robin

    2017-01-01

    Network-based approaches project seemingly unrelated genes or proteins onto a large-scale network context, thereby providing a holistic visualization and analysis platform for genomic data generated from high-throughput experiments, reducing the dimensionality of the data by using network modules and increasing statistical power. Based on the Reactome database, the most popular and comprehensive open-source biological pathway knowledgebase, we have developed a highly reliable protein functional interaction network covering around 60% of human genes and an app called ReactomeFIViz for Cytoscape, the most popular biological network visualization and analysis platform. In this chapter, we describe the detailed procedures for how this functional interaction network is constructed: integrating multiple external data sources, extracting functional interactions from human-curated pathway databases, building a machine-learning classifier (a naïve Bayesian classifier), predicting interactions with the trained classifier, and finally constructing the functional interaction database. We also provide an example of how to use ReactomeFIViz to perform network-based data analysis for a list of genes.
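
    A minimal sketch of the naïve Bayes step, scoring a protein pair from binary evidence features, is shown below using scikit-learn. The feature set, training pairs and candidate pair are invented for illustration and are not Reactome's actual features or training data.

      import numpy as np
      from sklearn.naive_bayes import BernoulliNB

      # Each protein pair is described by binary evidence features gathered from external sources.
      # Columns (illustrative): [co-expression, shared domain, GO similarity, orthologous interaction]
      X_train = np.array([
          [1, 1, 1, 0],   # pairs co-occurring in curated pathways (positives)
          [1, 0, 1, 1],
          [0, 1, 1, 1],
          [0, 0, 0, 0],   # random pairs used as negatives
          [1, 0, 0, 0],
          [0, 0, 1, 0],
      ])
      y_train = np.array([1, 1, 1, 0, 0, 0])

      clf = BernoulliNB().fit(X_train, y_train)

      candidate_pair = np.array([[1, 1, 0, 1]])          # a pair not found in any curated pathway
      score = clf.predict_proba(candidate_pair)[0, 1]
      print(f"predicted probability of a functional interaction: {score:.2f}")
      # Pairs scoring above a chosen cutoff would be added as predicted functional interactions.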

  2. Enhancing GADRAS Source Term Inputs for Creation of Synthetic Spectra.

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Horne, Steven M.; Harding, Lee

    The Gamma Detector Response and Analysis Software (GADRAS) team has enhanced the source term input for the creation of synthetic spectra. These enhancements include the following: allowing users to programmatically provide source information to GADRAS through memory, rather than through a string limited to 256 characters; allowing users to provide their own source decay database information; and updating the default GADRAS decay database to fix errors and include coincident gamma information.

  3. The SBOL Stack: A Platform for Storing, Publishing, and Sharing Synthetic Biology Designs.

    PubMed

    Madsen, Curtis; McLaughlin, James Alastair; Mısırlı, Göksel; Pocock, Matthew; Flanagan, Keith; Hallinan, Jennifer; Wipat, Anil

    2016-06-17

    Recently, synthetic biologists have developed the Synthetic Biology Open Language (SBOL), a data exchange standard for descriptions of genetic parts, devices, modules, and systems. The goals of this standard are to allow scientists to exchange designs of biological parts and systems, to facilitate the storage of genetic designs in repositories, and to facilitate the description of genetic designs in publications. In order to achieve these goals, the development of an infrastructure to store, retrieve, and exchange SBOL data is necessary. To address this problem, we have developed the SBOL Stack, a Resource Description Framework (RDF) database specifically designed for the storage, integration, and publication of SBOL data. This database allows users to define a library of synthetic parts and designs as a service, to share SBOL data with collaborators, and to store designs of biological systems locally. The database also allows external data sources to be integrated by mapping them to the SBOL data model. The SBOL Stack includes two Web interfaces: the SBOL Stack API and SynBioHub. While the former is designed for developers, the latter allows users to upload new SBOL biological designs, download SBOL documents, search by keyword, and visualize SBOL data. Since the SBOL Stack is based on semantic Web technology, the inherent distributed querying functionality of RDF databases can be used to allow different SBOL stack databases to be queried simultaneously, and therefore, data can be shared between different institutes, centers, or other users.
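
    As a small illustration of storing and querying designs as RDF, the Python sketch below uses the rdflib library to add a few triples describing a part and run a keyword SPARQL query. The namespaces, terms and example part are illustrative and may not match the exact SBOL vocabulary or the SBOL Stack API.

      from rdflib import Graph, Literal, Namespace
      from rdflib.namespace import RDF

      # Illustrative namespaces and terms; these may not be the exact SBOL vocabulary.
      EX = Namespace("http://example.org/parts/")
      SBOL = Namespace("http://sbols.org/v2#")

      g = Graph()
      part = EX["BBa_J23100"]
      g.add((part, RDF.type, SBOL.ComponentDefinition))
      g.add((part, SBOL.displayId, Literal("BBa_J23100")))
      g.add((part, SBOL.description, Literal("constitutive promoter")))

      # Keyword search over the stored designs, as a repository front end might offer.
      results = g.query("""
          PREFIX sbol: <http://sbols.org/v2#>
          SELECT ?part WHERE {
              ?part sbol:description ?d .
              FILTER CONTAINS(LCASE(STR(?d)), "promoter")
          }
      """)
      for row in results:
          print(row.part)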

  4. A Source-based Measurement Database for Occupational Exposure Assessment of Electromagnetic Fields in the INTEROCC Study: A Literature Review Approach

    PubMed Central

    Vila, Javier; Bowman, Joseph D.; Richardson, Lesley; Kincl, Laurel; Conover, Dave L.; McLean, Dave; Mann, Simon; Vecchia, Paolo; van Tongeren, Martie; Cardis, Elisabeth

    2016-01-01

    Introduction: To date, occupational exposure assessment of electromagnetic fields (EMF) has relied on occupation-based measurements and exposure estimates. However, misclassification due to between-worker variability remains an unsolved challenge. A source-based approach, supported by detailed subject data on determinants of exposure, may allow for a more individualized exposure assessment. Detailed information on the use of occupational sources of exposure to EMF was collected as part of the INTERPHONE-INTEROCC study. To support a source-based exposure assessment effort within this study, this work aimed to construct a measurement database for the occupational sources of EMF exposure identified, assembling available measurements from the scientific literature. Methods: First, a comprehensive literature search was performed for published and unpublished documents containing exposure measurements for the EMF sources identified, a priori as well as from answers of study subjects. Then, the measurements identified were assessed for quality and relevance to the study objectives. Finally, the measurements selected and complementary information were compiled into an Occupational Exposure Measurement Database (OEMD). Results: Currently, the OEMD contains 1624 sets of measurements (>3000 entries) for 285 sources of EMF exposure, organized by frequency band (0 Hz to 300 GHz) and dosimetry type. Ninety-five documents were selected from the literature (almost 35% of them are unpublished technical reports), containing measurements which were considered informative and valid for our purpose. Measurement data and complementary information collected from these documents came from 16 different countries and cover the time period between 1974 and 2013. Conclusion: We have constructed a database with measurements and complementary information for the most common sources of exposure to EMF in the workplace, based on the responses to the INTERPHONE-INTEROCC study questionnaire. This database covers the entire EMF frequency range and represents the most comprehensive resource of information on occupational EMF exposure. It is available at www.crealradiation.com/index.php/en/databases. PMID:26493616

  5. Semantically Interoperable XML Data

    PubMed Central

    Vergara-Niedermayr, Cristobal; Wang, Fusheng; Pan, Tony; Kurc, Tahsin; Saltz, Joel

    2013-01-01

    XML is ubiquitously used as an information exchange platform for web-based applications in healthcare, life sciences, and many other domains. Proliferating XML data are now managed through the latest native XML database technologies. XML data sources conforming to common XML schemas can be shared and integrated with syntactic interoperability. Semantic interoperability can be achieved through semantic annotations of data models using common data elements linked to concepts from ontologies. In this paper, we present a framework and software system to support the development of semantically interoperable, XML-based data sources that can be shared through a Grid infrastructure. We also present our work on supporting semantically validated XML data through semantic annotations for XML Schema, semantic validation and semantic authoring of XML data. We demonstrate the use of the system for a biomedical database of medical image annotations and markups. PMID:25298789
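
    As an illustration of the two layers described above, the sketch below uses Python's lxml to validate an XML instance against a shared schema (syntactic interoperability) and then reads ontology-concept identifiers recorded in the schema's xs:appinfo annotations, as a stand-in for the semantic annotations the paper describes. The file names and the conceptCode attribute are hypothetical assumptions, not part of the authors' framework.

        # Minimal sketch: schema validation plus a look at semantic annotations.
        from lxml import etree

        XS = "http://www.w3.org/2001/XMLSchema"

        schema_doc = etree.parse("common_model.xsd")   # hypothetical schema file
        schema = etree.XMLSchema(schema_doc)

        doc = etree.parse("annotation_record.xml")     # hypothetical instance document
        print("syntactically valid:", schema.validate(doc))

        # Collect concept identifiers recorded in the schema's xs:appinfo annotations;
        # 'conceptCode' is an assumed attribute name used here only for illustration.
        for appinfo in schema_doc.iter("{%s}appinfo" % XS):
            concept = appinfo.get("conceptCode")
            if concept:
                print("element annotated with concept:", concept)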

  6. Online bibliographic sources in hydrology

    USGS Publications Warehouse

    Wild, Emily C.; Havener, W. Michael

    2001-01-01

    Traditional commercial bibliographic databases and indexes provide some access to hydrology materials produced by the government; however, these sources do not provide comprehensive coverage of relevant hydrologic publications. This paper discusses bibliographic information available from the federal government and state geological surveys, water resources agencies, and depositories. In addition to information in these databases, the paper describes the scope, styles of citing, subject terminology, and the ways these information sources are currently being searched, formally and informally, by hydrologists. Information available from the federal and state agencies and from the state depositories might be missed by limiting searches to commercially distributed databases.

  7. The development of large-scale de-identified biomedical databases in the age of genomics-principles and challenges.

    PubMed

    Dankar, Fida K; Ptitsyn, Andrey; Dankar, Samar K

    2018-04-10

    Contemporary biomedical databases include a wide range of information types from various observational and instrumental sources. Among the most important features that unite biomedical databases across the field are high volume of information and high potential to cause damage through data corruption, loss of performance, and loss of patient privacy. Thus, issues of data governance and privacy protection are essential for the construction of data depositories for biomedical research and healthcare. In this paper, we discuss various challenges of data governance in the context of population genome projects. The various challenges along with best practices and current research efforts are discussed through the steps of data collection, storage, sharing, analysis, and knowledge dissemination.

  8. Aero/fluids database system

    NASA Technical Reports Server (NTRS)

    Reardon, John E.; Violett, Duane L., Jr.

    1991-01-01

    The AFAS Database System was developed to provide the basic structure of a comprehensive database system for the Marshall Space Flight Center (MSFC) Structures and Dynamics Laboratory Aerophysics Division. The system is intended to handle all of the Aerophysics Division Test Facilities as well as data from other sources. The system was written for the DEC VAX family of computers in FORTRAN-77 and utilizes the VMS indexed file system and screen management routines. Various aspects of the system are covered, including a description of the user interface, lists of all code structure elements, descriptions of the file structures, a description of the security system operation, a detailed description of the data retrieval tasks, a description of the session log, and a description of the archival system.

  9. OPERA: A free and open source QSAR tool for predicting physicochemical properties and environmental fate endpoints

    EPA Science Inventory

    Collecting the chemical structures and data for necessary QSAR modeling is facilitated by available public databases and open data. However, QSAR model performance is dependent on the quality of data and modeling methodology used. This study developed robust QSAR models for physi...

  10. Interfaces and Expert Systems for Online Retrieval.

    ERIC Educational Resources Information Center

    Kehoe, Cynthia A.

    1985-01-01

    This paper reviews the history of separate online system interfaces which led to efforts to develop expert systems for searching databases, particularly for end users, and introduces the research on such expert systems. Appended is a bibliography of sources on interfaces and expert systems for online retrieval. (Author/EJS)

  11. A Virtual "Hello": A Web-Based Orientation to the Library.

    ERIC Educational Resources Information Center

    Borah, Eloisa Gomez

    1997-01-01

    Describes the development of Web-based library services and resources available at the Rosenfeld Library of the Anderson Graduate School of Management at University of California at Los Angeles. Highlights include library orientation sessions; virtual tours of the library; a database of basic business sources; and research strategies, including…

  12. Farm Labor Research Bibliography. California Agricultural Studies, 91-4.

    ERIC Educational Resources Information Center

    Brown, Cheryl L.; And Others

    This annotated bibliography is a printed version of the automated bibliography available through the Labor Market Division of the California State Department of Employment Development. The database focuses on farm labor issues and includes 1,611 sources of information including bibliographies, research studies, trade journals, and books published…

  13. Dietary Supplement Label Database (DSLD)

    Science.gov Websites

    Search full label-derived information from dietary supplement products marketed in the U.S. with a Web-based user interface that provides ready access to label information. It was developed to serve the research…

  14. Tracing Boundaries, Effacing Boundaries: Information Literacy as an Academic Discipline

    ERIC Educational Resources Information Center

    Veach, Grace

    2012-01-01

    Both librarianship and composition have been shaken by recent developments in higher education. In libraries ebooks and online databases threaten the traditional "library as warehouse model," while in composition, studies like The Citation Project show that students are not learning how to incorporate sources into their own writing…

  15. Digitizing Images for Curriculum 21: Phase II.

    ERIC Educational Resources Information Center

    Walker, Alice D.

    Although visual databases exist for the study of art, architecture, geography, health care, and other areas, readily accessible sources of quality images are not available for engineering faculty interested in developing multimedia modules or for student projects. Presented here is a brief review of Phase I of the Engineering Visual Database…

  16. Multi-Media and Databases for Historical Enquiry: A Report from the Trenches

    ERIC Educational Resources Information Center

    Hillis, Peter

    2003-01-01

    The Victorian period produced a diverse and rich range of historical source materials including census returns, photographs, film, personal reminiscences, music, cartoons, and posters. Recent changes to the history curriculum emphasise the acquisition of enquiry skills alongside developing knowledge and understanding, which necessitates reference…

  17. Ensembl 2004.

    PubMed

    Birney, E; Andrews, D; Bevan, P; Caccamo, M; Cameron, G; Chen, Y; Clarke, L; Coates, G; Cox, T; Cuff, J; Curwen, V; Cutts, T; Down, T; Durbin, R; Eyras, E; Fernandez-Suarez, X M; Gane, P; Gibbins, B; Gilbert, J; Hammond, M; Hotz, H; Iyer, V; Kahari, A; Jekosch, K; Kasprzyk, A; Keefe, D; Keenan, S; Lehvaslaiho, H; McVicker, G; Melsopp, C; Meidl, P; Mongin, E; Pettett, R; Potter, S; Proctor, G; Rae, M; Searle, S; Slater, G; Smedley, D; Smith, J; Spooner, W; Stabenau, A; Stalker, J; Storey, R; Ureta-Vidal, A; Woodwark, C; Clamp, M; Hubbard, T

    2004-01-01

    The Ensembl (http://www.ensembl.org/) database project provides a bioinformatics framework to organize biology around the sequences of large genomes. It is a comprehensive and integrated source of annotation of large genome sequences, available via interactive website, web services or flat files. As well as being one of the leading sources of genome annotation, Ensembl is an open source software engineering project to develop a portable system able to handle very large genomes and associated requirements. The facilities of the system range from sequence analysis to data storage and visualization and installations exist around the world both in companies and at academic sites. With a total of nine genome sequences available from Ensembl and more genomes to follow, recent developments have focused mainly on closer integration between genomes and external data.

  18. A highly sensitive search strategy for clinical trials in Literatura Latino Americana e do Caribe em Ciências da Saúde (LILACS) was developed.

    PubMed

    Manríquez, Juan J

    2008-04-01

    Systematic reviews should include as many articles as possible. However, many systematic reviews use only databases with high English language content as sources of trials. Literatura Latino Americana e do Caribe em Ciências da Saúde (LILACS) is an underused source of trials, and there is no validated strategy for searching clinical trials in this database. The objective of this study was to develop a sensitive search strategy for clinical trials in LILACS. An analytical survey was performed. Several single and multiple-term search strategies were tested for their ability to retrieve clinical trials in LILACS. Sensitivity, specificity, and accuracy of each single and multiple-term strategy were calculated using the results of a hand-search of 44 Chilean journals as the gold standard. After combining the most sensitive, specific, and accurate single and multiple-term search strategies, a strategy with a sensitivity of 97.75% (95% confidence interval [CI]=95.98-99.53) and a specificity of 61.85% (95% CI=61.19-62.51) was obtained. LILACS is a source of trials that could improve systematic reviews. A new highly sensitive search strategy for clinical trials in LILACS has been developed. It is hoped this search strategy will improve and increase the utilization of LILACS in future systematic reviews.
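
    The validation arithmetic described above reduces to standard diagnostic-test formulas once every record retrieved by a candidate strategy has been compared with the hand-search gold standard. The sketch below is a minimal illustration with made-up counts, not the study's data.

        # Minimal sketch: sensitivity, specificity and accuracy of a search strategy,
        # computed from counts of true/false positives and negatives against a
        # hand-search gold standard. The numbers below are illustrative only.
        def search_strategy_metrics(tp, fp, fn, tn):
            sensitivity = tp / (tp + fn)        # share of gold-standard trials retrieved
            specificity = tn / (tn + fp)        # share of non-trials correctly excluded
            accuracy = (tp + tn) / (tp + fp + fn + tn)
            return sensitivity, specificity, accuracy

        sens, spec, acc = search_strategy_metrics(tp=435, fp=3800, fn=10, tn=6200)
        print(f"sensitivity={sens:.2%}  specificity={spec:.2%}  accuracy={acc:.2%}")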

  19. BIRS – Bioterrorism Information Retrieval System

    PubMed Central

    Tewari, Ashish Kumar; Rashi; Wadhwa, Gulshan; Sharma, Sanjeev Kumar; Jain, Chakresh Kumar

    2013-01-01

    Bioterrorism is the deliberate use of pathogenic strains of microbes to spread terror in a population. There is a definite need to promote research on the development of vaccines, therapeutics and diagnostic methods as part of preparedness for any future bioterror attack. BIRS is an open-access database of collective information on the organisms related to bioterrorism. The database architecture uses current open-source technology, viz. PHP ver. 5.3.19, MySQL and an IIS server on the Windows platform. The database stores information on the literature, generic information and unique pathways of about 10 microorganisms involved in bioterrorism. It may serve as a collective repository to accelerate drug discovery and vaccine design against such bioterrorist agents (microbes). The available data have been validated against various online resources and through literature mining in order to provide the user with a comprehensive information system. Availability: The database is freely available at http://www.bioterrorism.biowaves.org PMID:23390356

  20. RDIS: The Rabies Disease Information System.

    PubMed

    Dharmalingam, Baskeran; Jothi, Lydia

    2015-01-01

    Rabies is a deadly viral disease causing acute inflammation or encephalitis of the brain in human beings and other mammals. Therefore, it is of interest to collect information related to the disease from several sources, including known literature databases, for further analysis and interpretation. Hence, we describe the development of a database called the Rabies Disease Information System (RDIS) for this purpose. The online database describes the etiology, epidemiology, pathogenesis and pathology of the disease using diagrammatic representations. It provides information on several carriers of the rabies viruses, such as dog, bat, fox and civet, and their distributions around the world. Information related to the urban and sylvatic cycles of transmission of the virus is also made available. The database also contains information related to available diagnostic methods and vaccines for humans and other animals. This information is of use to medical, veterinary and paramedical practitioners, students, researchers, pet owners, animal lovers, livestock handlers, travelers and many others. The database is available for free at http://rabies.mscwbif.org/home.html.

  1. Survey on the use of information sources in the field of aging.

    PubMed

    Bird, G; Heekin, J M

    1994-01-01

    This article presents the results of a survey conducted over the summer of 1992 on the use of information sources by professionals in the field of aging. In particular, factors affecting the use of electronic information sources were investigated. The data provide a demographic profile of North American gerontologists, with a predictably wide range of disciplines and types of practice represented. Several factors were found to have an impact on the gerontologists' utilization of electronic information sources. Respondents who used a larger-than-average number of computer applications were found to make relatively more use of electronic sources, including online searches, CD-ROM indexes, library OPACs, and other databases searched by remote access. Attendance at library workshops was found to increase the amount of end-user searching but not the amount of library-mediated searching. Respondents also reported which databases they used and which they considered most important. MEDLINE was the most frequently mentioned database across all disciplines, including the health and social sciences. Computer databases were ranked least important out of six listed sources of information, and only 5% of respondents reported having used an electronic current awareness profile.

  2. The Department of Defense Small Business Technology Transfer (STTR) FY 2000

    DTIC Science & Technology

    2000-01-04

    applications (e.g. drug design, pharmacogenomics, and modeling of cells and organs). DARPA - 6 PHASE I: Develop a high performance database... Army, and particularly the Dismounted Soldier, has need for high-energy, lightweight power sources. Polymer electrolyte membrane fuel cells (PEM FCs... efficiently processed, fabricated, and tailored to resist high velocity impact and penetration should be developed. PHASE II: Prototype designs from Phase I

  3. SQL is Dead; Long-live SQL: Relational Database Technology in Science Contexts

    NASA Astrophysics Data System (ADS)

    Howe, B.; Halperin, D.

    2014-12-01

    Relational databases are often perceived as a poor fit in science contexts: Rigid schemas, poor support for complex analytics, unpredictable performance, significant maintenance and tuning requirements --- these idiosyncrasies often make databases unattractive in science contexts characterized by heterogeneous data sources, complex analysis tasks, rapidly changing requirements, and limited IT budgets. In this talk, I'll argue that although the value proposition of typical relational database systems is weak in science, the core ideas that power relational databases have become incredibly prolific in open source science software, and are emerging as a universal abstraction for both big data and small data. In addition, I'll talk about two open source systems we are building to "jailbreak" the core technology of relational databases and adapt them for use in science. The first is SQLShare, a Database-as-a-Service system supporting collaborative data analysis and exchange by reducing database use to an Upload-Query-Share workflow with no installation, schema design, or configuration required. The second is Myria, a service that supports much larger-scale data and complex analytics, and supports multiple back-end systems. Finally, I'll describe some of the ways our collaborators in oceanography, astronomy, biology, fisheries science, and more are using these systems to replace script-based workflows for reasons of performance, flexibility, and convenience.
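
    The Upload-Query-Share idea can be illustrated with nothing more than the standard library: load tabular data with no up-front schema design, then answer an analysis question with a declarative SQL query instead of a hand-written script. The sketch below uses SQLite rather than SQLShare or Myria, and the casts.csv file and its station/temperature columns are hypothetical.

        # Minimal sketch of an upload-then-query workflow with SQLite (not SQLShare/Myria).
        import csv, sqlite3

        conn = sqlite3.connect(":memory:")
        with open("casts.csv") as f:                  # hypothetical CTD-cast file
            rows = list(csv.DictReader(f))
        cols = list(rows[0].keys())

        # "Upload": create a table straight from the CSV header, with no schema design.
        conn.execute("CREATE TABLE casts (%s)" % ", ".join(cols))
        conn.executemany(
            "INSERT INTO casts VALUES (%s)" % ", ".join("?" for _ in cols),
            [tuple(r[c] for c in cols) for r in rows],
        )

        # "Query": the analysis step is declarative SQL rather than a bespoke script.
        for station, mean_temp in conn.execute(
            "SELECT station, AVG(CAST(temperature AS REAL)) FROM casts GROUP BY station"
        ):
            print(station, round(mean_temp, 2))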

  4. Task III: Development of an Effective Computational Methodology for Body Force Representation of High-speed Rotor 37

    NASA Technical Reports Server (NTRS)

    Tan, Choon-Sooi; Suder, Kenneth (Technical Monitor)

    2003-01-01

    A framework for an effective computational methodology for characterizing the stability and the impact of distortion in a high-speed multi-stage compressor is being developed. The methodology consists of using a few isolated-blade row Navier-Stokes solutions for each blade row to construct a body force database. The purpose of the body force database is to replace each blade row in a multi-stage compressor by a body force distribution that produces the same pressure rise and flow turning. To do this, each body force database is generated in such a way that it can respond to changes in local flow conditions. Once the database is generated, no further Navier-Stokes computations are necessary. The process is repeated for every blade row in the multi-stage compressor. The body forces are then embedded as source terms in an Euler solver. The method is developed to have the capability to compute the performance in a flow that has radial as well as circumferential non-uniformity with a length scale larger than a blade pitch; thus it can potentially be used to characterize the stability of a compressor under design. It is these two latter features, as well as the accompanying procedure to obtain the body force representation, that distinguish the present methodology from the streamline curvature method. The overall computational procedures have been developed. A dimensional analysis was carried out to determine the local flow conditions for parameterizing the magnitudes of the local body force representation of blade rows. An Euler solver was modified to embed the body forces as source terms. The results from the dimensional analysis show that the body forces can be parameterized in terms of the two relative flow angles, the relative Mach number, and the Reynolds number. For flow in a high-speed transonic blade row, they can be parameterized in terms of the local relative Mach number alone.
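
    Schematically (our notation, not the report's), the approach replaces each blade row by a distributed source term in the Euler equations, with the magnitude of that term parameterized by the local flow quantities identified in the dimensional analysis:

        \frac{\partial U}{\partial t} + \nabla \cdot F(U) = S_{\mathrm{bf}},
        \qquad
        S_{\mathrm{bf}} = S_{\mathrm{bf}}\!\left(\beta_1,\ \beta_2,\ M_{\mathrm{rel}},\ \mathrm{Re}\right)

    where β1 and β2 are the relative flow angles, M_rel the relative Mach number and Re the Reynolds number; as noted above, for the high-speed transonic blade row the dependence collapses to the local relative Mach number alone.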

  5. Building Databases for Education. ERIC Digest.

    ERIC Educational Resources Information Center

    Klausmeier, Jane A.

    This digest provides a brief explanation of what a database is; explains how a database can be used; identifies important factors that should be considered when choosing database management system software; and provides citations to sources for finding reviews and evaluations of database management software. The digest is concerned primarily with…

  6. Historical reconstructions of California wildfires vary by data source

    USGS Publications Warehouse

    Syphard, Alexandra D.; Keeley, Jon E.

    2016-01-01

    Historical data are essential for understanding how fire activity responds to different drivers. It is important that the source of data is commensurate with the spatial and temporal scale of the question addressed, but fire history databases are derived from different sources with different restrictions. In California, a frequently used fire history dataset is the State of California Fire and Resource Assessment Program (FRAP) fire history database, which circumscribes fire perimeters at a relatively fine scale. It includes large fires on both state and federal lands but only covers fires that were mapped or had other spatially explicit data. A different database is the state and federal governments’ annual reports of all fires. They are more complete than the FRAP database but are only spatially explicit to the level of county (California Department of Forestry and Fire Protection – Cal Fire) or forest (United States Forest Service – USFS). We found substantial differences between the FRAP database and the annual summaries, with the largest and most consistent discrepancy being in fire frequency. The FRAP database missed the majority of fires and is thus a poor indicator of fire frequency or indicators of ignition sources. The FRAP database is also deficient in area burned, especially before 1950. Even in contemporary records, the huge number of smaller fires not included in the FRAP database account for substantial cumulative differences in area burned. Wildfires in California account for nearly half of the western United States fire suppression budget. Therefore, the conclusions about data discrepancies and the implications for fire research are of broad importance.

  7. A unique database for gathering data from a mobile app and medical prescription software: a useful data source to collect and analyse patient-reported outcomes of depression and anxiety symptoms.

    PubMed

    Watanabe, Yoshinori; Hirano, Yoko; Asami, Yuko; Okada, Maki; Fujita, Kazuya

    2017-11-01

    A unique database named 'AN-SAPO' was developed by Iwato Corp. and Japan Brain Corp. in collaboration with the psychiatric clinics run by Himorogi Group in Japan. The AN-SAPO database includes patients' depression/anxiety score data from a mobile app named AN-SAPO and medical records from medical prescription software named 'ORCA'. On the mobile app, depression/anxiety severity can be evaluated by answering 20 brief questions and the scores are transferred to the AN-SAPO database together with the patients' medical records on ORCA. Currently, this database is used at the Himorogi Group's psychiatric clinics and has over 2000 patients' records accumulated since November 2013. Since the database covers patients' demographic data, prescribed drugs, and the efficacy and safety information, it could be a useful supporting tool for decision-making in clinical practice. We expect it to be utilised in wider areas of medical fields and for future pharmacovigilance and pharmacoepidemiological studies.

  8. Database on pharmacophore analysis of active principles, from medicinal plants

    PubMed Central

    Pitchai, Daisy; Manikkam, Rajalakshmi; Rajendran, Sasikala R; Pitchai, Gnanamani

    2010-01-01

    Plants continue to be a major source of medicines, as they have been throughout human history. In the present day, drug discovery from plants involves a multidisciplinary approach combining ethnobotanical, phytochemical and biological techniques to provide new chemical compounds (lead molecules) for the development of drugs against various pharmacological targets, including cancer, diabetes and its secondary complications. In view of this need in current drug discovery from medicinal plants, here we describe a web database containing the information from pharmacophore analysis of active principles possessing antidiabetic, antimicrobial, anticancerous and antioxidant properties from medicinal plants. The database provides the botanical and taxonomic classification as well as the biochemical and pharmacological properties of medicinal plants. Data on antidiabetic, antimicrobial, antioxidative, antitumor and anti-inflammatory compounds, together with their physicochemical properties, SMILES notation and Lipinski properties, are included in our database. One of the proposed features in the database is the predicted ADMET values and the interaction of bioactive compounds with the target protein. The database alphabetically lists the compound names and provides separate tabs for antimicrobial, antitumor, antidiabetic, and antioxidative compounds. Availability: http://www.hccbif.info/ PMID:21346859

  9. Do-It-Yourself: A Special Library's Approach to Creating Dynamic Web Pages Using Commercial Off-The-Shelf Applications

    NASA Technical Reports Server (NTRS)

    Steeman, Gerald; Connell, Christopher

    2000-01-01

    Many librarians may feel that dynamic Web pages are out of their reach, financially and technically. Yet we are reminded in library and Web design literature that static home pages are a thing of the past. This paper describes how librarians at the Institute for Defense Analyses (IDA) library developed a database-driven, dynamic intranet site using commercial off-the-shelf applications. Administrative issues include surveying a library users group for interest and needs evaluation; outlining metadata elements; and committing resources, from managing time to populate the database to training in Microsoft FrontPage and Web-to-database design. Technical issues covered include Microsoft Access database fundamentals, lessons learned in the Web-to-database process (including setting up Data Source Names (DSNs), redesigning queries to accommodate the Web interface, and understanding Access 97 query language vs. Structured Query Language (SQL)). This paper also offers tips on editing Active Server Pages (ASP) scripting to create desired results. A how-to annotated resource list closes out the paper.

  10. Active fault databases: building a bridge between earthquake geologists and seismic hazard practitioners, the case of the QAFI v.3 database

    NASA Astrophysics Data System (ADS)

    García-Mayordomo, Julián; Martín-Banda, Raquel; Insua-Arévalo, Juan M.; Álvarez-Gómez, José A.; Martínez-Díaz, José J.; Cabral, João

    2017-08-01

    Active fault databases are a very powerful and useful tool in seismic hazard assessment, particularly when singular faults are considered seismogenic sources. Active fault databases are also a very relevant source of information for earth scientists, earthquake engineers and even teachers or journalists. Hence, active fault databases should be updated and thoroughly reviewed on a regular basis in order to keep a standard quality and uniformed criteria. Desirably, active fault databases should somehow indicate the quality of the geological data and, particularly, the reliability attributed to crucial fault-seismic parameters, such as maximum magnitude and recurrence interval. In this paper we explain how we tackled these issues during the process of updating and reviewing the Quaternary Active Fault Database of Iberia (QAFI) to its current version 3. We devote particular attention to describing the scheme devised for classifying the quality and representativeness of the geological evidence of Quaternary activity and the accuracy of the slip rate estimation in the database. Subsequently, we use this information as input for a straightforward rating of the level of reliability of maximum magnitude and recurrence interval fault seismic parameters. We conclude that QAFI v.3 is a much better database than version 2 either for proper use in seismic hazard applications or as an informative source for non-specialized users. However, we already envision new improvements for a future update.

  11. Identification and evaluation of fluvial-dominated deltaic (Class 1 oil) reservoirs in Oklahoma. Yearly technical progress report, January 1--December 31, 1994

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Mankin, C.J.; Banken, M.K.

    The Oklahoma Geological Survey and the University of Oklahoma are engaged in a five-year program to identify and address Oklahoma's oil recovery opportunities in fluvial-dominated deltaic (FDD) reservoirs. This program includes the systematic and comprehensive collection, evaluation, and distribution of information on all of Oklahoma's FDD oil reservoirs and the recovery technologies that can be applied to those reservoirs with commercial success. To date, the lead geologists have defined the initial geographic extents of Oklahoma's FDD plays, and compiled known information about those plays. Nine plays have been defined, all of them Pennsylvanian in age and most from the Cherokee Group. A bibliographic database has been developed to record the literature sources and their related plays. Trend maps are being developed to identify the FDD portions of the relevant reservoirs, through accessing current production databases and through compiling the literature results. A reservoir database system also has been developed, to record specific reservoir data elements that are identified through the literature, and through public and private data sources. The project team is working with the Oklahoma Nomenclature Committee of the Mid-Continent Oil and Gas Association to update oil field boundary definitions in the project area. Also, team members are working with several private companies to develop demonstration reservoirs for the reservoir characterization and simulation activities. All of the information gathered through these efforts will be transferred to the Oklahoma petroleum industry through a series of publications and workshops. Additionally, plans are being developed, and hardware and software resources are being acquired, in preparation for the opening of a publicly-accessible computer users laboratory, one component of the technology transfer program.

  12. A precipitation database of station-based daily and monthly measurements for West Africa: Overview, quality control and harmonization

    NASA Astrophysics Data System (ADS)

    Bliefernicht, Jan; Waongo, Moussa; Annor, Thompson; Laux, Patrick; Lorenz, Manuel; Salack, Seyni; Kunstmann, Harald

    2017-04-01

    West Africa is a data sparse region. High quality and long-term precipitation data are often not readily available for applications in hydrology, agriculture, meteorology and other needs. To close this gap, we use multiple data sources to develop a precipitation database with long-term daily and monthly time series. This database was compiled from 16 archives including global databases e.g. from the Global Historical Climatology Network (GHCN), databases from research projects (e.g. the AMMA database) and databases of the national meteorological services of some West African countries. The collection consists of more than 2000 precipitation gauges with measurements dating from 1850 to 2015. Due to erroneous measurements (e.g. temporal offsets, unit conversion errors), missing values and inconsistent metadata, the merging of this precipitation dataset is not straightforward and requires a thorough quality control and harmonization. To this end, we developed geostatistically based algorithms for quality control of the individual databases and harmonization into a joint database. The algorithms are based on a pairwise comparison of the correspondence of precipitation time series in dependence on the distance between stations. They were tested for precipitation time series from gauges located in a rectangular domain covering Burkina Faso, Ghana, Benin and Togo. This harmonized and quality-controlled precipitation database was recently used for several applications such as the validation of a high resolution regional climate model and the bias correction of precipitation projections provided by the Coordinated Regional Climate Downscaling Experiment (CORDEX). In this presentation, we will give an overview of the novel daily and monthly precipitation database and the algorithms used for quality control and harmonization. We will also highlight the quality of global and regional archives (e.g. GHCN, GSOD, AMMA database) in comparison to the precipitation databases provided by the national meteorological services.
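
    A minimal sketch of the pairwise idea behind such a check is given below; the function, thresholds and crude distance approximation are illustrative assumptions rather than the authors' algorithm. Stations whose daily series correlate poorly with all of their near neighbours are flagged for manual inspection.

        # Minimal sketch: flag stations whose series disagree with all nearby stations.
        import numpy as np

        def flag_suspect_stations(coords, series, radius_km=50.0, min_corr=0.3):
            """coords: (n, 2) lon/lat in degrees; series: (n, t) daily precipitation."""
            suspects = []
            for i in range(len(coords)):
                # rough planar distance in km (adequate for a sketch, not for production)
                d = np.hypot(*(111.0 * (coords - coords[i]).T))
                neighbours = np.where((d > 0) & (d < radius_km))[0]
                if neighbours.size == 0:
                    continue
                corrs = [np.corrcoef(series[i], series[j])[0, 1] for j in neighbours]
                if np.nanmedian(corrs) < min_corr:
                    suspects.append(i)   # poor agreement with every nearby gauge
            return suspects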

  13. A Bayesian Multivariate Receptor Model for Estimating Source Contributions to Particulate Matter Pollution using National Databases.

    PubMed

    Hackstadt, Amber J; Peng, Roger D

    2014-11-01

    Time series studies have suggested that air pollution can negatively impact health. These studies have typically focused on the total mass of fine particulate matter air pollution or the individual chemical constituents that contribute to it, and not source-specific contributions to air pollution. Source-specific contribution estimates are useful from a regulatory standpoint by allowing regulators to focus limited resources on reducing emissions from sources that are major contributors to air pollution and are also desired when estimating source-specific health effects. However, researchers often lack direct observations of the emissions at the source level. We propose a Bayesian multivariate receptor model to infer information about source contributions from ambient air pollution measurements. The proposed model incorporates information from national databases containing data on both the composition of source emissions and the amount of emissions from known sources of air pollution. The proposed model is used to perform source apportionment analyses for two distinct locations in the United States (Boston, Massachusetts and Phoenix, Arizona). Our results mirror previous source apportionment analyses that did not utilize the information from national databases and provide additional information about uncertainty that is relevant to the estimation of health effects.
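
    In its generic form (standard receptor-model notation, not necessarily the paper's), a multivariate receptor model expresses each day's vector of constituent concentrations as a mixture of source profiles weighted by latent source contributions:

        x_t = \Lambda \, f_t + \varepsilon_t, \qquad t = 1, \dots, T

    where x_t holds the observed constituent concentrations on day t, \Lambda is the matrix of source composition profiles, f_t the unobserved source contributions and \varepsilon_t the error term; in the Bayesian formulation described here, the national databases on source composition and emission amounts inform the priors.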

  14. Catalogue of UV sources in the Galaxy

    NASA Astrophysics Data System (ADS)

    Beitia-Antero, L.; Gómez de Castro, A. I.

    2017-03-01

    The Galaxy Evolution Explorer (GALEX) ultraviolet (UV) database contains the largest photometric catalogue in the ultraviolet range; as a result, the GALEX photometric bands, the near-UV (NUV) and far-UV (FUV) bands, have become standards. Nevertheless, the GALEX catalogue does not include bright UV sources, due to the high sensitivity of its detectors, nor sources in the Galactic plane. In order to extend the GALEX database for future UV missions, we have obtained synthetic FUV and NUV photometry using the database of UV spectra generated by the International Ultraviolet Explorer (IUE). This database contains 63,755 spectra in the low dispersion mode (λ/δλ ≈ 300) obtained during its 18-year lifetime. For stellar sources in the IUE database, we have selected spectra with a high signal-to-noise ratio (SNR) and computed FUV and NUV magnitudes using the GALEX transmission curves along with the conversion equations between flux and magnitudes provided by the mission. In addition, we have performed variability tests to determine whether the sources were variable (during the IUE observations). As a result, we have generated two different catalogues: one for non-variable stars and another one for variable sources. The former contains FUV and NUV magnitudes, while the latter gives the basic information and the FUV magnitude for each observation. The consistency of the magnitudes has been tested using white dwarfs contained in both the GALEX and IUE samples. The catalogues are available through the Centre de Données Stellaires. The sources are distributed throughout the whole sky, with a special coverage of the Galactic plane.
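
    The synthetic-photometry step amounts to folding each IUE spectrum through a GALEX transmission curve. The sketch below uses the generic AB-magnitude relations rather than the mission's own conversion equations and zero points, so the function and its arguments are illustrative assumptions only.

        # Minimal sketch: band-averaged AB magnitude of a spectrum through a filter curve.
        import numpy as np

        def synthetic_ab_mag(wl_ang, flux_flam, band_wl_ang, band_throughput):
            """wl in Angstrom, flux in erg s^-1 cm^-2 A^-1; band arrays define T(lambda)."""
            T = np.interp(wl_ang, band_wl_ang, band_throughput, left=0.0, right=0.0)
            # photon-weighted mean flux density over the band
            mean_flam = np.trapz(flux_flam * T * wl_ang, wl_ang) / np.trapz(T * wl_ang, wl_ang)
            # pivot wavelength converts the mean F_lambda to F_nu
            pivot = np.sqrt(np.trapz(T * wl_ang, wl_ang) / np.trapz(T / wl_ang, wl_ang))
            f_nu = mean_flam * pivot**2 / 2.998e18   # erg s^-1 cm^-2 Hz^-1
            return -2.5 * np.log10(f_nu) - 48.60     # AB magnitude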

  15. Data Mining to Chart the Arctic: Analysis of Approaches to Incorporate Outside Source Data into NOAA Office of Coast Survey Workflow

    NASA Astrophysics Data System (ADS)

    Rennoll, V.

    2016-02-01

    The National Centers for Environmental Information provide public access to a wealth of seafloor mapping data, both from National Ocean Service hydrographic surveys and outside source collections. Utilizing the outside source data to improve nautical charts created by the National Oceanic and Atmospheric Administration (NOAA) is an appealing alternative to traditional surveys, largely in areas with significant data gaps where hydrographic surveys are not planned. However, much of the outside data are collected in transit lines and lack traditional overlapping main scheme lines and crosslines. Spanning multiple years and vessels, these transit line data collections were obtained using disparate operating procedures and have inconsistent qualities. Here, a workflow was developed to ingest these variable depth data within a defined region by assessing their quality and utility for nautical charting. The workflow was evaluated with a navigationally significant area in the Bering Sea, where bathymetric data collected from ten vessels over a period of twelve years were available. The outside data were shown to be of sufficient quality through comparisons with existing NOAA surveys and then used to demonstrate where the data could provide new or updated information on nautical charts, and provide reconnaissance for future hydrographic planning. The utility assessment of the data, however, was hindered by lack of a verified survey-scale sounding database, against which the outside source data could be compared. Having developed the workflow, it is recommended that further outside data is ingested by NOAA's Office of Coast Survey and that a database is developed with full-scale chart soundings for outside data comparisons.

  16. An online database for informing ecological network models: http://kelpforest.ucsc.edu.

    PubMed

    Beas-Luna, Rodrigo; Novak, Mark; Carr, Mark H; Tinker, Martin T; Black, August; Caselle, Jennifer E; Hoban, Michael; Malone, Dan; Iles, Alison

    2014-01-01

    Ecological network models and analyses are recognized as valuable tools for understanding the dynamics and resiliency of ecosystems, and for informing ecosystem-based approaches to management. However, few databases exist that can provide the life history, demographic and species interaction information necessary to parameterize ecological network models. Faced with the difficulty of synthesizing the information required to construct models for kelp forest ecosystems along the West Coast of North America, we developed an online database (http://kelpforest.ucsc.edu/) to facilitate the collation and dissemination of such information. Many of the database's attributes are novel, yet the structure is applicable and adaptable to other ecosystem modeling efforts. Information for each taxonomic unit includes stage-specific life history, demography, and body-size allometries. Species interactions include trophic, competitive, facilitative, and parasitic forms. Each data entry is temporally and spatially explicit. The online data entry interface allows researchers anywhere to contribute and access information. Quality control is facilitated by attributing each entry to unique contributor identities and source citations. The database has proven useful as an archive of species and ecosystem-specific information in the development of several ecological network models, for informing management actions, and for education purposes (e.g., undergraduate and graduate training). To facilitate adaptation of the database by other researchers for other ecosystems, the code and technical details on how to customize this database and apply it to other ecosystems are freely available and located at the following link (https://github.com/kelpforest-cameo/databaseui).

  17. An Online Database for Informing Ecological Network Models: http://kelpforest.ucsc.edu

    PubMed Central

    Beas-Luna, Rodrigo; Novak, Mark; Carr, Mark H.; Tinker, Martin T.; Black, August; Caselle, Jennifer E.; Hoban, Michael; Malone, Dan; Iles, Alison

    2014-01-01

    Ecological network models and analyses are recognized as valuable tools for understanding the dynamics and resiliency of ecosystems, and for informing ecosystem-based approaches to management. However, few databases exist that can provide the life history, demographic and species interaction information necessary to parameterize ecological network models. Faced with the difficulty of synthesizing the information required to construct models for kelp forest ecosystems along the West Coast of North America, we developed an online database (http://kelpforest.ucsc.edu/) to facilitate the collation and dissemination of such information. Many of the database's attributes are novel, yet the structure is applicable and adaptable to other ecosystem modeling efforts. Information for each taxonomic unit includes stage-specific life history, demography, and body-size allometries. Species interactions include trophic, competitive, facilitative, and parasitic forms. Each data entry is temporally and spatially explicit. The online data entry interface allows researchers anywhere to contribute and access information. Quality control is facilitated by attributing each entry to unique contributor identities and source citations. The database has proven useful as an archive of species and ecosystem-specific information in the development of several ecological network models, for informing management actions, and for education purposes (e.g., undergraduate and graduate training). To facilitate adaptation of the database by other researchers for other ecosystems, the code and technical details on how to customize this database and apply it to other ecosystems are freely available and located at the following link (https://github.com/kelpforest-cameo/databaseui). PMID:25343723

  18. An online database for informing ecological network models: http://kelpforest.ucsc.edu

    USGS Publications Warehouse

    Beas-Luna, Rodrigo; Tinker, M. Tim; Novak, Mark; Carr, Mark H.; Black, August; Caselle, Jennifer E.; Hoban, Michael; Malone, Dan; Iles, Alison C.

    2014-01-01

    Ecological network models and analyses are recognized as valuable tools for understanding the dynamics and resiliency of ecosystems, and for informing ecosystem-based approaches to management. However, few databases exist that can provide the life history, demographic and species interaction information necessary to parameterize ecological network models. Faced with the difficulty of synthesizing the information required to construct models for kelp forest ecosystems along the West Coast of North America, we developed an online database (http://kelpforest.ucsc.edu/) to facilitate the collation and dissemination of such information. Many of the database's attributes are novel, yet the structure is applicable and adaptable to other ecosystem modeling efforts. Information for each taxonomic unit includes stage-specific life history, demography, and body-size allometries. Species interactions include trophic, competitive, facilitative, and parasitic forms. Each data entry is temporally and spatially explicit. The online data entry interface allows researchers anywhere to contribute and access information. Quality control is facilitated by attributing each entry to unique contributor identities and source citations. The database has proven useful as an archive of species and ecosystem-specific information in the development of several ecological network models, for informing management actions, and for education purposes (e.g., undergraduate and graduate training). To facilitate adaptation of the database by other researchers for other ecosystems, the code and technical details on how to customize this database and apply it to other ecosystems are freely available and located at the following link (https://github.com/kelpforest-cameo/databaseui).

  19. Tephrabase: A tephrochronological database

    NASA Astrophysics Data System (ADS)

    Newton, Anthony

    2015-04-01

    Development of Tephrabase, a tephrochronological database, began over 20 years ago, and it was launched in June 1995 as one of the earliest scientific databases on the web. Tephrabase was designed from the start to include a wide range of tephrochronological data including location, depth of the layer, geochemical composition (major to trace elements), physical properties (colour, grainsize, and mineral components), dating (both absolute/historical and radiometric), details of eruptions and the history of volcanic centres, as well as a reference database. Currently, Tephrabase contains details of over 1000 sites where tephra layers have been found, 3500 tephra layers, 3500 geochemical analyses and 2500 references. Tephrabase was originally developed to include tephra layers in Iceland and those of Icelandic origin found in NW Europe; it now also includes data on tephra layers from central Mexico and from the Laacher See eruption. The latter were developed as a supplement to the Iceland-centric nature of the rest of Tephrabase. A further extension to Tephrabase has seen the development of an automated method of producing tephra stratigraphic columns, calculating sediment accumulation rates between dated tephra layers in multiple profiles and mapping tephra layers across the landscape. Whilst Tephrabase has been successful and continues to be developed and updated, there are several issues which need to be addressed. More tephrochronological databases need to be developed and these should allow connected/shared searches. This would provide worldwide coverage, but also the flexibility to develop spin-off small-scale extensions, such as those described above. Data uploading needs to be improved and simplified. This includes the need to clarify issues of quality control. Again, a common, standards-led approach to this seems appropriate. Researchers also need to be encouraged to contribute data to these databases. Tephrabase was designed to include a variety of data, including physical properties and trace element compositions of the tephra layers. However, Tephrabase does not yet contain these data. Tephrabase and other databases need to include these. Tephra databases need not only to record details about tephra layers, but should also be tools to understand environmental change and volcanic histories. These goals can be achieved through development of the databases themselves and through the creation of portals which draw data from multiple data sources.

  20. Status, upgrades, and advances of RTS2: the open source astronomical observatory manager

    NASA Astrophysics Data System (ADS)

    Kubánek, Petr

    2016-07-01

    RTS2 is an open source observatory control system. In development since early 2000, it has continued to receive new features over the last two years. RTS2 is a modular, network-based distributed control system, featuring telescope drivers with advanced tracking and pointing capabilities, fast camera drivers, and high-level modules for the "business logic" of the observatory, connected to an SQL database. Running on all continents of the planet, it has accumulated the capability to control partial or full observatory setups.

  1. Database of potential sources for earthquakes larger than magnitude 6 in Northern California

    USGS Publications Warehouse

    ,

    1996-01-01

    The Northern California Earthquake Potential (NCEP) working group, composed of many contributors and reviewers in industry, academia and government, has pooled its collective expertise and knowledge of regional tectonics to identify potential sources of large earthquakes in northern California. We have created a map and database of active faults, both surficial and buried, that forms the basis for the northern California portion of the national map of probabilistic seismic hazard. The database contains 62 potential sources, including fault segments and areally distributed zones. The working group has integrated constraints from broadly based plate tectonic and VLBI models with local geologic slip rates, geodetic strain rate, and microseismicity. Our earthquake source database derives from a scientific consensus that accounts for conflict in the diverse data. Our preliminary product, as described in this report, brings to light many gaps in the data, including a need for better information on the proportion of deformation in fault systems that is aseismic.

  2. Funding of Parkinson research from industry and US federal and foundation sources.

    PubMed

    Dorsey, E Ray; Thompson, Joel P; Frasier, Mark; Sherer, Todd; Fiske, Brian; Nicholson, Sean; Johnston, S Claiborne; Holloway, Robert G; Moses, Hamilton

    2009-04-15

    Funding for biomedical and neuroscience research has increased over the last decade but without a concomitant increase in new therapies. This study's objectives were to determine the level and principal sources of recent funding for Parkinson disease (PD) research and to determine the current state of PD drug development. We determined the level and principal sources of recent funding for PD research from the following sources: US federal agencies, large PD foundations based in the United States, and global industry. We assessed the status of PD drug development through the use of a proprietary drug pipeline database. Funding for PD research from the sources examined was approximately $1.1 billion in 2003 and $1.2 billion in 2005. Industry accounted for 77% of support from 2003 to 2005. The number of drugs in development for PD increased from 67 in 2003 to 97 in 2007. Of the companies with at least one compound in development for PD in 2007, most were small (62% had annual revenue of less than $100 million), and most (53%) were based outside the United States. These companies will likely require partnerships to drive successful development of new PD therapies.

  3. Use of large healthcare databases for rheumatology clinical research.

    PubMed

    Desai, Rishi J; Solomon, Daniel H

    2017-03-01

    Large healthcare databases, which contain data collected during routinely delivered healthcare to patients, can serve as a valuable resource for generating actionable evidence to assist medical and healthcare policy decision-making. In this review, we summarize use of large healthcare databases in rheumatology clinical research. Large healthcare data are critical to evaluate medication safety and effectiveness in patients with rheumatologic conditions. Three major sources of large healthcare data are: first, electronic medical records, second, health insurance claims, and third, patient registries. Each of these sources offers unique advantages, but also has some inherent limitations. To address some of these limitations and maximize the utility of these data sources for evidence generation, recent efforts have focused on linking different data sources. Innovations such as randomized registry trials, which aim to facilitate design of low-cost randomized controlled trials built on existing infrastructure provided by large healthcare databases, are likely to make clinical research more efficient in coming years. Harnessing the power of information contained in large healthcare databases, while paying close attention to their inherent limitations, is critical to generate a rigorous evidence-base for medical decision-making and ultimately enhancing patient care.

  4. TOWARD THE DEVELOPMENT OF A CONSENSUS MATERIALS DATABASE FOR PRESSURE TECHNOLOGY APPLICATIONS

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Swindeman, Robert W; Ren, Weiju

    The ASME construction code books specify materials and fabrication procedures that are acceptable for pressure technology applications. However, with few exceptions, the materials properties listed in the ASME code books provide no statistics or other information pertaining to material variability. Such information is central to the prediction and prevention of failure events. Many sources of materials data exist that provide variability information, but such sources do not necessarily represent a consensus of experts with respect to the reported trends. Such a need has been identified by ASME Standards Technology, LLC, and initial steps have been taken to address these needs; however, these steps are limited to project-specific applications only, such as the joint DOE-ASME project on materials for Generation IV nuclear reactors. In contrast to light-water reactor technology, the experience base for the Generation IV nuclear reactors is somewhat lacking and heavy reliance must be placed on model development and predictive capability. The database for model development is being assembled and includes existing code alloys such as alloy 800H and 9Cr-1Mo-V steel. Ownership and use rights are potential barriers that must be addressed.

  5. Teaching information literacy skills to sophomore-level biology majors.

    PubMed

    Thompson, Leigh; Blankinship, Lisa Ann

    2015-05-01

    Many undergraduate students lack a sound understanding of information literacy. The skills that comprise information literacy are particularly important when combined with scientific writing for biology majors as they are the foundation skills necessary to complete upper-division biology course assignments, better train students for research projects, and prepare students for graduate and professional education. To help undergraduate biology students develop and practice information literacy and scientific writing skills, a series of three one-hour hands-on library sessions, discussions, and homework assignments were developed for Biological Literature, a one-credit, one-hour-per-week, required sophomore-level course. The embedded course librarian developed a learning exercise that reviewed how to conduct database and web searches, the difference between primary and secondary sources, source credibility, and how to access articles through the university's databases. Students used the skills gained in the library training sessions for later writing assignments including a formal lab report and annotated bibliography. By focusing on improving information literacy skills as well as providing practice in scientific writing, Biological Literature students are better able to meet the rigors of upper-division biology courses and communicate research findings in a more professional manner.

  6. Teaching Information Literacy Skills to Sophomore-Level Biology Majors

    PubMed Central

    Thompson, Leigh; Blankinship, Lisa Ann

    2015-01-01

    Many undergraduate students lack a sound understanding of information literacy. The skills that comprise information literacy are particularly important when combined with scientific writing for biology majors as they are the foundation skills necessary to complete upper-division biology course assignments, better train students for research projects, and prepare students for graduate and professional education. To help undergraduate biology students develop and practice information literacy and scientific writing skills, a series of three one-hour hands-on library sessions, discussions, and homework assignments were developed for Biological Literature, a one-credit, one-hour-per-week, required sophomore-level course. The embedded course librarian developed a learning exercise that reviewed how to conduct database and web searches, the difference between primary and secondary sources, source credibility, and how to access articles through the university’s databases. Students used the skills gained in the library training sessions for later writing assignments including a formal lab report and annotated bibliography. By focusing on improving information literacy skills as well as providing practice in scientific writing, Biological Literature students are better able to meet the rigors of upper-division biology courses and communicate research findings in a more professional manner. PMID:25949754

  7. Infovigilance: reporting errors in official drug information sources.

    PubMed

    Fusier, Isabelle; Tollier, Corinne; Husson, Marie-Caroline

    2005-06-01

    The French drug database Thériaque (http://www.theriaque.org), developed by the Centre National Hospitalier d'Information sur le Médicament (CNHIM), is responsible for the dissemination of independent information about all drugs available in France. Each month the CNHIM pharmacists report problems due to inaccuracies in these sources to the French drug agency. In daily practice we devised the term "infovigilance": "Activity of error or inaccuracy notification in information sources which could be responsible for medication errors". The aim of this study was to evaluate the impact of CNHIM infovigilance on the contents of the Summary of Product Characteristics (SPCs). The study was a prospective study from 09/11/2001 to 31/12/2002. The problems related to the quality of information were classified into four types (inaccuracy/confusion, error/lack of information, discordance between SPC sections and discordance between generic SPCs). The outcome measures were (1) the number of notifications and the number of SPCs integrated into the database during the study period, and (2) the percentage of notifications of each type: with or without potential patient impact, with or without later correction of the SPC, and per section. 2.7% (85/3151) of the SPCs integrated into the database were the subject of a problem notification. Notifications according to type of problem were inaccuracy/confusion (32%), error/lack of information (13%), discordance between SPC sections (27%) and discordance between generic SPCs (28%). 55% of problems were evaluated as 'likely to have an impact on the patient' and 45% as 'unlikely to have an impact on the patient'. Twenty-two of the problems reported to the French drug agency were corrected and new, updated SPCs were published with the corrections. Our efforts to improve the quality of drug information sources through a continuous "infovigilance" process need to be continued and extended to other information sources.

  8. Critical Care Health Informatics Collaborative (CCHIC): Data, tools and methods for reproducible research: A multi-centre UK intensive care database.

    PubMed

    Harris, Steve; Shi, Sinan; Brealey, David; MacCallum, Niall S; Denaxas, Spiros; Perez-Suarez, David; Ercole, Ari; Watkinson, Peter; Jones, Andrew; Ashworth, Simon; Beale, Richard; Young, Duncan; Brett, Stephen; Singer, Mervyn

    2018-04-01

    To build and curate a linkable multi-centre database of high-resolution longitudinal electronic health records (EHR) from adult Intensive Care Units (ICU). To develop a set of open-source tools to make these data 'research ready' while protecting patients' privacy, with a particular focus on anonymisation. We developed a scalable EHR processing pipeline for extracting, linking, normalising, curating and anonymising EHR data. Patient and public involvement was sought from the outset, and approval to hold these data was granted by the NHS Health Research Authority's Confidentiality Advisory Group (CAG). The data are held in a certified Data Safe Haven. We followed sustainable software development principles throughout, and defined and populated a common data model that links to other clinical areas. Longitudinal EHR data were loaded into the CCHIC database from eleven adult ICUs at 5 UK teaching hospitals. From January 2014 to January 2017, this amounted to 21,930 admissions (18,074 unique patients). Typical admissions have 70 data-items pertaining to admission and discharge, and a median of 1030 (IQR 481-2335) time-varying measures. Training datasets were made available through virtual machine images emulating the data processing environment. An open source R package, cleanEHR, was developed and released that transforms the data into a square table readily analysable by most statistical packages. A simple, language-agnostic configuration file allows the user to select and clean variables, and impute missing data. An audit trail makes clear the provenance of the data at all times. Making health care data available for research is problematic. CCHIC is a unique multi-centre longitudinal and linkable resource that prioritises patient privacy through the highest standards of data security, but also provides tools to clean, organise, and anonymise the data. We believe the development of such tools is essential if we are to meet the twin requirements of respecting patient privacy and working for patient benefit. The CCHIC database is now in use by health care researchers from academia and industry. The 'research ready' suite of data preparation tools has facilitated access, and linkage to national databases of secondary care is underway. Copyright © 2018 Elsevier B.V. All rights reserved.
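
    The abstract describes cleanEHR (an R package) as turning long-format, time-varying EHR measures into a "square" table that standard statistical packages can analyse. The sketch below only illustrates that general reshaping idea in Python with pandas; the field names are hypothetical and this is not the CCHIC/cleanEHR data model or code.

        # Illustrative only: reshape long-format, time-varying EHR records into a
        # "square" (wide) table: one row per patient/time, one column per item.
        # Field names are hypothetical, not the CCHIC/cleanEHR data model.
        import pandas as pd

        long_records = pd.DataFrame({
            "patient_id": [1, 1, 1, 2, 2],
            "time_hr":    [0, 0, 4, 0, 4],
            "item":       ["hr", "sbp", "hr", "hr", "hr"],
            "value":      [82, 118, 90, 75, 77],
        })

        # Pivot to a wide table readily analysable by most statistical packages.
        square = long_records.pivot_table(index=["patient_id", "time_hr"],
                                          columns="item", values="value").reset_index()
        print(square)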

  9. A Semantic Transformation Methodology for the Secondary Use of Observational Healthcare Data in Postmarketing Safety Studies.

    PubMed

    Pacaci, Anil; Gonul, Suat; Sinaci, A Anil; Yuksel, Mustafa; Laleci Erturkmen, Gokce B

    2018-01-01

    Background: Utilization of the available observational healthcare datasets is key to complementing and strengthening postmarketing safety studies. Use of common data models (CDM) is the predominant approach to enabling large-scale systematic analyses across disparate data models and vocabularies. Current CDM transformation practices depend on proprietarily developed Extract-Transform-Load (ETL) procedures, which require knowledge of both the semantics and the technical characteristics of the source datasets and the target CDM. Purpose: In this study, our aim is to develop a modular but coordinated transformation approach in order to separate semantic and technical steps of transformation processes, which do not have a strict separation in traditional ETL approaches. Such an approach would separate into discrete operations the extraction of data from source electronic health record systems, the alignment of the source and target models on the semantic level, and the population of target common data repositories. Approach: In order to separate the activities that are required to transform heterogeneous data sources to a target CDM, we introduce a semantic transformation approach composed of three steps: (1) transformation of source datasets to Resource Description Framework (RDF) format, (2) application of semantic conversion rules to get the data as instances of the ontological model of the target CDM, and (3) population of repositories, which comply with the specifications of the CDM, by processing the RDF instances from step 2. The proposed approach has been implemented in real healthcare settings where the Observational Medical Outcomes Partnership (OMOP) CDM was chosen as the common data model, and a comprehensive comparative analysis between the native and transformed data has been conducted. Results: Health records of ~1 million patients have been successfully transformed from the source database to an OMOP CDM-based database. Descriptive statistics obtained from the source and target databases present analogous and consistent results. Discussion and Conclusion: Our method goes beyond traditional ETL approaches by being more declarative and rigorous. Declarative, because the use of RDF-based mapping rules makes each mapping more transparent and understandable to humans while retaining logic-based computability. Rigorous, because the mappings are based on computer-readable semantics which are amenable to validation through logic-based inference methods.
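
    As a rough illustration of step (1) above, where source records are expressed as RDF before the semantic conversion rules are applied, the sketch below uses the Python rdflib library with a hypothetical namespace and record layout; it is not the project's actual transformation code or vocabulary.

        # Illustrative only: express a source EHR observation as RDF triples
        # (step 1 of the transformation described above). The namespace, class
        # and property names here are hypothetical placeholders.
        from rdflib import Graph, Literal, Namespace, RDF, XSD

        EX = Namespace("http://example.org/ehr#")
        g = Graph()
        g.bind("ex", EX)

        obs = EX["observation/123"]
        g.add((obs, RDF.type, EX.Observation))
        g.add((obs, EX.patientId, Literal("P001")))
        g.add((obs, EX.conditionCode, Literal("E11.9")))   # code from the source coding system
        g.add((obs, EX.recordedOn, Literal("2017-05-04", datatype=XSD.date)))

        print(g.serialize(format="turtle"))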

  10. The volatile compound BinBase mass spectral database.

    PubMed

    Skogerson, Kirsten; Wohlgemuth, Gert; Barupal, Dinesh K; Fiehn, Oliver

    2011-08-04

    Volatile compounds comprise diverse chemical groups with wide-ranging sources and functions. These compounds originate from major pathways of secondary metabolism in many organisms and play essential roles in chemical ecology in both plant and animal kingdoms. In past decades, sampling methods and instrumentation for the analysis of complex volatile mixtures have improved; however, design and implementation of database tools to process and store the complex datasets have lagged behind. The volatile compound BinBase (vocBinBase) is an automated peak annotation and database system developed for the analysis of GC-TOF-MS data derived from complex volatile mixtures. The vocBinBase DB is an extension of the previously reported metabolite BinBase software developed to track and identify derivatized metabolites. The BinBase algorithm uses deconvoluted spectra and peak metadata (retention index, unique ion, spectral similarity, peak signal-to-noise ratio, and peak purity) from the Leco ChromaTOF software, and annotates peaks using a multi-tiered filtering system with stringent thresholds. The vocBinBase algorithm assigns the identity of compounds existing in the database. Volatile compound assignments are supported by the Adams mass spectral-retention index library, which contains over 2,000 plant-derived volatile compounds. Novel molecules that are not found within vocBinBase are automatically added using strict mass spectral and experimental criteria. Users obtain fully annotated data sheets with quantitative information for all volatile compounds for studies that may consist of thousands of chromatograms. The vocBinBase database may also be queried across different studies, comprising currently 1,537 unique mass spectra generated from 1.7 million deconvoluted mass spectra of 3,435 samples (18 species). Mass spectra with retention indices and volatile profiles are available as free download under the CC-BY agreement (http://vocbinbase.fiehnlab.ucdavis.edu). The BinBase database algorithms have been successfully modified to allow for tracking and identification of volatile compounds in complex mixtures. The database is capable of annotating large datasets (hundreds to thousands of samples) and is well-suited for between-study comparisons such as chemotaxonomy investigations. This novel volatile compound database tool is applicable to research fields spanning chemical ecology to human health. The BinBase source code is freely available at http://binbase.sourceforge.net/ under the LGPL 2.0 license agreement.
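
    The multi-tiered filtering described above (retention index, spectral similarity, signal-to-noise, purity) can be pictured with the toy sketch below; the threshold values and field names are hypothetical placeholders, not the actual BinBase parameters.

        # Illustrative only: a toy version of multi-tiered peak filtering prior to
        # annotation. Thresholds and field names are hypothetical, not BinBase's.
        def passes_filters(peak, ri_window=2.0, min_similarity=800,
                           min_s2n=5.0, min_purity=0.8):
            """Return True if a deconvoluted peak is a candidate for annotation."""
            return (abs(peak["ri_observed"] - peak["ri_expected"]) <= ri_window
                    and peak["spectral_similarity"] >= min_similarity
                    and peak["signal_to_noise"] >= min_s2n
                    and peak["purity"] >= min_purity)

        peak = {"ri_observed": 1101.4, "ri_expected": 1100.0,
                "spectral_similarity": 873, "signal_to_noise": 12.5, "purity": 0.91}
        print(passes_filters(peak))  # True with these toy thresholds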

  11. The volatile compound BinBase mass spectral database

    PubMed Central

    2011-01-01

    Background Volatile compounds comprise diverse chemical groups with wide-ranging sources and functions. These compounds originate from major pathways of secondary metabolism in many organisms and play essential roles in chemical ecology in both plant and animal kingdoms. In past decades, sampling methods and instrumentation for the analysis of complex volatile mixtures have improved; however, design and implementation of database tools to process and store the complex datasets have lagged behind. Description The volatile compound BinBase (vocBinBase) is an automated peak annotation and database system developed for the analysis of GC-TOF-MS data derived from complex volatile mixtures. The vocBinBase DB is an extension of the previously reported metabolite BinBase software developed to track and identify derivatized metabolites. The BinBase algorithm uses deconvoluted spectra and peak metadata (retention index, unique ion, spectral similarity, peak signal-to-noise ratio, and peak purity) from the Leco ChromaTOF software, and annotates peaks using a multi-tiered filtering system with stringent thresholds. The vocBinBase algorithm assigns the identity of compounds existing in the database. Volatile compound assignments are supported by the Adams mass spectral-retention index library, which contains over 2,000 plant-derived volatile compounds. Novel molecules that are not found within vocBinBase are automatically added using strict mass spectral and experimental criteria. Users obtain fully annotated data sheets with quantitative information for all volatile compounds for studies that may consist of thousands of chromatograms. The vocBinBase database may also be queried across different studies, comprising currently 1,537 unique mass spectra generated from 1.7 million deconvoluted mass spectra of 3,435 samples (18 species). Mass spectra with retention indices and volatile profiles are available as free download under the CC-BY agreement (http://vocbinbase.fiehnlab.ucdavis.edu). Conclusions The BinBase database algorithms have been successfully modified to allow for tracking and identification of volatile compounds in complex mixtures. The database is capable of annotating large datasets (hundreds to thousands of samples) and is well-suited for between-study comparisons such as chemotaxonomy investigations. This novel volatile compound database tool is applicable to research fields spanning chemical ecology to human health. The BinBase source code is freely available at http://binbase.sourceforge.net/ under the LGPL 2.0 license agreement. PMID:21816034

  12. Validity of breast, lung and colorectal cancer diagnoses in administrative databases: a systematic review protocol.

    PubMed

    Abraha, Iosief; Giovannini, Gianni; Serraino, Diego; Fusco, Mario; Montedori, Alessandro

    2016-03-18

    Breast, lung and colorectal cancers constitute the most common cancers worldwide and their epidemiology, related health outcomes and quality indicators can be studied using administrative healthcare databases. To constitute a reliable source for research, administrative healthcare databases need to be validated. The aim of this protocol is to perform the first systematic review of studies reporting the validation of International Classification of Diseases 9th and 10th revision codes to identify breast, lung and colorectal cancer diagnoses in administrative healthcare databases. This review protocol has been developed according to the Preferred Reporting Items for Systematic Reviews and Meta-Analyses Protocol (PRISMA-P) 2015 statement. We will search the following databases: MEDLINE, EMBASE, Web of Science and the Cochrane Library, using appropriate search strategies. We will include validation studies that used administrative data to identify breast, lung and colorectal cancer diagnoses or studies that evaluated the validity of breast, lung and colorectal cancer codes in administrative data. The following inclusion criteria will be used: (1) the presence of a reference standard case definition for the disease of interest; (2) the presence of at least one test measure (eg, sensitivity, positive predictive values, etc) and (3) the use of data source from an administrative database. Pairs of reviewers will independently abstract data using standardised forms and will assess quality using a checklist based on the Standards for Reporting of Diagnostic accuracy (STARD) criteria. Ethics approval is not required. We will submit results of this study to a peer-reviewed journal for publication. The results will serve as a guide to identify appropriate case definitions and algorithms of breast, lung and colorectal cancers for researchers involved in validating administrative healthcare databases as well as for outcome research on these conditions that used administrative healthcare databases. CRD42015026881. Published by the BMJ Publishing Group Limited. For permission to use (where not already granted under a licence) please go to http://www.bmj.com/company/products-services/rights-and-licensing/
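
    For reference, the test measures named in the inclusion criteria (for example sensitivity and positive predictive value) are computed from a two-by-two table comparing the administrative code against the reference standard; a minimal sketch with illustrative counts:

        # Minimal sketch: sensitivity and positive predictive value from a 2x2 table
        # comparing an administrative case definition against a reference standard.
        def sensitivity(tp, fn):
            return tp / (tp + fn)          # true positives / all reference-positive

        def positive_predictive_value(tp, fp):
            return tp / (tp + fp)          # true positives / all code-positive

        print(sensitivity(tp=90, fn=10))                  # 0.9
        print(positive_predictive_value(tp=90, fp=30))    # 0.75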

  13. Enhancing Disaster Management: Development of a Spatial Database of Day Care Centers in the USA

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Singh, Nagendra; Tuttle, Mark A.; Bhaduri, Budhendra L.

    Children under the age of five constitute around 7% of the total U.S. population and represent a segment of the population that is totally dependent on others for day-to-day activities. A significant proportion of this population spends time in some form of day care arrangement while their parents are away from home. Accounting for those children during emergencies is of high priority, which requires a broad understanding of the locations of such day care centers. Because day care centers are concentrations of at-risk population, their spatial location is critical for any type of emergency preparedness and response (EPR). However, until recently, the U.S. emergency preparedness and response community did not have access to a comprehensive spatial database of day care centers at the national scale. This paper describes an approach for the development of the first comprehensive spatial database of day care center locations throughout the USA utilizing a variety of data harvesting techniques to integrate information from widely disparate data sources, followed by geolocating for spatial precision. In the context of disaster management, such spatially refined demographic databases hold tremendous potential for improving high-resolution population distribution and dynamics models and databases.
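
    The final "geolocating for spatial precision" step can be illustrated with the hedged sketch below, which uses the open-source geopy package and the public Nominatim geocoder purely as stand-ins; the abstract does not state which geocoding service or workflow was actually used to build the day care database.

        # Illustrative only: geocode a harvested facility address to coordinates.
        # geopy/Nominatim are stand-ins; the actual project may have used a
        # different geocoding service and workflow.
        from geopy.geocoders import Nominatim

        geolocator = Nominatim(user_agent="daycare-db-example")
        record = {"name": "Example Day Care",
                  "address": "1600 Pennsylvania Ave NW, Washington, DC"}

        location = geolocator.geocode(record["address"])
        if location is not None:
            record["lat"], record["lon"] = location.latitude, location.longitude
        print(record)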

  14. Pseudomonas Genome Database: facilitating user-friendly, comprehensive comparisons of microbial genomes.

    PubMed

    Winsor, Geoffrey L; Van Rossum, Thea; Lo, Raymond; Khaira, Bhavjinder; Whiteside, Matthew D; Hancock, Robert E W; Brinkman, Fiona S L

    2009-01-01

    Pseudomonas aeruginosa is a well-studied opportunistic pathogen that is particularly known for its intrinsic antimicrobial resistance, diverse metabolic capacity, and its ability to cause life threatening infections in cystic fibrosis patients. The Pseudomonas Genome Database (http://www.pseudomonas.com) was originally developed as a resource for peer-reviewed, continually updated annotation for the Pseudomonas aeruginosa PAO1 reference strain genome. In order to facilitate cross-strain and cross-species genome comparisons with other Pseudomonas species of importance, we have now expanded the database capabilities to include all Pseudomonas species, and have developed or incorporated methods to facilitate high quality comparative genomics. The database contains robust assessment of orthologs, a novel ortholog clustering method, and incorporates five views of the data at the sequence and annotation levels (Gbrowse, Mauve and custom views) to facilitate genome comparisons. A choice of simple and more flexible user-friendly Boolean search features allows researchers to search and compare annotations or sequences within or between genomes. Other features include more accurate protein subcellular localization predictions and a user-friendly, Boolean searchable log file of updates for the reference strain PAO1. This database aims to continue to provide a high quality, annotated genome resource for the research community and is available under an open source license.

  15. Enhancing Disaster Management: Development of a Spatial Database of Day Care Centers in the USA

    DOE PAGES

    Singh, Nagendra; Tuttle, Mark A.; Bhaduri, Budhendra L.

    2015-07-30

    Children under the age of five constitute around 7% of the total U.S. population and represent a segment of the population that is totally dependent on others for day-to-day activities. A significant proportion of this population spends time in some form of day care arrangement while their parents are away from home. Accounting for those children during emergencies is of high priority, which requires a broad understanding of the locations of such day care centers. Because day care centers are concentrations of at-risk population, their spatial location is critical for any type of emergency preparedness and response (EPR). However, until recently, the U.S. emergency preparedness and response community did not have access to a comprehensive spatial database of day care centers at the national scale. This paper describes an approach for the development of the first comprehensive spatial database of day care center locations throughout the USA utilizing a variety of data harvesting techniques to integrate information from widely disparate data sources, followed by geolocating for spatial precision. In the context of disaster management, such spatially refined demographic databases hold tremendous potential for improving high-resolution population distribution and dynamics models and databases.

  16. An online database for IHN virus in Pacific Salmonid fish: MEAP-IHNV

    USGS Publications Warehouse

    Kurath, Gael

    2012-01-01

    The MEAP-IHNV database provides access to detailed data for anyone interested in IHNV molecular epidemiology, such as fish health professionals, fish culture facility managers, and academic researchers. The flexible search capabilities enable the user to generate various output formats, including tables and maps, which should assist users in developing and testing hypotheses about how IHNV moves across landscapes and changes over time. The MEAP-IHNV database is available online at http://gis.nacse.org/ihnv/ (fig. 1). The database contains records that provide background information and genetic sequencing data for more than 1,000 individual field isolates of the fish virus Infectious hematopoietic necrosis virus (IHNV), and is updated approximately annually. It focuses on IHNV isolates collected throughout western North America from 1966 to the present. The database also includes a small number of IHNV isolates from Eastern Russia. By engaging the expertise of the broader community of colleagues interested in IHNV, our goal is to enhance the overall understanding of IHNV epidemiology, including defining sources of disease outbreaks and viral emergence events, identifying virus traffic patterns and potential reservoirs, and understanding how human management of salmonid fish culture affects disease. Ultimately, this knowledge can be used to develop new strategies to reduce the effect of IHN disease in cultured and wild fish.

  17. International Database of Volcanic Ash Impacts

    NASA Astrophysics Data System (ADS)

    Wallace, K.; Cameron, C.; Wilson, T. M.; Jenkins, S.; Brown, S.; Leonard, G.; Deligne, N.; Stewart, C.

    2015-12-01

    Volcanic ash creates extensive impacts to people and property, yet we lack a global ash impacts catalog to organize, distribute, and archive this important information. Critical impact information is often stored in ephemeral news articles or other isolated resources, which cannot be queried or located easily. A global ash impacts database would improve 1) warning messages, 2) public and lifeline emergency preparation, and 3) eruption response and recovery. Ashfall can have varying consequences, such as disabling critical lifeline infrastructure (e.g. electrical generation and transmission, water supplies, telecommunications, aircraft and airports) or merely creating limited and expensive inconvenience to local communities. Impacts to the aviation sector can be a far-reaching global issue. The international volcanic ash impacts community formed a committee to develop a database to catalog the impacts of volcanic ash. We identify three user populations for this database: 1) research teams, who would use the database to assist in systematic collection, recording, and storage of ash impact data, and to prioritize impact assessment trips and lab experiments 2) volcanic risk assessment scientists who rely on impact data for assessments (especially vulnerability/fragility assessments); a complete dataset would have utility for global, regional, national and local scale risk assessments, and 3) citizen science volcanic hazard reporting. Publication of an international ash impacts database will encourage standardization and development of best practices for collecting and reporting impact information. Data entered will be highly categorized, searchable, and open source. Systematic cataloging of impact data will allow users to query the data and extract valuable information to aid in the development of improved emergency preparedness, response and recovery measures.

  18. Development of a 2001 National Land Cover Database for the United States

    USGS Publications Warehouse

    Homer, Collin G.; Huang, Chengquan; Yang, Limin; Wylie, Bruce K.; Coan, Michael

    2004-01-01

    Multi-Resolution Land Characterization 2001 (MRLC 2001) is a second-generation Federal consortium designed to create an updated pool of nation-wide Landsat 5 and 7 imagery and derive a second-generation National Land Cover Database (NLCD 2001). The objectives of this multi-layer, multi-source database are twofold: first, to provide consistent land cover for all 50 States, and second, to provide a data framework which allows flexibility in developing and applying each independent data component to a wide variety of other applications. Components in the database include the following: (1) normalized imagery for three time periods per path/row, (2) ancillary data, including a 30 m Digital Elevation Model (DEM) from which slope, aspect and slope position are derived, (3) per-pixel estimates of percent imperviousness and percent tree canopy, (4) 29 classes of land cover data derived from the imagery, ancillary data, and derivatives, (5) classification rules, confidence estimates, and metadata from the land cover classification. This database is now being developed using a Mapping Zone approach, with 66 Zones in the continental United States and 23 Zones in Alaska. Results from three initial mapping Zones show single-pixel land cover accuracies ranging from 73 to 77 percent, imperviousness accuracies ranging from 83 to 91 percent, tree canopy accuracies ranging from 78 to 93 percent, and an estimated 50 percent increase in mapping efficiency over previous methods. The database has now entered the production phase and is being created using extensive partnering in the Federal government with planned completion by 2006.

  19. An affinity-structure database of helix-turn-helix: DNA complexes with a universal coordinate system.

    PubMed

    AlQuraishi, Mohammed; Tang, Shengdong; Xia, Xide

    2015-11-19

    Molecular interactions between proteins and DNA molecules underlie many cellular processes, including transcriptional regulation, chromosome replication, and nucleosome positioning. Computational analyses of protein-DNA interactions rely on experimental data characterizing known protein-DNA interactions structurally and biochemically. While many databases exist that contain either structural or biochemical data, few integrate these two data sources in a unified fashion. Such integration is becoming increasingly critical with the rapid growth of structural and biochemical data, and the emergence of algorithms that rely on the synthesis of multiple data types to derive computational models of molecular interactions. We have developed an integrated affinity-structure database in which the experimental and quantitative DNA binding affinities of helix-turn-helix proteins are mapped onto the crystal structures of the corresponding protein-DNA complexes. This database provides access to: (i) protein-DNA structures, (ii) quantitative summaries of protein-DNA binding affinities using position weight matrices, and (iii) raw experimental data of protein-DNA binding instances. Critically, this database establishes a correspondence between experimental structural data and quantitative binding affinity data at the single basepair level. Furthermore, we present a novel alignment algorithm that structurally aligns the protein-DNA complexes in the database and creates a unified residue-level coordinate system for comparing the physico-chemical environments at the interface between complexes. Using this unified coordinate system, we compute the statistics of atomic interactions at the protein-DNA interface of helix-turn-helix proteins. We provide an interactive website for visualization, querying, and analyzing this database, and a downloadable version to facilitate programmatic analysis. This database will facilitate the analysis of protein-DNA interactions and the development of programmatic computational methods that capitalize on integration of structural and biochemical datasets. The database can be accessed at http://ProteinDNA.hms.harvard.edu.
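
    A position weight matrix of the kind referred to above summarises binding preferences as per-position base frequencies, often log-scaled against a background; a minimal frequency-based sketch, assuming a small set of aligned, equal-length binding sites (the sites and pseudocount below are illustrative values only):

        # Minimal sketch: build a log-odds position weight matrix from aligned
        # binding sites. Sites, pseudocount and background are illustrative only.
        import math

        sites = ["TTGACA", "TTGATA", "TTTACA", "CTGACA"]
        bases = "ACGT"
        background = 0.25
        pseudocount = 0.5

        pwm = []
        for i in range(len(sites[0])):
            column = [s[i] for s in sites]
            row = {}
            for b in bases:
                freq = (column.count(b) + pseudocount) / (len(sites) + 4 * pseudocount)
                row[b] = math.log2(freq / background)   # log-odds vs. uniform background
            pwm.append(row)

        for i, row in enumerate(pwm):
            print(i, {b: round(v, 2) for b, v in row.items()})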

  20. Determining the Ambient Air Boundary for Potential Permit Application in Support of Alaska Industrial Development and Export Authority's Restart of Healy Clean Coal Project

    EPA Pesticide Factsheets

    This document may be of assistance in applying the New Source Review (NSR) air permitting regulations including the Prevention of Significant Deterioration (PSD) requirements. This document is part of the NSR Policy and Guidance Database. Some documents in the database are a scanned or retyped version of a paper photocopy of the original. Although we have taken considerable effort to quality assure the documents, some may contain typographical errors. Contact the office that issued the document if you need a copy of the original.

  1. The Ins and Outs of USDA Nutrient Composition

    USDA-ARS?s Scientific Manuscript database

    The USDA National Nutrient Database for Standard Reference (SR) is the major source of food composition data in the United States, providing the foundation for most food composition databases in the public and private sectors. Sources of data used in SR include analytical studies, food manufacturer...

  2. Filling Terrorism Gaps: VEOs, Evaluating Databases, and Applying Risk Terrain Modeling to Terrorism

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Hagan, Ross F.

    2016-08-29

    This paper aims to address three issues: the lack of literature differentiating terrorism and violent extremist organizations (VEOs), terrorism incident databases, and the applicability of Risk Terrain Modeling (RTM) to terrorism. Current open source literature and publicly available government sources do not differentiate between terrorism and VEOs; furthermore, they fail to define them. Addressing the lack of a comprehensive comparison of existing terrorism data sources, a matrix comparing a dozen terrorism databases is constructed, providing insight toward the array of data available. RTM, a method for spatial risk analysis at a micro level, has some applicability to terrorism research, particularly for studies looking at risk indicators of terrorism. Leveraging attack data from multiple databases, combined with RTM, offers one avenue for closing existing research gaps in terrorism literature.

  3. The Binding Database: data management and interface design.

    PubMed

    Chen, Xi; Lin, Yuhmei; Liu, Ming; Gilson, Michael K

    2002-01-01

    The large and growing body of experimental data on biomolecular binding is of enormous value in developing a deeper understanding of molecular biology, in developing new therapeutics, and in various molecular design applications. However, most of these data are found only in the published literature and are therefore difficult to access and use. No existing public database has focused on measured binding affinities and has provided query capabilities that include chemical structure and sequence homology searches. We have created Binding DataBase (BindingDB), a public, web-accessible database of measured binding affinities. BindingDB is based upon a relational data specification for describing binding measurements via Isothermal Titration Calorimetry (ITC) and enzyme inhibition. A corresponding XML Document Type Definition (DTD) is used to create and parse intermediate files during the on-line deposition process and will also be used for data interchange, including collection of data from other sources. The on-line query interface, which is constructed with Java Servlet technology, supports standard SQL queries as well as searches for molecules by chemical structure and sequence homology. The on-line deposition interface uses Java Server Pages and JavaBean objects to generate dynamic HTML and to store intermediate results. The resulting data resource provides a range of functionality with brisk response-times, and lends itself well to continued development and enhancement.
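
    The pairing of a relational store of measured affinities with chemical structure search can be sketched as follows. The schema and data are hypothetical, and RDKit is used here only as a convenient open-source toolkit for substructure matching; the abstract does not state which cheminformatics engine BindingDB actually uses.

        # Illustrative only: store binding measurements relationally and filter them
        # by chemical substructure. Schema and data are hypothetical; RDKit is an
        # assumption for the structure search, not necessarily BindingDB's engine.
        import sqlite3
        from rdkit import Chem

        con = sqlite3.connect(":memory:")
        con.execute("""CREATE TABLE binding (
                         ligand_smiles TEXT, target TEXT, ki_nM REAL)""")
        con.executemany("INSERT INTO binding VALUES (?, ?, ?)", [
            ("CC(=O)Oc1ccccc1C(=O)O", "COX-1", 1700.0),   # aspirin-like example row
            ("CCN(CC)CC",             "targetX", 50000.0),
        ])

        pattern = Chem.MolFromSmarts("c1ccccc1")           # benzene-ring substructure
        for smiles, target, ki in con.execute("SELECT * FROM binding"):
            mol = Chem.MolFromSmiles(smiles)
            if mol is not None and mol.HasSubstructMatch(pattern):
                print(smiles, target, ki)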

  4. Investigating the Potential Impacts of Energy Production in the Marcellus Shale Region Using the Shale Network Database and CUAHSI-Supported Data Tools

    NASA Astrophysics Data System (ADS)

    Brazil, L.

    2017-12-01

    The Shale Network's extensive database of water quality observations enables educational experiences about the potential impacts of resource extraction with real data. Through open source tools that are developed and maintained by the Consortium of Universities for the Advancement of Hydrologic Science, Inc. (CUAHSI), researchers, educators, and citizens can access and analyze the very same data that the Shale Network team has used in peer-reviewed publications about the potential impacts of hydraulic fracturing on water. The development of the Shale Network database has been made possible through collection efforts led by an academic team and involving numerous individuals from government agencies, citizen science organizations, and private industry. Thus far, CUAHSI-supported data tools have been used to engage high school students, university undergraduate and graduate students, as well as citizens so that all can discover how energy production impacts the Marcellus Shale region, which includes Pennsylvania and other nearby states. This presentation will describe these data tools, how the Shale Network has used them in developing educational material, and the resources available to learn more.

  5. NASA gateway requirements analysis

    NASA Technical Reports Server (NTRS)

    Duncan, Denise R.; Doby, John S.; Shockley, Cynthia W.

    1991-01-01

    NASA devotes approximately 40 percent of its budget to R&D. Twelve NASA Research Centers and their contractors conduct this R&D, which ranges across many disciplines and is fueled by information about previous endeavors. Locating the right information is crucial. While NASA researchers use peer contacts as their primary source of scientific and technical information (STI), on-line bibliographic data bases - both Government-owned and commercial - are also frequently consulted. Once identified, the STI must be delivered in a usable format. This report assesses the appropriateness of developing an intelligent gateway interface for the NASA R&D community as a means of obtaining improved access to relevant STI resources outside of NASA's Remote Console (RECON) on-line bibliographic database. A study was conducted to determine (1) the information requirements of the R&D community, (2) the information sources to meet those requirements, and (3) ways of facilitating access to those information sources. Findings indicate that NASA researchers need more comprehensive STI coverage of disciplines not now represented in the RECON database. This augmented subject coverage should preferably be provided by both domestic and foreign STI sources. It was also found that NASA researchers frequently request rapid delivery of STI, in its original format. Finally, it was found that researchers need a better system for alerting them to recent developments in their areas of interest. A gateway that provides access to domestic and international information sources can also solve several shortcomings in the present STI delivery system. NASA should further test the practicality of a gateway as a mechanism for improved STI access.

  6. Experiment Document for 01-E077 Microgravity Investigation of Crew Reactions in 0-G (MICRO-G)

    NASA Technical Reports Server (NTRS)

    Newman, Dava J.

    2003-01-01

    The Experiment Document (ED) serves the following purposes: a) It provides a vehicle for Principal Investigators (PIs) to formally specify the requirements for performing their experiments. b) It provides a technical Statement of Work (SOW). c) It provides experiment investigators and hardware developers with a convenient source of information about Human Life Sciences (HLS) requirements for the development and/or integration of flight experiment projects. d) It is the primary source of experiment specifications for the HLS Research Program Office (RPO). Inputs from this document will be placed into a controlled database that will be used to generate other documents.

  7. Medical libraries, bioinformatics, and networked information: a coming convergence?

    PubMed

    Lynch, C

    1999-10-01

    Libraries will be changed by technological and social developments that are fueled by information technology, bioinformatics, and networked information. Libraries in highly focused settings such as the health sciences are at a pivotal point in their development as the synthesis of historically diverse and independent information sources transforms health care institutions. Boundaries are breaking down between published literature and research data, between research databases and clinical patient data, and between consumer health information and professional literature. This paper focuses on the dynamics that are occurring with networked information sources and the roles that libraries will need to play in the world of medical informatics in the early twenty-first century.

  8. Reliability and validity assessment of administrative databases in measuring the quality of rectal cancer management.

    PubMed

    Corbellini, Carlo; Andreoni, Bruno; Ansaloni, Luca; Sgroi, Giovanni; Martinotti, Mario; Scandroglio, Ildo; Carzaniga, Pierluigi; Longoni, Mauro; Foschi, Diego; Dionigi, Paolo; Morandi, Eugenio; Agnello, Mauro

    2018-01-01

    Measurement and monitoring of the quality of care using a core set of quality measures are increasing in health service research. Although administrative databases include limited clinical data, they offer an attractive source for quality measurement. The purpose of this study, therefore, was to evaluate the completeness of different administrative data sources compared to a clinical survey in evaluating rectal cancer cases. Between May 2012 and November 2014, a clinical survey was done on 498 Lombardy patients who had rectal cancer and underwent surgical resection. These collected data were compared with the information extracted from administrative sources including Hospital Discharge Dataset, drug database, daycare activity data, fee-exemption database, and regional screening program database. The agreement evaluation was performed using a set of 12 quality indicators. Patient complexity was a difficult indicator to measure for lack of clinical data. Preoperative staging was another suboptimal indicator due to the frequent missing administrative registration of tests performed. The agreement between the 2 data sources regarding chemoradiotherapy treatments was high. Screening detection, minimally invasive techniques, length of stay, and unpreventable readmissions were detected as reliable quality indicators. Postoperative morbidity could be a useful indicator but its agreement was lower, as expected. Healthcare administrative databases are large and real-time collected repositories of data useful in measuring quality in a healthcare system. Our investigation reveals that the reliability of indicators varies between them. Ideally, a combination of data from both sources could be used in order to improve usefulness of less reliable indicators.
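
    The abstract does not name the agreement statistic used for the twelve indicators; Cohen's kappa is one common choice for two binary data sources and is sketched below purely for illustration.

        # Illustrative only: Cohen's kappa for agreement between two binary sources
        # (e.g., clinical survey vs. administrative database) on one indicator.
        def cohens_kappa(a_pos_b_pos, a_pos_b_neg, a_neg_b_pos, a_neg_b_neg):
            n = a_pos_b_pos + a_pos_b_neg + a_neg_b_pos + a_neg_b_neg
            p_observed = (a_pos_b_pos + a_neg_b_neg) / n
            p_a = (a_pos_b_pos + a_pos_b_neg) / n           # source A positive rate
            p_b = (a_pos_b_pos + a_neg_b_pos) / n           # source B positive rate
            p_expected = p_a * p_b + (1 - p_a) * (1 - p_b)  # chance agreement
            return (p_observed - p_expected) / (1 - p_expected)

        print(round(cohens_kappa(120, 15, 10, 355), 3))     # toy counts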

  9. Loss estimation and damage forecast using database provided

    NASA Astrophysics Data System (ADS)

    Pyrchenko, V.; Byrova, V.; Petrasov, A.

    2009-04-01

    A wide spectrum of natural hazards is observed in Russian territory. This makes it necessary to investigate the numerous occurrences of dangerous natural processes and to study the mechanisms of their development and their interaction with each other (synergetic amplification or the emergence of new hazards), with the purpose of forecasting possible losses. Staff of the Laboratory of Geological Risk Analysis of IEG RAS have created a database of natural hazard occurrences in the territory of Russia, containing information on 1310 such cases recorded during 1991-2008. The wide range of sources used created certain difficulties in building the database and required the development of a new technique for unifying information collected at different times. One element of this technique is a classification of the negative consequences of natural hazards, covering deaths, injuries, other affected people and direct economic damage. The database has made it possible to track the dynamics of natural hazards and of the emergency situations (ES) they caused over the period considered, and to identify patterns of their development across Russian territory in time and space. This provides theoretical and methodological foundations for forecasting possible losses, with a stated degree of probability, for Russia as a whole and for its separate regions, which in turn supports adequate, timely and efficient pre-emptive decision-making.

  10. Schizophrenia and the Family.

    ERIC Educational Resources Information Center

    Cook, Barbara J.

    This document addresses the problems faced by families with a schizophrenic member, based on a survey of the literature from a variety of sources including: (1) the Alden Library of Ohio University; (2) the ERIC database; (3) the American Association of Counseling and Development (AACD); and (4) the National Alliance for the Mentally Ill (NAMI).…

  11. Development of an Expanded, High Reliability Cost and Performance Database for In Situ Remediation Technologies

    DTIC Science & Technology

    2016-03-01

    [Indexing snippet from the report: a table entry for site Tinker DRA-3 (chemical oxidation with potassium permanganate), and Section 2.2, Advantages and Limitations, on potential advantages and disadvantages of the dataset; the snippet also cites Thomson, N.R., E.D. Hood, and G.J. Farquhar, 2007, "Permanganate Treatment of an Emplaced DNAPL Source," Ground Water Monitoring ...]

  12. Teaching Undergraduate Software Engineering Using Open Source Development Tools

    DTIC Science & Technology

    2012-01-01

    ware. Some example appliances are: a LAMP stack, Redmine, MySQL database, Moodle, Tomcat on Apache, and Bugzilla. Some of the important features...Ada, C, C++, PHP, Python, etc., and also supports a wide range of SDKs such as Google's Android SDK and the Google Web Toolkit SDK. Additionally

  13. Data, Data Everywhere--Not a Report in Sight!

    ERIC Educational Resources Information Center

    Norman, Wendy

    2003-01-01

    Presents six steps of data warehouse development that result in valuable, long-term reporting solutions, discussing how to choose the right reporting vehicle. The six steps are: defining one's needs; mapping the source for each element; extracting the data; cleaning and verifying the data; moving the data into a relational database; and developing…
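
    Steps two through five listed above (mapping the source for each element, extracting, cleaning and verifying, and moving the data into a relational database) can be compressed into the minimal sketch below; the file name, field mapping and cleaning rule are hypothetical.

        # Illustrative only: a compressed extract-clean-load pass corresponding to the
        # middle steps listed above. File name, mapping and cleaning rule are hypothetical.
        import csv, sqlite3

        source_map = {"StudentID": "student_id", "GPA": "gpa"}   # step 2: map source fields

        con = sqlite3.connect("warehouse.db")                    # step 5: relational store
        con.execute("CREATE TABLE IF NOT EXISTS students (student_id TEXT, gpa REAL)")

        with open("export.csv", newline="") as fh:               # step 3: extract
            for row in csv.DictReader(fh):
                record = {dest: row[src] for src, dest in source_map.items()}
                gpa = float(record["gpa"])
                if 0.0 <= gpa <= 4.0:                            # step 4: clean and verify
                    con.execute("INSERT INTO students VALUES (?, ?)",
                                (record["student_id"], gpa))
        con.commit()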

  14. Cyberinfrastructure for the Unified Study of Earth Structure and Earthquake Sources in Complex Geologic Environments

    NASA Astrophysics Data System (ADS)

    Zhao, L.; Chen, P.; Jordan, T. H.; Olsen, K. B.; Maechling, P.; Faerman, M.

    2004-12-01

    The Southern California Earthquake Center (SCEC) is developing a Community Modeling Environment (CME) to facilitate the computational pathways of physics-based seismic hazard analysis (Maechling et al., this meeting). Major goals are to facilitate the forward modeling of seismic wavefields in complex geologic environments, including the strong ground motions that cause earthquake damage, and the inversion of observed waveform data for improved models of Earth structure and fault rupture. Here we report on a unified approach to these coupled inverse problems that is based on the ability to generate and manipulate wavefields in densely gridded 3D Earth models. A main element of this approach is a database of receiver Green tensors (RGT) for the seismic stations, which comprises all of the spatial-temporal displacement fields produced by the three orthogonal unit impulsive point forces acting at each of the station locations. Once the RGT database is established, synthetic seismograms for any earthquake can be simply calculated by extracting a small, source-centered volume of the RGT from the database and applying the reciprocity principle. The partial derivatives needed for point- and finite-source inversions can be generated in the same way. Moreover, the RGT database can be employed in full-wave tomographic inversions launched from a 3D starting model, because the sensitivity (Fréchet) kernels for travel-time and amplitude anomalies observed at seismic stations in the database can be computed by convolving the earthquake-induced displacement field with the station RGTs. We illustrate all elements of this unified analysis with an RGT database for 33 stations of the California Integrated Seismic Network in and around the Los Angeles Basin, which we computed for the 3D SCEC Community Velocity Model (SCEC CVM3.0) using a fourth-order staggered-grid finite-difference code. For a spatial grid spacing of 200 m and a time resolution of 10 ms, the calculations took ~19,000 node-hours on the Linux cluster at USC's High-Performance Computing Center. The 33-station database with a volume of ~23.5 TB was archived in the SCEC digital library at the San Diego Supercomputer Center using the Storage Resource Broker (SRB). From a laptop, anyone with access to this SRB collection can compute synthetic seismograms for an arbitrary source in the CVM in a matter of minutes. Efficient approaches have been implemented to use this RGT database in the inversions of waveforms for centroid and finite moment tensors and tomographic inversions to improve the CVM. Our experience with these large problems suggests areas where the cyberinfrastructure currently available for geoscience computation needs to be improved.
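
    In the notation commonly used for this receiver Green tensor approach (a sketch of the standard relations, with symbols assumed here rather than quoted from the abstract), reciprocity lets the stored tensor stand in for source-side Green functions, so synthetics for any source location follow from spatial derivatives of the RGT stored for a station:

        % Standard relations underlying the RGT approach (notation assumed, not quoted from the abstract)
        G_{ni}(\mathbf{x}_r, t; \mathbf{x}_s) = G_{in}(\mathbf{x}_s, t; \mathbf{x}_r)
        \qquad \text{(reciprocity)}
        u_n(\mathbf{x}_r, t) = M_{ij}\,
          \frac{\partial G_{ni}(\mathbf{x}_r, t; \mathbf{x}_s)}{\partial x_{s,j}} \ast s(t)
        \qquad \text{(point moment-tensor source)}

    Here M_{ij} is the moment tensor and s(t) the source time function; u_n is the synthetic seismogram at station x_r for a source at x_s.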

  15. [Effects of soil data and map scale on assessment of total phosphorus storage in upland soils].

    PubMed

    Li, Heng Rong; Zhang, Li Ming; Li, Xiao di; Yu, Dong Sheng; Shi, Xue Zheng; Xing, Shi He; Chen, Han Yue

    2016-06-01

    Accurate assessment of total phosphorus storage in farmland soils is of great significance to sustainable agriculture and non-point source pollution control. However, previous studies have not considered the estimation errors arising from mapping scales and from databases built on different sources of soil profile data. In this study, a total of 393×10^4 hm^2 of upland in the 29 counties (or cities) of North Jiangsu was taken as a case study. Analysis was performed of how the four sources of soil profile data, namely “Soils of County”, “Soils of Prefecture”, “Soils of Province” and “Soils of China”, and the six mapping scales, i.e. 1:50000, 1:250000, 1:500000, 1:1000000, 1:4000000 and 1:10000000, used in the 24 soil databases established from these four data sources, affected the assessment of soil total phosphorus. Compared with the most detailed 1:50000 soil database established with 983 upland soil profiles, the relative deviation of the estimates of soil total phosphorus density (STPD) and soil total phosphorus storage (STPS) from the other soil databases varied from 4.8% to 48.9% and from 1.6% to 48.4%, respectively. Most of the estimates based on the databases at each scale in “Soils of County” and “Soils of Prefecture” differed from the estimates based on the 1:50000 database of “Soils of County”, with significance levels of P<0.001 or P<0.05. Extremely significant differences (P<0.001) existed between the estimates based on the 1:50000 database of “Soils of County” and the estimates based on the databases at each scale in “Soils of Province” and “Soils of China”. This study demonstrated the significance of appropriate soil data sources and appropriate mapping scales in estimating STPS.
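
    For orientation, a commonly used form of the storage calculation (the study's exact formula and correction terms are not given in the abstract) sums, over map units and soil layers, the product of phosphorus content, bulk density, layer thickness and area:

        \mathrm{STPD}_i = \sum_{k} \rho_{ik}\, d_{ik}\, c_{ik}\, \bigl(1-\delta_{ik}\bigr)\, f
        \qquad
        \mathrm{STPS} = \sum_{i} A_i \, \mathrm{STPD}_i

    where, for map unit i and layer k, \rho is bulk density, d is layer thickness, c is total phosphorus content, \delta is the coarse-fragment fraction, f is a unit-conversion factor, and A_i is the area of map unit i.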

  16. WE-D-9A-06: Open Source Monitor Calibration and Quality Control Software for Enterprise Display Management

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Bevins, N; Vanderhoek, M; Lang, S

    2014-06-15

    Purpose: Medical display monitor calibration and quality control present challenges to medical physicists. The purpose of this work is to demonstrate and share experiences with an open source package that allows for both initial monitor setup and routine performance evaluation. Methods: A software package, pacsDisplay, has been developed over the last decade to aid in the calibration of all monitors within the radiology group in our health system. The software is used to calibrate monitors to follow the DICOM Grayscale Standard Display Function (GSDF) via lookup tables installed on the workstation. Additional functionality facilitates periodic evaluations of both primary and secondary medical monitors to ensure satisfactory performance. This software is installed on all radiology workstations, and can also be run as a stand-alone tool from a USB disk. Recently, a database has been developed to store and centralize the monitor performance data and to provide long-term trends for compliance with internal standards and various accrediting organizations. Results: Implementation and utilization of pacsDisplay has resulted in improved monitor performance across the health system. Monitor testing is now performed at regular intervals and the software is being used across multiple imaging modalities. Monitor performance characteristics such as maximum and minimum luminance, ambient luminance and illuminance, color tracking, and GSDF conformity are loaded into a centralized database for system performance comparisons. Compliance reports for organizations such as MQSA, ACR, and TJC are generated automatically and stored in the same database. Conclusion: An open source software solution has simplified and improved the standardization of displays within our health system. This work serves as an example method for calibrating and testing monitors within an enterprise health system.

  17. New extension software modules to enhance searching and display of transcriptome data in Tripal databases

    PubMed Central

    Chen, Ming; Henry, Nathan; Almsaeed, Abdullah; Zhou, Xiao; Wegrzyn, Jill; Ficklin, Stephen

    2017-01-01

    Abstract Tripal is an open source software package for developing biological databases with a focus on genetic and genomic data. It consists of a set of core modules that deliver essential functions for loading and displaying data records and associated attributes including organisms, sequence features and genetic markers. Beyond the core modules, community members are encouraged to contribute extension modules to build on the Tripal core and to customize Tripal for individual community needs. To expand the utility of the Tripal software system, particularly for RNASeq data, we developed two new extension modules. Tripal Elasticsearch enables fast, scalable searching of the entire content of a Tripal site as well as the construction of customized advanced searches of specific data types. We demonstrate the use of this module for searching assembled transcripts by functional annotation. A second module, Tripal Analysis Expression, houses and displays records from gene expression assays such as RNA sequencing. This includes biological source materials (biomaterials), gene expression values and protocols used to generate the data. In the case of an RNASeq experiment, this would reflect the individual organisms and tissues used to produce sequencing libraries, the normalized gene expression values derived from the RNASeq data analysis and a description of the software or code used to generate the expression values. The module will load data from common flat file formats including standard NCBI Biosample XML. Data loading, display options and other configurations can be controlled by authorized users in the Drupal administrative backend. Both modules are open source, include usage documentation, and can be found in the Tripal organization’s GitHub repository. Database URL: Tripal Elasticsearch module: https://github.com/tripal/tripal_elasticsearch Tripal Analysis Expression module: https://github.com/tripal/tripal_analysis_expression PMID:29220446
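
    A search of the kind demonstrated with Tripal Elasticsearch (finding assembled transcripts by functional annotation) ultimately issues a standard Elasticsearch query; the endpoint, index and field names below are hypothetical placeholders, not the module's actual configuration.

        # Illustrative only: a standard Elasticsearch full-text query of the kind a
        # Tripal Elasticsearch-backed search issues. Endpoint, index and field names
        # are hypothetical placeholders.
        import json
        import urllib.request

        query = {"query": {"match": {"annotation": "heat shock protein"}}, "size": 10}
        req = urllib.request.Request(
            "http://localhost:9200/transcripts/_search",
            data=json.dumps(query).encode("utf-8"),
            headers={"Content-Type": "application/json"},
        )
        with urllib.request.urlopen(req) as resp:
            hits = json.load(resp)["hits"]["hits"]
        for hit in hits:
            print(hit["_source"].get("feature_name"), hit["_score"])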

  18. A genotypic and phenotypic information source for marker-assisted selection of cereals: the CEREALAB database

    PubMed Central

    Milc, Justyna; Sala, Antonio; Bergamaschi, Sonia; Pecchioni, Nicola

    2011-01-01

    The CEREALAB database aims to store genotypic and phenotypic data obtained by the CEREALAB project and to integrate them with already existing data sources in order to create a tool for plant breeders and geneticists. The database can help them in unravelling the genetics of economically important phenotypic traits; in identifying and choosing molecular markers associated to key traits; and in choosing the desired parentals for breeding programs. The database is divided into three sub-schemas corresponding to the species of interest: wheat, barley and rice; each sub-schema is then divided into two sub-ontologies, regarding genotypic and phenotypic data, respectively. Database URL: http://www.cerealab.unimore.it/jws/cerealab.jnlp PMID:21247929

  19. DEPOT: A Database of Environmental Parameters, Organizations and Tools

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    CARSON,SUSAN D.; HUNTER,REGINA LEE; MALCZYNSKI,LEONARD A.

    2000-12-19

    The Database of Environmental Parameters, Organizations, and Tools (DEPOT) has been developed by the Department of Energy (DOE) as a central warehouse for access to data essential for environmental risk assessment analyses. Initial efforts have concentrated on groundwater and vadose zone transport data and bioaccumulation factors. DEPOT seeks to provide a source of referenced data that, wherever possible, includes the level of uncertainty associated with these parameters. Based on the amount of data available for a particular parameter, uncertainty is expressed as a standard deviation or a distribution function. DEPOT also provides DOE site-specific performance assessment data, pathway-specific transport data, and links to environmental regulations, disposal site waste acceptance criteria, other environmental parameter databases, and environmental risk assessment models.

  20. Analysis of commercial and public bioactivity databases.

    PubMed

    Tiikkainen, Pekka; Franke, Lutz

    2012-02-27

    Activity data for small molecules are invaluable in chemoinformatics. Various bioactivity databases exist containing detailed information on target proteins and quantitative binding data for small molecules extracted from journals and patents. In the current work, we have merged several public and commercial bioactivity databases into one bioactivity metabase. The molecular representation, target information, and activity data of the vendor databases were standardized. The main motivation of the work was to create a single relational database which allows fast and simple data retrieval by in-house scientists. Second, we wanted to know the amount of overlap between databases by commercial and public vendors to see whether the former contain data complementing the latter. Third, we quantified the degree of inconsistency between data sources by comparing data points derived from the same scientific article cited by more than one vendor. We found that each data source contains unique data which is due to different scientific articles cited by the vendors. When comparing data derived from the same article we found that inconsistencies between the vendors are common. In conclusion, using databases of different vendors is still useful since the data overlap is not complete. It should be noted that this can be partially explained by the inconsistencies and errors in the source data.
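
    The overlap and inconsistency checks described above can be sketched along the following lines; the column names, values and the 0.3 log-unit tolerance are hypothetical, not the actual metabase schema or the study's criterion.

        # Illustrative only: flag inconsistent activity values reported by two vendors
        # for the same article/target/compound. Column names, values and the 0.3
        # log-unit tolerance are hypothetical.
        import pandas as pd

        vendor_a = pd.DataFrame({"doi": ["10.1/x"], "target": ["ABL1"],
                                 "compound": ["imatinib"], "pIC50": [7.8]})
        vendor_b = pd.DataFrame({"doi": ["10.1/x"], "target": ["ABL1"],
                                 "compound": ["imatinib"], "pIC50": [8.4]})

        merged = vendor_a.merge(vendor_b, on=["doi", "target", "compound"],
                                suffixes=("_a", "_b"))
        merged["inconsistent"] = (merged["pIC50_a"] - merged["pIC50_b"]).abs() > 0.3
        print(merged)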

  1. EPA’s SPECIATE 4.4 Database: Bridging Data Sources and Data Users

    EPA Science Inventory

    SPECIATE is the U.S. Environmental Protection Agency's (EPA)repository of volatile organic gas and particulate matter (PM) speciation profiles for air pollution sources. EPA released SPECIATE 4.4 in early 2014 and, in total, the SPECIATE 4.4 database includes 5,728 PM, VOC, total...

  2. Inside a VAMDC data node—putting standards into practical software

    NASA Astrophysics Data System (ADS)

    Regandell, Samuel; Marquart, Thomas; Piskunov, Nikolai

    2018-03-01

    Access to molecular and atomic data is critical for many forms of remote sensing analysis across different fields. Many atomic and molecular databases are, however, highly specialised for their intended application, complicating querying and combining data between sources. The Virtual Atomic and Molecular Data Centre, VAMDC, is an electronic infrastructure that allows each database to register as a ‘node’. Through services such as VAMDC’s portal website, users can then access and query all nodes in a homogenised way. Today all major atomic and molecular databases are attached to VAMDC. This article describes the software tools we developed to help data providers create and manage a VAMDC node. It gives an overview of the VAMDC infrastructure and of the various standards it uses. The article then discusses the development choices made and how the standards are implemented in practice. It concludes with a full example of implementing a VAMDC node using a real-life case as well as future plans for the node software.

  3. Information management systems for pharmacogenomics.

    PubMed

    Thallinger, Gerhard G; Trajanoski, Slave; Stocker, Gernot; Trajanoski, Zlatko

    2002-09-01

    The value of high-throughput genomic research is dramatically enhanced by association with key patient data. These data are generally available but of disparate quality and not typically directly associated. A system that could bring these disparate data sources into a common resource connected with functional genomic data would be tremendously advantageous. However, the integration of clinical data and the accurate interpretation of the generated functional genomic data require the development of information management systems capable of effectively capturing the data, as well as tools to make that data accessible to the laboratory scientist or to the clinician. In this review these challenges and current information technology solutions associated with the management, storage and analysis of high-throughput data are highlighted. It is suggested that the development of a pharmacogenomic data management system which integrates public and proprietary databases, clinical datasets, and data mining tools embedded in a high-performance computing environment should include the following components: parallel processing systems, storage technologies, network technologies, databases and database management systems (DBMS), and application services.

  4. Investigating the Potential Impacts of Energy Production in the Marcellus Shale Region Using the Shale Network Database

    NASA Astrophysics Data System (ADS)

    Brantley, S.; Brazil, L.

    2017-12-01

    The Shale Network's extensive database of water quality observations enables educational experiences about the potential impacts of resource extraction with real data. Through tools that are open source and free to use, researchers, educators, and citizens can access and analyze the very same data that the Shale Network team has used in peer-reviewed publications about the potential impacts of hydraulic fracturing on water. The development of the Shale Network database has been made possible through efforts led by an academic team and involving numerous individuals from government agencies, citizen science organizations, and private industry. Thus far, these tools and data have been used to engage high school students, university undergraduate and graduate students, as well as citizens so that all can discover how energy production impacts the Marcellus Shale region, which includes Pennsylvania and other nearby states. This presentation will describe these data tools, how the Shale Network has used them in developing lesson plans, and the resources available to learn more.

  5. Lessons Learned from Deploying an Analytical Task Management Database

    NASA Technical Reports Server (NTRS)

    O'Neil, Daniel A.; Welch, Clara; Arceneaux, Joshua; Bulgatz, Dennis; Hunt, Mitch; Young, Stephen

    2007-01-01

    Defining requirements, missions, technologies, and concepts for space exploration involves multiple levels of organizations, teams of people with complementary skills, and analytical models and simulations. Analytical activities range from filling a To-Be-Determined (TBD) in a requirement to creating animations and simulations of exploration missions. In a program as large as returning to the Moon, there are hundreds of simultaneous analysis activities. A way to manage and integrate efforts of this magnitude is to deploy a centralized database that provides the capability to define tasks, identify resources, describe products, schedule deliveries, and generate a variety of reports. This paper describes a web-accessible task management system and explains the lessons learned during the development and deployment of the database. Through the database, managers and team leaders can define tasks, establish review schedules, assign teams, link tasks to specific requirements, identify products, and link the task data records to external repositories that contain the products. Data filters and spreadsheet export utilities provide a powerful capability to create custom reports. Import utilities provide a means to populate the database from previously filled form files. Within a four month period, a small team analyzed requirements, developed a prototype, conducted multiple system demonstrations, and deployed a working system supporting hundreds of users across the aerospace community. Open-source technologies and agile software development techniques, applied by a skilled team, enabled this impressive achievement. Topics in the paper cover the web application technologies, agile software development, an overview of the system's functions and features, dealing with increasing scope, and deploying new versions of the system.
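
    The relational structure implied by that description can be sketched in a couple of tables. The schema below is hypothetical, not the schema of the deployed system; it only illustrates tasks linked to requirements, teams, delivery dates and products held in external repositories, plus a simple filtered report:

        import sqlite3

        conn = sqlite3.connect(":memory:")
        conn.executescript("""
        CREATE TABLE task (
            task_id      INTEGER PRIMARY KEY,
            title        TEXT NOT NULL,
            requirement  TEXT,              -- requirement the analysis supports
            team         TEXT,              -- assigned team
            due_date     TEXT,              -- scheduled delivery date (ISO format)
            status       TEXT DEFAULT 'open'
        );
        CREATE TABLE product (
            product_id     INTEGER PRIMARY KEY,
            task_id        INTEGER REFERENCES task(task_id),
            description    TEXT,
            repository_url TEXT             -- external repository holding the product
        );
        """)

        conn.execute(
            "INSERT INTO task (title, requirement, team, due_date) VALUES (?, ?, ?, ?)",
            ("Ascent trajectory animation", "TBD-042", "Simulation", "2007-03-15"))

        # A simple custom report: count of open tasks per team.
        for team, n in conn.execute(
                "SELECT team, COUNT(*) FROM task WHERE status = 'open' GROUP BY team"):
            print(team, n)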

  6. InverPep: A database of invertebrate antimicrobial peptides.

    PubMed

    Gómez, Esteban A; Giraldo, Paula; Orduz, Sergio

    2017-03-01

    The aim of this work was to construct InverPep, a database specialised in experimentally validated antimicrobial peptides (AMPs) from invertebrates. AMP data contained in InverPep were manually curated from other databases and the scientific literature. MySQL was integrated with the development platform Laravel; this framework allows PHP code to be combined with HTML and was used to design the InverPep web interface. InverPep contains 18 separate fields, including InverPep code, phylum and species source, peptide name, sequence, peptide length, secondary structure, molar mass, charge, isoelectric point, hydrophobicity, Boman index, aliphatic index and percentage of hydrophobic amino acids. CALCAMPI, an algorithm to calculate the physicochemical properties of multiple peptides simultaneously, was programmed in Perl. To date, InverPep contains 702 experimentally validated AMPs from invertebrate species. All of the peptides contain information associated with their source, physicochemical properties, secondary structure, biological activity and links to external literature. Most AMPs in InverPep have a length between 10 and 50 amino acids, a positive charge, a Boman index between 0 and 2 kcal/mol, and 30-50% hydrophobic amino acids. InverPep includes 33 AMPs not reported in other databases. In addition, CALCAMPI and a statistical analysis of the InverPep data are presented. The InverPep database is available in English and Spanish. InverPep is a useful database to study invertebrate AMPs and its information could be used for the design of new peptides. The user-friendly interface of InverPep and its information can be freely accessed via a web-based browser at http://ciencias.medellin.unal.edu.co/gruposdeinvestigacion/prospeccionydisenobiomoleculas/InverPep/public/home_en. Copyright © 2016 International Society for Chemotherapy of Infection and Cancer. Published by Elsevier Ltd. All rights reserved.
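
    A toy calculator in the spirit of CALCAMPI, written here in Python rather than the published Perl and limited to three easily stated properties (length, a crude net charge at neutral pH, and hydrophobic fraction); the residue classifications are the usual textbook ones and this is not the InverPep algorithm:

        HYDROPHOBIC = set("AVILMFWYC")   # common hydrophobic residue set (approximate)
        POSITIVE, NEGATIVE = set("KRH"), set("DE")

        def peptide_properties(seq):
            """Length, crude net charge and hydrophobic percentage of a peptide sequence."""
            seq = seq.upper()
            charge = sum(aa in POSITIVE for aa in seq) - sum(aa in NEGATIVE for aa in seq)
            hydrophobic_pct = 100.0 * sum(aa in HYDROPHOBIC for aa in seq) / len(seq)
            return {"length": len(seq),
                    "net_charge": charge,
                    "hydrophobic_pct": round(hydrophobic_pct, 1)}

        # Example: magainin 2, a well known antimicrobial peptide.
        print(peptide_properties("GIGKFLHSAKKFGKAFVGEIMNS"))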

  7. Global Inventory of Gas Geochemistry Data from Fossil Fuel, Microbial and Burning Sources, version 2017

    NASA Astrophysics Data System (ADS)

    Sherwood, Owen A.; Schwietzke, Stefan; Arling, Victoria A.; Etiope, Giuseppe

    2017-08-01

    The concentration of atmospheric methane (CH4) has more than doubled over the industrial era. To help constrain global and regional CH4 budgets, inverse (top-down) models incorporate data on the concentration and stable carbon (δ13C) and hydrogen (δ2H) isotopic ratios of atmospheric CH4. These models depend on accurate δ13C and δ2H end-member source signatures for each of the main emissions categories. Compared with meticulous measurement and calibration of isotopic CH4 in the atmosphere, there has been relatively less effort to characterize globally representative isotopic source signatures, particularly for fossil fuel sources. Most global CH4 budget models have so far relied on outdated source signature values derived from globally nonrepresentative data. To correct this deficiency, we present a comprehensive, globally representative end-member database of the δ13C and δ2H of CH4 from fossil fuel (conventional natural gas, shale gas, and coal), modern microbial (wetlands, rice paddies, ruminants, termites, and landfills and/or waste) and biomass burning sources. Gas molecular compositional data for fossil fuel categories are also included with the database. The database comprises 10 706 samples (8734 fossil fuel, 1972 non-fossil) from 190 published references. Mean (unweighted) δ13C signatures for fossil fuel CH4 are significantly lighter than values commonly used in CH4 budget models, thus highlighting potential underestimation of fossil fuel CH4 emissions in previous CH4 budget models. This living database will be updated every 2-3 years to provide the atmospheric modeling community with the most complete CH4 source signature data possible. Database digital object identifier (DOI): https://doi.org/10.15138/G3201T.
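
    To make the "mean (unweighted) source signature" concrete, the fragment below averages per-category δ13C values over a handful of made-up sample records; the numbers are placeholders, not values from the database:

        from collections import defaultdict

        # (source category, delta 13C in permil) -- placeholder values, not database records.
        samples = [
            ("conventional_gas", -42.0), ("conventional_gas", -45.5),
            ("coal", -55.0), ("coal", -60.2),
            ("wetlands", -61.5), ("wetlands", -64.0),
        ]

        by_category = defaultdict(list)
        for category, d13c in samples:
            by_category[category].append(d13c)

        for category, values in sorted(by_category.items()):
            mean = sum(values) / len(values)   # unweighted mean, as reported in the paper
            print(f"{category}: mean d13C = {mean:.1f} permil (n = {len(values)})")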

  8. Ensuring Safety of Navigation: A Three-Tiered Approach

    NASA Astrophysics Data System (ADS)

    Johnson, S. D.; Thompson, M.; Brazier, D.

    2014-12-01

    The primary responsibility of the Hydrographic Department at the Naval Oceanographic Office (NAVOCEANO) is to support US Navy surface and sub-surface Safety of Navigation (SoN) requirements. These requirements are interpreted, surveys are conducted, and accurate products are compiled and archived for future exploitation. For a number of years NAVOCEANO has employed a two-tiered database structure to support SoN. The first tier (Data Warehouse, or DWH) provides access to the full-resolution sonar and lidar data. DWH preserves the original data such that any scale product can be built. The second tier (Digital Bathymetric Database - Variable resolution, or DBDB-V) served as the final archive for SoN chart-scale, gridded products compiled from source bathymetry. DBDB-V has been incorporated into numerous DoD tactical decision aids and serves as the foundation bathymetry for ocean modeling. With the evolution of higher density survey systems and the addition of high-resolution gridded bathymetry product requirements, a two-tiered model did not provide an efficient solution for SoN. The two-tiered approach required scientists to exploit full-resolution data in order to build any higher resolution product. A new perspective on the archival and exploitation of source data was required. This new perspective has taken the form of a third tier, the Navigation Surface Database (NSDB). NSDB is an SQLite relational database populated with International Hydrographic Organization (IHO) S-102 compliant Bathymetric Attributed Grids (BAGs). BAGs archived within NSDB are developed at the highest resolution that the collection sensor system can support and contain nodal estimates for depth, uncertainty, separation values and metadata. Gridded surface analysis efforts culminate in the generation of the source resolution BAG files and their storage within NSDB. Exploitation of these resources eliminates the time and effort needed to re-grid and re-analyze native source file formats.
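
    A hypothetical sketch of a third-tier registry along these lines: BAG files live on disk while an SQLite table records their resolution and coverage, so the finest available grid for an area can be selected without touching the full-resolution source data. None of this is NAVOCEANO's actual schema; table and column names are invented:

        import sqlite3

        conn = sqlite3.connect(":memory:")
        conn.execute("""
        CREATE TABLE bag_grid (
            bag_id       INTEGER PRIMARY KEY,
            file_path    TEXT NOT NULL,    -- path to the S-102 style BAG file
            survey       TEXT,
            resolution_m REAL,             -- node spacing of the grid
            min_lat REAL, max_lat REAL, min_lon REAL, max_lon REAL
        )""")

        grids = [
            ("surveyA_1m.bag", "A", 1.0, 60.10, 60.20, 5.10, 5.30),
            ("surveyB_4m.bag", "B", 4.0, 60.00, 60.50, 5.00, 5.50),
        ]
        conn.executemany(
            "INSERT INTO bag_grid (file_path, survey, resolution_m, min_lat, max_lat, min_lon, max_lon) "
            "VALUES (?, ?, ?, ?, ?, ?, ?)", grids)

        # Finest-resolution grid covering a point of interest.
        lat, lon = 60.15, 5.20
        row = conn.execute(
            "SELECT file_path, resolution_m FROM bag_grid "
            "WHERE ? BETWEEN min_lat AND max_lat AND ? BETWEEN min_lon AND max_lon "
            "ORDER BY resolution_m ASC LIMIT 1", (lat, lon)).fetchone()
        print("Use", row)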

  9. Semi-automatic Data Integration using Karma

    NASA Astrophysics Data System (ADS)

    Garijo, D.; Kejriwal, M.; Pierce, S. A.; Houser, P. I. Q.; Peckham, S. D.; Stanko, Z.; Hardesty Lewis, D.; Gil, Y.; Pennington, D. D.; Knoblock, C.

    2017-12-01

    Data integration applications are ubiquitous in scientific disciplines. A state-of-the-art data integration system accepts both a set of data sources and a target ontology as input, and semi-automatically maps the data sources in terms of concepts and relationships in the target ontology. Mappings can be both complex and highly domain-specific. Once such a semantic model, expressing the mapping using community-wide standards, is acquired, the source data can be stored in a single repository or database using the semantics of the target ontology. However, acquiring the mapping is a labor-intensive process, and state-of-the-art artificial intelligence systems are unable to fully automate it using heuristics and algorithms alone. Instead, a more realistic goal is to develop adaptive tools that minimize user feedback (e.g., by offering good mapping recommendations), while at the same time making it intuitive and easy for the user both to correct errors and to define complex mappings. We present Karma, a data integration system that has been developed over multiple years in the information integration group at the Information Sciences Institute, a research institute at the University of Southern California's Viterbi School of Engineering. Karma is a state-of-the-art data integration tool that supports an interactive graphical user interface, and has been applied in multiple domains over the last five years, including geospatial, biological, humanities and bibliographic applications. Karma allows a user to import their own ontology and datasets using widely used formats such as RDF, XML, CSV and JSON, can be set up either locally or on a server, supports a native backend database for prototyping queries, and can even be seamlessly integrated into external computational pipelines, including those ingesting data via streaming data sources, Web APIs and SQL databases. We illustrate a Karma workflow at a conceptual level, along with a live demo, and show use cases of Karma specifically for the geosciences. In particular, we show how Karma can be used intuitively to obtain the mapping model between case study data sources and a publicly available and expressive target ontology that has been designed to capture a broad set of concepts in geoscience with standardized, easily searchable names.
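
    Not Karma's actual API, but a stripped-down illustration of what an acquired semantic model buys you: a mapping from source columns to target-ontology terms that turns tabular rows into triples. The ontology URIs, column names and data are all invented for the example:

        import csv, io

        # Hypothetical semantic model: source column -> property in a target ontology.
        MAPPING = {
            "site_name": "http://example.org/geo#hasName",
            "depth_m":   "http://example.org/geo#hasDepthInMeters",
        }
        SUBJECT_COLUMN, CLASS_URI = "site_id", "http://example.org/geo#ObservationSite"

        source = io.StringIO("site_id,site_name,depth_m\nW-01,Well 1,35.2\n")

        triples = []
        for row in csv.DictReader(source):
            subject = f"http://example.org/data/{row[SUBJECT_COLUMN]}"
            triples.append((subject, "rdf:type", CLASS_URI))
            for column, prop in MAPPING.items():
                triples.append((subject, prop, row[column]))

        for t in triples:
            print(t)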

  10. Ventilator-Related Adverse Events: A Taxonomy and Findings From 3 Incident Reporting Systems.

    PubMed

    Pham, Julius Cuong; Williams, Tamara L; Sparnon, Erin M; Cillie, Tam K; Scharen, Hilda F; Marella, William M

    2016-05-01

    In 2009, researchers from Johns Hopkins University's Armstrong Institute for Patient Safety and Quality; public agencies, including the FDA; and private partners, including the Emergency Care Research Institute and the University HealthSystem Consortium (UHC) Safety Intelligence Patient Safety Organization, sought to form a public-private partnership for the promotion of patient safety (P5S) to advance patient safety through voluntary partnerships. The study objective was to test the concept of the P5S to advance our understanding of safety issues related to ventilator events, to develop a common classification system for categorizing adverse events related to mechanical ventilators, and to perform a comparison of adverse events across different adverse event reporting systems. We performed a cross-sectional analysis of ventilator-related adverse events reported in 2012 from the following incident reporting systems: the Pennsylvania Patient Safety Authority's Patient Safety Reporting System, UHC's Safety Intelligence Patient Safety Organization database, and the FDA's Manufacturer and User Facility Device Experience database. Once each organization had its dataset of ventilator-related adverse events, reviewers read the narrative descriptions of each event and classified it according to the developed common taxonomy. A Pennsylvania Patient Safety Authority, FDA, and UHC search provided 252, 274, and 700 relevant reports, respectively. The 3 event types most commonly reported to the UHC and the Pennsylvania Patient Safety Authority's Patient Safety Reporting System databases were airway/breathing circuit issue, human factor issues, and ventilator malfunction events. The top 3 event types reported to the FDA were ventilator malfunction, power source issue, and alarm failure. Overall, we found that (1) through the development of a common taxonomy, adverse events from 3 reporting systems can be evaluated, (2) the types of events reported in each database were related to the purpose of the database and the source of the reports, resulting in significant differences in reported event categories across the 3 systems, and (3) a public-private collaboration for investigating ventilator-related adverse events under the P5S model is feasible. Copyright © 2016 by Daedalus Enterprises.

  11. Implementation of the CDC translational informatics platform--from genetic variants to the national Swedish Rheumatology Quality Register.

    PubMed

    Abugessaisa, Imad; Gomez-Cabrero, David; Snir, Omri; Lindblad, Staffan; Klareskog, Lars; Malmström, Vivianne; Tegnér, Jesper

    2013-04-02

    Sequencing of the human genome and the subsequent analyses have produced immense volumes of data. The technological advances have opened new windows into genomics beyond the DNA sequence. In parallel, clinical practice generates large amounts of data. This represents an underused data source that has much greater potential in translational research than is currently realized. This research aims at implementing a translational medicine informatics platform to integrate clinical data (disease diagnosis, disease activity and treatment) of Rheumatoid Arthritis (RA) patients from Karolinska University Hospital and their research database (biobanks, genotype variants and serology) at the Center for Molecular Medicine, Karolinska Institutet. Requirements engineering methods were utilized to identify user requirements. Unified Modeling Language and data modeling methods were used to model the universe of discourse and data sources. Oracle 11g was used as the database management system, and the clinical development center (CDC) was used as the application interface. Patient data were anonymized, and we employed authorization and security methods to protect the system. We developed a user requirement matrix, which provided a framework for evaluating three translational informatics systems. The implementation of the CDC successfully integrated the biological research database (15,172 DNA, serum and synovial samples, 1,436 cell samples and 65 SNPs per patient) and the clinical database (5,652 clinical visits) for a cohort of 379 patients presenting three profiles. Basic functionalities provided by the translational medicine platform are research data management, development of bioinformatics workflows and analyses, sub-cohort selection, and re-use of clinical data in research settings. Finally, the system allowed researchers to extract subsets of attributes from cohorts according to specific biological, clinical, or statistical features. Research and clinical database integration is a real challenge and a road-block in translational research. Through this research we addressed the challenges and demonstrated the usefulness of the CDC. We adhered to ethical regulations pertaining to patient data, and we determined that the existing software solutions cannot meet the translational research needs at hand. We used RA as a test case since we have ample data on an active and longitudinal cohort.

  12. Implementation of the CDC translational informatics platform - from genetic variants to the national Swedish Rheumatology Quality Register

    PubMed Central

    2013-01-01

    Background Sequencing of the human genome and the subsequent analyses have produced immense volumes of data. The technological advances have opened new windows into genomics beyond the DNA sequence. In parallel, clinical practice generates large amounts of data. This represents an underused data source that has much greater potential in translational research than is currently realized. This research aims at implementing a translational medicine informatics platform to integrate clinical data (disease diagnosis, disease activity and treatment) of Rheumatoid Arthritis (RA) patients from Karolinska University Hospital and their research database (biobanks, genotype variants and serology) at the Center for Molecular Medicine, Karolinska Institutet. Methods Requirements engineering methods were utilized to identify user requirements. Unified Modeling Language and data modeling methods were used to model the universe of discourse and data sources. Oracle 11g was used as the database management system, and the clinical development center (CDC) was used as the application interface. Patient data were anonymized, and we employed authorization and security methods to protect the system. Results We developed a user requirement matrix, which provided a framework for evaluating three translational informatics systems. The implementation of the CDC successfully integrated the biological research database (15,172 DNA, serum and synovial samples, 1,436 cell samples and 65 SNPs per patient) and the clinical database (5,652 clinical visits) for a cohort of 379 patients presenting three profiles. Basic functionalities provided by the translational medicine platform are research data management, development of bioinformatics workflows and analyses, sub-cohort selection, and re-use of clinical data in research settings. Finally, the system allowed researchers to extract subsets of attributes from cohorts according to specific biological, clinical, or statistical features. Conclusions Research and clinical database integration is a real challenge and a road-block in translational research. Through this research we addressed the challenges and demonstrated the usefulness of the CDC. We adhered to ethical regulations pertaining to patient data, and we determined that the existing software solutions cannot meet the translational research needs at hand. We used RA as a test case since we have ample data on an active and longitudinal cohort. PMID:23548156

  13. Collaborative data model and data base development for paleoenvironmental and archaeological domain using Semantic MediaWiki

    NASA Astrophysics Data System (ADS)

    Willmes, C.

    2017-12-01

    In the frame of the Collaborative Research Centre 806 (CRC 806), an interdisciplinary research project that needs to manage data, information and knowledge from heterogeneous domains such as archaeology, the cultural sciences and the geosciences, a collaborative internal knowledge base system was developed. The system is based on the open source MediaWiki software, best known as the software behind Wikipedia, which provides a web-based collaborative knowledge and information management platform. This software is additionally enhanced with the Semantic MediaWiki (SMW) extension, which allows structured data to be stored and managed within the Wiki platform and provides complex query and API interfaces to the structured data stored in the SMW database. Using an additional open source tool called mobo, it is possible to improve the data model development process and to automate data imports, from small spreadsheets to large relational databases. Mobo is a command-line tool that helps build and deploy SMW structures in an agile, schema-driven way, and allows the data model formalizations, written in JSON-Schema format, to be managed and developed collaboratively using version control systems like git. The combination of a well equipped collaborative web platform facilitated by MediaWiki, the possibility to store and query structured data in this collaborative database provided by SMW, and the automated data import and data model development enabled by mobo results in a powerful but flexible system for building and developing a collaborative knowledge base. Furthermore, SMW allows the application of Semantic Web technology: the structured data can be exported into RDF, so it is possible to set up a triple store with a SPARQL endpoint on top of the database. The JSON-Schema based data models can be extended to JSON-LD to benefit from the possibilities of Linked Data technology.
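
    To give a flavour of the schema-driven approach, here is a minimal JSON-Schema-like model written as a Python dict together with a hand-rolled required-field check; the field names are invented and this is not the actual input format of the mobo tool or the CRC 806 data model:

        # Hypothetical model for an archaeological site record, in JSON-Schema style.
        SITE_MODEL = {
            "title": "Site",
            "type": "object",
            "required": ["site_name", "latitude", "longitude"],
            "properties": {
                "site_name": {"type": "string"},
                "latitude":  {"type": "number"},
                "longitude": {"type": "number"},
                "period":    {"type": "string"},
            },
        }

        def missing_required(record, model):
            """Names of required fields absent from a record (a minimal consistency check)."""
            return [field for field in model["required"] if field not in record]

        record = {"site_name": "Example Cave", "latitude": 50.9}
        print("Missing fields:", missing_required(record, SITE_MODEL))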

  14. PathCase-SB architecture and database design

    PubMed Central

    2011-01-01

    Background Integration of metabolic pathways resources and regulatory metabolic network models, and deploying new tools on the integrated platform can help perform more effective and more efficient systems biology research on understanding the regulation in metabolic networks. Therefore, the tasks of (a) integrating under a single database environment regulatory metabolic networks and existing models, and (b) building tools to help with modeling and analysis are desirable and intellectually challenging computational tasks. Description PathCase Systems Biology (PathCase-SB) is built and released. The PathCase-SB database provides data and API for multiple user interfaces and software tools. The current PathCase-SB system provides a database-enabled framework and web-based computational tools towards facilitating the development of kinetic models for biological systems. PathCase-SB aims to integrate data of selected biological data sources on the web (currently, BioModels database and KEGG), and to provide more powerful and/or new capabilities via the new web-based integrative framework. This paper describes architecture and database design issues encountered in PathCase-SB's design and implementation, and presents the current design of PathCase-SB's architecture and database. Conclusions PathCase-SB architecture and database provide a highly extensible and scalable environment with easy and fast (real-time) access to the data in the database. PathCase-SB itself is already being used by researchers across the world. PMID:22070889

  15. Mental health and psychiatry research in Brazil: scientific production from 1999 to 2003.

    PubMed

    Razzouk, Denise; Zorzetto, Ricardo; Dubugras, Maria Thereza; Gerolin, Jerônimo; Mari, Jair de Jesus

    2006-08-01

    To assess the extent of mental health scientific production in Brazil from 1999 to 2003, and to identify the nature of the publications generated, their sources of finance and the ways of publicly disseminating the research findings. Searches for publications were conducted in the Medline and PsychInfo databases for the period 1999-2003. A semi-structured questionnaire developed by an international team was applied to 626 mental health researchers, covering each interviewee's educational background, research experience, access to funding sources, public impact and research priorities. The sample was composed of 626 mental health researchers identified from 792 publications indexed in the Medline and PsychInfo databases for the period above, and from a list of reviewers of Revista Brasileira de Psiquiatria. In Brazil, 792 publications were produced by 525 authors between 1999 and 2003 (441 indexed in Medline and 398 in the ISI database). The main topics were: depression (29.1%), substance misuse (14.6%), psychoses (10%), childhood disorders (7%) and dementia (6.7%). Among the 626 Brazilian mental health researchers, 329 answered the questionnaire. There were steadily increasing numbers of Brazilian articles on mental health published in foreign journals from 1999 to 2003: the number of articles in Medline tripled and the number in the ISI database doubled. The content of these articles corresponded to the priorities within mental health, but there is a need for better interlinking between researchers and mental health policymakers.

  16. A statistical analysis of the global historical volcanic fatalities record

    USGS Publications Warehouse

    Auker, Melanie Rose; Sparks, Robert Stephen John; Siebert, Lee; Crosweller, H. S.; Ewert, John W.

    2013-01-01

    A new database of volcanic fatalities is presented and analysed, covering the period 1600 to 2010 AD. Data are from four sources: the Smithsonian Institution, Witham (2005), CRED EM-DAT and Munich RE. The data were combined and formatted, with a weighted average fatality figure used where more than one source reports an event; the former two databases were weighted twice as strongly as the latter two. More fatal incidents are contained within our database than in similar previous works; approximately 46% of the fatal incidents are listed in only one of the four sources, and fewer than 10% are in all four. 278,880 fatalities are recorded in the database, resulting from 533 fatal incidents. The fatality count is dominated by a handful of disasters, though the majority of fatal incidents have caused fewer than ten fatalities. Number and empirical probability of fatalities are broadly correlated with VEI, but are more strongly influenced by population density around volcanoes and the occurrence and extent of lahars (mudflows) and pyroclastic density currents, which have caused 50% of fatalities. Indonesia, the Philippines, and the West Indies dominate the spatial distribution of fatalities, and there is some negative correlation between regional development and number of fatalities. With the largest disasters removed, over 90% of fatalities occurred between 5 km and 30 km from volcanoes, though the most devastating eruptions impacted far beyond these distances. A new measure, the Volcano Fatality Index, is defined to explore temporal changes in societal vulnerability to volcanic hazards. The measure incorporates population growth and recording improvements with the fatality data, and shows prima facie evidence that vulnerability to volcanic hazards has fallen during the last two centuries. Results and interpretations are limited in scope by the underlying fatalities data, which are affected by under-recording, uncertainty, and bias. Attempts have been made to estimate the extent of these issues, and to remove their effects where possible. The data analysed here are provided as supplementary material. An updated version of the Smithsonian fatality database, fully integrated with this database, will be publicly available in the near future and will subsequently incorporate new data.
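
    A worked example of the weighting rule stated above, with the Smithsonian Institution and Witham (2005) counted twice as strongly as CRED EM-DAT and Munich RE; the per-source fatality counts are invented for illustration:

        # Source weights as described: Smithsonian and Witham (2005) weighted 2, the others 1.
        WEIGHTS = {"smithsonian": 2, "witham": 2, "emdat": 1, "munichre": 1}

        def weighted_fatality_figure(reports):
            """Weighted average of the fatality counts reported by each source for one event."""
            total_weight = sum(WEIGHTS[source] for source in reports)
            return sum(WEIGHTS[source] * count for source, count in reports.items()) / total_weight

        # Hypothetical event reported by three of the four sources.
        print(weighted_fatality_figure({"smithsonian": 120, "witham": 100, "emdat": 60}))
        # -> (2*120 + 2*100 + 1*60) / 5 = 100.0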

  17. Integrated database for rapid mass movements in Norway

    NASA Astrophysics Data System (ADS)

    Jaedicke, C.; Lied, K.; Kronholm, K.

    2009-03-01

    Rapid gravitational slope mass movements include all kinds of short-term relocation of geological material, snow or ice. Traditionally, information about such events is collected separately in different databases covering selected geographical regions and types of movement. In Norway the terrain is susceptible to all types of rapid gravitational slope mass movements, ranging from single rocks hitting roads and houses to large snow avalanches and rock slides where entire mountainsides collapse into fjords, creating flood waves and endangering large areas. In addition, quick clay slides occur in desalinated marine sediments in South Eastern and Mid Norway. For the authorities and inhabitants of endangered areas, the type of threat is of minor importance and mitigation measures have to consider several types of rapid mass movements simultaneously. An integrated national database for all types of rapid mass movements built around individual events has been established. Only three data entries are mandatory: time, location and type of movement. The remaining optional parameters enable recording of detailed information about the terrain, materials involved and damage caused. Pictures, movies and other documentation can be uploaded into the database. A web-based graphical user interface has been developed allowing new events to be entered, as well as editing and querying of all events. An integration of the database into a GIS system is currently under development. Datasets from various national sources like the road authorities and the Geological Survey of Norway were imported into the database. Today, the database contains 33 000 rapid mass movement events from the last five hundred years covering the entire country. A first analysis of the data shows that the most frequent types of recorded rapid mass movement are rock slides and snow avalanches, followed by debris slides in third place. Most events are recorded in the steep fjord terrain of the Norwegian west coast, but major events are recorded all over the country. Snow avalanches account for most fatalities, while large rock slides causing flood waves and huge quick clay slides are the most damaging individual events in terms of damage to infrastructure and property and for causing multiple fatalities. The quality of the data is strongly influenced by the personal engagement of local observers and varying observation routines. This database is a unique source for statistical analysis, including risk analysis and the relation between rapid mass movements and climate. The database of rapid mass movement events will also facilitate validation of national hazard and risk maps.

  18. 75 FR 4827 - Submission for OMB Review; Comment Request Clinical Trials Reporting Program (CTRP) Database (NCI)

    Federal Register 2010, 2011, 2012, 2013, 2014

    2010-01-29

    ...; Comment Request Clinical Trials Reporting Program (CTRP) Database (NCI) Summary: Under the provisions of... Collection: Title: Clinical Trials Reporting Program (CTRP) Database. Type of Information Collection Request... Program (CTRP) Database, to serve as a single, definitive source of information about all NCI-supported...

  19. WholeCellSimDB: a hybrid relational/HDF database for whole-cell model predictions

    PubMed Central

    Karr, Jonathan R.; Phillips, Nolan C.; Covert, Markus W.

    2014-01-01

    Mechanistic ‘whole-cell’ models are needed to develop a complete understanding of cell physiology. However, extracting biological insights from whole-cell models requires running and analyzing large numbers of simulations. We developed WholeCellSimDB, a database for organizing whole-cell simulations. WholeCellSimDB was designed to enable researchers to search simulation metadata to identify simulations for further analysis, and quickly slice and aggregate simulation results data. In addition, WholeCellSimDB enables users to share simulations with the broader research community. The database uses a hybrid relational/hierarchical data format architecture to efficiently store and retrieve both simulation setup metadata and results data. WholeCellSimDB provides a graphical Web-based interface to search, browse, plot and export simulations; a JavaScript Object Notation (JSON) Web service to retrieve data for Web-based visualizations; a command-line interface to deposit simulations; and a Python API to retrieve data for advanced analysis. Overall, we believe WholeCellSimDB will help researchers use whole-cell models to advance basic biological science and bioengineering. Database URL: http://www.wholecellsimdb.org Source code repository URL: http://github.com/CovertLab/WholeCellSimDB PMID:25231498
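
    Not the WholeCellSimDB code, but a minimal sketch of the hybrid idea it describes: searchable setup metadata kept in a relational table, bulky results arrays kept in HDF5 (using SQLite and h5py here; file names and dataset paths are illustrative):

        import sqlite3
        import numpy as np
        import h5py

        # Relational side: simulation setup metadata that needs to be searchable.
        meta = sqlite3.connect(":memory:")
        meta.execute("CREATE TABLE simulation (sim_id TEXT PRIMARY KEY, model TEXT, length_s REAL)")
        meta.execute("INSERT INTO simulation VALUES (?, ?, ?)", ("sim-001", "wholecell-demo", 3600.0))

        # Hierarchical side: large time-series results go into an HDF5 file.
        growth = np.random.rand(3600)                 # placeholder results array
        with h5py.File("sim-001.h5", "w") as f:
            f.create_dataset("states/mass/growth", data=growth, compression="gzip")

        # Typical use: find simulations by metadata, then slice only the needed results.
        sim_id, = meta.execute("SELECT sim_id FROM simulation WHERE length_s >= 1000").fetchone()
        with h5py.File(f"{sim_id}.h5", "r") as f:
            print(sim_id, f["states/mass/growth"][:10])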

  20. Expediting topology data gathering for the TOPDB database.

    PubMed

    Dobson, László; Langó, Tamás; Reményi, István; Tusnády, Gábor E

    2015-01-01

    The Topology Data Bank of Transmembrane Proteins (TOPDB, http://topdb.enzim.ttk.mta.hu) contains experimentally determined topology data of transmembrane proteins. Recently, we have updated TOPDB from several sources and utilized a newly developed topology prediction algorithm to determine the most reliable topology using the results of experiments as constraints. In addition to collecting the experimentally determined topology data published in the last couple of years, we gathered topographies defined by the TMDET algorithm using 3D structures from the PDBTM. Results of global topology analysis of various organisms as well as topology data generated by high throughput techniques, like the sequential positions of N- or O-glycosylations were incorporated into the TOPDB database. Moreover, a new algorithm was developed to integrate scattered topology data from various publicly available databases and a new method was introduced to measure the reliability of predicted topologies. We show that reliability values highly correlate with the per protein topology accuracy of the utilized prediction method. Altogether, more than 52,000 new topology data and more than 2600 new transmembrane proteins have been collected since the last public release of the TOPDB database. © The Author(s) 2014. Published by Oxford University Press on behalf of Nucleic Acids Research.

  1. Fiber pixelated image database

    NASA Astrophysics Data System (ADS)

    Shinde, Anant; Perinchery, Sandeep Menon; Matham, Murukeshan Vadakke

    2016-08-01

    Imaging of physically inaccessible parts of the body such as the colon at micron-level resolution is highly important in diagnostic medical imaging. Though flexible endoscopes based on imaging fiber bundles are used for such diagnostic procedures, their inherent honeycomb-like structure creates fiber pixelation effects. This impedes the observer from perceiving the information in the captured image and hinders the direct use of image processing and machine intelligence techniques on the recorded signal. Significant efforts have been made by researchers in the recent past in the development and implementation of pixelation removal techniques. However, researchers have often used their own sets of images without making the source data available, which has limited the wider use and adaptability of these methods. A database of pixelated images is therefore required to meet the growing diagnostic needs in the healthcare arena. An innovative fiber pixelated image database is presented, which consists of pixelated images that are synthetically generated and experimentally acquired. The sample space encompasses test patterns of different scales, sizes, and shapes. It is envisaged that this proposed database will alleviate the current limitations associated with relevant research and development and will be of great help for researchers working on comb structure removal algorithms.

  2. Methods for Estimating Annual Wastewater Nutrient Loads in the Southeastern United States

    USGS Publications Warehouse

    McMahon, Gerard; Tervelt, Larinda; Donehoo, William

    2007-01-01

    This report describes an approach for estimating annual total nitrogen and total phosphorus loads from point-source dischargers in the southeastern United States. Nutrient load estimates for 2002 were used in the calibration and application of a regional nutrient model, referred to as the SPARROW (SPAtially Referenced Regression On Watershed attributes) watershed model. Loads from dischargers permitted under the National Pollutant Discharge Elimination System were calculated using data from the U.S. Environmental Protection Agency Permit Compliance System database and individual state databases. Site information from both state and U.S. Environmental Protection Agency databases, including latitude and longitude and monitored effluent data, was compiled into a project database. For sites with a complete effluent-monitoring record, effluent-flow and nutrient-concentration data were used to develop estimates of annual point-source nitrogen and phosphorus loads. When flow data were available but nutrient-concentration data were missing or incomplete, typical pollutant-concentration values of total nitrogen and total phosphorus were used to estimate load. In developing typical pollutant-concentration values, the major factors assumed to influence wastewater nutrient-concentration variability were the size of the discharger (the amount of flow), the season during which discharge occurred, and the Standard Industrial Classification code of the discharger. One insight gained from this study is that in order to gain access to flow, concentration, and location data, close communication and collaboration are required with the agencies that collect and manage the data. In addition, the accuracy and usefulness of the load estimates depend on the willingness of the states and the U.S. Environmental Protection Agency to provide guidance and review for at least a subset of the load estimates that may be problematic.
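
    The arithmetic behind such load estimates is simple: annual load is flow times concentration, with a typical pollutant concentration (TPC) substituted when monitoring data are missing. A sketch under assumed units (flow in million gallons per day, concentration in mg/L) and with made-up TPC values, not the values developed in the report:

        LITERS_PER_MILLION_GALLONS = 3.785e6

        # Hypothetical typical pollutant concentrations (TPC) for total nitrogen, in mg/L,
        # keyed by a crude discharger size class; real TPC values vary by SIC code and season.
        TYPICAL_TN_MG_L = {"minor": 12.0, "major": 8.0}

        def annual_load_kg(flow_mgd, conc_mg_l, size_class):
            """Annual load in kg/yr; substitutes a typical concentration when none was monitored."""
            if conc_mg_l is None:
                conc_mg_l = TYPICAL_TN_MG_L[size_class]
            liters_per_year = flow_mgd * LITERS_PER_MILLION_GALLONS * 365
            return conc_mg_l * liters_per_year * 1e-6   # mg -> kg

        print(annual_load_kg(2.5, 10.0, "major"))   # effluent monitoring data available
        print(annual_load_kg(0.3, None, "minor"))   # concentration missing, TPC substituted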

  3. BioMart: a data federation framework for large collaborative projects.

    PubMed

    Zhang, Junjun; Haider, Syed; Baran, Joachim; Cros, Anthony; Guberman, Jonathan M; Hsu, Jack; Liang, Yong; Yao, Long; Kasprzyk, Arek

    2011-01-01

    BioMart is a freely available, open source, federated database system that provides a unified access to disparate, geographically distributed data sources. It is designed to be data agnostic and platform independent, such that existing databases can easily be incorporated into the BioMart framework. BioMart allows databases hosted on different servers to be presented seamlessly to users, facilitating collaborative projects between different research groups. BioMart contains several levels of query optimization to efficiently manage large data sets and offers a diverse selection of graphical user interfaces and application programming interfaces to ensure that queries can be performed in whatever manner is most convenient for the user. The software has now been adopted by a large number of different biological databases spanning a wide range of data types and providing a rich source of annotation available to bioinformaticians and biologists alike.

  4. [Data sources, the data used, and the modality for collection].

    PubMed

    Mercier, G; Costa, N; Dutot, C; Riche, V-P

    2018-03-01

    The hospital costing process implies access to various sources of data. Whether a micro-costing or a gross-costing approach is used, the choice of the methodology is based on a compromise between the cost of data collection, data accuracy, and data transferability. This work describes the data sources available in France and the access modalities that are used, as well as the main advantages and shortcomings of: (1) the local unit costs, (2) the hospital analytical accounting, (3) the Angers database, (4) the National Health Cost Studies, (5) the INTER CHR/U databases, (6) the Program for Medicalizing Information Systems, and (7) the public health insurance databases. Copyright © 2018 Elsevier Masson SAS. All rights reserved.

  5. A collection of open source applications for mass spectrometry data mining.

    PubMed

    Gallardo, Óscar; Ovelleiro, David; Gay, Marina; Carrascal, Montserrat; Abian, Joaquin

    2014-10-01

    We present several bioinformatics applications for the identification and quantification of phosphoproteome components by MS. These applications include a front-end graphical user interface that combines several Thermo RAW formats to MASCOT™ Generic Format extractors (EasierMgf), two graphical user interfaces for search engines OMSSA and SEQUEST (OmssaGui and SequestGui), and three applications, one for the management of databases in FASTA format (FastaTools), another for the integration of search results from up to three search engines (Integrator), and another one for the visualization of mass spectra and their corresponding database search results (JsonVisor). These applications were developed to solve some of the common problems found in proteomic and phosphoproteomic data analysis and were integrated in the workflow for data processing and feeding on our LymPHOS database. Applications were designed modularly and can be used standalone. These tools are written in Perl and Python programming languages and are supported on Windows platforms. They are all released under an Open Source Software license and can be freely downloaded from our software repository hosted at GoogleCode. © 2014 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.

  6. TISSUES 2.0: an integrative web resource on mammalian tissue expression

    PubMed Central

    Palasca, Oana; Santos, Alberto; Stolte, Christian; Gorodkin, Jan; Jensen, Lars Juhl

    2018-01-01

    Abstract Physiological and molecular similarities between organisms make it possible to translate findings from simpler experimental systems—model organisms—into more complex ones, such as human. This translation facilitates the understanding of biological processes under normal or disease conditions. Researchers aiming to identify the similarities and differences between organisms at the molecular level need resources collecting multi-organism tissue expression data. We have developed a database of gene–tissue associations in human, mouse, rat and pig by integrating multiple sources of evidence: transcriptomics covering all four species and proteomics (human only), manually curated and mined from the scientific literature. Through a scoring scheme, these associations are made comparable across all sources of evidence and across organisms. Furthermore, the scoring produces a confidence score assigned to each of the associations. The TISSUES database (version 2.0) is publicly accessible through a user-friendly web interface and as part of the STRING app for Cytoscape. In addition, we analyzed the agreement between datasets, across and within organisms, and identified that the agreement is mainly affected by the quality of the datasets rather than by the technologies used or organisms compared. Database URL: http://tissues.jensenlab.org/ PMID:29617745

  7. Database of tsunami scenario simulations for Western Iberia: a tool for the TRIDEC Project Decision Support System for tsunami early warning

    NASA Astrophysics Data System (ADS)

    Armigliato, Alberto; Pagnoni, Gianluca; Zaniboni, Filippo; Tinti, Stefano

    2013-04-01

    TRIDEC is an EU-FP7 project whose main goal is, in general terms, to develop suitable strategies for the management of crises possibly arising in the Earth management field. The general paradigms adopted by TRIDEC to develop those strategies include intelligent information management, the capability of managing dynamically increasing volumes and dimensionality of information in complex events, and collaborative decision making in systems that are typically very loosely coupled. The two areas where TRIDEC applies and tests its strategies are tsunami early warning and industrial subsurface development. In the field of tsunami early warning, TRIDEC aims at developing a Decision Support System (DSS) that integrates 1) a set of seismic, geodetic and marine sensors devoted to the detection and characterisation of possible tsunamigenic sources and to monitoring the time and space evolution of the generated tsunami, 2) large-volume databases of pre-computed numerical tsunami scenarios, and 3) a proper overall system architecture. Two test areas are dealt with in TRIDEC: the western Iberian margin and the eastern Mediterranean. In this study, we focus on the western Iberian margin with special emphasis on the Portuguese coasts. The strategy adopted in TRIDEC plans to populate two different databases, called the "Virtual Scenario Database" (VSDB) and the "Matching Scenario Database" (MSDB), both of which deal only with earthquake-generated tsunamis. In the VSDB we simulate numerically a few large-magnitude events generated by the major known tectonic structures in the study area. Heterogeneous slip distributions on the earthquake faults are introduced to simulate events as "realistically" as possible. The members of the VSDB represent the unknowns that the TRIDEC platform must be able to recognise and match during the early crisis management phase. On the other hand, the MSDB contains a very large number (of the order of thousands) of tsunami simulations performed starting from many different simple earthquake sources of different magnitudes located in the "vicinity" of the virtual scenario earthquake. From the DSS perspective, the members of the MSDB have to be suitably combined based on the information coming from the sensor networks, and the results are used during the crisis evolution phase to forecast the degree of exposure of different coastal areas. We provide examples from both databases, whose members are computed by means of the in-house software called UBO-TSUFD, which implements the non-linear shallow-water equations and solves them over a set of nested grids that guarantee a suitable spatial resolution (a few tens of meters) in specific, suitably chosen coastal areas.

  8. Move Over, Word Processors--Here Come the Databases.

    ERIC Educational Resources Information Center

    Olds, Henry F., Jr.; Dickenson, Anne

    1985-01-01

    Discusses the use of beginning, intermediate, and advanced databases for instructional purposes. A table listing seven databases with information on ease of use, smoothness of operation, data capacity, speed, source, and program features is included. (JN)

  9. SPOT 5/HRS: A Key Source for Navigation Database

    DTIC Science & Technology

    2003-09-02

    [Report documentation page; only fragments are recoverable. Presentation titled "SPOT 5/HRS: A Key Source for Navigation Database" by Marc Bernard, on producing navigation data from SPOT 5 HRS imagery in partnership with IGN (the French national mapping agency).]

  10. Database of Sources of Environmental Releases of Dioxin-Like Compounds in the United States

    EPA Science Inventory

    The Database of Sources of Environmental Releases of Dioxin-like Compounds in the United States (US)...

  FERN Ethnomedicinal Plant Database: Exploring Fern Ethnomedicinal Plants Knowledge for Computational Drug Discovery.

    PubMed

    Thakar, Sambhaji B; Ghorpade, Pradnya N; Kale, Manisha V; Sonawane, Kailas D

    2015-01-01

    Fern plants are known for their ethnomedicinal applications. A large amount of information on medicinal fern plants is scattered in text form, so database development is an appropriate way to cope with the situation. Given the importance of medicinally useful fern plants, we developed a web-based database which contains information about several groups of ferns, their medicinal uses, chemical constituents, and protein/enzyme sequences isolated from different fern plants. The fern ethnomedicinal plant database is a comprehensive, content-managed, web-based database system used to retrieve a collection of factual knowledge related to ethnomedicinal fern species. Most of the protein/enzyme sequences have been extracted from the NCBI protein sequence database. The fern species, family name, identification, taxonomy ID from NCBI, geographical occurrence, trial for, plant parts used, ethnomedicinal importance and morphological characteristics were collected from various scientific literature and journals available in text form. Links to NCBI's BLAST, InterPro, phylogeny and Clustal W web resources have also been provided for future comparative studies, so users can get information related to fern plants and their medicinal applications in one place. This fern ethnomedicinal plant database includes information on 100 fern medicinal species. The web-based database would be advantageous for deriving information specifically for computational drug discovery, and useful to botanists and those with botanical interests, pharmacologists, researchers, biochemists, plant biotechnologists, ayurvedic practitioners, doctors and pharmacists, traditional medicine users, farmers, agricultural students and teachers from universities and colleges, and, finally, fern plant enthusiasts. This effort should provide essential knowledge for users about applications for drug discovery, the conservation of fern species around the world and, finally, the creation of social awareness.

  11. JBioWH: an open-source Java framework for bioinformatics data integration

    PubMed Central

    Vera, Roberto; Perez-Riverol, Yasset; Perez, Sonia; Ligeti, Balázs; Kertész-Farkas, Attila; Pongor, Sándor

    2013-01-01

    The Java BioWareHouse (JBioWH) project is an open-source platform-independent programming framework that allows a user to build his/her own integrated database from the most popular data sources. JBioWH can be used for intensive querying of multiple data sources and the creation of streamlined task-specific data sets on local PCs. JBioWH is based on a MySQL relational database scheme and includes JAVA API parser functions for retrieving data from 20 public databases (e.g. NCBI, KEGG, etc.). It also includes a client desktop application for (non-programmer) users to query data. In addition, JBioWH can be tailored for use in specific circumstances, including the handling of massive queries for high-throughput analyses or CPU intensive calculations. The framework is provided with complete documentation and application examples and it can be downloaded from the Project Web site at http://code.google.com/p/jbiowh. A MySQL server is available for demonstration purposes at hydrax.icgeb.trieste.it:3307. Database URL: http://code.google.com/p/jbiowh PMID:23846595

  12. JBioWH: an open-source Java framework for bioinformatics data integration.

    PubMed

    Vera, Roberto; Perez-Riverol, Yasset; Perez, Sonia; Ligeti, Balázs; Kertész-Farkas, Attila; Pongor, Sándor

    2013-01-01

    The Java BioWareHouse (JBioWH) project is an open-source platform-independent programming framework that allows a user to build his/her own integrated database from the most popular data sources. JBioWH can be used for intensive querying of multiple data sources and the creation of streamlined task-specific data sets on local PCs. JBioWH is based on a MySQL relational database scheme and includes JAVA API parser functions for retrieving data from 20 public databases (e.g. NCBI, KEGG, etc.). It also includes a client desktop application for (non-programmer) users to query data. In addition, JBioWH can be tailored for use in specific circumstances, including the handling of massive queries for high-throughput analyses or CPU intensive calculations. The framework is provided with complete documentation and application examples and it can be downloaded from the Project Web site at http://code.google.com/p/jbiowh. A MySQL server is available for demonstration purposes at hydrax.icgeb.trieste.it:3307. Database URL: http://code.google.com/p/jbiowh.

  13. HepSEQ: International Public Health Repository for Hepatitis B

    PubMed Central

    Gnaneshan, Saravanamuttu; Ijaz, Samreen; Moran, Joanne; Ramsay, Mary; Green, Jonathan

    2007-01-01

    HepSEQ is a repository for an extensive library of public health and molecular data relating to hepatitis B virus (HBV) infection collected from international sources. It is hosted by the Centre for Infections, Health Protection Agency (HPA), England, United Kingdom. This repository has been developed as a web-enabled, quality-controlled database to act as a tool for surveillance, HBV case management and for research. The web front-end for the database system can be accessed from . The format of the database system allows for comprehensive molecular, clinical and epidemiological data to be deposited into a functional database, to search and manipulate the stored data and to extract and visualize the information on epidemiological, virological, clinical, nucleotide sequence and mutational aspects of HBV infection through web front-end. Specific tools, built into the database, can be utilized to analyse deposited data and provide information on HBV genotype, identify mutations with known clinical significance (e.g. vaccine escape, precore and antiviral-resistant mutations) and carry out sequence homology searches against other deposited strains. Further mechanisms are also in place to allow specific tailored searches of the database to be undertaken. PMID:17130143

  14. Requests for post-registration studies (PRS), patients follow-up in actual practice: Changes in the role of databases.

    PubMed

    Berdaï, Driss; Thomas-Delecourt, Florence; Szwarcensztein, Karine; d'Andon, Anne; Collignon, Cécile; Comet, Denis; Déal, Cécile; Dervaux, Benoît; Gaudin, Anne-Françoise; Lamarque-Garnier, Véronique; Lechat, Philippe; Marque, Sébastien; Maugendre, Philippe; Méchin, Hubert; Moore, Nicholas; Nachbaur, Gaëlle; Robain, Mathieu; Roussel, Christophe; Tanti, André; Thiessard, Frantz

    2018-02-01

    Early market access of health products is associated with a larger number of requests for information by the health authorities. Compared with these expectations, the growing expansion of health databases represents an opportunity for responding to questions raised by the authorities. The computerised nature of the health system provides numerous sources of data, and first and foremost medical/administrative databases such as the French National Inter-Scheme Health Insurance Information System (SNIIRAM) database. These databases, although developed for other purposes, have already been used for many years with regard to post-registration studies (PRS). The use thereof will continue to increase with the recent creation of the French National Health Data System (SNDS [2016 health system reform law]). At the same time, other databases are available in France, offering an illustration of "product use under actual practice conditions" by patients and health professionals (cohorts, specific registries, data warehouses, etc.). Based on a preliminary analysis of requests for PRS, approximately two-thirds appeared to have found at least a partial response in existing databases. Using these databases has a number of disadvantages, but also numerous advantages, which are listed. In order to facilitate access and optimise their use, it seemed important to draw up recommendations aiming to facilitate these developments and guarantee the conditions for their technical validity. The recommendations drawn up notably include the need for measures aiming to promote the visibility of research conducted on databases in the field of PRS. Moreover, it seemed worthwhile to promote the interoperability of health data warehouses, to make it possible to match information originating from field studies with information originating from databases, and to develop and share algorithms aiming to identify criteria of interest (proxies). Methodological documents, such as the French National Authority for Health (HAS) recommendations on "Les études post-inscription sur les technologies de santé (médicaments, dispositifs médicaux et actes). Principes et méthodes" [Post-registration studies on health technologies (medicinal products, medical devices and procedures). Principles and methods] should be updated to incorporate these developments. Copyright © 2018 Société française de pharmacologie et de thérapeutique. Published by Elsevier Masson SAS. All rights reserved.

  15. ECLSS evolution: Advanced instrumentation interface requirements. Volume 3: Appendix C

    NASA Technical Reports Server (NTRS)

    1991-01-01

    An Advanced ECLSS (Environmental Control and Life Support System) Technology Interfaces Database was developed primarily to provide ECLSS analysts with a centralized and portable source of ECLSS technologies interface requirements data. The database contains 20 technologies that were previously identified in the MDSSC ECLSS Technologies database. The primary interfaces of interest in this database are fluid, electrical, data/control interfaces, and resupply requirements. Each record contains fields describing the function and operation of the technology. Fields include an interface diagram, a description of applicable design points and operating ranges, and an explanation of data, as required. A complete set of data was entered for six of the twenty components including Solid Amine Water Desorbed (SAWD), Thermoelectric Integrated Membrane Evaporation System (TIMES), Electrochemical Carbon Dioxide Concentrator (EDC), Solid Polymer Electrolysis (SPE), Static Feed Electrolysis (SFE), and BOSCH. Additional data were collected for Reverse Osmosis Water Reclamation-Potable (ROWRP), Reverse Osmosis Water Reclamation-Hygiene (ROWRH), Static Feed Solid Polymer Electrolyte (SFSPE), Trace Contaminant Control System (TCCS), and Multifiltration Water Reclamation - Hygiene (MFWRH). A summary of the database contents is presented in this report.

  16. TRENDS: A flight test relational database user's guide and reference manual

    NASA Technical Reports Server (NTRS)

    Bondi, M. J.; Bjorkman, W. S.; Cross, J. L.

    1994-01-01

    This report is designed to be a user's guide and reference manual for users intending to access rotorcraft test data via TRENDS, the relational database system which was developed as a tool for the aeronautical engineer with no programming background. This report has been written to assist novice and experienced TRENDS users. TRENDS is a complete system for retrieving, searching, and analyzing both numerical and narrative data, and for displaying time history and statistical data in graphical and numerical formats. This manual provides a 'guided tour' and a 'user's guide' for the new and intermediate-skilled users. Examples of the use of each menu item within TRENDS are provided in the Menu Reference section of the manual, including full coverage for TIMEHIST, one of the key tools. This manual is written around the XV-15 Tilt Rotor database, but does include an appendix on the UH-60 Blackhawk database. This user's guide and reference manual establishes a referable source for the research community and augments NASA TM-101025, TRENDS: The Aeronautical Post-Test, Database Management System, Jan. 1990, written by the same authors.

  17. Systematic analysis of snake neurotoxins' functional classification using a data warehousing approach.

    PubMed

    Siew, Joyce Phui Yee; Khan, Asif M; Tan, Paul T J; Koh, Judice L Y; Seah, Seng Hong; Koo, Chuay Yeng; Chai, Siaw Ching; Armugam, Arunmozhiarasi; Brusic, Vladimir; Jeyaseelan, Kandiah

    2004-12-12

    Sequence annotations, functional and structural data on snake venom neurotoxins (svNTXs) are scattered across multiple databases and literature sources. Sequence annotations and structural data are available in the public molecular databases, while functional data are almost exclusively available in published articles. There is a need for a specialized svNTX database that contains NTX entries which are organized, well annotated and classified in a systematic manner. We have systematically analyzed svNTXs and classified them into structure-function groups based on their structural, functional and phylogenetic properties. Using conserved motifs in each phylogenetic group, we built an intelligent module for the prediction of structural and functional properties of unknown NTXs. We also developed an annotation tool to aid the functional prediction of newly identified NTXs as an additional resource for the venom research community. We created a searchable online database of NTX protein sequences (http://research.i2r.a-star.edu.sg/Templar/DB/snake_neurotoxin). This database can also be found under the Swiss-Prot Toxin Annotation Project website (http://www.expasy.org/sprot/).

  18. Building a multi-scaled geospatial temporal ecology database from disparate data sources: fostering open science and data reuse.

    PubMed

    Soranno, Patricia A; Bissell, Edward G; Cheruvelil, Kendra S; Christel, Samuel T; Collins, Sarah M; Fergus, C Emi; Filstrup, Christopher T; Lapierre, Jean-Francois; Lottig, Noah R; Oliver, Samantha K; Scott, Caren E; Smith, Nicole J; Stopyak, Scott; Yuan, Shuai; Bremigan, Mary Tate; Downing, John A; Gries, Corinna; Henry, Emily N; Skaff, Nick K; Stanley, Emily H; Stow, Craig A; Tan, Pang-Ning; Wagner, Tyler; Webster, Katherine E

    2015-01-01

    Although there are considerable site-based data for individual or groups of ecosystems, these datasets are widely scattered, have different data formats and conventions, and often have limited accessibility. At the broader scale, national datasets exist for a large number of geospatial features of land, water, and air that are needed to fully understand variation among these ecosystems. However, such datasets originate from different sources and have different spatial and temporal resolutions. By taking an open-science perspective and by combining site-based ecosystem datasets and national geospatial datasets, science gains the ability to ask important research questions related to grand environmental challenges that operate at broad scales. Documentation of such complicated database integration efforts, through peer-reviewed papers, is recommended to foster reproducibility and future use of the integrated database. Here, we describe the major steps, challenges, and considerations in building an integrated database of lake ecosystems, called LAGOS (LAke multi-scaled GeOSpatial and temporal database), that was developed at the sub-continental study extent of 17 US states (1,800,000 km2). LAGOS includes two modules: LAGOSGEO, with geospatial data on every lake with surface area larger than 4 ha in the study extent (~50,000 lakes), including climate, atmospheric deposition, land use/cover, hydrology, geology, and topography measured across a range of spatial and temporal extents; and LAGOSLIMNO, with lake water quality data compiled from ~100 individual datasets for a subset of lakes in the study extent (~10,000 lakes). Procedures for the integration of datasets included: creating a flexible database design; authoring and integrating metadata; documenting data provenance; quantifying spatial measures of geographic data; quality-controlling integrated and derived data; and extensively documenting the database. Our procedures make a large, complex, and integrated database reproducible and extensible, allowing users to ask new research questions with the existing database or through the addition of new data. The largest challenge of this task was the heterogeneity of the data, formats, and metadata. Many steps of data integration need manual input from experts in diverse fields, requiring close collaboration.
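
    As an editor's illustration of the kind of integration step described above (not the actual LAGOS workflow), the following Python sketch tags each record with its source and harmonizes units before combining; the file names, column names and the total-phosphorus variable are hypothetical.

```python
# Minimal sketch of provenance-tagged dataset integration (not the actual
# LAGOS workflow): two hypothetical source files with different column
# names and units are harmonized into one table that records where each
# row came from.
import pandas as pd

def load_source(path, source_id, column_map, tp_factor=1.0):
    """Read one source, rename columns to a common schema, convert total
    phosphorus to a common unit, and tag every row with its provenance."""
    df = pd.read_csv(path).rename(columns=column_map)
    df["tp_ugL"] = df["tp_ugL"] * tp_factor   # harmonize units
    df["source_id"] = source_id               # record provenance
    return df[["lake_id", "sample_date", "tp_ugL", "source_id"]]

if __name__ == "__main__":
    # Hypothetical inputs: state_a reports TP in ug/L, state_b in mg/L.
    a = load_source("state_a.csv", "state_a",
                    {"LakeID": "lake_id", "Date": "sample_date", "TP": "tp_ugL"})
    b = load_source("state_b.csv", "state_b",
                    {"lake": "lake_id", "date": "sample_date", "tp_mgL": "tp_ugL"},
                    tp_factor=1000.0)
    integrated = pd.concat([a, b], ignore_index=True)
    print(integrated.groupby("source_id").size())
```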

  19. Building a multi-scaled geospatial temporal ecology database from disparate data sources: Fostering open science through data reuse

    USGS Publications Warehouse

    Soranno, Patricia A.; Bissell, E.G.; Cheruvelil, Kendra S.; Christel, Samuel T.; Collins, Sarah M.; Fergus, C. Emi; Filstrup, Christopher T.; Lapierre, Jean-Francois; Lottig, Noah R.; Oliver, Samantha K.; Scott, Caren E.; Smith, Nicole J.; Stopyak, Scott; Yuan, Shuai; Bremigan, Mary Tate; Downing, John A.; Gries, Corinna; Henry, Emily N.; Skaff, Nick K.; Stanley, Emily H.; Stow, Craig A.; Tan, Pang-Ning; Wagner, Tyler; Webster, Katherine E.

    2015-01-01

    Although there are considerable site-based data for individual or groups of ecosystems, these datasets are widely scattered, have different data formats and conventions, and often have limited accessibility. At the broader scale, national datasets exist for a large number of geospatial features of land, water, and air that are needed to fully understand variation among these ecosystems. However, such datasets originate from different sources and have different spatial and temporal resolutions. By taking an open-science perspective and by combining site-based ecosystem datasets and national geospatial datasets, science gains the ability to ask important research questions related to grand environmental challenges that operate at broad scales. Documentation of such complicated database integration efforts, through peer-reviewed papers, is recommended to foster reproducibility and future use of the integrated database. Here, we describe the major steps, challenges, and considerations in building an integrated database of lake ecosystems, called LAGOS (LAke multi-scaled GeOSpatial and temporal database), that was developed at the sub-continental study extent of 17 US states (1,800,000 km2). LAGOS includes two modules: LAGOSGEO, with geospatial data on every lake with surface area larger than 4 ha in the study extent (~50,000 lakes), including climate, atmospheric deposition, land use/cover, hydrology, geology, and topography measured across a range of spatial and temporal extents; and LAGOSLIMNO, with lake water quality data compiled from ~100 individual datasets for a subset of lakes in the study extent (~10,000 lakes). Procedures for the integration of datasets included: creating a flexible database design; authoring and integrating metadata; documenting data provenance; quantifying spatial measures of geographic data; quality-controlling integrated and derived data; and extensively documenting the database. Our procedures make a large, complex, and integrated database reproducible and extensible, allowing users to ask new research questions with the existing database or through the addition of new data. The largest challenge of this task was the heterogeneity of the data, formats, and metadata. Many steps of data integration need manual input from experts in diverse fields, requiring close collaboration.

  1. The establishment of the atmospheric emission inventories of the ESCOMPTE program

    NASA Astrophysics Data System (ADS)

    François, S.; Grondin, E.; Fayet, S.; Ponche, J.-L.

    2005-03-01

    Within the framework of the ESCOMPTE program, a spatial emission inventory and an emission database aimed at tropospheric photochemistry intercomparison modeling have been developed under the scientific supervision of the LPCA with the help of the regional coordination of the Air Quality network AIRMARAIX. This inventory has been established for all categories of sources (stationary, mobile and biogenic sources) over a domain of 19,600 km2 centered on the cities of Marseilles-Aix-en-Provence in the southeastern part of France with a spatial resolution of 1 km2. A yearly inventory for 1999 has been established, and hourly emission inventories for 23 days of June and July 2000 and 2001, corresponding to the intensive measurement periods, have been produced. The 104 chemical species in the inventory have been selected to be relevant with respect to photochemistry modeling according to available data. The entire list of species in the inventory numbers 216, which will allow other future applications of this database. This database is presently the most detailed and complete regional emission database in France. In addition, the database structure and the emission calculation modules have been designed to ensure better sustainability and upgradeability, being provided with appropriate maintenance software. The general organization and method are summarized, and the results obtained for both yearly and hourly emissions are detailed and discussed. Some comparisons have been performed with existing results in this region to check the congruency of the results; these confirm the relevance and consistency of the ESCOMPTE emission inventory.

  2. The successes and challenges of open-source biopharmaceutical innovation.

    PubMed

    Allarakhia, Minna

    2014-05-01

    Increasingly, open-source-based alliances seek to provide broad access to data, research-based tools, preclinical samples and downstream compounds. The challenge is how to create value from open-source biopharmaceutical innovation. This value creation may occur via transparency and usage of data across the biopharmaceutical value chain as stakeholders move dynamically between open source and open innovation. In this article, several examples are used to trace the evolution of biopharmaceutical open-source initiatives. The article specifically discusses the technological challenges associated with the integration and standardization of big data; the human capacity development challenges associated with skill development around big data usage; and the data-material access challenge associated with data and material access and usage rights, particularly as the boundary between open source and open innovation becomes more fluid. It is the author's opinion that the assessment of when and how value creation will occur, through open-source biopharmaceutical innovation, is paramount. The key is to determine the metrics of value creation and the necessary technological, educational and legal frameworks to support the downstream outcomes of today's big-data-based open-source initiatives. A continued focus on early-stage value creation alone is not advisable. Instead, it would be more advisable to adopt an approach where stakeholders transform open-source initiatives into open-source discovery, crowdsourcing and open product development partnerships on the same platform.

  3. Do Staphylococcus epidermidis Genetic Clusters Predict Isolation Sources?

    PubMed Central

    Tolo, Isaiah; Thomas, Jonathan C.; Fischer, Rebecca S. B.; Brown, Eric L.; Gray, Barry M.

    2016-01-01

    Staphylococcus epidermidis is a ubiquitous colonizer of human skin and a common cause of medical device-associated infections. The extent to which the population genetic structure of S. epidermidis distinguishes commensal from pathogenic isolates is unclear. Previously, Bayesian clustering of 437 multilocus sequence types (STs) in the international database revealed a population structure of six genetic clusters (GCs) that may reflect the species' ecology. Here, we first verified the presence of six GCs, including two (GC3 and GC5) with significant admixture, in an updated database of 578 STs. Next, a single nucleotide polymorphism (SNP) assay was developed that accurately assigned 545 (94%) of 578 STs to GCs. Finally, the hypothesis that GCs could distinguish isolation sources was tested by SNP typing and GC assignment of 154 isolates from hospital patients with bacteremia and those with blood culture contaminants and from nonhospital carriage. GC5 was isolated almost exclusively from hospital sources. GC1 and GC6 were isolated from all sources but were overrepresented in isolates from nonhospital and infection sources, respectively. GC2, GC3, and GC4 were relatively rare in this collection. No association was detected between fdh-positive isolates (GC2 and GC4) and nonhospital sources. Using a machine learning algorithm, GCs predicted hospital and nonhospital sources with 80% accuracy and predicted infection and contaminant sources with 45% accuracy, which was comparable to the results seen with a combination of five genetic markers (icaA, IS256, sesD [bhp], mecA, and arginine catabolic mobile element [ACME]). Thus, analysis of population structure with subgenomic data shows the distinction of hospital and nonhospital sources and the near-inseparability of sources within a hospital. PMID:27076664
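
    A minimal sketch of the kind of supervised classification reported above (not the authors' actual pipeline): a random forest trained on binary marker and cluster features to predict hospital versus nonhospital origin, with cross-validated accuracy. The data here are randomly generated placeholders.

```python
# Illustrative sketch (not the authors' pipeline): predicting isolation
# source (hospital vs. nonhospital) from binary genetic features with a
# random forest, reporting cross-validated accuracy.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
# Hypothetical data: rows are isolates, columns are presence/absence of
# markers (e.g. icaA, IS256, mecA, ACME) plus a genetic-cluster code.
X = rng.integers(0, 2, size=(154, 5))
y = rng.integers(0, 2, size=154)          # 0 = nonhospital, 1 = hospital

clf = RandomForestClassifier(n_estimators=200, random_state=0)
scores = cross_val_score(clf, X, y, cv=5)
print("mean cross-validated accuracy: %.2f" % scores.mean())
```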

  4. Central Appalachian basin natural gas database: distribution, composition, and origin of natural gases

    USGS Publications Warehouse

    Román Colón, Yomayra A.; Ruppert, Leslie F.

    2015-01-01

    The U.S. Geological Survey (USGS) has compiled a database consisting of three worksheets of central Appalachian basin natural gas analyses and isotopic compositions from published and unpublished sources of 1,282 gas samples from Kentucky, Maryland, New York, Ohio, Pennsylvania, Tennessee, Virginia, and West Virginia. The database includes field and reservoir names, well and State identification number, selected geologic reservoir properties, and the composition of natural gases (methane; ethane; propane; butane, iso-butane [i-butane]; normal butane [n-butane]; iso-pentane [i-pentane]; normal pentane [n-pentane]; cyclohexane, and hexanes). In the first worksheet, location and American Petroleum Institute (API) numbers from public or published sources are provided for 1,231 of the 1,282 gas samples. A second worksheet of 186 gas samples was compiled from published sources and augmented with public location information and contains carbon, hydrogen, and nitrogen isotopic measurements of natural gas. The third worksheet is a key for all abbreviations in the database. The database can be used to better constrain the stratigraphic distribution, composition, and origin of natural gas in the central Appalachian basin.

  5. Missing Modality Transfer Learning via Latent Low-Rank Constraint.

    PubMed

    Ding, Zhengming; Shao, Ming; Fu, Yun

    2015-11-01

    Transfer learning is usually exploited to leverage previously well-learned source domain for evaluating the unknown target domain; however, it may fail if no target data are available in the training stage. This problem arises when the data are multi-modal. For example, the target domain is in one modality, while the source domain is in another. To overcome this, we first borrow an auxiliary database with complete modalities, then consider knowledge transfer across databases and across modalities within databases simultaneously in a unified framework. The contributions are threefold: 1) a latent factor is introduced to uncover the underlying structure of the missing modality from the known data; 2) transfer learning in two directions allows the data alignment between both modalities and databases, giving rise to a very promising recovery; and 3) an efficient solution with theoretical guarantees to the proposed latent low-rank transfer learning algorithm. Comprehensive experiments on multi-modal knowledge transfer with missing target modality verify that our method can successfully inherit knowledge from both auxiliary database and source modality, and therefore significantly improve the recognition performance even when test modality is inaccessible in the training stage.
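
    For readers unfamiliar with low-rank constraints, the following sketch shows singular value soft-thresholding, the basic proximal step behind nuclear-norm (low-rank) penalties; it is only an illustration of the constraint itself, not the paper's full latent low-rank transfer algorithm.

```python
# Minimal illustration of a low-rank constraint (not the paper's full
# latent low-rank transfer method): singular value soft-thresholding,
# the proximal step used when minimizing a nuclear-norm penalty.
import numpy as np

def svt(M, tau):
    """Shrink the singular values of M by tau and rebuild the matrix."""
    U, s, Vt = np.linalg.svd(M, full_matrices=False)
    s_shrunk = np.maximum(s - tau, 0.0)
    return U @ np.diag(s_shrunk) @ Vt

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    low_rank = rng.standard_normal((50, 3)) @ rng.standard_normal((3, 40))
    noisy = low_rank + 0.1 * rng.standard_normal((50, 40))
    recovered = svt(noisy, tau=2.0)   # small (noise) singular values vanish
    print("rank before: %d, after thresholding: %d"
          % (np.linalg.matrix_rank(noisy), np.linalg.matrix_rank(recovered)))
```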

  6. Two Different Communication Genres and Implications for Vocabulary Development and Learning to Read

    ERIC Educational Resources Information Center

    Massaro, Dominic W.

    2015-01-01

    This study examined potential differences in vocabulary found in picture books and adults' speech to children and to other adults. Using a small sample of various sources of speech and print, Hayes observed that print had a more extensive vocabulary than speech. The current analyses of two different spoken language databases and an assembled…

  7. The AACRAO 2003 Academic Record and Transcript Guide. AACRAO Professional Development & Education Series.

    ERIC Educational Resources Information Center

    American Association of Collegiate Registrars and Admissions Officers, Washington, DC.

    This guide is a source of information on a wide range of issues involving student records and transcripts. It focuses on the necessity of reconciling the need to provide accurate information promptly to various constituencies and the need to safeguard privacy. Recommendations are provided for database and transcript elements, and current issues…

  8. Using Patent Classification to Discover Chemical Information in a Free Patent Database: Challenges and Opportunities

    ERIC Educational Resources Information Center

    Härtinger, Stefan; Clarke, Nigel

    2016-01-01

    Developing skills for searching the patent literature is an essential element of chemical information literacy programs at the university level. The present article creates awareness of patents as a rich source of chemical information. Patent classification is introduced as a key component in comprehensive search strategies. The free Espacenet…

  9. The Protein Information Resource: an integrated public resource of functional annotation of proteins

    PubMed Central

    Wu, Cathy H.; Huang, Hongzhan; Arminski, Leslie; Castro-Alvear, Jorge; Chen, Yongxing; Hu, Zhang-Zhi; Ledley, Robert S.; Lewis, Kali C.; Mewes, Hans-Werner; Orcutt, Bruce C.; Suzek, Baris E.; Tsugita, Akira; Vinayaka, C. R.; Yeh, Lai-Su L.; Zhang, Jian; Barker, Winona C.

    2002-01-01

    The Protein Information Resource (PIR) serves as an integrated public resource of functional annotation of protein data to support genomic/proteomic research and scientific discovery. The PIR, in collaboration with the Munich Information Center for Protein Sequences (MIPS) and the Japan International Protein Information Database (JIPID), produces the PIR-International Protein Sequence Database (PSD), the major annotated protein sequence database in the public domain, containing about 250 000 proteins. To improve protein annotation and the coverage of experimentally validated data, a bibliography submission system is developed for scientists to submit, categorize and retrieve literature information. Comprehensive protein information is available from iProClass, which includes family classification at the superfamily, domain and motif levels, structural and functional features of proteins, as well as cross-references to over 40 biological databases. To provide timely and comprehensive protein data with source attribution, we have introduced a non-redundant reference protein database, PIR-NREF. The database consists of about 800 000 proteins collected from PIR-PSD, SWISS-PROT, TrEMBL, GenPept, RefSeq and PDB, with composite protein names and literature data. To promote database interoperability, we provide XML data distribution and open database schema, and adopt common ontologies. The PIR web site (http://pir.georgetown.edu/) features data mining and sequence analysis tools for information retrieval and functional identification of proteins based on both sequence and annotation information. The PIR databases and other files are also available by FTP (ftp://nbrfa.georgetown.edu/pir_databases). PMID:11752247

  10. AQUIS: A PC-based source information manager

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Smith, A.E.; Huber, C.C.; Tschanz, J.

    1993-05-01

    The Air Quality Utility Information System (AQUIS) was developed to calculate emissions and track them along with related information about sources, stacks, controls, and permits. The system runs on IBM-compatible personal computers with dBASE IV and tracks more than 1,200 data items distributed among various source categories. AQUIS is currently operating at 11 US Air Force facilities, which have up to 1,000 sources, and two headquarters. The system provides a flexible reporting capability that permits users who are unfamiliar with database structure to design and prepare reports containing user-specified information. In addition to the criteria pollutants, AQUIS calculates compound-specific emissions and allows users to enter their own emission estimates.

  11. AQUIS: A PC-based source information manager

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Smith, A.E.; Huber, C.C.; Tschanz, J.

    1993-01-01

    The Air Quality Utility Information System (AQUIS) was developed to calculate emissions and track them along with related information about sources, stacks, controls, and permits. The system runs on IBM-compatible personal computers with dBASE IV and tracks more than 1,200 data items distributed among various source categories. AQUIS is currently operating at 11 US Air Force facilities, which have up to 1,000 sources, and two headquarters. The system provides a flexible reporting capability that permits users who are unfamiliar with database structure to design and prepare reports containing user-specified information. In addition to the criteria pollutants, AQUIS calculates compound-specific emissions and allows users to enter their own emission estimates.

  12. The eNanoMapper database for nanomaterial safety information

    PubMed Central

    Chomenidis, Charalampos; Doganis, Philip; Fadeel, Bengt; Grafström, Roland; Hardy, Barry; Hastings, Janna; Hegi, Markus; Jeliazkov, Vedrin; Kochev, Nikolay; Kohonen, Pekka; Munteanu, Cristian R; Sarimveis, Haralambos; Smeets, Bart; Sopasakis, Pantelis; Tsiliki, Georgia; Vorgrimmler, David; Willighagen, Egon

    2015-01-01

    Summary Background: The NanoSafety Cluster, a cluster of projects funded by the European Commission, identified the need for a computational infrastructure for toxicological data management of engineered nanomaterials (ENMs). Ontologies, open standards, and interoperable designs were envisioned to empower a harmonized approach to European research in nanotechnology. This setting provides a number of opportunities and challenges in the representation of nanomaterials data and the integration of ENM information originating from diverse systems. Within this cluster, eNanoMapper works towards supporting the collaborative safety assessment for ENMs by creating a modular and extensible infrastructure for data sharing, data analysis, and building computational toxicology models for ENMs. Results: The eNanoMapper database solution builds on the previous experience of the consortium partners in supporting diverse data through flexible data storage, open source components and web services. We have recently described the design of the eNanoMapper prototype database along with a summary of challenges in the representation of ENM data and an extensive review of existing nano-related data models, databases, and nanomaterials-related entries in chemical and toxicogenomic databases. This paper continues with a focus on the database functionality exposed through its application programming interface (API), and its use in visualisation and modelling. Considering the preferred community practice of using spreadsheet templates, we developed a configurable spreadsheet parser facilitating user-friendly data preparation and data upload. We further present a web application able to retrieve the experimental data via the API and analyze it with multiple data preprocessing and machine learning algorithms. Conclusion: We demonstrate how the eNanoMapper database is used to import and publish online ENM and assay data from several data sources, how the “representational state transfer” (REST) API enables building user-friendly interfaces and graphical summaries of the data, and how these resources facilitate the modelling of reproducible quantitative structure–activity relationships for nanomaterials (NanoQSAR). PMID:26425413

  13. The eNanoMapper database for nanomaterial safety information.

    PubMed

    Jeliazkova, Nina; Chomenidis, Charalampos; Doganis, Philip; Fadeel, Bengt; Grafström, Roland; Hardy, Barry; Hastings, Janna; Hegi, Markus; Jeliazkov, Vedrin; Kochev, Nikolay; Kohonen, Pekka; Munteanu, Cristian R; Sarimveis, Haralambos; Smeets, Bart; Sopasakis, Pantelis; Tsiliki, Georgia; Vorgrimmler, David; Willighagen, Egon

    2015-01-01

    The NanoSafety Cluster, a cluster of projects funded by the European Commission, identified the need for a computational infrastructure for toxicological data management of engineered nanomaterials (ENMs). Ontologies, open standards, and interoperable designs were envisioned to empower a harmonized approach to European research in nanotechnology. This setting provides a number of opportunities and challenges in the representation of nanomaterials data and the integration of ENM information originating from diverse systems. Within this cluster, eNanoMapper works towards supporting the collaborative safety assessment for ENMs by creating a modular and extensible infrastructure for data sharing, data analysis, and building computational toxicology models for ENMs. The eNanoMapper database solution builds on the previous experience of the consortium partners in supporting diverse data through flexible data storage, open source components and web services. We have recently described the design of the eNanoMapper prototype database along with a summary of challenges in the representation of ENM data and an extensive review of existing nano-related data models, databases, and nanomaterials-related entries in chemical and toxicogenomic databases. This paper continues with a focus on the database functionality exposed through its application programming interface (API), and its use in visualisation and modelling. Considering the preferred community practice of using spreadsheet templates, we developed a configurable spreadsheet parser facilitating user-friendly data preparation and data upload. We further present a web application able to retrieve the experimental data via the API and analyze it with multiple data preprocessing and machine learning algorithms. We demonstrate how the eNanoMapper database is used to import and publish online ENM and assay data from several data sources, how the "representational state transfer" (REST) API enables building user-friendly interfaces and graphical summaries of the data, and how these resources facilitate the modelling of reproducible quantitative structure-activity relationships for nanomaterials (NanoQSAR).
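
    A sketch of retrieving data through a REST API of the kind described above; the base URL, route and query parameters below are assumptions for illustration, not documented eNanoMapper endpoints.

```python
# Sketch of pulling data from a REST API and printing a simple summary.
# The endpoint URL, route and parameters are assumptions for illustration.
import requests

BASE = "https://enanomapper.example.org"   # hypothetical endpoint

resp = requests.get(f"{BASE}/substance", params={"search": "TiO2"},
                    headers={"Accept": "application/json"}, timeout=30)
resp.raise_for_status()
for substance in resp.json().get("substance", []):
    print(substance.get("name"))
```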

  14. Biopython: freely available Python tools for computational molecular biology and bioinformatics.

    PubMed

    Cock, Peter J A; Antao, Tiago; Chang, Jeffrey T; Chapman, Brad A; Cox, Cymon J; Dalke, Andrew; Friedberg, Iddo; Hamelryck, Thomas; Kauff, Frank; Wilczynski, Bartek; de Hoon, Michiel J L

    2009-06-01

    The Biopython project is a mature open source international collaboration of volunteer developers, providing Python libraries for a wide range of bioinformatics problems. Biopython includes modules for reading and writing different sequence file formats and multiple sequence alignments, dealing with 3D macromolecular structures, interacting with common tools such as BLAST, ClustalW and EMBOSS, accessing key online databases, as well as providing numerical methods for statistical learning. Biopython is freely available, with documentation and source code, at www.biopython.org under the Biopython license.
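
    As a small illustration of the sequence file handling mentioned above, the following uses Bio.SeqIO to read a FASTA file; the file name is hypothetical.

```python
# Read a FASTA file with Biopython's SeqIO module and print each record's
# identifier and sequence length (the file name is hypothetical).
from Bio import SeqIO

for record in SeqIO.parse("example.fasta", "fasta"):
    print(record.id, len(record.seq))
```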

  15. Alaska IPASS database preparation manual.

    Treesearch

    P. McHugh; D. Olson; C. Schallau

    1989-01-01

    Describes the data, their sources, and the calibration procedures used in compiling a database for the Alaska IPASS (interactive policy analysis simulation system) model. Although this manual is for Alaska, it provides generic instructions for analysts preparing databases for other geographical areas.

  16. Saudi anti-human cancer plants database (SACPD): A collection of plants with anti-human cancer activities

    PubMed Central

    Al-Zahrani, Ateeq Ahmed

    2018-01-01

    Several anticancer drugs have been developed from natural products such as plants. Successful experiments in inhibiting the growth of human cancer cell lines using Saudi plants were published over the last three decades. To date, there is no Saudi anticancer plants database serving as a comprehensive source for the data generated from these experiments. Therefore, there was a need to create a database to collect, organize, search and retrieve such data. As a result, the current paper describes the generation of the Saudi anti-human cancer plants database (SACPD). The database contains most of the reported information about the naturally growing Saudi anticancer plants. SACPD comprises the scientific and local names of 91 plant species that grow naturally in Saudi Arabia. These species belong to 38 different taxonomic families. In addition, 18 species representing 16 families of medicinal plants that are intensively sold in the local markets in Saudi Arabia were added to the database. The website provides details including the plant part containing the anticancer bioactive compounds, plant locations and the cancer/cell types against which they exhibit their anticancer activity. Our survey revealed that breast, liver and leukemia were the most studied cancer cell lines in Saudi Arabia, with percentages of 27%, 19% and 15%, respectively. The current SACPD represents a nucleus around which more development efforts can expand to accommodate all future submissions about new Saudi plant species with anticancer activities. SACPD will provide an excellent starting point for researchers and pharmaceutical companies who are interested in developing new anticancer drugs. SACPD is available online at https://teeqrani1.wixsite.com/sapd PMID:29774137

  17. dbWFA: a web-based database for functional annotation of Triticum aestivum transcripts

    PubMed Central

    Vincent, Jonathan; Dai, Zhanwu; Ravel, Catherine; Choulet, Frédéric; Mouzeyar, Said; Bouzidi, M. Fouad; Agier, Marie; Martre, Pierre

    2013-01-01

    The functional annotation of genes based on sequence homology with genes from model species genomes is time-consuming because it is necessary to mine several unrelated databases. The aim of the present work was to develop a functional annotation database for common wheat Triticum aestivum (L.). The database, named dbWFA, is based on the reference NCBI UniGene set, an expressed gene catalogue built by expressed sequence tag clustering, and on full-length coding sequences retrieved from the TriFLDB database. Information from good-quality heterogeneous sources, including annotations for model plant species Arabidopsis thaliana (L.) Heynh. and Oryza sativa L., was gathered and linked to T. aestivum sequences through BLAST-based homology searches. Even though the complexity of the transcriptome cannot yet be fully appreciated, we developed a tool to easily and promptly obtain information from multiple functional annotation systems (Gene Ontology, MapMan bin codes, MIPS Functional Categories, PlantCyc pathway reactions and TAIR gene families). The use of dbWFA is illustrated here with several query examples. We were able to assign a putative function to 45% of the UniGenes and 81% of the full-length coding sequences from TriFLDB. Moreover, comparison of the annotation of the whole T. aestivum UniGene set along with curated annotations of the two model species assessed the accuracy of the annotation provided by dbWFA. To further illustrate the use of dbWFA, genes specifically expressed during the early cell division or late storage polymer accumulation phases of T. aestivum grain development were identified using a clustering analysis and then annotated using dbWFA. The annotation of these two sets of genes was consistent with previous analyses of T. aestivum grain transcriptomes and proteomes. Database URL: urgi.versailles.inra.fr/dbWFA/ PMID:23660284

  18. Saudi anti-human cancer plants database (SACPD): A collection of plants with anti-human cancer activities.

    PubMed

    Al-Zahrani, Ateeq Ahmed

    2018-01-30

    Several anticancer drugs have been developed from natural products such as plants. Successful experiments in inhibiting the growth of human cancer cell lines using Saudi plants were published over the last three decades. To date, there is no Saudi anticancer plants database serving as a comprehensive source for the data generated from these experiments. Therefore, there was a need to create a database to collect, organize, search and retrieve such data. As a result, the current paper describes the generation of the Saudi anti-human cancer plants database (SACPD). The database contains most of the reported information about the naturally growing Saudi anticancer plants. SACPD comprises the scientific and local names of 91 plant species that grow naturally in Saudi Arabia. These species belong to 38 different taxonomic families. In addition, 18 species representing 16 families of medicinal plants that are intensively sold in the local markets in Saudi Arabia were added to the database. The website provides details including the plant part containing the anticancer bioactive compounds, plant locations and the cancer/cell types against which they exhibit their anticancer activity. Our survey revealed that breast, liver and leukemia were the most studied cancer cell lines in Saudi Arabia, with percentages of 27%, 19% and 15%, respectively. The current SACPD represents a nucleus around which more development efforts can expand to accommodate all future submissions about new Saudi plant species with anticancer activities. SACPD will provide an excellent starting point for researchers and pharmaceutical companies who are interested in developing new anticancer drugs. SACPD is available online at https://teeqrani1.wixsite.com/sapd.

  19. A national look at carbon capture and storage-National carbon sequestration database and geographical information system (NatCarb)

    USGS Publications Warehouse

    Carr, T.R.; Iqbal, A.; Callaghan, N.; ,; Look, K.; Saving, S.; Nelson, K.

    2009-01-01

    The US Department of Energy's Regional Carbon Sequestration Partnerships (RCSPs) are responsible for generating geospatial data for the maps displayed in the Carbon Sequestration Atlas of the United States and Canada. Key geospatial data (carbon sources, potential storage sites, transportation, land use, etc.) are required for the Atlas, and for efficient implementation of carbon sequestration on a national and regional scale. The National Carbon Sequestration Database and Geographical Information System (NatCarb) is a relational database and geographic information system (GIS) that integrates carbon storage data generated and maintained by the RCSPs and various other sources. The purpose of NatCarb is to provide a national view of the carbon capture and storage potential in the U.S. and Canada. The digital spatial database allows users to estimate the amount of CO2 emitted by sources (such as power plants, refineries and other fossil-fuel-consuming industries) in relation to geologic formations that can provide safe, secure storage sites over long periods of time. The NatCarb project is working to provide all stakeholders with improved online tools for the display and analysis of CO2 carbon capture and storage data. NatCarb is organizing and enhancing the critical information about CO2 sources and developing the technology needed to access, query, model, analyze, display, and distribute natural resource data related to carbon management. Data are generated, maintained and enhanced locally at the RCSP level, or at specialized data warehouses, and assembled, accessed, and analyzed in real-time through a single geoportal. NatCarb is a functional demonstration of distributed data-management systems that cross the boundaries between institutions and geographic areas. It forms the first step toward a functioning National Carbon Cyberinfrastructure (NCCI). NatCarb provides access to first-order information to evaluate the costs, economic potential and societal issues of CO2 capture and storage, including public perception and regulatory aspects. NatCarb online access has been modified to address the broad needs of a spectrum of users. NatCarb includes not only GIS and database query tools for high-end users, but simplified display for the general public using readily available web tools such as Google Earth and Google Maps. Not only is NatCarb connected to all the RCSPs, but data are also pulled from public servers including the U.S. Geological Survey-EROS Data Center and from the Geography Network. Data for major CO2 sources have been obtained from U.S. Environmental Protection Agency (EPA) databases, and data on major coal basins and coalbed methane wells were obtained from the Energy Information Administration (EIA). © 2009 Elsevier Ltd. All rights reserved.

  20. MyMolDB: a micromolecular database solution with open source and free components.

    PubMed

    Xia, Bing; Tai, Zheng-Fu; Gu, Yu-Cheng; Li, Bang-Jing; Ding, Li-Sheng; Zhou, Yan

    2011-10-01

    Managing chemical structures is one of the important daily tasks in small laboratories. Few solutions are available on the internet, and most of them are closed-source applications. The open-source applications typically have limited capability and basic cheminformatics functionalities. In this article, we describe an open-source solution to manage chemicals in research groups based on open source and free components. It has a user-friendly interface with the functions of chemical handling and intensive searching. MyMolDB is a micromolecular database solution that supports exact, substructure, similarity, and combined searching. This solution is mainly implemented using the scripting language Python with a web-based interface for compound management and searching. Almost all the searches are in essence done with pure SQL on the database, exploiting the high performance of the database engine. Thus, impressive searching speed has been achieved on large data sets because no external CPU-intensive languages are involved in the key search procedure. MyMolDB is open-source software and can be modified and/or redistributed under GNU General Public License version 3 published by the Free Software Foundation (Free Software Foundation Inc. The GNU General Public License, Version 3, 2007. Available at: http://www.gnu.org/licenses/gpl.html). The software itself can be found at http://code.google.com/p/mymoldb/. Copyright © 2011 Wiley Periodicals, Inc.
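
    To illustrate the idea of doing exact searches in pure SQL (this is an editor's sketch, not MyMolDB's actual schema), the snippet below stores a canonical SMILES as an indexed key so that an exact-match query becomes a plain SELECT; RDKit is used only to canonicalize the input.

```python
# Illustration of "exact search in pure SQL" (not MyMolDB's actual schema):
# a canonical SMILES is the lookup key, so exact matching is an indexed
# SELECT handled entirely by the database engine.
import sqlite3
from rdkit import Chem

def canonical(smiles):
    """Return the canonical SMILES for an input SMILES string."""
    return Chem.MolToSmiles(Chem.MolFromSmiles(smiles))

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE compounds (id INTEGER PRIMARY KEY, name TEXT, can_smiles TEXT)")
conn.execute("CREATE INDEX idx_can ON compounds(can_smiles)")

for name, smi in [("benzene", "c1ccccc1"), ("toluene", "Cc1ccccc1")]:
    conn.execute("INSERT INTO compounds (name, can_smiles) VALUES (?, ?)",
                 (name, canonical(smi)))

# Exact search: canonicalize the query, then let the database do the work.
query = canonical("C1=CC=CC=C1")   # benzene drawn differently
print(conn.execute("SELECT name FROM compounds WHERE can_smiles = ?", (query,)).fetchall())
```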

  1. Assessing the number of fire fatalities in a defined population.

    PubMed

    Jonsson, Anders; Bergqvist, Anders; Andersson, Ragnar

    2015-12-01

    Fire-related fatalities and injuries have become a growing governmental concern in Sweden, and a national vision zero strategy has been adopted stating that nobody should get killed or seriously injured from fires. There is considerable uncertainty, however, regarding the numbers of both deaths and injuries due to fires. Different national sources present different numbers, even on deaths, which obstructs reliable surveillance of the problem over time. We assume the situation is similar in other countries. This study seeks to assess the true number of fire-related deaths in Sweden by combining sources, and to verify the coverage of each individual source. By doing so, we also wish to demonstrate the possibilities of improved surveillance practices. Data from three national sources were collected and matched: a special database on fatal fires held by the Swedish Civil Contingencies Agency (nationally responsible for fire prevention), a database on forensic medical examinations held by the National Board of Forensic Medicine, and the cause of death register held by the Swedish National Board of Health and Welfare. The results disclose considerable underreporting in the single sources. The national database on fatal fires, serving as the principal source for policy making on fire prevention matters, underestimates the true situation by 20%. Its coverage of residential fires appears to be better than that of other fires. Systematic safety work and informed policy-making presuppose access to correct and reliable numbers. By combining several different sources, as suggested in this study, the national database on fatal fires is now considerably improved and includes regular matching with complementary sources.
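
    A minimal sketch of the source-matching idea described above, with hypothetical file and column names: the three registers are combined on a personal identifier and each source's coverage of the combined total is reported.

```python
# Sketch of matching records across registers (hypothetical file and
# column names): combine three sources on a personal identifier, then
# measure how much of the combined total each individual source covers.
import pandas as pd

fire_db  = pd.read_csv("fatal_fires.csv")[["person_id"]].assign(in_fire=True)
forensic = pd.read_csv("forensic_exams.csv")[["person_id"]].assign(in_forensic=True)
codr     = pd.read_csv("cause_of_death.csv")[["person_id"]].assign(in_codr=True)

# Outer merges keep every person seen in any source.
merged = (fire_db.merge(forensic, on="person_id", how="outer")
                 .merge(codr, on="person_id", how="outer")
                 .fillna(False))

total = len(merged)
for col in ("in_fire", "in_forensic", "in_codr"):
    n = int(merged[col].sum())
    print(f"{col}: {n} of {total} combined cases ({n / total:.0%})")
```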

  2. The antiSMASH database, a comprehensive database of microbial secondary metabolite biosynthetic gene clusters.

    PubMed

    Blin, Kai; Medema, Marnix H; Kottmann, Renzo; Lee, Sang Yup; Weber, Tilmann

    2017-01-04

    Secondary metabolites produced by microorganisms are the main source of bioactive compounds that are in use as antimicrobial and anticancer drugs, fungicides, herbicides and pesticides. In the last decade, the increasing availability of microbial genomes has established genome mining as a very important method for the identification of their biosynthetic gene clusters (BGCs). One of the most popular tools for this task is antiSMASH. However, so far, antiSMASH is limited to de novo computing results for user-submitted genomes and only partially connects these with BGCs from other organisms. Therefore, we developed the antiSMASH database, a simple but highly useful new resource to browse antiSMASH-annotated BGCs in the currently 3907 bacterial genomes in the database and perform advanced search queries combining multiple search criteria. antiSMASH-DB is available at http://antismash-db.secondarymetabolites.org/. © The Author(s) 2016. Published by Oxford University Press on behalf of Nucleic Acids Research.

  3. A methodology to compile food metrics related to diet sustainability into a single food database: Application to the French case.

    PubMed

    Gazan, Rozenn; Barré, Tangui; Perignon, Marlène; Maillot, Matthieu; Darmon, Nicole; Vieux, Florent

    2018-01-01

    The holistic approach required to assess diet sustainability is hindered by the lack of comprehensive databases compiling relevant food metrics. Those metrics are generally scattered in different data sources with various levels of aggregation, hampering their matching. The objective was to develop a general methodology to compile food metrics describing diet sustainability dimensions into a single database and to apply it to the French context. Each step of the methodology is detailed: indicator and food metric identification and selection, food list definition, food matching and value assignment. For the French case, nutrient and contaminant content, bioavailability factors, distribution of dietary intakes, portion sizes, food prices, greenhouse gas emission, acidification and marine eutrophication estimates were allocated to 212 commonly consumed generic foods. This generic database compiling 279 metrics will allow the simultaneous evaluation of the four dimensions of diet sustainability, namely the health, economic, social and environmental dimensions. Copyright © 2016 Elsevier Ltd. All rights reserved.

  4. Biological knowledge bases using Wikis: combining the flexibility of Wikis with the structure of databases.

    PubMed

    Brohée, Sylvain; Barriot, Roland; Moreau, Yves

    2010-09-01

    In recent years, the number of knowledge bases developed using Wiki technology has exploded. Unfortunately, next to their numerous advantages, classical Wikis present a critical limitation: the invaluable knowledge they gather is represented as free text, which hinders their computational exploitation. This is in sharp contrast with the current practice for biological databases where the data is made available in a structured way. Here, we present WikiOpener, an extension for the classical MediaWiki engine that augments Wiki pages by allowing on-the-fly querying and formatting of resources external to the Wiki. Those resources may provide data extracted from databases or DAS tracks, or even results returned by local or remote bioinformatics analysis tools. This also implies that structured data can be edited via dedicated forms. Hence, this generic resource combines the structure of biological databases with the flexibility of collaborative Wikis. The source code and its documentation are freely available on the MediaWiki website: http://www.mediawiki.org/wiki/Extension:WikiOpener.

  5. RiceAtlas, a spatial database of global rice calendars and production.

    PubMed

    Laborte, Alice G; Gutierrez, Mary Anne; Balanza, Jane Girly; Saito, Kazuki; Zwart, Sander J; Boschetti, Mirco; Murty, M V R; Villano, Lorena; Aunario, Jorrel Khalil; Reinke, Russell; Koo, Jawoo; Hijmans, Robert J; Nelson, Andrew

    2017-05-30

    Knowing where, when, and how much rice is planted and harvested is crucial information for understanding the effects of policy, trade, and global and technological change on food security. We developed RiceAtlas, a spatial database on the seasonal distribution of the world's rice production. It consists of data on rice planting and harvesting dates by growing season and estimates of monthly production for all rice-producing countries. Sources used for planting and harvesting dates include global and regional databases, national publications, online reports, and expert knowledge. Monthly production data were estimated based on annual or seasonal production statistics, and planting and harvesting dates. RiceAtlas has 2,725 spatial units. Compared with available global crop calendars, RiceAtlas is nearly ten times more spatially detailed and has nearly seven times more spatial units, with at least two seasons of calendar data, making RiceAtlas the most comprehensive and detailed spatial database on rice calendar and production.

  6. Hydroacoustic propagation grids for the CTBT knowledge database: BBN technical memorandum W1303

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    J. Angell

    1998-05-01

    The Hydroacoustic Coverage Assessment Model (HydroCAM) has been used to develop components of the hydroacoustic knowledge database required by operational monitoring systems, particularly the US National Data Center (NDC). The database, which consists of travel time, amplitude correction and travel time standard deviation grids, is planned to support source location, discrimination and estimation functions of the monitoring network. The grids will also be used under the current BBN subcontract to support an analysis of the performance of the International Monitoring System (IMS) and national sensor systems. This report describes the format and contents of the hydroacoustic knowledgebase grids, and the procedures and model parameters used to generate these grids. Comparisons between the knowledge grids, measured data and other modeled results are presented to illustrate the strengths and weaknesses of the current approach. A recommended approach for augmenting the knowledge database with a database of expected spectral/waveform characteristics is provided in the final section of the report.

  7. Faster sequence homology searches by clustering subsequences.

    PubMed

    Suzuki, Shuji; Kakuta, Masanori; Ishida, Takashi; Akiyama, Yutaka

    2015-04-15

    Sequence homology searches are used in various fields. New sequencing technologies produce huge amounts of sequence data, which continuously increase the size of sequence databases. As a result, homology searches require large amounts of computational time, especially for metagenomic analysis. We developed a fast homology search method based on database subsequence clustering, and implemented it as GHOSTZ. This method clusters similar subsequences from a database to perform an efficient seed search and ungapped extension by reducing alignment candidates based on triangle inequality. The database subsequence clustering technique achieved an ∼2-fold increase in speed without a large decrease in search sensitivity. When we measured with metagenomic data, GHOSTZ is ∼2.2-2.8 times faster than RAPSearch and is ∼185-261 times faster than BLASTX. The source code is freely available for download at http://www.bi.cs.titech.ac.jp/ghostz/. Contact: akiyama@cs.titech.ac.jp. Supplementary data are available at Bioinformatics online. © The Author 2014. Published by Oxford University Press.
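
    A conceptual sketch of pruning candidates with clustering and the triangle inequality (an editor's illustration, not the GHOSTZ implementation): database k-mers are grouped around representatives with precomputed distances, and members whose distance lower bound already exceeds the threshold are skipped without an explicit comparison.

```python
# Conceptual sketch of clustering + triangle-inequality pruning (not the
# actual GHOSTZ implementation). For a member m of a cluster with
# representative r, d(q, m) >= |d(q, r) - d(r, m)|, so members whose lower
# bound exceeds the threshold can be skipped without computing d(q, m).
def hamming(a, b):
    return sum(x != y for x, y in zip(a, b))

# Hypothetical clustered database: representative -> [(member, d(r, member)), ...]
clusters = {
    "ACGTACGT": [("ACGTACGA", 1), ("ACGAACGT", 1)],
    "TTTTGGGG": [("TTTTGGGC", 1), ("TTTAGGGG", 1)],
}

def search(query, threshold):
    hits = []
    for rep, members in clusters.items():
        d_qr = hamming(query, rep)
        if d_qr <= threshold:
            hits.append((rep, d_qr))
        for member, d_rm in members:
            if abs(d_qr - d_rm) > threshold:   # lower bound already too large
                continue
            d_qm = hamming(query, member)
            if d_qm <= threshold:
                hits.append((member, d_qm))
    return hits

print(search("ACGTACGT", threshold=1))
```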

  8. Database Design to Ensure Anonymous Study of Medical Errors: A Report from the ASIPS collaborative

    PubMed Central

    Pace, Wilson D.; Staton, Elizabeth W.; Higgins, Gregory S.; Main, Deborah S.; West, David R.; Harris, Daniel M.

    2003-01-01

    Medical error reporting systems are important information sources for designing strategies to improve the safety of health care. Applied Strategies for Improving Patient Safety (ASIPS) is a multi-institutional, practice-based research project that collects and analyzes data on primary care medical errors and develops interventions to reduce error. The voluntary ASIPS Patient Safety Reporting System captures anonymous and confidential reports of medical errors. Confidential reports, which are quickly de-identified, provide better detail than do anonymous reports; however, concerns exist about the confidentiality of those reports should the database be subject to legal discovery or other security breaches. Standard database elements, for example, serial ID numbers, date/time stamps, and backups, could enable an outsider to link an ASIPS report to a specific medical error. The authors present the design and implementation of a database and administrative system that reduce this risk, facilitate research, and maintain near anonymity of the events, practices, and clinicians. PMID:12925548
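
    The following sketch illustrates two of the design ideas described above, using hypothetical table and column names rather than the ASIPS schema: random identifiers in place of serial IDs, and event timestamps coarsened to the month.

```python
# Sketch of anonymity-friendly record design (hypothetical schema, not
# ASIPS): random identifiers instead of sequential IDs, and the event
# timestamp coarsened so a report cannot be linked to a specific moment.
import sqlite3
import uuid
from datetime import date

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE error_report (
    report_id TEXT PRIMARY KEY,     -- random, not sequential
    event_month TEXT,               -- 'YYYY-MM', not a full date/time stamp
    narrative TEXT)""")

def store_report(event_date: date, narrative: str):
    conn.execute("INSERT INTO error_report VALUES (?, ?, ?)",
                 (uuid.uuid4().hex, event_date.strftime("%Y-%m"), narrative))

store_report(date(2003, 4, 17), "wrong dosage recorded on chart")
print(conn.execute("SELECT report_id, event_month FROM error_report").fetchall())
```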

  9. Measuring health system resource use for economic evaluation: a comparison of data sources.

    PubMed

    Pollicino, Christine; Viney, Rosalie; Haas, Marion

    2002-01-01

    A key challenge for evaluators and health system planners is the identification, measurement and valuation of resource use for economic evaluation. Accurately capturing all significant resource use is particularly difficult in the Australian context where there is no comprehensive database from which researchers can draw. Evaluators and health system planners need to consider different approaches to data collection for estimating resource use for economic evaluation, and the relative merits of the different data sources available. This paper illustrates the issues that arise when using different data sources, drawing on a sub-sample of the data being collected for an economic evaluation. Specifically, it compares the use of Australia's largest administrative database on resource use, the Health Insurance Commission database, with the use of patient-supplied data. The extent of agreement and discrepancies between the two data sources is investigated. Findings from this study and recommendations as to how to deal with different data sources are presented.

  10. RDFBuilder: a tool to automatically build RDF-based interfaces for MAGE-OM microarray data sources.

    PubMed

    Anguita, Alberto; Martin, Luis; Garcia-Remesal, Miguel; Maojo, Victor

    2013-07-01

    This paper presents RDFBuilder, a tool that enables RDF-based access to MAGE-ML-compliant microarray databases. We have developed a system that automatically transforms the MAGE-OM model and microarray data stored in the ArrayExpress database into RDF format. Additionally, the system automatically enables a SPARQL endpoint. This allows users to execute SPARQL queries for retrieving microarray data, either from specific experiments or from more than one experiment at a time. Our system optimizes response times by caching and reusing information from previous queries. In this paper, we describe our methods for achieving this transformation. We show that our approach is complementary to other existing initiatives, such as Bio2RDF, for accessing and retrieving data from the ArrayExpress database. Copyright © 2013 Elsevier Ireland Ltd. All rights reserved.
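
    A sketch of querying a SPARQL endpoint of the sort RDFBuilder exposes, using the SPARQLWrapper library; the endpoint URL and the property IRI in the query are assumptions for illustration, not the tool's documented vocabulary.

```python
# Query a SPARQL endpoint and print the bindings. The endpoint URL and
# the property IRI are hypothetical, for illustration only.
from SPARQLWrapper import SPARQLWrapper, JSON

sparql = SPARQLWrapper("http://example.org/rdfbuilder/sparql")  # hypothetical endpoint
sparql.setQuery("""
    SELECT ?experiment ?name WHERE {
        ?experiment <http://example.org/mage#name> ?name .
    } LIMIT 10
""")
sparql.setReturnFormat(JSON)
results = sparql.query().convert()
for row in results["results"]["bindings"]:
    print(row["experiment"]["value"], row["name"]["value"])
```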

  11. The research infrastructure of Chinese foundations, a database for Chinese civil society studies

    PubMed Central

    Ma, Ji; Wang, Qun; Dong, Chao; Li, Huafang

    2017-01-01

    This paper provides technical details and user guidance on the Research Infrastructure of Chinese Foundations (RICF), a database of Chinese foundations, civil society, and social development in general. The structure of the RICF is deliberately designed and normalized according to the Three Normal Forms. The database schema consists of three major themes: foundations’ basic organizational profile (i.e., basic profile, board member, supervisor, staff, and related party tables), program information (i.e., program information, major program, program relationship, and major recipient tables), and financial information (i.e., financial position, financial activities, cash flow, activity overview, and large donation tables). The RICF’s data quality can be measured by four criteria: data source reputation and credibility, completeness, accuracy, and timeliness. Data records are properly versioned, allowing verification and replication for research purposes. PMID:28742065
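
    A minimal sketch of the kind of normalization described above, with hypothetical column names rather than the actual RICF schema: organizational facts are stored once in a foundation table, and repeating groups such as board members move to a child table keyed by foundation_id.

```python
# Minimal normalization sketch (hypothetical columns, not the RICF schema):
# one row per foundation, with board members in a child table that refers
# back to the parent by foreign key.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE foundation (
    foundation_id INTEGER PRIMARY KEY,
    name TEXT NOT NULL,
    province TEXT,
    year_registered INTEGER
);
CREATE TABLE board_member (
    member_id INTEGER PRIMARY KEY,
    foundation_id INTEGER NOT NULL REFERENCES foundation(foundation_id),
    member_name TEXT NOT NULL,
    role TEXT
);
""")
conn.execute("INSERT INTO foundation VALUES (1, 'Example Foundation', 'Beijing', 2009)")
conn.execute("INSERT INTO board_member VALUES (1, 1, 'Li Wei', 'chair')")
print(conn.execute("""SELECT f.name, b.member_name
                      FROM foundation f JOIN board_member b USING (foundation_id)""").fetchall())
```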

  12. The visible human project®: From body to bits.

    PubMed

    Ackerman, Michael J

    2016-08-01

    In the mid-1990s the U.S. National Library of Medicine sponsored the acquisition and development of the Visible Human Project® database. This image database contains anatomical cross-sectional images which allow the reconstruction of three-dimensional male and female anatomy to an accuracy of less than 1.0 mm. The male anatomy is contained in a 15 gigabyte database, the female in a 39 gigabyte database. This talk will describe why and how this project was accomplished and demonstrate some of the products which the Visible Human dataset has made possible. I will conclude by describing how the Visible Human Project, completed over 20 years ago, has led the National Library of Medicine to a series of image research projects including an open source image processing toolkit which is included in several commercial products.

  13. Data Applicability of Heritage and New Hardware For Launch Vehicle Reliability Models

    NASA Technical Reports Server (NTRS)

    Al Hassan, Mohammad; Novack, Steven

    2015-01-01

    Bayesian reliability requires the development of a prior distribution to represent degree of belief about the value of a parameter (such as a component's failure rate) before system specific data become available from testing or operations. Generic failure data are often provided in reliability databases as point estimates (mean or median). A component's failure rate is considered a random variable where all possible values are represented by a probability distribution. The applicability of the generic data source is a significant source of uncertainty that affects the spread of the distribution. This presentation discusses heuristic guidelines for quantifying uncertainty due to generic data applicability when developing prior distributions mainly from reliability predictions.
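
    One common convention for turning a generic point estimate into a prior whose spread reflects applicability uncertainty is a lognormal distribution with an inflated error factor. The sketch below follows that convention with made-up numbers; the failure rate and error factor are not values from the presentation.

      import numpy as np
      from scipy import stats

      generic_median = 1.0e-5   # generic failure rate (per hour), hypothetical
      error_factor = 10.0       # widened to reflect weak applicability of the source

      # For a lognormal prior, EF = exp(1.645 * sigma) relates the 95th and 50th percentiles
      sigma = np.log(error_factor) / 1.645
      prior = stats.lognorm(s=sigma, scale=generic_median)

      print("5th percentile :", prior.ppf(0.05))
      print("median         :", prior.ppf(0.50))
      print("95th percentile:", prior.ppf(0.95))
      print("mean           :", prior.mean())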

  14. MIPS: a database for protein sequences, homology data and yeast genome information.

    PubMed Central

    Mewes, H W; Albermann, K; Heumann, K; Liebl, S; Pfeiffer, F

    1997-01-01

    The MIPS group (Martinsried Institute for Protein Sequences) at the Max-Planck-Institute for Biochemistry, Martinsried near Munich, Germany, collects, processes and distributes protein sequence data within the framework of the tripartite association of the PIR-International Protein Sequence Database. MIPS contributes nearly 50% of the data input to the PIR-International Protein Sequence Database. The database is distributed on CD-ROM together with PATCHX, an exhaustive supplement of unique, unverified protein sequences from external sources compiled by MIPS. Through its WWW server (http://www.mips.biochem.mpg.de/) MIPS permits internet access to sequence databases, homology data and to yeast genome information. (i) Sequence similarity results from the FASTA program are stored in the FASTA database for all proteins from PIR-International and PATCHX. The database is dynamically maintained and permits instant access to FASTA results. (ii) Starting with FASTA database queries, proteins have been classified into families and superfamilies (PROT-FAM). (iii) The HPT (hashed position tree) data structure developed at MIPS is a new approach for rapid sequence and pattern searching. (iv) MIPS provides access to the sequence and annotation of the complete yeast genome, the functional classification of yeast genes (FunCat) and its graphical display, the 'Genome Browser'. A CD-ROM based on the JAVA programming language providing dynamic interactive access to the yeast genome and the related protein sequences has been compiled and is available on request. PMID:9016498

  15. cPath: open source software for collecting, storing, and querying biological pathways.

    PubMed

    Cerami, Ethan G; Bader, Gary D; Gross, Benjamin E; Sander, Chris

    2006-11-13

    Biological pathways, including metabolic pathways, protein interaction networks, signal transduction pathways, and gene regulatory networks, are currently represented in over 220 diverse databases. These data are crucial for the study of specific biological processes, including human diseases. Standard exchange formats for pathway information, such as BioPAX, CellML, SBML and PSI-MI, enable convenient collection of this data for biological research, but mechanisms for common storage and communication are required. We have developed cPath, an open source database and web application for collecting, storing, and querying biological pathway data. cPath makes it easy to aggregate custom pathway data sets available in standard exchange formats from multiple databases, present pathway data to biologists via a customizable web interface, and export pathway data via a web service to third-party software, such as Cytoscape, for visualization and analysis. cPath is software only, and does not include new pathway information. Key features include: a built-in identifier mapping service for linking identical interactors and linking to external resources; built-in support for PSI-MI and BioPAX standard pathway exchange formats; a web service interface for searching and retrieving pathway data sets; and thorough documentation. The cPath software is freely available under the LGPL open source license for academic and commercial use. cPath is a robust, scalable, modular, professional-grade software platform for collecting, storing, and querying biological pathways. It can serve as the core data handling component in information systems for pathway visualization, analysis and modeling.

  16. Design considerations, architecture, and use of the Mini-Sentinel distributed data system.

    PubMed

    Curtis, Lesley H; Weiner, Mark G; Boudreau, Denise M; Cooper, William O; Daniel, Gregory W; Nair, Vinit P; Raebel, Marsha A; Beaulieu, Nicolas U; Rosofsky, Robert; Woodworth, Tiffany S; Brown, Jeffrey S

    2012-01-01

    We describe the design, implementation, and use of a large, multiorganizational distributed database developed to support the Mini-Sentinel Pilot Program of the US Food and Drug Administration (FDA). As envisioned by the US FDA, this implementation will inform and facilitate the development of an active surveillance system for monitoring the safety of medical products (drugs, biologics, and devices) in the USA. A common data model was designed to address the priorities of the Mini-Sentinel Pilot and to leverage the experience and data of participating organizations and data partners. A review of existing common data models informed the process. Each participating organization designed a process to extract, transform, and load its source data, applying the common data model to create the Mini-Sentinel Distributed Database. Transformed data were characterized and evaluated using a series of programs developed centrally and executed locally by participating organizations. A secure communications portal was designed to facilitate queries of the Mini-Sentinel Distributed Database and transfer of confidential data, analytic tools were developed to facilitate rapid response to common questions, and distributed querying software was implemented to facilitate rapid querying of summary data. As of July 2011, information on 99,260,976 health plan members was included in the Mini-Sentinel Distributed Database. The database includes 316,009,067 person-years of observation time, with members contributing, on average, 27.0 months of observation time. All data partners have successfully executed distributed code and returned findings to the Mini-Sentinel Operations Center. This work demonstrates the feasibility of building a large, multiorganizational distributed data system in which organizations retain possession of their data that are used in an active surveillance system. Copyright © 2012 John Wiley & Sons, Ltd.
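
    A toy sketch of the extract-transform-load step each data partner performs: local source columns are renamed and recoded into a shared common-data-model layout so that centrally distributed programs run unchanged. The column names and codings below are invented for illustration and are not the Mini-Sentinel specification.

      import pandas as pd

      # Hypothetical local extract from one data partner
      local = pd.DataFrame({
          "member_no":   ["A1", "A2"],
          "sex_code":    ["1", "2"],          # local coding: 1 = male, 2 = female
          "enroll_from": ["2009-01-01", "2010-06-15"],
      })

      # Transform into a shared common-data-model layout
      cdm_enrollment = pd.DataFrame({
          "patient_id": local["member_no"],
          "sex":        local["sex_code"].map({"1": "M", "2": "F"}),
          "enr_start":  pd.to_datetime(local["enroll_from"]),
      })

      # Every partner loads the same table structure before central code is executed locally
      print(cdm_enrollment.dtypes)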

  17. Social media based NLP system to find and retrieve ARM data: Concept paper

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Devarakonda, Ranjeet; Giansiracusa, Michael T.; Kumar, Jitendra

    Information connectivity and retrieval play a role in our daily lives. The most pervasive source of online information is databases. The amount of data is growing at a rapid rate, and database technology is improving and having a profound effect. Almost all online applications store and retrieve information from databases. One challenge in supplying the public with wider access to informational databases is the need for knowledge of database languages like Structured Query Language (SQL). Although the SQL language has been published in many forms, not everybody is able to write SQL queries. Another challenge is that it may not be practical to make the public aware of the structure of the database. There is a need for novice users to query relational databases using their natural language. To solve this problem, many natural language interfaces to structured databases have been developed. The goal is to provide a more intuitive method for generating database queries and delivering responses. Social media makes it possible to interact with a wide section of the population. Through this medium, and with the help of Natural Language Processing (NLP), we can make the data of the Atmospheric Radiation Measurement Data Center (ADC) more accessible to the public. We propose an architecture for using Apache Lucene/Solr [1], OpenML [2,3], and Kafka [4] to generate an automated query/response system with inputs from Twitter, our Cassandra DB, and our log database. Using the Twitter API and NLP we can give the public the ability to ask questions of our database and get automated responses.
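
    A deliberately simplified sketch of the core idea, turning a natural-language question into a parameterized SQL query via keyword rules; the table, columns, and rules are hypothetical and far cruder than the Solr-, NLP-, and Kafka-based architecture proposed above.

      import re
      import sqlite3

      def question_to_sql(question: str):
          """Very rough keyword mapping from a question to a SQL query (illustrative only)."""
          q = question.lower()
          instrument = None
          match = re.search(r"from the (\w+)", q)
          if match:
              instrument = match.group(1)
          if "how many" in q:
              sql = "SELECT COUNT(*) FROM measurements WHERE instrument = ?"
          else:
              sql = "SELECT * FROM measurements WHERE instrument = ? LIMIT 10"
          return sql, (instrument,)

      conn = sqlite3.connect(":memory:")
      conn.execute("CREATE TABLE measurements (instrument TEXT, value REAL)")
      conn.execute("INSERT INTO measurements VALUES ('radiometer', 1.2)")

      sql, params = question_to_sql("How many records do you have from the radiometer site?")
      print(conn.execute(sql, params).fetchone())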

  19. Source profiles of particulate matter emissions from a pilot-scale boiler burning North American coal blends.

    PubMed

    Lee, S W

    2001-11-01

    Recent awareness of suspected adverse health effects from ambient particulate matter (PM) emission has prompted publication of new standards for fine PM with aerodynamic diameter less than 2.5 microm (PM2.5). However, scientific data on fine PM emissions from various point sources and their characteristics are very limited. Source apportionment methods are applied to identify contributions of individual regional sources to tropospheric particulate concentrations. The existing industrial database developed using traditional source measurement techniques provides total emission rates only, with no details on chemical nature or size characteristics of particulates. This database is inadequate, in its current form, to address source-receptor relationships. A source dilution system was developed for sampling and characterization of total PM, PM2.5, and PM10 (i.e., PM with aerodynamic diameter less than 10 microm) from residual oil and coal combustion. This new system has automatic control capabilities for key parameters, such as relative humidity (RH), temperature, and sample dilution. During optimization of the prototype equipment, three North American coal blends were burned using a 0.7-megawatt thermal (MWt) pulverized coal-fired, pilot-scale boiler. Characteristic emission profiles, including PM2.5 and total PM soluble acids, and elemental and carbon concentrations for three coal blends are presented. Preliminary results indicate that volatile trace elements such as Pb, Zn, Ti, and Se are preferentially enriched in PM2.5. PM2.5 is also more concentrated in soluble sulfates relative to total PM. Coal fly ash collected at the outlet of the electrostatic precipitator (ESP) contains about 85-90% PM10 and 30-50% PM2.5. Particles contain the highest elemental concentrations of Si and Al, while Ca, Fe, Na, Ba, and K also exist as major elements. Approximately 4-12% of the material exists as soluble sulfates in fly ash generated by coal blends containing 0.2-0.8% sulfur by mass. Source profile data for an eastern U.S. coal show good agreement with those reported from a similar study done in the United States. Based on the inadequacies identified in the initial sampling equipment, a new, plume-simulating fine PM measurement system with modular components for field use is being developed for determining coal combustion PM source profiles from utility boiler stacks.

  20. Online drug databases: a new method to assess and compare inclusion of clinically relevant information.

    PubMed

    Silva, Cristina; Fresco, Paula; Monteiro, Joaquim; Rama, Ana Cristina Ribeiro

    2013-08-01

    Evidence-Based Practice requires health care decisions to be based on the best available evidence. The model "Information Mastery" proposes that clinicians should use sources of information that have previously evaluated relevance and validity, provided at the point of care. Drug databases (DB) allow easy and fast access to information and have the benefit of more frequent content updates. Relevant information, in the context of drug therapy, is that which supports safe and effective use of medicines. Accordingly, the European Guideline on the Summary of Product Characteristics (EG-SmPC) was used as a standard to evaluate the inclusion of relevant information contents in DB. The objective was to develop and test a method to evaluate the relevancy of DB contents by assessing the inclusion of information items deemed relevant for effective and safe drug use. The method comprised: hierarchical organisation and selection of the principles defined in the EG-SmPC; definition of criteria to assess inclusion of selected information items; creation of a categorisation and quantification system that allows score calculation; calculation of relative differences (RD) of scores for comparison with an "ideal" database, defined as the one that achieves the best quantification possible for each of the information items; and a pilot test on a sample of 9 drug databases, using 10 drugs frequently associated in the literature with morbidity-mortality and also widely consumed in Portugal. The main outcome measure was the calculation of individual and global scores for clinically relevant information items of drug monographs in databases, using the categorisation and quantification system created. A--Method development: selection of sections, subsections, relevant information items and corresponding requisites; system to categorise and quantify their inclusion; score and RD calculation procedure. B--Pilot test: calculated scores for the 9 databases; globally, all databases evaluated differed significantly from the "ideal" database; some DB performed better but performance was inconsistent at the subsection level, within the same DB. The method developed allows quantification of the inclusion of relevant information items in DB and comparison with an "ideal" database. It is necessary to consult diverse DB in order to find all the relevant information needed to support clinical drug use.
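
    A small sketch of the scoring idea described above: each information item receives a quantified inclusion score, database scores are summed, and the relative difference from an "ideal" database is computed. The item names and scores below are invented.

      # Hypothetical per-item inclusion scores (0 = absent, 1 = partial, 2 = complete)
      ideal = {"contraindications": 2, "interactions": 2, "dose_adjustment": 2}
      database_a = {"contraindications": 2, "interactions": 1, "dose_adjustment": 0}

      def total_score(scores):
          return sum(scores.values())

      def relative_difference(db_scores, ideal_scores):
          # RD expresses how far a database falls short of the ideal, as a fraction
          return (total_score(ideal_scores) - total_score(db_scores)) / total_score(ideal_scores)

      print("Database A score:", total_score(database_a))
      print("Relative difference vs ideal:", relative_difference(database_a, ideal))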

  1. The National Landslide Database of Great Britain: Acquisition, communication and the role of social media

    NASA Astrophysics Data System (ADS)

    Pennington, Catherine; Freeborough, Katy; Dashwood, Claire; Dijkstra, Tom; Lawrie, Kenneth

    2015-11-01

    The British Geological Survey (BGS) is the national geological agency for Great Britain that provides geoscientific information to government, other institutions and the public. The National Landslide Database has been developed by the BGS and is the focus for national geohazard research for landslides in Great Britain. The history and structure of the geospatial database and associated Geographical Information System (GIS) are explained, along with the future developments of the database and its applications. The database is the most extensive source of information on landslides in Great Britain with over 17,000 records of landslide events to date, each documented as fully as possible for inland, coastal and artificial slopes. Data are gathered through a range of procedures, including: incorporation of other databases; automated trawling of current and historical scientific literature and media reports; new field- and desk-based mapping technologies with digital data capture, and using citizen science through social media and other online resources. This information is invaluable for directing the investigation, prevention and mitigation of areas of unstable ground in accordance with Government planning policy guidelines. The national landslide susceptibility map (GeoSure) and a national landslide domains map currently under development, as well as regional mapping campaigns, rely heavily on the information contained within the landslide database. Assessing susceptibility to landsliding requires knowledge of the distribution of failures, an understanding of causative factors, their spatial distribution and likely impacts, whilst understanding the frequency and types of landsliding present is integral to modelling how rainfall will influence the stability of a region. Communication of landslide data through the Natural Hazard Partnership (NHP) and Hazard Impact Model contributes to national hazard mitigation and disaster risk reduction with respect to weather and climate. Daily reports of landslide potential are published by BGS through the NHP partnership and data collected for the National Landslide Database are used widely for the creation of these assessments. The National Landslide Database is freely available via an online GIS and is used by a variety of stakeholders for research purposes.

  2. Using GIS databases for simulated nightlight imagery

    NASA Astrophysics Data System (ADS)

    Zollweg, Joshua D.; Gartley, Michael; Roskovensky, John; Mercier, Jeffery

    2012-06-01

    Proposed is a new technique for simulating nighttime scenes with realistically-modelled urban radiance. While nightlight imagery is commonly used to measure urban sprawl, it is uncommon to use urbanization as a metric to develop synthetic nighttime scenes. In the developed methodology, the open-source Open Street Map (OSM) Geographic Information System (GIS) database is used. The database is comprised of many nodes, which are used to define the position of different types of streets, buildings, and other features. These nodes are the driver used to model urban nightlights, given several assumptions. The first assumption is that the spatial distribution of nodes is closely related to the spatial distribution of nightlights. Work by Roychowdhury et al. has demonstrated the relationship between urban lights and development. So, the real assumption being made is that the density of nodes corresponds to development, which is reasonable. Secondly, the local density of nodes must relate directly to the upwelled radiance within the given locality. Testing these assumptions using Albuquerque and Indianapolis as example cities revealed that different types of nodes produce more realistic results than others. Residential street nodes offered the best performance for any single node type, among the types tested in this investigation. Other node types, however, still provide useful supplementary data. Using streets and buildings defined in the OSM database allowed automated generation of simulated nighttime scenes of Albuquerque and Indianapolis in the Digital Imaging and Remote Sensing Image Generation (DIRSIG) model. The simulation was compared to real data from the recently deployed National Polar-orbiting Operational Environmental Satellite System (NPOESS) Visible Infrared Imager Radiometer Suite (VIIRS) platform. As a result of the comparison, correction functions were used to correct for discrepancies between simulated and observed radiance. Future work will include investigating more advanced approaches for mapping the spatial extent of nightlights, based on the distribution of different node types in local neighbourhoods. This will allow the spectral profile of each region to be dynamically adjusted, in addition to simply modifying the magnitude of a single source type.
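
    A minimal sketch of the core assumption (local node density maps to upwelled radiance): node coordinates are binned onto a grid and the counts are scaled to a radiance value. The coordinates and scaling factor below are arbitrary stand-ins, not values from the paper.

      import numpy as np

      # Hypothetical longitude/latitude of residential-street nodes from an OSM extract
      lon = np.random.uniform(-106.7, -106.5, size=5000)
      lat = np.random.uniform(35.0, 35.2, size=5000)

      # Bin nodes onto a regular grid; the count in each cell is the local node density
      density, _, _ = np.histogram2d(lon, lat, bins=(64, 64))

      # Assumed linear mapping from node density to simulated upwelled radiance
      radiance_per_node = 0.05   # arbitrary units, placeholder calibration factor
      simulated_radiance = density * radiance_per_node

      print(simulated_radiance.shape, simulated_radiance.max())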

  3. Identifying known unknowns using the US EPA's CompTox Chemistry Dashboard.

    PubMed

    McEachran, Andrew D; Sobus, Jon R; Williams, Antony J

    2017-03-01

    Chemical features observed using high-resolution mass spectrometry can be tentatively identified using online chemical reference databases by searching molecular formulae and monoisotopic masses and then rank-ordering of the hits using appropriate relevance criteria. The most likely candidate "known unknowns," which are those chemicals unknown to an investigator but contained within a reference database or literature source, rise to the top of a chemical list when rank-ordered by the number of associated data sources. The U.S. EPA's CompTox Chemistry Dashboard is a curated and freely available resource for chemistry and computational toxicology research, containing more than 720,000 chemicals of relevance to environmental health science. In this research, the performance of the Dashboard for identifying known unknowns was evaluated against that of the online ChemSpider database, one of the primary resources used by mass spectrometrists, using multiple previously studied datasets reported in the peer-reviewed literature totaling 162 chemicals. These chemicals were examined using both applications via molecular formula and monoisotopic mass searches followed by rank-ordering of candidate compounds by associated references or data sources. A greater percentage of chemicals ranked in the top position when using the Dashboard, indicating an advantage of this application over ChemSpider for identifying known unknowns using data source ranking. Additional approaches are being developed for inclusion into a non-targeted analysis workflow as part of the CompTox Chemistry Dashboard. This work shows the potential for use of the Dashboard in exposure assessment and risk decision-making through significant improvements in non-targeted chemical identification. Graphical abstract Identifying known unknowns in the US EPA's CompTox Chemistry Dashboard from molecular formula and monoisotopic mass inputs.
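
    A compact sketch of the rank-ordering step described above: candidate structures returned by a formula or monoisotopic-mass search are sorted by the number of associated data sources, so the most-referenced "known unknown" rises to the top. The candidate list is fabricated for illustration.

      # Hypothetical candidates returned for one molecular formula search
      candidates = [
          {"name": "compound A", "formula": "C9H8O4", "data_sources": 3},
          {"name": "aspirin",    "formula": "C9H8O4", "data_sources": 42},
          {"name": "compound B", "formula": "C9H8O4", "data_sources": 7},
      ]

      # Rank-order by the number of associated data sources (descending)
      ranked = sorted(candidates, key=lambda c: c["data_sources"], reverse=True)

      for position, cand in enumerate(ranked, start=1):
          print(position, cand["name"], cand["data_sources"])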

  4. Development of a web geoservices platform for School of Environmental Sciences, Mahatma Gandhi University, Kerala, India

    NASA Astrophysics Data System (ADS)

    Satheendran, S.; John, C. M.; Fasalul, F. K.; Aanisa, K. M.

    2014-11-01

    Web geoservices are the natural extension of Geographic Information Systems to a distributed environment accessed through a simple browser. They enable organizations to share domain-specific, rich and dynamic spatial information over the web. The present study attempted to design and develop a web-enabled GIS application for the School of Environmental Sciences, Mahatma Gandhi University, Kottayam, Kerala, India, to publish various geographical databases to the public through its website. The development of this project is based upon open-source tools and techniques, and the output portal site is platform independent. The premier WebGIS framework 'GeoMoose' is utilized. Apache is used as the web server and the UMN MapServer is used as the map server for this project. The portal provides various customised tools to query the geographical database in different ways and to search for various facilities in the geographical area, such as banks, attractive places, hospitals and hotels. The portal site was tested with the output geographical databases of two projects of the School: 1) the Tourism Information System for the Malabar region of Kerala State, consisting of the 5 northern districts, and 2) the geoenvironmental appraisal of the Athirappilly Hydroelectric Project, covering the entire Chalakkudy river basin.
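
    Because the portal publishes its layers through a map server, a client can retrieve them with a standard OGC WMS request. The sketch below uses the OWSLib library; the service URL, layer name, and bounding box are placeholders, not details from the study.

      from owslib.wms import WebMapService

      # Placeholder URL; a MapServer instance typically exposes WMS through a mapfile
      wms = WebMapService("http://example.org/cgi-bin/mapserv?map=tourism.map",
                          version="1.1.1")

      response = wms.getmap(
          layers=["tourist_attractions"],     # hypothetical layer name
          srs="EPSG:4326",
          bbox=(74.8, 11.0, 76.0, 12.5),      # rough Malabar-region extent, illustrative
          size=(800, 600),
          format="image/png",
      )

      with open("malabar_tourism.png", "wb") as out:
          out.write(response.read())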

  5. Development of a biomarkers database for the National Children's Study

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Lobdell, Danelle T.; Mendola, Pauline

    The National Children's Study (NCS) is a federally-sponsored, longitudinal study of environmental influences on the health and development of children across the United States (www.nationalchildrensstudy.gov). Current plans are to study approximately 100,000 children and their families beginning before birth up to age 21 years. To explore potential biomarkers that could be important measurements in the NCS, we compiled the relevant scientific literature to identify both routine or standardized biological markers as well as new and emerging biological markers. Although the search criteria encouraged examination of factors that influence the breadth of child health and development, attention was primarily focused on exposure, susceptibility, and outcome biomarkers associated with four important child health outcomes: autism and neurobehavioral disorders, injury, cancer, and asthma. The Biomarkers Database was designed to allow users to: (1) search the biomarker records compiled by type of marker (susceptibility, exposure or effect), sampling media (e.g., blood, urine, etc.), and specific marker name; (2) search the citations file; and (3) read the abstract evaluations relative to our search criteria. A searchable, user-friendly database of over 2000 articles was created and is publicly available at: http://cfpub.epa.gov/ncea/cfm/recordisplay.cfm?deid=85844. PubMed was the primary source of references with some additional searches of Toxline, NTIS, and other reference databases. Our initial focus was on review articles, beginning as early as 1996, supplemented with searches of the recent primary research literature from 2001 to 2003. We anticipate this database will have applicability for the NCS as well as other studies of children's environmental health.

  6. An integrated chronostratigraphic data system for the twenty-first century

    USGS Publications Warehouse

    Sikora, P.J.; Ogg, James G.; Gary, A.; Cervato, C.; Gradstein, Felix; Huber, B.T.; Marshall, C.; Stein, J.A.; Wardlaw, B.

    2006-01-01

    Research in stratigraphy is increasingly multidisciplinary and conducted by diverse research teams whose members can be widely separated. This developing distributed-research process, facilitated by the availability of the Internet, promises tremendous future benefits to researchers. However, its full potential is hindered by the absence of a development strategy for the necessary infrastructure. At a National Science Foundation workshop convened in November 2001, thirty quantitative stratigraphers and database specialists from both academia and industry met to discuss how best to integrate their respective chronostratigraphic databases. The main goal was to develop a strategy that would allow efficient distribution and integration of existing data relevant to the study of geologic time. Discussions concentrated on three major themes: database standards and compatibility, strategies and tools for information retrieval and analysis of all types of global and regional stratigraphic data, and future directions for database integration and centralization of currently distributed depositories. The result was a recommendation to establish an integrated chronostratigraphic database, to be called Chronos, which would facilitate greater efficiency in stratigraphic studies (http://www.chronos.org/). The Chronos system will both provide greater ease of data gathering and allow for multidisciplinary synergies, functions of fundamental importance in a variety of research, including time scale construction, paleoenvironmental analysis, paleoclimatology and paleoceanography. Beyond scientific research, Chronos will also provide educational and societal benefits by providing an accessible source of information of general interest (e.g., mass extinctions) and concern (e.g., climatic change). The National Science Foundation has currently funded a three-year program for implementing Chronos. © 2006 Geological Society of America. All rights reserved.

  7. Development of a Carbon Management Geographic Information System (GIS) for the United States

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Howard Herzog; Holly Javedan

    In this project a Carbon Management Geographical Information System (GIS) for the US was developed. The GIS stored, integrated, and manipulated information relating to the components of carbon management systems. Additionally, the GIS was used to interpret and analyze the effect of developing these systems. This report documents the key deliverables from the project: (1) Carbon Management Geographical Information System (GIS) Documentation; (2) Stationary CO{sub 2} Source Database; (3) Regulatory Data for CCS in United States; (4) CO{sub 2} Capture Cost Estimation; (5) CO{sub 2} Storage Capacity Tools; (6) CO{sub 2} Injection Cost Modeling; (7) CO{sub 2} Pipeline Transport Cost Estimation; (8) CO{sub 2} Source-Sink Matching Algorithm; and (9) CO{sub 2} Pipeline Transport and Cost Model.
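
    The source-sink matching deliverable pairs CO2 sources with storage sinks. A minimal greedy version of such an algorithm, matching each source to the nearest sink that still has capacity, is sketched below with invented coordinates and capacities; it is not the project's actual algorithm.

      import math

      # Hypothetical CO2 sources (annual emissions, Mt) and sinks (remaining capacity, Mt)
      sources = [{"name": "plant_1", "xy": (0, 0), "emissions": 5},
                 {"name": "plant_2", "xy": (9, 2), "emissions": 3}]
      sinks = [{"name": "saline_a", "xy": (1, 1), "capacity": 6},
               {"name": "field_b", "xy": (8, 3), "capacity": 10}]

      def distance(a, b):
          return math.dist(a, b)

      # Greedy matching: each source takes the nearest sink that can still absorb it
      matches = []
      for src in sources:
          feasible = [s for s in sinks if s["capacity"] >= src["emissions"]]
          best = min(feasible, key=lambda s: distance(src["xy"], s["xy"]))
          best["capacity"] -= src["emissions"]
          matches.append((src["name"], best["name"]))

      print(matches)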

  8. Comprehensive analysis of the N-glycan biosynthetic pathway using bioinformatics to generate UniCorn: A theoretical N-glycan structure database.

    PubMed

    Akune, Yukie; Lin, Chi-Hung; Abrahams, Jodie L; Zhang, Jingyu; Packer, Nicolle H; Aoki-Kinoshita, Kiyoko F; Campbell, Matthew P

    2016-08-05

    Glycan structures attached to proteins are comprised of diverse monosaccharide sequences and linkages that are produced from precursor nucleotide-sugars by a series of glycosyltransferases. Databases of these structures are an essential resource for the interpretation of analytical data and the development of bioinformatics tools. However, with no template to predict what structures are possible, the human glycan structure databases are incomplete and rely heavily on the curation of published, experimentally determined, glycan structure data. In this work, a library of 45 human glycosyltransferases was used to generate a theoretical database of N-glycan structures comprised of 15 or fewer monosaccharide residues. Enzyme specificities were sourced from major online databases including Kyoto Encyclopedia of Genes and Genomes (KEGG) Glycan, Consortium for Functional Glycomics (CFG), Carbohydrate-Active enZymes (CAZy), GlycoGene DataBase (GGDB) and BRENDA. Based on the known activities, more than 1.1 million theoretical structures and 4.7 million synthetic reactions were generated and stored in our database called UniCorn. Furthermore, we analyzed the differences between the predicted glycan structures in UniCorn and those contained in UniCarbKB (www.unicarbkb.org), a database which stores experimentally described glycan structures reported in the literature, and demonstrate that UniCorn can be used to aid in the assignment of ambiguous structures whilst also serving as a discovery database. Copyright © 2016 Elsevier Ltd. All rights reserved.

  9. Use of Graph Database for the Integration of Heterogeneous Biological Data.

    PubMed

    Yoon, Byoung-Ha; Kim, Seon-Kyu; Kim, Seon-Young

    2017-03-01

    Understanding complex relationships among heterogeneous biological data is one of the fundamental goals in biology. In most cases, diverse biological data are stored in relational databases, such as MySQL and Oracle, which store data in multiple tables and then infer relationships by multiple-join statements. Recently, a new type of database, called the graph-based database, was developed to natively represent various kinds of complex relationships, and it is widely used among computer science communities and IT industries. Here, we demonstrate the feasibility of using a graph-based database for complex biological relationships by comparing the performance between MySQL and Neo4j, one of the most widely used graph databases. We collected various biological data (protein-protein interaction, drug-target, gene-disease, etc.) from several existing sources, removed duplicate and redundant data, and finally constructed a graph database containing 114,550 nodes and 82,674,321 relationships. When we tested the query execution performance of MySQL versus Neo4j, we found that Neo4j outperformed MySQL in all cases. While Neo4j exhibited a very fast response for various queries, MySQL exhibited latent or unfinished responses for complex queries with multiple-join statements. These results show that using graph-based databases, such as Neo4j, is an efficient way to store complex biological relationships. Moreover, querying a graph database in diverse ways has the potential to reveal novel relationships among heterogeneous biological data.
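
    To make the contrast concrete, the sketch below expresses the same two-hop question (drugs targeting genes associated with a disease) as a multi-join SQL statement and as a single Cypher pattern sent through the official Neo4j Python driver. The node labels, relationship types, table names, and credentials are illustrative, not the schema used in the paper.

      from neo4j import GraphDatabase

      # Relational form: the relationship is reconstructed with explicit joins
      # (shown for contrast only; not executed here)
      sql = """
      SELECT d.name
      FROM drug d
      JOIN drug_target dt  ON dt.drug_id = d.id
      JOIN gene g          ON g.id = dt.gene_id
      JOIN gene_disease gd ON gd.gene_id = g.id
      JOIN disease dis     ON dis.id = gd.disease_id
      WHERE dis.name = 'asthma';
      """

      # Graph form: the same traversal expressed as a path pattern
      cypher = """
      MATCH (d:Drug)-[:TARGETS]->(:Gene)-[:ASSOCIATED_WITH]->(dis:Disease {name: $disease})
      RETURN d.name
      """

      driver = GraphDatabase.driver("bolt://localhost:7687", auth=("neo4j", "password"))
      with driver.session() as session:
          for record in session.run(cypher, disease="asthma"):
              print(record["d.name"])
      driver.close()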

  10. Use of Graph Database for the Integration of Heterogeneous Biological Data

    PubMed Central

    Yoon, Byoung-Ha; Kim, Seon-Kyu

    2017-01-01

    Understanding complex relationships among heterogeneous biological data is one of the fundamental goals in biology. In most cases, diverse biological data are stored in relational databases, such as MySQL and Oracle, which store data in multiple tables and then infer relationships by multiple-join statements. Recently, a new type of database, called the graph-based database, was developed to natively represent various kinds of complex relationships, and it is widely used among computer science communities and IT industries. Here, we demonstrate the feasibility of using a graph-based database for complex biological relationships by comparing the performance between MySQL and Neo4j, one of the most widely used graph databases. We collected various biological data (protein-protein interaction, drug-target, gene-disease, etc.) from several existing sources, removed duplicate and redundant data, and finally constructed a graph database containing 114,550 nodes and 82,674,321 relationships. When we tested the query execution performance of MySQL versus Neo4j, we found that Neo4j outperformed MySQL in all cases. While Neo4j exhibited a very fast response for various queries, MySQL exhibited latent or unfinished responses for complex queries with multiple-join statements. These results show that using graph-based databases, such as Neo4j, is an efficient way to store complex biological relationships. Moreover, querying a graph database in diverse ways has the potential to reveal novel relationships among heterogeneous biological data. PMID:28416946

  11. The VLITE Post-Processing Pipeline

    NASA Astrophysics Data System (ADS)

    Richards, Emily E.; Clarke, Tracy; Peters, Wendy; Polisensky, Emil; Kassim, Namir E.

    2018-01-01

    A post-processing pipeline to adaptively extract and catalog point sources is being developed to enhance the scientific value and accessibility of data products generated by the VLA Low-band Ionosphere and Transient Experiment (VLITE) on the Karl G. Jansky Very Large Array (VLA). In contrast to other radio sky surveys, the commensal observing mode of VLITE results in varying depths, sensitivities, and spatial resolutions across the sky based on the configuration of the VLA, location on the sky, and time on source specified by the primary observer for their independent science objectives. Therefore, previously developed tools and methods for generating source catalogs and survey statistics are not always appropriate for VLITE's diverse and growing set of data. A raw catalog of point sources extracted from every VLITE image will be created from source fit parameters stored in a queryable database. Point sources will be measured using the Python Blob Detector and Source Finder software (PyBDSF; Mohan & Rafferty 2015). Sources in the raw catalog will be associated with previous VLITE detections in a resolution- and sensitivity-dependent manner, and cross-matched to other radio sky surveys to aid in the detection of transient and variable sources. Final data products will include separate, tiered point source catalogs grouped by sensitivity limit and spatial resolution.
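
    A brief sketch of how a PyBDSF-based extraction step for a single image might look; the file names and threshold values are placeholders, and the abstract does not state the options actually used by the VLITE pipeline.

      import bdsf

      # Detect and fit point sources in one VLITE image (placeholder file name)
      img = bdsf.process_image("vlite_image.fits",
                               thresh_isl=3.0,   # island detection threshold (sigma)
                               thresh_pix=5.0)   # peak detection threshold (sigma)

      # Write the fitted source list so it can be loaded into the raw catalog database
      img.write_catalog(outfile="vlite_sources.csv",
                        catalog_type="srl",      # source list: one row per fitted source
                        format="csv",
                        clobber=True)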

  12. Advancements in web-database applications for rabies surveillance.

    PubMed

    Rees, Erin E; Gendron, Bruno; Lelièvre, Frédérick; Coté, Nathalie; Bélanger, Denise

    2011-08-02

    Protection of public health from rabies is informed by the analysis of surveillance data from human and animal populations. In Canada, public health, agricultural and wildlife agencies at the provincial and federal level are responsible for rabies disease control, and this has led to multiple agency-specific data repositories. Aggregation of agency-specific data into one database application would enable more comprehensive data analyses and effective communication among participating agencies. In Québec, RageDB was developed to house surveillance data for the raccoon rabies variant, representing the next generation in web-based database applications that provide a key resource for the protection of public health. RageDB incorporates data from, and grants access to, all agencies responsible for the surveillance of raccoon rabies in Québec. Technological advancements of RageDB to rabies surveillance databases include (1) automatic integration of multi-agency data and diagnostic results on a daily basis; (2) a web-based data editing interface that enables authorized users to add, edit and extract data; and (3) an interactive dashboard to help visualize data simply and efficiently, in table, chart, and cartographic formats. Furthermore, RageDB stores data from citizens who voluntarily report sightings of rabies suspect animals. We also discuss how sightings data can indicate public perception of the risk of raccoon rabies and thus aid in directing the allocation of disease control resources for protecting public health. RageDB provides an example in the evolution of spatio-temporal database applications for the storage, analysis and communication of disease surveillance data. The database was fast and inexpensive to develop by using open-source technologies, simple and efficient design strategies, and shared web hosting. The database increases communication among agencies collaborating to protect human health from raccoon rabies. Furthermore, health agencies have real-time access to a wide assortment of data documenting new developments in the raccoon rabies epidemic and this enables a more timely and appropriate response.

  13. Advancements in web-database applications for rabies surveillance

    PubMed Central

    2011-01-01

    Background Protection of public health from rabies is informed by the analysis of surveillance data from human and animal populations. In Canada, public health, agricultural and wildlife agencies at the provincial and federal level are responsible for rabies disease control, and this has led to multiple agency-specific data repositories. Aggregation of agency-specific data into one database application would enable more comprehensive data analyses and effective communication among participating agencies. In Québec, RageDB was developed to house surveillance data for the raccoon rabies variant, representing the next generation in web-based database applications that provide a key resource for the protection of public health. Results RageDB incorporates data from, and grants access to, all agencies responsible for the surveillance of raccoon rabies in Québec. Technological advancements of RageDB to rabies surveillance databases include 1) automatic integration of multi-agency data and diagnostic results on a daily basis; 2) a web-based data editing interface that enables authorized users to add, edit and extract data; and 3) an interactive dashboard to help visualize data simply and efficiently, in table, chart, and cartographic formats. Furthermore, RageDB stores data from citizens who voluntarily report sightings of rabies suspect animals. We also discuss how sightings data can indicate public perception of the risk of raccoon rabies and thus aid in directing the allocation of disease control resources for protecting public health. Conclusions RageDB provides an example in the evolution of spatio-temporal database applications for the storage, analysis and communication of disease surveillance data. The database was fast and inexpensive to develop by using open-source technologies, simple and efficient design strategies, and shared web hosting. The database increases communication among agencies collaborating to protect human health from raccoon rabies. Furthermore, health agencies have real-time access to a wide assortment of data documenting new developments in the raccoon rabies epidemic and this enables a more timely and appropriate response. PMID:21810215

  14. AQUIS: A PC-based air inventory and permit manager

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Smith, A.E.; Huber, C.C.; Tschanz, J.

    1992-01-01

    The Air Quality Utility Information System (AQUIS) was developed to calculate and track sources, emissions, stacks, permits, and related information. The system runs on IBM-compatible personal computers with dBASE IV and tracks more than 1,200 data items distributed among various source categories. AQUIS is currently operating at nine US Air Force facilities that have up to 1,000 sources. The system provides a flexible reporting capability that permits users who are unfamiliar with database structure to design and prepare reports containing user-specified information. In addition to six criteria pollutants, AQUIS calculates compound-specific emissions and allows users to enter their own emission estimates.

  15. A Comparative Analysis of Transitions from Education to Work in Europe (CATEWE). Final Report [and] Annex to the Final Report.

    ERIC Educational Resources Information Center

    Smyth, Emer; Gangl, Markus; Raffe, David; Hannan, Damian F.; McCoy, Selina

    This project aimed to develop a more comprehensive conceptual framework of school-to-work transitions in different national contexts and apply this framework to the empirical analysis of transition processes across European countries. It drew on these two data sources: European Community Labor Force Survey and integrated databases on national…

  16. Enabling a systems biology knowledgebase with gaggle and firegoose

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Baliga, Nitin S.

    The overall goal of this project was to extend the existing Gaggle and Firegoose systems to develop an open-source technology that runs over the web and links desktop applications with many databases and software applications. This technology would enable researchers to incorporate workflows for data analysis that can be executed from this interface to other online applications. The four specific aims were to (1) provide one-click mapping of genes, proteins, and complexes across databases and species; (2) enable multiple simultaneous workflows; (3) expand sophisticated data analysis for online resources; and (4) enhance open-source development of the Gaggle-Firegoose infrastructure. Gaggle is an open-source Java software system that integrates existing bioinformatics programs and data sources into a user-friendly, extensible environment to allow interactive exploration, visualization, and analysis of systems biology data. Firegoose is an extension to the Mozilla Firefox web browser that enables data transfer between websites and desktop tools including Gaggle. In the last phase of this funding period, we have made substantial progress on development and application of the Gaggle integration framework. We implemented the workspace to the Network Portal. Users can capture data from Firegoose and save them to the workspace. Users can create workflows to start multiple software components programmatically and pass data between them. Results of analysis can be saved to the cloud so that they can be easily restored on any machine. We also developed the Gaggle Chrome Goose, a plugin for the Google Chrome browser in tandem with an opencpu server in the Amazon EC2 cloud. This allows users to interactively perform data analysis on a single web page using the R packages deployed on the opencpu server. The cloud-based framework facilitates collaboration between researchers from multiple organizations. We have made a number of enhancements to the cmonkey2 application to enable and improve the integration within different environments, and we have created a new tools pipeline for generating EGRIN2 models in a largely automated way.

  17. QSAR Modeling Using Large-Scale Databases: Case Study for HIV-1 Reverse Transcriptase Inhibitors.

    PubMed

    Tarasova, Olga A; Urusova, Aleksandra F; Filimonov, Dmitry A; Nicklaus, Marc C; Zakharov, Alexey V; Poroikov, Vladimir V

    2015-07-27

    Large-scale databases are important sources of training sets for various QSAR modeling approaches. Generally, these databases contain information extracted from different sources. This variety of sources can produce inconsistency in the data, defined as sometimes widely diverging activity results for the same compound against the same target. Because such inconsistency can reduce the accuracy of predictive models built from these data, we are addressing the question of how best to use data from publicly and commercially accessible databases to create accurate and predictive QSAR models. We investigate the suitability of commercially and publicly available databases to QSAR modeling of antiviral activity (HIV-1 reverse transcriptase (RT) inhibition). We present several methods for the creation of modeling (i.e., training and test) sets from two, either commercially or freely available, databases: Thomson Reuters Integrity and ChEMBL. We found that the typical predictivities of QSAR models obtained using these different modeling set compilation methods differ significantly from each other. The best results were obtained using training sets compiled for compounds tested using only one method and material (i.e., a specific type of biological assay). Compound sets aggregated by target only typically yielded poorly predictive models. We discuss the possibility of "mix-and-matching" assay data across aggregating databases such as ChEMBL and Integrity and their current severe limitations for this purpose. One of them is the general lack of complete and semantic/computer-parsable descriptions of assay methodology carried by these databases that would allow one to determine mix-and-matchability of result sets at the assay level.
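
    A small pandas sketch of the training-set compilation rule that worked best in the study, keeping only records measured with a single assay method and resolving remaining duplicates per compound; the records and the median-based resolution shown here are invented for illustration.

      import pandas as pd

      # Hypothetical activity records aggregated from large bioactivity databases
      records = pd.DataFrame({
          "compound_id":  ["c1", "c1", "c2", "c3", "c3"],
          "assay_method": ["enzymatic", "cell-based", "enzymatic", "enzymatic", "enzymatic"],
          "pIC50":        [6.1, 7.4, 5.2, 8.0, 7.8],
      })

      # Keep a single, consistent assay method rather than mixing result types
      one_assay = records[records["assay_method"] == "enzymatic"]

      # Resolve duplicate measurements per compound (here: take the median)
      training_set = one_assay.groupby("compound_id", as_index=False)["pIC50"].median()

      print(training_set)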

  18. New UV-source catalogs, UV spectral database, UV variables and science tools from the GALEX surveys

    NASA Astrophysics Data System (ADS)

    Bianchi, Luciana; de la Vega, Alexander; Shiao, Bernard; Bohlin, Ralph

    2018-03-01

    We present a new, expanded and improved catalog of Ultraviolet (UV) sources from the GALEX All-Sky Imaging survey: GUVcat_AIS (Bianchi et al. in Astrophys. J. Suppl. Ser. 230:24, 2017). The catalog includes 83 million unique sources (duplicate measurements and rim artifacts are removed) measured in far-UV and near-UV. With respect to previous versions (Bianchi et al. in Mon. Not. R. Astron. Soc. 411:2770 2011a, Adv. Space Res. 53:900-991, 2014), GUVcat_AIS covers a slightly larger area, 24,790 square degrees, and includes critical corrections and improvements, as well as new tags, in particular to identify sources in the footprint of extended objects, where pipeline source detection may fail and custom-photometry may be necessary. The UV unique-source catalog facilitates studies of density of sources, and matching of the UV samples with databases at other wavelengths. We also present first results from two ongoing projects, addressing respectively UV variability searches on time scales from seconds to years by mining the GALEX photon archive, and the construction of a database of ˜120,000 GALEX UV spectra (range ˜1300-3000 Å), including quality and calibration assessment and classification of the grism, hence serendipitous, spectral sources.

  19. Modernization and new technologies: Coping with the information explosion

    NASA Technical Reports Server (NTRS)

    Blados, Walter R.; Cotter, Gladys A.

    1993-01-01

    Information has become a valuable and strategic resource in all societies and economies. Scientific and technical information is especially important in developing and maintaining a strong national science and technology base. The expanding use of information technology, the growth of interdisciplinary research, and an increase in international collaboration are changing characteristics of information. This modernization effort applies new technology to current processes to provide near-term benefits to the user. At the same time, we are developing a long-term modernization strategy designed to transition the program to a multimedia, global 'library without walls'. Notwithstanding this modernization program, it is recognized that no one information center can hope to collect all the relevant data. We see information and information systems changing and becoming more international in scope. We are finding that many nations are expending resources on national systems which duplicate each other. At the same time that this duplication exists, many useful sources of aerospace information are not being collected to cover expanded sources of information. This paper reviews the NASA modernization program and raises for consideration new possibilities for unification of the various aerospace database efforts toward a cooperative international aerospace database initiative, one that can optimize the cost/benefit equation for all participants.

  20. MINEs: open access databases of computationally predicted enzyme promiscuity products for untargeted metabolomics.

    PubMed

    Jeffryes, James G; Colastani, Ricardo L; Elbadawi-Sidhu, Mona; Kind, Tobias; Niehaus, Thomas D; Broadbelt, Linda J; Hanson, Andrew D; Fiehn, Oliver; Tyo, Keith E J; Henry, Christopher S

    2015-01-01

    In spite of its great promise, metabolomics has proven difficult to execute in an untargeted and generalizable manner. Liquid chromatography-mass spectrometry (LC-MS) has made it possible to gather data on thousands of cellular metabolites. However, matching metabolites to their spectral features continues to be a bottleneck, meaning that much of the collected information remains uninterpreted and that new metabolites are seldom discovered in untargeted studies. These challenges require new approaches that consider compounds beyond those available in curated biochemistry databases. Here we present Metabolic In silico Network Expansions (MINEs), an extension of known metabolite databases to include molecules that have not been observed, but are likely to occur based on known metabolites and common biochemical reactions. We utilize an algorithm called the Biochemical Network Integrated Computational Explorer (BNICE) and expert-curated reaction rules based on the Enzyme Commission classification system to propose the novel chemical structures and reactions that comprise MINE databases. Starting from the Kyoto Encyclopedia of Genes and Genomes (KEGG) COMPOUND database, the MINE contains over 571,000 compounds, of which 93% are not present in the PubChem database. However, these MINE compounds have on average higher structural similarity to natural products than compounds from KEGG or PubChem. MINE databases were able to propose annotations for 98.6% of a set of 667 MassBank spectra, 14% more than KEGG alone and equivalent to PubChem while returning far fewer candidates per spectra than PubChem (46 vs. 1715 median candidates). Application of MINEs to LC-MS accurate mass data enabled the identity of an unknown peak to be confidently predicted. MINE databases are freely accessible for non-commercial use via user-friendly web-tools at http://minedatabase.mcs.anl.gov and developer-friendly APIs. MINEs improve metabolomics peak identification as compared to general chemical databases whose results include irrelevant synthetic compounds. Furthermore, MINEs complement and expand on previous in silico generated compound databases that focus on human metabolism. We are actively developing the database; future versions of this resource will incorporate transformation rules for spontaneous chemical reactions and more advanced filtering and prioritization of candidate structures. Graphical abstract: MINE database construction and access methods. The process of constructing a MINE database from the curated source databases is depicted on the left. The methods for accessing the database are shown on the right.
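
    The expansion step rests on applying curated reaction rules to known metabolites to propose new structures. A toy version of that operation, using a single generic alcohol-oxidation rule written as RDKit reaction SMARTS, is shown below; the rule is illustrative only and is not one of the BNICE rules.

      from rdkit import Chem
      from rdkit.Chem import AllChem

      # Toy reaction rule: oxidize a primary alcohol to an aldehyde
      rule = AllChem.ReactionFromSmarts("[CH2:1][OH:2]>>[CH:1]=[O:2]")

      # Apply the rule to a known metabolite (ethanol) to propose a new structure
      substrate = Chem.MolFromSmiles("CCO")
      products = rule.RunReactants((substrate,))

      for product_set in products:
          for product in product_set:
              Chem.SanitizeMol(product)
              print(Chem.MolToSmiles(product))   # -> CC=O (acetaldehyde)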
