Sample records for multiple source databases

  1. Argonne Geothermal Geochemical Database v2.0

    DOE Data Explorer

    Harto, Christopher

    2013-05-22

    A database of geochemical data from potential geothermal sources aggregated from multiple sources as of March 2010. The database contains fields for the location, depth, temperature, pH, total dissolved solids concentration, chemical composition, and date of sampling. A separate tab contains data on non-condensible gas compositions. The database contains records for over 50,000 wells, although many entries are incomplete. Current versions of source documentation are listed in the dataset.

  2. Freshwater Biological Traits Database (Traits)

    EPA Pesticide Factsheets

    The traits database was compiled for a project on climate change effects on river and stream ecosystems. The traits data, gathered from multiple sources, focused on information published or otherwise well-documented by trustworthy sources.

  3. Feasibility and utility of applications of the common data model to multiple, disparate observational health databases

    PubMed Central

    Makadia, Rupa; Matcho, Amy; Ma, Qianli; Knoll, Chris; Schuemie, Martijn; DeFalco, Frank J; Londhe, Ajit; Zhu, Vivienne; Ryan, Patrick B

    2015-01-01

    Objectives To evaluate the utility of applying the Observational Medical Outcomes Partnership (OMOP) Common Data Model (CDM) across multiple observational databases within an organization and to apply standardized analytics tools for conducting observational research. Materials and methods Six deidentified patient-level datasets were transformed to the OMOP CDM. We evaluated the extent of information loss that occurred through the standardization process. We developed a standardized analytic tool to replicate the cohort construction process from a published epidemiology protocol and applied the analysis to all 6 databases to assess time-to-execution and comparability of results. Results Transformation to the CDM resulted in minimal information loss across all 6 databases. Patients and observations excluded were due to identified data quality issues in the source system; 96% to 99% of condition records and 90% to 99% of drug records were successfully mapped into the CDM using the standard vocabulary. The full cohort replication and descriptive baseline summary was executed for 2 cohorts in 6 databases in less than 1 hour. Discussion The standardization process improved data quality, increased efficiency, and facilitated cross-database comparisons to support a more systematic approach to observational research. Comparisons across data sources showed consistency in the impact of inclusion criteria using the protocol, and identified differences in patient characteristics and coding practices across databases. Conclusion Standardizing data structure (through a CDM), content (through a standard vocabulary with source code mappings), and analytics can enable an institution to apply a network-based approach to observational research across multiple, disparate observational health databases. PMID:25670757

  4. The Multiple-Institution Database for Investigating Engineering Longitudinal Development: An Experiential Case Study of Data Sharing and Reuse

    ERIC Educational Resources Information Center

    Ohland, Matthew W.; Long, Russell A.

    2016-01-01

    Sharing longitudinal student record data and merging data from different sources is critical to addressing important questions being asked of higher education. The Multiple-Institution Database for Investigating Engineering Longitudinal Development (MIDFIELD) is a multi-institution, longitudinal, student record level dataset that is used to answer…

  5. Feasibility and utility of applications of the common data model to multiple, disparate observational health databases.

    PubMed

    Voss, Erica A; Makadia, Rupa; Matcho, Amy; Ma, Qianli; Knoll, Chris; Schuemie, Martijn; DeFalco, Frank J; Londhe, Ajit; Zhu, Vivienne; Ryan, Patrick B

    2015-05-01

    To evaluate the utility of applying the Observational Medical Outcomes Partnership (OMOP) Common Data Model (CDM) across multiple observational databases within an organization and to apply standardized analytics tools for conducting observational research. Six deidentified patient-level datasets were transformed to the OMOP CDM. We evaluated the extent of information loss that occurred through the standardization process. We developed a standardized analytic tool to replicate the cohort construction process from a published epidemiology protocol and applied the analysis to all 6 databases to assess time-to-execution and comparability of results. Transformation to the CDM resulted in minimal information loss across all 6 databases. Patients and observations excluded were due to identified data quality issues in the source system; 96% to 99% of condition records and 90% to 99% of drug records were successfully mapped into the CDM using the standard vocabulary. The full cohort replication and descriptive baseline summary was executed for 2 cohorts in 6 databases in less than 1 hour. The standardization process improved data quality, increased efficiency, and facilitated cross-database comparisons to support a more systematic approach to observational research. Comparisons across data sources showed consistency in the impact of inclusion criteria using the protocol, and identified differences in patient characteristics and coding practices across databases. Standardizing data structure (through a CDM), content (through a standard vocabulary with source code mappings), and analytics can enable an institution to apply a network-based approach to observational research across multiple, disparate observational health databases. © The Author 2015. Published by Oxford University Press on behalf of the American Medical Informatics Association.

  6. The Protein Identifier Cross-Referencing (PICR) service: reconciling protein identifiers across multiple source databases.

    PubMed

    Côté, Richard G; Jones, Philip; Martens, Lennart; Kerrien, Samuel; Reisinger, Florian; Lin, Quan; Leinonen, Rasko; Apweiler, Rolf; Hermjakob, Henning

    2007-10-18

    Each major protein database uses its own conventions when assigning protein identifiers. Resolving the various, potentially unstable, identifiers that refer to identical proteins is a major challenge. This is a common problem when attempting to unify datasets that have been annotated with proteins from multiple data sources or querying data providers with one flavour of protein identifiers when the source database uses another. Partial solutions for protein identifier mapping exist but they are limited to specific species or techniques and to a very small number of databases. As a result, we have not found a solution that is generic enough and broad enough in mapping scope to suit our needs. We have created the Protein Identifier Cross-Reference (PICR) service, a web application that provides interactive and programmatic (SOAP and REST) access to a mapping algorithm that uses the UniProt Archive (UniParc) as a data warehouse to offer protein cross-references based on 100% sequence identity to proteins from over 70 distinct source databases loaded into UniParc. Mappings can be limited by source database, taxonomic ID and activity status in the source database. Users can copy/paste or upload files containing protein identifiers or sequences in FASTA format to obtain mappings using the interactive interface. Search results can be viewed in simple or detailed HTML tables or downloaded as comma-separated values (CSV) or Microsoft Excel (XLS) files suitable for use in a local database or a spreadsheet. Alternatively, a SOAP interface is available to integrate PICR functionality in other applications, as is a lightweight REST interface. We offer a publicly available service that can interactively map protein identifiers and protein sequences to the majority of commonly used protein databases. Programmatic access is available through a standards-compliant SOAP interface or a lightweight REST interface. The PICR interface, documentation and code examples are available at http://www.ebi.ac.uk/Tools/picr.
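
    As a rough illustration of how a lightweight REST mapping service like this is typically consumed, the sketch below queries a mapping endpoint for a single accession. The endpoint path and parameter names are assumptions made for illustration, not the documented PICR API; the PICR site above should be consulted for the actual interface.

      # Minimal sketch of calling a REST-style identifier-mapping service such as PICR.
      # The endpoint path and parameter names below are illustrative assumptions, not
      # the documented PICR API.
      import urllib.parse
      import urllib.request

      BASE_URL = "http://www.ebi.ac.uk/Tools/picr/rest/getUPIForAccession"  # assumed endpoint

      def map_accession(accession, target_databases):
          """Ask the service for cross-references of one protein accession."""
          params = urllib.parse.urlencode(
              [("accession", accession)] + [("database", db) for db in target_databases]
          )
          with urllib.request.urlopen(f"{BASE_URL}?{params}", timeout=30) as resp:
              return resp.read().decode("utf-8")  # XML payload listing cross-references

      if __name__ == "__main__":
          print(map_accession("P29375", ["SWISSPROT", "ENSEMBL"]))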

  7. The Protein Identifier Cross-Referencing (PICR) service: reconciling protein identifiers across multiple source databases

    PubMed Central

    Côté, Richard G; Jones, Philip; Martens, Lennart; Kerrien, Samuel; Reisinger, Florian; Lin, Quan; Leinonen, Rasko; Apweiler, Rolf; Hermjakob, Henning

    2007-01-01

    Background Each major protein database uses its own conventions when assigning protein identifiers. Resolving the various, potentially unstable, identifiers that refer to identical proteins is a major challenge. This is a common problem when attempting to unify datasets that have been annotated with proteins from multiple data sources or querying data providers with one flavour of protein identifiers when the source database uses another. Partial solutions for protein identifier mapping exist but they are limited to specific species or techniques and to a very small number of databases. As a result, we have not found a solution that is generic enough and broad enough in mapping scope to suit our needs. Results We have created the Protein Identifier Cross-Reference (PICR) service, a web application that provides interactive and programmatic (SOAP and REST) access to a mapping algorithm that uses the UniProt Archive (UniParc) as a data warehouse to offer protein cross-references based on 100% sequence identity to proteins from over 70 distinct source databases loaded into UniParc. Mappings can be limited by source database, taxonomic ID and activity status in the source database. Users can copy/paste or upload files containing protein identifiers or sequences in FASTA format to obtain mappings using the interactive interface. Search results can be viewed in simple or detailed HTML tables or downloaded as comma-separated values (CSV) or Microsoft Excel (XLS) files suitable for use in a local database or a spreadsheet. Alternatively, a SOAP interface is available to integrate PICR functionality in other applications, as is a lightweight REST interface. Conclusion We offer a publicly available service that can interactively map protein identifiers and protein sequences to the majority of commonly used protein databases. Programmatic access is available through a standards-compliant SOAP interface or a lightweight REST interface. The PICR interface, documentation and code examples are available at http://www.ebi.ac.uk/Tools/picr. PMID:17945017

  8. Development of a One-Stop Data Search and Discovery Engine using Ontologies for Semantic Mappings (HydroSeek)

    NASA Astrophysics Data System (ADS)

    Piasecki, M.; Beran, B.

    2007-12-01

    Search engines have changed the way we see the Internet. The ability to find information by just typing in keywords was a big contribution to the overall web experience. While conventional search engine methodology works well for textual documents, locating scientific data remains a problem because such data are stored in databases that are not readily accessible to search engine bots. Given the different temporal, spatial and thematic coverage of different databases, it is typically necessary, especially for interdisciplinary research, to work with multiple data sources. These sources can be federal agencies, which generally offer national coverage, or regional sources, which cover a smaller area in higher detail. However, for a given geographic area of interest there often exists more than one database with relevant data, so being able to query multiple databases simultaneously is a desirable feature that would be tremendously useful for scientists. Developing such a search engine requires dealing with various heterogeneity issues. Scientific database systems often impose controlled vocabularies, which ensure that they are generally homogeneous within themselves but semantically heterogeneous when moving between different databases. This bounds the possible semantics-related problems, making them easier to solve than in conventional search engines that deal with free text. We have developed a search engine that queries multiple data sources simultaneously and returns data in a standardized output despite the aforementioned heterogeneity issues between the underlying systems. The application relies mainly on metadata catalogs or indexing databases, ontologies and web services, with virtual globe and AJAX technologies for the graphical user interface. Users can trigger a search of dozens of different parameters over hundreds of thousands of stations from multiple agencies by providing a keyword, a spatial extent (i.e., a bounding box) and a temporal bracket. As part of this development we have also added an environment that allows users to perform some of the semantic tagging, i.e., the linkage of a variable name (which can be anything they desire) to defined concepts in the ontology structure, which in turn provides the backbone of the search engine.
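
    To make the semantic-mapping idea concrete, here is a small, hypothetical sketch of how a keyword can be resolved through a shared ontology concept to the source-specific variable codes used when fanning a query out to several agency services. The concept names, variable codes, and the stand-in fetch function are invented for illustration and are not HydroSeek's actual vocabulary or API.

      # Hypothetical sketch of ontology-backed keyword resolution for a federated search.
      # Concept names, variable codes, and fetch_observations() are invented for illustration.
      ONTOLOGY = {
          # user keyword -> shared concept
          "nitrate": "NitrogenCompound/Nitrate",
          "no3": "NitrogenCompound/Nitrate",
      }

      SOURCE_VOCABULARIES = {
          # shared concept -> per-source variable codes
          "NitrogenCompound/Nitrate": {"USGS_NWIS": "00618", "EPA_STORET": "Nitrate"},
      }

      def resolve(keyword):
          """Map a free-form keyword to each source's own variable code."""
          concept = ONTOLOGY[keyword.lower()]
          return SOURCE_VOCABULARIES[concept]

      def fetch_observations(source, code, bbox, start, end):
          # Placeholder for each agency's web-service client call.
          return f"{source}:{code} over {bbox} from {start} to {end}"

      def federated_query(keyword, bbox, start, end):
          """Query every source that carries the concept, using its own code."""
          return {source: fetch_observations(source, code, bbox, start, end)
                  for source, code in resolve(keyword).items()}

      if __name__ == "__main__":
          print(federated_query("nitrate", (-75.5, 39.5, -74.5, 40.5), "2006-01-01", "2006-12-31"))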

  9. Consumer Product Category Database

    EPA Pesticide Factsheets

    The Chemical and Product Categories database (CPCat) catalogs the use of over 40,000 chemicals and their presence in different consumer products. The chemical use information is compiled from multiple sources while product information is gathered from publicly available Material Safety Data Sheets (MSDS). EPA researchers are evaluating the possibility of expanding the database with additional product and use information.

  10. The Chandra Source Catalog: Storage and Interfaces

    NASA Astrophysics Data System (ADS)

    van Stone, David; Harbo, Peter N.; Tibbetts, Michael S.; Zografou, Panagoula; Evans, Ian N.; Primini, Francis A.; Glotfelty, Kenny J.; Anderson, Craig S.; Bonaventura, Nina R.; Chen, Judy C.; Davis, John E.; Doe, Stephen M.; Evans, Janet D.; Fabbiano, Giuseppina; Galle, Elizabeth C.; Gibbs, Danny G., II; Grier, John D.; Hain, Roger; Hall, Diane M.; He, Xiang Qun (Helen); Houck, John C.; Karovska, Margarita; Kashyap, Vinay L.; Lauer, Jennifer; McCollough, Michael L.; McDowell, Jonathan C.; Miller, Joseph B.; Mitschang, Arik W.; Morgan, Douglas L.; Mossman, Amy E.; Nichols, Joy S.; Nowak, Michael A.; Plummer, David A.; Refsdal, Brian L.; Rots, Arnold H.; Siemiginowska, Aneta L.; Sundheim, Beth A.; Winkelman, Sherry L.

    2009-09-01

    The Chandra Source Catalog (CSC) is part of the Chandra Data Archive (CDA) at the Chandra X-ray Center. The catalog contains source properties and associated data objects such as images, spectra, and lightcurves. The source properties are stored in relational databases and the data objects are stored in files with their metadata stored in databases. The CDA supports different versions of the catalog: multiple fixed release versions and a live database version. There are several interfaces to the catalog: CSCview, a graphical interface for building and submitting queries and for retrieving data objects; a command-line interface for property and source searches using ADQL; and VO-compliant services discoverable through the VO registry. This poster describes the structure of the catalog and provides an overview of the interfaces.
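
    For readers unfamiliar with ADQL, a property search against such a catalog typically looks like the cone-search query sketched below. The table and column names are assumptions for illustration, not the CSC's actual schema, and the TAP endpoint in the commented usage is a placeholder.

      # Illustrative ADQL cone search of a source-property table. Table and column
      # names are assumed for illustration; consult the CSC documentation for the
      # actual schema exposed by its interfaces.
      ADQL_QUERY = """
      SELECT name, ra, dec, flux_aper_b
      FROM master_source
      WHERE 1 = CONTAINS(POINT('ICRS', ra, dec),
                         CIRCLE('ICRS', 246.8, -24.5, 0.25))
      ORDER BY flux_aper_b DESC
      """

      # Such a query could be submitted through a generic VO TAP client, e.g. pyvo,
      # assuming the catalog is reachable at a TAP endpoint (URL is a placeholder):
      #   import pyvo
      #   results = pyvo.dal.TAPService("https://example.org/csc/tap").search(ADQL_QUERY)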

  11. Evaluating the Use of Existing Data Sources, Probabilistic Linkage, and Multiple Imputation to Build Population-based Injury Databases Across Phases of Trauma Care

    PubMed Central

    Newgard, Craig; Malveau, Susan; Staudenmayer, Kristan; Wang, N. Ewen; Hsia, Renee Y.; Mann, N. Clay; Holmes, James F.; Kuppermann, Nathan; Haukoos, Jason S.; Bulger, Eileen M.; Dai, Mengtao; Cook, Lawrence J.

    2012-01-01

    Objectives The objective was to evaluate the process of using existing data sources, probabilistic linkage, and multiple imputation to create large population-based injury databases matched to outcomes. Methods This was a retrospective cohort study of injured children and adults transported by 94 emergency medical systems (EMS) agencies to 122 hospitals in seven regions of the western United States over a 36-month period (2006 to 2008). All injured patients evaluated by EMS personnel within specific geographic catchment areas were included, regardless of field disposition or outcome. The authors performed probabilistic linkage of EMS records to four hospital and postdischarge data sources (emergency department [ED] data, patient discharge data, trauma registries, and vital statistics files) and then handled missing values using multiple imputation. The authors compare and evaluate matched records, match rates (proportion of matches among eligible patients), and injury outcomes within and across sites. Results There were 381,719 injured patients evaluated by EMS personnel in the seven regions. Among transported patients, match rates ranged from 14.9% to 87.5% and were directly affected by the availability of hospital data sources and proportion of missing values for key linkage variables. For vital statistics records (1-year mortality), estimated match rates ranged from 88.0% to 98.7%. Use of multiple imputation (compared to complete case analysis) reduced bias for injury outcomes, although sample size, percentage missing, type of variable, and combined-site versus single-site imputation models all affected the resulting estimates and variance. Conclusions This project demonstrates the feasibility and describes the process of constructing population-based injury databases across multiple phases of care using existing data sources and commonly available analytic methods. Attention to key linkage variables and decisions for handling missing values can be used to increase match rates between data sources, minimize bias, and preserve sampling design. PMID:22506952

  12. CGDSNPdb: a database resource for error-checked and imputed mouse SNPs.

    PubMed

    Hutchins, Lucie N; Ding, Yueming; Szatkiewicz, Jin P; Von Smith, Randy; Yang, Hyuna; de Villena, Fernando Pardo-Manuel; Churchill, Gary A; Graber, Joel H

    2010-07-06

    The Center for Genome Dynamics Single Nucleotide Polymorphism Database (CGDSNPdb) is an open-source value-added database with more than nine million mouse single nucleotide polymorphisms (SNPs), drawn from multiple sources, with genotypes assigned to multiple inbred strains of laboratory mice. All SNPs are checked for accuracy and annotated for properties specific to the SNP as well as those implied by changes to overlapping protein-coding genes. CGDSNPdb serves as the primary interface to two unique data sets, the 'imputed genotype resource' in which a Hidden Markov Model was used to assess local haplotypes and the most probable base assignment at several million genomic loci in tens of strains of mice, and the Affymetrix Mouse Diversity Genotyping Array, a high density microarray with over 600,000 SNPs and over 900,000 invariant genomic probes. CGDSNPdb is accessible online through either a web-based query tool or a MySQL public login. Database URL: http://cgd.jax.org/cgdsnpdb/

  13. What Do They Tell Their Students? Business Faculty Acceptance of the Web and Library Databases for Student Research

    ERIC Educational Resources Information Center

    Dewald, Nancy H.

    2005-01-01

    Business faculty were surveyed as to their use of free Web resources and subscription databases for their own and their students' research. A much higher percentage of respondents either require or encourage Web use by their students than require or encourage database use, though most also advise use of multiple sources.

  14. Filling Terrorism Gaps: VEOs, Evaluating Databases, and Applying Risk Terrain Modeling to Terrorism

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Hagan, Ross F.

    2016-08-29

    This paper aims to address three issues: the lack of literature differentiating terrorism and violent extremist organizations (VEOs), terrorism incident databases, and the applicability of Risk Terrain Modeling (RTM) to terrorism. Current open source literature and publicly available government sources do not differentiate between terrorism and VEOs; furthermore, they fail to define them. Addressing the lack of a comprehensive comparison of existing terrorism data sources, a matrix comparing a dozen terrorism databases is constructed, providing insight toward the array of data available. RTM, a method for spatial risk analysis at a micro level, has some applicability to terrorism research, particularly for studies looking at risk indicators of terrorism. Leveraging attack data from multiple databases, combined with RTM, offers one avenue for closing existing research gaps in terrorism literature.

  15. NoSQL data model for semi-automatic integration of ethnomedicinal plant data from multiple sources.

    PubMed

    Ningthoujam, Sanjoy Singh; Choudhury, Manabendra Dutta; Potsangbam, Kumar Singh; Chetia, Pankaj; Nahar, Lutfun; Sarker, Satyajit D; Basar, Norazah; Das Talukdar, Anupam

    2014-01-01

    Sharing traditional knowledge with the scientific community could refine scientific approaches to phytochemical investigation and conservation of ethnomedicinal plants. As such, integration of traditional knowledge with scientific data using a single platform for sharing is greatly needed. However, ethnomedicinal data are available in heterogeneous formats, which depend on cultural aspects, survey methodology and focus of the study. Phytochemical and bioassay data are also available from many open sources in various standards and customised formats. The objective was to design a flexible data model that could integrate both primary and curated ethnomedicinal plant data from multiple sources. The current model is based on MongoDB, one of the Not only Structured Query Language (NoSQL) databases. Although it does not enforce a schema, modifications were made so that the model could incorporate both standard and customised ethnomedicinal plant data formats from different sources. The model presented can integrate both primary and secondary data related to ethnomedicinal plants. Accommodation of disparate data was accomplished by a feature of this database that supported a different set of fields for each document. It also allowed storage of similar data having different properties. The model presented is scalable to a highly complex level with continuing maturation of the database, and is applicable for storing, retrieving and sharing ethnomedicinal plant data. It can also serve as a flexible alternative to a relational and normalised database. Copyright © 2014 John Wiley & Sons, Ltd.
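
    The "different set of fields for each document" feature the authors rely on can be illustrated with a minimal MongoDB sketch. The collection and field names below are invented for illustration and are not taken from the published model; a local MongoDB server is assumed.

      # Minimal sketch of schema-flexible storage with MongoDB (via pymongo), in the
      # spirit of the model described above. Field names are illustrative only.
      from pymongo import MongoClient

      client = MongoClient("mongodb://localhost:27017")  # assumed local server
      plants = client["ethnomed_demo"]["plants"]

      # Records from different surveys can carry different sets of fields
      # without any prior schema migration.
      plants.insert_many([
          {
              "species": "Ocimum sanctum",
              "local_name": "Tulsi",
              "ailments": ["cough", "fever"],
              "survey": {"district": "Cachar", "year": 2012},
          },
          {
              "species": "Ocimum sanctum",
              "phytochemicals": ["eugenol"],
              "bioassay": {"target": "antimicrobial", "result": "active"},
              "source": "curated literature",
          },
      ])

      # A single query still spans both kinds of documents.
      for doc in plants.find({"species": "Ocimum sanctum"}):
          print(sorted(doc.keys()))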

  16. Valid Statistical Analysis for Logistic Regression with Multiple Sources

    NASA Astrophysics Data System (ADS)

    Fienberg, Stephen E.; Nardi, Yuval; Slavković, Aleksandra B.

    Considerable effort has gone into understanding issues of privacy protection of individual information in single databases, and various solutions have been proposed depending on the nature of the data, the ways in which the database will be used and the precise nature of the privacy protection being offered. Once data are merged across sources, however, the nature of the problem becomes far more complex, and a number of privacy issues arise for the linked individual files that go well beyond those considered with regard to the data within individual sources. In this paper, we propose an approach that provides a full statistical analysis of the combined database without actually combining it. We focus mainly on logistic regression, but the method and tools described may be applied to other statistical models as well.
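
    One way to see why a full analysis is possible without physically pooling the records is that, for horizontally partitioned sources, the logistic regression log-likelihood decomposes into per-source sums. The partitioned form below is a standard observation that illustrates the setting; it is not the paper's specific secure protocol.

      \ell(\beta) \;=\; \sum_{k=1}^{K} \sum_{i \in D_k} \Big[\, y_i\, x_i^{\top}\beta \;-\; \log\!\big(1 + e^{x_i^{\top}\beta}\big) \Big]

    Each source D_k can therefore contribute its own summand (and the corresponding gradient terms) to a secure aggregation step, so the fitted coefficients match those from the pooled data without any party revealing individual records.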

  17. Building a Database for a Quantitative Model

    NASA Technical Reports Server (NTRS)

    Kahn, C. Joseph; Kleinhammer, Roger

    2014-01-01

    A database can greatly benefit a quantitative analysis. The defining characteristic of a quantitative risk, or reliability, model is the use of failure estimate data. Models can easily contain a thousand Basic Events, relying on hundreds of individual data sources. Obviously, entering so much data by hand will eventually lead to errors. Less obviously, entering data this way does not help link the Basic Events to their data sources. The best way to organize large amounts of data on a computer is with a database. But a model does not require a large, enterprise-level database with dedicated developers and administrators. A database built in Excel can be quite sufficient. A simple spreadsheet database can link every Basic Event to the individual data source selected for it. This database can also contain the manipulations appropriate for how the data is used in the model. These manipulations include stressing factors based on use and maintenance cycles, dormancy, unique failure modes, the modeling of multiple items as a single "Super component" Basic Event, and Bayesian Updating based on flight and testing experience. A simple, unique metadata field in both the model and database provides a link from any Basic Event in the model to its data source and all relevant calculations. The credibility of the entire model often rests on the credibility and traceability of the data.
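
    A minimal sketch of the linking idea described above: a unique metadata key joins each Basic Event to its failure-data source and any rate manipulations. The column names, and the use of pandas as a stand-in for an Excel workbook, are assumptions for illustration rather than the authors' actual template.

      # Sketch of linking model Basic Events to failure-data sources via a shared
      # metadata key, using pandas as a stand-in for an Excel workbook.
      import pandas as pd

      basic_events = pd.DataFrame({
          "basic_event": ["VALVE_FAIL_OPEN", "PUMP_FAIL_START"],
          "data_key": ["DS-0012", "DS-0047"],           # unique metadata field
      })

      data_sources = pd.DataFrame({
          "data_key": ["DS-0012", "DS-0047"],
          "source": ["Handbook entry A", "Test program B"],
          "base_failure_rate": [1.2e-6, 3.4e-5],        # per hour (illustrative)
          "stress_factor": [2.0, 1.0],                  # use/maintenance adjustment
      })

      # The join gives every Basic Event a traceable rate and its provenance.
      linked = basic_events.merge(data_sources, on="data_key")
      linked["adjusted_rate"] = linked["base_failure_rate"] * linked["stress_factor"]
      print(linked[["basic_event", "source", "adjusted_rate"]])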

  18. The BioGRID Interaction Database: 2011 update

    PubMed Central

    Stark, Chris; Breitkreutz, Bobby-Joe; Chatr-aryamontri, Andrew; Boucher, Lorrie; Oughtred, Rose; Livstone, Michael S.; Nixon, Julie; Van Auken, Kimberly; Wang, Xiaodong; Shi, Xiaoqi; Reguly, Teresa; Rust, Jennifer M.; Winter, Andrew; Dolinski, Kara; Tyers, Mike

    2011-01-01

    The Biological General Repository for Interaction Datasets (BioGRID) is a public database that archives and disseminates genetic and protein interaction data from model organisms and humans (http://www.thebiogrid.org). BioGRID currently holds 347 966 interactions (170 162 genetic, 177 804 protein) curated from both high-throughput data sets and individual focused studies, as derived from over 23 000 publications in the primary literature. Complete coverage of the entire literature is maintained for budding yeast (Saccharomyces cerevisiae), fission yeast (Schizosaccharomyces pombe) and thale cress (Arabidopsis thaliana), and efforts to expand curation across multiple metazoan species are underway. The BioGRID houses 48 831 human protein interactions that have been curated from 10 247 publications. Current curation drives are focused on particular areas of biology to enable insights into conserved networks and pathways that are relevant to human health. The BioGRID 3.0 web interface contains new search and display features that enable rapid queries across multiple data types and sources. An automated Interaction Management System (IMS) is used to prioritize, coordinate and track curation across international sites and projects. BioGRID provides interaction data to several model organism databases, resources such as Entrez-Gene and other interaction meta-databases. The entire BioGRID 3.0 data collection may be downloaded in multiple file formats, including PSI MI XML. Source code for BioGRID 3.0 is freely available without any restrictions. PMID:21071413

  19. PSD Applicability Determination for Multiple Owner/Operator Point Sources Within a Single Facility

    EPA Pesticide Factsheets

    This document may be of assistance in applying the New Source Review (NSR) air permitting regulations including the Prevention of Significant Deterioration (PSD) requirements. This document is part of the NSR Policy and Guidance Database. Some documents in the database are a scanned or retyped version of a paper photocopy of the original. Although we have taken considerable effort to quality assure the documents, some may contain typographical errors. Contact the office that issued the document if you need a copy of the original.

  20. SQL is Dead; Long-live SQL: Relational Database Technology in Science Contexts

    NASA Astrophysics Data System (ADS)

    Howe, B.; Halperin, D.

    2014-12-01

    Relational databases are often perceived as a poor fit in science contexts: rigid schemas, poor support for complex analytics, unpredictable performance, and significant maintenance and tuning requirements often make databases unattractive in settings characterized by heterogeneous data sources, complex analysis tasks, rapidly changing requirements, and limited IT budgets. In this talk, I'll argue that although the value proposition of typical relational database systems is weak in science, the core ideas that power relational databases have become incredibly prolific in open source science software, and are emerging as a universal abstraction for both big data and small data. In addition, I'll talk about two open source systems we are building to "jailbreak" the core technology of relational databases and adapt it for use in science. The first is SQLShare, a Database-as-a-Service system supporting collaborative data analysis and exchange by reducing database use to an Upload-Query-Share workflow with no installation, schema design, or configuration required. The second is Myria, a service that supports much larger scale data and complex analytics, and supports multiple back-end systems. Finally, I'll describe some of the ways our collaborators in oceanography, astronomy, biology, fisheries science, and more are using these systems to replace script-based workflows for reasons of performance, flexibility, and convenience.

  1. Searching Across the International Space Station Databases

    NASA Technical Reports Server (NTRS)

    Maluf, David A.; McDermott, William J.; Smith, Ernest E.; Bell, David G.; Gurram, Mohana

    2007-01-01

    Data access in the enterprise generally requires combining data from different sources and in different formats. It is thus advantageous to focus on the intersection of the knowledge across sources and domains; keeping irrelevant knowledge around only serves to make the integration more unwieldy and more complicated than necessary. A context search over multiple domains is proposed in this paper, using context-sensitive queries to support disciplined manipulation of domain knowledge resources. The objective of a context search is to provide the capability for interrogating many domain knowledge resources, which are largely semantically disjoint. The search formally supports the tasks of selecting, combining, extending, specializing, and modifying components from a diverse set of domains. This paper demonstrates a new paradigm in composition of information for enterprise applications. In particular, it discusses an approach to achieving data integration across multiple sources in a manner that does not require heavy investment in database and middleware maintenance. This lean approach to integration leads to cost-effectiveness and scalability of data integration with an underlying schemaless object-relational database management system. This highly scalable, information-on-demand system framework, called NX-Search, is an implementation of an information system built on NETMARK. NETMARK is a flexible, high-throughput open database integration framework for managing, storing, and searching unstructured or semi-structured arbitrary XML and HTML that is used widely at the National Aeronautics and Space Administration (NASA) and in industry.

  2. JBioWH: an open-source Java framework for bioinformatics data integration

    PubMed Central

    Vera, Roberto; Perez-Riverol, Yasset; Perez, Sonia; Ligeti, Balázs; Kertész-Farkas, Attila; Pongor, Sándor

    2013-01-01

    The Java BioWareHouse (JBioWH) project is an open-source platform-independent programming framework that allows a user to build his/her own integrated database from the most popular data sources. JBioWH can be used for intensive querying of multiple data sources and the creation of streamlined task-specific data sets on local PCs. JBioWH is based on a MySQL relational database scheme and includes JAVA API parser functions for retrieving data from 20 public databases (e.g. NCBI, KEGG, etc.). It also includes a client desktop application for (non-programmer) users to query data. In addition, JBioWH can be tailored for use in specific circumstances, including the handling of massive queries for high-throughput analyses or CPU intensive calculations. The framework is provided with complete documentation and application examples and it can be downloaded from the Project Web site at http://code.google.com/p/jbiowh. A MySQL server is available for demonstration purposes at hydrax.icgeb.trieste.it:3307. Database URL: http://code.google.com/p/jbiowh PMID:23846595

  3. JBioWH: an open-source Java framework for bioinformatics data integration.

    PubMed

    Vera, Roberto; Perez-Riverol, Yasset; Perez, Sonia; Ligeti, Balázs; Kertész-Farkas, Attila; Pongor, Sándor

    2013-01-01

    The Java BioWareHouse (JBioWH) project is an open-source platform-independent programming framework that allows a user to build his/her own integrated database from the most popular data sources. JBioWH can be used for intensive querying of multiple data sources and the creation of streamlined task-specific data sets on local PCs. JBioWH is based on a MySQL relational database scheme and includes JAVA API parser functions for retrieving data from 20 public databases (e.g. NCBI, KEGG, etc.). It also includes a client desktop application for (non-programmer) users to query data. In addition, JBioWH can be tailored for use in specific circumstances, including the handling of massive queries for high-throughput analyses or CPU intensive calculations. The framework is provided with complete documentation and application examples and it can be downloaded from the Project Web site at http://code.google.com/p/jbiowh. A MySQL server is available for demonstration purposes at hydrax.icgeb.trieste.it:3307. Database URL: http://code.google.com/p/jbiowh.

  4. OrChem - An open source chemistry search engine for Oracle(R).

    PubMed

    Rijnbeek, Mark; Steinbeck, Christoph

    2009-10-22

    Registration, indexing and searching of chemical structures in relational databases is one of the core areas of cheminformatics. However, little detail has been published on the inner workings of search engines and their development has been mostly closed-source. We decided to develop an open source chemistry extension for Oracle, the de facto database platform in the commercial world. Here we present OrChem, an extension for the Oracle 11G database that adds registration and indexing of chemical structures to support fast substructure and similarity searching. The cheminformatics functionality is provided by the Chemistry Development Kit. OrChem provides similarity searching with response times in the order of seconds for databases with millions of compounds, depending on a given similarity cut-off. For substructure searching, it can make use of multiple processor cores on today's powerful database servers to provide fast response times in equally large data sets. OrChem is free software and can be redistributed and/or modified under the terms of the GNU Lesser General Public License as published by the Free Software Foundation. All software is available via http://orchem.sourceforge.net.

  5. Use of Graph Database for the Integration of Heterogeneous Biological Data.

    PubMed

    Yoon, Byoung-Ha; Kim, Seon-Kyu; Kim, Seon-Young

    2017-03-01

    Understanding complex relationships among heterogeneous biological data is one of the fundamental goals in biology. In most cases, diverse biological data are stored in relational databases, such as MySQL and Oracle, which store data in multiple tables and then infer relationships by multiple-join statements. Recently, a new type of database, called the graph-based database, was developed to natively represent various kinds of complex relationships, and it is widely used among computer science communities and IT industries. Here, we demonstrate the feasibility of using a graph-based database for complex biological relationships by comparing the performance between MySQL and Neo4j, one of the most widely used graph databases. We collected various biological data (protein-protein interaction, drug-target, gene-disease, etc.) from several existing sources, removed duplicate and redundant data, and finally constructed a graph database containing 114,550 nodes and 82,674,321 relationships. When we tested the query execution performance of MySQL versus Neo4j, we found that Neo4j outperformed MySQL in all cases. While Neo4j exhibited a very fast response for various queries, MySQL exhibited latent or unfinished responses for complex queries with multiple-join statements. These results show that using graph-based databases, such as Neo4j, is an efficient way to store complex biological relationships. Moreover, querying a graph database in diverse ways has the potential to reveal novel relationships among heterogeneous biological data.
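
    The contrast the authors measure can be made concrete with a pair of equivalent lookups, one written as a relational multi-join and one as a graph traversal. The schema, node labels, and property names below are invented for illustration and are not the study's actual database.

      # Equivalent "drugs targeting proteins linked to a disease" lookups, written as
      # a relational multi-join (SQL) and as a graph traversal (Cypher). Table names,
      # node labels, and properties are illustrative assumptions only.

      SQL_MULTI_JOIN = """
      SELECT DISTINCT d.name
      FROM drug d
      JOIN drug_target dt  ON dt.drug_id = d.id
      JOIN protein p       ON p.id = dt.protein_id
      JOIN gene_disease gd ON gd.gene_id = p.gene_id
      JOIN disease dis     ON dis.id = gd.disease_id
      WHERE dis.name = 'asthma'
      """

      CYPHER_TRAVERSAL = """
      MATCH (d:Drug)-[:TARGETS]->(:Protein)-[:ENCODED_BY]->(:Gene)
            -[:ASSOCIATED_WITH]->(dis:Disease {name: 'asthma'})
      RETURN DISTINCT d.name
      """

      if __name__ == "__main__":
          print(SQL_MULTI_JOIN, CYPHER_TRAVERSAL, sep="\n")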

  6. Use of Graph Database for the Integration of Heterogeneous Biological Data

    PubMed Central

    Yoon, Byoung-Ha; Kim, Seon-Kyu

    2017-01-01

    Understanding complex relationships among heterogeneous biological data is one of the fundamental goals in biology. In most cases, diverse biological data are stored in relational databases, such as MySQL and Oracle, which store data in multiple tables and then infer relationships by multiple-join statements. Recently, a new type of database, called the graph-based database, was developed to natively represent various kinds of complex relationships, and it is widely used among computer science communities and IT industries. Here, we demonstrate the feasibility of using a graph-based database for complex biological relationships by comparing the performance between MySQL and Neo4j, one of the most widely used graph databases. We collected various biological data (protein-protein interaction, drug-target, gene-disease, etc.) from several existing sources, removed duplicate and redundant data, and finally constructed a graph database containing 114,550 nodes and 82,674,321 relationships. When we tested the query execution performance of MySQL versus Neo4j, we found that Neo4j outperformed MySQL in all cases. While Neo4j exhibited a very fast response for various queries, MySQL exhibited latent or unfinished responses for complex queries with multiple-join statements. These results show that using graph-based databases, such as Neo4j, is an efficient way to store complex biological relationships. Moreover, querying a graph database in diverse ways has the potential to reveal novel relationships among heterogeneous biological data. PMID:28416946

  7. Spatio-Temporal Data Model for Integrating Evolving Nation-Level Datasets

    NASA Astrophysics Data System (ADS)

    Sorokine, A.; Stewart, R. N.

    2017-10-01

    The ability to easily combine data from diverse sources in a single analytical workflow is one of the greatest promises of Big Data technologies. However, such integration is often challenging because datasets originate from different vendors, governments, and research communities, which results in multiple incompatibilities in data representations, formats, and semantics. Semantic differences are the hardest to handle: different communities often use different attribute definitions and associate the records with different sets of evolving geographic entities. Analysis of global socioeconomic variables across multiple datasets over prolonged time periods is often complicated by differences in how the boundaries and histories of countries or other geographic entities are represented. Here we propose an event-based data model for depicting and tracking the histories of evolving geographic units (countries, provinces, etc.) and their representations in disparate data. The model addresses the semantic challenge of preserving the identity of geographic entities over time by defining criteria for an entity's existence, a set of events that may affect its existence, and rules for mapping between different representations (datasets). The proposed model is used for maintaining an evolving compound database of global socioeconomic and environmental data harvested from multiple sources. A practical implementation of our model is demonstrated using a PostgreSQL object-relational database with temporal, geospatial, and NoSQL database extensions.
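
    A minimal relational sketch of the event-based idea, tracking geographic units and the events that change them. The table and column names are assumptions for illustration, not the authors' schema.

      # Minimal sketch of an event-based model for evolving geographic units,
      # expressed as PostgreSQL DDL strings. Names are illustrative assumptions.
      SCHEMA_DDL = """
      CREATE TABLE geo_entity (
          entity_id   SERIAL PRIMARY KEY,
          name        TEXT NOT NULL,
          valid_from  DATE NOT NULL,
          valid_to    DATE               -- NULL while the entity still exists
      );

      CREATE TABLE entity_event (
          event_id    SERIAL PRIMARY KEY,
          event_type  TEXT NOT NULL,     -- e.g. 'split', 'merge', 'rename'
          event_date  DATE NOT NULL
      );

      -- Maps each event to the entities it consumes and produces, so a country
      -- split or merge can be followed forward and backward in time.
      CREATE TABLE event_mapping (
          event_id    INT REFERENCES entity_event(event_id),
          from_entity INT REFERENCES geo_entity(entity_id),
          to_entity   INT REFERENCES geo_entity(entity_id)
      );
      """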

  8. Development of a statewide trauma registry using multiple linked sources of data.

    PubMed Central

    Clark, D. E.

    1993-01-01

    In order to develop a cost-effective method of injury surveillance and trauma system evaluation in a rural state, computer programs were written linking records from two major hospital trauma registries, a statewide trauma tracking study, hospital discharge abstracts, death certificates, and ambulance run reports. A general-purpose database management system, programming language, and operating system were used. Data from 1991 appeared to be successfully linked using only indirect identifying information. Familiarity with local geography and the idiosyncrasies of each data source were helpful in programming for effective matching of records. For each individual case identified in this way, data from all available sources were then merged and imported into a standard database format. This inexpensive, population-based approach, maintaining flexibility for end-users with some database training, may be adaptable for other regions. There is a need for further improvement and simplification of the record-linkage process for this and similar purposes. PMID:8130556

  9. SIMS: addressing the problem of heterogeneity in databases

    NASA Astrophysics Data System (ADS)

    Arens, Yigal

    1997-02-01

    The heterogeneity of remotely accessible databases -- with respect to contents, query language, semantics, organization, etc. -- presents serious obstacles to convenient querying. The SIMS (single interface to multiple sources) system addresses this global integration problem. It does so by defining a single language for describing the domain about which information is stored in the databases and using this language as the query language. Each database to which SIMS is to provide access is modeled using this language. The model describes a database's contents, organization, and other relevant features. SIMS uses these models, together with a planning system drawing on techniques from artificial intelligence, to decompose a given user's high-level query into a series of queries against the databases and other data manipulation steps. The retrieval plan is constructed so as to minimize data movement over the network and maximize parallelism to increase execution speed. SIMS can recover from network failures during plan execution by obtaining data from alternate sources, when possible. SIMS has been demonstrated in the domains of medical informatics and logistics, using real databases.

  10. An image database management system for conducting CAD research

    NASA Astrophysics Data System (ADS)

    Gruszauskas, Nicholas; Drukker, Karen; Giger, Maryellen L.

    2007-03-01

    The development of image databases for CAD research is not a trivial task. The collection and management of images and their related metadata from multiple sources is a time-consuming but necessary process. By standardizing and centralizing the methods in which these data are maintained, one can generate subsets of a larger database that match the specific criteria needed for a particular research project in a quick and efficient manner. A research-oriented management system of this type is highly desirable in a multi-modality CAD research environment. An online, web-based database system for the storage and management of research-specific medical image metadata was designed for use with four modalities of breast imaging: screen-film mammography, full-field digital mammography, breast ultrasound and breast MRI. The system was designed to consolidate data from multiple clinical sources and provide the user with the ability to anonymize the data. Input concerning the type of data to be stored as well as desired searchable parameters was solicited from researchers in each modality. The backbone of the database was created using MySQL. A robust and easy-to-use interface for entering, removing, modifying and searching information in the database was created using HTML and PHP. This standardized system can be accessed using any modern web-browsing software and is fundamental for our various research projects on computer-aided detection, diagnosis, cancer risk assessment, multimodality lesion assessment, and prognosis. Our CAD database system stores large amounts of research-related metadata and successfully generates subsets of cases that match the user's desired search criteria.

  11. New opportunities of real-world data from clinical routine settings in life-cycle management of drugs: example of an integrative approach in multiple sclerosis.

    PubMed

    Rothenbacher, Dietrich; Capkun, Gorana; Uenal, Hatice; Tumani, Hayrettin; Geissbühler, Yvonne; Tilson, Hugh

    2015-05-01

    The assessment and demonstration of a positive benefit-risk balance of a drug is a life-long process and includes specific data from preclinical, clinical development and post-launch experience. However, new integrative approaches are needed to enrich evidence from clinical trials and sponsor-initiated observational studies with information from multiple additional sources, including registry information and other existing observational data and, more recently, health-related administrative claims and medical records databases. To illustrate the value of this approach, this paper exemplifies such a cross-package approach to the area of multiple sclerosis, exploring also possible analytic strategies when using these multiple sources of information.

  12. Combining Multiple Knowledge Sources for Speech Recognition

    DTIC Science & Technology

    1988-09-15

    ...the best adaptation sentence, the second sentence, when added... 10 rapid adaptation sentences, and 15 spell-mode phrases... speaker-dependent database sentences were randomly... combining the smoothed phoneme models with the detailed context models... system tested on a standard database using two well-detailed context models. BYBLOS makes maximal use...

  13. OrChem - An open source chemistry search engine for Oracle®

    PubMed Central

    2009-01-01

    Background Registration, indexing and searching of chemical structures in relational databases is one of the core areas of cheminformatics. However, little detail has been published on the inner workings of search engines and their development has been mostly closed-source. We decided to develop an open source chemistry extension for Oracle, the de facto database platform in the commercial world. Results Here we present OrChem, an extension for the Oracle 11G database that adds registration and indexing of chemical structures to support fast substructure and similarity searching. The cheminformatics functionality is provided by the Chemistry Development Kit. OrChem provides similarity searching with response times in the order of seconds for databases with millions of compounds, depending on a given similarity cut-off. For substructure searching, it can make use of multiple processor cores on today's powerful database servers to provide fast response times in equally large data sets. Availability OrChem is free software and can be redistributed and/or modified under the terms of the GNU Lesser General Public License as published by the Free Software Foundation. All software is available via http://orchem.sourceforge.net. PMID:20298521

  14. Development of a consumer product ingredient database for chemical exposure screening and prioritization.

    PubMed

    Goldsmith, M-R; Grulke, C M; Brooks, R D; Transue, T R; Tan, Y M; Frame, A; Egeghy, P P; Edwards, R; Chang, D T; Tornero-Velez, R; Isaacs, K; Wang, A; Johnson, J; Holm, K; Reich, M; Mitchell, J; Vallero, D A; Phillips, L; Phillips, M; Wambaugh, J F; Judson, R S; Buckley, T J; Dary, C C

    2014-03-01

    Consumer products are a primary source of chemical exposures, yet little structured information is available on the chemical ingredients of these products and the concentrations at which ingredients are present. To address this data gap, we created a database of chemicals in consumer products using product Material Safety Data Sheets (MSDSs) publicly provided by a large retailer. The resulting database represents 1797 unique chemicals mapped to 8921 consumer products and a hierarchy of 353 consumer product "use categories" within a total of 15 top-level categories. We examine the utility of this database and discuss ways in which it will support (i) exposure screening and prioritization, (ii) generic or framework formulations for several indoor/consumer product exposure modeling initiatives, (iii) candidate chemical selection for monitoring near field exposure from proximal sources, and (iv) as activity tracers or ubiquitous exposure sources using "chemical space" map analyses. Chemicals present at high concentrations and across multiple consumer products and use categories that hold high exposure potential are identified. Our database is publicly available to serve regulators, retailers, manufacturers, and the public for predictive screening of chemicals in new and existing consumer products on the basis of exposure and risk. Published by Elsevier Ltd.

  15. Human health risk assessment database, "the NHSRC toxicity value database": supporting the risk assessment process at US EPA's National Homeland Security Research Center.

    PubMed

    Moudgal, Chandrika J; Garrahan, Kevin; Brady-Roberts, Eletha; Gavrelis, Naida; Arbogast, Michelle; Dun, Sarah

    2008-11-15

    The toxicity value database of the United States Environmental Protection Agency's (EPA) National Homeland Security Research Center has been in development since 2004. The toxicity value database includes a compilation of agent property, toxicity, dose-response, and health effects data for 96 agents: 84 chemical and radiological agents and 12 biotoxins. The database is populated with multiple toxicity benchmark values and agent property information from secondary sources, with web links to the secondary sources, where available. A selected set of primary literature citations and associated dose-response data are also included. The toxicity value database offers a powerful means to quickly and efficiently gather pertinent toxicity and dose-response data for a number of agents that are of concern to the nation's security. This database, in conjunction with other tools, will play an important role in understanding human health risks, and will provide a means for risk assessors and managers to make quick and informed decisions on the potential health risks and determine appropriate responses (e.g., cleanup) to agent release. A final, stand-alone MS Access working version of the toxicity value database was completed in November 2007.

  16. Saada: A Generator of Astronomical Database

    NASA Astrophysics Data System (ADS)

    Michel, L.

    2011-11-01

    Saada transforms a set of heterogeneous FITS files or VOtables of various categories (images, tables, spectra, etc.) into a powerful database deployed on the Web. Databases are located on your host and stay independent of any external server. This job doesn’t require writing code. Saada can mix data of various categories in multiple collections. Data collections can be linked to each other, creating relevant browsing paths and allowing data-mining-oriented queries. Saada supports four VO services (spectra, images, sources and TAP). Data collections can be published immediately after the deployment of the Web interface.

  17. The use of intelligent database systems in acute pancreatitis--a systematic review.

    PubMed

    van den Heever, Marc; Mittal, Anubhav; Haydock, Matthew; Windsor, John

    2014-01-01

    Acute pancreatitis (AP) is a complex disease with multiple aetiological factors, wide-ranging severity, and multiple challenges to effective triage and management. Databases, data mining and machine learning algorithms (MLAs), including artificial neural networks (ANNs), may assist by storing and interpreting data from multiple sources, potentially improving clinical decision-making. The aims were to 1) identify database technologies used to store AP data, 2) collate and categorise variables stored in AP databases, 3) identify the MLA technologies, including ANNs, used to analyse AP data, and 4) identify clinical and non-clinical benefits and obstacles in establishing a national or international AP database. A comprehensive systematic search of online reference databases was performed. The predetermined inclusion criteria were all papers discussing 1) databases, 2) data mining or 3) MLAs, pertaining to AP, independently assessed by two reviewers with conflicts resolved by a third author. Forty-three papers were included. Three data mining technologies and five ANN methodologies were reported in the literature. There were 187 collected variables identified. ANNs increase the accuracy of severity prediction; one study showed ANNs had a sensitivity of 0.89 and specificity of 0.96 six hours after admission, compared with 0.80 and 0.85, respectively, for APACHE II (cutoff score ≥8). Problems with databases were incomplete data, lack of clinical data, diagnostic reliability and missing clinical data. This is the first systematic review examining the use of databases, MLAs and ANNs in the management of AP. The clinical benefits these technologies have over current systems, and other advantages to adopting them, are identified. Copyright © 2013 IAP and EPC. Published by Elsevier B.V. All rights reserved.

  18. Development of a land-cover characteristics database for the conterminous U.S.

    USGS Publications Warehouse

    Loveland, Thomas R.; Merchant, J.W.; Ohlen, D.O.; Brown, Jesslyn F.

    1991-01-01

    Information regarding the characteristics and spatial distribution of the Earth's land cover is critical to global environmental research. A prototype land-cover database for the conterminous United States designed for use in a variety of global modelling, monitoring, mapping, and analytical endeavors has been created. The resultant database contains multiple layers, including the source AVHRR data, the ancillary data layers, the land-cover regions defined by the research, and translation tables linking the regions to other land classification schema (for example, UNESCO, USGS Anderson System). The land-cover characteristics database can be analyzed, transformed, or aggregated by users to meet a broad spectrum of requirements. -from Authors

  19. Mining large heterogeneous data sets in drug discovery.

    PubMed

    Wild, David J

    2009-10-01

    Increasingly, effective drug discovery involves the searching and data mining of large volumes of information from many sources covering the domains of chemistry, biology and pharmacology amongst others. This has led to a proliferation of databases and data sources relevant to drug discovery. This paper provides a review of the publicly available large-scale databases relevant to drug discovery, describes the kinds of data mining approaches that can be applied to them and discusses recent work in integrative data mining that looks for associations that span multiple sources, including the use of Semantic Web techniques. The future of mining large data sets for drug discovery requires intelligent, semantic aggregation of information from all of the data sources described in this review, along with the application of advanced methods such as intelligent agents and inference engines in client applications.

  20. RSAT matrix-clustering: dynamic exploration and redundancy reduction of transcription factor binding motif collections

    PubMed Central

    Jaeger, Sébastien; Thieffry, Denis

    2017-01-01

    Abstract Transcription factor (TF) databases contain multitudes of binding motifs (TFBMs) from various sources, from which non-redundant collections are derived by manual curation. The advent of high-throughput methods stimulated the production of novel collections with increasing numbers of motifs. Meta-databases, built by merging these collections, contain redundant versions, because available tools are not suited to automatically identify and explore biologically relevant clusters among thousands of motifs. Motif discovery from genome-scale data sets (e.g. ChIP-seq) also produces redundant motifs, hampering the interpretation of results. We present matrix-clustering, a versatile tool that clusters similar TFBMs into multiple trees, and automatically creates non-redundant TFBM collections. A feature unique to matrix-clustering is its dynamic visualisation of aligned TFBMs, and its capability to simultaneously treat multiple collections from various sources. We demonstrate that matrix-clustering considerably simplifies the interpretation of combined results from multiple motif discovery tools, and highlights biologically relevant variations of similar motifs. We also ran a large-scale application to cluster ∼11 000 motifs from 24 entire databases, showing that matrix-clustering correctly groups motifs belonging to the same TF families, and drastically reduced motif redundancy. matrix-clustering is integrated within the RSAT suite (http://rsat.eu/), accessible through a user-friendly web interface or command-line for its integration in pipelines. PMID:28591841

  1. KaBOB: ontology-based semantic integration of biomedical databases.

    PubMed

    Livingston, Kevin M; Bada, Michael; Baumgartner, William A; Hunter, Lawrence E

    2015-04-23

    The ability to query many independent biological databases using a common ontology-based semantic model would facilitate deeper integration and more effective utilization of these diverse and rapidly growing resources. Despite ongoing work moving toward shared data formats and linked identifiers, significant problems persist in semantic data integration in order to establish shared identity and shared meaning across heterogeneous biomedical data sources. We present five processes for semantic data integration that, when applied collectively, solve seven key problems. These processes include making explicit the differences between biomedical concepts and database records, aggregating sets of identifiers denoting the same biomedical concepts across data sources, and using declaratively represented forward-chaining rules to take information that is variably represented in source databases and integrate it into a consistent biomedical representation. We demonstrate these processes and solutions by presenting KaBOB (the Knowledge Base Of Biomedicine), a knowledge base of semantically integrated data from 18 prominent biomedical databases using common representations grounded in Open Biomedical Ontologies. An instance of KaBOB with data about humans and seven major model organisms can be built using on the order of 500 million RDF triples. All source code for building KaBOB is available under an open-source license. KaBOB is an integrated knowledge base of biomedical data representationally based in prominent, actively maintained Open Biomedical Ontologies, thus enabling queries of the underlying data in terms of biomedical concepts (e.g., genes and gene products, interactions and processes) rather than features of source-specific data schemas or file formats. KaBOB resolves many of the issues that routinely plague biomedical researchers intending to work with data from multiple data sources and provides a platform for ongoing data integration and development and for formal reasoning over a wealth of integrated biomedical data.
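    As an illustration of the record-versus-concept distinction and the rule-based lifting described above, the following sketch uses rdflib with made-up URIs and predicate names; it is not KaBOB's actual ontology or rule set.

```python
# Sketch only: made-up namespace and predicates, not KaBOB's representation.
from rdflib import Graph, Namespace

EX = Namespace("http://example.org/kabob-sketch/")
g = Graph()

# Records vs. concepts: two records from different databases denote one concept
g.add((EX.uniprot_record_P04637, EX.denotes, EX.TP53))
g.add((EX.ncbigene_record_7157,  EX.denotes, EX.TP53))
g.add((EX.uniprot_record_P38936, EX.denotes, EX.CDKN1A))

# A record-level interaction assertion as it might arrive from a source database
g.add((EX.uniprot_record_P04637, EX.recordsInteractionWith, EX.uniprot_record_P38936))

# Forward-chaining-style rule: lift record-level assertions onto the concepts they denote
for rec_a, _, rec_b in list(g.triples((None, EX.recordsInteractionWith, None))):
    for concept_a in g.objects(rec_a, EX.denotes):
        for concept_b in g.objects(rec_b, EX.denotes):
            g.add((concept_a, EX.interactsWith, concept_b))

for triple in g.triples((None, EX.interactsWith, None)):
    print(triple)   # concept-level statement, e.g. (TP53, interactsWith, CDKN1A)
```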

  2. Development of a Consumer Product Ingredient Database for ...

    EPA Pesticide Factsheets

    Consumer products are a primary source of chemical exposures, yet little structured information is available on the chemical ingredients of these products and the concentrations at which ingredients are present. To address this data gap, we created a database of chemicals in consumer products using product Material Safety Data Sheets (MSDSs) publicly provided by a large retailer. The resulting database represents 1797 unique chemicals mapped to 8921 consumer products and a hierarchy of 353 consumer product “use categories” within a total of 15 top-level categories. We examine the utility of this database and discuss ways in which it will support (i) exposure screening and prioritization, (ii) generic or framework formulations for several indoor/consumer product exposure modeling initiatives, (iii) candidate chemical selection for monitoring near field exposure from proximal sources, and (iv) as activity tracers or ubiquitous exposure sources using “chemical space” map analyses. Chemicals present at high concentrations and across multiple consumer products and use categories that hold high exposure potential are identified. Our database is publicly available to serve regulators, retailers, manufacturers, and the public for predictive screening of chemicals in new and existing consumer products on the basis of exposure and risk.

  3. Charting a Path to Location Intelligence for STD Control.

    PubMed

    Gerber, Todd M; Du, Ping; Armstrong-Brown, Janelle; McNutt, Louise-Anne; Coles, F Bruce

    2009-01-01

    This article describes the New York State Department of Health's GeoDatabase project, which developed new methods and techniques for designing and building a geocoding and mapping data repository for sexually transmitted disease (STD) control. The GeoDatabase development was supported through the Centers for Disease Control and Prevention's Outcome Assessment through Systems of Integrated Surveillance workgroup. The design and operation of the GeoDatabase relied upon commercial-off-the-shelf tools that other public health programs may also use for disease-control systems. This article provides a blueprint of the structure and software used to build the GeoDatabase and integrate location data from multiple data sources into the everyday activities of STD control programs.

  4. Compilation and analysis of multiple groundwater-quality datasets for Idaho

    USGS Publications Warehouse

    Hundt, Stephen A.; Hopkins, Candice B.

    2018-05-09

    Groundwater is an important source of drinking and irrigation water throughout Idaho, and groundwater quality is monitored by various Federal, State, and local agencies. The historical, multi-agency records of groundwater quality constitute a valuable dataset that has yet to be compiled or analyzed on a statewide level. The purpose of this study is to combine groundwater-quality data from multiple sources into a single database, to summarize this dataset, and to perform bulk analyses to reveal spatial and temporal patterns of water quality throughout Idaho. Data were retrieved from the Water Quality Portal (https://www.waterqualitydata.us/), the Idaho Department of Environmental Quality, and the Idaho Department of Water Resources. Analyses included counting the number of times a sample location had concentrations above Maximum Contaminant Levels (MCL), performing trend tests, and calculating correlations between water-quality analytes. The water-quality database and the analysis results are available through USGS ScienceBase (https://doi.org/10.5066/F72V2FBG).
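    The bulk analyses named above (exceedance counts, trend tests, and inter-analyte correlations) can be sketched as follows; the file names, column names, and MCL values are hypothetical stand-ins, not the published workflow.

```python
# Minimal sketch, assuming a merged dataset with columns:
# site_id, sample_date, analyte, value (mg/L). Paths and MCLs are illustrative.
import pandas as pd
from scipy.stats import kendalltau

frames = [pd.read_csv(p, parse_dates=["sample_date"])
          for p in ["wqp.csv", "ideq.csv", "idwr.csv"]]   # three hypothetical source files
data = pd.concat(frames, ignore_index=True)

MCL = {"nitrate": 10.0, "arsenic": 0.010}                  # example MCLs, mg/L

# Count samples above the MCL at each location, per analyte
exceed = (data.assign(above=data.apply(
              lambda r: r["value"] > MCL.get(r["analyte"], float("inf")), axis=1))
              .groupby(["site_id", "analyte"])["above"].sum())

# Simple monotonic trend test per site/analyte (Kendall's tau of value vs. time)
def trend(group):
    t = group["sample_date"].map(pd.Timestamp.toordinal)
    tau, p = kendalltau(t, group["value"])
    return pd.Series({"tau": tau, "p_value": p})

trends = data.groupby(["site_id", "analyte"]).apply(trend)

# Correlations between analytes (wide table: one column per analyte)
wide = data.pivot_table(index=["site_id", "sample_date"],
                        columns="analyte", values="value")
correlations = wide.corr()
print(exceed.head(), trends.head(), correlations, sep="\n\n")
```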

  5. Tiered Human Integrated Sequence Search Databases for Shotgun Proteomics.

    PubMed

    Deutsch, Eric W; Sun, Zhi; Campbell, David S; Binz, Pierre-Alain; Farrah, Terry; Shteynberg, David; Mendoza, Luis; Omenn, Gilbert S; Moritz, Robert L

    2016-11-04

    The results of analysis of shotgun proteomics mass spectrometry data can be greatly affected by the selection of the reference protein sequence database against which the spectra are matched. For many species there are multiple sources from which somewhat different sequence sets can be obtained. This can lead to confusion about which database is best in which circumstances, a problem especially acute in human sample analysis. All sequence databases are genome-based, with sequences for the predicted genes and their protein translation products compiled. Our goal is to create a set of primary sequence databases that comprise the union of sequences from many of the different available sources and make the result easily available to the community. We have compiled a set of four sequence databases of varying sizes, from a small database consisting of only the ∼20,000 primary isoforms plus contaminants to a very large database that includes almost all nonredundant protein sequences from several sources. This set of tiered, increasingly complete human protein sequence databases suitable for mass spectrometry proteomics sequence database searching is called the Tiered Human Integrated Search Proteome set. In order to evaluate the utility of these databases, we have analyzed two different data sets, one from the HeLa cell line and the other from normal human liver tissue, with each of the four tiers of database complexity. The result is that approximately 0.8%, 1.1%, and 1.5% additional peptides can be identified for Tiers 2, 3, and 4, respectively, as compared with the Tier 1 database, at substantially increasing computational cost. This increase in computational cost may be worth bearing if the identification of sequence variants or the discovery of sequences that are not present in the reviewed knowledge base entries is an important goal of the study. We find that it is useful to search a data set against a simpler database, and then check the uniqueness of the discovered peptides against a more complex database. We have set up an automated system that downloads all the source databases on the first of each month and automatically generates a new set of search databases and makes them available for download at http://www.peptideatlas.org/thisp/.

  6. Tiered Human Integrated Sequence Search Databases for Shotgun Proteomics

    PubMed Central

    Deutsch, Eric W.; Sun, Zhi; Campbell, David S.; Binz, Pierre-Alain; Farrah, Terry; Shteynberg, David; Mendoza, Luis; Omenn, Gilbert S.; Moritz, Robert L.

    2016-01-01

    The results of analysis of shotgun proteomics mass spectrometry data can be greatly affected by the selection of the reference protein sequence database against which the spectra are matched. For many species there are multiple sources from which somewhat different sequence sets can be obtained. This can lead to confusion about which database is best in which circumstances – a problem especially acute in human sample analysis. All sequence databases are genome-based, with sequences for the predicted genes and their protein translation products compiled. Our goal is to create a set of primary sequence databases that comprise the union of sequences from many of the different available sources and make the result easily available to the community. We have compiled a set of four sequence databases of varying sizes, from a small database consisting of only the ~20,000 primary isoforms plus contaminants to a very large database that includes almost all non-redundant protein sequences from several sources. This set of tiered, increasingly complete human protein sequence databases suitable for mass spectrometry proteomics sequence database searching is called the Tiered Human Integrated Search Proteome set. In order to evaluate the utility of these databases, we have analyzed two different data sets, one from the HeLa cell line and the other from normal human liver tissue, with each of the four tiers of database complexity. The result is that approximately 0.8%, 1.1%, and 1.5% additional peptides can be identified for Tiers 2, 3, and 4, respectively, as compared with the Tier 1 database, at substantially increasing computational cost. This increase in computational cost may be worth bearing if the identification of sequence variants or the discovery of sequences that are not present in the reviewed knowledge base entries is an important goal of the study. We find that it is useful to search a data set against a simpler database, and then check the uniqueness of the discovered peptides against a more complex database. We have set up an automated system that downloads all the source databases on the first of each month and automatically generates a new set of search databases and makes them available for download at http://www.peptideatlas.org/thisp/. PMID:27577934
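    The two-step practice recommended above, searching against a simpler tier and then checking peptide uniqueness against a more complete tier, can be sketched as follows; the FASTA parsing is simplified and the file names and peptides are placeholders.

```python
# Minimal sketch of the uniqueness check; file names and peptides are hypothetical.
def read_fasta(path):
    seqs, header = {}, None
    with open(path) as fh:
        for line in fh:
            line = line.strip()
            if line.startswith(">"):
                header = line[1:].split()[0]
                seqs[header] = []
            elif header:
                seqs[header].append(line)
    return {h: "".join(parts) for h, parts in seqs.items()}

tier1 = read_fasta("thisp_tier1.fasta")   # ~20,000 primary isoforms plus contaminants
tier4 = read_fasta("thisp_tier4.fasta")   # near-exhaustive non-redundant set

identified_peptides = ["SAMPLEPEPTIDEK", "ANOTHERPEPTIDER"]  # from the Tier 1 search

for pep in identified_peptides:
    hits = [acc for acc, seq in tier4.items() if pep in seq]
    status = "unique" if len(hits) == 1 else f"shared by {len(hits)} entries"
    print(f"{pep}: {status} in the larger database")
```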

  7. A highly sensitive search strategy for clinical trials in Literatura Latino Americana e do Caribe em Ciências da Saúde (LILACS) was developed.

    PubMed

    Manríquez, Juan J

    2008-04-01

    Systematic reviews should include as many articles as possible. However, many systematic reviews use only databases with high English language content as sources of trials. Literatura Latino Americana e do Caribe em Ciências da Saúde (LILACS) is an underused source of trials, and there is no validated strategy for searching clinical trials to be used in this database. The objective of this study was to develop a sensitive search strategy for clinical trials in LILACS. An analytical survey was performed. Several single- and multiple-term search strategies were tested for their ability to retrieve clinical trials in LILACS. Sensitivity, specificity, and accuracy of each single- and multiple-term strategy were calculated using the results of a hand-search of 44 Chilean journals as the gold standard. After combining the most sensitive, specific, and accurate single- and multiple-term search strategies, a strategy with a sensitivity of 97.75% (95% confidence interval [CI]=95.98-99.53) and a specificity of 61.85% (95% CI=61.19-62.51) was obtained. LILACS is a source of trials that could improve systematic reviews. A new highly sensitive search strategy for clinical trials in LILACS has been developed. It is hoped this search strategy will improve and increase the utilization of LILACS in future systematic reviews.
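    For readers unfamiliar with the reported metrics, the sketch below shows how sensitivity, specificity, and accuracy follow from a hand-search gold standard; the counts used are illustrative only, not the study's data.

```python
# Toy counts, not the study's numbers: tp = trials retrieved, fn = trials missed,
# fp = non-trials retrieved, tn = non-trials correctly excluded.
def search_metrics(tp, fp, fn, tn):
    sensitivity = tp / (tp + fn)      # fraction of true trials retrieved
    specificity = tn / (tn + fp)      # fraction of non-trials correctly excluded
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    return sensitivity, specificity, accuracy

sens, spec, acc = search_metrics(tp=435, fp=5200, fn=10, tn=8400)
print(f"sensitivity={sens:.2%} specificity={spec:.2%} accuracy={acc:.2%}")
```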

  8. The Chandra Source Catalog : Automated Source Correlation

    NASA Astrophysics Data System (ADS)

    Hain, Roger; Evans, I. N.; Evans, J. D.; Glotfelty, K. J.; Anderson, C. S.; Bonaventura, N. R.; Chen, J. C.; Davis, J. E.; Doe, S. M.; Fabbiano, G.; Galle, E.; Gibbs, D. G.; Grier, J. D.; Hall, D. M.; Harbo, P. N.; He, X.; Houck, J. C.; Karovska, M.; Lauer, J.; McCollough, M. L.; McDowell, J. C.; Miller, J. B.; Mitschang, A. W.; Morgan, D. L.; Nichols, J. S.; Nowak, M. A.; Plummer, D. A.; Primini, F. A.; Refsdal, B. L.; Rots, A. H.; Siemiginowska, A. L.; Sundheim, B. A.; Tibbetts, M. S.; Van Stone, D. W.; Winkelman, S. L.; Zografou, P.

    2009-01-01

    Chandra Source Catalog (CSC) master source pipeline processing seeks to automatically detect sources and compute their properties. Since Chandra is a pointed mission and not a sky survey, different sky regions are observed different numbers of times at varying orientations, resolutions, and other heterogeneous conditions. While this provides an opportunity to collect data from a potentially large number of observing passes, it also creates challenges in determining the best way to combine different detection results for the most accurate characterization of the detected sources. The CSC master source pipeline correlates data from multiple observations by updating existing cataloged source information with new data from the same sky region as they become available. This process sometimes leads to relatively straightforward conclusions, such as when single sources from two observations are similar in size and position. Other observation results require more logic to combine, such as one observation finding a single, large source and another identifying multiple, smaller sources at the same position. We present examples of different overlapping source detections processed in the current version of the CSC master source pipeline. We explain how they are resolved into entries in the master source database, and examine the challenges of computing source properties for the same source detected multiple times. Future enhancements are also discussed. This work is supported by NASA contract NAS8-03060 (CXC).
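    The following is a schematic sketch, not the CSC pipeline itself, of the basic cross-match decision the abstract describes: a new detection either updates a nearby master-source entry or creates a new one. The match radius and the flat-sky separation approximation are simplifying assumptions.

```python
# Illustrative cross-match sketch; coordinates in degrees, radius in arcseconds.
import math

MATCH_RADIUS_ARCSEC = 2.0
master = [{"id": 1, "ra": 10.0012, "dec": -5.0003, "n_obs": 3}]

def sep_arcsec(ra1, dec1, ra2, dec2):
    # Small-angle, flat-sky approximation: adequate for a sketch, not for the poles.
    dra = (ra1 - ra2) * math.cos(math.radians((dec1 + dec2) / 2))
    ddec = dec1 - dec2
    return math.hypot(dra, ddec) * 3600.0

def ingest(detection):
    for src in master:
        if sep_arcsec(src["ra"], src["dec"], detection["ra"], detection["dec"]) <= MATCH_RADIUS_ARCSEC:
            src["n_obs"] += 1          # combine with the existing master source
            return src["id"]
    new_id = max(s["id"] for s in master) + 1
    master.append({"id": new_id, **detection, "n_obs": 1})
    return new_id

print(ingest({"ra": 10.0013, "dec": -5.0004}))   # matches master source 1
print(ingest({"ra": 10.2500, "dec": -5.1000}))   # becomes a new master source
```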

  9. RSAT matrix-clustering: dynamic exploration and redundancy reduction of transcription factor binding motif collections.

    PubMed

    Castro-Mondragon, Jaime Abraham; Jaeger, Sébastien; Thieffry, Denis; Thomas-Chollier, Morgane; van Helden, Jacques

    2017-07-27

    Transcription factor (TF) databases contain multitudes of binding motifs (TFBMs) from various sources, from which non-redundant collections are derived by manual curation. The advent of high-throughput methods stimulated the production of novel collections with increasing numbers of motifs. Meta-databases, built by merging these collections, contain redundant versions, because available tools are not suited to automatically identify and explore biologically relevant clusters among thousands of motifs. Motif discovery from genome-scale data sets (e.g. ChIP-seq) also produces redundant motifs, hampering the interpretation of results. We present matrix-clustering, a versatile tool that clusters similar TFBMs into multiple trees, and automatically creates non-redundant TFBM collections. A feature unique to matrix-clustering is its dynamic visualisation of aligned TFBMs, and its capability to simultaneously treat multiple collections from various sources. We demonstrate that matrix-clustering considerably simplifies the interpretation of combined results from multiple motif discovery tools, and highlights biologically relevant variations of similar motifs. We also ran a large-scale application to cluster ∼11 000 motifs from 24 entire databases, showing that matrix-clustering correctly groups motifs belonging to the same TF families, and drastically reduces motif redundancy. matrix-clustering is integrated within the RSAT suite (http://rsat.eu/), accessible through a user-friendly web interface or command-line for its integration in pipelines. © The Author(s) 2017. Published by Oxford University Press on behalf of Nucleic Acids Research.
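    The core operation, grouping motifs into trees by similarity and cutting the trees into non-redundant clusters, can be sketched as below; the similarity values, motif names, and threshold are invented, and the code is not the RSAT implementation.

```python
# Schematic sketch: hierarchical clustering of a toy motif-similarity matrix.
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster
from scipy.spatial.distance import squareform

motifs = ["GATA1_a", "GATA1_b", "CEBPB", "CEBPA"]
similarity = np.array([          # e.g. normalised correlation between aligned matrices
    [1.00, 0.92, 0.15, 0.20],
    [0.92, 1.00, 0.18, 0.22],
    [0.15, 0.18, 1.00, 0.85],
    [0.20, 0.22, 0.85, 1.00],
])

distance = 1.0 - similarity                       # convert similarity to a distance
np.fill_diagonal(distance, 0.0)
tree = linkage(squareform(distance), method="average")   # build the motif tree
clusters = fcluster(tree, t=0.4, criterion="distance")   # cut it into clusters

for name, label in zip(motifs, clusters):
    print(name, "-> cluster", label)
```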

  10. GIS for the Gulf: A reference database for hurricane-affected areas: Chapter 4C in Science and the storms-the USGS response to the hurricanes of 2005

    USGS Publications Warehouse

    Greenlee, Dave

    2007-01-01

    A week after Hurricane Katrina made landfall in Louisiana, a collaboration among multiple organizations began building a database called the Geographic Information System for the Gulf, shortened to "GIS for the Gulf," to support the geospatial data needs of people in the hurricane-affected area. Data were gathered from diverse sources and entered into a consistent and standardized data model in a manner that is Web accessible.

  11. Cumulative Environmental Impacts from Multiple Electric Generating Stations

    EPA Pesticide Factsheets

    This document may be of assistance in applying the New Source Review (NSR) air permitting regulations including the Prevention of Significant Deterioration (PSD) requirements. This document is part of the NSR Policy and Guidance Database. Some documents in the database are a scanned or retyped version of a paper photocopy of the original. Although we have taken considerable effort to quality assure the documents, some may contain typographical errors. Contact the office that issued the document if you need a copy of the original.

  12. Database of historically documented springs and spring flow measurements in Texas

    USGS Publications Warehouse

    Heitmuller, Franklin T.; Reece, Brian D.

    2003-01-01

    Springs are naturally occurring features that convey excess ground water to the land surface; they represent a transition from ground water to surface water. Water issues through one opening, multiple openings, or numerous seeps in the rock or soil. The database of this report provides information about springs and spring flow in Texas including spring names, identification numbers, location, and, if available, water source and use. This database does not include every spring in Texas, but is limited to an aggregation of selected digital and hard-copy data of the U.S. Geological Survey (USGS), the Texas Water Development Board (TWDB), and Capitol Environmental Services.

  13. Data model and relational database design for the New England Water-Use Data System (NEWUDS)

    USGS Publications Warehouse

    Tessler, Steven

    2001-01-01

    The New England Water-Use Data System (NEWUDS) is a database for the storage and retrieval of water-use data. NEWUDS can handle data covering many facets of water use, including (1) tracking various types of water-use activities (withdrawals, returns, transfers, distributions, consumptive-use, wastewater collection, and treatment); (2) the description, classification and location of places and organizations involved in water-use activities; (3) details about measured or estimated volumes of water associated with water-use activities; and (4) information about data sources and water resources associated with water use. In NEWUDS, each water transaction occurs unidirectionally between two site objects, and the sites and conveyances form a water network. The core entities in the NEWUDS model are site, conveyance, transaction/rate, location, and owner. Other important entities include water resources (used for withdrawals and returns), data sources, and aliases. Multiple water-exchange estimates can be stored for individual transactions based on different methods or data sources. Storage of user-defined details is accommodated for several of the main entities. Numerous tables containing classification terms facilitate detailed descriptions of data items and can be used for routine or custom data summarization. NEWUDS handles single-user and aggregate-user water-use data, can be used for large or small water-network projects, and is available as a stand-alone Microsoft Access database structure. Users can customize and extend the database, link it to other databases, or implement the design in other relational database applications.
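    A minimal relational sketch of the core entities named above (site, conveyance, transaction/rate, data source) follows; the column set is an illustrative reduction, not the published NEWUDS schema.

```python
# Toy schema sketch using SQLite; table and column names are simplified stand-ins.
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE site        (site_id INTEGER PRIMARY KEY, name TEXT, site_type TEXT);
CREATE TABLE conveyance  (conveyance_id INTEGER PRIMARY KEY,
                          from_site INTEGER REFERENCES site(site_id),
                          to_site   INTEGER REFERENCES site(site_id));
CREATE TABLE data_source (source_id INTEGER PRIMARY KEY, description TEXT);
CREATE TABLE water_transaction (
    txn_id        INTEGER PRIMARY KEY,
    conveyance_id INTEGER REFERENCES conveyance(conveyance_id),
    activity      TEXT,            -- withdrawal, return, transfer, ...
    volume_mgd    REAL,            -- measured or estimated rate
    source_id     INTEGER REFERENCES data_source(source_id)
);
""")

# One unidirectional transaction between two sites along a conveyance
con.execute("INSERT INTO site VALUES (1, 'Well field A', 'withdrawal point')")
con.execute("INSERT INTO site VALUES (2, 'Treatment plant', 'facility')")
con.execute("INSERT INTO conveyance VALUES (1, 1, 2)")
con.execute("INSERT INTO data_source VALUES (1, 'State withdrawal report, 1999')")
con.execute("INSERT INTO water_transaction VALUES (1, 1, 'withdrawal', 2.4, 1)")

for row in con.execute("""SELECT s1.name, s2.name, t.activity, t.volume_mgd
                          FROM water_transaction t
                          JOIN conveyance c ON c.conveyance_id = t.conveyance_id
                          JOIN site s1 ON s1.site_id = c.from_site
                          JOIN site s2 ON s2.site_id = c.to_site"""):
    print(row)
```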

  14. PropBase Query Layer: a single portal to UK subsurface physical property databases

    NASA Astrophysics Data System (ADS)

    Kingdon, Andrew; Nayembil, Martin L.; Richardson, Anne E.; Smith, A. Graham

    2013-04-01

    Until recently, the delivery of geological information for industry and the public was achieved by geological mapping. Now pervasively available computers mean that 3D geological models can deliver realistic representations of the geometric location of geological units, represented as shells or volumes. The next phase of this process is to populate these with physical properties data that describe subsurface heterogeneity and its associated uncertainty. Achieving this requires capture and serving of physical, hydrological and other property information from diverse sources to populate these models. The British Geological Survey (BGS) holds large volumes of subsurface property data, derived both from its own research data collection and from other, often commercially derived, data sources. These data can be voxelated to incorporate them into the models and demonstrate property variation within the subsurface geometry. All property data held by BGS have for many years been stored in relational databases to ensure their long-term continuity. However, these have, by necessity, complex structures; each database contains positional reference data and model information, as well as metadata such as sample identification information and attributes that define the source and processing. Whilst this is critical to assessing these analyses, it also hugely complicates the understanding of variability of the property under assessment and requires multiple queries to study related datasets, making it difficult to extract physical properties from these databases. Therefore the PropBase Query Layer has been created to allow simplified aggregation and extraction of all related data and their presentation in simple, mostly denormalised tables that combine information from multiple databases into a single system. The structure of each relational database is denormalised into a generalised structure, so that the datasets can be viewed together in a common format using a simple interface. Data are re-engineered to facilitate easy loading. The query layer structure comprises tables, procedures, functions, triggers, views and materialised views. The structure contains a main table, PRB_DATA, which holds all of the data with the following attribution:
    • a unique identifier
    • the data source
    • the unique identifier from the parent database, for traceability
    • the 3D location
    • the property type
    • the property value
    • the units
    • necessary qualifiers
    • precision information and an audit trail
    Data sources, property types and units are constrained by dictionaries, a key component of the structure, which define which properties and inheritance hierarchies are to be coded and also guide what is extracted from the structure and how. Data types served by the Query Layer include site-investigation-derived geotechnical data, hydrogeology datasets, regional geochemistry, geophysical logs, and lithological and borehole metadata. The size and complexity of the data sets, with multiple parent structures, require a technically robust approach to keep the layer synchronised. This is achieved through Oracle procedures written in PL/SQL containing the logic required to carry out the data manipulation (inserts, updates, deletes) that keeps the layer synchronised with the underlying databases, either as regularly scheduled jobs (weekly, monthly, etc.) or invoked on demand.
The PropBase Query Layer's implementation has enabled rapid data discovery, visualisation and interpretation of geological data, simplifying the parametrisation of 3D model volumes and facilitating the study of intra-unit heterogeneity.
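    The denormalised query-layer idea, one wide data table constrained by dictionary tables, can be sketched as below; the table and column names loosely follow the description above and are not the BGS schema.

```python
# Toy sketch of a denormalised query layer using SQLite; names are illustrative.
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE dic_source   (source_code TEXT PRIMARY KEY, description TEXT);
CREATE TABLE dic_property (property_code TEXT PRIMARY KEY, units TEXT);
CREATE TABLE prb_data (
    id             INTEGER PRIMARY KEY,
    source_code    TEXT REFERENCES dic_source(source_code),
    parent_id      TEXT,          -- identifier in the originating database, for traceability
    x REAL, y REAL, z REAL,       -- 3D location
    property_code  TEXT REFERENCES dic_property(property_code),
    value          REAL,
    qualifier      TEXT
);
""")
con.execute("INSERT INTO dic_source VALUES ('GEOTECH', 'Site investigation geotechnical data')")
con.execute("INSERT INTO dic_property VALUES ('POROSITY', 'percent')")
con.execute("INSERT INTO prb_data VALUES (1, 'GEOTECH', 'BH42/7', 451200, 278300, -12.5, 'POROSITY', 18.2, NULL)")

# One simple query now spans data that would otherwise require joins across databases
for row in con.execute("SELECT source_code, property_code, value FROM prb_data"):
    print(row)
```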

  15. ProXL (Protein Cross-Linking Database): A Platform for Analysis, Visualization, and Sharing of Protein Cross-Linking Mass Spectrometry Data

    PubMed Central

    2016-01-01

    ProXL is a Web application and accompanying database designed for sharing, visualizing, and analyzing bottom-up protein cross-linking mass spectrometry data with an emphasis on structural analysis and quality control. ProXL is designed to be independent of any particular software pipeline. The import process is simplified by the use of the ProXL XML data format, which shields developers of data importers from the relative complexity of the relational database schema. The database and Web interfaces function equally well for any software pipeline and allow data from disparate pipelines to be merged and contrasted. ProXL includes robust public and private data sharing capabilities, including a project-based interface designed to ensure security and facilitate collaboration among multiple researchers. ProXL provides multiple interactive and highly dynamic data visualizations that facilitate structural-based analysis of the observed cross-links as well as quality control. ProXL is open-source, well-documented, and freely available at https://github.com/yeastrc/proxl-web-app. PMID:27302480

  16. ProXL (Protein Cross-Linking Database): A Platform for Analysis, Visualization, and Sharing of Protein Cross-Linking Mass Spectrometry Data.

    PubMed

    Riffle, Michael; Jaschob, Daniel; Zelter, Alex; Davis, Trisha N

    2016-08-05

    ProXL is a Web application and accompanying database designed for sharing, visualizing, and analyzing bottom-up protein cross-linking mass spectrometry data with an emphasis on structural analysis and quality control. ProXL is designed to be independent of any particular software pipeline. The import process is simplified by the use of the ProXL XML data format, which shields developers of data importers from the relative complexity of the relational database schema. The database and Web interfaces function equally well for any software pipeline and allow data from disparate pipelines to be merged and contrasted. ProXL includes robust public and private data sharing capabilities, including a project-based interface designed to ensure security and facilitate collaboration among multiple researchers. ProXL provides multiple interactive and highly dynamic data visualizations that facilitate structural-based analysis of the observed cross-links as well as quality control. ProXL is open-source, well-documented, and freely available at https://github.com/yeastrc/proxl-web-app.

  17. Graph-Based Weakly-Supervised Methods for Information Extraction & Integration

    ERIC Educational Resources Information Center

    Talukdar, Partha Pratim

    2010-01-01

    The variety and complexity of potentially-related data resources available for querying--webpages, databases, data warehouses--has been growing ever more rapidly. There is a growing need to pose integrative queries "across" multiple such sources, exploiting foreign keys and other means of interlinking data to merge information from diverse…

  18. SHIFT: Shared Information Framework and Technology Concept

    DTIC Science & Technology

    2009-02-01

    easy to get an overview of all those tasks, what is happening around each mission. Federated Search allows actors to search multiple data sources... is sent to every individual database in the portal or federated search list... The idea in SHIFT is to enable...

  19. SchizConnect: Mediating neuroimaging databases on schizophrenia and related disorders for large-scale integration.

    PubMed

    Wang, Lei; Alpert, Kathryn I; Calhoun, Vince D; Cobia, Derin J; Keator, David B; King, Margaret D; Kogan, Alexandr; Landis, Drew; Tallis, Marcelo; Turner, Matthew D; Potkin, Steven G; Turner, Jessica A; Ambite, Jose Luis

    2016-01-01

    SchizConnect (www.schizconnect.org) is built to address the issues of multiple data repositories in schizophrenia neuroimaging studies. It includes a level of mediation--translating across data sources--so that the user can place one query, e.g. for diffusion images from male individuals with schizophrenia, and find out from across participating data sources how many datasets there are, as well as downloading the imaging and related data. The current version handles the Data Usage Agreements across different studies, as well as interpreting database-specific terminologies into a common framework. New data repositories can also be mediated to bring immediate access to existing datasets. Compared with centralized, upload data sharing models, SchizConnect is a unique, virtual database with a focus on schizophrenia and related disorders that can mediate live data as information is being updated at each data source. It is our hope that SchizConnect can facilitate testing new hypotheses through aggregated datasets, promoting discovery related to the mechanisms underlying schizophrenic dysfunction. Copyright © 2015 Elsevier Inc. All rights reserved.

  20. Brain Tumor Database, a free relational database for collection and analysis of brain tumor patient information.

    PubMed

    Bergamino, Maurizio; Hamilton, David J; Castelletti, Lara; Barletta, Laura; Castellan, Lucio

    2015-03-01

    In this study, we describe the development and utilization of a relational database designed to manage the clinical and radiological data of patients with brain tumors. The Brain Tumor Database was implemented using MySQL v.5.0, while the graphical user interface was created using PHP and HTML, thus making it easily accessible through a web browser. This web-based approach allows for multiple institutions to potentially access the database. The BT Database can record brain tumor patient information (e.g. clinical features, anatomical attributes, and radiological characteristics) and be used for clinical and research purposes. Analytic tools to automatically generate statistics and different plots are provided. The BT Database is a free and powerful user-friendly tool with a wide range of possible clinical and research applications in neurology and neurosurgery. The BT Database graphical user interface source code and manual are freely available at http://tumorsdatabase.altervista.org. © The Author(s) 2013.

  1. Citation searching: a systematic review case study of multiple risk behaviour interventions

    PubMed Central

    2014-01-01

    Background: The value of citation searches as part of the systematic review process is currently unknown. While the major guides to conducting systematic reviews state that citation searching should be carried out in addition to searching bibliographic databases, there are still few studies in the literature that support this view. Rather than using a predefined search strategy to retrieve studies, citation searching uses known relevant papers to identify further papers. Methods: We describe a case study about the effectiveness of using the citation sources Google Scholar, Scopus, Web of Science and OVIDSP MEDLINE to identify records for inclusion in a systematic review. We used the 40 included studies identified by traditional database searches from one systematic review of interventions for multiple risk behaviours. We searched for each of the included studies in the four citation sources to retrieve the details of all papers that have cited these studies. We carried out two analyses; the first was to examine the overlap between the four citation sources to identify which citation tool was the most useful; the second was to investigate whether the citation searches identified any relevant records in addition to those retrieved by the original database searches. Results: The highest number of citations was retrieved from Google Scholar (1680), followed by Scopus (1173), then Web of Science (1095) and lastly OVIDSP (213). To retrieve all the records identified by the citation tracking, searching all four resources was required. Google Scholar identified the highest number of unique citations. The citation tracking identified 9 studies that met the review’s inclusion criteria. Eight of these had already been identified by the traditional database searches and identified in the screening process, while the ninth was not available in any of the databases when the original searches were carried out. It would, however, have been identified by two of the database search strategies if searches had been carried out later. Conclusions: Based on the results from this investigation, citation searching as a supplementary search method for systematic reviews may not be the best use of valuable time and resources. It would be useful to verify these findings in other reviews. PMID:24893958
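    Both analyses, the overlap between citation sources and the records added beyond the database searches, reduce to simple set operations, sketched below with invented record identifiers.

```python
# Toy example: record identifiers are invented, not the review's data.
google_scholar = {"r1", "r2", "r3", "r4"}
scopus         = {"r1", "r2", "r5"}
web_of_science = {"r2", "r3"}
ovidsp_medline = {"r2"}

sources = {"Google Scholar": google_scholar, "Scopus": scopus,
           "Web of Science": web_of_science, "OVIDSP MEDLINE": ovidsp_medline}

# Overlap analysis: how many citations each source finds, and how many are unique to it
all_citations = set().union(*sources.values())
for name, hits in sources.items():
    unique = hits - set().union(*(v for k, v in sources.items() if k != name))
    print(f"{name}: {len(hits)} citations, {len(unique)} found nowhere else")

# Added-value analysis: records found by citation tracking but not by the database searches
from_database_search = {"r1", "r2", "r3"}
print("added by citation tracking:", all_citations - from_database_search)
```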

  2. Using Social Media to Identify Sources of Healthy Food in Urban Neighborhoods.

    PubMed

    Gomez-Lopez, Iris N; Clarke, Philippa; Hill, Alex B; Romero, Daniel M; Goodspeed, Robert; Berrocal, Veronica J; Vinod Vydiswaran, V G; Veinot, Tiffany C

    2017-06-01

    An established body of research has used secondary data sources (such as proprietary business databases) to demonstrate the importance of the neighborhood food environment for multiple health outcomes. However, documenting food availability using secondary sources in low-income urban neighborhoods can be particularly challenging since small businesses play a crucial role in food availability. These small businesses are typically underrepresented in national databases, which rely on secondary sources to develop data for marketing purposes. Using social media and other crowdsourced data to account for these smaller businesses holds promise, but the quality of these data remains unknown. This paper compares the quality of full-line grocery store information from Yelp, a crowdsourced content service, to a "ground truth" data set (Detroit Food Map) and a commercially-available dataset (Reference USA) for the greater Detroit area. Results suggest that Yelp is more accurate than Reference USA in identifying healthy food stores in urban areas. Researchers investigating the relationship between the nutrition environment and health may consider Yelp as a reliable and valid source for identifying sources of healthy food in urban environments.

  3. Advanced Modeling and Uncertainty Quantification for Flight Dynamics; Interim Results and Challenges

    NASA Technical Reports Server (NTRS)

    Hyde, David C.; Shweyk, Kamal M.; Brown, Frank; Shah, Gautam

    2014-01-01

    As part of the NASA Vehicle Systems Safety Technologies (VSST), Assuring Safe and Effective Aircraft Control Under Hazardous Conditions (Technical Challenge #3), an effort is underway within Boeing Research and Technology (BR&T) to address Advanced Modeling and Uncertainty Quantification for Flight Dynamics (VSST1-7). The scope of the effort is to develop and evaluate advanced multidisciplinary flight dynamics modeling techniques, including integrated uncertainties, to facilitate higher fidelity response characterization of current and future aircraft configurations approaching and during loss-of-control conditions. This approach is to incorporate multiple flight dynamics modeling methods for aerodynamics, structures, and propulsion, including experimental, computational, and analytical. Also to be included are techniques for data integration and uncertainty characterization and quantification. This research shall introduce new and updated multidisciplinary modeling and simulation technologies designed to improve the ability to characterize airplane response in off-nominal flight conditions. The research shall also introduce new techniques for uncertainty modeling that will provide a unified database model comprised of multiple sources, as well as an uncertainty bounds database for each data source such that a full vehicle uncertainty analysis is possible even when approaching or beyond Loss of Control boundaries. Methodologies developed as part of this research shall be instrumental in predicting and mitigating loss of control precursors and events directly linked to causal and contributing factors, such as stall, failures, damage, or icing. The tasks will include utilizing the BR&T Water Tunnel to collect static and dynamic data to be compared to the GTM extended WT database, characterizing flight dynamics in off-nominal conditions, developing tools for structural load estimation under dynamic conditions, devising methods for integrating various modeling elements into a real-time simulation capability, generating techniques for uncertainty modeling that draw data from multiple modeling sources, and providing a unified database model that includes nominal plus increments for each flight condition. This paper presents status of testing in the BR&T water tunnel and analysis of the resulting data and efforts to characterize these data using alternative modeling methods. Program challenges and issues are also presented.

  4. Methods for structuring scientific knowledge from many areas related to aging research.

    PubMed

    Zhavoronkov, Alex; Cantor, Charles R

    2011-01-01

    Aging and age-related disease represent a substantial quantity of current natural, social and behavioral science research efforts. Presently, no centralized system exists for tracking aging research projects across numerous research disciplines. The multidisciplinary nature of this research complicates the understanding of underlying project categories, the establishment of project relations, and the development of a unified project classification scheme. We have developed a highly visual database, the International Aging Research Portfolio (IARP), available at AgingPortfolio.org to address this issue. The database integrates information on research grants, peer-reviewed publications, and issued patent applications from multiple sources. Additionally, the database uses flexible project classification mechanisms and tools for analyzing project associations and trends. This system enables scientists to search the centralized project database, to classify and categorize aging projects, and to analyze the funding aspects across multiple research disciplines. The IARP is designed to provide improved allocation and prioritization of scarce research funding, to reduce project overlap and improve scientific collaboration, thereby accelerating scientific and medical progress in a rapidly growing area of research. Grant applications often precede publications and some grants do not result in publications; thus, this system makes it possible to investigate an earlier and broader view of research activity in many research disciplines. This project is a first attempt to provide a centralized database system for research grants and to categorize aging research projects into multiple subcategories utilizing both advanced machine algorithms and a hierarchical environment for scientific collaboration.

  5. Dealing with the Data Deluge: Handling the Multitude Of Chemical Biology Data Sources

    PubMed Central

    Guha, Rajarshi; Nguyen, Dac-Trung; Southall, Noel; Jadhav, Ajit

    2012-01-01

    Over the last 20 years, there has been an explosion in the amount and type of biological and chemical data that has been made publicly available in a variety of online databases. While this means that vast amounts of information can be found online, there is no guarantee that it can be found easily (or at all). A scientist searching for a specific piece of information is faced with a daunting task - many databases have overlapping content, use their own identifiers and, in some cases, have arcane and unintuitive user interfaces. In this overview, a variety of well known data sources for chemical and biological information are highlighted, focusing on those most useful for chemical biology research. The issue of using multiple data sources together and the associated problems such as identifier disambiguation are highlighted. A brief discussion is then provided on Tripod, a recently developed platform that supports the integration of arbitrary data sources, providing users a simple interface to search across a federated collection of resources. PMID:26609498

  6. Mining the Mind Research Network: A Novel Framework for Exploring Large Scale, Heterogeneous Translational Neuroscience Research Data Sources

    PubMed Central

    Bockholt, Henry J.; Scully, Mark; Courtney, William; Rachakonda, Srinivas; Scott, Adam; Caprihan, Arvind; Fries, Jill; Kalyanam, Ravi; Segall, Judith M.; de la Garza, Raul; Lane, Susan; Calhoun, Vince D.

    2009-01-01

    A neuroinformatics (NI) system is critical to brain imaging research in order to shorten the time between study conception and results. Such an NI system is required to scale well when large numbers of subjects are studied. Further, when multiple sites participate in research projects, organizational issues become increasingly difficult. Optimized NI applications mitigate these problems. Additionally, NI software enables coordination across multiple studies, leveraging advantages potentially leading to exponential research discoveries. The web-based Mind Research Network (MRN) database system has been designed and improved through our experience with 200 research studies and 250 researchers from seven different institutions. The MRN tools permit the collection, management, reporting and efficient use of large scale, heterogeneous data sources, e.g., multiple institutions, multiple principal investigators, multiple research programs and studies, and multimodal acquisitions. We have collected and analyzed data sets on thousands of research participants and have set up a framework to automatically analyze the data, thereby making efficient, practical data mining of this vast resource possible. This paper presents a comprehensive framework for capturing and analyzing heterogeneous neuroscience research data sources that has been fully optimized for end-users to perform novel data mining. PMID:20461147

  7. An Investigation of Graduate Student Knowledge and Usage of Open-Access Journals

    ERIC Educational Resources Information Center

    Beard, Regina M.

    2016-01-01

    Graduate students lament the need to achieve the proficiency necessary to competently search multiple databases for their research assignments, regularly eschewing these sources in favor of Google Scholar or some other search engine. The author conducted an anonymous survey investigating graduate student knowledge or awareness of the open-access…

  8. Analysing and correcting the differences between multi-source and multi-scale spatial remote sensing observations.

    PubMed

    Dong, Yingying; Luo, Ruisen; Feng, Haikuan; Wang, Jihua; Zhao, Jinling; Zhu, Yining; Yang, Guijun

    2014-01-01

    Differences exist among analysis results of agriculture monitoring and crop production based on remote sensing observations, which are obtained at different spatial scales from multiple remote sensors in the same time period and processed by the same algorithms, models or methods. These differences can be quantitatively described mainly from three aspects, i.e. multiple remote sensing observations, crop parameter estimation models, and spatial scale effects of surface parameters. Our research proposed a new method to analyse and correct the differences between multi-source and multi-scale spatial remote sensing surface reflectance datasets, aiming to provide references for further studies in agricultural applications with multiple remotely sensed observations from different sources. The new method was constructed on the basis of the physical and mathematical properties of multi-source and multi-scale reflectance datasets. Statistical theory was used to extract the statistical characteristics of the multiple surface reflectance datasets and to quantitatively analyse the spatial variations of these characteristics at multiple spatial scales. Then, taking the surface reflectance at the small spatial scale as the baseline data, the Gaussian distribution was used to correct the multiple surface reflectance datasets on the basis of the physical characteristics, mathematical distribution properties and spatial variations obtained above. The proposed method was verified with two sets of multiple satellite images, obtained in two experimental fields located in Inner Mongolia and Beijing, China, with different degrees of homogeneity of the underlying surfaces. Experimental results indicate that differences among surface reflectance datasets at multiple spatial scales could be effectively corrected over non-homogeneous underlying surfaces, providing a database for further multi-source and multi-scale crop growth monitoring and yield prediction and the corresponding consistency analysis and evaluation.
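    A heavily simplified numerical sketch of the correction idea follows: the coarse-scale dataset is rescaled so that its mean and standard deviation match the fine-scale baseline (Gaussian moment matching). The synthetic arrays stand in for real reflectance data, and the published method involves more than this single step.

```python
# Moment-matching sketch on synthetic reflectance arrays; not the published algorithm.
import numpy as np

rng = np.random.default_rng(0)
baseline = rng.normal(0.18, 0.03, size=(200, 200))   # fine-scale reference reflectance
coarse   = rng.normal(0.22, 0.05, size=(50, 50))      # coarser, offset dataset

def moment_match(data, reference):
    z = (data - data.mean()) / data.std()              # standardise the coarse data
    return z * reference.std() + reference.mean()      # impose the baseline moments

corrected = moment_match(coarse, baseline)
print(f"before: mean={coarse.mean():.3f} std={coarse.std():.3f}")
print(f"after:  mean={corrected.mean():.3f} std={corrected.std():.3f}")
```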

  9. Analysing and Correcting the Differences between Multi-Source and Multi-Scale Spatial Remote Sensing Observations

    PubMed Central

    Dong, Yingying; Luo, Ruisen; Feng, Haikuan; Wang, Jihua; Zhao, Jinling; Zhu, Yining; Yang, Guijun

    2014-01-01

    Differences exist among analysis results of agriculture monitoring and crop production based on remote sensing observations, which are obtained at different spatial scales from multiple remote sensors in the same time period and processed by the same algorithms, models or methods. These differences can be quantitatively described mainly from three aspects, i.e. multiple remote sensing observations, crop parameter estimation models, and spatial scale effects of surface parameters. Our research proposed a new method to analyse and correct the differences between multi-source and multi-scale spatial remote sensing surface reflectance datasets, aiming to provide references for further studies in agricultural applications with multiple remotely sensed observations from different sources. The new method was constructed on the basis of the physical and mathematical properties of multi-source and multi-scale reflectance datasets. Statistical theory was used to extract the statistical characteristics of the multiple surface reflectance datasets and to quantitatively analyse the spatial variations of these characteristics at multiple spatial scales. Then, taking the surface reflectance at the small spatial scale as the baseline data, the Gaussian distribution was used to correct the multiple surface reflectance datasets on the basis of the physical characteristics, mathematical distribution properties and spatial variations obtained above. The proposed method was verified with two sets of multiple satellite images, obtained in two experimental fields located in Inner Mongolia and Beijing, China, with different degrees of homogeneity of the underlying surfaces. Experimental results indicate that differences among surface reflectance datasets at multiple spatial scales could be effectively corrected over non-homogeneous underlying surfaces, providing a database for further multi-source and multi-scale crop growth monitoring and yield prediction and the corresponding consistency analysis and evaluation. PMID:25405760

  10. Multiple Solutions of Real-time Tsunami Forecasting Using Short-term Inundation Forecasting for Tsunamis Tool

    NASA Astrophysics Data System (ADS)

    Gica, E.

    2016-12-01

    The Short-term Inundation Forecasting for Tsunamis (SIFT) tool, developed by NOAA Center for Tsunami Research (NCTR) at the Pacific Marine Environmental Laboratory (PMEL), is used in forecast operations at the Tsunami Warning Centers in Alaska and Hawaii. The SIFT tool relies on a pre-computed tsunami propagation database, real-time DART buoy data, and an inversion algorithm to define the tsunami source. The tsunami propagation database is composed of 50×100km unit sources, simulated basin-wide for at least 24 hours. Different combinations of unit sources, DART buoys, and length of real-time DART buoy data can generate a wide range of results within the defined tsunami source. For an inexperienced SIFT user, the primary challenge is to determine which solution, among multiple solutions for a single tsunami event, would provide the best forecast in real time. This study investigates how the use of different tsunami sources affects simulated tsunamis at tide gauge locations. Using the tide gauge at Hilo, Hawaii, a total of 50 possible solutions for the 2011 Tohoku tsunami are considered. Maximum tsunami wave amplitude and root mean square error results are used to compare tide gauge data and the simulated tsunami time series. Results of this study will facilitate SIFT users' efforts to determine if the simulated tide gauge tsunami time series from a specific tsunami source solution would be within the range of possible solutions. This study will serve as the basis for investigating more historical tsunami events and tide gauge locations.
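    The two comparison metrics named above, maximum amplitude and root mean square error between an observed tide-gauge series and a simulated series, can be computed as in the sketch below; the arrays are placeholders, not Hilo gauge data.

```python
# Placeholder series standing in for a de-tided gauge record and one SIFT solution.
import numpy as np

observed  = np.array([0.02, 0.15, 0.60, 0.45, -0.30, -0.10])   # metres
simulated = np.array([0.01, 0.12, 0.55, 0.50, -0.25, -0.12])   # metres

max_amp_obs = np.max(np.abs(observed))
max_amp_sim = np.max(np.abs(simulated))
rmse = np.sqrt(np.mean((observed - simulated) ** 2))

print(f"max amplitude: observed {max_amp_obs:.2f} m, simulated {max_amp_sim:.2f} m")
print(f"RMSE = {rmse:.3f} m")
```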

  11. An open source web interface for linking models to infrastructure system databases

    NASA Astrophysics Data System (ADS)

    Knox, S.; Mohamed, K.; Harou, J. J.; Rheinheimer, D. E.; Medellin-Azuara, J.; Meier, P.; Tilmant, A.; Rosenberg, D. E.

    2016-12-01

    Models of networked engineered resource systems such as water or energy systems are often built collaboratively with developers from different domains working at different locations. These models can be linked to large scale real world databases, and they are constantly being improved and extended. As the development and application of these models becomes more sophisticated, and the computing power required for simulations and/or optimisations increases, so has the need grown for online services and tools which enable the efficient development and deployment of these models. Hydra Platform is an open source, web-based data management system, which allows modellers of network-based models to remotely store network topology and associated data in a generalised manner, allowing it to serve multiple disciplines. Hydra Platform exposes a JSON web API that allows external programs (referred to as 'Apps') to interact with its stored networks and perform actions such as importing data, running models, or exporting the networks to different formats. Hydra Platform supports multiple users accessing the same network and has a suite of functions for managing users and data. We present an ongoing development in Hydra Platform, the Hydra Web User Interface, through which users can collaboratively manage network data and models in a web browser. The web interface allows multiple users to graphically access, edit and share their networks, run apps and view results. Through apps, which are located on the server, the web interface can give users access to external data sources and models without the need to install or configure any software. This also ensures model results can be reproduced by removing platform or version dependence. Managing data and deploying models via the web interface provides a way for multiple modellers to collaboratively manage data, deploy and monitor model runs and analyse results.
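    The sketch below illustrates how an external 'App' might push a small network topology to a JSON web service of this kind; the endpoint URL, payload shape, and absence of authentication are hypothetical simplifications, not Hydra Platform's documented API.

```python
# Hypothetical client sketch: URL and payload structure are invented for illustration.
import json
import requests

network = {
    "name": "demo water system",
    "nodes": [{"id": 1, "name": "reservoir"}, {"id": 2, "name": "city demand"}],
    "links": [{"id": 10, "name": "main canal", "node_1_id": 1, "node_2_id": 2}],
}

response = requests.post(
    "https://example.org/hydra/add_network",      # placeholder URL, not a real endpoint
    data=json.dumps({"network": network}),
    headers={"Content-Type": "application/json"},
    timeout=30,
)
response.raise_for_status()
print(response.json())
```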

  12. PhamDB: a web-based application for building Phamerator databases.

    PubMed

    Lamine, James G; DeJong, Randall J; Nelesen, Serita M

    2016-07-01

    PhamDB is a web application which creates databases of bacteriophage genes, grouped by gene similarity. It is backwards compatible with the existing Phamerator desktop software while providing an improved database creation workflow. Key features include a graphical user interface, validation of uploaded GenBank files, and the ability to import phages from existing databases, modify existing databases and queue multiple jobs. Source code and installation instructions for Linux, Windows and Mac OS X are freely available at https://github.com/jglamine/phage. PhamDB is also distributed as a Docker image which can be managed via Kitematic. This Docker image contains the application and all third-party software dependencies as a pre-configured system, and is freely available via the installation instructions provided. Contact: snelesen@calvin.edu. © The Author 2016. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com.

  13. Non-redundant patent sequence databases with value-added annotations at two levels

    PubMed Central

    Li, Weizhong; McWilliam, Hamish; de la Torre, Ana Richart; Grodowski, Adam; Benediktovich, Irina; Goujon, Mickael; Nauche, Stephane; Lopez, Rodrigo

    2010-01-01

    The European Bioinformatics Institute (EMBL-EBI) provides public access to patent data, including abstracts, chemical compounds and sequences. Sequences can appear multiple times due to the filing of the same invention with multiple patent offices, or the use of the same sequence by different inventors in different contexts. Information relating to the source invention may be incomplete, and biological information available in patent documents elsewhere may not be reflected in the annotation of the sequence. Search and analysis of these data have become increasingly challenging for both the scientific and intellectual-property communities. Here, we report a collection of non-redundant patent sequence databases, which cover the EMBL-Bank nucleotides patent class and the patent protein databases and contain value-added annotations from patent documents. The databases were created at two levels by the use of sequence MD5 checksums. Sequences within a level-1 cluster are 100% identical over their whole length. Level-2 clusters were defined by sub-grouping level-1 clusters based on patent family information. Value-added annotations, such as publication number corrections, earliest publication dates and feature collations, significantly enhance the quality of the data, allowing for better tracking and cross-referencing. The databases are available at: http://www.ebi.ac.uk/patentdata/nr/. PMID:19884134
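    Level-1 clustering as described, grouping sequences by the MD5 checksum of the full-length sequence so that cluster members are 100% identical, can be sketched as follows; the records and the patent-family field used for level-2 sub-grouping are invented.

```python
# Toy records: accessions, families and sequences are invented for illustration.
import hashlib
from collections import defaultdict

records = [
    {"acc": "A1", "family": "EP123", "seq": "MKTAYIAKQR"},
    {"acc": "B7", "family": "US456", "seq": "MKTAYIAKQR"},   # identical sequence, different filing
    {"acc": "C3", "family": "EP123", "seq": "MSTNPKPQRK"},
]

# Level-1: group by MD5 of the full-length sequence (members are 100% identical)
level1 = defaultdict(list)
for rec in records:
    checksum = hashlib.md5(rec["seq"].encode()).hexdigest()
    level1[checksum].append(rec)

for checksum, members in level1.items():
    # Level-2 would sub-group each cluster further by patent family
    families = sorted({m["family"] for m in members})
    print(checksum[:8], [m["acc"] for m in members], "families:", families)
```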

  14. Non-redundant patent sequence databases with value-added annotations at two levels.

    PubMed

    Li, Weizhong; McWilliam, Hamish; de la Torre, Ana Richart; Grodowski, Adam; Benediktovich, Irina; Goujon, Mickael; Nauche, Stephane; Lopez, Rodrigo

    2010-01-01

    The European Bioinformatics Institute (EMBL-EBI) provides public access to patent data, including abstracts, chemical compounds and sequences. Sequences can appear multiple times due to the filing of the same invention with multiple patent offices, or the use of the same sequence by different inventors in different contexts. Information relating to the source invention may be incomplete, and biological information available in patent documents elsewhere may not be reflected in the annotation of the sequence. Search and analysis of these data have become increasingly challenging for both the scientific and intellectual-property communities. Here, we report a collection of non-redundant patent sequence databases, which cover the EMBL-Bank nucleotides patent class and the patent protein databases and contain value-added annotations from patent documents. The databases were created at two levels by the use of sequence MD5 checksums. Sequences within a level-1 cluster are 100% identical over their whole length. Level-2 clusters were defined by sub-grouping level-1 clusters based on patent family information. Value-added annotations, such as publication number corrections, earliest publication dates and feature collations, significantly enhance the quality of the data, allowing for better tracking and cross-referencing. The databases are available at: http://www.ebi.ac.uk/patentdata/nr/.

  15. Identifying known unknowns using the US EPA's CompTox ...

    EPA Pesticide Factsheets

    Chemical features observed using high-resolution mass spectrometry can be tentatively identified using online chemical reference databases by searching molecular formulae and monoisotopic masses and then rank-ordering of the hits using appropriate relevance criteria. The most likely candidate “known unknowns,” which are those chemicals unknown to an investigator but contained within a reference database or literature source, rise to the top of a chemical list when rank-ordered by the number of associated data sources. The U.S. EPA’s CompTox Chemistry Dashboard is a curated and freely available resource for chemistry and computational toxicology research, containing more than 720,000 chemicals of relevance to environmental health science. In this research, the performance of the Dashboard for identifying “known unknowns” was evaluated against that of the online ChemSpider database, one of the primary resources used by mass spectrometrists, using multiple previously studied datasets reported in the peer-reviewed literature totaling 162 chemicals. These chemicals were examined using both applications via molecular formula and monoisotopic mass searches followed by rank-ordering of candidate compounds by associated references or data sources. A greater percentage of chemicals ranked in the top position when using the Dashboard, indicating an advantage of this application over ChemSpider for identifying known unknowns using data source ranking.
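    The ranking step described above, keeping candidates whose monoisotopic mass falls within a tolerance of the observed feature and ordering them by the number of associated data sources, is sketched below with an invented candidate list.

```python
# Toy candidate list; masses, names and data-source counts are illustrative only.
observed_mass = 151.0633          # monoisotopic mass of the unknown feature (Da)
tolerance_ppm = 5.0

candidates = [
    {"name": "acetaminophen", "mass": 151.0633, "data_sources": 9150},
    {"name": "candidate B",   "mass": 151.0634, "data_sources": 112},
    {"name": "candidate C",   "mass": 151.0629, "data_sources": 4},
]

def within_tolerance(cand):
    return abs(cand["mass"] - observed_mass) / observed_mass * 1e6 <= tolerance_ppm

# Keep mass matches, then rank-order by the number of associated data sources
hits = sorted(filter(within_tolerance, candidates),
              key=lambda c: c["data_sources"], reverse=True)
for rank, cand in enumerate(hits, start=1):
    print(rank, cand["name"], cand["data_sources"])
```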

  16. SchizConnect: Mediating Neuroimaging Databases on Schizophrenia and Related Disorders for Large-Scale Integration

    PubMed Central

    Wang, Lei; Alpert, Kathryn I.; Calhoun, Vince D.; Cobia, Derin J.; Keator, David B.; King, Margaret D.; Kogan, Alexandr; Landis, Drew; Tallis, Marcelo; Turner, Matthew D.; Potkin, Steven G.; Turner, Jessica A.; Ambite, Jose Luis

    2015-01-01

    SchizConnect (www.schizconnect.org) is built to address the issues of multiple data repositories in schizophrenia neuroimaging studies. It includes a level of mediation, translating across data sources, so that the user can place one query, e.g. for diffusion images from male individuals with schizophrenia, and find out from across participating data sources how many datasets there are, as well as download the imaging and related data. The current version handles the Data Usage Agreements across different studies, as well as interpreting database-specific terminologies into a common framework. New data repositories can also be mediated to bring immediate access to existing datasets. Compared with centralized, upload data sharing models, SchizConnect is a unique, virtual database with a focus on schizophrenia and related disorders that can mediate live data as information is being updated at each data source. It is our hope that SchizConnect can facilitate testing new hypotheses through aggregated datasets, promoting discovery related to the mechanisms underlying schizophrenic dysfunction. PMID:26142271

  17. Conversion of National Health Insurance Service-National Sample Cohort (NHIS-NSC) Database into Observational Medical Outcomes Partnership-Common Data Model (OMOP-CDM).

    PubMed

    You, Seng Chan; Lee, Seongwon; Cho, Soo-Yeon; Park, Hojun; Jung, Sungjae; Cho, Jaehyeong; Yoon, Dukyong; Park, Rae Woong

    2017-01-01

    It is increasingly necessary to generate medical evidence applicable to Asian people compared to those in Western countries. Observational Health Data Sciences and Informatics (OHDSI) is an international collaborative that aims to facilitate generating high-quality evidence via creating and applying open-source data analytic solutions to a large network of health databases across countries. We aimed to incorporate Korean nationwide cohort data into the OHDSI network by converting the national sample cohort into the Observational Medical Outcomes Partnership-Common Data Model (OMOP-CDM). Data from 1.13 million subjects were converted to OMOP-CDM, resulting in an average conversion rate of 99.1%. ACHILLES, an open-source OMOP-CDM-based data profiling tool, was run on the converted database to visualize data-driven characterizations and assess data quality. The OMOP-CDM version of the National Health Insurance Service-National Sample Cohort (NHIS-NSC) can be a valuable tool for multiple aspects of medical research by incorporation into the OHDSI research network.
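
    As a rough illustration of one ETL step in such a conversion (a sketch only; the vocabulary mapping below is invented, and real conversions rely on the OMOP standard vocabularies), a source condition code can be translated to a standard concept identifier and emitted as a CDM-style condition_occurrence row:

    ```python
    # Hypothetical source-to-standard vocabulary mapping (illustrative values only).
    SOURCE_TO_STANDARD = {
        ("KCD7", "J45"): 317009,  # e.g. an asthma code mapped to a standard concept id
    }

    def to_condition_occurrence(person_id, vocabulary, source_code, start_date):
        concept_id = SOURCE_TO_STANDARD.get((vocabulary, source_code), 0)  # 0 = unmapped
        return {
            "person_id": person_id,
            "condition_concept_id": concept_id,
            "condition_source_value": source_code,
            "condition_start_date": start_date,
        }

    print(to_condition_occurrence(1001, "KCD7", "J45", "2015-03-02"))
    ```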

  18. Data Sources for Trait Databases: Comparing the Phenomic Content of Monographs and Evolutionary Matrices.

    PubMed

    Dececchi, T Alex; Mabee, Paula M; Blackburn, David C

    2016-01-01

    Databases of organismal traits that aggregate information from one or multiple sources can be leveraged for large-scale analyses in biology. Yet the differences among these data streams and how well they capture trait diversity have never been explored. We present the first analysis of the differences between phenotypes captured in free text of descriptive publications ('monographs') and those used in phylogenetic analyses ('matrices'). We focus our analysis on osteological phenotypes of the limbs of four extinct vertebrate taxa critical to our understanding of the fin-to-limb transition. We find that there is low overlap between the anatomical entities used in these two sources of phenotype data, indicating that phenotypes represented in matrices are not simply a subset of those found in monographic descriptions. Perhaps as expected, compared to characters found in matrices, phenotypes in monographs tend to emphasize descriptive and positional morphology, be somewhat more complex, and relate to fewer additional taxa. While based on a small set of focal taxa, these qualitative and quantitative data suggest that either source of phenotypes alone will result in incomplete knowledge of variation for a given taxon. As a broader community develops to use and expand databases characterizing organismal trait diversity, it is important to recognize the limitations of the data sources and develop strategies to more fully characterize variation both within species and across the tree of life.

  19. Data Sources for Trait Databases: Comparing the Phenomic Content of Monographs and Evolutionary Matrices

    PubMed Central

    Dececchi, T. Alex; Mabee, Paula M.; Blackburn, David C.

    2016-01-01

    Databases of organismal traits that aggregate information from one or multiple sources can be leveraged for large-scale analyses in biology. Yet the differences among these data streams and how well they capture trait diversity have never been explored. We present the first analysis of the differences between phenotypes captured in free text of descriptive publications (‘monographs’) and those used in phylogenetic analyses (‘matrices’). We focus our analysis on osteological phenotypes of the limbs of four extinct vertebrate taxa critical to our understanding of the fin-to-limb transition. We find that there is low overlap between the anatomical entities used in these two sources of phenotype data, indicating that phenotypes represented in matrices are not simply a subset of those found in monographic descriptions. Perhaps as expected, compared to characters found in matrices, phenotypes in monographs tend to emphasize descriptive and positional morphology, be somewhat more complex, and relate to fewer additional taxa. While based on a small set of focal taxa, these qualitative and quantitative data suggest that either source of phenotypes alone will result in incomplete knowledge of variation for a given taxon. As a broader community develops to use and expand databases characterizing organismal trait diversity, it is important to recognize the limitations of the data sources and develop strategies to more fully characterize variation both within species and across the tree of life. PMID:27191170
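
    The low overlap reported above is essentially a set comparison between the anatomical entities mentioned in each source. A minimal sketch (the entity lists are invented for illustration) computes a Jaccard overlap for one taxon:

    ```python
    def jaccard(a, b):
        """Proportion of anatomical entities shared by two phenotype sources."""
        a, b = set(a), set(b)
        return len(a & b) / len(a | b) if (a | b) else 0.0

    monograph_entities = {"humerus", "radius", "ulna", "deltopectoral crest"}
    matrix_entities = {"humerus", "ulna", "digit IV"}
    print(f"overlap = {jaccard(monograph_entities, matrix_entities):.2f}")
    ```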

  20. DamaGIS: a multisource geodatabase for collection of flood-related damage data

    NASA Astrophysics Data System (ADS)

    Saint-Martin, Clotilde; Javelle, Pierre; Vinet, Freddy

    2018-06-01

    Every year in France, recurring flood events result in several million euros of damage, and reducing the heavy consequences of floods has become a high priority. However, actions to reduce the impact of floods are often hindered by the lack of damage data on past flood events. The present paper introduces a new database for collection and assessment of flood-related damage. The DamaGIS database offers an innovative bottom-up approach to gather and identify damage data from multiple sources, including new media. The study area has been defined as the south of France considering the high frequency of floods over the past years. This paper presents the structure and contents of the database. It also presents operating instructions in order to keep collecting damage data within the database. This paper also describes an easily reproducible method to assess the severity of flood damage regardless of the location or date of occurrence. A first analysis of the damage contents is also provided in order to assess data quality and the relevance of the database. According to this analysis, despite its lack of comprehensiveness, the DamaGIS database presents many advantages. Indeed, DamaGIS provides a high accuracy of data as well as simplicity of use. It also has the additional benefit of being accessible in multiple formats and is open access. The DamaGIS database is available at https://doi.org/10.5281/zenodo.1241089.

  1. PICKLE 2.0: A human protein-protein interaction meta-database employing data integration via genetic information ontology

    PubMed Central

    Gioutlakis, Aris; Klapa, Maria I.

    2017-01-01

    It has been acknowledged that source databases recording experimentally supported human protein-protein interactions (PPIs) exhibit limited overlap. Thus, the reconstruction of a comprehensive PPI network requires appropriate integration of multiple heterogeneous primary datasets, presenting the PPIs at various genetic reference levels. Existing PPI meta-databases perform integration via normalization; namely, PPIs are merged after being converted to a certain target level. Hence, the node set of the integrated network depends each time on the number and type of the combined datasets. Moreover, the irreversible a priori normalization process hinders the identification of normalization artifacts in the integrated network, which originate from the nonlinearity characterizing the genetic information flow. PICKLE (Protein InteraCtion KnowLedgebasE) 2.0 implements a new architecture for this recently introduced human PPI meta-database. Its main novel feature over the existing meta-databases is its approach to primary PPI dataset integration via genetic information ontology. Building upon the PICKLE principles of using the reviewed human complete proteome (RHCP) of UniProtKB/Swiss-Prot as the reference protein interactor set, and filtering out protein interactions with low probability of being direct based on the available evidence, PICKLE 2.0 first assembles the RHCP genetic information ontology network by connecting the corresponding genes, nucleotide sequences (mRNAs) and proteins (UniProt entries) and then integrates PPI datasets by superimposing them on the ontology network without any a priori transformations. Importantly, this process allows the resulting heterogeneous integrated network to be reversibly normalized to any level of genetic reference without loss of the original information, the latter being used for identification of normalization biases, and enables the appraisal of potential false positive interactions through PPI source database cross-checking. The PICKLE web-based interface (www.pickle.gr) allows for the simultaneous query of multiple entities and provides integrated human PPI networks at either the protein (UniProt) or the gene level, at three PPI filtering modes. PMID:29023571
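
    The integration-without-prior-normalization idea can be sketched as follows (a toy illustration, not the PICKLE implementation; the identifiers are examples): every gene, mRNA and protein identifier is indexed against its RHCP protein anchor, primary PPI records are stored at whatever level they were reported, and a protein-level (or gene-level) network can then be read out without discarding the original records.

    ```python
    ONTOLOGY = {
        # UniProt protein -> linked gene and mRNA identifiers
        "P04637": {"gene": "TP53", "mrna": "NM_000546"},
        "Q00987": {"gene": "MDM2", "mrna": "NM_002392"},
    }

    # Index every identifier back to its protein anchor, remembering its level.
    LEVEL_OF = {}
    for protein, links in ONTOLOGY.items():
        LEVEL_OF[protein] = ("protein", protein)
        LEVEL_OF[links["gene"]] = ("gene", protein)
        LEVEL_OF[links["mrna"]] = ("mRNA", protein)

    def integrate(ppi_records):
        """Attach each primary record at its reported level; group by protein pair."""
        edges = {}
        for a, b, source in ppi_records:
            (lvl_a, pa), (lvl_b, pb) = LEVEL_OF[a], LEVEL_OF[b]
            key = tuple(sorted((pa, pb)))
            edges.setdefault(key, []).append(
                {"source": source, "as_reported": (a, b), "levels": (lvl_a, lvl_b)})
        return edges

    print(integrate([("TP53", "MDM2", "IntAct"), ("P04637", "Q00987", "BioGRID")]))
    ```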

  2. Combining Evidence of Preferential Gene-Tissue Relationships from Multiple Sources

    PubMed Central

    Guo, Jing; Hammar, Mårten; Öberg, Lisa; Padmanabhuni, Shanmukha S.; Bjäreland, Marcus; Dalevi, Daniel

    2013-01-01

    An important challenge in drug discovery and disease prognosis is to predict genes that are preferentially expressed in one or a few tissues, i.e. showing a considerably higher expression in one tissue(s) compared to the others. Although several data sources and methods have been published explicitly for this purpose, they often disagree and it is not evident how to retrieve these genes and how to distinguish true biological findings from those that are due to choice-of-method and/or experimental settings. In this work we have developed a computational approach that combines results from multiple methods and datasets with the aim to eliminate method/study-specific biases and to improve the predictability of preferentially expressed human genes. A rule-based score is used to merge and assign support to the results. Five sets of genes with known tissue specificity were used for parameter pruning and cross-validation. In total we identify 3434 tissue-specific genes. We compare the genes of highest scores with the public databases: PaGenBase (microarray), TiGER (EST) and HPA (protein expression data). The results have 85% overlap to PaGenBase, 71% to TiGER and only 28% to HPA. 99% of our predictions have support from at least one of these databases. Our approach also performs better than any of the databases on identifying drug targets and biomarkers with known tissue-specificity. PMID:23950964
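
    A stripped-down stand-in for the rule-based support score (not the published scheme; the thresholds and calls are invented) simply counts how many independent methods or datasets nominate the same gene-tissue pair and keeps pairs that reach a support threshold:

    ```python
    from collections import Counter

    def combine_calls(calls, min_support=2):
        """Keep (gene, tissue) predictions nominated by enough independent sources."""
        support = Counter(calls)
        return {pair: n for pair, n in support.items() if n >= min_support}

    calls = [
        ("ALB", "liver"),    # method A
        ("ALB", "liver"),    # method B
        ("ALB", "kidney"),   # method C disagrees
        ("KCNQ1", "heart"),  # single source only
    ]
    print(combine_calls(calls))  # {('ALB', 'liver'): 2}
    ```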

  3. Constructing a Geology Ontology Using a Relational Database

    NASA Astrophysics Data System (ADS)

    Hou, W.; Yang, L.; Yin, S.; Ye, J.; Clarke, K.

    2013-12-01

    In the geology community, the creation of a common geology ontology has become a useful means to solve problems of data integration, knowledge transformation and the interoperation of multi-source, heterogeneous and multiple scale geological data. Currently, human-computer interaction methods and relational database-based methods are the primary ontology construction methods. Some human-computer interaction methods such as the Geo-rule based method, the ontology life cycle method and the module design method have been proposed for applied geological ontologies. Essentially, the relational database-based method is a reverse engineering of abstracted semantic information from an existing database. The key is to construct rules for the transformation of database entities into the ontology. Relative to the human-computer interaction method, relational database-based methods can use existing resources and the stated semantic relationships among geological entities. However, two problems challenge their development and application. One is the transformation of multiple inheritances and nested relationships and their representation in an ontology. The other is that most of these methods do not measure the semantic retention of the transformation process. In this study, we focused on constructing a rule set to convert the semantics in a geological database into a geological ontology. According to the relational schema of a geological database, a conversion approach is presented to convert a geological spatial database to an OWL-based geological ontology, which is based on identifying semantics such as entities, relationships, inheritance relationships, nested relationships and cluster relationships. The semantic integrity of the transformation was verified using an inverse mapping process. In a geological ontology, inheritance and union operations between superclass and subclass were used to represent the nested relationships in a geochronology and the multiple inheritance relationship. Based on a Quaternary database of downtown Foshan city, Guangdong Province, in southern China, a geological ontology was constructed using the proposed method. To measure the maintenance of semantics in the conversion process and the results, an inverse mapping from the ontology to a relational database was tested based on a proposed conversion rule. The comparison of schema and entities and the reduction of tables between the inverse database and the original database illustrated that the proposed method retains the semantic information well during the conversion process. An application for abstracting sandstone information showed that semantic relationships among concepts in the geological database were successfully reorganized in the constructed ontology. Key words: geological ontology; geological spatial database; multiple inheritance; OWL Acknowledgement: This research is jointly funded by the Specialized Research Fund for the Doctoral Program of Higher Education of China (RFDP) (20100171120001), NSFC (41102207) and the Fundamental Research Funds for the Central Universities (12lgpy19).
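
    One family of conversion rules of the kind discussed above can be sketched briefly (an illustrative rule set, not the authors' published rules): a table becomes an OWL class, a plain column becomes a datatype property, and a foreign key becomes an object property whose range is the referenced table's class.

    ```python
    def table_to_owl(table, columns, foreign_keys):
        """Emit Turtle-style statements for one relational table (illustration only)."""
        lines = [f":{table} a owl:Class ."]
        for col in columns:
            lines.append(f":{table}_{col} a owl:DatatypeProperty ; rdfs:domain :{table} .")
        for col, ref in foreign_keys.items():
            lines.append(f":{table}_{col} a owl:ObjectProperty ; "
                         f"rdfs:domain :{table} ; rdfs:range :{ref} .")
        return "\n".join(lines)

    print(table_to_owl("Stratum",
                       columns=["name", "thickness_m"],
                       foreign_keys={"period_id": "GeologicPeriod"}))
    ```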

  4. Exploring consumer exposure pathways and patterns of use for chemicals in the environment.

    PubMed

    Dionisio, Kathie L; Frame, Alicia M; Goldsmith, Michael-Rock; Wambaugh, John F; Liddell, Alan; Cathey, Tommy; Smith, Doris; Vail, James; Ernstoff, Alexi S; Fantke, Peter; Jolliet, Olivier; Judson, Richard S

    2015-01-01

    Humans are exposed to thousands of chemicals in the workplace, home, and via air, water, food, and soil. A major challenge in estimating chemical exposures is to understand which chemicals are present in these media and microenvironments. Here we describe the Chemical/Product Categories Database (CPCat), a new, publicly available (http://actor.epa.gov/cpcat) database of information on chemicals mapped to "use categories" describing the usage or function of the chemical. CPCat was created by combining multiple and diverse sources of data on consumer- and industrial-process based chemical uses from regulatory agencies, manufacturers, and retailers in various countries. The database uses a controlled vocabulary of 833 terms and a novel nomenclature to capture and streamline descriptors of chemical use for 43,596 chemicals from the various sources. Examples of potential applications of CPCat are provided, including identifying chemicals to which children may be exposed and supporting prioritization of chemicals for toxicity screening. CPCat is expected to be a valuable resource for regulators, risk assessors, and exposure scientists to identify potential sources of human exposures and exposure pathways, particularly for use in high-throughput chemical exposure assessment.

  5. Evaluating Land-Atmosphere Interactions with the North American Soil Moisture Database

    NASA Astrophysics Data System (ADS)

    Giles, S. M.; Quiring, S. M.; Ford, T.; Chavez, N.; Galvan, J.

    2015-12-01

    The North American Soil Moisture Database (NASMD) is a high-quality observational soil moisture database that was developed to study land-atmosphere interactions. It includes over 1,800 monitoring stations in the United States, Canada and Mexico. Soil moisture data are collected from multiple sources, quality controlled and integrated into an online database (soilmoisture.tamu.edu). The period of record varies substantially and only a few of these stations have an observation record extending back into the 1990s. Daily soil moisture observations have been quality controlled using the North American Soil Moisture Database QAQC algorithm. The database is designed to facilitate observationally-driven investigations of land-atmosphere interactions, validation of the accuracy of soil moisture simulations in global land surface models, satellite calibration/validation for SMOS and SMAP, and an improved understanding of how soil moisture influences climate on seasonal to interannual timescales. This paper provides some examples of how the NASMD has been utilized to enhance understanding of land-atmosphere interactions in the U.S. Great Plains.

  6. Human grasping database for activities of daily living with depth, color and kinematic data streams.

    PubMed

    Saudabayev, Artur; Rysbek, Zhanibek; Khassenova, Raykhan; Varol, Huseyin Atakan

    2018-05-29

    This paper presents a grasping database collected from multiple human subjects for activities of daily living in unstructured environments. The main strength of this database is the use of three different sensing modalities: color images from a head-mounted action camera, distance data from a depth sensor on the dominant arm and upper body kinematic data acquired from an inertial motion capture suit. 3826 grasps were identified in the data collected during 9-hours of experiments. The grasps were grouped according to a hierarchical taxonomy into 35 different grasp types. The database contains information related to each grasp and associated sensor data acquired from the three sensor modalities. We also provide our data annotation software written in Matlab as an open-source tool. The size of the database is 172 GB. We believe this database can be used as a stepping stone to develop big data and machine learning techniques for grasping and manipulation with potential applications in rehabilitation robotics and intelligent automation.

  7. ClusterMine360: a database of microbial PKS/NRPS biosynthesis

    PubMed Central

    Conway, Kyle R.; Boddy, Christopher N.

    2013-01-01

    ClusterMine360 (http://www.clustermine360.ca/) is a database of microbial polyketide and non-ribosomal peptide gene clusters. It takes advantage of crowd-sourcing by allowing members of the community to make contributions while automation is used to help achieve high data consistency and quality. The database currently has >200 gene clusters from >185 compound families. It also features a unique sequence repository containing >10 000 polyketide synthase/non-ribosomal peptide synthetase domains. The sequences are filterable and downloadable as individual or multiple sequence FASTA files. We are confident that this database will be a useful resource for members of the polyketide synthases/non-ribosomal peptide synthetases research community, enabling them to keep up with the growing number of sequenced gene clusters and rapidly mine these clusters for functional information. PMID:23104377

  8. A geodata warehouse: Using denormalisation techniques as a tool for delivering spatially enabled integrated geological information to geologists

    NASA Astrophysics Data System (ADS)

    Kingdon, Andrew; Nayembil, Martin L.; Richardson, Anne E.; Smith, A. Graham

    2016-11-01

    New requirements to understand geological properties in three dimensions have led to the development of PropBase, a data structure and delivery tools designed to meet this need. At the BGS, relational database management systems (RDBMS) have facilitated effective data management using normalised, subject-based database designs with business rules in a centralised, vocabulary-controlled architecture. These have delivered effective data storage in a secure environment. However, isolated subject-oriented designs prevented efficient cross-domain querying of datasets. Additionally, the tools provided often did not enable effective data discovery, as they struggled to resolve the complex underlying normalised structures, resulting in poor data access speeds. Users developed bespoke access tools for structures they did not fully understand, sometimes obtaining incorrect results. Therefore, BGS has developed PropBase, a generic denormalised data structure within an RDBMS to store property data, to facilitate rapid and standardised data discovery and access, incorporating 2D and 3D physical and chemical property data with associated metadata. This includes scripts to populate and synchronise the layer with its data sources through structured input and transcription standards. A core component of the architecture is an optimised query object, which delivers geoscience information from a structure equivalent to a data warehouse. This enables optimised query performance and delivers data in multiple standardised formats using a web discovery tool. Semantic interoperability is enforced through vocabularies combined from all data sources, facilitating searching of related terms. PropBase holds 28.1 million spatially enabled property data points from 10 source databases, incorporating over 50 property data types with a vocabulary set that includes 557 property terms. By enabling property data searches across multiple databases, PropBase has facilitated new scientific research previously considered impractical. PropBase is easily extended to incorporate 4D data (time series) and is providing a baseline for new "big data" monitoring projects.
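
    The denormalisation idea, one wide property layer queried across source databases in a single statement, can be sketched with an in-memory table. The column names and values below are invented for the example and are not the BGS schema.

    ```python
    import sqlite3

    con = sqlite3.connect(":memory:")
    con.execute("""CREATE TABLE propbase (
        source_db TEXT, property TEXT, value REAL, units TEXT,
        easting REAL, northing REAL, depth_m REAL)""")
    con.executemany("INSERT INTO propbase VALUES (?,?,?,?,?,?,?)", [
        ("geochemistry", "arsenic",      12.5,  "mg/kg", 430100, 385200,  1.0),
        ("geophysics",   "resistivity", 240.0,  "ohm.m", 430150, 385180,  5.0),
        ("boreholes",    "porosity",      0.21, "frac",  430120, 385210, 12.0),
    ])
    # One spatial, cross-domain query instead of three database-specific ones.
    print(con.execute("""SELECT source_db, property, value, units FROM propbase
                         WHERE easting BETWEEN 430000 AND 430200
                         ORDER BY property""").fetchall())
    ```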

  9. Data integration and warehousing: coordination between newborn screening and related public health programs.

    PubMed

    Therrell, Bradford L

    2003-01-01

    At birth, patient demographic and health information begin to accumulate in varied databases. There are often multiple sources of the same or similar data. New public health programs are often created without considering data linkages. Recently, newborn hearing screening (NHS) programs and immunization programs have virtually ignored the existence of newborn dried blood spot (DBS) screening databases containing similar demographic data, creating data duplication in their 'new' systems. Some progressive public health departments are developing data warehouses of basic, recurrent patient information, and linking these databases to other health program databases where programs and services can benefit from such linkages. Demographic data warehousing saves time (and money) by eliminating duplicative data entry and reducing the chances of data errors. While newborn screening data are usually the first data available, they should not be the only data source considered for early data linkage or for populating a data warehouse. Birth certificate information should also be considered along with other data sources for infants who may not have received newborn screening or who may have been born outside of the jurisdiction and not have birth certificate information locally available. A newborn screening serial number provides a convenient identification number for use in the DBS program and for linking with other systems. As a minimum, data linkages should exist between newborn dried blood spot screening, newborn hearing screening, immunizations, birth certificates and birth defect registries.

  10. Development of a Knowledgebase (MetRxn) of Metabolites, Reactions and Atom Mappings to Accelerate Discovery and Redesign

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Maranas, Costas D.

    With advances in DNA sequencing and genome annotation techniques, the breadth of metabolic knowledge across all kingdoms of life is increasing. The construction of genome-scale models (GSMs) facilitates this distillation of knowledge by systematically accounting for reaction stoichiometry and directionality, gene to protein to reaction relationships, reaction localization among cellular organelles, metabolite transport costs and routes, transcriptional regulation, and biomass composition. Genome-scale reconstructions available now span across all kingdoms of life, from microbes to whole-plant models, and have become indispensable for driving informed metabolic designs and interventions. A key barrier to the pace of this development is our inability to utilize metabolite/reaction information from databases such as BRENDA [1], KEGG [2], MetaCyc [3], etc. due to incompatibilities of representation, duplications, and errors. Duplicate entries constitute a major impediment, where the same metabolite is found with multiple names across databases and models, which significantly slows down the collating of information from multiple data sources. This can also lead to serious modeling errors such as charge/mass imbalances [4,5] which can thwart model predictive abilities such as identifying synthetic lethal gene pairs and quantifying metabolic flows. Hence, we created the MetRxn database [6] that takes the next step in integrating data from multiple sources and formats to automatically create a standardized knowledgebase. We subsequently deployed this resource to bring about new paradigms in genome-scale metabolic model reconstruction, metabolic flux elucidation through MFA, modeling of microbial communities, and pathway prospecting. This research has enabled the PI’s group to continue building upon research milestones and reach new ones (see list of MetRxn-related publications below).

  11. Conceptual Model Formalization in a Semantic Interoperability Service Framework: Transforming Relational Database Schemas to OWL.

    PubMed

    Bravo, Carlos; Suarez, Carlos; González, Carolina; López, Diego; Blobel, Bernd

    2014-01-01

    Healthcare information is distributed through multiple heterogeneous and autonomous systems. Access to, and sharing of, distributed information sources are a challenging task. To contribute to meeting this challenge, this paper presents a formal, complete and semi-automatic transformation service from Relational Databases to Web Ontology Language. The proposed service makes use of an algorithm that allows to transform several data models of different domains by deploying mainly inheritance rules. The paper emphasizes the relevance of integrating the proposed approach into an ontology-based interoperability service to achieve semantic interoperability.

  12. PSD Applicability Determination for Multiple Owner/Operator Point Sources Within a Single Facility

    EPA Pesticide Factsheets

    This document may be of assistance in applying the Title V air operating permit regulations. This document is part of the Title V Policy and Guidance Database available at www2.epa.gov/title-v-operating-permits/title-v-operating-permit-policy-and-guidance-document-index. Some documents in the database are a scanned or retyped version of a paper photocopy of the original. Although we have taken considerable effort to quality assure the documents, some may contain typographical errors. Contact the office that issued the document if you need a copy of the original.

  13. Analysis of Ribosome Inactivating Protein (RIP): A Bioinformatics Approach

    NASA Astrophysics Data System (ADS)

    Jothi, G. Edward Gnana; Majilla, G. Sahaya Jose; Subhashini, D.; Deivasigamani, B.

    2012-10-01

    In spite of the medical advances in recent years, the world is in need of alternative sources to address certain health issues. Ribosome Inactivating Proteins (RIPs) were found to be one among them. In order to provide easy access to information about RIPs, there is a need to analyse RIPs towards constructing a database on RIPs. Also, multiple sequence alignment was done to screen for homologues of significant RIPs from rare sources against RIPs from easily available sources in terms of similarity. Protein sequences were retrieved from SWISS-PROT and further analysed using pairwise and multiple sequence alignment. Analysis shows that 151 RIPs have been characterized to date. Amongst them, there are 87 type I, 37 type II, 1 type III and 25 unknown RIPs. The sequence length information of various RIPs, indicating the availability of full or partial sequences, was also recorded. The multiple sequence alignment of 37 type I RIPs using the online server Multalin indicates the presence of 20 conserved residues. Pairwise alignment and multiple sequence alignment of certain selected RIPs in two groups, namely Group I and Group II, were carried out and the consensus level was found to be 98%, 98% and 90% respectively.

  14. System and method for integrating and accessing multiple data sources within a data warehouse architecture

    DOEpatents

    Musick, Charles R [Castro Valley, CA; Critchlow, Terence [Livermore, CA; Ganesh, Madhaven [San Jose, CA; Slezak, Tom [Livermore, CA; Fidelis, Krzysztof [Brentwood, CA

    2006-12-19

    A system and method is disclosed for integrating and accessing multiple data sources within a data warehouse architecture. The metadata formed by the present method provide a way to declaratively present domain specific knowledge, obtained by analyzing data sources, in a consistent and useable way. Four types of information are represented by the metadata: abstract concepts, databases, transformations and mappings. A mediator generator automatically generates data management computer code based on the metadata. The resulting code defines a translation library and a mediator class. The translation library provides a data representation for domain specific knowledge represented in a data warehouse, including "get" and "set" methods for attributes that call transformation methods and derive a value of an attribute if it is missing. The mediator class defines methods that take "distinguished" high-level objects as input and traverse their data structures and enter information into the data warehouse.
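
    The description of "get" methods that derive a missing attribute value can be illustrated with a small sketch; the class and attribute names are invented and this is not the patented implementation.

    ```python
    class SampleRecord:
        """Toy translation-library class whose getter derives a missing attribute."""

        def __init__(self, mass_g=None, volume_ml=None, density_g_ml=None):
            self._mass_g = mass_g
            self._volume_ml = volume_ml
            self._density_g_ml = density_g_ml

        @property
        def density_g_ml(self):
            # Derive the value via a transformation method if it was not stored.
            if self._density_g_ml is None and self._mass_g and self._volume_ml:
                self._density_g_ml = self._mass_g / self._volume_ml
            return self._density_g_ml

    print(SampleRecord(mass_g=10.0, volume_ml=4.0).density_g_ml)  # derived: 2.5
    ```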

  15. Data model and relational database design for the New Jersey Water-Transfer Data System (NJWaTr)

    USGS Publications Warehouse

    Tessler, Steven

    2003-01-01

    The New Jersey Water-Transfer Data System (NJWaTr) is a database design for the storage and retrieval of water-use data. NJWaTr can manage data encompassing many facets of water use, including (1) the tracking of various types of water-use activities (withdrawals, returns, transfers, distributions, consumptive-use, wastewater collection, and treatment); (2) the storage of descriptions, classifications and locations of places and organizations involved in water-use activities; (3) the storage of details about measured or estimated volumes of water associated with water-use activities; and (4) the storage of information about data sources and water resources associated with water use. In NJWaTr, each water transfer occurs unidirectionally between two site objects, and the sites and conveyances form a water network. The core entities in the NJWaTr model are site, conveyance, transfer/volume, location, and owner. Other important entities include water resource (used for withdrawals and returns), data source, permit, and alias. Multiple water-exchange estimates based on different methods or data sources can be stored for individual transfers. Storage of user-defined details is accommodated for several of the main entities. Many tables contain classification terms to facilitate the detailed description of data items and can be used for routine or custom data summarization. NJWaTr accommodates single-user and aggregate-user water-use data, can be used for large or small water-network projects, and is available as a stand-alone Microsoft Access database. Data stored in the NJWaTr structure can be retrieved in user-defined combinations to serve visualization and analytical applications. Users can customize and extend the database, link it to other databases, or implement the design in other relational database applications.
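
    A compressed sketch of the core entities (site, conveyance, transfer/volume) shows how a unidirectional transfer between two sites can carry multiple volume estimates from different methods. The tables and columns are simplified illustrations, not the published schema.

    ```python
    import sqlite3

    con = sqlite3.connect(":memory:")
    con.executescript("""
    CREATE TABLE site       (site_id INTEGER PRIMARY KEY, name TEXT, kind TEXT);
    CREATE TABLE conveyance (conv_id INTEGER PRIMARY KEY,
                             from_site INTEGER REFERENCES site(site_id),
                             to_site   INTEGER REFERENCES site(site_id));
    CREATE TABLE transfer   (conv_id INTEGER REFERENCES conveyance(conv_id),
                             month TEXT, volume_mgal REAL, method TEXT);
    """)
    con.execute("INSERT INTO site VALUES (1, 'Well field A', 'withdrawal')")
    con.execute("INSERT INTO site VALUES (2, 'Treatment plant', 'treatment')")
    con.execute("INSERT INTO conveyance VALUES (10, 1, 2)")
    # Two volume estimates for the same transfer, from different methods.
    con.executemany("INSERT INTO transfer VALUES (?,?,?,?)", [
        (10, "2002-07", 31.2, "metered"),
        (10, "2002-07", 29.8, "estimated"),
    ])
    print(con.execute("""SELECT s1.name, s2.name, t.volume_mgal, t.method
                         FROM transfer t
                         JOIN conveyance c ON c.conv_id = t.conv_id
                         JOIN site s1 ON s1.site_id = c.from_site
                         JOIN site s2 ON s2.site_id = c.to_site""").fetchall())
    ```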

  16. A scalable database model for multiparametric time series: a volcano observatory case study

    NASA Astrophysics Data System (ADS)

    Montalto, Placido; Aliotta, Marco; Cassisi, Carmelo; Prestifilippo, Michele; Cannata, Andrea

    2014-05-01

    The variables collected by a sensor network constitute a heterogeneous data source that needs to be properly organized in order to be used in research and geophysical monitoring. By the term time series we refer to a set of observations of a given phenomenon acquired sequentially in time. When the time intervals are equally spaced one speaks of period or sampling frequency. Our work describes in detail a possible methodology for storage and management of time series using a specific data structure. We designed a framework, hereinafter called TSDSystem (Time Series Database System), in order to acquire time series from different data sources and standardize them within a relational database. The operation of standardization provides the ability to perform operations, such as query and visualization, of many measures synchronizing them using a common time scale. The proposed architecture follows a multiple layer paradigm (Loaders layer, Database layer and Business Logic layer). Each layer is specialized in performing particular operations for the reorganization and archiving of data from different sources such as ASCII, Excel, ODBC (Open DataBase Connectivity), files accessible from the Internet (web pages, XML). In particular, the loader layer performs a security check of the working status of each running software through a heartbeat system, in order to automate the discovery of acquisition issues and other warning conditions. Although our system has to manage huge amounts of data, performance is guaranteed by using a smart partitioning table strategy, that keeps the percentage of data stored in each database table balanced. TSDSystem also contains modules for the visualization of acquired data, that provide the possibility to query different time series on a specified time range, or follow the real-time signal acquisition, according to a data access policy from the users.

  17. A multidisciplinary database for geophysical time series management

    NASA Astrophysics Data System (ADS)

    Montalto, P.; Aliotta, M.; Cassisi, C.; Prestifilippo, M.; Cannata, A.

    2013-12-01

    The variables collected by a sensor network constitute a heterogeneous data source that needs to be properly organized in order to be used in research and geophysical monitoring. By the term time series we refer to a set of observations of a given phenomenon acquired sequentially in time. When the time intervals are equally spaced one speaks of period or sampling frequency. Our work describes in detail a possible methodology for storage and management of time series using a specific data structure. We designed a framework, hereinafter called TSDSystem (Time Series Database System), in order to acquire time series from different data sources and standardize them within a relational database. The operation of standardization provides the ability to perform operations, such as query and visualization, of many measures synchronizing them using a common time scale. The proposed architecture follows a multiple layer paradigm (Loaders layer, Database layer and Business Logic layer). Each layer is specialized in performing particular operations for the reorganization and archiving of data from different sources such as ASCII, Excel, ODBC (Open DataBase Connectivity), files accessible from the Internet (web pages, XML). In particular, the loader layer performs a security check of the working status of each running software through a heartbeat system, in order to automate the discovery of acquisition issues and other warning conditions. Although our system has to manage huge amounts of data, performance is guaranteed by using a smart partitioning table strategy, that keeps the percentage of data stored in each database table balanced. TSDSystem also contains modules for the visualization of acquired data, that provide the possibility to query different time series on a specified time range, or follow the real-time signal acquisition, according to a data access policy from the users.
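
    The standardization step described in the two entries above, putting heterogeneous acquisitions onto a common, equally spaced time scale so they can be queried together, can be sketched as a simple binning routine (illustrative only, not the TSDSystem code):

    ```python
    from datetime import datetime, timedelta

    def to_common_scale(samples, start, step_s, n_bins):
        """Average raw (timestamp, value) samples onto an equally spaced time scale."""
        bins = [[] for _ in range(n_bins)]
        for ts, value in samples:
            idx = int((ts - start).total_seconds() // step_s)
            if 0 <= idx < n_bins:
                bins[idx].append(value)
        return [sum(b) / len(b) if b else None for b in bins]

    t0 = datetime(2014, 1, 1)
    raw = [(t0 + timedelta(seconds=s), v) for s, v in [(2, 0.4), (31, 0.6), (65, 0.5)]]
    print(to_common_scale(raw, t0, step_s=30, n_bins=3))  # [0.4, 0.6, 0.5]
    ```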

  18. Column Store for GWAC: A High-cadence, High-density, Large-scale Astronomical Light Curve Pipeline and Distributed Shared-nothing Database

    NASA Astrophysics Data System (ADS)

    Wan, Meng; Wu, Chao; Wang, Jing; Qiu, Yulei; Xin, Liping; Mullender, Sjoerd; Mühleisen, Hannes; Scheers, Bart; Zhang, Ying; Nes, Niels; Kersten, Martin; Huang, Yongpan; Deng, Jinsong; Wei, Jianyan

    2016-11-01

    The ground-based wide-angle camera array (GWAC), a part of the SVOM space mission, will search for various types of optical transients by continuously imaging a field of view (FOV) of 5000 degrees2 every 15 s. Each exposure consists of 36 × 4k × 4k pixels, typically resulting in 36 × ˜175,600 extracted sources. For a modern time-domain astronomy project like GWAC, which produces massive amounts of data with a high cadence, it is challenging to search for short timescale transients in both real-time and archived data, and to build long-term light curves for variable sources. Here, we develop a high-cadence, high-density light curve pipeline (HCHDLP) to process the GWAC data in real-time, and design a distributed shared-nothing database to manage the massive amount of archived data which will be used to generate a source catalog with more than 100 billion records during 10 years of operation. First, we develop HCHDLP based on the column-store DBMS of MonetDB, taking advantage of MonetDB’s high performance when applied to massive data processing. To realize the real-time functionality of HCHDLP, we optimize the pipeline in its source association function, including both time and space complexity from outside the database (SQL semantic) and inside (RANGE-JOIN implementation), as well as in its strategy of building complex light curves. The optimized source association function is accelerated by three orders of magnitude. Second, we build a distributed database using a two-level time partitioning strategy via the MERGE TABLE and REMOTE TABLE technology of MonetDB. Intensive tests validate that our database architecture is able to achieve both linear scalability in response time and concurrent access by multiple users. In summary, our studies provide guidance for a solution to GWAC in real-time data processing and management of massive data.

  19. A Dashboard for the Italian Computing in ALICE

    NASA Astrophysics Data System (ADS)

    Elia, D.; Vino, G.; Bagnasco, S.; Crescente, A.; Donvito, G.; Franco, A.; Lusso, S.; Mura, D.; Piano, S.; Platania, G.; ALICE Collaboration

    2017-10-01

    A dashboard devoted to the computing in the Italian sites for the ALICE experiment at the LHC has been deployed. A combination of different complementary monitoring tools is typically used in most of the Tier-2 sites: this makes it somewhat difficult to figure out the status of the site at a glance and to compare information extracted from different sources for debugging purposes. To overcome these limitations a dedicated ALICE dashboard has been designed and implemented in each of the ALICE Tier-2 sites in Italy: in particular, it provides a single, interactive and easily customizable graphical interface where heterogeneous data are presented. The dashboard is based on two main ingredients: an open source time-series database and a dashboard builder tool for visualizing time-series metrics. Various sensors, able to collect data from the multiple data sources, have also been written. A first version of a national computing dashboard has been implemented using a specific instance of the builder to gather data from all the local databases.

  20. Two Wrongs Make a Right: Addressing Underreporting in Binary Data from Multiple Sources.

    PubMed

    Cook, Scott J; Blas, Betsabe; Carroll, Raymond J; Sinha, Samiran

    2017-04-01

    Media-based event data, i.e., data compiled from reporting by media outlets, are widely used in political science research. However, events of interest (e.g., strikes, protests, conflict) are often underreported by these primary and secondary sources, producing incomplete data that risks inconsistency and bias in subsequent analysis. While general strategies exist to help ameliorate this bias, these methods do not make full use of the information often available to researchers. Specifically, much of the event data used in the social sciences is drawn from multiple, overlapping news sources (e.g., Agence France-Presse, Reuters). Therefore, we propose a novel maximum likelihood estimator that corrects for misclassification in data arising from multiple sources. In the most general formulation of our estimator, researchers can specify separate sets of predictors for the true-event model and each of the misclassification models characterizing whether a source fails to report on an event. As such, researchers are able to accurately test theories on both the causes of and reporting on an event of interest. Simulations show that our technique regularly outperforms current strategies that either neglect misclassification, the unique features of the data-generating process, or both. We also illustrate the utility of this method with a model of repression using the Social Conflict in Africa Database.
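
    For a sense of how such an estimator can be set up, the following is a sketch of the observed-data likelihood for J overlapping sources under one-sided misclassification (underreporting); the notation is ours, and the exact formulation in the paper may differ. With latent true event y_i, Pr(y_i = 1) = pi_i, and source j failing to report a true event with probability alpha_ij (while never reporting a non-event), the contribution of observation i is:

    ```latex
    \[
    L_i(\beta,\gamma) =
    \begin{cases}
    \pi_i \displaystyle\prod_{j=1}^{J} (1-\alpha_{ij})^{z_{ij}}\,\alpha_{ij}^{\,1-z_{ij}},
      & \text{if } \max_j z_{ij} = 1,\\[6pt]
    (1-\pi_i) \;+\; \pi_i \displaystyle\prod_{j=1}^{J} \alpha_{ij},
      & \text{if } z_{i1} = \cdots = z_{iJ} = 0,
    \end{cases}
    \qquad
    \pi_i = \operatorname{logit}^{-1}(x_i^{\top}\beta),\quad
    \alpha_{ij} = \operatorname{logit}^{-1}(w_{ij}^{\top}\gamma_j).
    \]
    ```

    Maximizing the product of these contributions estimates the true-event model (beta) jointly with a separate reporting model (gamma_j) for each source, which is the spirit of specifying separate predictor sets for the true-event and misclassification models.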

  1. Federated Web-accessible Clinical Data Management within an Extensible NeuroImaging Database

    PubMed Central

    Keator, David B.; Wei, Dingying; Fennema-Notestine, Christine; Pease, Karen R.; Bockholt, Jeremy; Grethe, Jeffrey S.

    2010-01-01

    Managing vast datasets collected throughout multiple clinical imaging communities has become critical with the ever increasing and diverse nature of datasets. Development of data management infrastructure is further complicated by technical and experimental advances that drive modifications to existing protocols and acquisition of new types of research data to be incorporated into existing data management systems. In this paper, an extensible data management system for clinical neuroimaging studies is introduced: The Human Clinical Imaging Database (HID) and Toolkit. The database schema is constructed to support the storage of new data types without changes to the underlying schema. The complex infrastructure allows management of experiment data, such as image protocol and behavioral task parameters, as well as subject-specific data, including demographics, clinical assessments, and behavioral task performance metrics. Of significant interest, embedded clinical data entry and management tools enhance both consistency of data reporting and automatic entry of data into the database. The Clinical Assessment Layout Manager (CALM) allows users to create on-line data entry forms for use within and across sites, through which data is pulled into the underlying database via the generic clinical assessment management engine (GAME). Importantly, the system is designed to operate in a distributed environment, serving both human users and client applications in a service-oriented manner. Querying capabilities use a built-in multi-database parallel query builder/result combiner, allowing web-accessible queries within and across multiple federated databases. The system along with its documentation is open-source and available from the Neuroimaging Informatics Tools and Resource Clearinghouse (NITRC) site. PMID:20567938

  2. Federated web-accessible clinical data management within an extensible neuroimaging database.

    PubMed

    Ozyurt, I Burak; Keator, David B; Wei, Dingying; Fennema-Notestine, Christine; Pease, Karen R; Bockholt, Jeremy; Grethe, Jeffrey S

    2010-12-01

    Managing vast datasets collected throughout multiple clinical imaging communities has become critical with the ever increasing and diverse nature of datasets. Development of data management infrastructure is further complicated by technical and experimental advances that drive modifications to existing protocols and acquisition of new types of research data to be incorporated into existing data management systems. In this paper, an extensible data management system for clinical neuroimaging studies is introduced: The Human Clinical Imaging Database (HID) and Toolkit. The database schema is constructed to support the storage of new data types without changes to the underlying schema. The complex infrastructure allows management of experiment data, such as image protocol and behavioral task parameters, as well as subject-specific data, including demographics, clinical assessments, and behavioral task performance metrics. Of significant interest, embedded clinical data entry and management tools enhance both consistency of data reporting and automatic entry of data into the database. The Clinical Assessment Layout Manager (CALM) allows users to create on-line data entry forms for use within and across sites, through which data is pulled into the underlying database via the generic clinical assessment management engine (GAME). Importantly, the system is designed to operate in a distributed environment, serving both human users and client applications in a service-oriented manner. Querying capabilities use a built-in multi-database parallel query builder/result combiner, allowing web-accessible queries within and across multiple federated databases. The system along with its documentation is open-source and available from the Neuroimaging Informatics Tools and Resource Clearinghouse (NITRC) site.

  3. ClassLess: A Comprehensive Database of Young Stellar Objects

    NASA Astrophysics Data System (ADS)

    Hillenbrand, Lynne; Baliber, Nairn

    2015-01-01

    We have designed and constructed a database housing published measurements of Young Stellar Objects (YSOs) within ~1 kpc of the Sun. ClassLess, so called because it includes YSOs in all stages of evolution, is a relational database in which user interaction is conducted via HTML web browsers, queries are performed in scientific language, and all data are linked to the sources of publication. Each star is associated with a cluster (or clusters), and both spatially resolved and unresolved measurements are stored, allowing proper use of data from multiple star systems. With this fully searchable tool, myriad ground- and space-based instruments and surveys across wavelength regimes can be exploited. In addition to primary measurements, the database self consistently calculates and serves higher level data products such as extinction, luminosity, and mass. As a result, searches for young stars with specific physical characteristics can be completed with just a few mouse clicks.

  4. Biopython: freely available Python tools for computational molecular biology and bioinformatics.

    PubMed

    Cock, Peter J A; Antao, Tiago; Chang, Jeffrey T; Chapman, Brad A; Cox, Cymon J; Dalke, Andrew; Friedberg, Iddo; Hamelryck, Thomas; Kauff, Frank; Wilczynski, Bartek; de Hoon, Michiel J L

    2009-06-01

    The Biopython project is a mature open source international collaboration of volunteer developers, providing Python libraries for a wide range of bioinformatics problems. Biopython includes modules for reading and writing different sequence file formats and multiple sequence alignments, dealing with 3D macromolecular structures, interacting with common tools such as BLAST, ClustalW and EMBOSS, accessing key online databases, as well as providing numerical methods for statistical learning. Biopython is freely available, with documentation and source code at www.biopython.org, under the Biopython license.
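
    As a small usage example of the library described above (the input file name is hypothetical), Bio.SeqIO can parse a FASTA file and report per-record length and GC content:

    ```python
    from Bio import SeqIO

    for record in SeqIO.parse("example_sequences.fasta", "fasta"):
        seq = record.seq.upper()
        gc = (seq.count("G") + seq.count("C")) / len(seq)
        print(f"{record.id}\tlength={len(seq)}\tGC={gc:.2%}")
    ```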

  5. Occupational therapy articles in serial publications: an analysis of sources.

    PubMed Central

    Reed, K L

    1988-01-01

    This study was designed to locate and document serial literature on occupational therapy published since 1900. Emphasis is placed on finding articles on occupational therapy or by occupational therapists from sources other than those normally associated with the professional journals. Multiple sources were used including print indexes, online databases, occupational therapy bibliographies, and tables of contents or yearly indexes. Almost 7,000 articles were identified, not including those published in foreign journals. Occupational therapy publications have increased steadily since 1900, with the most rapid increase during the 1970s and 1980s when five new occupational therapy journals were initiated. Suggestions for formulating search strategies are included. PMID:3285932

  6. Functional Interaction Network Construction and Analysis for Disease Discovery.

    PubMed

    Wu, Guanming; Haw, Robin

    2017-01-01

    Network-based approaches project seemingly unrelated genes or proteins onto a large-scale network context, thereby providing a holistic visualization and analysis platform for genomic data generated from high-throughput experiments, reducing the dimensionality of the data by using network modules, and increasing statistical power. Based on the Reactome database, the most popular and comprehensive open-source biological pathway knowledgebase, we have developed a highly reliable protein functional interaction network covering around 60% of human genes and an app called ReactomeFIViz for Cytoscape, the most popular biological network visualization and analysis platform. In this chapter, we describe the detailed procedures for how this functional interaction network is constructed by integrating multiple external data sources, extracting functional interactions from human curated pathway databases, building a machine learning classifier called a Naïve Bayesian Classifier, predicting interactions based on the trained Naïve Bayesian Classifier, and finally constructing the functional interaction database. We also provide an example of how to use ReactomeFIViz to perform network-based data analysis for a list of genes.
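
    The Naïve Bayes step can be sketched with a toy Bernoulli model over binary evidence features for a protein pair (for example co-expression, a shared GO term, or a known domain-domain interaction); the prior and feature probabilities below are invented for illustration and are not Reactome's trained values.

    ```python
    import math

    P_POSITIVE = 0.1  # prior probability that a pair is a functional interaction
    P_FEATURE = {     # P(feature present | class), illustrative values only
        "coexpressed":    {"pos": 0.70, "neg": 0.20},
        "shared_go_term": {"pos": 0.80, "neg": 0.30},
        "domain_pair":    {"pos": 0.40, "neg": 0.05},
    }

    def posterior(features):
        """Posterior probability that a pair is a functional interaction."""
        log_pos, log_neg = math.log(P_POSITIVE), math.log(1 - P_POSITIVE)
        for name, present in features.items():
            p_pos, p_neg = P_FEATURE[name]["pos"], P_FEATURE[name]["neg"]
            log_pos += math.log(p_pos if present else 1 - p_pos)
            log_neg += math.log(p_neg if present else 1 - p_neg)
        return 1.0 / (1.0 + math.exp(log_neg - log_pos))

    print(round(posterior({"coexpressed": True, "shared_go_term": True, "domain_pair": False}), 3))
    ```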

  7. Mapping the literature of nursing: 1996–2000

    PubMed Central

    Allen, Margaret (Peg); Jacobs, Susan Kaplan; Levy, June R.

    2006-01-01

    Introduction: This project is a collaborative effort of the Task Force on Mapping the Nursing Literature of the Nursing and Allied Health Resources Section of the Medical Library Association. This overview summarizes eighteen studies covering general nursing and sixteen specialties. Method: Following a common protocol, citations from source journals were analyzed for a three-year period within the years 1996 to 2000. Analysis included cited formats, age, and ranking of the frequency of cited journal titles. Highly cited journals were analyzed for coverage in twelve health sciences and academic databases. Results: Journals were the most frequently cited format, followed by books. More than 60% of the cited resources were published in the previous seven years. Bradford's law was validated, with a small core of cited journals accounting for a third of the citations. Medical and science databases provided the most comprehensive access for biomedical titles, while CINAHL and PubMed provided the best access for nursing journals. Discussion: Beyond a heavily cited core, nursing journal citations are widely dispersed among a variety of sources and disciplines, with corresponding access via a variety of bibliographic tools. Results underscore the interdisciplinary nature of the nursing profession. Conclusion: For comprehensive searches, nurses need to search multiple databases. Libraries need to provide access to databases beyond PubMed, including CINAHL and academic databases. Database vendors should improve their coverage of nursing, biomedical, and psychosocial titles identified in these studies. Additional research is needed to update these studies and analyze nursing specialties not covered. PMID:16636714

  8. Mapping the literature of nursing: 1996-2000.

    PubMed

    Allen, Margaret Peg; Jacobs, Susan Kaplan; Levy, June R

    2006-04-01

    This project is a collaborative effort of the Task Force on Mapping the Nursing Literature of the Nursing and Allied Health Resources Section of the Medical Library Association. This overview summarizes eighteen studies covering general nursing and sixteen specialties. Following a common protocol, citations from source journals were analyzed for a three-year period within the years 1996 to 2000. Analysis included cited formats, age, and ranking of the frequency of cited journal titles. Highly cited journals were analyzed for coverage in twelve health sciences and academic databases. Journals were the most frequently cited format, followed by books. More than 60% of the cited resources were published in the previous seven years. Bradford's law was validated, with a small core of cited journals accounting for a third of the citations. Medical and science databases provided the most comprehensive access for biomedical titles, while CINAHL and PubMed provided the best access for nursing journals. Beyond a heavily cited core, nursing journal citations are widely dispersed among a variety of sources and disciplines, with corresponding access via a variety of bibliographic tools. Results underscore the interdisciplinary nature of the nursing profession. For comprehensive searches, nurses need to search multiple databases. Libraries need to provide access to databases beyond PubMed, including CINAHL and academic databases. Database vendors should improve their coverage of nursing, biomedical, and psychosocial titles identified in these studies. Additional research is needed to update these studies and analyze nursing specialties not covered.

  9. Assessment and application of national environmental databases and mapping tools at the local level to two community case studies.

    PubMed

    Hammond, Davyda; Conlon, Kathryn; Barzyk, Timothy; Chahine, Teresa; Zartarian, Valerie; Schultz, Brad

    2011-03-01

    Communities are concerned over pollution levels and seek methods to systematically identify and prioritize the environmental stressors in their communities. Geographic information system (GIS) maps of environmental information can be useful tools for communities in their assessment of environmental-pollution-related risks. Databases and mapping tools that supply community-level estimates of ambient concentrations of hazardous pollutants, risk, and potential health impacts can provide relevant information for communities to understand, identify, and prioritize potential exposures and risk from multiple sources. An assessment of existing databases and mapping tools was conducted as part of this study to explore the utility of publicly available databases, and three of these databases were selected for use in a community-level GIS mapping application. Queried data from the U.S. EPA's National-Scale Air Toxics Assessment, Air Quality System, and National Emissions Inventory were mapped at the appropriate spatial and temporal resolutions for identifying risks of exposure to air pollutants in two communities. The maps combine monitored and model-simulated pollutant and health risk estimates, along with local survey results, to assist communities with the identification of potential exposure sources and pollution hot spots. Findings from this case study analysis will provide information to advance the development of new tools to assist communities with environmental risk assessments and hazard prioritization. © 2010 Society for Risk Analysis.

  10. Semi-automatic Data Integration using Karma

    NASA Astrophysics Data System (ADS)

    Garijo, D.; Kejriwal, M.; Pierce, S. A.; Houser, P. I. Q.; Peckham, S. D.; Stanko, Z.; Hardesty Lewis, D.; Gil, Y.; Pennington, D. D.; Knoblock, C.

    2017-12-01

    Data integration applications are ubiquitous in scientific disciplines. A state-of-the-art data integration system accepts both a set of data sources and a target ontology as input, and semi-automatically maps the data sources in terms of concepts and relationships in the target ontology. Mappings can be both complex and highly domain-specific. Once such a semantic model, expressing the mapping using a community-wide standard, is acquired, the source data can be stored in a single repository or database using the semantics of the target ontology. However, acquiring the mapping is a labor-intensive process, and state-of-the-art artificial intelligence systems are unable to fully automate the process using heuristics and algorithms alone. Instead, a more realistic goal is to develop adaptive tools that minimize user feedback (e.g., by offering good mapping recommendations), while at the same time making it intuitive and easy for the user both to correct errors and to define complex mappings. We present Karma, a data integration system that has been developed over multiple years in the information integration group at the Information Sciences Institute, a research institute at the University of Southern California's Viterbi School of Engineering. Karma is a state-of-the-art data integration tool that supports an interactive graphical user interface, and has been applied in multiple domains over the last five years, including geospatial, biological, humanities and bibliographic applications. Karma allows a user to import their own ontology and datasets using widely used formats such as RDF, XML, CSV and JSON, can be set up either locally or on a server, supports a native backend database for prototyping queries, and can even be seamlessly integrated into external computational pipelines, including those ingesting data via streaming data sources, Web APIs and SQL databases. We illustrate a Karma workflow at a conceptual level, along with a live demo, and show use cases of Karma specifically for the geosciences. In particular, we show how Karma can be used intuitively to obtain the mapping model between case study data sources and a publicly available and expressive target ontology that has been designed to capture a broad set of concepts in geoscience with standardized, easily searchable names.
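
    As an illustrative aside, the core idea of materializing a source-to-ontology mapping as RDF can be sketched in a few lines of Python with rdflib; the ontology namespace, class and property names, and the input file below are assumptions for illustration only, not part of Karma itself.

        # Minimal sketch: map CSV columns onto terms of a target ontology and emit RDF.
        # The namespace, property names, and "wells.csv" are hypothetical; Karma builds
        # such semantic models interactively rather than from hard-coded rules.
        import csv
        from rdflib import Graph, Literal, Namespace, RDF

        GEO = Namespace("http://example.org/geoscience#")    # hypothetical target ontology
        g = Graph()
        g.bind("geo", GEO)

        with open("wells.csv", newline="") as f:             # hypothetical source table
            for row in csv.DictReader(f):
                well = GEO[f"well/{row['well_id']}"]
                g.add((well, RDF.type, GEO.Well))
                g.add((well, GEO.hasTemperature, Literal(float(row["temp_c"]))))
                g.add((well, GEO.locatedIn, Literal(row["state"])))

        print(g.serialize(format="turtle"))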

  11. A Taxonomic Search Engine: Federating taxonomic databases using web services

    PubMed Central

    Page, Roderic DM

    2005-01-01

    Background The taxonomic name of an organism is a key link between different databases that store information on that organism. However, in the absence of a single, comprehensive database of organism names, individual databases lack an easy means of checking the correctness of a name. Furthermore, the same organism may have more than one name, and the same name may apply to more than one organism. Results The Taxonomic Search Engine (TSE) is a web application written in PHP that queries multiple taxonomic databases (ITIS, Index Fungorum, IPNI, NCBI, and uBIO) and summarises the results in a consistent format. It supports "drill-down" queries to retrieve a specific record. The TSE can optionally suggest alternative spellings the user can try. It also acts as a Life Science Identifier (LSID) authority for the source taxonomic databases, providing globally unique identifiers (and associated metadata) for each name. Conclusion The Taxonomic Search Engine is available on the web and provides a simple demonstration of the potential of the federated approach to providing access to taxonomic names. PMID:15757517
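
    As an aside, the federation pattern described above (query several taxonomic services in parallel and normalise the answers into one record format) can be sketched in Python; the endpoint URLs and response fields below are placeholders, not the actual APIs of ITIS, IPNI, NCBI or uBIO.

        # Sketch of a federated name search: fan the query out to several services
        # and map each source-specific response onto a common record format.
        # Endpoints and JSON field names are illustrative placeholders only.
        import json
        from concurrent.futures import ThreadPoolExecutor
        from urllib.parse import quote
        from urllib.request import urlopen

        SOURCES = {
            "itis": "https://example.org/itis/search?name={}",
            "ncbi": "https://example.org/ncbi/search?name={}",
            "ipni": "https://example.org/ipni/search?name={}",
        }

        def query_source(source, url_template, name):
            with urlopen(url_template.format(quote(name)), timeout=10) as resp:
                hits = json.load(resp)
            return [{"source": source, "name": h.get("name"), "id": h.get("id")}
                    for h in hits]

        def federated_search(name):
            with ThreadPoolExecutor() as pool:
                futures = [pool.submit(query_source, s, tpl, name)
                           for s, tpl in SOURCES.items()]
                return [record for f in futures for record in f.result()]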

  12. Multiple imputation as one tool to provide longitudinal databases for modelling human height and weight development.

    PubMed

    Aßmann, C

    2016-06-01

    Beyond the large effort invested in field work, the provision of valid databases requires statistical and informational infrastructure to enable long-term access to longitudinal data sets on height, weight and related measures. To foster use of longitudinal data sets within the scientific community, the provision of valid databases must also address data-protection regulations. It is therefore of major importance to prevent the identification of individuals from publicly available databases. One possible strategy to reach this goal is to provide a synthetic database to the public, allowing strategies for data analysis to be pretested; analyses approved on the synthetic data are then verified against the original data. Such synthetic databases can be established using multiple imputation tools. Multiple imputation by chained equations is illustrated for this purpose, as it captures a wide range of statistical interdependencies. Missing values, which typically occur in longitudinal databases because of item non-response, can also be addressed via multiple imputation when databases are provided. Providing synthetic databases using multiple imputation techniques is thus one strategy to ensure data protection, increase the visibility of longitudinal databases and enhance their analytical potential.
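
    As a concrete illustration of the chained-equations approach, the sketch below uses scikit-learn's IterativeImputer; drawing from the posterior and varying the random seed yields several completed data sets. The toy columns are assumptions, not the structure of any real survey.

        # Multiple imputation by chained equations, sketched with scikit-learn.
        # sample_posterior=True draws imputed values from a posterior predictive
        # distribution, so repeating with different seeds gives m completed data sets.
        import numpy as np
        import pandas as pd
        from sklearn.experimental import enable_iterative_imputer  # noqa: F401
        from sklearn.impute import IterativeImputer

        df = pd.DataFrame({                      # toy longitudinal extract (illustrative)
            "height_t1": [140.0, 152.0, np.nan, 145.0],
            "height_t2": [146.0, np.nan, 151.0, 150.0],
            "weight_t1": [35.0, 44.0, 40.0, np.nan],
        })

        completed = []
        for m in range(5):                       # five imputed versions of the data
            imputer = IterativeImputer(sample_posterior=True, max_iter=20, random_state=m)
            completed.append(pd.DataFrame(imputer.fit_transform(df), columns=df.columns))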

  13. Biomine: predicting links between biological entities using network models of heterogeneous databases.

    PubMed

    Eronen, Lauri; Toivonen, Hannu

    2012-06-06

    Biological databases contain large amounts of data concerning the functions and associations of genes and proteins. Integration of data from several such databases into a single repository can aid the discovery of previously unknown connections spanning multiple types of relationships and databases. Biomine is a system that integrates cross-references from several biological databases into a graph model with multiple types of edges, such as protein interactions, gene-disease associations and gene ontology annotations. Edges are weighted based on their type, reliability, and informativeness. We present Biomine and evaluate its performance in link prediction, where the goal is to predict pairs of nodes that will be connected in the future, based on current data. In particular, we formulate protein interaction prediction and disease gene prioritization tasks as instances of link prediction. The predictions are based on a proximity measure computed on the integrated graph. We consider and experiment with several such measures, and perform a parameter optimization procedure where different edge types are weighted to optimize link prediction accuracy. We also propose a novel method for disease-gene prioritization, defined as finding a subset of candidate genes that cluster together in the graph. We experimentally evaluate Biomine by predicting future annotations in the source databases and prioritizing lists of putative disease genes. The experimental results show that Biomine has strong potential for predicting links when a set of selected candidate links is available. The predictions obtained using the entire Biomine dataset are shown to clearly outperform ones obtained using any single source of data alone, when different types of links are suitably weighted. In the gene prioritization task, an established reference set of disease-associated genes is useful, but the results show that under favorable conditions, Biomine can also perform well when no such information is available. The Biomine system is a proof of concept. Its current version contains 1.1 million entities and 8.1 million relations between them, with a focus on human genetics. Some of its functionalities are available in a public query interface at http://biomine.cs.helsinki.fi, allowing searching for and visualizing connections between given biological entities.
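
    One simple proximity measure of this kind, the reliability of the best path between two nodes, can be sketched with networkx; the entities, edge types and weights below are toy values, not Biomine data.

        # Best-path proximity on a typed, weighted graph: treat each edge weight as a
        # reliability in (0, 1], transform to -log(weight), run Dijkstra, and convert
        # the path cost back to a product of reliabilities.
        import math
        import networkx as nx

        G = nx.Graph()
        G.add_edge("GENE:A", "PROT:A", weight=0.9, kind="codes_for")
        G.add_edge("PROT:A", "PROT:B", weight=0.7, kind="interacts_with")
        G.add_edge("PROT:B", "DISEASE:X", weight=0.6, kind="associated_with")
        G.add_edge("GENE:A", "DISEASE:X", weight=0.2, kind="text_mining")

        def proximity(graph, u, v):
            cost = nx.shortest_path_length(
                graph, u, v, weight=lambda a, b, d: -math.log(d["weight"]))
            return math.exp(-cost)               # product of edge reliabilities

        # The indirect gene-protein-protein-disease path (0.9*0.7*0.6 = 0.378) beats
        # the direct text-mining edge (0.2), so it determines the proximity.
        print(proximity(G, "GENE:A", "DISEASE:X"))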

  14. Enabling a systems biology knowledgebase with gaggle and firegoose

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Baliga, Nitin S.

    The overall goal of this project was to extend the existing Gaggle and Firegoose systems to develop an open-source technology that runs over the web and links desktop applications with many databases and software applications. This technology would enable researchers to incorporate workflows for data analysis that can be executed from this interface to other online applications. The four specific aims were to (1) provide one-click mapping of genes, proteins, and complexes across databases and species; (2) enable multiple simultaneous workflows; (3) expand sophisticated data analysis for online resources; and (4) enhance open-source development of the Gaggle-Firegoose infrastructure. Gaggle is an open-source Java software system that integrates existing bioinformatics programs and data sources into a user-friendly, extensible environment to allow interactive exploration, visualization, and analysis of systems biology data. Firegoose is an extension to the Mozilla Firefox web browser that enables data transfer between websites and desktop tools including Gaggle. In the last phase of this funding period, we made substantial progress on development and application of the Gaggle integration framework. We implemented the workspace in the Network Portal. Users can capture data from Firegoose and save them to the workspace. Users can create workflows to start multiple software components programmatically and pass data between them. Results of analysis can be saved to the cloud so that they can be easily restored on any machine. We also developed the Gaggle Chrome Goose, a plugin for the Google Chrome browser that works in tandem with an OpenCPU server in the Amazon EC2 cloud. This allows users to interactively perform data analysis on a single web page using the R packages deployed on the OpenCPU server. The cloud-based framework facilitates collaboration between researchers from multiple organizations. We have made a number of enhancements to the cmonkey2 application to enable and improve the integration within different environments, and we have created a new tools pipeline for generating EGRIN2 models in a largely automated way.

  15. An open access database for the evaluation of heart sound algorithms.

    PubMed

    Liu, Chengyu; Springer, David; Li, Qiao; Moody, Benjamin; Juan, Ricardo Abad; Chorro, Francisco J; Castells, Francisco; Roig, José Millet; Silva, Ikaro; Johnson, Alistair E W; Syed, Zeeshan; Schmidt, Samuel E; Papadaniil, Chrysa D; Hadjileontiadis, Leontios; Naseri, Hosein; Moukadem, Ali; Dieterlen, Alain; Brandt, Christian; Tang, Hong; Samieinasab, Maryam; Samieinasab, Mohammad Reza; Sameni, Reza; Mark, Roger G; Clifford, Gari D

    2016-12-01

    In the past few decades, analysis of heart sound signals (i.e. the phonocardiogram or PCG), especially for automated heart sound segmentation and classification, has been widely studied and has been reported to have the potential value to detect pathology accurately in clinical applications. However, comparative analyses of algorithms in the literature have been hindered by the lack of high-quality, rigorously validated, and standardized open databases of heart sound recordings. This paper describes a public heart sound database, assembled for an international competition, the PhysioNet/Computing in Cardiology (CinC) Challenge 2016. The archive comprises nine different heart sound databases sourced from multiple research groups around the world. It includes 2435 heart sound recordings in total collected from 1297 healthy subjects and patients with a variety of conditions, including heart valve disease and coronary artery disease. The recordings were collected from a variety of clinical or nonclinical (such as in-home visits) environments and equipment. The length of recording varied from several seconds to several minutes. This article reports detailed information about the subjects/patients including demographics (number, age, gender), recordings (number, location, state and time length), associated synchronously recorded signals, sampling frequency and sensor type used. We also provide a brief summary of the commonly used heart sound segmentation and classification methods, including open source code provided concurrently for the Challenge. A description of the PhysioNet/CinC Challenge 2016, including the main aims, the training and test sets, the hand corrected annotations for different heart sound states, the scoring mechanism, and associated open source code are provided. In addition, several potential benefits from the public heart sound database are discussed.

  16. Colorado Late Cenozoic Fault and Fold Database and Internet Map Server: User-friendly technology for complex information

    USGS Publications Warehouse

    Morgan, K.S.; Pattyn, G.J.; Morgan, M.L.

    2005-01-01

    Internet mapping applications for geologic data allow simultaneous data delivery and collection, enabling quick data modification while efficiently supplying the end user with information. Utilizing Web-based technologies, the Colorado Geological Survey's Colorado Late Cenozoic Fault and Fold Database was transformed from a monothematic, nonspatial Microsoft Access database into a complex information set incorporating multiple data sources. The resulting user-friendly format supports easy analysis and browsing. The core of the application is the Microsoft Access database, which contains information compiled from the available literature about faults and folds that are known or suspected to have moved during the late Cenozoic. The database contains nonspatial fields such as structure type, age, and rate of movement. Geographic locations of the fault and fold traces were compiled from previous studies at 1:250,000 scale to form a spatial database containing information such as length and strike. Integration of the two databases allowed both spatial and nonspatial information to be presented on the Internet as a single dataset (http://geosurvey.state.co.us/pubs/ceno/). The user-friendly interface enables users to view and query the data in an integrated manner, thus providing multiple ways to locate desired information. Retaining the digital data format also allows continuous data updating and quick delivery of newly acquired information. This dataset is a valuable resource to anyone interested in earthquake hazards and the activity of faults and folds in Colorado. Additional geologic hazard layers and imagery may aid in decision support and hazard evaluation. The up-to-date and customizable maps are invaluable tools for researchers and the public.
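
    As an aside, the core integration step, joining the nonspatial attribute table to the digitized fault traces so both can be served as one dataset, can be sketched with geopandas; the file names and columns below are illustrative assumptions, not the CGS schema.

        # Join fault attributes (type, age, slip rate) onto the spatial traces and
        # derive a length field; column and file names are hypothetical.
        import geopandas as gpd
        import pandas as pd

        traces = gpd.read_file("fault_traces.shp")            # geometry + fault_id
        attrs = pd.read_csv("fault_attributes.csv")           # fault_id, type, age, ...

        faults = traces.merge(attrs, on="fault_id", how="left")
        # Project to a metric CRS (UTM zone 13N covers most of Colorado) for lengths.
        faults["length_km"] = faults.to_crs(epsg=26913).geometry.length / 1000.0

        quaternary = faults[faults["age"] == "Quaternary"]
        quaternary.to_file("quaternary_faults.geojson", driver="GeoJSON")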

  17. Using Commercially available Tools for multi-faceted health assessment: Data Integration Lessons Learned

    PubMed Central

    Wilamowska, Katarzyna; Le, Thai; Demiris, George; Thompson, Hilaire

    2013-01-01

    Health monitoring data collected from multiple available intake devices provide a rich resource to support older adult health and wellness. Though large amounts of data can be collected, there is currently a lack of understanding on integration of these various data sources using commercially available products. This article describes an inexpensive approach to integrating data from multiple sources from a recently completed pilot project that assessed older adult wellness, and demonstrates challenges and benefits in pursuing data integration using commercially available products. The data in this project were sourced from a) electronically captured participant intake surveys, and existing commercial software output for b) vital signs and c) cognitive function. All the software used for data integration in this project was freeware and was chosen because of its ease of comprehension by novice database users. The methods and results of this approach provide a model for researchers with similar data integration needs to easily replicate this effort at a low cost. PMID:23728444

  18. Federated querying architecture with clinical & translational health IT application.

    PubMed

    Livne, Oren E; Schultz, N Dustin; Narus, Scott P

    2011-10-01

    We present a software architecture that federates data from multiple heterogeneous health informatics data sources owned by multiple organizations. The architecture builds upon state-of-the-art open-source Java and XML frameworks in innovative ways. It consists of (a) a federated query engine, which manages federated queries and result-set aggregation via a patient identification service, and (b) data source facades, which translate the physical data models into a common model on the fly and handle streaming of large result sets. System modules are connected via reusable Apache Camel integration routes and deployed to an OSGi enterprise service bus. We present an application of our architecture that allows users to construct queries via the i2b2 web front-end, and federates patient data from the University of Utah Enterprise Data Warehouse and the Utah Population Database. Our system can be easily adopted, extended and integrated with existing SOA healthcare and HL7 frameworks such as i2b2 and caGrid.
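
    The facade-plus-aggregation idea can be expressed in a short language-neutral sketch (the production system is built on Java, Apache Camel and OSGi); the record fields and stand-in result sets below are illustrative only.

        # Each data source facade translates its physical records into a shared model;
        # the federated engine then groups result sets on a common patient identifier.
        from dataclasses import dataclass

        @dataclass
        class CommonRecord:
            patient_id: str      # identifier resolved by a patient identification service
            diagnosis: str
            source: str

        class WarehouseFacade:
            def query(self, criteria):
                rows = [("u-001", "I10")]                     # stand-in for a SQL result set
                return [CommonRecord(pid, dx, "EDW") for pid, dx in rows]

        class PopulationDbFacade:
            def query(self, criteria):
                rows = [{"person": "u-001", "dx_code": "E11"}]
                return [CommonRecord(r["person"], r["dx_code"], "UPDB") for r in rows]

        def federated_query(criteria, facades):
            merged = {}
            for facade in facades:
                for record in facade.query(criteria):
                    merged.setdefault(record.patient_id, []).append(record)
            return merged                                     # records grouped per patient

        print(federated_query({"dx": "any"}, [WarehouseFacade(), PopulationDbFacade()]))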

  19. DOE Office of Scientific and Technical Information (OSTI.GOV)

    Rao, Nageswara S; Sen, Satyabrata; Berry, M. L.

    The Domestic Nuclear Detection Office's (DNDO) Intelligence Radiation Sensors Systems (IRSS) program supported the development of networks of commercial-off-the-shelf (COTS) radiation counters for detecting, localizing, and identifying low-level radiation sources. Under this program, a series of indoor and outdoor tests were conducted with multiple source strengths and types, different background profiles, and various types of source and detector movements. Following the tests, network algorithms were replayed in various reconstructed scenarios using sub-networks. These measurements and algorithm traces together provide a rich collection of highly valuable datasets for testing current and next-generation radiation network algorithms, including the ones (to be) developed by broader R&D communities such as distributed detection, information fusion, and sensor networks. From this multi-terabyte IRSS database, we distilled out and packaged the first batch of canonical datasets for public release. They include measurements from ten indoor and two outdoor tests, which represent increasingly challenging baseline scenarios for robustly testing radiation network algorithms.

  20. DOE Office of Scientific and Technical Information (OSTI.GOV)

    Gwyn, Stephen D. J., E-mail: Stephen.Gwyn@nrc-cnrc.gc.ca

    This paper describes the image stacks and catalogs of the Canada-France-Hawaii Telescope Legacy Survey produced using the MegaPipe data pipeline at the Canadian Astronomy Data Centre. The Legacy Survey is divided into two parts. The Deep Survey consists of four fields, each of 1 deg², with magnitude limits (50% completeness for point sources) of u = 27.5, g = 27.9, r = 27.7, i = 27.4, and z = 26.2. It contains 1.6 × 10⁶ sources. The Wide Survey consists of 150 deg² split over four fields, with magnitude limits of u = 26.0, g = 26.5, r = 25.9, i = 25.7, and z = 24.6. It contains 3 × 10⁷ sources. This paper describes the calibration, image stacking, and catalog generation process. The images and catalogs are available on the web through several interfaces: normal image and text file catalog downloads, a 'Google Sky' interface, an image cutout service, and a catalog database query service.

  1. WEB-GIS Decision Support System for CO2 storage

    NASA Astrophysics Data System (ADS)

    Gaitanaru, Dragos; Leonard, Anghel; Radu Gogu, Constantin; Le Guen, Yvi; Scradeanu, Daniel; Pagnejer, Mihaela

    2013-04-01

    The environmental decision support system (DSS) paradigm evolves and changes as more knowledge and technology become available to the environmental community. Geographic Information Systems (GIS) can be used to extract, assess and disseminate some types of information that are otherwise difficult to access by traditional methods. At the same time, with the help of the Internet and accompanying tools, creating and publishing online interactive maps has become easier and richer in options. The decision support system (MDSS) developed for the MUSTANG (A MUltiple Space and Time scale Approach for the quaNtification of deep saline formations for CO2 storaGe) project is a user-friendly, web-based application that uses GIS capabilities. The MDSS can be exploited by experts for CO2 injection and storage in deep saline aquifers. The main objective of the MDSS is to help experts make decisions based on large volumes of structured data and information. In order to achieve this objective, the MDSS has a geospatial, object-oriented database structure for a wide variety of data and information. The entire application is based on several principles leading to a series of capabilities and specific characteristics: (i) Open source - the entire platform (MDSS) is based on open-source technologies: (1) database engine, (2) application server, (3) geospatial server, (4) user interfaces, (5) add-ons, etc. (ii) Multiple database connections - the MDSS is capable of connecting to different databases located on different server machines. (iii) Desktop user experience - the MDSS architecture and design follow the structure of desktop software. (iv) Communication - the server side and the desktop are bound together by a series of functions that allow the user to upload, use, modify and download data within the application. The architecture of the system involves one database and a modular application composed of: (1) a visualization module, (2) an analysis module, (3) a guidelines module, and (4) a risk assessment module. The database component is built using the PostgreSQL and PostGIS open-source technologies. The visualization module allows the user to view data from CO2 injection sites in different ways: (1) geospatial visualization, (2) table view, (3) 3D visualization. The analysis module allows the user to perform analyses such as injectivity, containment and capacity analysis. The risk assessment module focuses on the site risk matrix approach. The guidelines module contains methodological guidelines for CO2 injection and storage in deep saline aquifers.
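
    As an illustration of the kind of request the PostgreSQL/PostGIS backend can serve to the visualization module, the sketch below selects injection sites within a radius of a point of interest; connection details, table and column names are assumptions, not the MDSS schema.

        # Radius query against a hypothetical injection_sites table in PostGIS.
        import psycopg2

        conn = psycopg2.connect(dbname="mdss", user="mdss", host="localhost")
        with conn, conn.cursor() as cur:
            cur.execute(
                """
                SELECT site_name, ST_AsGeoJSON(geom)
                FROM injection_sites
                WHERE ST_DWithin(geom::geography,
                                 ST_SetSRID(ST_MakePoint(%s, %s), 4326)::geography,
                                 %s)
                """,
                (23.7, 44.3, 50000),      # longitude, latitude, radius in metres
            )
            for name, geojson in cur.fetchall():
                print(name, geojson)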

  2. Automatic image database generation from CAD for 3D object recognition

    NASA Astrophysics Data System (ADS)

    Sardana, Harish K.; Daemi, Mohammad F.; Ibrahim, Mohammad K.

    1993-06-01

    The development and evaluation of multiple-view 3-D object recognition systems is based on a large set of model images. Due to the various advantages of using CAD, it is becoming more and more practical to use existing CAD data in computer vision systems. Current PC-level CAD systems are capable of providing physical image modelling and rendering involving positional variations in cameras, light sources, etc. We have formulated a modular scheme for automatic generation of various aspects (views) of the objects in a model-based 3-D object recognition system. These views are generated at desired orientations on the unit Gaussian sphere. With a suitable network file system (NFS), the images can be stored directly in a database located on a file server. This paper presents the image modelling solutions using CAD in relation to the multiple-view approach. Our modular scheme for data conversion and automatic image database storage for such a system is discussed. We have used this approach in 3-D polyhedron recognition. An overview of the results, advantages and limitations of using CAD data, and conclusions from using such a scheme are also presented.
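
    One simple way to place view directions roughly uniformly on the unit (Gaussian) sphere is a Fibonacci lattice, sketched below; the rendering call itself is omitted, and the camera distance is an arbitrary assumption.

        # Generate n approximately uniform viewing directions; each direction would
        # drive one rendered aspect of the CAD model to be stored in the image database.
        import math

        def view_directions(n):
            golden = math.pi * (3.0 - math.sqrt(5.0))         # golden angle
            dirs = []
            for i in range(n):
                z = 1.0 - 2.0 * (i + 0.5) / n                 # even spacing in z
                r = math.sqrt(1.0 - z * z)
                theta = golden * i
                dirs.append((r * math.cos(theta), r * math.sin(theta), z))
            return dirs

        for dx, dy, dz in view_directions(64):
            camera_position = (5.0 * dx, 5.0 * dy, 5.0 * dz)  # camera 5 units from origin
            # render_model(camera_position) and store the image (rendering omitted)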

  3. Techniques to Access Databases and Integrate Data for Hydrologic Modeling

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Whelan, Gene; Tenney, Nathan D.; Pelton, Mitchell A.

    2009-06-17

    This document addresses techniques to access and integrate data for defining site-specific conditions and behaviors associated with ground-water and surface-water radionuclide transport applicable to U.S. Nuclear Regulatory Commission reviews. Environmental models typically require input data from multiple internal and external sources that may include, but are not limited to, stream and rainfall gage data, meteorological data, hydrogeological data, habitat data, and biological data. These data may be retrieved from a variety of organizations (e.g., federal, state, and regional) and source types (e.g., HTTP, FTP, and databases). Available data sources relevant to hydrologic analyses for reactor licensing are identified and reviewed. The data sources described can be useful to define model inputs and parameters, including site features (e.g., watershed boundaries, stream locations, reservoirs, site topography), site properties (e.g., surface conditions, subsurface hydraulic properties, water quality), and site boundary conditions, input forcings, and extreme events (e.g., stream discharge, lake levels, precipitation, recharge, flood and drought characteristics). Available software tools for accessing established databases, retrieving the data, and integrating it with models were identified and reviewed. The emphasis in this review was on existing software products with minimal required modifications to enable their use with the FRAMES modeling framework. The ability of four of these tools to access and retrieve the identified data sources was reviewed. These four software tools were the Hydrologic Data Acquisition and Processing System (HDAPS), Integrated Water Resources Modeling System (IWRMS) External Data Harvester, Data for Environmental Modeling Environmental Data Download Tool (D4EM EDDT), and the FRAMES Internet Database Tools. The IWRMS External Data Harvester and the D4EM EDDT were identified as the most promising tools based on their ability to access and retrieve the required data, and their ability to integrate the data into environmental models using the FRAMES environment.

  4. The antiSMASH database, a comprehensive database of microbial secondary metabolite biosynthetic gene clusters.

    PubMed

    Blin, Kai; Medema, Marnix H; Kottmann, Renzo; Lee, Sang Yup; Weber, Tilmann

    2017-01-04

    Secondary metabolites produced by microorganisms are the main source of bioactive compounds that are in use as antimicrobial and anticancer drugs, fungicides, herbicides and pesticides. In the last decade, the increasing availability of microbial genomes has established genome mining as a very important method for the identification of their biosynthetic gene clusters (BGCs). One of the most popular tools for this task is antiSMASH. However, so far, antiSMASH is limited to de novo computing results for user-submitted genomes and only partially connects these with BGCs from other organisms. Therefore, we developed the antiSMASH database, a simple but highly useful new resource to browse antiSMASH-annotated BGCs in the currently 3907 bacterial genomes in the database and perform advanced search queries combining multiple search criteria. antiSMASH-DB is available at http://antismash-db.secondarymetabolites.org/. © The Author(s) 2016. Published by Oxford University Press on behalf of Nucleic Acids Research.

  5. ClassLess: A Comprehensive Database of Young Stellar Objects

    NASA Astrophysics Data System (ADS)

    Hillenbrand, Lynne A.; Baliber, Nairn

    2015-08-01

    We have designed and constructed a database intended to house catalog and literature-published measurements of Young Stellar Objects (YSOs) within ~1 kpc of the Sun. ClassLess, so called because it includes YSOs in all stages of evolution, is a relational database in which user interaction is conducted via HTML web browsers, queries are performed in scientific language, and all data are linked to the sources of publication. Each star is associated with a cluster (or clusters), and both spatially resolved and unresolved measurements are stored, allowing proper use of data from multiple star systems. With this fully searchable tool, myriad ground- and space-based instruments and surveys across wavelength regimes can be exploited. In addition to primary measurements, the database self consistently calculates and serves higher level data products such as extinction, luminosity, and mass. As a result, searches for young stars with specific physical characteristics can be completed with just a few mouse clicks. We are in the database population phase now, and are eager to engage with interested experts worldwide on local galactic star formation and young stellar populations.

  6. A geospatial database model for the management of remote sensing datasets at multiple spectral, spatial, and temporal scales

    NASA Astrophysics Data System (ADS)

    Ifimov, Gabriela; Pigeau, Grace; Arroyo-Mora, J. Pablo; Soffer, Raymond; Leblanc, George

    2017-10-01

    In this study, the development and implementation of a geospatial database model for the management of multiscale datasets encompassing airborne imagery and associated metadata are presented. To develop the multi-source geospatial database, we used a Relational Database Management System (RDBMS) on a Structured Query Language (SQL) server, which was then integrated into ArcGIS and implemented as a geodatabase. The acquired datasets were compiled, standardized, and integrated into the RDBMS, where logical associations between different types of information (e.g. location, date, and instrument) were linked. Airborne data, at different processing levels (digital numbers through geocorrected reflectance), were implemented in the geospatial database, where the datasets are linked spatially and temporally. An example dataset consisting of airborne hyperspectral imagery, collected for inter- and intra-annual vegetation characterization and detection of potential hydrocarbon seepage events over pipeline areas, is presented. Our work provides a model for the management of airborne imagery, which is a challenging aspect of data management in remote sensing, especially when large volumes of data are collected.
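
    The relational core, flights and image products linked to instruments so that imagery can be retrieved by location, date and processing level, can be sketched as follows; SQLite stands in for the SQL server used in the study, and the schema is an illustrative reduction rather than the actual geodatabase design.

        # Minimal relational schema linking instruments, flights, and image products.
        import sqlite3

        conn = sqlite3.connect("airborne_imagery.db")
        conn.executescript("""
        CREATE TABLE IF NOT EXISTS instrument (
            instrument_id INTEGER PRIMARY KEY,
            name TEXT NOT NULL
        );
        CREATE TABLE IF NOT EXISTS flight (
            flight_id INTEGER PRIMARY KEY,
            flight_date TEXT NOT NULL,
            site_name TEXT NOT NULL,
            instrument_id INTEGER REFERENCES instrument(instrument_id)
        );
        CREATE TABLE IF NOT EXISTS image_product (
            product_id INTEGER PRIMARY KEY,
            flight_id INTEGER REFERENCES flight(flight_id),
            processing_level TEXT,      -- e.g. digital numbers, radiance, reflectance
            file_path TEXT NOT NULL
        );
        """)
        conn.commit()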

  7. The MIntAct project—IntAct as a common curation platform for 11 molecular interaction databases

    PubMed Central

    Orchard, Sandra; Ammari, Mais; Aranda, Bruno; Breuza, Lionel; Briganti, Leonardo; Broackes-Carter, Fiona; Campbell, Nancy H.; Chavali, Gayatri; Chen, Carol; del-Toro, Noemi; Duesbury, Margaret; Dumousseau, Marine; Galeota, Eugenia; Hinz, Ursula; Iannuccelli, Marta; Jagannathan, Sruthi; Jimenez, Rafael; Khadake, Jyoti; Lagreid, Astrid; Licata, Luana; Lovering, Ruth C.; Meldal, Birgit; Melidoni, Anna N.; Milagros, Mila; Peluso, Daniele; Perfetto, Livia; Porras, Pablo; Raghunath, Arathi; Ricard-Blum, Sylvie; Roechert, Bernd; Stutz, Andre; Tognolli, Michael; van Roey, Kim; Cesareni, Gianni; Hermjakob, Henning

    2014-01-01

    IntAct (freely available at http://www.ebi.ac.uk/intact) is an open-source, open data molecular interaction database populated by data either curated from the literature or from direct data depositions. IntAct has developed a sophisticated web-based curation tool, capable of supporting both IMEx- and MIMIx-level curation. This tool is now utilized by multiple additional curation teams, all of whom annotate data directly into the IntAct database. Members of the IntAct team supply appropriate levels of training, perform quality control on entries and take responsibility for long-term data maintenance. Recently, the MINT and IntAct databases decided to merge their separate efforts to make optimal use of limited developer resources and maximize the curation output. All data manually curated by the MINT curators have been moved into the IntAct database at EMBL-EBI and are merged with the existing IntAct dataset. Both IntAct and MINT are active contributors to the IMEx consortium (http://www.imexconsortium.org). PMID:24234451

  8. Systematic analysis of snake neurotoxins' functional classification using a data warehousing approach.

    PubMed

    Siew, Joyce Phui Yee; Khan, Asif M; Tan, Paul T J; Koh, Judice L Y; Seah, Seng Hong; Koo, Chuay Yeng; Chai, Siaw Ching; Armugam, Arunmozhiarasi; Brusic, Vladimir; Jeyaseelan, Kandiah

    2004-12-12

    Sequence annotations, functional and structural data on snake venom neurotoxins (svNTXs) are scattered across multiple databases and literature sources. Sequence annotations and structural data are available in the public molecular databases, while functional data are almost exclusively available in the published articles. There is a need for a specialized svNTXs database that contains NTX entries, which are organized, well annotated and classified in a systematic manner. We have systematically analyzed svNTXs and classified them using structure-function groups based on their structural, functional and phylogenetic properties. Using conserved motifs in each phylogenetic group, we built an intelligent module for the prediction of structural and functional properties of unknown NTXs. We also developed an annotation tool to aid the functional prediction of newly identified NTXs as an additional resource for the venom research community. We created a searchable online database of NTX proteins sequences (http://research.i2r.a-star.edu.sg/Templar/DB/snake_neurotoxin). This database can also be found under Swiss-Prot Toxin Annotation Project website (http://www.expasy.org/sprot/).

  9. TISSUES 2.0: an integrative web resource on mammalian tissue expression

    PubMed Central

    Palasca, Oana; Santos, Alberto; Stolte, Christian; Gorodkin, Jan; Jensen, Lars Juhl

    2018-01-01

    Abstract Physiological and molecular similarities between organisms make it possible to translate findings from simpler experimental systems—model organisms—into more complex ones, such as human. This translation facilitates the understanding of biological processes under normal or disease conditions. Researchers aiming to identify the similarities and differences between organisms at the molecular level need resources collecting multi-organism tissue expression data. We have developed a database of gene–tissue associations in human, mouse, rat and pig by integrating multiple sources of evidence: transcriptomics covering all four species and proteomics (human only), manually curated and mined from the scientific literature. Through a scoring scheme, these associations are made comparable across all sources of evidence and across organisms. Furthermore, the scoring produces a confidence score assigned to each of the associations. The TISSUES database (version 2.0) is publicly accessible through a user-friendly web interface and as part of the STRING app for Cytoscape. In addition, we analyzed the agreement between datasets, across and within organisms, and identified that the agreement is mainly affected by the quality of the datasets rather than by the technologies used or organisms compared. Database URL: http://tissues.jensenlab.org/ PMID:29617745

  10. Mapping the literature of clinical laboratory science.

    PubMed

    Delwiche, Frances A

    2003-07-01

    This paper describes a citation analysis of the literature of clinical laboratory science (medical technology), conducted as part of a project of the Nursing and Allied Health Resources Section of the Medical Library Association. Three source journals widely read by those in the field were identified, from which cited references were collected for a three-year period. Analysis of the references showed that journals were the predominant format of literature cited and the majority of the references were from the last eleven years. Applying Bradford's Law of Scattering to the list of journals cited, three zones were created, each producing approximately one third of the cited references. Thirteen journals were in the first zone, eighty-one in the second, and 849 in the third. A similar list of journals cited was created for four specialty areas in the field: chemistry, hematology, immunohematology, and microbiology. In comparing the indexing coverage of the Zone 1 and 2 journals by four major databases, MEDLINE provided the most comprehensive coverage, while the Cumulative Index to Nursing and Allied Health Literature was the only database that provided complete coverage of the three source journals. However, to obtain complete coverage of the field, it is essential to search multiple databases.
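
    The zone construction described, splitting ranked journals into three groups that each account for roughly one third of the citations, can be sketched in a few lines; the journal names and citation counts below are toy values, not the study's data.

        # Split journals into Bradford zones of roughly equal citation volume.
        def bradford_zones(citation_counts, n_zones=3):
            ranked = sorted(citation_counts.items(), key=lambda kv: kv[1], reverse=True)
            total = sum(count for _, count in ranked)
            zones, current, cumulative = [], [], 0
            for journal, count in ranked:
                current.append(journal)
                cumulative += count
                if (cumulative >= total * (len(zones) + 1) / n_zones
                        and len(zones) < n_zones - 1):
                    zones.append(current)
                    current = []
            zones.append(current)
            return zones

        counts = {"Journal A": 310, "Journal B": 290, "Journal C": 120,
                  "Journal D": 95, "Journal E": 60, "Journal F": 20, "Journal G": 5}
        for i, zone in enumerate(bradford_zones(counts), start=1):
            print(f"Zone {i}: {len(zone)} journals")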

  11. Mapping the literature of clinical laboratory science

    PubMed Central

    Delwiche, Frances A.

    2003-01-01

    This paper describes a citation analysis of the literature of clinical laboratory science (medical technology), conducted as part of a project of the Nursing and Allied Health Resources Section of the Medical Library Association. Three source journals widely read by those in the field were identified, from which cited references were collected for a three-year period. Analysis of the references showed that journals were the predominant format of literature cited and the majority of the references were from the last eleven years. Applying Bradford's Law of Scattering to the list of journals cited, three zones were created, each producing approximately one third of the cited references. Thirteen journals were in the first zone, eighty-one in the second, and 849 in the third. A similar list of journals cited was created for four specialty areas in the field: chemistry, hematology, immunohematology, and microbiology. In comparing the indexing coverage of the Zone 1 and 2 journals by four major databases, MEDLINE provided the most comprehensive coverage, while the Cumulative Index to Nursing and Allied Health Literature was the only database that provided complete coverage of the three source journals. However, to obtain complete coverage of the field, it is essential to search multiple databases. PMID:12883564

  12. The HCUP SID Imputation Project: Improving Statistical Inferences for Health Disparities Research by Imputing Missing Race Data.

    PubMed

    Ma, Yan; Zhang, Wei; Lyman, Stephen; Huang, Yihe

    2018-06-01

    To identify the most appropriate imputation method for missing data in the HCUP State Inpatient Databases (SID) and assess the impact of different missing data methods on racial disparities research. HCUP SID. A novel simulation study compared four imputation methods (random draw, hot deck, joint multiple imputation [MI], conditional MI) for missing values for multiple variables, including race, gender, admission source, median household income, and total charges. The simulation was built on real data from the SID to retain their hierarchical data structures and missing data patterns. Additional predictive information from the U.S. Census and American Hospital Association (AHA) database was incorporated into the imputation. Conditional MI prediction was equivalent or superior to the best performing alternatives for all missing data structures and substantially outperformed each of the alternatives in various scenarios. Conditional MI substantially improved statistical inferences for racial health disparities research with the SID. © Health Research and Educational Trust.

  13. Mapping the literature of transcultural nursing*

    PubMed Central

    Murphy, Sharon C.

    2006-01-01

    Overview: No bibliometric studies of the literature of the field of transcultural nursing have been published. This paper describes a citation analysis as part of the project undertaken by the Nursing and Allied Health Resources Section of the Medical Library Association to map the literature of nursing. Objective: The purpose of this study was to identify the core literature and determine which databases provided the most complete access to the transcultural nursing literature. Methods: Cited references from essential source journals were analyzed for a three-year period. Eight major databases were compared for indexing coverage of the identified core list of journals. Results: This study identifies 138 core journals. Transcultural nursing relies on journal literature from associated health sciences fields in addition to nursing. Books provide an important format. Nearly all cited references were from the previous 18 years. In comparing indexing coverage among 8 major databases, 3 databases rose to the top. Conclusions: No single database can claim comprehensive indexing coverage for this broad field. It is essential to search multiple databases. Based on this study, PubMed/MEDLINE, Social Sciences Citation Index, and CINAHL provide the best coverage. Collections supporting transcultural nursing require robust access to literature beyond nursing publications. PMID:16710461

  14. Efficacy of Transcranial Magnetic Stimulation (TMS) in the Treatment of Schizophrenia: A Review of the Literature to Date.

    PubMed

    Cole, Jonathan C; Green Bernacki, Carolyn; Helmer, Amanda; Pinninti, Narsimha; O'Reardon, John P

    2015-01-01

    We reviewed the literature on transcranial magnetic stimulation and its uses and efficacy in schizophrenia. Multiple sources were examined on the efficacy of transcranial magnetic stimulation in relieving the positive and negative symptoms of schizophrenia. The literature review was conducted via the Ovid MEDLINE and PubMed databases. We found multiple published studies and meta-analyses providing evidence that repetitive transcranial magnetic stimulation can be of benefit in relieving positive and negative symptoms of schizophrenia, particularly auditory hallucinations. These findings should encourage the psychiatric community to expand research into other applications for which transcranial magnetic stimulation may be used to treat patients with psychiatric disability.

  15. VizieR Online Data Catalog: GUViCS. Ultraviolet Source Catalogs (Voyer+, 2014)

    NASA Astrophysics Data System (ADS)

    Voyer, E. N.; Boselli, A.; Boissier, S.; Heinis, S.; Cortese, L.; Ferrarese, L.; Cote, P.; Cuillandre, J.-C.; Gwyn, S. D. J.; Peng, E. W.; Zhang, H.; Liu, C.

    2014-07-01

    These catalogs are based on GALEX NUV and FUV source detections in and behind the Virgo Cluster. The detections are split into catalogs of extended sources and point-like sources. The UV Virgo Cluster Extended Source catalog (UV_VES.fit) provides the deepest and most extensive UV photometric data of extended galaxies in Virgo to date. If certain data is not available for a given source then a null value is entered (e.g. -999, -99). UV point-like sources are matched with SDSS, NGVS, and NED and the relevant photometry and further data from these databases/catalogs are provided in this compilation of catalogs. The primary GUViCS UV Virgo Cluster Point-Like Source catalog is UV_VPS.fit. This catalog provides the most useful GALEX pipeline NUV and FUV photometric parameters, and categorizes sources as stars, Virgo members, and background sources, when possible. It also provides identifiers for optical matches in the SDSS and NED, and indicates if a match exists in the NGVS, only if GUViCS-optical matches are one-to-one. NED spectroscopic redshifts are also listed for GUViCS-NED one-to-one matches. If certain data is not available for a given source a null value is entered. Additionally, the catalog is useful for quick access to optical data on one-to-one GUViCS-SDSS matches.The only parameter available in the catalog for UV sources that have multiple SDSS matches is the total number of multiple matches, i.e. SDSSNUMMTCHS. Multiple GUViCS sources matched to the same SDSS source are also flagged given a total number of matches, SDSSNUMMTCHS, of one. All other fields for multiple matches are set to a null value of -99. In order to obtain full optical SDSS data for multiply matched UV sources in both scenarios, the user can cross-correlate the GUViCS ID of the sources of interest with the full GUViCS-SDSS matched catalog in GUV_SDSS.fit. The GUViCS-SDSS matched catalog, GUV_SDSS.fit, provides the most relevant SDSS data on all GUViCS-SDSS matches, including one-to-one matches and multiply matched sources. The catalog gives full SDSS identification information, complete SDSS photometric measurements in multiple aperture types, and complete redshift information (photometric and spectroscopic). It is ideal for large statistical studies of galaxy populations at multiple wavelengths in the background of the Virgo Cluster. The catalog can also be used as a starting point to study and search for previously unknown UV-bright point-like objects within the Virgo Cluster. If certain data is not available for a given source that field is given a null value. (6 data files).
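
    As an aside, the positional cross-matching that underlies such UV-optical matched catalogs can be sketched with astropy: each UV source is paired with its nearest optical source, and pairs wider than a chosen radius are treated as unmatched (null values). The coordinates and the 3 arcsec radius below are illustrative, not GUViCS parameters.

        # Nearest-neighbour sky match between a UV source list and an optical catalogue.
        import numpy as np
        import astropy.units as u
        from astropy.coordinates import SkyCoord

        uv = SkyCoord(ra=[187.70, 188.12] * u.deg, dec=[12.39, 12.55] * u.deg)
        optical = SkyCoord(ra=[187.7001, 190.00] * u.deg, dec=[12.3899, 13.00] * u.deg)

        idx, sep2d, _ = uv.match_to_catalog_sky(optical)
        matched = sep2d < 3 * u.arcsec
        optical_index = np.where(matched, idx, -99)   # -99 as the null value for no match
        print(optical_index, sep2d.to(u.arcsec))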

  16. DOE technology information management system database study report

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Widing, M.A.; Blodgett, D.W.; Braun, M.D.

    1994-11-01

    To support the missions of the US Department of Energy (DOE) Special Technologies Program, Argonne National Laboratory is defining the requirements for an automated software system that will search electronic databases on technology. This report examines the work done and results to date. Argonne studied existing commercial and government sources of technology databases in five general areas: on-line services, patent database sources, government sources, aerospace technology sources, and general technology sources. First, it conducted a preliminary investigation of these sources to obtain information on the content, cost, frequency of updates, and other aspects of their databases. The Laboratory then performed detailed examinations of at least one source in each area. On this basis, Argonne recommended which databases should be incorporated in DOE's Technology Information Management System.

  17. PathCase-SB architecture and database design

    PubMed Central

    2011-01-01

    Background Integration of metabolic pathways resources and regulatory metabolic network models, and deploying new tools on the integrated platform can help perform more effective and more efficient systems biology research on understanding the regulation in metabolic networks. Therefore, the tasks of (a) integrating under a single database environment regulatory metabolic networks and existing models, and (b) building tools to help with modeling and analysis are desirable and intellectually challenging computational tasks. Description PathCase Systems Biology (PathCase-SB) is built and released. The PathCase-SB database provides data and API for multiple user interfaces and software tools. The current PathCase-SB system provides a database-enabled framework and web-based computational tools towards facilitating the development of kinetic models for biological systems. PathCase-SB aims to integrate data of selected biological data sources on the web (currently, BioModels database and KEGG), and to provide more powerful and/or new capabilities via the new web-based integrative framework. This paper describes architecture and database design issues encountered in PathCase-SB's design and implementation, and presents the current design of PathCase-SB's architecture and database. Conclusions PathCase-SB architecture and database provide a highly extensible and scalable environment with easy and fast (real-time) access to the data in the database. PathCase-SB itself is already being used by researchers across the world. PMID:22070889

  18. Lessons Learned from Deploying an Analytical Task Management Database

    NASA Technical Reports Server (NTRS)

    O'Neil, Daniel A.; Welch, Clara; Arceneaux, Joshua; Bulgatz, Dennis; Hunt, Mitch; Young, Stephen

    2007-01-01

    Defining requirements, missions, technologies, and concepts for space exploration involves multiple levels of organizations, teams of people with complementary skills, and analytical models and simulations. Analytical activities range from filling a To-Be-Determined (TBD) in a requirement to creating animations and simulations of exploration missions. In a program as large as returning to the Moon, there are hundreds of simultaneous analysis activities. A way to manage and integrate efforts of this magnitude is to deploy a centralized database that provides the capability to define tasks, identify resources, describe products, schedule deliveries, and generate a variety of reports. This paper describes a web-accessible task management system and explains the lessons learned during the development and deployment of the database. Through the database, managers and team leaders can define tasks, establish review schedules, assign teams, link tasks to specific requirements, identify products, and link the task data records to external repositories that contain the products. Data filters and spreadsheet export utilities provide a powerful capability to create custom reports. Import utilities provide a means to populate the database from previously filled form files. Within a four-month period, a small team analyzed requirements, developed a prototype, conducted multiple system demonstrations, and deployed a working system supporting hundreds of users across the aerospace community. Open-source technologies and agile software development techniques, applied by a skilled team, enabled this impressive achievement. Topics in the paper cover the web application technologies, agile software development, an overview of the system's functions and features, dealing with increasing scope, and deploying new versions of the system.

  19. mTM-align: a server for fast protein structure database search and multiple protein structure alignment.

    PubMed

    Dong, Runze; Pan, Shuo; Peng, Zhenling; Zhang, Yang; Yang, Jianyi

    2018-05-21

    With the rapid increase in the number of protein structures in the Protein Data Bank, it has become urgent to develop algorithms for efficient protein structure comparison. In this article, we present the mTM-align server, which consists of two closely related modules: one for structure database search and the other for multiple structure alignment. The database search is sped up by a heuristic algorithm and a hierarchical organization of the structures in the database. The multiple structure alignment is performed using the recently developed algorithm mTM-align. Benchmark tests demonstrate that our algorithms outperform other peer methods for both modules, in terms of speed and accuracy. One of the unique features of the server is the interplay between database search and multiple structure alignment. The server provides service not only for performing fast database searches, but also for making accurate multiple structure alignments with the structures found by the search. For the database search, it takes about 2-5 min for a structure of medium size (∼300 residues). For the multiple structure alignment, it takes a few seconds for ∼10 structures of medium size. The server is freely available at: http://yanglab.nankai.edu.cn/mTM-align/.
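
    For context, the TM-score that TM-align-style methods optimise can be computed as below once a superposition and its aligned residue-residue distances are known; the distances and target length are placeholder values, not server output.

        # TM-score for one candidate superposition: distances are normalised by a
        # length-dependent scale d0 and averaged over the target length.
        import math

        def tm_score(aligned_distances, target_length):
            d0 = 1.24 * (target_length - 15) ** (1.0 / 3.0) - 1.8
            d0 = max(d0, 0.5)                     # guard for very short chains
            total = sum(1.0 / (1.0 + (d / d0) ** 2) for d in aligned_distances)
            return total / target_length

        distances = [0.8, 1.1, 2.3, 4.0, 6.5]     # angstroms, placeholder values
        print(round(tm_score(distances, target_length=120), 3))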

  20. InverPep: A database of invertebrate antimicrobial peptides.

    PubMed

    Gómez, Esteban A; Giraldo, Paula; Orduz, Sergio

    2017-03-01

    The aim of this work was to construct InverPep, a database specialised in experimentally validated antimicrobial peptides (AMPs) from invertebrates. AMP data contained in InverPep were manually curated from other databases and the scientific literature. MySQL was integrated with the development platform Laravel; this framework allows PHP programming to be integrated with HTML and was used to design the InverPep web page interface. InverPep contains 18 separate fields, including InverPep code, phylum and species source, peptide name, sequence, peptide length, secondary structure, molar mass, charge, isoelectric point, hydrophobicity, Boman index, aliphatic index and percentage of hydrophobic amino acids. CALCAMPI, an algorithm to calculate the physicochemical properties of multiple peptides simultaneously, was programmed in the Perl language. To date, InverPep contains 702 experimentally validated AMPs from invertebrate species. All of the peptides contain information associated with their source, physicochemical properties, secondary structure, biological activity and links to external literature. Most AMPs in InverPep have a length between 10 and 50 amino acids, a positive charge, a Boman index between 0 and 2 kcal/mol, and 30-50% hydrophobic amino acids. InverPep includes 33 AMPs not reported in other databases. In addition, CALCAMPI and a statistical analysis of the InverPep data are presented. The InverPep database is available in English and Spanish. InverPep is a useful database for studying invertebrate AMPs, and its information could be used for the design of new peptides. The user-friendly interface of InverPep and its information can be freely accessed via a web-based browser at http://ciencias.medellin.unal.edu.co/gruposdeinvestigacion/prospeccionydisenobiomoleculas/InverPep/public/home_en. Copyright © 2016 International Society for Chemotherapy of Infection and Cancer. Published by Elsevier Ltd. All rights reserved.
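
    Several of the listed physicochemical properties can be reproduced for a single peptide with Biopython's ProtParam module, as sketched below; CALCAMPI itself is a separate Perl tool, and the example sequence is arbitrary.

        # Compute a few peptide descriptors similar to those stored in InverPep.
        from Bio.SeqUtils.ProtParam import ProteinAnalysis

        HYDROPHOBIC = set("AVILMFWC")

        seq = "GLFDIVKKVVGALGSL"                  # arbitrary example sequence
        analysis = ProteinAnalysis(seq)
        print("length:", len(seq))
        print("molar mass:", round(analysis.molecular_weight(), 1))
        print("isoelectric point:", round(analysis.isoelectric_point(), 2))
        print("GRAVY hydrophobicity:", round(analysis.gravy(), 2))
        print("hydrophobic fraction:",
              round(sum(aa in HYDROPHOBIC for aa in seq) / len(seq), 2))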

  1. Middle East and North Africa Database Development and Research to Populate the DOE Knowledge Base

    DTIC Science & Technology

    2000-09-01

    strongly in the Arabian peninsula, except in the shield region. The two deep basins in the northern Arabian peninsula, in the Palmyrides and the Rutbah...eastern Mediterranean) crust from multiple-source Werner deconvolution of Bouguer gravity anomalies, J. Geophy. Res., 104, 25,469-25,478, 1999...discontinuities beneath the Arabian Shield, Geophysical Research Letters, 25, 2,873-2,876, 1998b. Sweeney, J., and B. Walter, Preliminary Definition of

  2. PLEXOS Input Data Generator

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    The PLEXOS Input Data Generator (PIDG) is a tool that enables PLEXOS users to better version their data, automate data processing, collaborate in developing inputs, and transfer data between different production cost modeling and other power systems analysis software. PIDG can process data that is in a generalized format from multiple input sources, including CSV files, PostgreSQL databases, and PSS/E .raw files and write it to an Excel file that can be imported into PLEXOS with only limited manual intervention.
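
    The CSV-to-Excel leg of such a workflow can be sketched with pandas; the input files and sheet names below are illustrative, not PIDG's actual layout.

        # Read tabular inputs from CSV and write one Excel workbook for import.
        import pandas as pd

        inputs = {
            "Generators": pd.read_csv("generators.csv"),
            "Fuel Prices": pd.read_csv("fuel_prices.csv"),
        }

        with pd.ExcelWriter("plexos_import.xlsx") as writer:
            for sheet_name, frame in inputs.items():
                frame.to_excel(writer, sheet_name=sheet_name, index=False)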

  3. Opportunities and Challenges in Supply-Side Simulation: Physician-Based Models

    PubMed Central

    Gresenz, Carole Roan; Auerbach, David I; Duarte, Fabian

    2013-01-01

    Objective To provide a conceptual framework and to assess the availability of empirical data for supply-side microsimulation modeling in the context of health care. Data Sources Multiple secondary data sources, including the American Community Survey, Health Tracking Physician Survey, and SK&A physician database. Study Design We apply our conceptual framework to one entity in the health care market—physicians—and identify, assess, and compare data available for physician-based simulation models. Principal Findings Our conceptual framework describes three broad types of data required for supply-side microsimulation modeling. Our assessment of available data for modeling physician behavior suggests broad comparability across various sources on several dimensions and highlights the need for significant integration of data across multiple sources to provide a platform adequate for modeling. A growing literature provides potential estimates for use as behavioral parameters that could serve as the models' engines. Sources of data for simulation modeling that account for the complex organizational and financial relationships among physicians and other supply-side entities are limited. Conclusions A key challenge for supply-side microsimulation modeling is optimally combining available data to harness their collective power. Several possibilities also exist for novel data collection. These have the potential to serve as catalysts for the next generation of supply-side-focused simulation models to inform health policy. PMID:23347041

  4. GERIREX - growing a second generation medical expert system

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Kocur, J. Jr.; Suh, S.C.

    This article describes GERIREX, a medical expert system that forms the core module of an integrated system for total management of a medical practice. GERIREX is currently a first-generation consultant in the domain of prescribing for the geriatric patient with multiple ailments. Employing rule-based and objective probabilistic knowledge representations, the system performs at the near-expert level, correctly ranking single and multiple drug therapy for hypertension and/or congestive heart failure in the presence of between two and seven of 18 common accompanying or underlying conditions. GERIREX creates permanent consultation records and can access patient information from existing databases. System requirements are met by very modest PCs, yet power, speed, flexibility, and ease of use rival or exceed those of many other systems. GERIREX interfaces with a variety of configurations and applications, including text, spreadsheets, databases, and executables, to fit in with current plans to upgrade to a second-generation system, providing a degree of self-maintenance through intelligent parsing of a drug data source such as the Physicians' Desk Reference (PDR, CD-ROM version). Another option under consideration is developing neural networks both to replace the current knowledge base and to embody the rationale employed by the medical expert in evaluating drug data for treatment selection. In this version, the current drug database would be used as training data for the network tasked with adding new drugs to the drug database, imitating the process whereby a physician determines their personal arsenal from among the wide range of available options.

  5. Mobile Source Observation Database (MSOD)

    EPA Pesticide Factsheets

    The Mobile Source Observation Database (MSOD) is a relational database developed by the Assessment and Standards Division (ASD) of the U.S. EPA Office of Transportation and Air Quality (formerly the Office of Mobile Sources).

  6. cPath: open source software for collecting, storing, and querying biological pathways.

    PubMed

    Cerami, Ethan G; Bader, Gary D; Gross, Benjamin E; Sander, Chris

    2006-11-13

    Biological pathways, including metabolic pathways, protein interaction networks, signal transduction pathways, and gene regulatory networks, are currently represented in over 220 diverse databases. These data are crucial for the study of specific biological processes, including human diseases. Standard exchange formats for pathway information, such as BioPAX, CellML, SBML and PSI-MI, enable convenient collection of this data for biological research, but mechanisms for common storage and communication are required. We have developed cPath, an open source database and web application for collecting, storing, and querying biological pathway data. cPath makes it easy to aggregate custom pathway data sets available in standard exchange formats from multiple databases, present pathway data to biologists via a customizable web interface, and export pathway data via a web service to third-party software, such as Cytoscape, for visualization and analysis. cPath is software only, and does not include new pathway information. Key features include: a built-in identifier mapping service for linking identical interactors and linking to external resources; built-in support for PSI-MI and BioPAX standard pathway exchange formats; a web service interface for searching and retrieving pathway data sets; and thorough documentation. The cPath software is freely available under the LGPL open source license for academic and commercial use. cPath is a robust, scalable, modular, professional-grade software platform for collecting, storing, and querying biological pathways. It can serve as the core data handling component in information systems for pathway visualization, analysis and modeling.
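    To illustrate how a client might query a cPath-style pathway web service and hand the results to downstream tools, the Python sketch below fetches an XML payload over HTTP and lists interactor names. The base URL, command parameters, and XML element names are illustrative assumptions, not the documented cPath interface.

    ```python
    # Sketch: query a cPath-style pathway web service and list interactor names.
    # The URL, parameters, and XML element names are illustrative assumptions,
    # not the documented cPath API.
    import urllib.parse
    import urllib.request
    import xml.etree.ElementTree as ET

    BASE_URL = "http://example.org/cpath/webservice.do"  # hypothetical deployment

    def search_pathways(keyword: str) -> list[str]:
        params = urllib.parse.urlencode({
            "cmd": "get_by_keyword",   # assumed command name
            "q": keyword,
            "format": "xml",
        })
        with urllib.request.urlopen(f"{BASE_URL}?{params}") as resp:
            tree = ET.parse(resp)
        # Collect the text of any <name> elements in the response document.
        return [el.text for el in tree.iter("name") if el.text]

    if __name__ == "__main__":
        for name in search_pathways("MAPK"):
            print(name)
    ```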

  7. Implementation of an open adoption research data management system for clinical studies.

    PubMed

    Müller, Jan; Heiss, Kirsten Ingmar; Oberhoffer, Renate

    2017-07-06

Research institutions need to manage multiple studies with individual data sets, processing rules and different permissions. So far, there is no standard technology that provides an easy-to-use environment to create databases and user interfaces for clinical trials or research studies. Therefore various software solutions are being used, ranging from custom software explicitly designed for a specific study, to cost-intensive commercial Clinical Trial Management Systems (CTMS), up to very basic approaches with self-designed Microsoft® databases. The technology applied to conduct those studies varies tremendously from study to study, making it difficult to evaluate data across various studies (meta-analysis) and to keep a defined level of quality in database design, data processing, display and export. Furthermore, the systems being used to collect study data are often operated redundantly to systems used in patient care. As a consequence, data collection in studies is inefficient and data quality may suffer from unsynchronized datasets, non-normalized database scenarios and manually executed data transfers. With OpenCampus Research we implemented an open adoption software (OAS) solution on an open source basis, which provides a standard environment for state-of-the-art research database management at low cost.

  8. Online Databases in Physics.

    ERIC Educational Resources Information Center

    Sievert, MaryEllen C.; Verbeck, Alison F.

    1984-01-01

This overview of 47 online sources for physics information available in the United States--including sub-field databases, transdisciplinary databases, and multidisciplinary databases--notes content, print source, language, time coverage, and databank. Two discipline-specific databases (SPIN and PHYSICS BRIEFS) are also discussed. (EJS)

  9. Multiple Representations-Based Face Sketch-Photo Synthesis.

    PubMed

    Peng, Chunlei; Gao, Xinbo; Wang, Nannan; Tao, Dacheng; Li, Xuelong; Li, Jie

    2016-11-01

    Face sketch-photo synthesis plays an important role in law enforcement and digital entertainment. Most of the existing methods only use pixel intensities as the feature. Since face images can be described using features from multiple aspects, this paper presents a novel multiple representations-based face sketch-photo-synthesis method that adaptively combines multiple representations to represent an image patch. In particular, it combines multiple features from face images processed using multiple filters and deploys Markov networks to exploit the interacting relationships between the neighboring image patches. The proposed framework could be solved using an alternating optimization strategy and it normally converges in only five outer iterations in the experiments. Our experimental results on the Chinese University of Hong Kong (CUHK) face sketch database, celebrity photos, CUHK Face Sketch FERET Database, IIIT-D Viewed Sketch Database, and forensic sketches demonstrate the effectiveness of our method for face sketch-photo synthesis. In addition, cross-database and database-dependent style-synthesis evaluations demonstrate the generalizability of this novel method and suggest promising solutions for face identification in forensic science.

  10. Towards a Global Service Registry for the World-Wide LHC Computing Grid

    NASA Astrophysics Data System (ADS)

    Field, Laurence; Alandes Pradillo, Maria; Di Girolamo, Alessandro

    2014-06-01

The World-Wide LHC Computing Grid encompasses a set of heterogeneous information systems; from central portals such as the Open Science Grid's Information Management System and the Grid Operations Centre Database, to the WLCG information system, where the information sources are the Grid services themselves. Providing a consistent view of the information, which involves synchronising all these information systems, is a challenging activity that has led the LHC virtual organisations to create their own configuration databases. This experience, whereby each virtual organisation's configuration database interfaces with multiple information systems, has resulted in the duplication of effort, especially relating to the use of manual checks for the handling of inconsistencies. The Global Service Registry aims to address this issue by providing a centralised service that aggregates information from multiple information systems. It shows both information on registered resources (i.e. what should be there) and available resources (i.e. what is there). The main purpose is to simplify the synchronisation of the virtual organisations' own configuration databases, which are used for job submission and data management, through the provision of a single interface for obtaining all the information. By centralising the information, automated consistency and validation checks can be performed to improve the overall quality of information provided. Although internally the GLUE 2.0 information model is used for the purpose of integration, the Global Service Registry is not dependent on any particular information model for ingestion or dissemination. The intention is to allow the virtual organisations' configuration databases to be decoupled from the underlying information systems in a transparent way and hence simplify any possible future migration due to the evolution of those systems. This paper presents the Global Service Registry architecture, its advantages compared to the current situation and how it can support the evolution of information systems.
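    As a rough illustration of the registered-versus-available consistency check the abstract describes, the Python sketch below compares two resource catalogues and reports what is missing, unregistered, or mismatched. The data structures and field names are illustrative assumptions, not the Global Service Registry's actual schema.

    ```python
    # Sketch: compare "registered" resources (what should be there) against
    # "available" resources (what is there) and report inconsistencies.
    # Data structures and field names are illustrative assumptions.

    def consistency_report(registered: dict[str, dict], available: dict[str, dict]) -> dict:
        reg_ids, avail_ids = set(registered), set(available)
        report = {
            "missing": sorted(reg_ids - avail_ids),       # registered but not published
            "unregistered": sorted(avail_ids - reg_ids),  # published but not registered
            "mismatched": [],
        }
        for rid in reg_ids & avail_ids:
            diffs = {k: (registered[rid][k], available[rid][k])
                     for k in registered[rid]
                     if k in available[rid] and registered[rid][k] != available[rid][k]}
            if diffs:
                report["mismatched"].append((rid, diffs))
        return report

    registered = {"CE-siteA": {"type": "CE", "endpoint": "gridA:8443"}}
    available = {"CE-siteA": {"type": "CE", "endpoint": "gridA:9443"},
                 "SE-siteB": {"type": "SE", "endpoint": "gridB:8444"}}
    print(consistency_report(registered, available))
    ```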

  11. Sequence History Update Tool

    NASA Technical Reports Server (NTRS)

    Khanampompan, Teerapat; Gladden, Roy; Fisher, Forest; DelGuercio, Chris

    2008-01-01

The Sequence History Update Tool performs Web-based sequence statistics archiving for Mars Reconnaissance Orbiter (MRO). Using a single UNIX command, the software takes advantage of sequencing conventions to automatically extract the needed statistics from multiple files. This information is then used to populate a PHP database, which is then seamlessly formatted into a dynamic Web page. This tool replaces a previous tedious and error-prone process of manually editing HTML code to construct a Web-based table. Because the tool manages all of the statistics gathering and file delivery to and from multiple data sources spread across multiple servers, there is also a considerable time and effort savings. With the Sequence History Update Tool, what previously took minutes is now done in less than 30 seconds, and the tool provides a more accurate archival record of the sequence commanding for MRO.
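    A minimal sketch of the extract-and-load step described above is shown below: per-sequence statistics are pulled out of multiple report files and loaded into a small database. The filename convention, field layout, and SQLite schema are illustrative assumptions; the actual tool populated a web-facing database via PHP.

    ```python
    # Sketch: pull per-sequence statistics out of multiple report files and
    # load them into a small database. Filename convention, fields, and schema
    # are illustrative assumptions.
    import glob
    import re
    import sqlite3

    conn = sqlite3.connect("sequence_history.db")
    conn.execute("""CREATE TABLE IF NOT EXISTS sequence_stats
                    (seq_id TEXT PRIMARY KEY, command_count INTEGER, uplink_date TEXT)""")

    for path in glob.glob("reports/*_summary.txt"):   # assumed naming convention
        seq_id = re.sub(r"_summary\.txt$", "", path.split("/")[-1])
        stats = {}
        with open(path) as fh:
            for line in fh:
                if ":" in line:
                    key, value = line.split(":", 1)
                    stats[key.strip().lower()] = value.strip()
        conn.execute("INSERT OR REPLACE INTO sequence_stats VALUES (?, ?, ?)",
                     (seq_id, int(stats.get("command count", 0)), stats.get("uplink date")))

    conn.commit()
    conn.close()
    ```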

  12. The eNanoMapper database for nanomaterial safety information

    PubMed Central

    Chomenidis, Charalampos; Doganis, Philip; Fadeel, Bengt; Grafström, Roland; Hardy, Barry; Hastings, Janna; Hegi, Markus; Jeliazkov, Vedrin; Kochev, Nikolay; Kohonen, Pekka; Munteanu, Cristian R; Sarimveis, Haralambos; Smeets, Bart; Sopasakis, Pantelis; Tsiliki, Georgia; Vorgrimmler, David; Willighagen, Egon

    2015-01-01

Summary Background: The NanoSafety Cluster, a cluster of projects funded by the European Commission, identified the need for a computational infrastructure for toxicological data management of engineered nanomaterials (ENMs). Ontologies, open standards, and interoperable designs were envisioned to empower a harmonized approach to European research in nanotechnology. This setting provides a number of opportunities and challenges in the representation of nanomaterials data and the integration of ENM information originating from diverse systems. Within this cluster, eNanoMapper works towards supporting the collaborative safety assessment for ENMs by creating a modular and extensible infrastructure for data sharing, data analysis, and building computational toxicology models for ENMs. Results: The eNanoMapper database solution builds on the previous experience of the consortium partners in supporting diverse data through flexible data storage, open source components and web services. We have recently described the design of the eNanoMapper prototype database along with a summary of challenges in the representation of ENM data and an extensive review of existing nano-related data models, databases, and nanomaterials-related entries in chemical and toxicogenomic databases. This paper continues with a focus on the database functionality exposed through its application programming interface (API), and its use in visualisation and modelling. Considering the preferred community practice of using spreadsheet templates, we developed a configurable spreadsheet parser facilitating user-friendly data preparation and data upload. We further present a web application able to retrieve the experimental data via the API and analyze it with multiple data preprocessing and machine learning algorithms. Conclusion: We demonstrate how the eNanoMapper database is used to import and publish online ENM and assay data from several data sources, how the “representational state transfer” (REST) API enables building user-friendly interfaces and graphical summaries of the data, and how these resources facilitate the modelling of reproducible quantitative structure–activity relationships for nanomaterials (NanoQSAR). PMID:26425413

  13. The eNanoMapper database for nanomaterial safety information.

    PubMed

    Jeliazkova, Nina; Chomenidis, Charalampos; Doganis, Philip; Fadeel, Bengt; Grafström, Roland; Hardy, Barry; Hastings, Janna; Hegi, Markus; Jeliazkov, Vedrin; Kochev, Nikolay; Kohonen, Pekka; Munteanu, Cristian R; Sarimveis, Haralambos; Smeets, Bart; Sopasakis, Pantelis; Tsiliki, Georgia; Vorgrimmler, David; Willighagen, Egon

    2015-01-01

The NanoSafety Cluster, a cluster of projects funded by the European Commission, identified the need for a computational infrastructure for toxicological data management of engineered nanomaterials (ENMs). Ontologies, open standards, and interoperable designs were envisioned to empower a harmonized approach to European research in nanotechnology. This setting provides a number of opportunities and challenges in the representation of nanomaterials data and the integration of ENM information originating from diverse systems. Within this cluster, eNanoMapper works towards supporting the collaborative safety assessment for ENMs by creating a modular and extensible infrastructure for data sharing, data analysis, and building computational toxicology models for ENMs. The eNanoMapper database solution builds on the previous experience of the consortium partners in supporting diverse data through flexible data storage, open source components and web services. We have recently described the design of the eNanoMapper prototype database along with a summary of challenges in the representation of ENM data and an extensive review of existing nano-related data models, databases, and nanomaterials-related entries in chemical and toxicogenomic databases. This paper continues with a focus on the database functionality exposed through its application programming interface (API), and its use in visualisation and modelling. Considering the preferred community practice of using spreadsheet templates, we developed a configurable spreadsheet parser facilitating user-friendly data preparation and data upload. We further present a web application able to retrieve the experimental data via the API and analyze it with multiple data preprocessing and machine learning algorithms. We demonstrate how the eNanoMapper database is used to import and publish online ENM and assay data from several data sources, how the "representational state transfer" (REST) API enables building user-friendly interfaces and graphical summaries of the data, and how these resources facilitate the modelling of reproducible quantitative structure-activity relationships for nanomaterials (NanoQSAR).
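    The sketch below illustrates the retrieve-and-preprocess pattern described in the abstract: assay values are fetched over a REST API as JSON and standardized before modelling. The base URL, endpoint path, and JSON field names are illustrative assumptions, not the documented eNanoMapper API.

    ```python
    # Sketch: retrieve experimental data over a REST API and run a simple
    # preprocessing step before modelling. The endpoint path and JSON field
    # names are illustrative assumptions, not the documented eNanoMapper API.
    import json
    import urllib.request
    from statistics import mean, stdev

    BASE = "https://data.example.org/enanomapper"  # hypothetical instance

    def fetch_assay_values(substance_id: str) -> list[float]:
        url = f"{BASE}/substance/{substance_id}/study"
        with urllib.request.urlopen(url) as resp:
            payload = json.load(resp)
        return [float(e["value"]) for e in payload.get("effects", []) if "value" in e]

    def zscore(values: list[float]) -> list[float]:
        m, s = mean(values), stdev(values)
        return [(v - m) / s for v in values]

    if __name__ == "__main__":
        values = fetch_assay_values("NM-110")  # placeholder substance identifier
        print(zscore(values))
    ```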

  14. The role of source memory in older adults' recollective experience.

    PubMed

    Boywitt, C Dennis; Kuhlmann, Beatrice G; Meiser, Thorsten

    2012-06-01

    Younger adults' "remember" judgments are accompanied by better memory for the source of an item than "know" judgments. Furthermore, remember judgments are not merely associated with better memory for individual source features but also with bound memory for multiple source features. However, older adults, independent of their subjective memory experience, are generally less likely to "bind" source features to an item and to each other in memory (i.e., the associative deficit). In two experiments, we tested whether memory for perceptual source features, independently or bound, is also the basis for older adults' remember responses or if their associative deficit leads them to base their responses on other types of information. The results suggest that retrieval of perceptual source features, individually or bound, forms the basis for younger but not for older adults' remember judgments even when the overall level of memory for perceptual sources is closely equated (Experiment 1) and when attention is explicitly directed to the source information at encoding (Experiment 2). PsycINFO Database Record (c) 2012 APA, all rights reserved

  15. Identifying known unknowns using the US EPA's CompTox Chemistry Dashboard.

    PubMed

    McEachran, Andrew D; Sobus, Jon R; Williams, Antony J

    2017-03-01

    Chemical features observed using high-resolution mass spectrometry can be tentatively identified using online chemical reference databases by searching molecular formulae and monoisotopic masses and then rank-ordering of the hits using appropriate relevance criteria. The most likely candidate "known unknowns," which are those chemicals unknown to an investigator but contained within a reference database or literature source, rise to the top of a chemical list when rank-ordered by the number of associated data sources. The U.S. EPA's CompTox Chemistry Dashboard is a curated and freely available resource for chemistry and computational toxicology research, containing more than 720,000 chemicals of relevance to environmental health science. In this research, the performance of the Dashboard for identifying known unknowns was evaluated against that of the online ChemSpider database, one of the primary resources used by mass spectrometrists, using multiple previously studied datasets reported in the peer-reviewed literature totaling 162 chemicals. These chemicals were examined using both applications via molecular formula and monoisotopic mass searches followed by rank-ordering of candidate compounds by associated references or data sources. A greater percentage of chemicals ranked in the top position when using the Dashboard, indicating an advantage of this application over ChemSpider for identifying known unknowns using data source ranking. Additional approaches are being developed for inclusion into a non-targeted analysis workflow as part of the CompTox Chemistry Dashboard. This work shows the potential for use of the Dashboard in exposure assessment and risk decision-making through significant improvements in non-targeted chemical identification. Graphical abstract Identifying known unknowns in the US EPA's CompTox Chemistry Dashboard from molecular formula and monoisotopic mass inputs.
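    A small sketch of the rank-ordering step described above is given below: candidates returned by a formula or monoisotopic-mass search are sorted by the number of associated data sources, so the most widely referenced structure rises to the top. The candidate records are mocked for illustration.

    ```python
    # Sketch: rank candidate structures returned by a formula or monoisotopic-mass
    # search according to the number of associated data sources, as in the
    # "known unknowns" approach. The candidate records are mocked for illustration.

    def rank_candidates(candidates: list[dict]) -> list[dict]:
        # Highest data-source count first; ties broken alphabetically by name.
        return sorted(candidates, key=lambda c: (-c["data_sources"], c["name"]))

    candidates = [
        {"name": "Candidate A", "formula": "C15H12N2O", "data_sources": 48},
        {"name": "Candidate B", "formula": "C15H12N2O", "data_sources": 212},
        {"name": "Candidate C", "formula": "C15H12N2O", "data_sources": 5},
    ]

    for rank, cand in enumerate(rank_candidates(candidates), start=1):
        print(rank, cand["name"], cand["data_sources"])
    ```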

  16. Lean Middleware

    NASA Technical Reports Server (NTRS)

Maluf, David A.; Bell, David G.; Ashish, Naveen

    2005-01-01

This paper describes an approach to achieving data integration across multiple sources in an enterprise, in a manner that is cost efficient and economically scalable. We present an approach that does not rely on major investment in structured, heavy-weight database systems for data storage or heavy-weight middleware responsible for integrated access. The approach is centered around pushing any required data structure and semantics functionality (schema) to application clients, as well as pushing integration specification and functionality to clients, where integration can be performed on the fly.

  17. Competitive-Cooperative Automated Reasoning from Distributed and Multiple Source of Data

    NASA Astrophysics Data System (ADS)

    Fard, Amin Milani

Knowledge extraction from distributed database systems has been investigated during the past decade in order to analyze billions of information records. In this work, a competitive deduction approach in a heterogeneous data grid environment is proposed using classic data mining and statistical methods. By applying a game theory concept in a multi-agent model, we tried to design a policy for hierarchical knowledge discovery and inference fusion. To demonstrate the system in operation, a sample multi-expert system has also been developed.

  18. An Integrated Nursing Management Information System: From Concept to Reality

    PubMed Central

    Pinkley, Connie L.; Sommer, Patricia K.

    1988-01-01

This paper addresses the transition from the conceptualization of a Nursing Management Information System (NMIS) integrated and interdependent with the Hospital Information System (HIS) to its realization. Concepts of input, throughput, and output are presented to illustrate developmental strategies used to achieve nursing information products. Essential processing capabilities include: 1) the ability to interact with multiple data sources; 2) database management, statistical, and graphics software packages; 3) online and batch reporting; and 4) interactive data analysis. Challenges encountered in system construction are examined.

  19. Doppler Radar Profiler for Launch Winds at the Kennedy Space Center (Phase 1a)

    NASA Technical Reports Server (NTRS)

    Murri, Daniel G.

    2011-01-01

The NASA Engineering and Safety Center (NESC) received a request from the NASA Technical Fellow for Flight Mechanics at Langley Research Center (LaRC) to develop a database from multiple Doppler radar wind profiler (DRWP) sources and to develop data processing algorithms to construct high temporal resolution DRWP wind profiles for day-of-launch (DOL) vehicle assessment. This document contains the outcome of Phase 1a of the assessment, including Findings, Observations, NESC Recommendations, and Lessons Learned.

  20. Combining data from multiple sources using the CUAHSI Hydrologic Information System

    NASA Astrophysics Data System (ADS)

    Tarboton, D. G.; Ames, D. P.; Horsburgh, J. S.; Goodall, J. L.

    2012-12-01

    The Consortium of Universities for the Advancement of Hydrologic Science, Inc. (CUAHSI) has developed a Hydrologic Information System (HIS) to provide better access to data by enabling the publication, cataloging, discovery, retrieval, and analysis of hydrologic data using web services. The CUAHSI HIS is an Internet based system comprised of hydrologic databases and servers connected through web services as well as software for data publication, discovery and access. The HIS metadata catalog lists close to 100 web services registered to provide data through this system, ranging from large federal agency data sets to experimental watersheds managed by University investigators. The system's flexibility in storing and enabling public access to similarly formatted data and metadata has created a community data resource from governmental and academic data that might otherwise remain private or analyzed only in isolation. Comprehensive understanding of hydrology requires integration of this information from multiple sources. HydroDesktop is the client application developed as part of HIS to support data discovery and access through this system. HydroDesktop is founded on an open source GIS client and has a plug-in architecture that has enabled the integration of modeling and analysis capability with the functionality for data discovery and access. Model integration is possible through a plug-in built on the OpenMI standard and data visualization and analysis is supported by an R plug-in. This presentation will demonstrate HydroDesktop, showing how it provides an analysis environment within which data from multiple sources can be discovered, accessed and integrated.

  1. Biopython: freely available Python tools for computational molecular biology and bioinformatics

    PubMed Central

    Cock, Peter J. A.; Antao, Tiago; Chang, Jeffrey T.; Chapman, Brad A.; Cox, Cymon J.; Dalke, Andrew; Friedberg, Iddo; Hamelryck, Thomas; Kauff, Frank; Wilczynski, Bartek; de Hoon, Michiel J. L.

    2009-01-01

Summary: The Biopython project is a mature open source international collaboration of volunteer developers, providing Python libraries for a wide range of bioinformatics problems. Biopython includes modules for reading and writing different sequence file formats and multiple sequence alignments, dealing with 3D macromolecular structures, interacting with common tools such as BLAST, ClustalW and EMBOSS, accessing key online databases, as well as providing numerical methods for statistical learning. Availability: Biopython is freely available, with documentation and source code at www.biopython.org under the Biopython license. Contact: All queries should be directed to the Biopython mailing lists (see www.biopython.org/wiki/Mailing_lists) or to peter.cock@scri.ac.uk. PMID:19304878
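    As a minimal usage example of the sequence-file handling the abstract mentions, the sketch below reads a FASTA file with Biopython's SeqIO module and reports per-record length and GC content. It assumes Biopython is installed and that "example.fasta" is a placeholder input file.

    ```python
    # Minimal Biopython usage: read a FASTA file and report basic statistics
    # per record. "example.fasta" is a placeholder input file.
    from Bio import SeqIO

    for record in SeqIO.parse("example.fasta", "fasta"):
        gc = 100.0 * (record.seq.count("G") + record.seq.count("C")) / len(record.seq)
        print(f"{record.id}\tlength={len(record.seq)}\tGC={gc:.1f}%")
    ```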

  2. Big data and high-performance analytics in structural health monitoring for bridge management

    NASA Astrophysics Data System (ADS)

    Alampalli, Sharada; Alampalli, Sandeep; Ettouney, Mohammed

    2016-04-01

Structural Health Monitoring (SHM) can be a vital tool for effective bridge management. Combining large data sets from multiple sources to create a data-driven decision-making framework is crucial for the success of SHM. This paper presents a big data analytics framework that combines multiple data sets correlated with functional relatedness to convert data into actionable information that empowers risk-based decision-making. The integrated data environment incorporates near real-time streams of semi-structured data from remote sensors, historical visual inspection data, and observations from structural analysis models to monitor, assess, and manage risks associated with aging bridge inventories. Accelerated processing of datasets is made possible by four technologies: cloud computing, relational database processing, NoSQL database support, and in-memory analytics. The framework is being validated on a railroad corridor that can be subjected to multiple hazards. The framework enables the computation of reliability indices for critical bridge components and individual bridge spans. In addition, the framework includes a risk-based decision-making process that enumerates the costs and consequences of poor bridge performance at span- and network-levels when rail networks are exposed to natural hazard events such as floods and earthquakes. Big data and high-performance analytics enable insights that assist bridge owners in addressing problems faster.

  3. Mobile Source Observation Database (MSOD)

    EPA Pesticide Factsheets

The Mobile Source Observation Database (MSOD) is a relational database being developed by the Assessment and Standards Division (ASD) of the US Environmental Protection Agency Office of Transportation and Air Quality (formerly the Office of Mobile Sources). The MSOD contains emission test data from in-use mobile air-pollution sources such as cars, trucks, and engines from trucks and nonroad vehicles. Data in the database were collected from 1982 to the present. The data are intended to be representative of in-use vehicle emissions in the United States.

  4. The Saccharomyces Genome Database Variant Viewer

    PubMed Central

    Sheppard, Travis K.; Hitz, Benjamin C.; Engel, Stacia R.; Song, Giltae; Balakrishnan, Rama; Binkley, Gail; Costanzo, Maria C.; Dalusag, Kyla S.; Demeter, Janos; Hellerstedt, Sage T.; Karra, Kalpana; Nash, Robert S.; Paskov, Kelley M.; Skrzypek, Marek S.; Weng, Shuai; Wong, Edith D.; Cherry, J. Michael

    2016-01-01

    The Saccharomyces Genome Database (SGD; http://www.yeastgenome.org) is the authoritative community resource for the Saccharomyces cerevisiae reference genome sequence and its annotation. In recent years, we have moved toward increased representation of sequence variation and allelic differences within S. cerevisiae. The publication of numerous additional genomes has motivated the creation of new tools for their annotation and analysis. Here we present the Variant Viewer: a dynamic open-source web application for the visualization of genomic and proteomic differences. Multiple sequence alignments have been constructed across high quality genome sequences from 11 different S. cerevisiae strains and stored in the SGD. The alignments and summaries are encoded in JSON and used to create a two-tiered dynamic view of the budding yeast pan-genome, available at http://www.yeastgenome.org/variant-viewer. PMID:26578556

  5. The challenge of sustainability in healthcare systems: Frequency and cost of inappropriate patterns of breast cancer care (the E.Pic.A study).

    PubMed

    Massa, Ilaria; Balzi, William; Burattini, Costanza; Gentili, Nicola; Bucchi, Lauro; Nanni, Oriana; Gallegati, Davide; Pierini, Andrea; Amadori, Dino; Falcini, Fabio; Altini, Mattia

    2017-08-01

In a context of decreasing economic health resources and a rise in health needs, it is urgent to face this sustainability crisis through the analysis of healthcare expenditures. Wastage deriving from inappropriate interventions erodes resources which could be reallocated to high-value activities. To identify these areas of wastage, we developed a method for combining and analyzing data from multiple sources. Here we report the preliminary results of a retrospective cohort study evaluating the performance of breast cancer (BC) care at IRST, an Italian cancer institute. Four data sources gathered in a real-world setting (a clinical database, two administrative databases and a cancer registry) were linked. Essential Key Performance Indexes (KPIs) in the pattern of BC diagnosis (KPI 1 and 2) and treatment (KPI 3 and 4) based on current guidelines were developed by a board of professionals. The costs of inappropriate examinations were associated with the diagnostic KPIs. We found that 2798 patients treated at IRST from January 2010 to June 2016 received a total of 2516 inappropriate examinations accounting for € 573,510.80. Linkage from multiple routine healthcare data sources is feasible: it allows the measurement of important KPIs specifically designed for BC care, and the identification of areas of low-value use of the resources. If systematically applied, this method could help provide a complete picture of inappropriateness and waste, redirect these resources to higher-value interventions for patients, and fill the gap between proper use of the resources and the best clinical results. Copyright © 2017 The Authors. Published by Elsevier Ltd. All rights reserved.
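    A rough sketch of the linkage-and-KPI step described above follows: a clinical cohort table is joined to administrative examination claims on a patient identifier, and the cost of examinations flagged by a KPI rule is totalled. The column names and the example rule are illustrative assumptions, not the study's actual KPI definitions.

    ```python
    # Sketch: link a clinical cohort to administrative examination claims and
    # total the cost of examinations flagged as inappropriate by a KPI rule.
    # Column names and the example rule are illustrative assumptions.
    import pandas as pd

    cohort = pd.DataFrame({
        "patient_id": [1, 2, 3],
        "stage": ["I", "I", "III"],
    })
    claims = pd.DataFrame({
        "patient_id": [1, 1, 2, 3],
        "exam": ["bone_scan", "mammography", "bone_scan", "ct_chest"],
        "cost_eur": [180.0, 60.0, 180.0, 220.0],
    })

    linked = claims.merge(cohort, on="patient_id", how="inner")

    # Example KPI: staging imaging (e.g. bone scan) in early-stage disease is
    # treated as inappropriate -- an illustrative rule, not the study's definition.
    inappropriate = linked[(linked["stage"] == "I") & (linked["exam"] == "bone_scan")]
    print(len(inappropriate), "inappropriate examinations,",
          inappropriate["cost_eur"].sum(), "EUR")
    ```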

  6. SIDD: A Semantically Integrated Database towards a Global View of Human Disease

    PubMed Central

    Cheng, Liang; Wang, Guohua; Li, Jie; Zhang, Tianjiao; Xu, Peigang; Wang, Yadong

    2013-01-01

Background A number of databases have been developed to collect disease-related molecular, phenotypic and environmental features (DR-MPEs), such as genes, non-coding RNAs, genetic variations, drugs, phenotypes and environmental factors. However, each of the current databases has focused on only one or two DR-MPEs. There is an urgent demand to develop an integrated database, which can establish semantic associations among disease-related databases and link them to provide a global view of human disease at the biological level. This database, once developed, will facilitate researchers to query various DR-MPEs through disease, and investigate disease mechanisms from different types of data. Methodology To establish an integrated disease-associated database, disease vocabularies used in different databases are mapped to Disease Ontology (DO) through semantic match. 4,284 and 4,186 disease terms from Medical Subject Headings (MeSH) and Online Mendelian Inheritance in Man (OMIM), respectively, are mapped to DO. Then, the relationships between DR-MPEs and diseases are extracted and merged from different source databases to reduce data redundancy. Conclusions A semantically integrated disease-associated database (SIDD) is developed, which integrates 18 disease-associated databases, for researchers to browse multiple types of DR-MPEs in a single view. A web interface allows easy navigation for querying information through browsing a disease ontology tree or searching a disease term. Furthermore, a network visualization tool using the Cytoscape Web plugin has been implemented in SIDD. It enhances SIDD usage when viewing the relationships between diseases and DR-MPEs. The current version of SIDD (Jul 2013) documents 4,465,131 entries relating to 139,365 DR-MPEs, and to 3,824 human diseases. The database can be freely accessed from: http://mlg.hit.edu.cn/SIDD. PMID:24146757
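    The sketch below illustrates the kind of vocabulary mapping described in the Methodology: source disease terms are matched to a target ontology by normalized name and synonym lookup. The ontology entries and terms shown are toy examples; a real mapping would use the full Disease Ontology and richer semantic matching.

    ```python
    # Sketch: map disease terms from source vocabularies (e.g. MeSH, OMIM) to a
    # target ontology by normalized exact name and synonym match. The terms
    # shown are illustrative; real mappings would use the full Disease Ontology.

    def normalize(term: str) -> str:
        return " ".join(term.lower().replace("-", " ").split())

    ontology = {
        "DOID:1612": {"name": "breast cancer", "synonyms": ["breast carcinoma"]},
        "DOID:9352": {"name": "type 2 diabetes mellitus",
                      "synonyms": ["non-insulin-dependent diabetes mellitus"]},
    }

    # Build a lookup from normalized label to ontology identifier.
    lookup = {}
    for doid, entry in ontology.items():
        for label in [entry["name"], *entry["synonyms"]]:
            lookup[normalize(label)] = doid

    source_terms = ["Breast Carcinoma", "Non-Insulin-Dependent Diabetes Mellitus", "Gout"]
    for term in source_terms:
        print(term, "->", lookup.get(normalize(term), "unmapped"))
    ```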

  7. Synchronising data sources and filling gaps by global hydrological modelling

    NASA Astrophysics Data System (ADS)

    Pimentel, Rafael; Crochemore, Louise; Hasan, Abdulghani; Pineda, Luis; Isberg, Kristina; Arheimer, Berit

    2017-04-01

The advances in remote sensing in the last decades, combined with the creation of different open hydrological databases, have generated a very large amount of information useful for global hydrological modelling. Working with this large number of datasets to set up a global hydrological model poses challenges such as multiple data formats and considerable heterogeneity in spatial and temporal resolutions. Different initiatives have made efforts to homogenize some of these data sources, e.g. GRDC (Global Runoff Data Center), HYDROSHEDS (SHuttle Elevation Derivatives at multiple Scales) and GLWD (Global Lake and Wetland Database) for runoff, watershed delineation and water bodies, respectively. However, not all of the related issues are covered or homogeneously solved at the global scale, and new information continuously becomes available to complement the current datasets. This work presents synchronising efforts to make use of the different global data sources needed to set up the semi-distributed hydrological model HYPE (Hydrological Predictions for the Environment) at the global scale. These data sources include topography for watershed delineation, gauging stations of river flow, and the extents of lakes, floodplains and land cover classes. A new database with approximately 100,000 subbasins, with an average area of 1,000 km2, was created. Subbasin delineation was done by combining the Global Width Database for Large Rivers (GWD-LR), SRTM high-resolution elevation data and a number of forced points of interest (gauging stations of river flow, lakes, reservoirs, urban areas, nuclear plants and areas with high risk of flooding). Regarding flow data, the locations of GRDC stations were checked or placed along the river network when necessary, and completed with available information from national water services in data-sparse regions. A screening of duplicate stations and associated time series was necessary to efficiently combine the two types of data sources. A total of about 21,000 stations were considered as forced points. In the case of lakes, the locations and areas in GLWD were updated using the ESA (European Space Agency) gridded water bodies dataset; many of the original lakes were shifted relative to the topography, and some of them have changed their extent since the creation of the database. Moreover, the location of the outlet of each of these lakes was also calculated. A new definition of global floodplain areas was also included. The land cover provided by ESA and elevation criteria were used to define elevation land classes (ELCs), which are used to define the properties of each of the proposed subbasins. All these new features, namely (a) the inclusion of river width in the delineation of the subbasins, going further in the consideration of river shape; (b) the merging of several databases of river flow gauging stations into an extended global dataset; (c) the coherent location of lakes, river networks and floodplains; and (d) a new definition of hydrological response units that also considers the elevation of the subbasins, will contribute to a better implementation of global hydrological models. The first results of world-wide HYPE will be shown, although the model will not yet be fully calibrated using multiple sources of observed data and information. The ambition is to arrive at a global-scale model that is also useful at local scales, starting with the global picture and then going into the details.
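    One concrete piece of the station-merging work described above is screening for duplicates across sources. The sketch below flags likely duplicate gauging stations by great-circle distance; the distance threshold and the station records are illustrative assumptions.

    ```python
    # Sketch: screen two gauging-station lists for likely duplicates by
    # great-circle distance, so the merged dataset keeps one record per station.
    # The distance threshold and station records are illustrative assumptions.
    from math import asin, cos, radians, sin, sqrt

    def haversine_km(lat1, lon1, lat2, lon2):
        """Great-circle distance between two points in kilometres."""
        dlat, dlon = radians(lat2 - lat1), radians(lon2 - lon1)
        a = sin(dlat / 2) ** 2 + cos(radians(lat1)) * cos(radians(lat2)) * sin(dlon / 2) ** 2
        return 2 * 6371.0 * asin(sqrt(a))

    grdc = [{"id": "GRDC-1", "lat": 59.33, "lon": 18.07}]
    national = [{"id": "NAT-17", "lat": 59.331, "lon": 18.071},
                {"id": "NAT-42", "lat": 57.70, "lon": 11.97}]

    THRESHOLD_KM = 1.0  # assumed tolerance for "same station"
    for s in national:
        dupes = [g["id"] for g in grdc
                 if haversine_km(s["lat"], s["lon"], g["lat"], g["lon"]) < THRESHOLD_KM]
        print(s["id"], "duplicates:", dupes or "none")
    ```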

  8. Freshwater Biological Traits Database (Data Sources)

    EPA Science Inventory

When EPA released the final report, Freshwater Biological Traits Database, it referenced numerous data sources that are included below. The Traits Database report covers the development of a database of freshwater biological traits with additional traits that are relevant...

  9. Linking Multiple Databases: Term Project Using "Sentences" DBMS.

    ERIC Educational Resources Information Center

    King, Ronald S.; Rainwater, Stephen B.

    This paper describes a methodology for use in teaching an introductory Database Management System (DBMS) course. Students master basic database concepts through the use of a multiple component project implemented in both relational and associative data models. The associative data model is a new approach for designing multi-user, Web-enabled…

  10. Comet: an open-source MS/MS sequence database search tool.

    PubMed

    Eng, Jimmy K; Jahan, Tahmina A; Hoopmann, Michael R

    2013-01-01

    Proteomics research routinely involves identifying peptides and proteins via MS/MS sequence database search. Thus the database search engine is an integral tool in many proteomics research groups. Here, we introduce the Comet search engine to the existing landscape of commercial and open-source database search tools. Comet is open source, freely available, and based on one of the original sequence database search tools that has been widely used for many years. © 2012 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.

  11. TRAM (Transcriptome Mapper): database-driven creation and analysis of transcriptome maps from multiple sources

    PubMed Central

    2011-01-01

    Background Several tools have been developed to perform global gene expression profile data analysis, to search for specific chromosomal regions whose features meet defined criteria as well as to study neighbouring gene expression. However, most of these tools are tailored for a specific use in a particular context (e.g. they are species-specific, or limited to a particular data format) and they typically accept only gene lists as input. Results TRAM (Transcriptome Mapper) is a new general tool that allows the simple generation and analysis of quantitative transcriptome maps, starting from any source listing gene expression values for a given gene set (e.g. expression microarrays), implemented as a relational database. It includes a parser able to assign univocal and updated gene symbols to gene identifiers from different data sources. Moreover, TRAM is able to perform intra-sample and inter-sample data normalization, including an original variant of quantile normalization (scaled quantile), useful to normalize data from platforms with highly different numbers of investigated genes. When in 'Map' mode, the software generates a quantitative representation of the transcriptome of a sample (or of a pool of samples) and identifies if segments of defined lengths are over/under-expressed compared to the desired threshold. When in 'Cluster' mode, the software searches for a set of over/under-expressed consecutive genes. Statistical significance for all results is calculated with respect to genes localized on the same chromosome or to all genome genes. Transcriptome maps, showing differential expression between two sample groups, relative to two different biological conditions, may be easily generated. We present the results of a biological model test, based on a meta-analysis comparison between a sample pool of human CD34+ hematopoietic progenitor cells and a sample pool of megakaryocytic cells. Biologically relevant chromosomal segments and gene clusters with differential expression during the differentiation toward megakaryocyte were identified. Conclusions TRAM is designed to create, and statistically analyze, quantitative transcriptome maps, based on gene expression data from multiple sources. The release includes FileMaker Pro database management runtime application and it is freely available at http://apollo11.isto.unibo.it/software/, along with preconfigured implementations for mapping of human, mouse and zebrafish transcriptomes. PMID:21333005
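    The sketch below gives a toy version of the segment analysis described for TRAM's 'Map' mode: fixed-length chromosomal windows are flagged when their mean expression departs from the genome-wide mean by more than a chosen threshold. The window size, threshold, and data are illustrative assumptions, not TRAM's actual statistics.

    ```python
    # Sketch: flag fixed-length chromosomal segments whose mean expression departs
    # from the genome-wide mean, in the spirit of TRAM's "Map" mode.
    # Window size, threshold, and data are illustrative.
    import statistics

    # (gene, chromosomal position in kb, expression value) -- toy data
    genes = [("G1", 100, 5.0), ("G2", 220, 5.2), ("G3", 340, 9.8),
             ("G4", 460, 10.4), ("G5", 580, 5.1), ("G6", 700, 1.2)]

    WINDOW_KB = 250
    THRESHOLD = 1.0  # flag windows whose mean deviates by more than 1 global SD

    values = [v for _, _, v in genes]
    global_mean, global_sd = statistics.mean(values), statistics.stdev(values)

    pos = min(p for _, p, _ in genes)
    end = max(p for _, p, _ in genes)
    while pos <= end:
        window = [v for _, p, v in genes if pos <= p < pos + WINDOW_KB]
        if window:
            delta = statistics.mean(window) - global_mean
            if abs(delta) > THRESHOLD * global_sd:
                label = "over" if delta > 0 else "under"
                print(f"{pos}-{pos + WINDOW_KB} kb: {label}-expressed "
                      f"(mean {statistics.mean(window):.1f})")
        pos += WINDOW_KB
    ```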

  12. Integrating personalized medical test contents with XML and XSL-FO.

    PubMed

    Toddenroth, Dennis; Dugas, Martin; Frankewitsch, Thomas

    2011-03-01

In 2004 the adoption of a modular curriculum at the medical faculty in Muenster led to the introduction of centralized examinations based on multiple-choice questions (MCQs). We report on how organizational challenges of realizing faculty-wide personalized tests were addressed by implementation of a specialized software module to automatically generate test sheets from individual test registrations and MCQ contents. Key steps of the presented method for preparing personalized test sheets are (1) the compilation of relevant item contents and graphical media from a relational database with database queries, (2) the creation of Extensible Markup Language (XML) intermediates, and (3) the transformation into paginated documents. The software module, using an open source print formatter, consistently produced high-quality test sheets, while the blending of vectorized textual contents and pixel graphics resulted in efficient output file sizes. Concomitantly, the module permitted an individual randomization of item sequences to prevent illicit collusion. The automatic generation of personalized MCQ test sheets is feasible using freely available open source software libraries, and can be efficiently deployed on a faculty-wide scale.
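    A minimal sketch of step (2) of the workflow follows: one student's registration and an item pool are turned into an XML intermediate with a per-student random item order, which in the described workflow would then be rendered to paginated output by an XSL-FO formatter. The element names and item records are illustrative assumptions.

    ```python
    # Sketch: build a per-student XML intermediate with a reproducible random
    # item order. Element names and item records are illustrative assumptions;
    # rendering to paginated PDF via XSL-FO is a separate step.
    import random
    import xml.etree.ElementTree as ET

    items = [
        {"id": "Q1", "stem": "Which vessel supplies the liver?"},
        {"id": "Q2", "stem": "Name the enzyme deficient in PKU."},
    ]

    def build_test_sheet(student_id: str, seed: int) -> ET.ElementTree:
        rng = random.Random(seed)          # reproducible per-student shuffling
        order = items[:]
        rng.shuffle(order)
        root = ET.Element("testsheet", attrib={"student": student_id})
        for pos, item in enumerate(order, start=1):
            q = ET.SubElement(root, "question", attrib={"pos": str(pos), "id": item["id"]})
            q.text = item["stem"]
        return ET.ElementTree(root)

    build_test_sheet("stud-0042", seed=42).write("stud-0042.xml",
                                                 encoding="utf-8", xml_declaration=True)
    ```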

  13. Comparing top-down and bottom-up estimates of methane emissions across multiple U.S. oil and gas basins provides insights into national O&G emissions, mitigation strategies, and research priorities

    NASA Astrophysics Data System (ADS)

    Lyon, D. R.; Alvarez, R.; Zavala Araiza, D.; Hamburg, S.

    2017-12-01

We develop a county-level inventory of U.S. anthropogenic methane emissions by integrating multiple data sources including the Drillinginfo oil and gas (O&G) production database, the Environmental Protection Agency (EPA) Greenhouse Gas Reporting Program, a previously published gridded EPA Greenhouse Gas Inventory (Maasakkers et al 2016), and recent measurement studies of O&G pneumatic devices, equipment leaks, abandoned wells, and midstream facilities. Our bottom-up estimates of total and O&G methane emissions are consistently lower than top-down, aerial mass balance estimates in ten O&G production areas. We evaluate several hypotheses for the top-down/bottom-up discrepancy, including potential bias of the aerial mass balance method, temporal mismatch of top-down and bottom-up emission estimates, and source attribution errors. In most basins, the top-down/bottom-up gap cannot be explained fully without additional O&G emissions from sources not included in traditional inventories, such as super-emitters caused by malfunctions or abnormal process conditions. Top-down/bottom-up differences across multiple basins are analyzed to estimate the magnitude of these additional emissions and constrain total methane emissions from the U.S. O&G supply chain. We discuss the implications for mitigating O&G methane emissions and suggest research priorities for increasing the accuracy of future emission inventories.

  14. Interbreeding among deeply divergent mitochondrial lineages in the American cockroach (Periplaneta americana)

    NASA Astrophysics Data System (ADS)

    von Beeren, Christoph; Stoeckle, Mark Y.; Xia, Joyce; Burke, Griffin; Kronauer, Daniel J. C.

    2015-02-01

    DNA barcoding promises to be a useful tool to identify pest species assuming adequate representation of genetic variants in a reference library. Here we examined mitochondrial DNA barcodes in a global urban pest, the American cockroach (Periplaneta americana). Our sampling effort generated 284 cockroach specimens, most from New York City, plus 15 additional U.S. states and six other countries, enabling the first large-scale survey of P. americana barcode variation. Periplaneta americana barcode sequences (n = 247, including 24 GenBank records) formed a monophyletic lineage separate from other Periplaneta species. We found three distinct P. americana haplogroups with relatively small differences within (<=0.6%) and larger differences among groups (2.4%-4.7%). This could be interpreted as indicative of multiple cryptic species. However, nuclear DNA sequences (n = 77 specimens) revealed extensive gene flow among mitochondrial haplogroups, confirming a single species. This unusual genetic pattern likely reflects multiple introductions from genetically divergent source populations, followed by interbreeding in the invasive range. Our findings highlight the need for comprehensive reference databases in DNA barcoding studies, especially when dealing with invasive populations that might be derived from multiple genetically distinct source populations.

  15. Adaptive Portfolio Optimization for Multiple Electricity Markets Participation.

    PubMed

    Pinto, Tiago; Morais, Hugo; Sousa, Tiago M; Sousa, Tiago; Vale, Zita; Praca, Isabel; Faia, Ricardo; Pires, Eduardo Jose Solteiro

    2016-08-01

The increase of distributed energy resources, mainly based on renewable sources, requires new solutions able to deal with these resources' particular characteristics (namely, the intermittent nature of renewable energy sources). The smart grid concept is gaining consensus as the most suitable solution to facilitate small players' participation in electric power negotiations while improving energy efficiency. The opportunity for players to participate in multiple energy negotiation environments (smart grid negotiation in addition to the already implemented market types, such as day-ahead spot markets, balancing markets, intraday negotiations, bilateral contracts, and forward and futures negotiations, among others) requires players to take suitable decisions on whether, and how, to participate in each market type. This paper proposes a portfolio optimization methodology, which provides the best investment profile for a market player, considering different market opportunities. The amount of power that each supported player should negotiate in each available market type in order to maximize its profits is determined by considering the prices that are expected to be achieved in each market, in different contexts. The price forecasts are performed using artificial neural networks, providing a database with the expected prices in the different market types at each time. This database is then used as input by an evolutionary particle swarm optimization process, which generates the most advantageous participation portfolio for the market player. The proposed approach is tested and validated with simulations performed in a multiagent simulator of competitive electricity markets, using real electricity market data from the Iberian market operator MIBEL.
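    To make the allocation step concrete, the sketch below runs a small particle swarm optimiser that splits a player's available power across market types given forecast prices. The concave objective (price minus a quadratic depth/risk penalty), the budget projection, and all numbers are illustrative assumptions, not the paper's model.

    ```python
    # Sketch: a small particle swarm optimiser allocating power across market
    # types given forecast prices. Objective, constraints, and numbers are
    # illustrative assumptions.
    import random

    random.seed(0)

    prices = [45.0, 52.0, 48.0]   # forecast price per MWh in each market type
    risk = [0.8, 1.5, 1.0]        # assumed quadratic depth/risk penalty coefficients
    TOTAL_MW = 30.0               # power available to allocate

    def profit(x):
        return sum(p * xi - r * xi * xi for p, r, xi in zip(prices, risk, x))

    def project(x):
        """Clip to non-negative values and rescale if the budget is exceeded."""
        x = [max(0.0, xi) for xi in x]
        s = sum(x)
        return [xi * TOTAL_MW / s for xi in x] if s > TOTAL_MW else x

    n, particles, iters = len(prices), 20, 200
    pos = [project([random.uniform(0, TOTAL_MW) for _ in range(n)]) for _ in range(particles)]
    vel = [[0.0] * n for _ in range(particles)]
    pbest = [p[:] for p in pos]
    gbest = max(pbest, key=profit)

    for _ in range(iters):
        for i in range(particles):
            for d in range(n):
                vel[i][d] = (0.7 * vel[i][d]
                             + 1.5 * random.random() * (pbest[i][d] - pos[i][d])
                             + 1.5 * random.random() * (gbest[d] - pos[i][d]))
            pos[i] = project([pos[i][d] + vel[i][d] for d in range(n)])
            if profit(pos[i]) > profit(pbest[i]):
                pbest[i] = pos[i][:]
        gbest = max(pbest, key=profit)

    print("allocation (MW):", [round(x, 2) for x in gbest], "profit:", round(profit(gbest), 1))
    ```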

  16. An Autonomic Framework for Integrating Security and Quality of Service Support in Databases

    ERIC Educational Resources Information Center

    Alomari, Firas

    2013-01-01

    The back-end databases of multi-tiered applications are a major data security concern for enterprises. The abundance of these systems and the emergence of new and different threats require multiple and overlapping security mechanisms. Therefore, providing multiple and diverse database intrusion detection and prevention systems (IDPS) is a critical…

  17. cPath: open source software for collecting, storing, and querying biological pathways

    PubMed Central

    Cerami, Ethan G; Bader, Gary D; Gross, Benjamin E; Sander, Chris

    2006-01-01

    Background Biological pathways, including metabolic pathways, protein interaction networks, signal transduction pathways, and gene regulatory networks, are currently represented in over 220 diverse databases. These data are crucial for the study of specific biological processes, including human diseases. Standard exchange formats for pathway information, such as BioPAX, CellML, SBML and PSI-MI, enable convenient collection of this data for biological research, but mechanisms for common storage and communication are required. Results We have developed cPath, an open source database and web application for collecting, storing, and querying biological pathway data. cPath makes it easy to aggregate custom pathway data sets available in standard exchange formats from multiple databases, present pathway data to biologists via a customizable web interface, and export pathway data via a web service to third-party software, such as Cytoscape, for visualization and analysis. cPath is software only, and does not include new pathway information. Key features include: a built-in identifier mapping service for linking identical interactors and linking to external resources; built-in support for PSI-MI and BioPAX standard pathway exchange formats; a web service interface for searching and retrieving pathway data sets; and thorough documentation. The cPath software is freely available under the LGPL open source license for academic and commercial use. Conclusion cPath is a robust, scalable, modular, professional-grade software platform for collecting, storing, and querying biological pathways. It can serve as the core data handling component in information systems for pathway visualization, analysis and modeling. PMID:17101041

  18. Palmprint and face score level fusion: hardware implementation of a contactless small sample biometric system

    NASA Astrophysics Data System (ADS)

    Poinsot, Audrey; Yang, Fan; Brost, Vincent

    2011-02-01

Including multiple sources of information in personal identity recognition and verification gives the opportunity to greatly improve performance. We propose a contactless biometric system that combines two modalities: palmprint and face. Hardware implementations are proposed on Texas Instruments digital signal processor (DSP) and Xilinx field-programmable gate array (FPGA) platforms. The algorithmic chain consists of preprocessing (which includes palm extraction from hand images), Gabor feature extraction, comparison by Hamming distance, and score fusion. Fusion possibilities are discussed and tested first using a bimodal database of 130 subjects that we designed (the uB database), and then two common public biometric databases (AR for face and PolyU for palmprint). High performance has been obtained for recognition and verification purposes: a recognition rate of 97.49% with the AR-PolyU database and an equal error rate of 1.10% on the uB database, using only two training samples per subject. Hardware results demonstrate that preprocessing can easily be performed during the acquisition phase, and multimodal biometric recognition can be treated almost instantly (0.4 ms on FPGA). We show the feasibility of a robust and efficient multimodal hardware biometric system that offers several advantages, such as user-friendliness and flexibility.
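    The sketch below illustrates score-level fusion of the kind described: palmprint and face matcher scores are min-max normalised and combined with a weighted sum before picking the best match. The scores, weights, and conversion from Hamming distance to similarity are illustrative assumptions.

    ```python
    # Sketch: min-max normalise palmprint and face matcher scores and combine
    # them with a weighted sum (score-level fusion). Scores and weights are
    # illustrative assumptions.

    def min_max(scores):
        lo, hi = min(scores), max(scores)
        return [(s - lo) / (hi - lo) for s in scores]

    # Lower Hamming distance means a better match, so convert to similarities first.
    palm_dist = [0.12, 0.31, 0.45, 0.28]   # one probe against four gallery subjects
    face_dist = [0.20, 0.25, 0.50, 0.22]

    palm_sim = min_max([1 - d for d in palm_dist])
    face_sim = min_max([1 - d for d in face_dist])

    W_PALM, W_FACE = 0.6, 0.4              # assumed modality weights
    fused = [W_PALM * p + W_FACE * f for p, f in zip(palm_sim, face_sim)]

    best = max(range(len(fused)), key=fused.__getitem__)
    print("best match: subject", best, "fused score", round(fused[best], 3))
    ```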

  19. Geroprotectors.org: a new, structured and curated database of current therapeutic interventions in aging and age-related disease.

    PubMed

    Moskalev, Alexey; Chernyagina, Elizaveta; de Magalhães, João Pedro; Barardo, Diogo; Thoppil, Harikrishnan; Shaposhnikov, Mikhail; Budovsky, Arie; Fraifeld, Vadim E; Garazha, Andrew; Tsvetkov, Vasily; Bronovitsky, Evgeny; Bogomolov, Vladislav; Scerbacov, Alexei; Kuryan, Oleg; Gurinovich, Roman; Jellen, Leslie C; Kennedy, Brian; Mamoshina, Polina; Dobrovolskaya, Evgeniya; Aliper, Alex; Kaminsky, Dmitry; Zhavoronkov, Alex

    2015-09-01

    As the level of interest in aging research increases, there is a growing number of geroprotectors, or therapeutic interventions that aim to extend the healthy lifespan and repair or reduce aging-related damage in model organisms and, eventually, in humans. There is a clear need for a manually-curated database of geroprotectors to compile and index their effects on aging and age-related diseases and link these effects to relevant studies and multiple biochemical and drug databases. Here, we introduce the first such resource, Geroprotectors (http://geroprotectors.org). Geroprotectors is a public, rapidly explorable database that catalogs over 250 experiments involving over 200 known or candidate geroprotectors that extend lifespan in model organisms. Each compound has a comprehensive profile complete with biochemistry, mechanisms, and lifespan effects in various model organisms, along with information ranging from chemical structure, side effects, and toxicity to FDA drug status. These are presented in a visually intuitive, efficient framework fit for casual browsing or in-depth research alike. Data are linked to the source studies or databases, providing quick and convenient access to original data. The Geroprotectors database facilitates cross-study, cross-organism, and cross-discipline analysis and saves countless hours of inefficient literature and web searching. Geroprotectors is a one-stop, knowledge-sharing, time-saving resource for researchers seeking healthy aging solutions.

  20. Geroprotectors.org: a new, structured and curated database of current therapeutic interventions in aging and age-related disease

    PubMed Central

    Moskalev, Alexey; Chernyagina, Elizaveta; de Magalhães, João Pedro; Barardo, Diogo; Thoppil, Harikrishnan; Shaposhnikov, Mikhail; Budovsky, Arie; Fraifeld, Vadim E.; Garazha, Andrew; Tsvetkov, Vasily; Bronovitsky, Evgeny; Bogomolov, Vladislav; Scerbacov, Alexei; Kuryan, Oleg; Gurinovich, Roman; Jellen, Leslie C.; Kennedy, Brian; Mamoshina, Polina; Dobrovolskaya, Evgeniya; Aliper, Alex; Kaminsky, Dmitry; Zhavoronkov, Alex

    2015-01-01

    As the level of interest in aging research increases, there is a growing number of geroprotectors, or therapeutic interventions that aim to extend the healthy lifespan and repair or reduce aging-related damage in model organisms and, eventually, in humans. There is a clear need for a manually-curated database of geroprotectors to compile and index their effects on aging and age-related diseases and link these effects to relevant studies and multiple biochemical and drug databases. Here, we introduce the first such resource, Geroprotectors (http://geroprotectors.org). Geroprotectors is a public, rapidly explorable database that catalogs over 250 experiments involving over 200 known or candidate geroprotectors that extend lifespan in model organisms. Each compound has a comprehensive profile complete with biochemistry, mechanisms, and lifespan effects in various model organisms, along with information ranging from chemical structure, side effects, and toxicity to FDA drug status. These are presented in a visually intuitive, efficient framework fit for casual browsing or in-depth research alike. Data are linked to the source studies or databases, providing quick and convenient access to original data. The Geroprotectors database facilitates cross-study, cross-organism, and cross-discipline analysis and saves countless hours of inefficient literature and web searching. Geroprotectors is a one-stop, knowledge-sharing, time-saving resource for researchers seeking healthy aging solutions. PMID:26342919

  1. Ethnic disparities among food sources of energy and nutrients of public health concern and nutrients to limit in adults in the United States: NHANES 2003-2006.

    PubMed

    O'Neil, Carol E; Nicklas, Theresa A; Keast, Debra R; Fulgoni, Victor L

    2014-01-01

Identification of current food sources of energy and nutrients among US non-Hispanic whites (NHW), non-Hispanic blacks (NHB), and Mexican American (MA) adults is needed to help with public health efforts in implementing culturally sensitive and feasible dietary recommendations. The objective of this study was to determine the food sources of energy and nutrients to limit [saturated fatty acids (SFA), added sugars, and sodium] and nutrients of public health concern (dietary fiber, vitamin D, calcium, and potassium) by NHW, NHB, and MA adults. This was a cross-sectional analysis of a nationally representative sample of NHW (n=4,811), NHB (n=2,062), and MA (n=1,950) adults 19+ years. The 2003-2006 NHANES 24-h recall (Day 1) dietary intake data were analyzed. An updated USDA Dietary Source Nutrient Database was developed using current food composition databases. Food grouping included ingredients from disaggregated mixtures. Mean energy and nutrient intakes from food sources were sample-weighted. Percentages of total dietary intake contributed from food sources were ranked. Multiple differences in intake among ethnic groups were seen for energy and all nutrients examined. For example, energy intake was higher in MA as compared to NHB; SFA, added sugars, and sodium intakes were higher in NHW than NHB; dietary fiber was highest in MA and lowest in NHB; vitamin D was highest in NHW; calcium was lowest in NHB; and potassium was higher in NHW as compared to NHB. Food sources of these nutrients also varied. Identification of intake of nutrients to limit and of public health concern can help health professionals implement appropriate dietary recommendations and plan interventions that are ethnically appropriate.

  2. MOBBED: a computational data infrastructure for handling large collections of event-rich time series datasets in MATLAB

    PubMed Central

    Cockfield, Jeremy; Su, Kyungmin; Robbins, Kay A.

    2013-01-01

    Experiments to monitor human brain activity during active behavior record a variety of modalities (e.g., EEG, eye tracking, motion capture, respiration monitoring) and capture a complex environmental context leading to large, event-rich time series datasets. The considerable variability of responses within and among subjects in more realistic behavioral scenarios requires experiments to assess many more subjects over longer periods of time. This explosion of data requires better computational infrastructure to more systematically explore and process these collections. MOBBED is a lightweight, easy-to-use, extensible toolkit that allows users to incorporate a computational database into their normal MATLAB workflow. Although capable of storing quite general types of annotated data, MOBBED is particularly oriented to multichannel time series such as EEG that have event streams overlaid with sensor data. MOBBED directly supports access to individual events, data frames, and time-stamped feature vectors, allowing users to ask questions such as what types of events or features co-occur under various experimental conditions. A database provides several advantages not available to users who process one dataset at a time from the local file system. In addition to archiving primary data in a central place to save space and avoid inconsistencies, such a database allows users to manage, search, and retrieve events across multiple datasets without reading the entire dataset. The database also provides infrastructure for handling more complex event patterns that include environmental and contextual conditions. The database can also be used as a cache for expensive intermediate results that are reused in such activities as cross-validation of machine learning algorithms. MOBBED is implemented over PostgreSQL, a widely used open source database, and is freely available under the GNU general public license at http://visual.cs.utsa.edu/mobbed. Source and issue reports for MOBBED are maintained at http://vislab.github.com/MobbedMatlab/ PMID:24124417

  3. MOBBED: a computational data infrastructure for handling large collections of event-rich time series datasets in MATLAB.

    PubMed

    Cockfield, Jeremy; Su, Kyungmin; Robbins, Kay A

    2013-01-01

    Experiments to monitor human brain activity during active behavior record a variety of modalities (e.g., EEG, eye tracking, motion capture, respiration monitoring) and capture a complex environmental context leading to large, event-rich time series datasets. The considerable variability of responses within and among subjects in more realistic behavioral scenarios requires experiments to assess many more subjects over longer periods of time. This explosion of data requires better computational infrastructure to more systematically explore and process these collections. MOBBED is a lightweight, easy-to-use, extensible toolkit that allows users to incorporate a computational database into their normal MATLAB workflow. Although capable of storing quite general types of annotated data, MOBBED is particularly oriented to multichannel time series such as EEG that have event streams overlaid with sensor data. MOBBED directly supports access to individual events, data frames, and time-stamped feature vectors, allowing users to ask questions such as what types of events or features co-occur under various experimental conditions. A database provides several advantages not available to users who process one dataset at a time from the local file system. In addition to archiving primary data in a central place to save space and avoid inconsistencies, such a database allows users to manage, search, and retrieve events across multiple datasets without reading the entire dataset. The database also provides infrastructure for handling more complex event patterns that include environmental and contextual conditions. The database can also be used as a cache for expensive intermediate results that are reused in such activities as cross-validation of machine learning algorithms. MOBBED is implemented over PostgreSQL, a widely used open source database, and is freely available under the GNU general public license at http://visual.cs.utsa.edu/mobbed. Source and issue reports for MOBBED are maintained at http://vislab.github.com/MobbedMatlab/

  4. Construction and validation of a population-based bone densitometry database.

    PubMed

    Leslie, William D; Caetano, Patricia A; Macwilliam, Leonard R; Finlayson, Gregory S

    2005-01-01

    Utilization of dual-energy X-ray absorptiometry (DXA) for the initial diagnostic assessment of osteoporosis and in monitoring treatment has risen dramatically in recent years. Population-based studies of the impact of DXA and osteoporosis remain challenging because of incomplete and fragmented test data that exist in most regions. Our aim was to create and assess completeness of a database of all clinical DXA services and test results for the province of Manitoba, Canada and to present descriptive data resulting from testing. A regionally based bone density program for the province of Manitoba, Canada was established in 1997. Subsequent DXA services were prospectively captured in a program database. This database was retrospectively populated with earlier DXA results dating back to 1990 (the year that the first DXA scanner was installed) by integrating multiple data sources. A random chart audit was performed to assess completeness and accuracy of this dataset. For comparison, testing rates determined from the DXA database were compared with physician administrative claims data. There was a high level of completeness of this database (>99%) and accurate personal identifier information sufficient for linkage with other health care administrative data (>99%). This contrasted with physician billing data that were found to be markedly incomplete. Descriptive data provide a profile of individuals receiving DXA and their test results. In conclusion, the Manitoba bone density database has great potential as a resource for clinical and health policy research because it is population based with a high level of completeness and accuracy.

  5. Clinical results of HIS, RIS, PACS integration using data integration CASE tools

    NASA Astrophysics Data System (ADS)

    Taira, Ricky K.; Chan, Hing-Ming; Breant, Claudine M.; Huang, Lu J.; Valentino, Daniel J.

    1995-05-01

    Current infrastructure research in PACS is dominated by the development of communication networks (local area networks, teleradiology, ATM networks, etc.), multimedia display workstations, and hierarchical image storage architectures. However, limited work has been performed on developing flexible, expansible, and intelligent information processing architectures for the vast decentralized image and text data repositories prevalent in healthcare environments. Patient information is often distributed among multiple data management systems. Current large-scale efforts to integrate medical information and knowledge sources have been costly with limited retrieval functionality. Software integration strategies to unify distributed data and knowledge sources are still lacking commercially. Systems heterogeneity (i.e., differences in hardware platforms, communication protocols, database management software, nomenclature, etc.) is at the heart of the problem and is unlikely to be standardized in the near future. In this paper, we demonstrate the use of newly available CASE (computer-aided software engineering) tools to rapidly integrate HIS, RIS, and PACS information systems. The advantages of these tools include fast development time (low-level code is generated from graphical specifications), and easy system maintenance (excellent documentation, easy to perform changes, and centralized code repository in an object-oriented database). The CASE tools are used to develop and manage the 'middleware' in our client-mediator-server architecture for systems integration. Our architecture is scalable and can accommodate heterogeneous databases and communication protocols.

  6. MetaMeta: integrating metagenome analysis tools to improve taxonomic profiling.

    PubMed

    Piro, Vitor C; Matschkowski, Marcel; Renard, Bernhard Y

    2017-08-14

    Many metagenome analysis tools are presently available to classify sequences and profile environmental samples. In particular, taxonomic profiling and binning methods are commonly used for such tasks. Tools available among these two categories make use of several techniques, e.g., read mapping, k-mer alignment, and composition analysis. Variations on the construction of the corresponding reference sequence databases are also common. In addition, different tools provide good results in different datasets and configurations. All this variation creates a complicated scenario for researchers when deciding which methods to use. Installation, configuration and execution can also be difficult, especially when dealing with multiple datasets and tools. We propose MetaMeta: a pipeline to execute and integrate results from metagenome analysis tools. MetaMeta provides an easy workflow to run multiple tools with multiple samples, producing a single enhanced output profile for each sample. MetaMeta includes database generation, pre-processing, execution, and integration steps, allowing easy execution and parallelization. The integration relies on the co-occurrence of organisms from different methods as the main feature to improve community profiling while accounting for differences in their databases. In a controlled case with simulated and real data, we show that the integrated profiles of MetaMeta outperform the best single profile. Using the same input data, it provides more sensitive and reliable results with the presence of each organism being supported by several methods. MetaMeta uses Snakemake and has six pre-configured tools, all available through the BioConda channel for easy installation (conda install -c bioconda metameta). The MetaMeta pipeline is open-source and can be downloaded at: https://gitlab.com/rki_bioinformatics .
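
    As a rough illustration of the co-occurrence idea described above (and not MetaMeta's actual scoring), the sketch below merges per-tool taxonomic profiles by keeping only taxa reported by a minimum number of tools and averaging their relative abundances; the taxa and numbers are invented.

    ```python
    # Illustrative merge of per-tool taxonomic profiles (not MetaMeta's code):
    # keep taxa reported by at least `min_tools` tools, average their relative
    # abundances, and re-normalize so the merged profile sums to one.
    from collections import defaultdict

    def merge_profiles(profiles, min_tools=2):
        """profiles: list of {taxon: relative_abundance} dicts, one per tool."""
        support = defaultdict(list)
        for profile in profiles:
            for taxon, abundance in profile.items():
                support[taxon].append(abundance)
        merged = {t: sum(a) / len(a) for t, a in support.items() if len(a) >= min_tools}
        total = sum(merged.values()) or 1.0
        return {t: a / total for t, a in merged.items()}

    tool_a = {"E. coli": 0.6, "B. subtilis": 0.4}
    tool_b = {"E. coli": 0.5, "S. aureus": 0.5}
    tool_c = {"E. coli": 0.7, "B. subtilis": 0.3}
    print(merge_profiles([tool_a, tool_b, tool_c]))
    ```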

  7. Emission Database for Global Atmospheric Research (EDGAR).

    ERIC Educational Resources Information Center

    Olivier, J. G. J.; And Others

    1994-01-01

    Presents the objective and methodology chosen for the construction of a global emissions source database called EDGAR and the structural design of the database system. The database estimates on a regional and grid basis, 1990 annual emissions of greenhouse gases, and of ozone depleting compounds from all known sources. (LZ)

  8. Exploring consumer pathways and patterns of use for ...

    EPA Pesticide Factsheets

    Background: Humans may be exposed to thousands of chemicals through contact in the workplace, home, and via air, water, food, and soil. A major challenge is estimating exposures to these chemicals, which requires understanding potential exposure routes directly related to how chemicals are used. Objectives: We aimed to assign “use categories” to a database of chemicals, including ingredients in consumer products, to help prioritize which chemicals will be given more scrutiny relative to human exposure potential and target populations. The goal was to identify (a) human activities that result in increased chemical exposure while (b) simplifying the dimensionality of hazard assessment for risk characterization. Methods: Major data sources on consumer- and industrial-process based chemical uses were compiled from multiple countries, including from regulatory agencies, manufacturers, and retailers. The resulting categorical chemical use and functional information are presented through the Chemical/Product Categories Database (CPCat). Results: CPCat contains information on 43,596 unique chemicals mapped to 833 terms categorizing their usage or function. Examples presented demonstrate potential applications of the CPCat database, including the identification of chemicals to which children may be exposed (including those that are not identified on product ingredient labels), and prioritization of chemicals for toxicity screening. The CPCat database is availabl

  9. Gene: a gene-centered information resource at NCBI.

    PubMed

    Brown, Garth R; Hem, Vichet; Katz, Kenneth S; Ovetsky, Michael; Wallin, Craig; Ermolaeva, Olga; Tolstoy, Igor; Tatusova, Tatiana; Pruitt, Kim D; Maglott, Donna R; Murphy, Terence D

    2015-01-01

    The National Center for Biotechnology Information's (NCBI) Gene database (www.ncbi.nlm.nih.gov/gene) integrates gene-specific information from multiple data sources. NCBI Reference Sequence (RefSeq) genomes for viruses, prokaryotes and eukaryotes are the primary foundation for Gene records in that they form the critical association between sequence and a tracked gene upon which additional functional and descriptive content is anchored. Additional content is integrated based on the genomic location and RefSeq transcript and protein sequence data. The content of a Gene record represents the integration of curation and automated processing from RefSeq, collaborating model organism databases, consortia such as Gene Ontology, and other databases within NCBI. Records in Gene are assigned unique, tracked integers as identifiers. The content (citations, nomenclature, genomic location, gene products and their attributes, phenotypes, sequences, interactions, variation details, maps, expression, homologs, protein domains and external databases) is available via interactive browsing through NCBI's Entrez system, via NCBI's Entrez programming utilities (E-Utilities and Entrez Direct) and for bulk transfer by FTP. Published by Oxford University Press on behalf of Nucleic Acids Research 2014. This work is written by (a) US Government employee(s) and is in the public domain in the US.
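
    Since the record notes that Gene content is available via NCBI's Entrez programming utilities, here is a minimal sketch of pulling a record summary through the public esummary endpoint. The example gene ID (7157, human TP53) and the JSON fields read from the response are illustrative; consult the E-utilities documentation for the full response schema.

    ```python
    # Minimal sketch: fetch a Gene record summary via NCBI E-utilities
    # (esummary). The gene ID and the fields accessed below are examples.
    import json
    import urllib.request

    def gene_summary(gene_id):
        url = ("https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esummary.fcgi"
               f"?db=gene&id={gene_id}&retmode=json")
        with urllib.request.urlopen(url) as resp:
            data = json.load(resp)
        return data["result"][str(gene_id)]

    summary = gene_summary(7157)  # 7157 = human TP53
    print(summary.get("name"), "-", summary.get("description"))
    ```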

  10. StarView: The object oriented design of the ST DADS user interface

    NASA Technical Reports Server (NTRS)

    Williams, J. D.; Pollizzi, J. A.

    1992-01-01

    StarView is the user interface being developed for the Hubble Space Telescope Data Archive and Distribution Service (ST DADS). ST DADS is the data archive for HST observations and a relational database catalog describing the archived data. Users will use StarView to query the catalog and select appropriate datasets for study. StarView sends requests for archived datasets to ST DADS, which processes the requests and returns the requested datasets to the user. StarView is designed to be a powerful and extensible user interface. Unique features include an internal relational database to navigate query results, a form definition language that will work with both CRT and X interfaces, a data definition language that will allow StarView to work with any relational database, and the ability to generate ad hoc queries without requiring the user to understand the structure of the ST DADS catalog. Ultimately, StarView will allow the user to refine queries in the local database for improved performance and merge in data from external sources for correlation with other query results. The user will be able to create a query from single or multiple forms, merging the selected attributes into a single query. Arbitrary selection of attributes for querying is supported. The user will be able to select how query results are viewed. A standard form or table-row format may be used. Navigation capabilities are provided to aid the user in viewing query results. Object oriented analysis and design techniques were used in the design of StarView to support the mechanisms and concepts required to implement these features. One such mechanism is the Model-View-Controller (MVC) paradigm. The MVC allows the user to have multiple views of the underlying database, while providing a consistent mechanism for interaction regardless of the view. This approach supports both CRT and X interfaces while providing a common mode of user interaction. Another powerful abstraction is the concept of a Query Model. This concept allows a single query to be built from a single form or multiple forms before it is submitted to ST DADS. Supporting this concept is the ad hoc query generator, which allows the user to select and qualify an indeterminate number of attributes from the database. The user does not need any knowledge of how the joins across various tables are to be resolved. The ad hoc generator calculates the joins automatically and generates the correct SQL query.

  11. BIG: a large-scale data integration tool for renal physiology.

    PubMed

    Zhao, Yue; Yang, Chin-Rang; Raghuram, Viswanathan; Parulekar, Jaya; Knepper, Mark A

    2016-10-01

    Due to recent advances in high-throughput techniques, we and others have generated multiple proteomic and transcriptomic databases to describe and quantify gene expression, protein abundance, or cellular signaling on the scale of the whole genome/proteome in kidney cells. The existence of so much data from diverse sources raises the following question: "How can researchers find information efficiently for a given gene product over all of these data sets without searching each data set individually?" This is the type of problem that has motivated the "Big-Data" revolution in Data Science, which has driven progress in fields such as marketing. Here we present an online Big-Data tool called BIG (Biological Information Gatherer) that allows users to submit a single online query to obtain all relevant information from all indexed databases. BIG is accessible at http://big.nhlbi.nih.gov/.

  12. Structator: fast index-based search for RNA sequence-structure patterns

    PubMed Central

    2011-01-01

    Background The secondary structure of RNA molecules is intimately related to their function and often more conserved than the sequence. Hence, the important task of searching databases for RNAs requires matching sequence-structure patterns. Unfortunately, current tools for this task have, in the best case, a running time that is only linear in the size of sequence databases. Furthermore, established index data structures for fast sequence matching, like suffix trees or arrays, cannot benefit from the complementarity constraints introduced by the secondary structure of RNAs. Results We present a novel method and readily applicable software for time efficient matching of RNA sequence-structure patterns in sequence databases. Our approach is based on affix arrays, a recently introduced index data structure, preprocessed from the target database. Affix arrays support bidirectional pattern search, which is required for efficiently handling the structural constraints of the pattern. Structural patterns like stem-loops can be matched inside out, such that the loop region is matched first and then the pairing bases on the boundaries are matched consecutively. This makes it possible to exploit base pairing information for search space reduction and leads to an expected running time that is sublinear in the size of the sequence database. The incorporation of a new chaining approach in the search of RNA sequence-structure patterns enables the description of molecules folding into complex secondary structures with multiple ordered patterns. The chaining approach removes spurious matches from the set of intermediate results, in particular of patterns with little specificity. In benchmark experiments on the Rfam database, our method runs up to two orders of magnitude faster than previous methods. Conclusions The presented method's sublinear expected running time makes it well suited for RNA sequence-structure pattern matching in large sequence databases. RNA molecules containing several stem-loop substructures can be described by multiple sequence-structure patterns and their matches are efficiently handled by a novel chaining method. Beyond our algorithmic contributions, we provide with Structator a complete and robust open-source software solution for index-based search of RNA sequence-structure patterns. The Structator software is available at http://www.zbh.uni-hamburg.de/Structator. PMID:21619640
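
    The inside-out matching of a stem-loop described above can be pictured with a toy example: locate the loop sequence first, then extend outward for as long as the flanking bases can pair. The sketch below does this on a plain string, without the affix-array index that gives Structator its speed; sequences and thresholds are arbitrary.

    ```python
    # Toy inside-out stem-loop matching (no affix-array index): find the loop
    # sequence, then extend outward while the flanking bases form canonical
    # (Watson-Crick or G-U) pairs, keeping hits with at least `min_stem` pairs.
    PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}

    def find_stem_loops(seq, loop, min_stem=3):
        hits = []
        start = seq.find(loop)
        while start != -1:
            left, right, stem = start - 1, start + len(loop), 0
            while left >= 0 and right < len(seq) and (seq[left], seq[right]) in PAIRS:
                left, right, stem = left - 1, right + 1, stem + 1
            if stem >= min_stem:
                hits.append((left + 1, right, stem))  # (start, end_exclusive, stem_pairs)
            start = seq.find(loop, start + 1)
        return hits

    print(find_stem_loops("AAGCGAUUCGUCGCAA", "UUCG"))  # -> [(2, 14, 4)]
    ```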

  13. Indexing and retrieval of multimedia objects at different levels of granularity

    NASA Astrophysics Data System (ADS)

    Faudemay, Pascal; Durand, Gwenael; Seyrat, Claude; Tondre, Nicolas

    1998-10-01

    Intelligent access to multimedia databases for the 'naive user' should probably be based on query formulation by 'intelligent agents'. These agents should 'understand' the semantics of the contents, learn user preferences and deliver to the user a subset of the source contents, for further navigation. The goal of such systems should be to enable 'zero-command' access to the contents, while keeping the user's freedom of choice. Such systems should interpret multimedia contents in terms of multiple audiovisual objects (from video to visual or audio objects), and in terms of actions and scenarios.

  14. CycADS: an annotation database system to ease the development and update of BioCyc databases

    PubMed Central

    Vellozo, Augusto F.; Véron, Amélie S.; Baa-Puyoulet, Patrice; Huerta-Cepas, Jaime; Cottret, Ludovic; Febvay, Gérard; Calevro, Federica; Rahbé, Yvan; Douglas, Angela E.; Gabaldón, Toni; Sagot, Marie-France; Charles, Hubert; Colella, Stefano

    2011-01-01

    In recent years, genomes from an increasing number of organisms have been sequenced, but their annotation remains a time-consuming process. The BioCyc databases offer a framework for the integrated analysis of metabolic networks. The Pathway Tools software suite allows the automated construction of a database starting from an annotated genome, but it requires prior integration of all annotations into a specific summary file or into a GenBank file. To allow the easy creation and update of a BioCyc database starting from the multiple genome annotation resources available over time, we have developed an ad hoc data management system that we called Cyc Annotation Database System (CycADS). CycADS is centred on a specific database model and on a set of Java programs to import, filter and export relevant information. Data from GenBank and other annotation sources (including for example: KAAS, PRIAM, Blast2GO and PhylomeDB) are collected into a database to be subsequently filtered and extracted to generate a complete annotation file. This file is then used to build an enriched BioCyc database using the PathoLogic program of Pathway Tools. The CycADS pipeline for annotation management was used to build the AcypiCyc database for the pea aphid (Acyrthosiphon pisum), whose genome was recently sequenced. The AcypiCyc database webpage also includes, for comparative analyses, two other metabolic reconstruction BioCyc databases generated using CycADS: TricaCyc for Tribolium castaneum and DromeCyc for Drosophila melanogaster. Owing to its flexible design, CycADS offers a powerful software tool for the generation and regular updating of enriched BioCyc databases. The CycADS system is particularly suited for metabolic gene annotation and network reconstruction in newly sequenced genomes. Because of the uniform annotation used for metabolic network reconstruction, CycADS is particularly useful for comparative analysis of the metabolism of different organisms. Database URL: http://www.cycadsys.org PMID:21474551

  15. The Natural History of Class I Primate Alcohol Dehydrogenases Includes Gene Duplication, Gene Loss, and Gene Conversion

    PubMed Central

    Carrigan, Matthew A.; Uryasev, Oleg; Davis, Ross P.; Zhai, LanMin; Hurley, Thomas D.; Benner, Steven A.

    2012-01-01

    Background Gene duplication is a source of molecular innovation throughout evolution. However, even with massive amounts of genome sequence data, correlating gene duplication with speciation and other events in natural history can be difficult. This is especially true in its most interesting cases, where rapid and multiple duplications are likely to reflect adaptation to rapidly changing environments and lifestyles. This may be so for Class I alcohol dehydrogenases (ADH1s), where multiple duplications occurred in primate lineages in Old and New World monkeys (OWMs and NWMs) and hominoids. Methodology/Principal Findings To build a preferred model for the natural history of ADH1s, we determined the sequences of nine new ADH1 genes, finding for the first time multiple paralogs in various prosimians (lemurs, strepsirhines). Database mining then identified novel ADH1 paralogs in both macaque (an OWM) and marmoset (a NWM). These were used with the previously identified human paralogs to resolve controversies relating to dates of duplication and gene conversion in the ADH1 family. Central to these controversies are differences in the topologies of trees generated from exonic (coding) sequences and intronic sequences. Conclusions/Significance We provide evidence that gene conversions are the primary source of difference, using molecular clock dating of duplications and analyses of microinsertions and deletions (micro-indels). The tree topology inferred from intron sequences appears to more correctly represent the natural history of ADH1s, with the ADH1 paralogs in platyrrhines (NWMs) and catarrhines (OWMs and hominoids) having arisen by duplications shortly predating the divergence of OWMs and NWMs. We also conclude that paralogs in lemurs arose independently. Finally, we identify errors in database interpretation as the source of controversies concerning gene conversion. These analyses provide a model for the natural history of ADH1s that posits four ADH1 paralogs in the ancestor of Catarrhine and Platyrrhine primates, followed by the loss of an ADH1 paralog in the human lineage. PMID:22859968

  16. Establishing a Dynamic Database of Blue and Fin Whale Locations from Recordings at the IMS CTBTO hydro-acoustic network. The Baleakanta Project

    NASA Astrophysics Data System (ADS)

    Le Bras, R. J.; Kuzma, H.

    2013-12-01

    Falling as they do into the frequency range of continuously recording hydrophones (15-100Hz), blue and fin whale songs are a significant source of noise on the hydro-acoustic monitoring array of the International Monitoring System (IMS) of the Comprehensive Nuclear Test Ban Treaty Organization (CTBTO). One researcher's noise, however, can be a very interesting signal in another field of study. The aim of the Baleakanta Project (www.baleakanta.org) is to flag and catalogue these songs, using the azimuth and slowness of the signal measured at multiple hydrophones to solve for the approximate location of singing whales. Applying techniques borrowed from human speaker identification, it may even be possible to recognize the songs of particular individuals. The result will be a dynamic database of whale locations and songs with known individuals noted. This database will be of great value to marine biologists studying cetaceans, as there is no existing dataset which spans the globe over many years (more than 15 years of data have been collected by the IMS). Current whale song datasets from other sources are limited to detections made on small, temporary listening devices. The IMS song catalogue will make it possible to study at least some aspects of the global migration patterns of whales, changes in their songs over time, and the habits of individuals. It is believed that about 10 blue whale 'cultures' exist with distinct vocal patterns; the IMS song catalogue will test that number. Results and a subset of the database (delayed in time to mitigate worries over whaling and harassment of the animals) will be released over the web. A traveling museum exhibit is planned which will not only educate the public about whale songs, but will also make the CTBTO and its achievements more widely known. As a testament to the public's enduring fascination with whales, initial funding for this project has been crowd-sourced through an internet campaign.

  17. Online Sources of Competitive Intelligence.

    ERIC Educational Resources Information Center

    Wagers, Robert

    1986-01-01

    Presents an approach to using online sources of information for competitor intelligence (i.e., monitoring industry and tracking activities of competitors); identifies principal sources; and suggests some ways of making use of online databases. Types and sources of information and sources and database charts are appended. Eight references are…

  18. NHEXAS PHASE I ARIZONA STUDY--STANDARD OPERATING PROCEDURE FOR DATABASE TREE AND DATA SOURCES (UA-D-41.0)

    EPA Science Inventory

    The purpose of this SOP is to describe the database storage organization, as well as describe the sources of data for each database used during the Arizona NHEXAS project and the "Border" study. Keywords: data; database; organization.

    The National Human Exposure Assessment Sur...

  19. The burden of clostridium difficile infection: estimates of the incidence of CDI from U.S. Administrative databases.

    PubMed

    Olsen, Margaret A; Young-Xu, Yinong; Stwalley, Dustin; Kelly, Ciarán P; Gerding, Dale N; Saeed, Mohammed J; Mahé, Cedric; Dubberke, Erik R

    2016-04-22

    Many administrative data sources are available to study the epidemiology of infectious diseases, including Clostridium difficile infection (CDI), but few publications have compared CDI event rates across databases using similar methodology. We used comparable methods with multiple administrative databases to compare the incidence of CDI in older and younger persons in the United States. We performed a retrospective study using three longitudinal data sources (Medicare, OptumInsight LabRx, and Healthcare Cost and Utilization Project State Inpatient Database (SID)), and two hospital encounter-level data sources (Nationwide Inpatient Sample (NIS) and Premier Perspective database) to identify CDI in adults aged 18 and older with calculation of CDI incidence rates/100,000 person-years of observation (pyo) and CDI categorization (onset and association). The incidence of CDI ranged from 66/100,000 in persons under 65 years (LabRx), 383/100,000 in elderly persons (SID), and 677/100,000 in elderly persons (Medicare). Ninety percent of CDI episodes in the LabRx population were characterized as community-onset compared to 41 % in the Medicare population. The majority of CDI episodes in the Medicare and LabRx databases were identified based on only a CDI diagnosis, whereas almost ¾ of encounters coded for CDI in the Premier hospital data were confirmed with a positive test result plus treatment with metronidazole or oral vancomycin. Using only the Medicare inpatient data to calculate encounter-level CDI events resulted in 553 CDI events/100,000 persons, virtually the same as the encounter proportion calculated using the NIS (544/100,000 persons). We found that the incidence of CDI was 35 % higher in the Medicare data and fewer episodes were attributed to hospital acquisition when all medical claims were used to identify CDI, compared to only inpatient data lacking information on diagnosis and treatment in the outpatient setting. The incidence of CDI was 10-fold lower and the proportion of community-onset CDI was much higher in the privately insured younger LabRx population compared to the elderly Medicare population. The methods we developed to identify incident CDI can be used by other investigators to study the incidence of other infectious diseases and adverse events using large generalizable administrative datasets.

  20. Evaluating Open-Source Full-Text Search Engines for Matching ICD-10 Codes.

    PubMed

    Jurcău, Daniel-Alexandru; Stoicu-Tivadar, Vasile

    2016-01-01

    This research presents the results of evaluating multiple free, open-source engines on matching ICD-10 diagnostic codes via full-text searches. The study investigates what it takes to get an accurate match when searching for a specific diagnostic code. For each code, the evaluation starts by extracting the words that make up its text and continues with building full-text search queries from the combinations of these words. The queries are then run against all the ICD-10 codes until the code in question is returned as the match with the highest relative score. This method identifies the minimum number of words that must be provided in order for the search engines to choose the desired entry. The engines analyzed include a popular Java-based full-text search engine, a lightweight engine written in JavaScript which can even execute in the user's browser, and two popular open-source relational database management systems.
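
    A toy version of this evaluation strategy, with a simple token-overlap scorer standing in for a real full-text engine, might look like the sketch below: enumerate increasingly large word combinations drawn from a code's description and report the smallest combination that ranks the target code first. The code list is a made-up fragment, not ICD-10.

    ```python
    # Toy re-creation of the evaluation idea: find the smallest set of words
    # from a code's description that ranks that code first under a simple
    # token-overlap score (a stand-in for a real full-text search engine).
    from itertools import combinations

    CODES = {
        "J45": "asthma",
        "J45.0": "predominantly allergic asthma",
        "J45.1": "nonallergic asthma",
        "E10": "type 1 diabetes mellitus",
    }

    def score(query_words, text):
        words = text.split()
        return sum(w in words for w in query_words) / len(words)

    def minimal_query(target):
        words = CODES[target].split()
        for k in range(1, len(words) + 1):
            for combo in combinations(words, k):
                best = max(CODES, key=lambda c: score(combo, CODES[c]))
                if best == target:
                    return combo
        return tuple(words)

    print(minimal_query("J45.1"))  # -> ('nonallergic',)
    print(minimal_query("J45.0"))  # -> ('predominantly',)
    ```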

  1. Integration and Beyond

    PubMed Central

    Stead, William W.; Miller, Randolph A.; Musen, Mark A.; Hersh, William R.

    2000-01-01

    The vision of integrating information—from a variety of sources, into the way people work, to improve decisions and process—is one of the cornerstones of biomedical informatics. Thoughts on how this vision might be realized have evolved as improvements in information and communication technologies, together with discoveries in biomedical informatics, have changed the art of the possible. This review identified three distinct generations of “integration” projects. First-generation projects create a database and use it for multiple purposes. Second-generation projects integrate by bringing information from various sources together through enterprise information architecture. Third-generation projects inter-relate disparate but accessible information sources to provide the appearance of integration. The review suggests that the ideas developed in the earlier generations have not been supplanted by ideas from subsequent generations. Instead, the ideas represent a continuum of progress along the three dimensions of workflow, structure, and extraction. PMID:10730596

  2. Circum-Arctic petroleum systems identified using decision-tree chemometrics

    USGS Publications Warehouse

    Peters, K.E.; Ramos, L.S.; Zumberge, J.E.; Valin, Z.C.; Scotese, C.R.; Gautier, D.L.

    2007-01-01

    Source- and age-related biomarker and isotopic data were measured for more than 1000 crude oil samples from wells and seeps collected above approximately 55°N latitude. A unique, multitiered chemometric (multivariate statistical) decision tree was created that allowed automated classification of 31 genetically distinct circum-Arctic oil families based on a training set of 622 oil samples. The method, which we call decision-tree chemometrics, uses principal components analysis and multiple tiers of K-nearest neighbor and SIMCA (soft independent modeling of class analogy) models to classify and assign confidence limits for newly acquired oil samples and source rock extracts. Geochemical data for each oil sample were also used to infer the age, lithology, organic matter input, depositional environment, and identity of its source rock. These results demonstrate the value of large petroleum databases where all samples were analyzed using the same procedures and instrumentation. Copyright © 2007. The American Association of Petroleum Geologists. All rights reserved.
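
    The authors' multitiered decision tree is specific to their biomarker dataset, but the basic building block of one tier, principal components analysis followed by a K-nearest-neighbor classifier, can be sketched with scikit-learn. The data below are synthetic stand-ins, not oil geochemistry.

    ```python
    # Minimal single-tier sketch on synthetic data (not the authors' model):
    # reduce feature dimensionality with PCA, then assign oil-family labels
    # with a K-nearest-neighbor classifier.
    import numpy as np
    from sklearn.decomposition import PCA
    from sklearn.neighbors import KNeighborsClassifier
    from sklearn.pipeline import make_pipeline

    rng = np.random.default_rng(0)
    n_families, per_family, n_features = 4, 50, 30
    centers = rng.normal(size=(n_families, n_features))
    X = np.vstack([c + 0.3 * rng.normal(size=(per_family, n_features)) for c in centers])
    y = np.repeat(np.arange(n_families), per_family)

    model = make_pipeline(PCA(n_components=5), KNeighborsClassifier(n_neighbors=5))
    model.fit(X, y)
    new_sample = centers[2] + 0.3 * rng.normal(size=n_features)
    print("assigned family:", model.predict([new_sample])[0])
    ```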

  3. The STEP (Safety and Toxicity of Excipients for Paediatrics) database: part 2 - the pilot version.

    PubMed

    Salunke, Smita; Brandys, Barbara; Giacoia, George; Tuleu, Catherine

    2013-11-30

    The screening and careful selection of excipients is a critical step in paediatric formulation development, as certain excipients acceptable in adult formulations may not be appropriate for paediatric use. While there is extensive toxicity data that could help in better understanding and highlighting the gaps in toxicity studies, the data are often scattered across information sources and saddled with incompatible data types and formats. This paper is the second in a series that presents an update on the Safety and Toxicity of Excipients for Paediatrics ("STEP") database being developed by the EU-US PFIs, and describes the architecture, data fields and functions of the database. The STEP database is a user-designed resource that compiles the safety and toxicity data of excipients scattered over various sources and presents them in one freely accessible source. The pilot database currently contains data from over 2000 references covering 10 excipients, presenting preclinical, clinical and regulatory information and toxicological reviews, with references and source links. The STEP database allows searching "FOR" excipients and "BY" excipients. This dual nature of the STEP database, in which toxicity and safety information can be searched in both directions, makes it unique from existing sources. If the pilot is successful, the aim is to increase the number of excipients in the existing database so that a database large enough to be of practical research use will be available. It is anticipated that this source will prove to be a useful platform for data management and data exchange of excipient safety information. Copyright © 2013 Elsevier B.V. All rights reserved.

  4. A precipitation database of station-based daily and monthly measurements for West Africa: Overview, quality control and harmonization

    NASA Astrophysics Data System (ADS)

    Bliefernicht, Jan; Waongo, Moussa; Annor, Thompson; Laux, Patrick; Lorenz, Manuel; Salack, Seyni; Kunstmann, Harald

    2017-04-01

    West Africa is a data sparse region. High quality and long-term precipitation data are often not readily available for applications in hydrology, agriculture, meteorology and other needs. To close this gap, we use multiple data sources to develop a precipitation database with long-term daily and monthly time series. This database was compiled from 16 archives including global databases e.g. from the Global Historical Climatology Network (GHCN), databases from research projects (e.g. the AMMA database) and databases of the national meteorological services of some West African countries. The collection consists of more than 2000 precipitation gauges with measurements dating from 1850 to 2015. Due to erroneous measurements (e.g. temporal offsets, unit conversion errors), missing values and inconsistent meta-data, the merging of this precipitation dataset is not straightforward and requires a thorough quality control and harmonization. To this end, we developed geostatistical-based algorithms for quality control of individual databases and harmonization to a joint database. The algorithms are based on a pairwise comparison of the correspondence of precipitation time series as a function of the distance between stations. They were tested for precipitation time series from gauges located in a rectangular domain covering Burkina Faso, Ghana, Benin and Togo. This harmonized and quality-controlled precipitation database was recently used for several applications such as the validation of a high resolution regional climate model and the bias correction of precipitation projections provided by the Coordinated Regional Climate Downscaling Experiment (CORDEX). In this presentation, we will give an overview of the novel daily and monthly precipitation database and the algorithms used for quality control and harmonization. We will also highlight the quality of global and regional archives (e.g. GHCN, GSOD, AMMA database) in comparison to the precipitation databases provided by the national meteorological services.
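
    A much-simplified version of the pairwise screening idea (not the authors' geostatistical algorithm) is sketched below: a station is flagged if its series correlates poorly, on average, with neighboring stations within a chosen distance. Station names, coordinates and thresholds are invented for the example.

    ```python
    # Simplified pairwise screening sketch: flag a station whose series
    # correlates poorly with its neighbors (here, stations within `max_km`).
    import numpy as np

    def flag_suspect_stations(series, coords, max_km=100.0, min_corr=0.5):
        """series: dict station -> 1D array; coords: dict station -> (x_km, y_km)."""
        suspects = []
        for s, values in series.items():
            neighbor_corrs = []
            for t, other in series.items():
                if t == s:
                    continue
                dist = np.hypot(*(np.array(coords[s]) - np.array(coords[t])))
                if dist <= max_km:
                    neighbor_corrs.append(np.corrcoef(values, other)[0, 1])
            if neighbor_corrs and np.mean(neighbor_corrs) < min_corr:
                suspects.append(s)
        return suspects

    rng = np.random.default_rng(1)
    signal = rng.gamma(2.0, 20.0, size=120)              # shared regional rainfall signal
    series = {"A": signal + rng.normal(0, 5, 120),
              "B": signal + rng.normal(0, 5, 120),
              "C": rng.gamma(2.0, 20.0, size=120),       # unrelated series, e.g. a bad record
              "D": signal + rng.normal(0, 5, 120)}
    coords = {"A": (0, 0), "B": (20, 10), "C": (15, 5), "D": (5, 20)}
    print(flag_suspect_stations(series, coords))          # expected: ['C']
    ```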

  5. An affinity-structure database of helix-turn-helix: DNA complexes with a universal coordinate system.

    PubMed

    AlQuraishi, Mohammed; Tang, Shengdong; Xia, Xide

    2015-11-19

    Molecular interactions between proteins and DNA molecules underlie many cellular processes, including transcriptional regulation, chromosome replication, and nucleosome positioning. Computational analyses of protein-DNA interactions rely on experimental data characterizing known protein-DNA interactions structurally and biochemically. While many databases exist that contain either structural or biochemical data, few integrate these two data sources in a unified fashion. Such integration is becoming increasingly critical with the rapid growth of structural and biochemical data, and the emergence of algorithms that rely on the synthesis of multiple data types to derive computational models of molecular interactions. We have developed an integrated affinity-structure database in which the experimental and quantitative DNA binding affinities of helix-turn-helix proteins are mapped onto the crystal structures of the corresponding protein-DNA complexes. This database provides access to: (i) protein-DNA structures, (ii) quantitative summaries of protein-DNA binding affinities using position weight matrices, and (iii) raw experimental data of protein-DNA binding instances. Critically, this database establishes a correspondence between experimental structural data and quantitative binding affinity data at the single basepair level. Furthermore, we present a novel alignment algorithm that structurally aligns the protein-DNA complexes in the database and creates a unified residue-level coordinate system for comparing the physico-chemical environments at the interface between complexes. Using this unified coordinate system, we compute the statistics of atomic interactions at the protein-DNA interface of helix-turn-helix proteins. We provide an interactive website for visualization, querying, and analyzing this database, and a downloadable version to facilitate programmatic analysis. This database will facilitate the analysis of protein-DNA interactions and the development of programmatic computational methods that capitalize on integration of structural and biochemical datasets. The database can be accessed at http://ProteinDNA.hms.harvard.edu.
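
    The record describes position weight matrices as the quantitative summary of binding affinity. As a generic, database-independent illustration, the sketch below builds a PWM from a few aligned binding sites and scores candidate sequences as log-odds against a uniform background; the example sites are arbitrary.

    ```python
    # Generic position weight matrix (PWM) sketch: estimate per-position base
    # frequencies from aligned sites, convert to log-odds against a uniform
    # background, and score candidate sequences.
    import math

    BASES = "ACGT"

    def build_pwm(sites, pseudocount=0.5):
        pwm = []
        for i in range(len(sites[0])):
            counts = {b: pseudocount for b in BASES}
            for site in sites:
                counts[site[i]] += 1
            total = sum(counts.values())
            pwm.append({b: math.log2((counts[b] / total) / 0.25) for b in BASES})
        return pwm

    def score(pwm, seq):
        return sum(col[base] for col, base in zip(pwm, seq))

    sites = ["TTGACA", "TTGACT", "TTTACA", "CTGACA"]  # toy aligned binding sites
    pwm = build_pwm(sites)
    print(round(score(pwm, "TTGACA"), 2), round(score(pwm, "GGGGGG"), 2))
    ```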

  6. VitisExpDB: a database resource for grape functional genomics.

    PubMed

    Doddapaneni, Harshavardhan; Lin, Hong; Walker, M Andrew; Yao, Jiqiang; Civerolo, Edwin L

    2008-02-28

    The family Vitaceae consists of many different grape species that grow in a range of climatic conditions. In the past few years, several studies have generated functional genomic information on different Vitis species and cultivars, including the European grape vine, Vitis vinifera. Our goal is to develop a comprehensive web data source for Vitaceae. VitisExpDB is an online MySQL-PHP driven relational database that houses annotated EST and gene expression data for V. vinifera and non-vinifera grape species and varieties. Currently, the database stores approximately 320,000 EST sequences derived from 8 species/hybrids, their annotation (BLAST top match) details and Gene Ontology-based structured vocabulary. Putative homologs for each EST in other species and varieties, along with information on their percent nucleotide identities, phylogenetic relationship and common primers, can be retrieved. The database also includes information on probe sequence and annotation features of the high density 60-mer gene expression chip consisting of a non-redundant set of approximately 20,000 ESTs. Finally, the database includes 14 processed global microarray expression profile sets. Data from 12 of these expression profile sets have been mapped onto metabolic pathways. A user-friendly web interface with multiple search indices and extensively hyperlinked result features that permit efficient data retrieval has been developed. Several online bioinformatics tools that interact with the database, along with other sequence analysis tools, have been added. In addition, users can submit their ESTs to the database. The database provides a genomic resource to the grape community for functional analysis of genes in the collection, for grape genome annotation and for gene function identification. The VitisExpDB database is available through our website http://cropdisease.ars.usda.gov/vitis_at/main-page.htm.

  7. VitisExpDB: A database resource for grape functional genomics

    PubMed Central

    Doddapaneni, Harshavardhan; Lin, Hong; Walker, M Andrew; Yao, Jiqiang; Civerolo, Edwin L

    2008-01-01

    Background The family Vitaceae consists of many different grape species that grow in a range of climatic conditions. In the past few years, several studies have generated functional genomic information on different Vitis species and cultivars, including the European grape vine, Vitis vinifera. Our goal is to develop a comprehensive web data source for Vitaceae. Description VitisExpDB is an online MySQL-PHP driven relational database that houses annotated EST and gene expression data for V. vinifera and non-vinifera grape species and varieties. Currently, the database stores ~320,000 EST sequences derived from 8 species/hybrids, their annotation (BLAST top match) details and Gene Ontology-based structured vocabulary. Putative homologs for each EST in other species and varieties, along with information on their percent nucleotide identities, phylogenetic relationship and common primers, can be retrieved. The database also includes information on probe sequence and annotation features of the high density 60-mer gene expression chip consisting of a non-redundant set of ~20,000 ESTs. Finally, the database includes 14 processed global microarray expression profile sets. Data from 12 of these expression profile sets have been mapped onto metabolic pathways. A user-friendly web interface with multiple search indices and extensively hyperlinked result features that permit efficient data retrieval has been developed. Several online bioinformatics tools that interact with the database, along with other sequence analysis tools, have been added. In addition, users can submit their ESTs to the database. Conclusion The database provides a genomic resource to the grape community for functional analysis of genes in the collection, for grape genome annotation and for gene function identification. The VitisExpDB database is available through our website. PMID:18307813

  8. GeneSCF: a real-time based functional enrichment tool with support for multiple organisms.

    PubMed

    Subhash, Santhilal; Kanduri, Chandrasekhar

    2016-09-13

    High-throughput technologies such as ChIP-sequencing, RNA-sequencing, DNA sequencing and quantitative metabolomics generate a huge volume of data. Researchers often rely on functional enrichment tools to interpret the biological significance of the affected genes from these high-throughput studies. However, currently available functional enrichment tools need to be updated frequently to adapt to new entries from the functional database repositories. Hence there is a need for a simplified tool that can perform functional enrichment analysis using updated information directly from source databases such as KEGG, Reactome or Gene Ontology. In this study, we focused on designing a command-line tool called GeneSCF (Gene Set Clustering based on Functional annotations) that can predict the functionally relevant biological information for a set of genes in a real-time updated manner. It is designed to handle information from more than 4000 organisms from freely available prominent functional databases like KEGG, Reactome and Gene Ontology. We successfully employed our tool on two published datasets to predict the biologically relevant functional information. The core features of this tool were tested on Linux machines without the need to install additional dependencies. GeneSCF is more reliable than other enrichment tools because of its ability to use reference functional databases in real time to perform enrichment analysis. It is an easy-to-integrate tool with other pipelines available for downstream analysis of high-throughput data. More importantly, GeneSCF can run multiple gene lists simultaneously on different organisms, thereby saving time for the users. Since the tool is designed to be ready-to-use, there is no need for any complex compilation and installation procedures.
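
    Functional enrichment of a gene list against a gene-set database is commonly assessed with a hypergeometric test. The sketch below shows that calculation with SciPy; it is a generic illustration rather than GeneSCF's implementation, and the gene sets are made up.

    ```python
    # Generic gene-set enrichment via the hypergeometric test (illustration
    # only, not GeneSCF's code).
    from scipy.stats import hypergeom

    def enrichment_pvalue(study_genes, gene_set, background_size):
        k = len(study_genes & gene_set)      # study genes found in the set
        M = background_size                  # genes in the background universe
        n = len(gene_set)                    # genes annotated to the set
        N = len(study_genes)                 # size of the study list
        return hypergeom.sf(k - 1, M, n, N)  # P(X >= k)

    study = {"TP53", "BRCA1", "ATM", "CHEK2", "MYC"}
    dna_repair = {"BRCA1", "ATM", "CHEK2", "RAD51", "XRCC1", "PARP1"}
    print(f"p = {enrichment_pvalue(study, dna_repair, background_size=20000):.2e}")
    ```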

  9. Exploring earthquake databases for the creation of magnitude-homogeneous catalogues: tools for application on a regional and global scale

    NASA Astrophysics Data System (ADS)

    Weatherill, G. A.; Pagani, M.; Garcia, J.

    2016-09-01

    The creation of a magnitude-homogenized catalogue is often one of the most fundamental steps in seismic hazard analysis. The process of homogenizing multiple catalogues of earthquakes into a single unified catalogue typically requires careful appraisal of available bulletins, identification of common events within multiple bulletins and the development and application of empirical models to convert from each catalogue's native scale into the required target. The database of the International Seismological Center (ISC) provides the most exhaustive compilation of records from local bulletins, in addition to its reviewed global bulletin. New open-source tools are developed that can utilize this, or any other compiled database, to explore the relations between earthquake solutions provided by different recording networks, and to build and apply empirical models in order to harmonize magnitude scales for the purpose of creating magnitude-homogeneous earthquake catalogues. These tools are described and their application illustrated in two different contexts. The first is a simple application in the Sub-Saharan Africa region where the spatial coverage and magnitude scales for different local recording networks are compared, and their relation to global magnitude scales explored. In the second application the tools are used on a global scale for the purpose of creating an extended magnitude-homogeneous global earthquake catalogue. Several existing high-quality earthquake databases, such as the ISC-GEM and the ISC Reviewed Bulletins, are harmonized into moment magnitude to form a catalogue of more than 562 840 events. This extended catalogue, while not an appropriate substitute for a locally calibrated analysis, can help in studying global patterns in seismicity and hazard, and is therefore released with the accompanying software.
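
    At the core of such harmonization is an empirical conversion between magnitude scales, fitted on events common to two bulletins. A deliberately simplified sketch is shown below: ordinary least squares on invented magnitude pairs, whereas the released tools support more elaborate regression and uncertainty handling.

    ```python
    # Simplified magnitude-harmonization sketch (not the released toolkit):
    # fit a linear conversion from a local scale (here labeled mb) to moment
    # magnitude Mw using common events, then apply it to new magnitudes.
    import numpy as np

    # Hypothetical common events: (mb, Mw) pairs from two bulletins.
    mb = np.array([4.2, 4.8, 5.1, 5.6, 6.0, 6.3])
    mw = np.array([4.4, 5.0, 5.4, 5.9, 6.4, 6.8])

    slope, intercept = np.polyfit(mb, mw, 1)                  # degree-1 least squares
    residual_std = np.std(mw - (slope * mb + intercept), ddof=2)

    def to_mw(mb_value):
        return slope * mb_value + intercept

    print(f"Mw ~ {slope:.2f} * mb + {intercept:.2f} (sigma = {residual_std:.2f})")
    print("converted:", round(to_mw(5.3), 2))
    ```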

  10. Integration of Schemas on the Pre-Design Level Using the KCPM-Approach

    NASA Astrophysics Data System (ADS)

    Vöhringer, Jürgen; Mayr, Heinrich C.

    Integration is a central research and operational issue in information system design and development. It can be conducted on the system, schema, view or data level. On the system level, integration deals with the progressive linking and testing of system components to merge their functional and technical characteristics and behavior into a comprehensive, interoperable system. Schema integration comprises the comparison and merging of two or more schemas, usually conceptual database schemas. Data integration deals with merging the contents of multiple sources of related data. View integration is similar to schema integration; however, it focuses on views and the queries on them rather than on schemas. All these types of integration have in common that two or more sources are first compared and then merged, in order to identify matches and mismatches as well as conflicts and inconsistencies. The sources may stem from heterogeneous companies, organizational units or projects. Integration enables the reuse and combined use of source components.

  11. Dementia ascertainment using existing data in UK longitudinal and cohort studies: a systematic review of methodology.

    PubMed

    Sibbett, Ruth A; Russ, Tom C; Deary, Ian J; Starr, John M

    2017-07-03

    Studies investigating the risk factors for or causation of dementia must consider subjects prior to disease onset. To overcome the limitations of prospective studies and self-reported recall of information, the use of existing data is key. This review provides a narrative account of dementia ascertainment methods using sources of existing data. The literature search was performed using: MEDLINE, EMBASE, PsychInfo and Web of Science. Included articles reported a UK-based study of dementia in which cases were ascertained using existing data. Existing data included that which was routinely collected and that which was collected for previous research. After removing duplicates, abstracts were screened and the remaining articles were included for full-text review. A quality tool was used to evaluate the description of the ascertainment methodology. Of the 3545 abstracts screened, 360 articles were selected for full-text review. 47 articles were included for final consideration. Data sources for ascertainment included: death records, national datasets, research databases and hospital records among others. 36 articles used existing data alone for ascertainment, of which 27 used only a single data source. The most frequently used source was a research database. Quality scores ranged from 7/16 to 16/16. Quality scores were better for articles with dementia ascertainment as an outcome. Some papers performed validation studies of dementia ascertainment and most indicated that observed rates of dementia were lower than expected. We identified a lack of consistency in dementia ascertainment methodology using existing data. With no data source identified as a "gold-standard", we suggest the use of multiple sources. Where possible, studies should access records with evidence to confirm the diagnosis. Studies should also calculate the dementia ascertainment rate for the population being studied to enable a comparison with an expected rate.

  12. Follow-up of serious offender patients in the community: multiple methods of tracing.

    PubMed

    Jamieson, Elizabeth; Taylor, Pamela J

    2002-01-01

    Longitudinal studies of people with mental disorder are important in understanding outcome and intervention effects but attrition rates can be high. This study aimed to evaluate use of multiple record sources to trace, over 12 years, a one-year discharge cohort of high-security hospital patients. Everyone leaving such a hospital in 1984 was traced until a census date of 31 December 1995. Data were collected from several national databases (Office for National Statistics (ONS), Home Office (HO) Offenders' Index, Police National Computer Records, the Electoral Roll) and by hand-searching responsible agency records (HO, National Health Service). Using all methods, only three of the 204 patients had no follow-up information. Home Office Mental Health Unit data were an excellent source, but only for people still under discharge restrictions (<50% after eight years). Sequential tracing of hospital placements for people never or no longer under such restrictions was laborious and also produced only group-specific yield. The best indicator of community residence was ONS information on general practitioner (GP/primary care) registration. The electoral roll was useful when other sources were exhausted. Follow-up of offenders/offender-patients has generally focused on event data, such as re-offending. People untraced by that method alone, however, are unlikely to be lost to follow-up on casting a wider records net. Using multiple records, attrition at the census was 38%, but, after certain assumptions, reduced further to 5%.

  13. WE-D-9A-06: Open Source Monitor Calibration and Quality Control Software for Enterprise Display Management

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Bevins, N; Vanderhoek, M; Lang, S

    2014-06-15

    Purpose: Medical display monitor calibration and quality control present challenges to medical physicists. The purpose of this work is to demonstrate and share experiences with an open source package that allows for both initial monitor setup and routine performance evaluation. Methods: A software package, pacsDisplay, has been developed over the last decade to aid in the calibration of all monitors within the radiology group in our health system. The software is used to calibrate monitors to follow the DICOM Grayscale Standard Display Function (GSDF) via lookup tables installed on the workstation. Additional functionality facilitates periodic evaluations of both primary and secondary medical monitors to ensure satisfactory performance. This software is installed on all radiology workstations, and can also be run as a stand-alone tool from a USB disk. Recently, a database has been developed to store and centralize the monitor performance data and to provide long-term trends for compliance with internal standards and various accrediting organizations. Results: Implementation and utilization of pacsDisplay has resulted in improved monitor performance across the health system. Monitor testing is now performed at regular intervals and the software is being used across multiple imaging modalities. Monitor performance characteristics such as maximum and minimum luminance, ambient luminance and illuminance, color tracking, and GSDF conformity are loaded into a centralized database for system performance comparisons. Compliance reports for organizations such as MQSA, ACR, and TJC are generated automatically and stored in the same database. Conclusion: An open source software solution has simplified and improved the standardization of displays within our health system. This work serves as an example method for calibrating and testing monitors within an enterprise health system.

  14. Wolf Testing: Open Source Testing Software

    NASA Astrophysics Data System (ADS)

    Braasch, P.; Gay, P. L.

    2004-12-01

    Wolf Testing is software for easily creating and editing exams. Wolf Testing allows the user to create an exam from a database of questions, view it on screen, and easily print it along with the corresponding answer guide. The questions can be multiple choice, short answer, long answer, or true and false varieties. This software can be accessed securely from any location, allowing the user to easily create exams from home. New questions, which can include associated pictures, can be added through a web-interface. After adding in questions, they can be edited, deleted, or duplicated into multiple versions. Long-term test creation is simplified, as you are able to quickly see what questions you have asked in the past and insert them, with or without editing, into future tests. All tests are archived in the database. Written in PHP and MySQL, this software can be installed on any UNIX / Linux platform, including Macintosh OS X. The secure interface keeps students out, and allows you to decide who can create tests and who can edit information already in the database. Tests can be output as either html with pictures or rich text without pictures, and there are plans to add PDF and MS Word formats as well. We would like to thank Dr. Wolfgang Rueckner and the Harvard University Science Center for providing incentive to start this project, computers and resources to complete this project, and inspiration for the project's name. We would also like to thank Dr. Ronald Newburgh for his assistance in beta testing.

  15. Cross-Matching Source Observations from the Palomar Transient Factory (PTF)

    NASA Astrophysics Data System (ADS)

    Laher, Russ; Grillmair, C.; Surace, J.; Monkewitz, S.; Jackson, E.

    2009-01-01

    Over the four-year lifetime of the PTF project, approximately 40 billion instances of astronomical-source observations will be extracted from the image data. The instances will correspond to the same astronomical objects being observed at roughly 25-50 different times, and so a very large catalog containing important object-variability information will be the chief PTF product. Organizing astronomical-source catalogs is conventionally done by dividing the catalog into declination zones and sorting by right ascension within each zone (e.g., the USNO-A star catalog), in order to facilitate catalog searches. This method was reincarnated as the "zones" algorithm in a SQL-Server database implementation (Szalay et al., MSR-TR-2004-32), with corrections given by Gray et al. (MSR-TR-2006-52). The primary advantage of this implementation is that all of the work is done entirely on the database server and client/server communication is eliminated. We implemented the methods outlined in Gray et al. for a PostgreSQL database. We programmed the methods as database functions in the PL/pgSQL procedural language. The cross-matching is currently based on source positions, but we intend to extend it to use both positions and positional uncertainties to form a chi-square statistic for optimal thresholding. The database design includes three main tables, plus a handful of internal tables. The Sources table stores the SExtractor source extractions taken at various times; the MergedSources table stores statistics about the astronomical objects, which are the result of cross-matching records in the Sources table; and the Merges table associates cross-matched primary keys in the Sources table with primary keys in the MergedSources table. Besides judicious database indexing, we have also internally partitioned the Sources table by declination zone, in order to speed up the population of Sources records and make the database more manageable. The catalog will be accessible to the public after the proprietary period through IRSA (irsa.ipac.caltech.edu).
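
    The zones idea described above reduces to bucketing sources by declination and matching within a small positional window in the source's own zone and its neighbours. The in-memory Python sketch below illustrates that logic only; it is not the PTF PL/pgSQL implementation, and the zone height and match radius are illustrative values.

      from collections import defaultdict
      from math import cos, radians, hypot

      ZONE_HEIGHT = 30.0 / 3600.0   # zone height in degrees (illustrative)

      def zone_of(dec):
          return int((dec + 90.0) // ZONE_HEIGHT)

      def build_zones(sources):
          """sources: list of (id, ra_deg, dec_deg); returns zone -> list sorted by RA."""
          zones = defaultdict(list)
          for s in sources:
              zones[zone_of(s[2])].append(s)
          for z in zones.values():
              z.sort(key=lambda s: s[1])   # a real implementation would binary-search this
          return zones

      def cross_match(src, zones, radius_deg=1.5 / 3600.0):
          """Return candidate matches for a single source within radius_deg."""
          sid, ra, dec = src
          matches = []
          for z in (zone_of(dec) - 1, zone_of(dec), zone_of(dec) + 1):
              for cid, cra, cdec in zones.get(z, []):
                  # small-angle separation; adequate for arcsecond-scale radii
                  d_ra = (cra - ra) * cos(radians(dec))
                  if hypot(d_ra, cdec - dec) <= radius_deg and cid != sid:
                      matches.append(cid)
          return matches

      zones = build_zones([(1, 10.0000, 5.0000), (2, 10.0002, 5.0001), (3, 11.0, 5.0)])
      print(cross_match((1, 10.0000, 5.0000), zones))   # -> [2]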

  16. Epidemiology, quality, and reporting characteristics of systematic reviews and meta-analyses of nursing interventions published in Chinese journals.

    PubMed

    Zhang, Juxia; Wang, Jiancheng; Han, Lin; Zhang, Fengwa; Cao, Jianxun; Ma, Yuxia

    2015-01-01

    Systematic reviews (SRs) and meta-analyses (MAs) of nursing interventions have become increasingly popular in China. This review provides the first examination of epidemiological characteristics of these SRs as well as compliance with the Preferred Reporting Items for Systematic Reviews and Meta-analyses and Assessment of Multiple Systematic Reviews guidelines. The purpose of this study was to examine epidemiologic and reporting characteristics as well as the methodologic quality of SRs and MAs of nursing interventions published in Chinese journals. Four Chinese databases were searched (the Chinese Biomedicine Literature Database, Chinese Scientific Journal Full-text Database, Chinese Journal Full-text Database, and Wanfang Database) for SRs and MAs of nursing intervention from inception through June 2013. Data were extracted into Excel (Microsoft, Redmond, WA). The Assessment of Multiple Systematic Reviews and Preferred Reporting Items for Systematic Reviews and Meta-analyses checklists were used to assess methodologic quality and reporting characteristics, respectively. A total of 144 SRs were identified, most (97.2%) of which used "systematic review" or "meta-analyses" in the titles. None of the reviews had been updated. Nearly half (41%) were written by nurses, and more than half (61%) were reported in specialist journals. The most common conditions studied were endocrine, nutritional and metabolic diseases, and neoplasms. Most (70.8%) reported information about quality assessment, whereas less than half (25%) reported assessing for publication bias. None of the reviews reported a conflict of interest. Although many SRs of nursing interventions have been published in Chinese journals, the quality of these reviews is of concern. As a potential key source of information for nurses and nursing administrators, not only were many of these reviews incomplete in the information they provided, but also some results were misleading. Improving the quality of SRs of nursing interventions conducted and published by nurses in China is urgently needed in order to increase the value of these studies. Copyright © 2015 Elsevier Inc. All rights reserved.

  17. Bayesian approach to transforming public gene expression repositories into disease diagnosis databases.

    PubMed

    Huang, Haiyan; Liu, Chun-Chi; Zhou, Xianghong Jasmine

    2010-04-13

    The rapid accumulation of gene expression data has offered unprecedented opportunities to study human diseases. The National Center for Biotechnology Information Gene Expression Omnibus is currently the largest database that systematically documents the genome-wide molecular basis of diseases. However, thus far, this resource has been far from fully utilized. This paper describes the first study to transform public gene expression repositories into an automated disease diagnosis database. Particularly, we have developed a systematic framework, including a two-stage Bayesian learning approach, to achieve the diagnosis of one or multiple diseases for a query expression profile along a hierarchical disease taxonomy. Our approach, including standardizing cross-platform gene expression data and heterogeneous disease annotations, allows analyzing both sources of information in a unified probabilistic system. A high level of overall diagnostic accuracy was shown by cross validation. It was also demonstrated that the power of our method can increase significantly with the continued growth of public gene expression repositories. Finally, we showed how our disease diagnosis system can be used to characterize complex phenotypes and to construct a disease-drug connectivity map.
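
    As an illustration of the kind of probabilistic scoring such a system performs (not the authors' actual two-stage model), the sketch below computes a posterior over candidate diseases for a query expression profile using Gaussian class-conditional likelihoods estimated from reference profiles; the reference data are synthetic stand-ins for public repository profiles.

      import numpy as np

      rng = np.random.default_rng(0)
      genes = 50

      # Synthetic reference profiles grouped by disease label (stand-ins for GEO data).
      reference = {
          "disease_A": rng.normal(0.0, 1.0, size=(20, genes)),
          "disease_B": rng.normal(1.0, 1.0, size=(20, genes)),
      }

      def posterior(query, reference, prior=None):
          """Naive-Bayes style posterior P(disease | profile) with Gaussian likelihoods."""
          labels = list(reference)
          prior = prior or {d: 1.0 / len(labels) for d in labels}
          log_post = {}
          for d in labels:
              mu = reference[d].mean(axis=0)
              sd = reference[d].std(axis=0) + 1e-6
              ll = -0.5 * np.sum(((query - mu) / sd) ** 2 + np.log(2 * np.pi * sd ** 2))
              log_post[d] = np.log(prior[d]) + ll
          m = max(log_post.values())
          z = sum(np.exp(v - m) for v in log_post.values())
          return {d: float(np.exp(v - m) / z) for d, v in log_post.items()}

      query = rng.normal(1.0, 1.0, size=genes)   # profile resembling disease_B
      print(posterior(query, reference))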

  18. The dye-sensitized solar cell database.

    PubMed

    Venkatraman, Vishwesh; Raju, Rajesh; Oikonomopoulos, Solon P; Alsberg, Bjørn K

    2018-04-03

    Dye-sensitized solar cells (DSSCs) have garnered a lot of attention in recent years. The solar energy to power conversion efficiency of a DSSC is influenced by various components of the cell such as the dye, electrolyte, electrodes and additives among others leading to varying experimental configurations. A large number of metal-based and metal-free dye sensitizers have now been reported and tools using such data to indicate new directions for design and development are on the rise. DSSCDB, the first of its kind dye-sensitized solar cell database, aims to provide users with up-to-date information from publications on the molecular structures of the dyes, experimental details and reported measurements (efficiencies and spectral properties) and thereby facilitate a comprehensive and critical evaluation of the data. Currently, the DSSCDB contains over 4000 experimental observations spanning multiple dye classes such as triphenylamines, carbazoles, coumarins, phenothiazines, ruthenium and porphyrins. The DSSCDB offers a web-based, comprehensive source of property data for dye sensitized solar cells. Access to the database is available through the following URL: www.dyedb.com .

  19. Tephrabase: A tephrochronological database

    NASA Astrophysics Data System (ADS)

    Newton, Anthony

    2015-04-01

    Development of Tephrabase, a tephrochronological database, began over 20 years ago, and it was launched in June 1995 as one of the earliest scientific databases on the web. Tephrabase was designed from the start to include a wide range of tephrochronological data including location, depth of the layer, geochemical composition (major to trace elements), physical properties (colour, grainsize, and mineral components), dating (both absolute/historical and radiometric), details of eruptions and the history of volcanic centres, as well as a reference database. Currently, Tephrabase contains details of over 1000 sites where tephra layers have been found, 3500 tephra layers, 3500 geochemical analyses and 2500 references. Tephrabase was originally developed to include tephra layers in Iceland and those of Icelandic origin found in NW Europe; it now also includes data on tephra layers from central Mexico and from the Laacher See eruption. The latter was developed as a supplement to the Iceland-centric nature of the rest of Tephrabase. A further extension to Tephrabase has seen the development of an automated method of producing tephra stratigraphic columns, calculating sediment accumulation rates between dated tephra layers in multiple profiles and mapping tephra layers across the landscape. Whilst Tephrabase has been successful and continues to be developed and updated, there are several issues which need to be addressed. More tephrochronological databases need to be developed and these should allow connected/shared searches. This would provide worldwide coverage, but also the flexibility to develop spin-off small-scale extensions, such as those described above. Data uploading needs to be improved and simplified. This includes the need to clarify issues of quality control. Again, a common, standards-led approach to this seems appropriate. Researchers also need to be encouraged to contribute data to these databases. Tephrabase was designed to include a variety of data, including physical properties and trace element compositions of the tephra layers. However, Tephrabase is conspicuous in not containing these data; Tephrabase and other databases need to include them. Tephra databases need not only to record details about tephra layers, but should also be tools for understanding environmental change and volcanic histories. These goals can be achieved through development of the databases themselves and through the creation of portals which draw data from multiple data sources.
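
    One of the automated calculations mentioned above, the sediment accumulation rate between dated tephra layers, is simple arithmetic: rate = (depth difference) / (age difference). A small sketch for a single profile follows; the depths and ages attached to these well-known Icelandic marker layers are invented for illustration.

      def accumulation_rates(layers):
          """layers: list of (name, depth_cm, age_years_bp) ordered top to bottom."""
          rates = []
          for (n1, d1, a1), (n2, d2, a2) in zip(layers, layers[1:]):
              rates.append((f"{n1}-{n2}", (d2 - d1) / (a2 - a1)))   # cm per year
          return rates

      profile = [("Hekla 1947", 12.0, 3), ("Katla 1918", 25.0, 32), ("Askja 1875", 58.0, 75)]
      print(accumulation_rates(profile))   # roughly 0.45 and 0.77 cm/yr between layers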

  20. The establishment and use of the point source catalog database of the 2MASS near infrared survey

    NASA Astrophysics Data System (ADS)

    Gao, Y. F.; Shan, H. G.; Cheng, D.

    2003-02-01

    The 2MASS near infrared survey project is introduced briefly. The 2MASS point source catalog (2MASS PSC) database and its network query system were established using the PHP Hypertext Preprocessor and the MySQL database server. Using the system, one can not only query information on sources listed in the catalog but also draw related plots. Moreover, after a diagnostic examination of the 2MASS data, some research fields that could benefit from this database are suggested.
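
    A query system of this kind ultimately reduces to a bounded search on position (plus magnitude cuts) against the PSC table. The self-contained sketch below uses sqlite3 as a stand-in for the MySQL server; the table layout and column names are assumptions for illustration, not the actual 2MASS PSC schema.

      import sqlite3

      con = sqlite3.connect(":memory:")
      con.execute("CREATE TABLE psc (designation TEXT, ra REAL, decl REAL, j_mag REAL)")
      con.executemany("INSERT INTO psc VALUES (?,?,?,?)", [
          ("J0001", 150.01, 2.20, 14.2),
          ("J0002", 150.03, 2.21, 16.9),
          ("J0003", 210.50, -5.00, 12.1),
      ])

      # Box search around (ra0, dec0) with a J-band magnitude cut.
      ra0, dec0, half_width, j_lim = 150.02, 2.205, 0.05, 15.5
      rows = con.execute(
          "SELECT designation, ra, decl, j_mag FROM psc "
          "WHERE ra BETWEEN ? AND ? AND decl BETWEEN ? AND ? AND j_mag <= ?",
          (ra0 - half_width, ra0 + half_width, dec0 - half_width, dec0 + half_width, j_lim),
      ).fetchall()
      print(rows)   # -> [('J0001', 150.01, 2.2, 14.2)]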

  1. From the Children’s Oncology Group: Evidence-based recommendations for PEG-asparaginase nurse monitoring, hypersensitivity reaction management, and patient/family education

    PubMed Central

    Woods, Deborah; Winchester, Kari; Towerman, Alison; Gettinger, Katie; Carey, Christina; Timmermann, Karen; Langley, Rachel; Browne, Emily

    2017-01-01

    PEG-asparaginase is a backbone chemotherapy agent in pediatric acute lymphoblastic leukemia and in some non-Hodgkin lymphoma therapies. Nurses lack standardized guidelines for monitoring patients receiving PEG-asparaginase and for educating patients/families about hypersensitivity reaction risks. An electronic search of six databases using publication years 2000–2015 and multiple professional organizations and clinical resources was conducted. Evidence sources were reviewed for topic applicability. Each of the final 23 sources was appraised by two team members. The GRADE system was used to assign a quality and strength rating for each recommendation. Multiple recommendations were developed: four relating to nurse monitoring of patients during and after drug administration, eight guiding hypersensitivity reaction management, and four concerning patient/family educational content. These strong recommendations were based on moderate, low, or very-low quality evidence. Several recommendations relied upon generalized drug hypersensitivity guidelines. Additional research is needed to safely guide PEG-asparaginase monitoring, hypersensitivity reaction management and patient/family education. Nurses administering PEG-asparaginase play a critical role in the early identification and management of hypersensitivity reactions. PMID:28602129

  2. Evaluation of Cross-Protocol Stability of a Fully Automated Brain Multi-Atlas Parcellation Tool.

    PubMed

    Liang, Zifei; He, Xiaohai; Ceritoglu, Can; Tang, Xiaoying; Li, Yue; Kutten, Kwame S; Oishi, Kenichi; Miller, Michael I; Mori, Susumu; Faria, Andreia V

    2015-01-01

    Brain parcellation tools based on multiple-atlas algorithms have recently emerged as a promising method with which to accurately define brain structures. When dealing with data from various sources, it is crucial that these tools are robust for many different imaging protocols. In this study, we tested the robustness of a multiple-atlas, likelihood fusion algorithm using Alzheimer's Disease Neuroimaging Initiative (ADNI) data with six different protocols, comprising three manufacturers and two magnetic field strengths. The entire brain was parceled into five different levels of granularity. In each level, which defines a set of brain structures, ranging from eight to 286 regions, we evaluated the variability of brain volumes related to the protocol, age, and diagnosis (healthy or Alzheimer's disease). Our results indicated that, with proper pre-processing steps, the impact of different protocols is minor compared to biological effects, such as age and pathology. A precise knowledge of the sources of data variation enables sufficient statistical power and ensures the reliability of an anatomical analysis when using this automated brain parcellation tool on datasets from various imaging protocols, such as clinical databases.

  3. Traditional Medicine Collection Tracking System (TM-CTS): a database for ethnobotanically driven drug-discovery programs.

    PubMed

    Harris, Eric S J; Erickson, Sean D; Tolopko, Andrew N; Cao, Shugeng; Craycroft, Jane A; Scholten, Robert; Fu, Yanling; Wang, Wenquan; Liu, Yong; Zhao, Zhongzhen; Clardy, Jon; Shamu, Caroline E; Eisenberg, David M

    2011-05-17

    Ethnobotanically driven drug-discovery programs include data related to many aspects of the preparation of botanical medicines, from initial plant collection to chemical extraction and fractionation. The Traditional Medicine Collection Tracking System (TM-CTS) was created to organize and store data of this type for an international collaborative project involving the systematic evaluation of commonly used Traditional Chinese Medicinal plants. The system was developed using domain-driven design techniques, and is implemented using Java, Hibernate, PostgreSQL, Business Intelligence and Reporting Tools (BIRT), and Apache Tomcat. The TM-CTS relational database schema contains over 70 data types, comprising over 500 data fields. The system incorporates a number of unique features that are useful in the context of ethnobotanical projects such as support for information about botanical collection, method of processing, quality tests for plants with existing pharmacopoeia standards, chemical extraction and fractionation, and historical uses of the plants. The database also accommodates data provided in multiple languages and integration with a database system built to support high throughput screening based drug discovery efforts. It is accessed via a web-based application that provides extensive, multi-format reporting capabilities. This new database system was designed to support a project evaluating the bioactivity of Chinese medicinal plants. The software used to create the database is open source, freely available, and could potentially be applied to other ethnobotanically driven natural product collection and drug-discovery programs. Copyright © 2011 Elsevier Ireland Ltd. All rights reserved.

  4. Traditional Medicine Collection Tracking System (TM-CTS): A Database for Ethnobotanically-Driven Drug-Discovery Programs

    PubMed Central

    Harris, Eric S. J.; Erickson, Sean D.; Tolopko, Andrew N.; Cao, Shugeng; Craycroft, Jane A.; Scholten, Robert; Fu, Yanling; Wang, Wenquan; Liu, Yong; Zhao, Zhongzhen; Clardy, Jon; Shamu, Caroline E.; Eisenberg, David M.

    2011-01-01

    Aim of the study. Ethnobotanically-driven drug-discovery programs include data related to many aspects of the preparation of botanical medicines, from initial plant collection to chemical extraction and fractionation. The Traditional Medicine-Collection Tracking System (TM-CTS) was created to organize and store data of this type for an international collaborative project involving the systematic evaluation of commonly used Traditional Chinese Medicinal plants. Materials and Methods. The system was developed using domain-driven design techniques, and is implemented using Java, Hibernate, PostgreSQL, Business Intelligence and Reporting Tools (BIRT), and Apache Tomcat. Results. The TM-CTS relational database schema contains over 70 data types, comprising over 500 data fields. The system incorporates a number of unique features that are useful in the context of ethnobotanical projects such as support for information about botanical collection, method of processing, quality tests for plants with existing pharmacopoeia standards, chemical extraction and fractionation, and historical uses of the plants. The database also accommodates data provided in multiple languages and integration with a database system built to support high throughput screening based drug discovery efforts. It is accessed via a web-based application that provides extensive, multi-format reporting capabilities. Conclusions. This new database system was designed to support a project evaluating the bioactivity of Chinese medicinal plants. The software used to create the database is open source, freely available, and could potentially be applied to other ethnobotanically-driven natural product collection and drug-discovery programs. PMID:21420479

  5. U.S.-MEXICO BORDER PROGRAM ARIZONA BORDER STUDY--STANDARD OPERATING PROCEDURE FOR DATABASE TREE AND DATA SOURCES (UA-D-41.0)

    EPA Science Inventory

    The purpose of this SOP is to describe the database storage organization, and to describe the sources of data for each database used during the Arizona NHEXAS project and the Border study. Keywords: data; database; organization.

    The U.S.-Mexico Border Program is sponsored by t...

  6. SSER: Species specific essential reactions database.

    PubMed

    Labena, Abraham A; Ye, Yuan-Nong; Dong, Chuan; Zhang, Fa-Z; Guo, Feng-Biao

    2017-04-19

    Essential reactions are vital components of cellular networks. They are the foundations of synthetic biology and are potential candidate targets for antimetabolic drug design. In particular, if a single reaction is catalyzed by multiple enzymes, inhibiting the reaction would be a better option than targeting the enzymes or the corresponding enzyme-encoding genes. Existing databases such as BRENDA, BiGG, KEGG, BioModels, Biosilico, and many others offer useful and comprehensive information on biochemical reactions, but none of them focuses specifically on essential reactions. Therefore, building a centralized repository for this class of reactions would be of great value. Here, we present a species-specific essential reactions database (SSER). The current version comprises essential biochemical and transport reactions of twenty-six organisms which are identified via flux balance analysis (FBA) combined with manual curation on experimentally validated metabolic network models. Quantitative data on the number of essential reactions, the number of essential reactions associated with their respective enzyme-encoding genes, and shared essential reactions across organisms are the main contents of the database. SSER would be a prime source of essential reaction data and related gene and metabolite information, and it can significantly facilitate the reconstruction and analysis of metabolic network models and drug target discovery studies. Users can browse, search, compare and download the essential reactions of organisms of their interest through the website http://cefg.uestc.edu.cn/sser .
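
    Identifying essential reactions by FBA, as described above, amounts to constraining one reaction at a time to zero flux and re-maximizing the biomass objective; reactions whose removal drops growth below a threshold are called essential. The toy sketch below solves a four-reaction network with scipy's linear programming; the network, bounds, and threshold are illustrative and are not taken from SSER's curated models, which would normally be handled with a constraint-based modeling toolchain.

      import numpy as np
      from scipy.optimize import linprog

      # Toy stoichiometric matrix: rows = metabolites (A, B), columns = reactions.
      # R_upt: -> A ; R2: A -> B ; R4: A -> B (alternative route) ; R_bio: B ->
      S = np.array([[1.0, -1.0, -1.0,  0.0],
                    [0.0,  1.0,  1.0, -1.0]])
      reactions = ["R_upt", "R2", "R4", "R_bio"]
      biomass = 3                                   # index of the objective reaction

      def max_growth(knockout=None):
          bounds = [(0.0, 10.0)] * S.shape[1]
          if knockout is not None:
              bounds[knockout] = (0.0, 0.0)         # force the deleted reaction to zero flux
          c = np.zeros(S.shape[1]); c[biomass] = -1.0   # linprog minimizes, so negate
          res = linprog(c, A_eq=S, b_eq=np.zeros(S.shape[0]), bounds=bounds, method="highs")
          return -res.fun

      wild_type = max_growth()
      for i, name in enumerate(reactions):
          essential = max_growth(knockout=i) < 0.01 * wild_type   # illustrative threshold
          print(name, "essential" if essential else "non-essential")
      # -> R_upt essential, R2 non-essential, R4 non-essential, R_bio essential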

  7. BIG: a large-scale data integration tool for renal physiology

    PubMed Central

    Zhao, Yue; Yang, Chin-Rang; Raghuram, Viswanathan; Parulekar, Jaya

    2016-01-01

    Due to recent advances in high-throughput techniques, we and others have generated multiple proteomic and transcriptomic databases to describe and quantify gene expression, protein abundance, or cellular signaling on the scale of the whole genome/proteome in kidney cells. The existence of so much data from diverse sources raises the following question: “How can researchers find information efficiently for a given gene product over all of these data sets without searching each data set individually?” This is the type of problem that has motivated the “Big-Data” revolution in Data Science, which has driven progress in fields such as marketing. Here we present an online Big-Data tool called BIG (Biological Information Gatherer) that allows users to submit a single online query to obtain all relevant information from all indexed databases. BIG is accessible at http://big.nhlbi.nih.gov/. PMID:27279488

  8. ChEMBL web services: streamlining access to drug discovery data and utilities

    PubMed Central

    Davies, Mark; Nowotka, Michał; Papadatos, George; Dedman, Nathan; Gaulton, Anna; Atkinson, Francis; Bellis, Louisa; Overington, John P.

    2015-01-01

    ChEMBL is now a well-established resource in the fields of drug discovery and medicinal chemistry research. The ChEMBL database curates and stores standardized bioactivity, molecule, target and drug data extracted from multiple sources, including the primary medicinal chemistry literature. Programmatic access to ChEMBL data has been improved by a recent update to the ChEMBL web services (version 2.0.x, https://www.ebi.ac.uk/chembl/api/data/docs), which exposes significantly more data from the underlying database and introduces new functionality. To complement the data-focused services, a utility service (version 1.0.x, https://www.ebi.ac.uk/chembl/api/utils/docs), which provides RESTful access to commonly used cheminformatics methods, has also been concurrently developed. The ChEMBL web services can be used together or independently to build applications and data processing workflows relevant to drug discovery and chemical biology. PMID:25883136
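
    As an example of the data-focused services, the sketch below retrieves a single molecule record in JSON from the documented endpoint root quoted above. The resource path, the format=json parameter, and the printed fields reflect my reading of the ChEMBL web-service documentation and should be checked against the live docs; CHEMBL25 (aspirin) is used purely as an example identifier.

      import requests

      BASE = "https://www.ebi.ac.uk/chembl/api/data"

      # Fetch one molecule record as JSON.
      resp = requests.get(f"{BASE}/molecule/CHEMBL25", params={"format": "json"}, timeout=30)
      resp.raise_for_status()
      mol = resp.json()
      print(mol.get("pref_name"), mol.get("molecule_properties", {}).get("full_mwt"))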

  9. The Saccharomyces Genome Database Variant Viewer.

    PubMed

    Sheppard, Travis K; Hitz, Benjamin C; Engel, Stacia R; Song, Giltae; Balakrishnan, Rama; Binkley, Gail; Costanzo, Maria C; Dalusag, Kyla S; Demeter, Janos; Hellerstedt, Sage T; Karra, Kalpana; Nash, Robert S; Paskov, Kelley M; Skrzypek, Marek S; Weng, Shuai; Wong, Edith D; Cherry, J Michael

    2016-01-04

    The Saccharomyces Genome Database (SGD; http://www.yeastgenome.org) is the authoritative community resource for the Saccharomyces cerevisiae reference genome sequence and its annotation. In recent years, we have moved toward increased representation of sequence variation and allelic differences within S. cerevisiae. The publication of numerous additional genomes has motivated the creation of new tools for their annotation and analysis. Here we present the Variant Viewer: a dynamic open-source web application for the visualization of genomic and proteomic differences. Multiple sequence alignments have been constructed across high quality genome sequences from 11 different S. cerevisiae strains and stored in the SGD. The alignments and summaries are encoded in JSON and used to create a two-tiered dynamic view of the budding yeast pan-genome, available at http://www.yeastgenome.org/variant-viewer. © The Author(s) 2015. Published by Oxford University Press on behalf of Nucleic Acids Research.

  10. Development of user-friendly and interactive data collection system for cerebral palsy.

    PubMed

    Raharjo, I; Burns, T G; Venugopalan, J; Wang, M D

    2016-02-01

    Cerebral palsy (CP) is a permanent motor disorder that appears in early age and it requires multiple tests to assess the physical and mental capabilities of the patients. Current medical record data collection systems, e.g., EPIC, employed for CP are very general, difficult to navigate, and prone to errors. The data cannot easily be extracted which limits data analysis on this rich source of information. To overcome these limitations, we designed and prototyped a database with a graphical user interface geared towards clinical research specifically in CP. The platform with MySQL and Java framework is reliable, secure, and can be easily integrated with other programming languages for data analysis such as MATLAB. This database with GUI design is a promising tool for data collection and can be applied in many different fields aside from CP to infer useful information out of the vast amount of data being collected.

  11. Development of user-friendly and interactive data collection system for cerebral palsy

    PubMed Central

    Raharjo, I.; Burns, T. G.; Venugopalan, J.; Wang., M. D.

    2016-01-01

    Cerebral palsy (CP) is a permanent motor disorder that appears in early age and it requires multiple tests to assess the physical and mental capabilities of the patients. Current medical record data collection systems, e.g., EPIC, employed for CP are very general, difficult to navigate, and prone to errors. The data cannot easily be extracted which limits data analysis on this rich source of information. To overcome these limitations, we designed and prototyped a database with a graphical user interface geared towards clinical research specifically in CP. The platform with MySQL and Java framework is reliable, secure, and can be easily integrated with other programming languages for data analysis such as MATLAB. This database with GUI design is a promising tool for data collection and can be applied in many different fields aside from CP to infer useful information out of the vast amount of data being collected. PMID:28133638

  12. Biological data warehousing system for identifying transcriptional regulatory sites from gene expressions of microarray data.

    PubMed

    Tsou, Ann-Ping; Sun, Yi-Ming; Liu, Chia-Lin; Huang, Hsien-Da; Horng, Jorng-Tzong; Tsai, Meng-Feng; Liu, Baw-Juine

    2006-07-01

    Identification of transcriptional regulatory sites plays an important role in the investigation of gene regulation. For this purpose, we designed and implemented a data warehouse to integrate multiple heterogeneous biological data sources with data types such as text files, XML, images, MySQL database models, and Oracle database models. The utility of the biological data warehouse in predicting transcriptional regulatory sites of coregulated genes was explored using a synexpression group derived from a microarray study. Both the binding sites of known transcription factors and predicted over-represented (OR) oligonucleotides were demonstrated for the gene group. The potential biological roles of both known nucleotides and one OR nucleotide were demonstrated using bioassays. Therefore, the results from the wet-lab experiments reinforce the power and utility of the data warehouse as an approach to the genome-wide search for important transcription regulatory elements that are the key to many complex biological systems.

  13. System, method and apparatus for generating phrases from a database

    NASA Technical Reports Server (NTRS)

    McGreevy, Michael W. (Inventor)

    2004-01-01

    Phrase generation is a method of generating sequences of terms, such as phrases, that may occur within a database of subsets containing sequences of terms, such as text. A database is provided and a relational model of the database is created. A query is then input. The query includes a term or a sequence of terms or multiple individual terms or multiple sequences of terms or combinations thereof. Next, several sequences of terms that are contextually related to the query are assembled from contextual relations in the model of the database. The sequences of terms are then sorted and output. Phrase generation can also be an iterative process used to produce sequences of terms from a relational model of a database.
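
    A minimal illustration of the general idea (not the patented implementation) is to build term-adjacency statistics over a small text database and extend a query term with its most frequent successors, as sketched below; the example documents are invented.

      from collections import Counter, defaultdict

      docs = [
          "engine failure during takeoff roll",
          "engine failure shortly after takeoff",
          "hydraulic failure during landing",
      ]

      # Relational model: counts of which term follows which within the database.
      follows = defaultdict(Counter)
      for doc in docs:
          terms = doc.split()
          for a, b in zip(terms, terms[1:]):
              follows[a][b] += 1

      def generate_phrase(query, length=3):
          """Greedily extend the query with its most frequent successors."""
          phrase = query.split()
          while len(phrase) < length and follows[phrase[-1]]:
              phrase.append(follows[phrase[-1]].most_common(1)[0][0])
          return " ".join(phrase)

      print(generate_phrase("engine"))   # -> "engine failure during"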

  14. International Data on Radiological Sources

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Martha Finck; Margaret Goldberg

    2010-07-01

    The mission of radiological dispersal device (RDD) nuclear forensics is to identify the provenance of nuclear and radiological materials used in RDDs and to aid law enforcement in tracking nuclear materials and routes. The application of databases to radiological forensics is to match RDD source material to a source model in the database, provide guidance regarding a possible second device, and aid the FBI by providing a short list of manufacturers and distributors, and ultimately the last legal owner of the source. The Argonne/Idaho National Laboratory RDD attribution database is a powerful technical tool in radiological forensics. The database (1267 unique vendors) includes all sealed sources and devices registered in the U.S., is complemented by data from the IAEA Catalogue, and is supported by rigorous in-lab characterization of selected sealed sources regarding physical form, radiochemical composition, and age-dating profiles. Close working relationships with global partners in the commercial sealed sources industry provide invaluable technical information and expertise in the development of signature profiles. These profiles are critical to the down-selection of potential candidates in either pre- or post-event RDD attribution. The down-selection process includes a match between an interdicted (or detonated) source and a model in the database linked to one or more manufacturers and distributors.

  15. Artificial neural networks for retrieving absorption and reduced scattering spectra from frequency-domain diffuse reflectance spectroscopy at short source-detector separation

    PubMed Central

    Chen, Yu-Wen; Chen, Chien-Chih; Huang, Po-Jung; Tseng, Sheng-Hao

    2016-01-01

    Diffuse reflectance spectroscopy (DRS) based on the frequency-domain (FD) technique has been employed to investigate the optical properties of deep tissues such as breast and brain using source to detector separation up to 40 mm. Due to the modeling and system limitations, efficient and precise determination of turbid sample optical properties from the FD diffuse reflectance acquired at a source-detector separation (SDS) of around 1 mm has not been demonstrated. In this study, we revealed that at SDS of 1 mm, acquiring FD diffuse reflectance at multiple frequencies is necessary for alleviating the influence of inevitable measurement uncertainty on the optical property recovery accuracy. Furthermore, we developed artificial neural networks (ANNs) trained by Monte Carlo simulation generated databases that were capable of efficiently determining FD reflectance at multiple frequencies. The ANNs could work in conjunction with a least-square optimization algorithm to rapidly (within 1 second), accurately (within 10%) quantify the sample optical properties from FD reflectance measured at SDS of 1 mm. In addition, we demonstrated that incorporating the steady-state apparatus into the FD DRS system with 1 mm SDS would enable obtaining broadband absorption and reduced scattering spectra of turbid samples in the wavelength range from 650 to 1000 nm. PMID:27446671
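
    The inverse step described above can be sketched as fitting the sample's optical properties by least squares against a fast forward model, with a small neural network trained on synthetic data standing in for the Monte Carlo database. Everything below (the analytic stand-in forward model, noise level, property ranges, and network size) is an illustrative assumption, not the authors' actual model.

      import numpy as np
      from sklearn.neural_network import MLPRegressor
      from scipy.optimize import least_squares

      rng = np.random.default_rng(1)
      freqs = np.array([50.0, 100.0, 200.0, 400.0])        # modulation frequencies, MHz

      def toy_forward(mua, musp, f):
          """Analytic stand-in for Monte Carlo generated FD reflectance (amplitude only)."""
          return np.exp(-3.0 * mua - 0.2 * musp) / (1.0 + (f / 500.0) * (mua + 0.05 * musp))

      # Training set over plausible optical-property ranges.
      mua = rng.uniform(0.005, 0.05, 2000)                 # absorption, 1/mm
      musp = rng.uniform(0.5, 2.5, 2000)                   # reduced scattering, 1/mm
      X = np.column_stack([mua, musp])
      Y = np.column_stack([toy_forward(mua, musp, f) for f in freqs])

      ann = MLPRegressor(hidden_layer_sizes=(32, 32), max_iter=3000, random_state=0).fit(X, Y)

      def invert(measured):
          """Recover (mua, musp) by least squares against the ANN forward model."""
          resid = lambda p: ann.predict(p.reshape(1, -1))[0] - measured
          fit = least_squares(resid, x0=[0.02, 1.5], bounds=([0.005, 0.5], [0.05, 2.5]))
          return fit.x

      truth = (0.03, 1.2)
      measured = toy_forward(*truth, freqs) * (1 + rng.normal(0, 0.01, freqs.size))
      print(invert(measured))   # should land close to (0.03, 1.2)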

  16. BioCarian: search engine for exploratory searches in heterogeneous biological databases.

    PubMed

    Zaki, Nazar; Tennakoon, Chandana

    2017-10-02

    There are a large number of biological databases publicly available to scientists on the web. Also, there are many private databases generated in the course of research projects. These databases are in a wide variety of formats. Web standards have evolved in recent times and semantic web technologies are now available to interconnect diverse and heterogeneous sources of data. Therefore, integration and querying of biological databases can be facilitated by techniques used in the semantic web. Heterogeneous databases can be converted into Resource Description Framework (RDF) and queried using the SPARQL language. Searching for exact queries in these databases is trivial. However, exploratory searches need customized solutions, especially when multiple databases are involved. This process is cumbersome and time consuming for those without a sufficient background in computer science. In this context, a search engine facilitating exploratory searches of databases would be of great help to the scientific community. We present BioCarian, an efficient and user-friendly search engine for performing exploratory searches on biological databases. The search engine is an interface for SPARQL queries over RDF databases. We note that many of the databases can be converted to tabular form. We first convert the tabular databases to RDF. The search engine provides a graphical interface based on facets to explore the converted databases. The facet interface is more advanced than conventional facets. It allows complex queries to be constructed and has additional features such as ranking of facet values based on several criteria, visually indicating the relevance of a facet value and presenting the most important facet values when a large number of choices are available. For advanced users, SPARQL queries can be run directly on the databases. Using this feature, users will be able to incorporate federated searches of SPARQL endpoints. We used the search engine to do an exploratory search on previously published viral integration data and were able to deduce the main conclusions of the original publication. BioCarian is accessible via http://www.biocarian.com . We have developed a search engine to explore RDF databases that can be used by both novice and advanced users.
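
    The convert-then-query workflow described above can be illustrated in a few lines with the rdflib package: turn tabular rows into triples, then run a SPARQL query over the resulting graph. The namespace, predicate names, and data below are invented for illustration and have no connection to BioCarian's actual vocabulary.

      from rdflib import Graph, Literal, Namespace

      EX = Namespace("http://example.org/bio/")
      rows = [("gene1", "chr5", 12), ("gene2", "chr5", 0), ("gene3", "chrX", 7)]

      g = Graph()
      for gene, chrom, hits in rows:                 # tabular -> RDF triples
          s = EX[gene]
          g.add((s, EX.chromosome, Literal(chrom)))
          g.add((s, EX.integrationHits, Literal(hits)))

      q = """
      PREFIX ex: <http://example.org/bio/>
      SELECT ?gene ?hits WHERE {
        ?gene ex:chromosome "chr5" ;
              ex:integrationHits ?hits .
        FILTER(?hits > 0)
      } ORDER BY DESC(?hits)
      """
      for gene, hits in g.query(q):
          print(gene, hits)   # -> http://example.org/bio/gene1 12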

  17. DASMiner: discovering and integrating data from DAS sources

    PubMed Central

    2009-01-01

    Background DAS is a widely adopted protocol for providing syntactic interoperability among biological databases. The popularity of DAS is due to a simplified and elegant mechanism for data exchange that consists of sources exposing their RESTful interfaces for data access. As a growing number of DAS services are available for molecular biology resources, there is an incentive to explore this protocol in order to advance data discovery and integration among these resources. Results We developed DASMiner, a Matlab toolkit for querying DAS data sources that enables creation of integrated biological models using the information available in DAS-compliant repositories. DASMiner is composed of a browser application and an API that work together to facilitate gathering of data from different DAS sources, which can be used for creating enriched datasets from multiple sources. The browser is used to formulate queries and navigate data contained in DAS sources. Users can execute queries against these sources in an intuitive fashion, without needing to know the specific DAS syntax for the particular source. Using the source's metadata provided by the DAS Registry, the browser's layout adapts to expose only the set of commands and coordinate systems supported by the specific source. For this reason, the browser can interrogate any DAS source, independently of the type of data being served. The API component of DASMiner may be used for programmatic access of DAS sources by programs in Matlab. Once the desired data is found during navigation, the query is exported in the format of an API call to be used within any Matlab application. We illustrate the use of DASMiner by creating integrative models of histone modification maps and protein-protein interaction networks. These enriched datasets were built by retrieving and integrating distributed genomic and proteomic DAS sources using the API. Conclusion The support of the DAS protocol allows hundreds of molecular biology databases to be treated as a federated, online collection of resources. DASMiner enables full exploration of these resources, and can be used to deploy applications and create integrated views of biological systems using the information deposited in DAS repositories. PMID:19919683

  18. Linked Patient-Reported Outcomes Data From Patients With Multiple Sclerosis Recruited on an Open Internet Platform to Health Care Claims Databases Identifies a Representative Population for Real-Life Data Analysis in Multiple Sclerosis.

    PubMed

    Risson, Valery; Ghodge, Bhaskar; Bonzani, Ian C; Korn, Jonathan R; Medin, Jennie; Saraykar, Tanmay; Sengupta, Souvik; Saini, Deepanshu; Olson, Melvin

    2016-09-22

    An enormous amount of information relevant to public health is being generated directly by online communities. To explore the feasibility of creating a dataset that links patient-reported outcomes data, from a Web-based survey of US patients with multiple sclerosis (MS) recruited on open Internet platforms, to health care utilization information from health care claims databases. The dataset was generated by linkage analysis to a broader MS population in the United States using both pharmacy and medical claims data sources. US Facebook users with an interest in MS were alerted to a patient-reported survey by targeted advertisements. Eligibility criteria were diagnosis of MS by a specialist (primary progressive, relapsing-remitting, or secondary progressive), ≥12-month history of disease, age 18-65 years, and commercial health insurance. Participants completed a questionnaire including data on demographic and disease characteristics, current and earlier therapies, relapses, disability, health-related quality of life, and employment status and productivity. A unique anonymous profile was generated for each survey respondent. Each anonymous profile was linked to a number of medical and pharmacy claims datasets in the United States. Linkage rates were assessed and survey respondents' representativeness was evaluated based on differences in the distribution of characteristics between the linked survey population and the general MS population in the claims databases. The advertisement was placed on 1,063,973 Facebook users' pages generating 68,674 clicks, 3719 survey attempts, and 651 successfully completed surveys, of which 440 could be linked to any of the claims databases for 2014 or 2015 (67.6% linkage rate). Overall, no significant differences were found between patients who were linked and not linked for educational status, ethnicity, current or prior disease-modifying therapy (DMT) treatment, or presence of a relapse in the last 12 months. The frequencies of the most common MS symptoms did not differ significantly between linked patients and the general MS population in the databases. Linked patients were slightly younger and less likely to be men than those who were not linkable. Linking patient-reported outcomes data, from a Web-based survey of US patients with MS recruited on open Internet platforms, to health care utilization information from claims databases may enable rapid generation of a large population of representative patients with MS suitable for outcomes analysis.

  19. Applications of Database Machines in Library Systems.

    ERIC Educational Resources Information Center

    Salmon, Stephen R.

    1984-01-01

    Characteristics and advantages of database machines are summarized and their applications to library functions are described. The ability to attach multiple hosts to the same database and flexibility in choosing operating and database management systems for different functions without loss of access to common database are noted. (EJS)

  20. Orthographic and Phonological Neighborhood Databases across Multiple Languages.

    PubMed

    Marian, Viorica

    2017-01-01

    The increased globalization of science and technology and the growing number of bilinguals and multilinguals in the world have made research with multiple languages a mainstay for scholars who study human function and especially those who focus on language, cognition, and the brain. Such research can benefit from large-scale databases and online resources that describe and measure lexical, phonological, orthographic, and semantic information. The present paper discusses currently-available resources and underscores the need for tools that enable measurements both within and across multiple languages. A general review of language databases is followed by a targeted introduction to databases of orthographic and phonological neighborhoods. A specific focus on CLEARPOND illustrates how databases can be used to assess and compare neighborhood information across languages, to develop research materials, and to provide insight into broad questions about language. As an example of how using large-scale databases can answer questions about language, a closer look at neighborhood effects on lexical access reveals that not only orthographic, but also phonological neighborhoods can influence visual lexical access both within and across languages. We conclude that capitalizing upon large-scale linguistic databases can advance, refine, and accelerate scientific discoveries about the human linguistic capacity.
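
    As a concrete example of the neighborhood measures such databases provide, the sketch below computes Coltheart-style orthographic neighbors (same length, exactly one differing letter) over a small word list; the word list is illustrative, whereas the real databases operate over full lexicons with frequency weighting.

      def one_substitution_apart(a, b):
          """True if words have equal length and differ in exactly one position."""
          return len(a) == len(b) and sum(x != y for x, y in zip(a, b)) == 1

      def orthographic_neighbors(word, lexicon):
          return [w for w in lexicon if one_substitution_apart(word, w)]

      lexicon = ["cat", "cot", "cap", "can", "bat", "dog", "cart"]
      print(orthographic_neighbors("cat", lexicon))   # -> ['cot', 'cap', 'can', 'bat']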

  1. Performance of an open-source heart sound segmentation algorithm on eight independent databases.

    PubMed

    Liu, Chengyu; Springer, David; Clifford, Gari D

    2017-08-01

    Heart sound segmentation is a prerequisite step for the automatic analysis of heart sound signals, facilitating the subsequent identification and classification of pathological events. Recently, hidden Markov model-based algorithms have received increased interest due to their robustness in processing noisy recordings. In this study we aim to evaluate the performance of the recently published logistic regression-based hidden semi-Markov model (HSMM) heart sound segmentation method, by using a wider variety of independently acquired data of varying quality. Firstly, we constructed a systematic evaluation scheme based on a new collection of heart sound databases, which we assembled for the PhysioNet/CinC Challenge 2016. This collection includes a total of more than 120 000 s of heart sounds recorded from 1297 subjects (including both healthy subjects and cardiovascular patients) and comprises eight independent heart sound databases sourced from multiple independent research groups around the world. Then, the HSMM-based segmentation method was evaluated using the assembled eight databases. The common evaluation metrics of sensitivity, specificity, accuracy, as well as the F1 measure were used. In addition, the effect of varying the tolerance window for determining a correct segmentation was evaluated. The results confirm the high accuracy of the HSMM-based algorithm on a separate test dataset comprising 102 306 heart sounds. Average F1 scores of 98.5% for segmenting S1 and systole intervals and 97.2% for segmenting S2 and diastole intervals were observed. The F1 score was shown to increase with an increase in the tolerance window size, as expected. The high segmentation accuracy of the HSMM-based algorithm on a large database confirmed the algorithm's effectiveness. The described evaluation framework, combined with the largest collection of open access heart sound data, provides essential resources for evaluators who need to test their algorithms with realistic data and share reproducible results.
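
    The tolerance-window evaluation mentioned above counts a detected boundary as a true positive when it falls within the window of a reference boundary, from which sensitivity, positive predictivity, and F1 follow. The sketch below illustrates that bookkeeping only; it is not the challenge's official scoring code, and the times and window size are invented.

      def score_boundaries(reference, detected, tolerance=0.1):
          """Sensitivity, positive predictivity and F1 for detected boundary times (s)."""
          unmatched = list(detected)
          tp = 0
          for r in reference:
              hit = next((d for d in unmatched if abs(d - r) <= tolerance), None)
              if hit is not None:
                  tp += 1
                  unmatched.remove(hit)
          fn = len(reference) - tp
          fp = len(unmatched)
          se = tp / (tp + fn) if reference else 0.0
          ppv = tp / (tp + fp) if detected else 0.0
          f1 = 2 * se * ppv / (se + ppv) if (se + ppv) else 0.0
          return se, ppv, f1

      ref = [0.10, 0.52, 0.94, 1.38]          # reference S1 onsets, seconds
      det = [0.12, 0.55, 1.08, 1.39, 1.80]    # detected onsets
      print(score_boundaries(ref, det))        # -> (0.75, 0.6, ~0.667)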

  2. Classification of Antibiotic Resistance Patterns of Indicator Bacteria by Discriminant Analysis: Use in Predicting the Source of Fecal Contamination in Subtropical Waters

    PubMed Central

    Harwood, Valerie J.; Whitlock, John; Withington, Victoria

    2000-01-01

    The antibiotic resistance patterns of fecal streptococci and fecal coliforms isolated from domestic wastewater and animal feces were determined using a battery of antibiotics (amoxicillin, ampicillin, cephalothin, chlortetracycline, oxytetracycline, tetracycline, erythromycin, streptomycin, and vancomycin) at four concentrations each. The sources of animal feces included wild birds, cattle, chickens, dogs, pigs, and raccoons. Antibiotic resistance patterns of fecal streptococci and fecal coliforms from known sources were grouped into two separate databases, and discriminant analysis of these patterns was used to establish the relationship between the antibiotic resistance patterns and the bacterial source. The fecal streptococcus and fecal coliform databases classified isolates from known sources with similar accuracies. The average rate of correct classification for the fecal streptococcus database was 62.3%, and that for the fecal coliform database was 63.9%. The sources of fecal streptococci and fecal coliforms isolated from surface waters were identified by discriminant analysis of their antibiotic resistance patterns. Both databases identified the source of indicator bacteria isolated from surface waters directly impacted by septic tank discharges as human. At sample sites selected for relatively low anthropogenic impact, the dominant sources of indicator bacteria were identified as various animals. The antibiotic resistance analysis technique promises to be a useful tool in assessing sources of fecal contamination in subtropical waters, such as those in Florida. PMID:10966379
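
    The discriminant-analysis step described above can be reproduced in outline with scikit-learn: encode each isolate's resistance pattern as a feature vector, fit a linear discriminant classifier on isolates of known source, and report the average rate of correct classification by cross-validation. The data below are synthetic stand-ins, not the study's isolates, and the feature layout (9 antibiotics at 4 concentrations, binary growth) is only one plausible encoding.

      import numpy as np
      from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
      from sklearn.model_selection import cross_val_score

      rng = np.random.default_rng(2)
      n_per_source, n_features = 60, 36          # 9 antibiotics x 4 concentrations

      # Synthetic resistance patterns: each source gets its own characteristic probabilities.
      sources = ["human", "cattle", "poultry"]
      X = np.vstack([rng.binomial(1, rng.uniform(0.2, 0.8, n_features), (n_per_source, n_features))
                     for _ in sources])
      y = np.repeat(sources, n_per_source)

      lda = LinearDiscriminantAnalysis()
      scores = cross_val_score(lda, X, y, cv=5)
      print("average rate of correct classification:", scores.mean())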

  3. Development Of New Databases For Tsunami Hazard Analysis In California

    NASA Astrophysics Data System (ADS)

    Wilson, R. I.; Barberopoulou, A.; Borrero, J. C.; Bryant, W. A.; Dengler, L. A.; Goltz, J. D.; Legg, M.; McGuire, T.; Miller, K. M.; Real, C. R.; Synolakis, C.; Uslu, B.

    2009-12-01

    The California Geological Survey (CGS) has partnered with other tsunami specialists to produce two statewide databases to facilitate the evaluation of tsunami hazard products for both emergency response and land-use planning and development. A robust, State-run tsunami deposit database is being developed that complements and expands on existing databases from the National Geophysical Data Center (global) and the USGS (Cascadia). Whereas these existing databases focus on references or individual tsunami layers, the new State-maintained database concentrates on the location and contents of individual borings/trenches that sample tsunami deposits. These data provide an important observational benchmark for evaluating the results of tsunami inundation modeling. CGS is collaborating with and sharing the database entry form with other states to encourage its continued development beyond California’s coastline so that historic tsunami deposits can be evaluated on a regional basis. CGS is also developing an internet-based tsunami source scenario database and forum where tsunami source experts and hydrodynamic modelers can discuss the validity of tsunami sources and their contribution to hazard assessments for California and other coastal areas bordering the Pacific Ocean. The database includes all distant and local tsunami sources relevant to California starting with the forty scenarios evaluated during the creation of the recently completed statewide series of tsunami inundation maps for emergency response planning. Factors germane to probabilistic tsunami hazard analyses (PTHA), such as event histories and recurrence intervals, are also addressed in the database and discussed in the forum. Discussions with other tsunami source experts will help CGS determine what additional scenarios should be considered in PTHA for assessing the feasibility of generating products of value to local land-use planning and development.

  4. WormQTLHD—a web database for linking human disease to natural variation data in C. elegans

    PubMed Central

    van der Velde, K. Joeri; de Haan, Mark; Zych, Konrad; Arends, Danny; Snoek, L. Basten; Kammenga, Jan E.; Jansen, Ritsert C.; Swertz, Morris A.; Li, Yang

    2014-01-01

    Interactions between proteins are highly conserved across species. As a result, the molecular basis of multiple diseases affecting humans can be studied in model organisms that offer many alternative experimental opportunities. One such organism—Caenorhabditis elegans—has been used to produce much molecular quantitative genetics and systems biology data over the past decade. We present WormQTLHD (Human Disease), a database that quantitatively and systematically links expression Quantitative Trait Loci (eQTL) findings in C. elegans to gene–disease associations in man. WormQTLHD, available online at http://www.wormqtl-hd.org, is a user-friendly set of tools to reveal functionally coherent, evolutionary conserved gene networks. These can be used to predict novel gene-to-gene associations and the functions of genes underlying the disease of interest. We created a new database that links C. elegans eQTL data sets to human diseases (34 337 gene–disease associations from OMIM, DGA, GWAS Central and NHGRI GWAS Catalogue) based on overlapping sets of orthologous genes associated to phenotypes in these two species. We utilized QTL results, high-throughput molecular phenotypes, classical phenotypes and genotype data covering different developmental stages and environments from WormQTL database. All software is available as open source, built on MOLGENIS and xQTL workbench. PMID:24217915

  5. WormQTLHD--a web database for linking human disease to natural variation data in C. elegans.

    PubMed

    van der Velde, K Joeri; de Haan, Mark; Zych, Konrad; Arends, Danny; Snoek, L Basten; Kammenga, Jan E; Jansen, Ritsert C; Swertz, Morris A; Li, Yang

    2014-01-01

    Interactions between proteins are highly conserved across species. As a result, the molecular basis of multiple diseases affecting humans can be studied in model organisms that offer many alternative experimental opportunities. One such organism-Caenorhabditis elegans-has been used to produce much molecular quantitative genetics and systems biology data over the past decade. We present WormQTL(HD) (Human Disease), a database that quantitatively and systematically links expression Quantitative Trait Loci (eQTL) findings in C. elegans to gene-disease associations in man. WormQTL(HD), available online at http://www.wormqtl-hd.org, is a user-friendly set of tools to reveal functionally coherent, evolutionary conserved gene networks. These can be used to predict novel gene-to-gene associations and the functions of genes underlying the disease of interest. We created a new database that links C. elegans eQTL data sets to human diseases (34 337 gene-disease associations from OMIM, DGA, GWAS Central and NHGRI GWAS Catalogue) based on overlapping sets of orthologous genes associated to phenotypes in these two species. We utilized QTL results, high-throughput molecular phenotypes, classical phenotypes and genotype data covering different developmental stages and environments from WormQTL database. All software is available as open source, built on MOLGENIS and xQTL workbench.
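
    One simple way to score the gene-set overlaps that underpin this kind of cross-species linkage (offered here as an illustration, not the authors' exact statistic) is a hypergeometric test on the number of shared orthologous genes, as sketched below with invented set sizes.

      from scipy.stats import hypergeom

      # Universe of orthologous genes shared between C. elegans and human (invented size).
      universe = 5000
      eqtl_genes = 120        # genes under a worm eQTL hotspot
      disease_genes = 80      # human genes associated with the disease of interest
      overlap = 9             # orthologous genes appearing in both sets

      # P(observing >= overlap shared genes by chance)
      p_value = hypergeom.sf(overlap - 1, universe, disease_genes, eqtl_genes)
      print(f"overlap = {overlap}, hypergeometric p = {p_value:.3g}")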

  6. A Prototype System for Retrieval of Gene Functional Information

    PubMed Central

    Folk, Lillian C.; Patrick, Timothy B.; Pattison, James S.; Wolfinger, Russell D.; Mitchell, Joyce A.

    2003-01-01

    Microarrays allow researchers to gather data about the expression patterns of thousands of genes simultaneously. Statistical analysis can reveal which genes show statistically significant results. Making biological sense of those results requires the retrieval of functional information about the genes thus identified, typically a manual gene-by-gene retrieval of information from various on-line databases. For experiments generating thousands of genes of interest, retrieval of functional information can become a significant bottleneck. To address this issue, we are currently developing a prototype system to automate the process of retrieval of functional information from multiple on-line sources. PMID:14728346

  7. Setting Up the JBrowse Genome Browser

    PubMed Central

    Skinner, Mitchell E; Holmes, Ian H

    2010-01-01

    JBrowse is a web-based tool for visualizing genomic data. Unlike most other web-based genome browsers, JBrowse exploits the capabilities of the user's web browser to make scrolling and zooming fast and smooth. It supports the browsers used by almost all internet users, and is relatively simple to install. JBrowse can utilize multiple types of data in a variety of common genomic data formats, including genomic feature data in bioperl databases, GFF files, and BED files, and quantitative data in wiggle files. This unit describes how to obtain the JBrowse software, set it up on a Linux or Mac OS X computer running as a web server and incorporate genome annotation data from multiple sources into JBrowse. After completing the protocols described in this unit, the reader will have a web site that other users can visit to browse the genomic data. PMID:21154710

  8. Canada in 3D - Toward a Sustainable 3D Model for Canadian Geology from Diverse Data Sources

    NASA Astrophysics Data System (ADS)

    Brodaric, B.; Pilkington, M.; Snyder, D. B.; St-Onge, M. R.; Russell, H.

    2015-12-01

    Many big science issues span large areas and require data from multiple heterogeneous sources, for example climate change, resource management, and hazard mitigation. Solutions to these issues can significantly benefit from access to a consistent and integrated geological model that would serve as a framework. However, such a model is absent for most large countries including Canada, due to the size of the landmass and the fragmentation of the source data into institutional and disciplinary silos. To overcome these barriers, the "Canada in 3D" (C3D) pilot project was recently launched by the Geological Survey of Canada. C3D is designed to be evergreen, multi-resolution, and inter-disciplinary: (a) it is to be updated regularly upon acquisition of new data; (b) portions vary in resolution and will initially consist of four layers (surficial, sedimentary, crystalline, and mantle) with intermediary patches of higher-resolution fill; and (c) a variety of independently managed data sources are providing inputs, such as geophysical, 3D and 2D geological models, drill logs, and others. Notably, scalability concerns dictate a decentralized and interoperable approach, such that only key control objects, denoting anchors for the modeling process, are imported into the C3D database while retaining provenance links to original sources. The resultant model is managed in the database, contains full modeling provenance as well as links to detailed information on rock units, and is to be visualized in desktop and online environments. It is anticipated that C3D will become the authoritative state of knowledge for the geology of Canada at a national scale.

  9. The Longhorn Array Database (LAD): An Open-Source, MIAME compliant implementation of the Stanford Microarray Database (SMD)

    PubMed Central

    Killion, Patrick J; Sherlock, Gavin; Iyer, Vishwanath R

    2003-01-01

    Background The power of microarray analysis can be realized only if data is systematically archived and linked to biological annotations as well as analysis algorithms. Description The Longhorn Array Database (LAD) is a MIAME compliant microarray database that operates on PostgreSQL and Linux. It is a fully open source version of the Stanford Microarray Database (SMD), one of the largest microarray databases. LAD is available online. Conclusions Our development of LAD provides a simple, free, open, reliable and proven solution for storage and analysis of two-color microarray data. PMID:12930545

  10. Oak Ridge Reservation Environmental Protection Rad Neshaps Radionuclide Inventory Web Database and Rad Neshaps Source and Dose Database

    DOE PAGES

    Scofield, Patricia A.; Smith, Linda Lenell; Johnson, David N.

    2017-07-01

    The U.S. Environmental Protection Agency promulgated national emission standards for emissions of radionuclides other than radon from US Department of Energy facilities in Chapter 40 of the Code of Federal Regulations (CFR) 61, Subpart H. This regulatory standard limits the annual effective dose that any member of the public can receive from Department of Energy facilities to 0.1 mSv. As defined in the preamble of the final rule, all of the facilities on the Oak Ridge Reservation, i.e., the Y–12 National Security Complex, Oak Ridge National Laboratory, East Tennessee Technology Park, and any other U.S. Department of Energy operations on Oak Ridge Reservation, combined, must meet the annual dose limit of 0.1 mSv. At Oak Ridge National Laboratory, there are monitored sources and numerous unmonitored sources. To maintain radiological source and inventory information for these unmonitored sources, e.g., laboratory hoods, equipment exhausts, and room exhausts not currently venting to monitored stacks on the Oak Ridge National Laboratory campus, the Environmental Protection Rad NESHAPs Inventory Web Database was developed. This database is updated annually and is used to compile emissions data for the annual Radionuclide National Emission Standards for Hazardous Air Pollutants (Rad NESHAPs) report required by 40 CFR 61.94. It also provides supporting documentation for facility compliance audits. In addition, a Rad NESHAPs source and dose database was developed to import the source and dose summary data from Clean Air Act Assessment Package—1988 computer model files. As a result, this database provides Oak Ridge Reservation and facility-specific source inventory; doses associated with each source and facility; and total doses for the Oak Ridge Reservation dose.

  11. Oak Ridge Reservation Environmental Protection Rad Neshaps Radionuclide Inventory Web Database and Rad Neshaps Source and Dose Database.

    PubMed

    Scofield, Patricia A; Smith, Linda L; Johnson, David N

    2017-07-01

    The U.S. Environmental Protection Agency promulgated national emission standards for emissions of radionuclides other than radon from US Department of Energy facilities in Chapter 40 of the Code of Federal Regulations (CFR) 61, Subpart H. This regulatory standard limits the annual effective dose that any member of the public can receive from Department of Energy facilities to 0.1 mSv. As defined in the preamble of the final rule, all of the facilities on the Oak Ridge Reservation, i.e., the Y-12 National Security Complex, Oak Ridge National Laboratory, East Tennessee Technology Park, and any other U.S. Department of Energy operations on Oak Ridge Reservation, combined, must meet the annual dose limit of 0.1 mSv. At Oak Ridge National Laboratory, there are monitored sources and numerous unmonitored sources. To maintain radiological source and inventory information for these unmonitored sources, e.g., laboratory hoods, equipment exhausts, and room exhausts not currently venting to monitored stacks on the Oak Ridge National Laboratory campus, the Environmental Protection Rad NESHAPs Inventory Web Database was developed. This database is updated annually and is used to compile emissions data for the annual Radionuclide National Emission Standards for Hazardous Air Pollutants (Rad NESHAPs) report required by 40 CFR 61.94. It also provides supporting documentation for facility compliance audits. In addition, a Rad NESHAPs source and dose database was developed to import the source and dose summary data from Clean Air Act Assessment Package-1988 computer model files. This database provides the Oak Ridge Reservation and facility-specific source inventory; doses associated with each source and facility; and the total dose for the Oak Ridge Reservation.

  12. Oak Ridge Reservation Environmental Protection Rad Neshaps Radionuclide Inventory Web Database and Rad Neshaps Source and Dose Database

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Scofield, Patricia A.; Smith, Linda Lenell; Johnson, David N.

    The U.S. Environmental Protection Agency promulgated national emission standards for emissions of radionuclides other than radon from US Department of Energy facilities in Chapter 40 of the Code of Federal Regulations (CFR) 61, Subpart H. This regulatory standard limits the annual effective dose that any member of the public can receive from Department of Energy facilities to 0.1 mSv. As defined in the preamble of the final rule, all of the facilities on the Oak Ridge Reservation, i.e., the Y–12 National Security Complex, Oak Ridge National Laboratory, East Tennessee Technology Park, and any other U.S. Department of Energy operations on Oak Ridge Reservation, combined, must meet the annual dose limit of 0.1 mSv. At Oak Ridge National Laboratory, there are monitored sources and numerous unmonitored sources. To maintain radiological source and inventory information for these unmonitored sources, e.g., laboratory hoods, equipment exhausts, and room exhausts not currently venting to monitored stacks on the Oak Ridge National Laboratory campus, the Environmental Protection Rad NESHAPs Inventory Web Database was developed. This database is updated annually and is used to compile emissions data for the annual Radionuclide National Emission Standards for Hazardous Air Pollutants (Rad NESHAPs) report required by 40 CFR 61.94. It also provides supporting documentation for facility compliance audits. In addition, a Rad NESHAPs source and dose database was developed to import the source and dose summary data from Clean Air Act Assessment Package—1988 computer model files. As a result, this database provides the Oak Ridge Reservation and facility-specific source inventory; doses associated with each source and facility; and the total dose for the Oak Ridge Reservation.

  13. A database de-identification framework to enable direct queries on medical data for secondary use.

    PubMed

    Erdal, B S; Liu, J; Ding, J; Chen, J; Marsh, C B; Kamal, J; Clymer, B D

    2012-01-01

    To qualify the use of patient clinical records as non-human-subjects research, electronic medical record data must be de-identified so that the risk of exposing protected health information is minimal. This study demonstrated a robust framework for structured data de-identification that can be applied to any relational data source requiring de-identification. Using a real-world clinical data warehouse, a pilot implementation covering a limited set of subject areas was used to demonstrate and evaluate the new de-identification process. Query results and performance were compared between the source and target systems to validate data accuracy and usability. The combination of hashing, pseudonyms, and a session-dependent randomizer provides a rigorous de-identification framework that guards against 1) source identifier exposure; 2) internal data analysts manually linking to source identifiers; and 3) identifier cross-linking among different researchers or across multiple query sessions by the same researcher. In addition, a query rejection option refuses queries that return fewer than preset numbers of subjects and total records, preventing users from accidentally identifying subjects in low-volume result sets. The framework does not prevent subject re-identification based on prior knowledge or a sequence of events, and it does not address de-identification of medical free text, although text de-identification using natural language processing can be added owing to its modular design. We demonstrated a framework resulting in HIPAA-compliant databases that can be directly queried by researchers. The technique can be augmented to facilitate inter-institutional research data sharing through existing middleware such as caGrid.
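
    As a rough illustration of the mechanisms named above (a keyed hash for stable pseudonyms, a session-dependent randomizer, and a minimum-result-size rejection rule), a minimal sketch in Python might look like the following. The key, thresholds, and field names are hypothetical; this is not the authors' implementation.

    ```python
    import hashlib
    import hmac
    import secrets

    SITE_KEY = b"replace-with-protected-site-key"  # hypothetical secret held by the de-identification service

    def pseudonym(source_id: str) -> str:
        """Stable pseudonym: a keyed, one-way hash of the source identifier."""
        return hmac.new(SITE_KEY, source_id.encode(), hashlib.sha256).hexdigest()

    def session_token(source_id: str, session_salt: bytes) -> str:
        """Session-dependent identifier: the same patient maps to different tokens in
        different query sessions, which blocks cross-session and cross-researcher linkage."""
        return hmac.new(session_salt, pseudonym(source_id).encode(), hashlib.sha256).hexdigest()

    def release(rows, session_salt, min_subjects=20, min_records=100):
        """Refuse result sets that fall below preset subject/record thresholds."""
        if len({r["patient_id"] for r in rows}) < min_subjects or len(rows) < min_records:
            raise ValueError("Query rejected: result set below release thresholds")
        return [{**r, "patient_id": session_token(r["patient_id"], session_salt)} for r in rows]

    session_salt = secrets.token_bytes(32)  # fresh salt per analyst session
    ```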

  14. Data Mining to Chart the Arctic: Analysis of Approaches to Incorporate Outside Source Data into NOAA Office of Coast Survey Workflow

    NASA Astrophysics Data System (ADS)

    Rennoll, V.

    2016-02-01

    The National Centers for Environmental Information provide public access to a wealth of seafloor mapping data, both from National Ocean Service hydrographic surveys and outside source collections. Utilizing the outside source data to improve nautical charts created by the National Oceanic and Atmospheric Administration (NOAA) is an appealing alternative to traditional surveys, particularly in areas with significant data gaps where hydrographic surveys are not planned. However, much of the outside data are collected in transit lines and lack traditional overlapping main scheme lines and crosslines. Spanning multiple years and vessels, these transit line data collections were obtained using disparate operating procedures and are of inconsistent quality. Here, a workflow was developed to ingest these variable depth data within a defined region by assessing their quality and utility for nautical charting. The workflow was evaluated with a navigationally significant area in the Bering Sea, where bathymetric data collected from ten vessels over a period of twelve years were available. The outside data were shown to be of sufficient quality through comparisons with existing NOAA surveys and then used to demonstrate where the data could provide new or updated information on nautical charts, and provide reconnaissance for future hydrographic planning. The utility assessment of the data, however, was hindered by the lack of a verified survey-scale sounding database against which the outside source data could be compared. With the workflow developed, it is recommended that further outside data be ingested by NOAA's Office of Coast Survey and that a database of full-scale chart soundings be developed for outside data comparisons.
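
    A minimal sketch of the kind of quality check such a workflow relies on, comparing outside-source soundings against co-located reference survey depths, is given below. The tolerance, the acceptance percentage, and the function names are illustrative assumptions, not NOAA acceptance criteria.

    ```python
    import numpy as np

    def assess_transit_line(outside_depths_m, reference_depths_m, tolerance_m=1.0):
        """Compare co-located outside-source and reference depths and summarize agreement."""
        diff = np.asarray(outside_depths_m, dtype=float) - np.asarray(reference_depths_m, dtype=float)
        summary = {
            "mean_bias_m": float(diff.mean()),
            "rms_m": float(np.sqrt((diff ** 2).mean())),
            "pct_within_tol": float((np.abs(diff) <= tolerance_m).mean() * 100.0),
        }
        summary["usable_for_charting"] = summary["pct_within_tol"] >= 95.0  # illustrative threshold
        return summary
    ```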

  15. CDM analysis

    NASA Technical Reports Server (NTRS)

    Larson, Robert E.; Mcentire, Paul L.; Oreilly, John G.

    1993-01-01

    The C Data Manager (CDM) is an advanced tool for creating an object-oriented database and for processing queries related to objects stored in that database. The CDM source code was purchased and will be modified over the course of the Arachnid project. In this report, the modified CDM is referred to as MCDM. Using MCDM, a detailed series of experiments was designed and conducted on a Sun Sparcstation. The primary results and analysis of the CDM experiment are provided in this report. The experiments involved creating the Long-form Faint Source Catalog (LFSC) database and then analyzing it with respect to the following: (1) the relationships between the volume of data and the time required to create a database; (2) the storage requirements of the database files; and (3) the properties of query algorithms. The effort focused on defining, implementing, and analyzing seven experimental scenarios: (1) find all sources by right ascension--RA; (2) find all sources by declination--DEC; (3) find all sources in the right ascension interval--RA1, RA2; (4) find all sources in the declination interval--DEC1, DEC2; (5) find all sources in the rectangle defined by--RA1, RA2, DEC1, DEC2; (6) find all sources that meet certain compound conditions; and (7) analyze a variety of query algorithms. Throughout this document, the numerical results obtained from these scenarios are reported; conclusions are presented at the end of the document.
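
    Against a generic in-memory source catalogue (a list of records with hypothetical ra/dec keys, not the MCDM API), the query scenarios reduce to simple coordinate filters, sketched below for scenarios 1, 3, 5, and 6.

    ```python
    def sources_by_ra(catalog, ra):                                   # scenario 1
        return [s for s in catalog if s["ra"] == ra]

    def sources_in_ra_interval(catalog, ra1, ra2):                    # scenario 3
        return [s for s in catalog if ra1 <= s["ra"] <= ra2]

    def sources_in_rectangle(catalog, ra1, ra2, dec1, dec2):          # scenario 5
        return [s for s in catalog
                if ra1 <= s["ra"] <= ra2 and dec1 <= s["dec"] <= dec2]

    def sources_matching(catalog, predicate):                         # scenario 6 (compound conditions)
        return [s for s in catalog if predicate(s)]

    catalog = [{"id": 1, "ra": 10.2, "dec": -5.1}, {"id": 2, "ra": 182.4, "dec": 33.0}]
    print(sources_in_rectangle(catalog, 0.0, 20.0, -10.0, 0.0))
    ```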

  16. A Web-based open-source database for the distribution of hyperspectral signatures

    NASA Astrophysics Data System (ADS)

    Ferwerda, J. G.; Jones, S. D.; Du, Pei-Jun

    2006-10-01

    With the coming of age of field spectroscopy as a non-destructive means to collect information on the physiology of vegetation, there is a need for storage of signatures and, more importantly, their metadata. Without proper organisation of the metadata, the signatures themselves are of limited use. To facilitate redistribution of data, a database for the storage and distribution of hyperspectral signatures and their metadata was designed. The database was built using open-source software and can be used by the hyperspectral community to share data. Data are uploaded through a simple web-based interface. The database recognizes the major file formats produced by ASD, GER and International Spectronics instruments. The database source code is available for download through the hyperspectral.info web domain, and we invite suggestions for additions and modifications to the database, to be submitted through the online forums on the same website.

  17. Preliminary geologic map of the Oat Mountain 7.5' quadrangle, Southern California: a digital database

    USGS Publications Warehouse

    Yerkes, R.F.; Campbell, Russell H.

    1995-01-01

    This database, identified as "Preliminary Geologic Map of the Oat Mountain 7.5' Quadrangle, southern California: A Digital Database," has been approved for release and publication by the Director of the USGS. Although this database has been reviewed and is substantially complete, the USGS reserves the right to revise the data pursuant to further analysis and review. This database is released on condition that neither the USGS nor the U. S. Government may be held liable for any damages resulting from its use. This digital map database is compiled from previously published sources combined with some new mapping and modifications in nomenclature. The geologic map database delineates map units that are identified by general age and lithology following the stratigraphic nomenclature of the U. S. Geological Survey. For detailed descriptions of the units, their stratigraphic relations and sources of geologic mapping consult Yerkes and Campbell (1993). More specific information about the units may be available in the original sources.

  18. The plant phenological online database (PPODB): an online database for long-term phenological data

    NASA Astrophysics Data System (ADS)

    Dierenbach, Jonas; Badeck, Franz-W.; Schaber, Jörg

    2013-09-01

    We present an online database that provides unrestricted and free access to over 16 million plant phenological observations from over 8,000 stations in Central Europe between the years 1880 and 2009. Unique features are (1) a flexible and unrestricted access to a full-fledged database, allowing for a wide range of individual queries and data retrieval, (2) historical data for Germany before 1951 ranging back to 1880, and (3) more than 480 curated long-term time series covering more than 100 years for individual phenological phases and plants combined over Natural Regions in Germany. Time series for single stations or Natural Regions can be accessed through a user-friendly graphical geo-referenced interface. The joint databases made available with the plant phenological database PPODB render accessible an important data source for further analyses of long-term changes in phenology. The database can be accessed via www.ppodb.de .

  19. Database Quality and Access Issues Relevant to Research Using Anesthesia Information Management System Data.

    PubMed

    Epstein, Richard H; Dexter, Franklin

    2018-07-01

    For this special article, we reviewed the computer code used to extract the data and the text of all 47 studies published between January 2006 and August 2017 using anesthesia information management system (AIMS) data from Thomas Jefferson University Hospital (TJUH). Data from this institution were used in the largest number (P = .0007) of papers describing the use of AIMS published in this time frame. The AIMS was replaced in April 2017, making this a finite sample. The objective of the current article was to identify factors that made TJUH successful in publishing anesthesia informatics studies. We examined the structured query language used for each study to examine the extent to which databases outside of the AIMS were used. We examined data quality from the perspectives of completeness, correctness, concordance, plausibility, and currency. Our results were that most studies could not have been completed without external database sources (36/47, 76.6%; P = .0003 compared with 50%). The operating room management system was linked to the AIMS and was used significantly more frequently (26/36, 72%) than other external sources. Access to these external data sources was provided, allowing exploration of data quality. The TJUH AIMS used high-resolution timestamps (to the nearest 3 milliseconds) and created audit tables to track changes to clinical documentation. Automatically recorded data were captured at 1-minute intervals and were not editable; data cleaning occurred during analysis. Few paired events with an expected order were out of sequence. Although most data elements were of high quality, there were notable exceptions, such as frequent missing values for estimated blood loss, height, and weight. Some values were duplicated with different units, and others were stored in varying locations. Our conclusions are that linking the TJUH AIMS to the operating room management system was a critical step in enabling publication of multiple studies using AIMS data. Access to this and other external databases by analysts with a high degree of anesthesia domain knowledge was necessary to be able to assess the quality of the AIMS data and ensure that the data pulled for studies were appropriate. For anesthesia departments seeking to increase their academic productivity using their AIMS as a data source, our experiences may provide helpful guidance.
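
    Two of the quality dimensions discussed, completeness and the plausibility of paired event ordering, are straightforward to script once the tables can be queried. The sketch below uses pandas with hypothetical column names; it is not the TJUH schema.

    ```python
    import pandas as pd

    def completeness(df: pd.DataFrame, columns) -> dict:
        """Fraction of non-missing values per column (completeness dimension)."""
        return {c: 1.0 - df[c].isna().mean() for c in columns}

    def out_of_sequence(df: pd.DataFrame, first_col: str, second_col: str) -> pd.DataFrame:
        """Rows where a paired event with an expected order is reversed, e.g. an
        anesthesia end time recorded before its start time (plausibility check)."""
        both = df.dropna(subset=[first_col, second_col])
        return both[both[second_col] < both[first_col]]

    # Hypothetical usage with invented table and column names:
    # cases = pd.read_sql("SELECT case_id, anes_start, anes_end, ebl, height_cm, weight_kg FROM aims_cases", conn)
    # completeness(cases, ["ebl", "height_cm", "weight_kg"])
    # out_of_sequence(cases, "anes_start", "anes_end")
    ```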

  20. Identifying the effective evidence sources to use in developing Clinical Guidelines for Acute Stroke Management: lived experiences of the search specialist and project manager.

    PubMed

    Parkhill, Anne; Hill, Kelvin

    2009-03-01

    The Australian National Stroke Foundation appointed a search specialist to find the best available evidence for the second edition of its Clinical Guidelines for Acute Stroke Management. To identify the relative effectiveness of differing evidence sources for the guideline update. We searched and reviewed references from five valid evidence sources for clinical and economic questions: (i) electronic databases; (ii) reference lists of relevant systematic reviews, guidelines, and/or primary studies; (iii) table of contents of a number of key journals for the last 6 months; (iv) internet/grey literature; and (v) experts. Reference sources were recorded, quantified, and analysed. In the clinical portion of the guidelines document, there was a greater use of previous knowledge and sources other than electronic databases for evidence, while there was a greater use of electronic databases for the economic section. The results confirmed that searchers need to be aware of the context and range of sources for evidence searches. For best available evidence, searchers cannot rely solely on electronic databases and need to encompass many different media and sources.

  1. MACSIMS : multiple alignment of complete sequences information management system

    PubMed Central

    Thompson, Julie D; Muller, Arnaud; Waterhouse, Andrew; Procter, Jim; Barton, Geoffrey J; Plewniak, Frédéric; Poch, Olivier

    2006-01-01

    Background: In the post-genomic era, systems-level studies are being performed that seek to explain complex biological systems by integrating diverse resources from fields such as genomics, proteomics or transcriptomics. New information management systems are now needed for the collection, validation and analysis of the vast amount of heterogeneous data available. Multiple alignments of complete sequences provide an ideal environment for the integration of this information in the context of the protein family. Results: MACSIMS is a multiple alignment-based information management program that combines the advantages of both knowledge-based and ab initio sequence analysis methods. Structural and functional information is retrieved automatically from the public databases. In the multiple alignment, homologous regions are identified and the retrieved data are evaluated and propagated from known to unknown sequences via these reliable regions. In a large-scale evaluation, the specificity of the propagated sequence features is estimated to be >99%, i.e. very few false positive predictions are made. MACSIMS is then used to characterise mutations in a test set of 100 proteins that are known to be involved in human genetic diseases. The number of sequence features associated with these proteins was increased by 60%, compared to the features available in the public databases. An XML format output file allows automatic parsing of the MACSIMS results, while a graphical display using the JalView program allows manual analysis. Conclusion: MACSIMS is a new information management system that incorporates detailed analyses of protein families at the structural, functional and evolutionary levels. MACSIMS thus provides a unique environment that facilitates knowledge extraction and the presentation of the most pertinent information to the biologist. A web server and the source code are available at . PMID:16792820

  2. Monitoring Wildlife Interactions with Their Environment: An Interdisciplinary Approach

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Charles-Smith, Lauren E.; Domínguez, Ignacio X.; Fornaro, Robert J.

    In a rapidly changing world, wildlife ecologists strive to correctly model and predict complex relationships between animals and their environment, which facilitates management decisions impacting public policy to conserve and protect delicate ecosystems. Recent advances in monitoring systems span scientific domains, including animal and weather monitoring devices and landscape classification mapping techniques. The current challenge is how to combine and use detailed output from various sources to address questions spanning multiple disciplines. The WolfScout wildlife and weather tracking system is a software tool capable of filling this niche. WolfScout automates integration of the latest technological advances in wildlife GPS collars, weather stations, drought conditions, and severe weather reports, and animal demographic information. The WolfScout database stores a variety of classified landscape maps including natural and manmade features. Additionally, WolfScout's spatial database management system allows users to calculate distances between animals' locations and landscape characteristics, which are linked to the best approximation of environmental conditions at the animal's location during the interaction. Through a secure website, data are exported in formats compatible with multiple software programs including R and ArcGIS. The WolfScout design promotes interoperability in data, between researchers, and software applications while standardizing analyses of animal interactions with their environment.
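
    The distance calculation between an animal's GPS fix and mapped landscape features can be illustrated with a plain great-circle computation; the field names and point-feature representation below are assumptions for the sketch, not the WolfScout schema.

    ```python
    from math import asin, cos, radians, sin, sqrt

    def haversine_m(lat1, lon1, lat2, lon2):
        """Great-circle distance in metres between two latitude/longitude points."""
        lat1, lon1, lat2, lon2 = map(radians, (lat1, lon1, lat2, lon2))
        a = sin((lat2 - lat1) / 2) ** 2 + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2
        return 2 * 6371000 * asin(sqrt(a))

    def nearest_feature_m(fix, features):
        """Distance from one GPS fix to the closest landscape feature (e.g. road, water source)."""
        return min(haversine_m(fix["lat"], fix["lon"], f["lat"], f["lon"]) for f in features)

    fix = {"lat": 35.78, "lon": -78.64}
    features = [{"lat": 35.80, "lon": -78.60}, {"lat": 35.70, "lon": -78.70}]
    print(round(nearest_feature_m(fix, features)))  # metres to the nearest feature
    ```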

  3. The Impact of Online Bibliographic Databases on Teaching and Research in Political Science.

    ERIC Educational Resources Information Center

    Reichel, Mary

    The availability of online bibliographic databases greatly facilitates literature searching in political science. The advantages to searching databases online include combination of concepts, comprehensiveness, multiple database searching, free-text searching, currency, current awareness services, document delivery service, and convenience.…

  4. Children's daily well-being: The role of mothers', teachers', and siblings' autonomy support and psychological control.

    PubMed

    van der Kaap-Deeder, Jolene; Vansteenkiste, Maarten; Soenens, Bart; Mabbe, Elien

    2017-02-01

    This study examined the unique relations between multiple sources (i.e., mothers, teachers, and siblings) of perceived daily autonomy support and psychological control and children's basic psychological needs and well-being. During 5 consecutive days, 2 children from 154 families (Mage youngest child = 8.54 years; SD = .89 and Mage oldest child = 10.38 years; SD = .87) provided daily ratings of the study variables. Multilevel analyses showed that each of the sources of perceived autonomy support and psychological control related uniquely to changes in daily well-being and ill-being. These associations were mediated by experienced psychological need satisfaction and frustration, respectively. Overall, the findings testify to the dynamic role of autonomy support and psychological control in children's development. Implications for future research are discussed. (PsycINFO Database Record (c) 2017 APA, all rights reserved).

  5. The TERRA-PNW Dataset: A New Source for Standardized Plant Trait, Forest Carbon Cycling, and Soil Properties Measurements from the Pacific Northwest US, 2000-2014.

    NASA Astrophysics Data System (ADS)

    Berner, L. T.; Law, B. E.

    2015-12-01

    Plant traits include physiological, morphological, and biogeochemical characteristics that in combination determine a species' sensitivity to environmental conditions. Standardized, co-located, and geo-referenced species- and plot-level measurements are needed to address variation in species sensitivity to climate change impacts and for ecosystem process model development, parameterization and testing. We present a new database of plant trait, forest carbon cycling, and soil property measurements derived from multiple TERRA-PNW projects in the Pacific Northwest US, spanning 2000-2014. The database includes measurements from over 200 forest plots across Oregon and northern California, where the data were explicitly collected for scaling and modeling regional terrestrial carbon processes with models such as Biome-BGC and the Community Land Model. Some of the data are co-located at AmeriFlux sites in the region. The database currently contains leaf trait measurements (specific leaf area, leaf longevity, leaf carbon and nitrogen) from over 1,200 branch samples and 30 species, as well as plot-level biomass and productivity components, and soil carbon and nitrogen. Standardized protocols were used across projects, as summarized in an FAO protocols document. The database continues to expand and will include agricultural crops. The database will be hosted by the Oak Ridge National Laboratory (ORNL) Distributed Active Archive Center (DAAC). We hope that other regional databases will become publicly available to help enable Earth System Modeling to simulate species-level sensitivity to climate at regional to global scales.

  6. Resolving the problem of multiple accessions of the same transcript deposited across various public databases.

    PubMed

    Weirick, Tyler; John, David; Uchida, Shizuka

    2017-03-01

    Maintaining the consistency of genomic annotations is an increasingly complex task because of the iterative and dynamic nature of assembly and annotation, growing numbers of biological databases and insufficient integration of annotations across databases. As information exchange among databases is poor, a 'novel' sequence from one reference annotation could be annotated in another. Furthermore, relationships to nearby or overlapping annotated transcripts are even more complicated when using different genome assemblies. To better understand these problems, we surveyed current and previous versions of genomic assemblies and annotations across a number of public databases containing long noncoding RNA. We identified numerous discrepancies of transcripts regarding their genomic locations, transcript lengths and identifiers. Further investigation showed that the positional differences between reference annotations of essentially the same transcript could lead to differences in its measured expression at the RNA level. To aid in resolving these problems, we present the algorithm 'Universal Genomic Accession Hash (UGAHash)' and created an open source web tool to encourage the usage of the UGAHash algorithm. The UGAHash web tool (http://ugahash.uni-frankfurt.de) can be accessed freely without registration. The web tool allows researchers to generate Universal Genomic Accessions for genomic features or to explore annotations deposited in the public databases of the past and present versions. We anticipate that the UGAHash web tool will be a valuable tool to check for the existence of transcripts before judging the newly discovered transcripts as novel. © The Author 2016. Published by Oxford University Press. For Permissions, please email: journals.permissions@oup.com.
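
    The idea of a coordinate-derived universal accession can be conveyed with a simple position-keyed hash. The sketch below is a stand-in for illustration only; the prefix, field layout, and hash choice are invented and do not reproduce the published UGAHash specification.

    ```python
    import hashlib

    def genomic_accession(assembly, chrom, start, end, strand):
        """Derive a deterministic accession from a feature's genomic coordinates, so the
        same transcript deposited in two databases collapses to one identifier."""
        key = f"{assembly}|{chrom}|{start}|{end}|{strand}"
        return "UGA_" + hashlib.sha1(key.encode()).hexdigest()[:12]

    a = genomic_accession("GRCh38", "chr7", 5527151, 5530601, "-")
    b = genomic_accession("GRCh38", "chr7", 5527151, 5530601, "-")
    assert a == b  # identical coordinates always yield the same accession
    ```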

  7. Design of a Multi Dimensional Database for the Archimed DataWarehouse.

    PubMed

    Bréant, Claudine; Thurler, Gérald; Borst, François; Geissbuhler, Antoine

    2005-01-01

    The Archimed data warehouse project started in 1993 at the Geneva University Hospital. It has progressively integrated seven data marts (or domains of activity) archiving medical data such as Admission/Discharge/Transfer (ADT) data, laboratory results, radiology exams, diagnoses, and procedure codes. The objective of the Archimed data warehouse is to facilitate access to an integrated and coherent view of patient medical data in order to support analytical activities such as medical statistics, clinical studies, retrieval of similar cases and data mining processes. This paper discusses three principal design aspects relative to the conception of the database of the data warehouse: 1) the granularity of the database, which refers to the level of detail or summarization of data, 2) the database model and architecture, describing how data will be presented to end users and how new data is integrated, 3) the life cycle of the database, in order to ensure long term scalability of the environment. Both the organization of patient medical data using a standardized elementary fact representation and the use of the multidimensional model have proved to be powerful design tools for integrating data coming from the multiple heterogeneous database systems that form part of the transactional Hospital Information System (HIS). Concurrently, the building of the data warehouse in an incremental way has helped to control the evolution of the data content. These three design aspects bring clarity and performance regarding data access. They also provide long term scalability to the system and resilience to further changes that may occur in source systems feeding the data warehouse.

  8. A Review of Stellar Abundance Databases and the Hypatia Catalog Database

    NASA Astrophysics Data System (ADS)

    Hinkel, Natalie Rose

    2018-01-01

    The astronomical community is interested in elements from lithium to thorium, from solar twins to peculiarities of stellar evolution, because they give insight into different regimes of star formation and evolution. However, while some trends between elements and other stellar or planetary properties are well known, many other trends are not as obvious and are a point of conflict. For example, stars that host giant planets are found to be consistently enriched in iron, but the same cannot be definitively said for any other element. Therefore, it is time to take advantage of large stellar abundance databases in order to better understand not only the large-scale patterns, but also the more subtle, small-scale trends within the data. In this overview to the special session, I will present a review of large stellar abundance databases, both those currently available (e.g. RAVE, APOGEE) and those that will soon be online (e.g. Gaia-ESO, GALAH). Additionally, I will discuss the Hypatia Catalog Database (www.hypatiacatalog.com) -- which includes abundances from individual literature sources that observed stars within 150 pc. The Hypatia Catalog currently contains 72 elements as measured within ~6000 stars, with a total of ~240,000 unique abundance determinations. The online database offers a variety of solar normalizations, stellar properties, and planetary properties (where applicable) that can all be viewed through multiple interactive plotting interfaces as well as in a tabular format. By analyzing stellar abundances for large populations of stars and from a variety of different perspectives, a wealth of information can be revealed on both large and small scales.

  9. Querying and Computing with BioCyc Databases

    PubMed Central

    Krummenacker, Markus; Paley, Suzanne; Mueller, Lukas; Yan, Thomas; Karp, Peter D.

    2006-01-01

    Summary: We describe multiple methods for accessing and querying the complex and integrated cellular data in the BioCyc family of databases: access through multiple file formats, access through Application Program Interfaces (APIs) for LISP, Perl and Java, and SQL access through the BioWarehouse relational database. Availability: The Pathway Tools software and 20 BioCyc DBs in Tiers 1 and 2 are freely available to academic users; fees apply to some types of commercial use. For download instructions see http://BioCyc.org/download.shtml PMID:15961440

  10. Volcanoes of the World: Reconfiguring a scientific database to meet new goals and expectations

    NASA Astrophysics Data System (ADS)

    Venzke, Edward; Andrews, Ben; Cottrell, Elizabeth

    2015-04-01

    The Smithsonian Global Volcanism Program's (GVP) database of Holocene volcanoes and eruptions, Volcanoes of the World (VOTW), originated in 1971, and was largely populated with content from the IAVCEI Catalog of Active Volcanoes of the World and some independent datasets. Volcanic activity reported by Smithsonian's Bulletin of the Global Volcanism Network and USGS/SI Weekly Activity Reports (and their predecessors), published research, and other varied sources has expanded the database significantly over the years. Three editions of the VOTW were published in book form, creating a catalog with new ways to display data that included regional directories, a gazetteer, and a 10,000-year chronology of eruptions. The widespread dissemination of the data in electronic media since the first GVP website in 1995 has created new challenges and opportunities for this unique collection of information. To better meet current and future goals and expectations, we have recently transitioned VOTW into a SQL Server database. This process included significant schema changes to the previous relational database, data auditing, and content review. We replaced a disparate, confusing, and changeable volcano numbering system with unique and permanent volcano numbers. We reconfigured structures for recording eruption data to allow greater flexibility in describing the complexity of observed activity, adding the ability to distinguish episodes within eruptions (in time and space) and events (including dates) rather than characteristics that take place during an episode. We have added a reference link field in multiple tables to enable attribution of sources at finer levels of detail. We now store and connect synonyms and feature names in a more consistent manner, which will allow for morphological features to be given unique numbers and linked to specific eruptions or samples; if the designated overall volcano name is also a morphological feature, it is then also listed and described as that feature. One especially significant audit involved re-evaluating the categories of evidence used to include a volcano in the Holocene list, and reviewing in detail the entries in low-certainty categories. Concurrently, we developed a new data entry system that may in the future allow trusted users outside of Smithsonian to input data into VOTW. A redesigned website now provides new search tools and data download options. We are collaborating with organizations that manage volcano and eruption databases, physical sample databases, and geochemical databases to allow real-time connections and complex queries. VOTW serves the volcanological community by providing a clear and consistent core database of distinctly identified volcanoes and eruptions to advance goals in research, civil defense, and public outreach.

  11. Integrated database for rapid mass movements in Norway

    NASA Astrophysics Data System (ADS)

    Jaedicke, C.; Lied, K.; Kronholm, K.

    2009-03-01

    Rapid gravitational slope mass movements include all kinds of short term relocation of geological material, snow or ice. Traditionally, information about such events is collected separately in different databases covering selected geographical regions and types of movement. In Norway the terrain is susceptible to all types of rapid gravitational slope mass movements ranging from single rocks hitting roads and houses to large snow avalanches and rock slides where entire mountainsides collapse into fjords creating flood waves and endangering large areas. In addition, quick clay slides occur in desalinated marine sediments in South Eastern and Mid Norway. For the authorities and inhabitants of endangered areas, the type of threat is of minor importance and mitigation measures have to consider several types of rapid mass movements simultaneously. An integrated national database for all types of rapid mass movements built around individual events has been established. Only three data entries are mandatory: time, location and type of movement. The remaining optional parameters enable recording of detailed information about the terrain, materials involved and damages caused. Pictures, movies and other documentation can be uploaded into the database. A web-based graphical user interface has been developed allowing new events to be entered, as well as editing and querying for all events. An integration of the database into a GIS system is currently under development. Datasets from various national sources like the road authorities and the Geological Survey of Norway were imported into the database. Today, the database contains 33 000 rapid mass movement events from the last five hundred years covering the entire country. A first analysis of the data shows that the most frequent types of recorded rapid mass movements are rock slides and snow avalanches, followed by debris slides in third place. Most events are recorded in the steep fjord terrain of the Norwegian west coast, but major events are recorded all over the country. Snow avalanches account for most fatalities, while large rock slides causing flood waves and huge quick clay slides are the most damaging individual events in terms of damage to infrastructure and property and for causing multiple fatalities. The quality of the data is strongly influenced by the personal engagement of local observers and varying observation routines. This database is a unique source for statistical analyses, including risk analysis and the relation between rapid mass movements and climate. The database of rapid mass movement events will also facilitate validation of national hazard and risk maps.
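
    The event-centred structure (three mandatory entries plus optional detail and attachments) might be represented roughly as follows; the field names are illustrative and are not the database's actual schema.

    ```python
    from dataclasses import dataclass, field
    from datetime import datetime
    from typing import Optional

    @dataclass
    class MassMovementEvent:
        # Mandatory entries
        time: datetime
        location: tuple                     # (latitude, longitude)
        movement_type: str                  # e.g. "snow avalanche", "rock slide", "quick clay slide"
        # Optional detail
        terrain: Optional[str] = None
        material: Optional[str] = None
        damage: Optional[str] = None
        fatalities: Optional[int] = None
        attachments: list = field(default_factory=list)   # pictures, movies, other documentation
    ```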

  12. Impacts and Viability of Open Source Software on Earth Science Metadata Clearing House and Service Registry Applications

    NASA Astrophysics Data System (ADS)

    Pilone, D.; Cechini, M. F.; Mitchell, A.

    2011-12-01

    Earth Science applications typically deal with large amounts of data and high throughput rates, if not also high transaction rates. While Open Source is frequently used for smaller scientific applications, large scale, highly available systems frequently fall back to "enterprise" class solutions like Oracle RAC or commercial grade JEE Application Servers. NASA's Earth Observing System Data and Information System (EOSDIS) provides end-to-end capabilities for managing NASA's Earth science data from multiple sources - satellites, aircraft, field measurements, and various other programs. A core capability of EOSDIS, the Earth Observing System (EOS) Clearinghouse (ECHO), is a highly available search and order clearinghouse of over 100 million pieces of science data that has evolved from its early R&D days to a fully operational system. Over the course of this maturity ECHO has largely transitioned from commercial frameworks, databases, and operating systems to Open Source solutions...and in some cases, back. In this talk we discuss the progression of our technological solutions and our lessons learned in the areas of: high-performance, large-scale searching solutions; geospatial search capabilities and dealing with multiple coordinate systems; search and storage of variable-format source (science) data; highly available deployment solutions; and scalable (elastic) solutions to visual searching and image handling. Throughout the evolution of the ECHO system we have had to evaluate solutions with respect to performance, cost, developer productivity, reliability, and maintainability in the context of supporting global science users. Open Source solutions have played a significant role in our architecture and development but several critical commercial components remain (or have been reinserted) to meet our operational demands.

  13. Toward multimodal signal detection of adverse drug reactions.

    PubMed

    Harpaz, Rave; DuMouchel, William; Schuemie, Martijn; Bodenreider, Olivier; Friedman, Carol; Horvitz, Eric; Ripple, Anna; Sorbello, Alfred; White, Ryen W; Winnenburg, Rainer; Shah, Nigam H

    2017-12-01

    Improving mechanisms to detect adverse drug reactions (ADRs) is key to strengthening post-marketing drug safety surveillance. Signal detection is presently unimodal, relying on a single information source. Multimodal signal detection is based on jointly analyzing multiple information sources. Building on and expanding work done in prior studies, the aim of this article is to further research on multimodal signal detection, explore its potential benefits, and propose methods for its construction and evaluation. Four data sources are investigated: FDA's adverse event reporting system, insurance claims, the MEDLINE citation database, and the logs of major Web search engines. Published methods are used to generate and combine signals from each data source. Two distinct reference benchmarks corresponding to well-established and recently labeled ADRs respectively are used to evaluate the performance of multimodal signal detection in terms of area under the ROC curve (AUC) and lead-time-to-detection, with the latter relative to labeling revision dates. Limited to our reference benchmarks, multimodal signal detection provides AUC improvements ranging from 0.04 to 0.09 based on a widely used evaluation benchmark, and a comparative added lead-time of 7-22 months relative to labeling revision dates from a time-indexed benchmark. The results support the notion that utilizing and jointly analyzing multiple data sources may lead to improved signal detection. Given certain data and benchmark limitations, the early stage of development, and the complexity of ADRs, it is currently not possible to make definitive statements about the ultimate utility of the concept. Continued development of multimodal signal detection requires a deeper understanding of the data sources used, additional benchmarks, and further research on methods to generate and synthesize signals. Copyright © 2017 Elsevier Inc. All rights reserved.
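
    Fusing per-source signal scores and evaluating them against a reference benchmark can be sketched as below; the simple weighted average stands in for the published combination methods, and the array names are placeholders.

    ```python
    import numpy as np
    from sklearn.metrics import roc_auc_score

    def combined_score(score_matrix, weights=None):
        """Fuse per-source signal scores (rows = drug-event pairs, columns = sources)
        into one multimodal score via a weighted average."""
        scores = np.asarray(score_matrix, dtype=float)
        if weights is None:
            weights = np.full(scores.shape[1], 1.0 / scores.shape[1])
        return scores @ np.asarray(weights, dtype=float)

    # labels: 1 = ADR present in the reference benchmark, 0 = negative control
    # per_source = np.column_stack([faers_scores, claims_scores, medline_scores, weblog_scores])
    # auc_single = roc_auc_score(labels, per_source[:, 0])
    # auc_multi  = roc_auc_score(labels, combined_score(per_source))
    ```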

  14. Healthcare Databases in Thailand and Japan: Potential Sources for Health Technology Assessment Research.

    PubMed

    Saokaew, Surasak; Sugimoto, Takashi; Kamae, Isao; Pratoomsoot, Chayanin; Chaiyakunapruk, Nathorn

    2015-01-01

    Health technology assessment (HTA) has been continuously used for value-based healthcare decisions over the last decade. Healthcare databases represent an important source of information for HTA, which has seen a surge in use in Western countries. Although HTA agencies have been established in the Asia-Pacific region, application and understanding of healthcare databases for HTA is rather limited. Thus, we reviewed existing databases to assess their potential for HTA in Thailand, where HTA has been used officially, and Japan, where HTA is going to be officially introduced. Existing healthcare databases in Thailand and Japan were compiled and reviewed. Database characteristics, e.g., name of database, host, scope/objective, time/sample size, design, data collection method, population/sample, and variables, were described. Databases were assessed for their potential HTA use in terms of safety/efficacy/effectiveness, social/ethical, organization/professional, economic, and epidemiological domains. The request route for each database was also provided. Forty databases (20 from Thailand and 20 from Japan) were included. These comprised national censuses, surveys, registries, administrative data, and claims databases. All databases could potentially be used for epidemiological studies. In addition, data on mortality, morbidity, disability, adverse events, quality of life, service/technology utilization, length of stay, and economics were also found in some databases. However, access to patient-level data was limited since information about the databases was not available from public sources. Our findings have shown that existing databases provide valuable information for HTA research, with limitations on accessibility. Mutual dialogue on healthcare database development and usage for HTA within the Asia-Pacific region is needed.

  15. Binary catalogue of exoplanets

    NASA Astrophysics Data System (ADS)

    Schwarz, Richard; Bazso, Akos; Zechner, Renate; Funk, Barbara

    2016-02-01

    Since 1995 there has been a database that lists most of the known exoplanets (The Extrasolar Planets Encyclopaedia at http://exoplanet.eu/). With the growing number of exoplanets detected in binary and multiple star systems, it became more important to mark them and separate them into a new database, since this information is not available in the Extrasolar Planets Encyclopaedia. Therefore we established an online database (which can be found at: http://www.univie.ac.at/adg/schwarz/multiple.html) for all known exoplanets in binary star systems and, in addition, for multiple star systems, which will be updated regularly and linked to the Extrasolar Planets Encyclopaedia. The binary catalogue of exoplanets is available online as a data file and can be used for statistical purposes. Our database is divided into two parts: the data for the stars and for the planets, given in separate lists. We also describe the different parameters of the exoplanetary systems and present some applications.

  16. The ATLAS Software Installation System v2: a highly available system to install and validate Grid and Cloud sites via Panda

    NASA Astrophysics Data System (ADS)

    De Salvo, A.; Kataoka, M.; Sanchez Pineda, A.; Smirnov, Y.

    2015-12-01

    The ATLAS Installation System v2 is the evolution of the original system, used since 2003. The original tool has been completely re-designed in terms of database backend and components, adding support for submission to multiple backends, including the original Workload Management Service (WMS) and the new PanDA modules. The database engine has been changed from plain MySQL to Galera/Percona and the table structure has been optimized to allow a full High-Availability (HA) solution over Wide Area Network. The servlets, running on each frontend, have also been decoupled from local settings, to allow easy scalability of the system, including the possibility of an HA system with multiple sites. The clients can also be run in multiple copies and in different geographical locations, and take care of sending the installation and validation jobs to the target Grid or Cloud sites. Moreover, the Installation Database is used as a source of parameters by the automatic agents running in CVMFS, in order to install the software and distribute it to the sites. The system has been in production for ATLAS since 2013, with the INFN Roma Tier 2 and the CERN Agile Infrastructure as the main HA sites. The Light Job Submission Framework for Installation (LJSFi) v2 engine directly interfaces with PanDA for Job Management, the Atlas Grid Information System (AGIS) for the site parameter configurations, and CVMFS for both core components and the installation of the software itself. LJSFi2 is also able to use other plugins, and is essentially Virtual Organization (VO) agnostic, so it can be directly used and extended to cope with the requirements of any Grid or Cloud enabled VO. In this work we will present the architecture, performance, status and possible evolutions of the system for the LHC Run2 and beyond.

  17. Review and Comparison of the Search Effectiveness and User Interface of Three Major Online Chemical Databases

    ERIC Educational Resources Information Center

    Bharti, Neelam; Leonard, Michelle; Singh, Shailendra

    2016-01-01

    Online chemical databases are the largest source of chemical information and, therefore, the main resource for retrieving results from published journals, books, patents, conference abstracts, and other relevant sources. Various commercial, as well as free, chemical databases are available. SciFinder, Reaxys, and Web of Science are three major…

  18. A database of natural products and chemical entities from marine habitat

    PubMed Central

    Babu, Padavala Ajay; Puppala, Suma Sree; Aswini, Satyavarapu Lakshmi; Vani, Metta Ramya; Kumar, Chinta Narasimha; Prasanna, Tallapragada

    2008-01-01

    The marine compound database consists of marine natural products and chemical entities, collected from various literature sources, that are known to possess bioactivity against human diseases. The database is constructed using HTML. The 182 compounds, organized into 12 categories, are provided with the source, compound name, 2-dimensional structure, bioactivity, and clinical trial information. The database is freely available online and can be accessed at http://www.progenebio.in/mcdb/index.htm PMID:19238254

  19. Role of data warehousing in healthcare epidemiology.

    PubMed

    Wyllie, D; Davies, J

    2015-04-01

    Electronic storage of healthcare data, including individual-level risk factors for both infectious and other diseases, is increasing. These data can be integrated at hospital, regional and national levels. Data sources that contain risk factor and outcome information for a wide range of conditions offer the potential for efficient epidemiological analysis of multiple diseases. Opportunities may also arise for monitoring healthcare processes. Integrating diverse data sources presents epidemiological, practical, and ethical challenges. For example, diagnostic criteria, outcome definitions, and ascertainment methods may differ across the data sources. Data volumes may be very large, requiring sophisticated computing technology. Given the large populations involved, perhaps the most challenging aspect is how informed consent can be obtained for the development of integrated databases, particularly when it is not easy to demonstrate their potential. In this article, we discuss some of the ups and downs of recent projects as well as the potential of data warehousing for antimicrobial resistance monitoring. Copyright © 2015. Published by Elsevier Ltd.

  20. Multiple Object Retrieval in Image Databases Using Hierarchical Segmentation Tree

    ERIC Educational Resources Information Center

    Chen, Wei-Bang

    2012-01-01

    The purpose of this research is to develop a new visual information analysis, representation, and retrieval framework for automatic discovery of salient objects of user's interest in large-scale image databases. In particular, this dissertation describes a content-based image retrieval framework which supports multiple-object retrieval. The…

  1. Intrinsic Radiation Source Generation with the ISC Package: Data Comparisons and Benchmarking

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Solomon, Clell J. Jr.

    The characterization of radioactive emissions from unstable isotopes (intrinsic radiation) is necessary for shielding and radiological-dose calculations from radioactive materials. While most radiation transport codes, e.g., MCNP [X-5 Monte Carlo Team, 2003], provide the capability to input user prescribed source definitions, such as radioactive emissions, they do not provide the capability to calculate the correct radioactive-source definition given the material compositions. Special modifications to MCNP have been developed in the past to allow the user to specify an intrinsic source, but these modifications have not been implemented into the primary source base [Estes et al., 1988]. To facilitate the description of the intrinsic radiation source from a material with a specific composition, the Intrinsic Source Constructor library (LIBISC) and MCNP Intrinsic Source Constructor (MISC) utility have been written. The combination of LIBISC and MISC will be herein referred to as the ISC package. LIBISC is a statically linkable C++ library that provides the necessary functionality to construct the intrinsic-radiation source generated by a material. Furthermore, LIBISC provides the ability to use different particle-emission databases, radioactive-decay databases, and natural-abundance databases, allowing the user flexibility in the specification of the source if one database is preferred over others. LIBISC also provides functionality for aging materials and producing a thick-target bremsstrahlung photon source approximation from the electron emissions. The MISC utility links to LIBISC and facilitates the description of intrinsic-radiation sources in a format directly usable with the MCNP transport code. Through a series of input keywords and arguments the MISC user can specify the material, age the material if desired, and produce a source description of the radioactive emissions from the material in an MCNP readable format. Further details of using the MISC utility can be obtained from the user guide [Solomon, 2012]. The remainder of this report presents a discussion of the databases available to LIBISC and MISC, a discussion of the models employed by LIBISC, a comparison of the thick-target bremsstrahlung model employed, a benchmark comparison to plutonium and depleted-uranium spheres, and a comparison of the available particle-emission databases.

  2. Source Apportionment and Risk Assessment of Emerging Contaminants: An Approach of Pharmaco-Signature in Water Systems

    PubMed Central

    Jiang, Jheng Jie; Lee, Chon Lin; Fang, Meng Der; Boyd, Kenneth G.; Gibb, Stuart W.

    2015-01-01

    This paper presents a methodology based on multivariate data analysis for characterizing potential source contributions of emerging contaminants (ECs) detected in 26 river water samples across multi-scape regions during dry and wet seasons. Based on this methodology, we unveil an approach toward potential source contributions of ECs, a concept we refer to as the “Pharmaco-signature.” Exploratory analysis of data points has been carried out by unsupervised pattern recognition (hierarchical cluster analysis, HCA) and a receptor model (principal component analysis-multiple linear regression, PCA-MLR) in an attempt to demonstrate significant source contributions of ECs in different land-use zones. Robust cluster solutions grouped the database according to different EC profiles. PCA-MLR identified that 58.9% of the mean summed ECs were contributed by domestic impact, 9.7% by antibiotics application, and 31.4% by drug abuse. Diclofenac, ibuprofen, codeine, ampicillin, tetracycline, and erythromycin-H2O have significant pollution risk quotients (RQ>1), indicating potentially high risk to aquatic organisms in Taiwan. PMID:25874375
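
    A generic PCA-MLR recipe of the kind used for apportionment, standardizing concentrations, extracting factors, regressing the summed concentration on the factor scores, and normalizing the coefficients into percent contributions, is sketched below. It simplifies the published procedure, and the factor count and scaling choices are illustrative.

    ```python
    import numpy as np
    from sklearn.decomposition import PCA
    from sklearn.linear_model import LinearRegression
    from sklearn.preprocessing import StandardScaler

    def pca_mlr_contributions(concentrations, n_factors=3):
        """Percent source contributions from a simplified PCA-MLR: PCA on standardized
        EC concentrations, regression of the summed concentration on factor scores,
        and normalization of the non-negative coefficients."""
        X = np.asarray(concentrations, dtype=float)          # rows = samples, columns = ECs
        scores = PCA(n_components=n_factors).fit_transform(StandardScaler().fit_transform(X))
        reg = LinearRegression().fit(scores, X.sum(axis=1))
        coefs = np.clip(reg.coef_, 0.0, None)                # keep only positive loadings
        return 100.0 * coefs / coefs.sum()
    ```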

  3. Comparison of Online Agricultural Information Services.

    ERIC Educational Resources Information Center

    Reneau, Fred; Patterson, Richard

    1984-01-01

    Outlines major online agricultural information services--agricultural databases, databases with agricultural services, educational databases in agriculture--noting services provided, access to the database, and costs. Benefits of online agricultural database sources (availability of agricultural marketing, weather, commodity prices, management…

  4. The 3XMM spectral fit database

    NASA Astrophysics Data System (ADS)

    Georgantopoulos, I.; Corral, A.; Watson, M.; Carrera, F.; Webb, N.; Rosen, S.

    2016-06-01

    I will present the XMMFITCAT database, which is a spectral fit inventory of the sources in the 3XMM catalogue. Spectra are provided by the XMM/SSC for all 3XMM sources that have more than 50 background-subtracted counts per module. This work is funded in the framework of the ESA Prodex project. The 3XMM catalog currently covers 877 sq. degrees and contains about 400,000 unique sources. Spectra are available for over 120,000 sources. Spectral fits have been performed with various spectral models. The results are available on the web page http://xraygroup.astro.noa.gr/ and also at the University of Leicester LEDAS database webpage ledas-www.star.le.ac.uk/. The database description as well as some science results in the joint area with SDSS are presented in two recent papers: Corral et al. 2015, A&A, 576, 61 and Corral et al. 2014, A&A, 569, 71. At least for extragalactic sources, the spectral fits will acquire added value when photometric redshifts become available. In the framework of a new Prodex project we have been funded to derive photometric redshifts for the 3XMM sources using machine learning techniques. I will present the techniques as well as the optical and near-IR databases that will be used.

  5. MP3C - the Minor Planet Physical Properties Catalogue: a New VO Service For Multi-database Query

    NASA Astrophysics Data System (ADS)

    Tanga, Paolo; Delbo, M.; Gerakis, J.

    2013-10-01

    In the last few years we have witnessed a large growth in the number of asteroids for which physical properties are available. However, these data are dispersed in a multiplicity of catalogs. Extracting data and combining them for further analysis requires custom tools, a situation further complicated by the variety of data sources, some of them standardized (Planetary Data System), others not. With these problems in mind, we created a new Virtual Observatory service named “Minor Planet Physical Properties Catalogue” (abbreviated as MP3C - http://mp3c.oca.eu/). MP3C is not a new database, but rather a portal allowing the user to access selected properties of objects by easy SQL query, even from different sources. At present, such diverse data as orbital parameters, photometric and light curve parameters, sizes and albedos derived by IRAS, AKARI and WISE, SDSS colors, SMASS taxonomy, family membership, satellite data, and stellar occultation results are included. Other data sources will be added in the near future. The physical properties output of the MP3C can be tuned by users through query criteria based upon ranges of values of the ingested quantities. The resulting list of objects can be used for interactive plots through standard VO tools such as TOPCAT. Also, their ephemerides and visibilities from given sites can be computed. We are targeting full VO compliance for providing a new standardized service to the community.

  6. HOPE: An On-Line Piloted Handling Qualities Experiment Data Book

    NASA Technical Reports Server (NTRS)

    Jackson, E. B.; Proffitt, Melissa S.

    2010-01-01

    A novel on-line database for capturing most of the information obtained during piloted handling qualities experiments (either flight or simulated) is described. The Hyperlinked Overview of Piloted Evaluations (HOPE) web application is based on an open-source object-oriented Web-based front end (Ruby-on-Rails) that can be used with a variety of back-end relational database engines. The hyperlinked, on-line data book approach allows an easily-traversed way of looking at a variety of collected data, including pilot ratings, pilot information, vehicle and configuration characteristics, test maneuvers, and individual flight test cards and repeat runs. It allows for on-line retrieval of pilot comments, both audio and transcribed, as well as time history data retrieval and video playback. Pilot questionnaires are recorded as are pilot biographies. Simple statistics are calculated for each selected group of pilot ratings, allowing multiple ways to aggregate the data set (by pilot, by task, or by vehicle configuration, for example). Any number of per-run or per-task metrics can be captured in the database. The entire run metrics dataset can be downloaded in comma-separated text for further analysis off-line. It is expected that this tool will be made available upon request

  7. SPSmart: adapting population based SNP genotype databases for fast and comprehensive web access.

    PubMed

    Amigo, Jorge; Salas, Antonio; Phillips, Christopher; Carracedo, Angel

    2008-10-10

    In the last five years large online resources of human variability have appeared, notably HapMap, Perlegen and the CEPH foundation. These databases of genotypes with population information act as catalogues of human diversity, and are widely used as reference sources for population genetics studies. Although many useful conclusions may be extracted by querying databases individually, the lack of flexibility for combining data from within and between each database does not allow the calculation of key population variability statistics. We have developed a novel tool for accessing and combining large-scale genomic databases of single nucleotide polymorphisms (SNPs) in widespread use in human population genetics: SPSmart (SNPs for Population Studies). A fast pipeline creates and maintains a data mart from the most commonly accessed databases of genotypes containing population information: data is mined, summarized into the standard statistical reference indices, and stored into a relational database that currently handles as many as 4 × 10^9 genotypes and that can be easily extended to new database initiatives. We have also built a web interface to the data mart that allows the browsing of underlying data indexed by population and the combining of populations, allowing intuitive and straightforward comparison of population groups. All the information served is optimized for web display, and most of the computations are already pre-processed in the data mart to speed up the data browsing and any computational treatment requested. In practice, SPSmart allows populations to be combined into user-defined groups, while multiple databases can be accessed and compared in a few simple steps from a single query. It performs the queries rapidly and gives straightforward graphical summaries of SNP population variability through visual inspection of allele frequencies outlined in standard pie-chart format. In addition, full numerical description of the data is output in statistical results panels that include common population genetics metrics such as heterozygosity, Fst and In.
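
    The population statistics mentioned above are straightforward to reproduce once allele frequencies have been pooled. The sketch below (Python) computes expected heterozygosity and a simple two-population Fst for a biallelic SNP; the allele frequencies and sample sizes are hypothetical, and this is not the SPSmart code itself.

      # Minimal sketch: population-genetics summaries for one biallelic SNP,
      # using hypothetical allele frequencies pooled from two source databases.

      def expected_heterozygosity(p):
          """He = 1 - sum of squared allele frequencies for a biallelic locus."""
          return 1.0 - (p ** 2 + (1.0 - p) ** 2)

      def fst_two_pops(p1, n1, p2, n2):
          """Crude Fst = (Ht - Hs) / Ht, with Hs weighted by sample size."""
          p_total = (n1 * p1 + n2 * p2) / (n1 + n2)
          ht = expected_heterozygosity(p_total)
          hs = (n1 * expected_heterozygosity(p1) + n2 * expected_heterozygosity(p2)) / (n1 + n2)
          return 0.0 if ht == 0 else (ht - hs) / ht

      # Hypothetical frequencies of the same SNP in two combined population groups
      print(expected_heterozygosity(0.30))        # about 0.42
      print(fst_two_pops(0.30, 120, 0.55, 200))   # crude between-group differentiation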

  8. An affinity-structure database of helix-turn-helix: DNA complexes with a universal coordinate system

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    AlQuraishi, Mohammed; Tang, Shengdong; Xia, Xide

    Molecular interactions between proteins and DNA molecules underlie many cellular processes, including transcriptional regulation, chromosome replication, and nucleosome positioning. Computational analyses of protein-DNA interactions rely on experimental data characterizing known protein-DNA interactions structurally and biochemically. While many databases exist that contain either structural or biochemical data, few integrate these two data sources in a unified fashion. Such integration is becoming increasingly critical with the rapid growth of structural and biochemical data, and the emergence of algorithms that rely on the synthesis of multiple data types to derive computational models of molecular interactions. We have developed an integrated affinity-structure database in which the experimental and quantitative DNA binding affinities of helix-turn-helix proteins are mapped onto the crystal structures of the corresponding protein-DNA complexes. This database provides access to: (i) protein-DNA structures, (ii) quantitative summaries of protein-DNA binding affinities using position weight matrices, and (iii) raw experimental data of protein-DNA binding instances. Critically, this database establishes a correspondence between experimental structural data and quantitative binding affinity data at the single basepair level. Furthermore, we present a novel alignment algorithm that structurally aligns the protein-DNA complexes in the database and creates a unified residue-level coordinate system for comparing the physico-chemical environments at the interface between complexes. Using this unified coordinate system, we compute the statistics of atomic interactions at the protein-DNA interface of helix-turn-helix proteins. We provide an interactive website for visualization, querying, and analyzing this database, and a downloadable version to facilitate programmatic analysis. Lastly, this database will facilitate the analysis of protein-DNA interactions and the development of programmatic computational methods that capitalize on integration of structural and biochemical datasets. The database can be accessed at http://ProteinDNA.hms.harvard.edu.
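
    To illustrate the position weight matrix (PWM) summaries referred to above, the sketch below (Python) builds a simple frequency-based PWM from a handful of aligned binding sites and scores a candidate sequence; the sites are hypothetical, and the database's actual PWMs are derived from quantitative binding-affinity measurements rather than raw counts.

      # Minimal sketch: frequency PWM from hypothetical aligned binding sites,
      # plus a probability score for a candidate site.
      from collections import Counter

      SITES = ["TTGACA", "TTGACT", "TTGATA", "TAGACA"]   # hypothetical aligned sites
      BASES = "ACGT"

      def build_pwm(sites, pseudocount=0.5):
          pwm = []
          for i in range(len(sites[0])):
              counts = Counter(site[i] for site in sites)
              total = len(sites) + pseudocount * len(BASES)
              pwm.append({b: (counts[b] + pseudocount) / total for b in BASES})
          return pwm

      def score(pwm, seq):
          """Product of per-position base probabilities."""
          p = 1.0
          for i, base in enumerate(seq):
              p *= pwm[i][base]
          return p

      pwm = build_pwm(SITES)
      print(score(pwm, "TTGACA"))   # a consensus-like site scores highest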

  9. An affinity-structure database of helix-turn-helix: DNA complexes with a universal coordinate system

    DOE PAGES

    AlQuraishi, Mohammed; Tang, Shengdong; Xia, Xide

    2015-11-19

    Molecular interactions between proteins and DNA molecules underlie many cellular processes, including transcriptional regulation, chromosome replication, and nucleosome positioning. Computational analyses of protein-DNA interactions rely on experimental data characterizing known protein-DNA interactions structurally and biochemically. While many databases exist that contain either structural or biochemical data, few integrate these two data sources in a unified fashion. Such integration is becoming increasingly critical with the rapid growth of structural and biochemical data, and the emergence of algorithms that rely on the synthesis of multiple data types to derive computational models of molecular interactions. We have developed an integrated affinity-structure database in which the experimental and quantitative DNA binding affinities of helix-turn-helix proteins are mapped onto the crystal structures of the corresponding protein-DNA complexes. This database provides access to: (i) protein-DNA structures, (ii) quantitative summaries of protein-DNA binding affinities using position weight matrices, and (iii) raw experimental data of protein-DNA binding instances. Critically, this database establishes a correspondence between experimental structural data and quantitative binding affinity data at the single basepair level. Furthermore, we present a novel alignment algorithm that structurally aligns the protein-DNA complexes in the database and creates a unified residue-level coordinate system for comparing the physico-chemical environments at the interface between complexes. Using this unified coordinate system, we compute the statistics of atomic interactions at the protein-DNA interface of helix-turn-helix proteins. We provide an interactive website for visualization, querying, and analyzing this database, and a downloadable version to facilitate programmatic analysis. Lastly, this database will facilitate the analysis of protein-DNA interactions and the development of programmatic computational methods that capitalize on integration of structural and biochemical datasets. The database can be accessed at http://ProteinDNA.hms.harvard.edu.

  10. Lessons learned while building the Deepwater Horizon Database: Toward improved data sharing in coastal science

    NASA Astrophysics Data System (ADS)

    Thessen, Anne E.; McGinnis, Sean; North, Elizabeth W.

    2016-02-01

    Process studies and coupled-model validation efforts in geosciences often require integration of multiple data types across time and space. For example, improved prediction of hydrocarbon fate and transport is an important societal need which fundamentally relies upon synthesis of oceanography and hydrocarbon chemistry. Yet, there are no publicly accessible databases which integrate these diverse data types in a georeferenced format, nor are there guidelines for developing such a database. The objective of this research was to analyze the process of building one such database to provide baseline information on data sources and data sharing and to document the challenges and solutions that arose during this major undertaking. The resulting Deepwater Horizon Database was approximately 2.4 GB in size and contained over 8 million georeferenced data points collected from industry, government databases, volunteer networks, and individual researchers. The major technical challenge that was overcome was the reconciliation of terms, units, and quality flags, which was necessary to effectively integrate the disparate data sets. Assembling this database required the development of relationships with individual researchers and data managers, which often involved extensive e-mail contact. The average number of emails exchanged per data set was 7.8. Of the 95 relevant data sets that were discovered, 38 (40%) were obtained, either in whole or in part. Over one third (36%) of the requests for data went unanswered. The majority of responses were received after the first request (64%) and within the first week of the first request (67%). Although fewer than half of the potentially relevant datasets were incorporated into the database, the level of sharing (40%) was high compared to some other disciplines where sharing can be as low as 10%. Our suggestions for building integrated databases include budgeting significant time for e-mail exchanges, being cognizant of the costs versus benefits of pursuing reticent data providers, and building trust through clear, respectful communication and with flexible and appropriate attributions.

  11. Preliminary geologic map of the Piru 7.5' quadrangle, southern California: a digital database

    USGS Publications Warehouse

    Yerkes, R.F.; Campbell, Russell H.

    1995-01-01

    This Open-File report is a digital geologic map database. This pamphlet serves to introduce and describe the digital data. There is no paper map included in the Open-File report. This digital map database is compiled from previously published sources combined with some new mapping and modifications in nomenclature. The geologic map database delineates map units that are identified by general age and lithology following the stratigraphic nomenclature of the U. S. Geological Survey. For detailed descriptions of the units, their stratigraphic relations and sources of geologic mapping consult Yerkes and Campbell (1995). More specific information about the units may be available in the original sources.

  12. Comprehensive Routing Security Development and Deployment for the Internet

    DTIC Science & Technology

    2015-02-01

    feature enhancement and bug fixes. • MySQL : MySQL is a widely used and popular open source database package. It was chosen for database support in the...RPSTIR depends on several other open source packages. • MySQL : MySQL is used for the local RPKI database cache. • OpenSSL: OpenSSL is used for...cryptographic libraries for X.509 certificates. • ODBC mySql Connector: ODBC (Open Database Connectivity) is a standard programming interface (API) for

  13. Transitions theory: a trajectory of theoretical development in nursing.

    PubMed

    Im, Eun-Ok

    2011-01-01

    There have been very few investigations into how any single nursing theory has actually evolved historically. In this paper, a trajectory of theoretical development in nursing is explored through reviewing the theoretical development of a single nursing theory: transitions theory. The literature related to transitions theory was searched and retrieved using multiple databases. Ninety-nine papers were analyzed according to type of theory, populations of interest, sources of theorizing, and theoretical methods. Transitions theory originated in research but was initially borrowed. It also arose in research with immigrants and from national and international collaborative research efforts. A product of mentoring, transitions theory is used widely in nursing education, research, and practice. Diverse thoughts related to transitions theory coexist. For future theoretical development in nursing, we need to remain open to new ideas and continue to engage in multiple collaborative efforts. Copyright © 2011 Elsevier Inc. All rights reserved.

  14. Preparing Nursing Home Data from Multiple Sites for Clinical Research – A Case Study Using Observational Health Data Sciences and Informatics

    PubMed Central

    Boyce, Richard D.; Handler, Steven M.; Karp, Jordan F.; Perera, Subashan; Reynolds, Charles F.

    2016-01-01

    Introduction: A potential barrier to nursing home research is the limited availability of research quality data in electronic form. We describe a case study of converting electronic health data from five skilled nursing facilities (SNFs) to a research quality longitudinal dataset by means of open-source tools produced by the Observational Health Data Sciences and Informatics (OHDSI) collaborative. Methods: The Long-Term Care Minimum Data Set (MDS), drug dispensing, and fall incident data from the five SNFs were extracted, translated, and loaded into version 4 of the OHDSI common data model. Quality assurance involved identifying errors using the Achilles data characterization tool and comparing both quality measures and drug exposures in the new database for concordance with externally available sources. Findings: Records for a total of 4,519 patients (95.1%) made it into the final database. Achilles identified 10 different types of errors that were addressed in the final dataset. Drug exposures based on dispensing were generally accurate when compared with medication administration data from the pharmacy services provider. Quality measures were generally concordant between the new database and Nursing Home Compare for measures with a prevalence ≥ 10%. Fall data recorded in MDS were found to be more complete than data from fall incident reports. Conclusions: The new dataset is ready to support observational research on topics of clinical importance in the nursing home, including patient-level prediction of falls. The extraction, translation, and loading process enabled the use of OHDSI data characterization tools that improved the quality of the final dataset. PMID:27891528
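
    As a rough illustration of the extract-translate-load step described above, the sketch below (Python) maps a hypothetical pharmacy dispensing record into an OMOP-style drug_exposure row. The column names follow the common data model in spirit, but the exact version 4 column set, the source record layout, and the concept lookup are assumptions rather than the study's actual ETL code.

      # Minimal ETL sketch: one hypothetical dispensing record -> OMOP-style row.
      from datetime import date, timedelta

      def ndc_to_concept_id(ndc):
          # Placeholder for a standard-vocabulary lookup (e.g. NDC to RxNorm concept).
          lookup = {"00093-7155-56": 19019073}          # hypothetical mapping
          return lookup.get(ndc, 0)                     # 0 = unmapped

      def to_drug_exposure(person_id, dispense):
          start = date.fromisoformat(dispense["dispense_date"])
          return {
              "person_id": person_id,
              "drug_concept_id": ndc_to_concept_id(dispense["ndc"]),
              "drug_exposure_start_date": start,
              "drug_exposure_end_date": start + timedelta(days=dispense["days_supply"]),
              "quantity": dispense["quantity"],
              "drug_source_value": dispense["ndc"],
          }

      row = to_drug_exposure(1001, {"ndc": "00093-7155-56", "dispense_date": "2012-03-14",
                                    "days_supply": 30, "quantity": 30})
      print(row)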

  15. Database of traditional Chinese medicine and its application to studies of mechanism and to prescription validation.

    PubMed

    Chen, X; Zhou, H; Liu, Y B; Wang, J F; Li, H; Ung, C Y; Han, L Y; Cao, Z W; Chen, Y Z

    2006-12-01

    Traditional Chinese Medicine (TCM) is widely practised and is viewed as an attractive alternative to conventional medicine. Quantitative information about TCM prescriptions, constituent herbs and herbal ingredients is necessary for studying and exploring TCM. We manually collected information on TCM from books, other printed sources, and Medline. The Traditional Chinese Medicine Information Database TCM-ID, at http://tcm.cz3.nus.edu.sg/group/tcm-id/tcmid.asp, was introduced to provide comprehensive information about all aspects of TCM, including prescriptions, constituent herbs, herbal ingredients, molecular structure and functional properties of active ingredients, therapeutic and side effects, clinical indication and application, and related matters. TCM-ID currently contains information for 1,588 prescriptions, 1,313 herbs, and 5,669 herbal ingredients, as well as the 3D structures of 3,725 herbal ingredients. The value of the data in TCM-ID was illustrated by using some of the data for an in-silico study of the molecular mechanism of the therapeutic effects of herbal ingredients and for developing a computer program to validate TCM multi-herb preparations. The development of systems biology has led to a new design principle for therapeutic intervention strategy, the concept of 'magic shrapnel' (rather than the 'magic bullet'), involving many drugs against multiple targets, administered in a single treatment. TCM offers an extensive source of examples of this concept, in which several active ingredients in one prescription are aimed at numerous targets and work together to provide therapeutic benefit. The database and its mining applications described here represent early efforts toward exploring TCM for new theories in drug discovery.

  16. The XSD-Builder Specification Language—Toward a Semantic View of XML Schema Definition

    NASA Astrophysics Data System (ADS)

    Fong, Joseph; Cheung, San Kuen

    In the present database market, the XML database model is a main structure for forthcoming database systems in the Internet environment. As a conceptual schema of an XML database, the XML Model has limitations in presenting its data semantics, and system analysts have no toolset for modeling and analyzing XML systems. We apply the XML Tree Model (shown in Figure 2) as a conceptual schema of an XML database to model and analyze the structure of an XML database. It is important not only for visualizing, specifying, and documenting structural models, but also for constructing executable systems. The tree model represents the inter-relationships among elements inside different logical schemas such as XML Schema Definition (XSD), DTD, Schematron, XDR, SOX, and DSD (shown in Figure 1; an explanation of the terms in the figure is given in Table 1). The XSD-Builder consists of the XML Tree Model, a source language, a translator, and the XSD. The source language, called XSD-Source, mainly provides a user-friendly environment for writing an XSD. The source language is consequently translated by the XSD-Translator, whose output is an XSD, which is our target and is called the object language.

  17. An Update on Electronic Information Sources.

    ERIC Educational Resources Information Center

    Ackerman, Katherine

    1987-01-01

    This review of new developments and products in online services discusses trends in travel related services; full text databases; statistical source databases; an emphasis on regional and international business news; and user friendly systems. (Author/CLB)

  18. [Meta-analysis for correlation between multiple lung lobe lesions and prognostic influence on acquired pneumonia in hospitalized elderly patients].

    PubMed

    Huang, Wenjie; Feng, Wei; Li, Yang; Chen, Yu

    2014-11-01

    To explore, by meta-analysis, the correlation between multiple lung lobe lesions and the prognosis of acquired pneumonia in hospitalized elderly patients. We collected all studies that investigated the correlation between multiple lung lobe lesions and the prognosis of acquired pneumonia by searching China National Knowledge Infrastructure, Wanfang Database, Chinese Science and Technology Periodical Database, Chinese Biological Medical Literature Database, PubMed, and EMBase in accordance with the inclusion and exclusion criteria. The retrieval time limit of the searches was from database establishment to July 2014. The meta-analysis was performed using RevMan 5.2 software. We calculated the odds ratio (OR) and 95% confidence interval (95% CI), applying heterogeneity tests. Publication bias was assessed by Egger's test and funnel plot, and sensitivity was analyzed. Ten studies involving 1,836 patients were finally included, with 487 cases (the dead group) and 1,349 controls (the survival group). The meta-analysis demonstrated that multiple lung lobe lesions were highly correlated with the prognosis of acquired pneumonia in the elderly (OR=3.22, 95% CI 1.84 to 5.63). Multiple lung lobe lesions increase the risk of death in elderly patients with acquired pneumonia.
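
    For readers unfamiliar with how a pooled OR and confidence interval of this kind are obtained, the sketch below (Python) shows standard inverse-variance pooling of study-level odds ratios; the three input studies are hypothetical, and the analysis in the paper itself was run in RevMan 5.2.

      # Minimal sketch: fixed-effect (inverse-variance) pooling of odds ratios.
      # Each study is (odds_ratio, lower_95ci, upper_95ci); values are hypothetical.
      import math

      STUDIES = [(2.8, 1.5, 5.2), (3.9, 1.9, 8.0), (2.5, 1.1, 5.7)]

      def pooled_or(studies):
          num = den = 0.0
          for or_, lo, hi in studies:
              se = (math.log(hi) - math.log(lo)) / (2 * 1.96)   # SE from CI width
              w = 1.0 / se ** 2                                  # inverse-variance weight
              num += w * math.log(or_)
              den += w
          pooled, se_pooled = num / den, math.sqrt(1.0 / den)
          ci = (math.exp(pooled - 1.96 * se_pooled), math.exp(pooled + 1.96 * se_pooled))
          return math.exp(pooled), ci

      print(pooled_or(STUDIES))   # pooled OR and its 95% CI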

  19. The relational database model and multiple multicenter clinical trials.

    PubMed

    Blumenstein, B A

    1989-12-01

    The Southwest Oncology Group (SWOG) chose to use a relational database management system (RDBMS) for the management of data from multiple clinical trials because of the underlying relational model's inherent flexibility and the natural way multiple entity types (patients, studies, and participants) can be accommodated. The tradeoffs to using the relational model as compared to using the hierarchical model include added computing cycles due to deferred data linkages and added procedural complexity due to the necessity of implementing protections against referential integrity violations. The SWOG uses its RDBMS as a platform on which to build data operations software. This data operations software, which is written in a compiled computer language, allows multiple users to simultaneously update the database and is interactive with respect to the detection of conditions requiring action and the presentation of options for dealing with those conditions. The relational model facilitates the development and maintenance of data operations software.

  20. Maintaining Multimedia Data in a Geospatial Database

    DTIC Science & Technology

    2012-09-01

    A different look at PostgreSQL and MySQL as spatial databases was offered. Given their results, as each database produced result sets from zero to 100,000, it was...excelled given multiple conditions.

  1. Healthcare Databases in Thailand and Japan: Potential Sources for Health Technology Assessment Research

    PubMed Central

    Saokaew, Surasak; Sugimoto, Takashi; Kamae, Isao; Pratoomsoot, Chayanin; Chaiyakunapruk, Nathorn

    2015-01-01

    Background Health technology assessment (HTA) has been continuously used for value-based healthcare decisions over the last decade. Healthcare databases represent an important source of information for HTA, which has seen a surge in use in Western countries. Although HTA agencies have been established in the Asia-Pacific region, application and understanding of healthcare databases for HTA is rather limited. Thus, we reviewed existing databases to assess their potential for HTA in Thailand, where HTA has been used officially, and Japan, where HTA is going to be officially introduced. Method Existing healthcare databases in Thailand and Japan were compiled and reviewed. Database characteristics, e.g. name of database, host, scope/objective, time/sample size, design, data collection method, population/sample, and variables, were described. Databases were assessed for their potential HTA use in terms of safety/efficacy/effectiveness, social/ethical, organization/professional, economic, and epidemiological domains. The request route for each database was also provided. Results Forty databases (20 from Thailand and 20 from Japan) were included. These comprised national censuses, surveys, registries, administrative data, and claims databases. All databases could potentially be used for epidemiological studies. In addition, data on mortality, morbidity, disability, adverse events, quality of life, service/technology utilization, length of stay, and economics were also found in some databases. However, access to patient-level data was limited since information about the databases was not available from public sources. Conclusion Our findings show that existing databases provide valuable information for HTA research, with limitations on accessibility. Mutual dialogue on healthcare database development and usage for HTA within the Asia-Pacific region is needed. PMID:26560127

  2. Specialist Bibliographic Databases

    PubMed Central

    2016-01-01

    Specialist bibliographic databases offer essential online tools for researchers and authors who work on specific subjects and perform comprehensive and systematic syntheses of evidence. This article presents examples of the established specialist databases, which may be of interest to those engaged in multidisciplinary science communication. Access to most specialist databases is through subscription schemes and membership in professional associations. Several aggregators of information and database vendors, such as EBSCOhost and ProQuest, facilitate advanced searches supported by specialist keyword thesauri. Searches of items through specialist databases are complementary to those through multidisciplinary research platforms, such as PubMed, Web of Science, and Google Scholar. Familiarity with the functional characteristics of biomedical and nonbiomedical bibliographic search tools is mandatory for researchers, authors, editors, and publishers. The database users are offered updates of the indexed journal lists, abstracts, author profiles, and links to other metadata. Editors and publishers may find the source selection criteria particularly useful and apply for coverage of their peer-reviewed journals and grey literature sources. These criteria are aimed at accepting relevant sources with established editorial policies and quality controls. PMID:27134485

  3. Specialist Bibliographic Databases.

    PubMed

    Gasparyan, Armen Yuri; Yessirkepov, Marlen; Voronov, Alexander A; Trukhachev, Vladimir I; Kostyukova, Elena I; Gerasimov, Alexey N; Kitas, George D

    2016-05-01

    Specialist bibliographic databases offer essential online tools for researchers and authors who work on specific subjects and perform comprehensive and systematic syntheses of evidence. This article presents examples of the established specialist databases, which may be of interest to those engaged in multidisciplinary science communication. Access to most specialist databases is through subscription schemes and membership in professional associations. Several aggregators of information and database vendors, such as EBSCOhost and ProQuest, facilitate advanced searches supported by specialist keyword thesauri. Searches of items through specialist databases are complementary to those through multidisciplinary research platforms, such as PubMed, Web of Science, and Google Scholar. Familiarity with the functional characteristics of biomedical and nonbiomedical bibliographic search tools is mandatory for researchers, authors, editors, and publishers. The database users are offered updates of the indexed journal lists, abstracts, author profiles, and links to other metadata. Editors and publishers may find the source selection criteria particularly useful and apply for coverage of their peer-reviewed journals and grey literature sources. These criteria are aimed at accepting relevant sources with established editorial policies and quality controls.

  4. A high-performance spatial database based approach for pathology imaging algorithm evaluation

    PubMed Central

    Wang, Fusheng; Kong, Jun; Gao, Jingjing; Cooper, Lee A.D.; Kurc, Tahsin; Zhou, Zhengwen; Adler, David; Vergara-Niedermayr, Cristobal; Katigbak, Bryan; Brat, Daniel J.; Saltz, Joel H.

    2013-01-01

    Background: Algorithm evaluation provides a means to characterize variability across image analysis algorithms, validate algorithms by comparison with human annotations, combine results from multiple algorithms for performance improvement, and facilitate algorithm sensitivity studies. The sizes of images and image analysis results in pathology image analysis pose significant challenges in algorithm evaluation. We present an efficient parallel spatial database approach to model, normalize, manage, and query large volumes of analytical image result data. This provides an efficient platform for algorithm evaluation. Our experiments with a set of brain tumor images demonstrate the application, scalability, and effectiveness of the platform. Context: The paper describes an approach and platform for evaluation of pathology image analysis algorithms. The platform facilitates algorithm evaluation through a high-performance database built on the Pathology Analytic Imaging Standards (PAIS) data model. Aims: (1) Develop a framework to support algorithm evaluation by modeling and managing analytical results and human annotations from pathology images; (2) Create a robust data normalization tool for converting, validating, and fixing spatial data from algorithm or human annotations; (3) Develop a set of queries to support data sampling and result comparisons; (4) Achieve high performance computation capacity via a parallel data management infrastructure, parallel data loading and spatial indexing optimizations in this infrastructure. Materials and Methods: We have considered two scenarios for algorithm evaluation: (1) algorithm comparison where multiple result sets from different methods are compared and consolidated; and (2) algorithm validation where algorithm results are compared with human annotations. We have developed a spatial normalization toolkit to validate and normalize spatial boundaries produced by image analysis algorithms or human annotations. The validated data were formatted based on the PAIS data model and loaded into a spatial database. To support efficient data loading, we have implemented a parallel data loading tool that takes advantage of multi-core CPUs to accelerate data injection. The spatial database manages both geometric shapes and image features or classifications, and enables spatial sampling, result comparison, and result aggregation through expressive structured query language (SQL) queries with spatial extensions. To provide scalable and efficient query support, we have employed a shared nothing parallel database architecture, which distributes data homogenously across multiple database partitions to take advantage of parallel computation power and implements spatial indexing to achieve high I/O throughput. Results: Our work proposes a high performance, parallel spatial database platform for algorithm validation and comparison. This platform was evaluated by storing, managing, and comparing analysis results from a set of brain tumor whole slide images. The tools we develop are open source and available to download. Conclusions: Pathology image algorithm validation and comparison are essential to iterative algorithm development and refinement. One critical component is the support for queries involving spatial predicates and comparisons. In our work, we develop an efficient data model and parallel database approach to model, normalize, manage and query large volumes of analytical image result data. 
Our experiments demonstrate that the data partitioning strategy and the grid-based indexing result in good data distribution across database nodes and reduce I/O overhead in spatial join queries through parallel retrieval of relevant data and quick subsetting of datasets. The set of tools in the framework provide a full pipeline to normalize, load, manage and query analytical results for algorithm evaluation. PMID:23599905
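
    One core operation in such validation, comparing an algorithm-generated boundary with a human annotation, can be sketched as follows (Python with the shapely library, on toy polygons); the platform described above performs comparable overlap computations as spatial SQL queries inside the parallel database rather than in application code.

      # Minimal sketch: spatial agreement between an algorithm boundary and a
      # human annotation, using toy rectangles in place of real nucleus outlines.
      from shapely.geometry import Polygon

      algorithm = Polygon([(0, 0), (10, 0), (10, 10), (0, 10)])     # segmented region
      annotation = Polygon([(2, 2), (12, 2), (12, 12), (2, 12)])    # human-drawn region

      inter = algorithm.intersection(annotation).area
      union = algorithm.union(annotation).area
      jaccard = inter / union if union else 0.0
      dice = 2 * inter / (algorithm.area + annotation.area)

      print(f"Jaccard overlap: {jaccard:.2f}, Dice coefficient: {dice:.2f}")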

  5. An Introduction to MAMA (Meta-Analysis of MicroArray data) System.

    PubMed

    Zhang, Zhe; Fenstermacher, David

    2005-01-01

    Analyzing microarray data across multiple experiments has been proven advantageous. To support this kind of analysis, we are developing a software system called MAMA (Meta-Analysis of MicroArray data). MAMA utilizes a client-server architecture with a relational database on the server-side for the storage of microarray datasets collected from various resources. The client-side is an application running on the end user's computer that allows the user to manipulate microarray data and analytical results locally. MAMA implementation will integrate several analytical methods, including meta-analysis within an open-source framework offering other developers the flexibility to plug in additional statistical algorithms.

  6. Exploratory visualization of astronomical data on ultra-high-resolution wall displays

    NASA Astrophysics Data System (ADS)

    Pietriga, Emmanuel; del Campo, Fernando; Ibsen, Amanda; Primet, Romain; Appert, Caroline; Chapuis, Olivier; Hempel, Maren; Muñoz, Roberto; Eyheramendy, Susana; Jordan, Andres; Dole, Hervé

    2016-07-01

    Ultra-high-resolution wall displays feature a very high pixel density over a large physical surface, which makes them well-suited to the collaborative, exploratory visualization of large datasets. We introduce FITS-OW, an application designed for such wall displays, that enables astronomers to navigate in large collections of FITS images, query astronomical databases, and display detailed, complementary data and documents about multiple sources simultaneously. We describe how astronomers interact with their data using both the wall's touch-sensitive surface and handheld devices. We also report on the technical challenges we addressed in terms of distributed graphics rendering and data sharing over the computer clusters that drive wall displays.

  7. Boosting probabilistic graphical model inference by incorporating prior knowledge from multiple sources.

    PubMed

    Praveen, Paurush; Fröhlich, Holger

    2013-01-01

    Inferring regulatory networks from experimental data via probabilistic graphical models is a popular framework to gain insights into biological systems. However, the inherent noise in experimental data coupled with a limited sample size reduces the performance of network reverse engineering. Prior knowledge from existing sources of biological information can address this low signal to noise problem by biasing the network inference towards biologically plausible network structures. Although integrating various sources of information is desirable, their heterogeneous nature makes this task challenging. We propose two computational methods to incorporate various information sources into a probabilistic consensus structure prior to be used in graphical model inference. Our first model, called Latent Factor Model (LFM), assumes a high degree of correlation among external information sources and reconstructs a hidden variable as a common source in a Bayesian manner. The second model, a Noisy-OR, picks up the strongest support for an interaction among information sources in a probabilistic fashion. Our extensive computational studies on KEGG signaling pathways as well as on gene expression data from breast cancer and yeast heat shock response reveal that both approaches can significantly enhance the reconstruction accuracy of Bayesian Networks compared to other competing methods as well as to the situation without any prior. Our framework allows for using diverse information sources, like pathway databases, GO terms and protein domain data, etc. and is flexible enough to integrate new sources, if available.
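
    A minimal sketch of the Noisy-OR combination idea, assuming hypothetical per-source confidence scores for one candidate interaction (this is not the authors' implementation), is shown below.

      # Noisy-OR combination of prior-knowledge sources: each source reports a
      # probability that a regulatory edge exists, and the combined prior is
      # 1 - product(1 - p), so one strongly supporting source is enough to raise it.
      from functools import reduce

      def noisy_or(source_probs):
          return 1.0 - reduce(lambda acc, p: acc * (1.0 - p), source_probs, 1.0)

      # Hypothetical support values from a pathway database, GO co-annotation,
      # and protein-domain data for one candidate edge.
      edge_support = {"pathway_db": 0.7, "go_coannotation": 0.4, "protein_domain": 0.2}
      print(noisy_or(edge_support.values()))   # combined prior, here 0.856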

  8. Aggregating today's data for tomorrow's science: a geological use case

    NASA Astrophysics Data System (ADS)

    Glaves, H.; Kingdon, A.; Nayembil, M.; Baker, G.

    2016-12-01

    Geoscience data is made up of diverse and complex smaller datasets that, when aggregated together, build towards what is recognised as 'big data'. The British Geological Survey (BGS), which acts as a repository for all subsurface data from the United Kingdom, has been collating these disparate small datasets, accumulated from the activities of a large number of geoscientists over many years. Recently this picture has been further complicated by the addition of new data sources such as near real-time sensor data, and industry or community data that is increasingly delivered via automatic donations. Many of these datasets have been aggregated in relational databases to form larger ones that are used to address a variety of issues ranging from development of national infrastructure to disaster response. These complex domain-specific SQL databases deliver effective data management using normalised subject-based database designs in a secure environment. However, the isolated subject-oriented design of these systems inhibits efficient cross-domain querying of the datasets. Additionally, the tools provided often do not enable effective data discovery as they have problems resolving the complex underlying normalised structures. Recent requirements to understand sub-surface geology in three dimensions have led BGS to develop new data systems. One such solution is PropBase, which delivers a generic denormalised data structure within an RDBMS to store geological property data. PropBase facilitates rapid and standardised data discovery and access, incorporating 2D and 3D physical and chemical property data, including associated metadata. It also provides a dedicated web interface to deliver complex multiple data sets from a single database in standardised common output formats (e.g. CSV, GIS shape files) without the need for complex data conditioning. PropBase facilitates new scientific research, previously considered impractical, by enabling property data searches across multiple databases. Using the PropBase exemplar, this presentation will seek to illustrate how BGS has developed systems for aggregating 'small datasets' to create the 'big data' necessary for the data analytics, mining, processing and visualisation needed for future geoscientific research.
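
    The flavour of the generic, denormalised property structure described above can be conveyed with a single narrow table that lets very different datasets be queried together; the schema and rows below (Python with sqlite3) are purely illustrative and are not the actual PropBase design.

      # Minimal sketch: one denormalised "property observation" table spanning domains.
      import sqlite3

      con = sqlite3.connect(":memory:")
      con.execute("""CREATE TABLE property_obs (
          source   TEXT,   -- originating dataset
          x REAL, y REAL, z REAL,
          property TEXT,   -- measured quantity
          value    REAL,
          units    TEXT)""")
      con.executemany("INSERT INTO property_obs VALUES (?,?,?,?,?,?,?)", [
          ("borehole_geochem", 451200, 320150, -35.0, "arsenic",     12.4, "mg/kg"),
          ("sensor_network",   451900, 320400, -10.0, "temperature", 11.2, "degC"),
          ("borehole_geochem", 452300, 320900, -50.0, "arsenic",      3.1, "mg/kg"),
      ])
      # A single spatial-range query spans both source datasets without custom joins.
      for row in con.execute("""SELECT source, property, value, units FROM property_obs
                                WHERE x BETWEEN 451000 AND 452000"""):
          print(row)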

  9. Rapid development of entity-based data models for bioinformatics with persistence object-oriented design and structured interfaces.

    PubMed

    Ezra Tsur, Elishai

    2017-01-01

    Databases are imperative for research in bioinformatics and computational biology. Current challenges in database design include data heterogeneity and context-dependent interconnections between data entities. These challenges drove the development of unified data interfaces and specialized databases. The curation of specialized databases is an ever-growing challenge due to the introduction of new data sources and the emergence of new relational connections between established datasets. Here, an open-source framework for the curation of specialized databases is proposed. The framework supports user-designed models of data encapsulation, object persistency and structured interfaces to local and external data sources such as MalaCards, Biomodels and the National Centre for Biotechnology Information (NCBI) databases. The proposed framework was implemented using Java as the development environment, EclipseLink as the data persistency agent and Apache Derby as the database manager. Syntactic analysis was based on the J3D, jsoup, Apache Commons and w3c.dom open libraries. Finally, the construction of a specialized database for aneurysm-associated vascular diseases is demonstrated. This database contains 3-dimensional geometries of aneurysms, patients' clinical information, articles, biological models, related diseases and our recently published model of aneurysms' risk of rupture. The framework is available at http://nbel-lab.com.

  10. New Zealand's National Landslide Database

    NASA Astrophysics Data System (ADS)

    Rosser, B.; Dellow, S.; Haubrook, S.; Glassey, P.

    2016-12-01

    Since 1780, landslides have caused an average of about 3 deaths a year in New Zealand and have cost the economy an average of at least NZ$250M/a (0.1% GDP). To understand the risk posed by landslide hazards to society, a thorough knowledge of where, when and why different types of landslides occur is vital. The main objective for establishing the database was to provide a centralised, national-scale, publicly available database to collate landslide information that could be used for landslide hazard and risk assessment. Design of a national landslide database for New Zealand required consideration of both existing landslide data, stored in a variety of digital formats, and future data yet to be collected. Pre-existing databases were developed and populated with data reflecting the needs of the landslide or hazard project, and the database structures of the time. Bringing these data into a single unified database required a new structure capable of storing and delivering data at a variety of scales and accuracy and with different attributes. A "unified data model" was developed to enable the database to hold old and new landslide data irrespective of scale and method of capture. The database contains information on landslide locations and, where available: 1) the timing of landslides and the events that may have triggered them; 2) the type of landslide movement; 3) the volume and area; 4) the source and debris tail; and 5) the impacts caused by the landslide. Information from a variety of sources including aerial photographs (and other remotely sensed data), field reconnaissance and media accounts has been collated and is presented for each landslide along with metadata describing the data sources and quality. There are currently nearly 19,000 landslide records in the database that include point locations, polygons of landslide source and deposit areas, and linear features. Several large datasets are awaiting upload, which will bring the total number of landslides to over 100,000. The geo-spatial database is publicly available via the Internet. Software components, including the underlying database (PostGIS), Web Map Server (GeoServer) and web application, use open-source software. The hope is that others will add relevant information to the database as well as download the data contained in it.

  11. SZDB: A Database for Schizophrenia Genetic Research

    PubMed Central

    Wu, Yong; Yao, Yong-Gang

    2017-01-01

    Abstract Schizophrenia (SZ) is a debilitating brain disorder with a complex genetic architecture. Genetic studies, especially recent genome-wide association studies (GWAS), have identified multiple variants (loci) conferring risk to SZ. However, how to efficiently extract meaningful biological information from bulk genetic findings of SZ remains a major challenge. There is a pressing need to integrate multiple layers of data from various sources, eg, genetic findings from GWAS, copy number variations (CNVs), association and linkage studies, gene expression, protein–protein interaction (PPI), co-expression, expression quantitative trait loci (eQTL), and Encyclopedia of DNA Elements (ENCODE) data, to provide a comprehensive resource to facilitate the translation of genetic findings into SZ molecular diagnosis and mechanism study. Here we developed the SZDB database (http://www.szdb.org/), a comprehensive resource for SZ research. SZ genetic data, gene expression data, network-based data, brain eQTL data, and SNP function annotation information were systematically extracted, curated and deposited in SZDB. In-depth analyses and systematic integration were performed to identify top prioritized SZ genes and enriched pathways. Multiple types of data from various layers of SZ research were systematically integrated and deposited in SZDB. In-depth data analyses and integration identified top prioritized SZ genes and enriched pathways. We further showed that genes implicated in SZ are highly co-expressed in human brain and proteins encoded by the prioritized SZ risk genes are significantly interacted. The user-friendly SZDB provides high-confidence candidate variants and genes for further functional characterization. More important, SZDB provides convenient online tools for data search and browse, data integration, and customized data analyses. PMID:27451428

  12. Web Application Software for Ground Operations Planning Database (GOPDb) Management

    NASA Technical Reports Server (NTRS)

    Lanham, Clifton; Kallner, Shawn; Gernand, Jeffrey

    2013-01-01

    A Web application facilitates collaborative development of the ground operations planning document. This will reduce costs and development time for new programs by incorporating the data governance, access control, and revision tracking of the ground operations planning data. Ground Operations Planning requires the creation and maintenance of detailed timelines and documentation. The GOPDb Web application was created using state-of-the-art Web 2.0 technologies, and was deployed as SaaS (Software as a Service), with an emphasis on data governance and security needs. Application access is managed using two-factor authentication, with data write permissions tied to user roles and responsibilities. Multiple instances of the application can be deployed on a Web server to meet the robust needs for multiple, future programs with minimal additional cost. This innovation features high availability and scalability, with no additional software that needs to be bought or installed. For data governance and security (data quality, management, business process management, and risk management for data handling), the software uses NAMS. No local copy/cloning of data is permitted. Data change log/tracking is addressed, as well as collaboration, work flow, and process standardization. The software provides on-line documentation and detailed Web-based help. There are multiple ways that this software can be deployed on a Web server to meet ground operations planning needs for future programs. The software could be used to support commercial crew ground operations planning, as well as commercial payload/satellite ground operations planning. The application source code and database schema are owned by NASA.

  13. ChEMBL web services: streamlining access to drug discovery data and utilities.

    PubMed

    Davies, Mark; Nowotka, Michał; Papadatos, George; Dedman, Nathan; Gaulton, Anna; Atkinson, Francis; Bellis, Louisa; Overington, John P

    2015-07-01

    ChEMBL is now a well-established resource in the fields of drug discovery and medicinal chemistry research. The ChEMBL database curates and stores standardized bioactivity, molecule, target and drug data extracted from multiple sources, including the primary medicinal chemistry literature. Programmatic access to ChEMBL data has been improved by a recent update to the ChEMBL web services (version 2.0.x, https://www.ebi.ac.uk/chembl/api/data/docs), which exposes significantly more data from the underlying database and introduces new functionality. To complement the data-focused services, a utility service (version 1.0.x, https://www.ebi.ac.uk/chembl/api/utils/docs), which provides RESTful access to commonly used cheminformatics methods, has also been concurrently developed. The ChEMBL web services can be used together or independently to build applications and data processing workflows relevant to drug discovery and chemical biology. © The Author(s) 2015. Published by Oxford University Press on behalf of Nucleic Acids Research.
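
    As an example of the programmatic access described above, the sketch below (Python with the requests library) retrieves a single molecule record from the data web services; the endpoint path and response fields shown are assumptions to be checked against the live documentation URL quoted in the abstract.

      # Minimal sketch: fetch one molecule record from the ChEMBL data web services.
      # Endpoint path and field names are assumptions; verify against the docs.
      import requests

      url = "https://www.ebi.ac.uk/chembl/api/data/molecule/CHEMBL25.json"
      resp = requests.get(url, timeout=30)
      resp.raise_for_status()
      mol = resp.json()
      print(mol.get("pref_name"))                                       # preferred name
      print((mol.get("molecule_structures") or {}).get("canonical_smiles"))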

  14. Paradise: A Parallel Information System for EOSDIS

    NASA Technical Reports Server (NTRS)

    DeWitt, David

    1996-01-01

    The Paradise project was begun in 1993 in order to explore the application of the parallel and object-oriented database system technology developed as part of the Gamma, Exodus, and Shore projects to the design and development of a scalable, geo-spatial database system for storing both massive spatial and satellite image data sets. Paradise is based on an object-relational data model. In addition to the standard attribute types such as integers, floats, strings and time, Paradise also provides a set of spatial and multimedia data types, designed to facilitate the storage and querying of complex spatial and multimedia data sets. An individual tuple can contain any combination of this rich set of data types. For example, in the EOSDIS context, a tuple might mix terrain and map data for an area along with the latest satellite weather photo of the area. The use of a geo-spatial metaphor simplifies the task of fusing disparate forms of data from multiple data sources including text, image, map, and video data sets.

  15. The Development of Ontology from Multiple Databases

    NASA Astrophysics Data System (ADS)

    Kasim, Shahreen; Aswa Omar, Nurul; Fudzee, Mohd Farhan Md; Azhar Ramli, Azizul; Aizi Salamat, Mohamad; Mahdin, Hairulnizam

    2017-08-01

    The halal industry is the fastest-growing global business across the world. The halal food industry is thus crucial for Muslims all over the world, as it serves to ensure them that the food items they consume daily are syariah compliant. Currently, ontologies are widely used in computer science areas such as heterogeneous information processing on the web, the semantic web, and information retrieval. However, ontology has still not been used widely in the halal industry. Today, the Muslim community still has problems verifying the halal status of products in the market, especially foods containing E numbers. This research tried to solve the problem of validating halal status from various halal sources. Various chemical ontologies from multiple databases were found to help this ontology development. The E numbers in these chemical ontologies are codes for chemicals that can be used as food additives. With this E-number ontology, the Muslim community could effectively identify and verify the halal status of products in the market.

  16. A systematic review of administrative and clinical databases of infants admitted to neonatal units.

    PubMed

    Statnikov, Yevgeniy; Ibrahim, Buthaina; Modi, Neena

    2017-05-01

    High quality information, increasingly captured in clinical databases, is a useful resource for evaluating and improving newborn care. We conducted a systematic review to identify neonatal databases and define their characteristics. We followed a preregistered protocol using MeSH terms to search MEDLINE, EMBASE, CINAHL, Web of Science and OVID Maternity and Infant Care Databases for articles identifying patient-level databases covering more than one neonatal unit. Full-text articles were reviewed and information extracted on geographical coverage, criteria for inclusion, data source, and maternal and infant characteristics. We identified 82 databases from 2037 publications. Of the country-specific databases, 39 were regional and 39 national. Sixty databases restricted entries to neonatal unit admissions by birth characteristic or insurance cover; 22 had no restrictions. Data were captured specifically for 53 databases; 21 drew on administrative sources and 8 on clinical sources. Two clinical databases hold the largest range of data on patient characteristics: the USA's Pediatrix BabySteps Clinical Data Warehouse and the UK's National Neonatal Research Database. A number of neonatal databases exist that have the potential to contribute to evaluating neonatal care. The majority are created by entering data specifically for the database, duplicating information likely already captured in other administrative and clinical patient records. This repetitive data entry represents an unnecessary burden in an environment where electronic patient records are increasingly used. Standardisation of data items is necessary to facilitate linkage within and between countries. Published by the BMJ Publishing Group Limited. For permission to use (where not already granted under a licence) please go to http://www.bmj.com/company/products-services/rights-and-licensing/.

  17. Particulate Matter and Ozone Prediction and Source Attribution for U.S. Air Quality Management in a Changing World

    NASA Astrophysics Data System (ADS)

    Sanyal, S.; Wuebbles, D. J.

    2017-12-01

    In this study, the focus is on how global changes in climate and emissions will affect U.S. air quality, especially fine particulate matter and ozone, projecting their future trends and quantifying key source attribution. We are conducting three primary experiments: (1) historical simulations for the period 1994-2013 to establish the credibility of the system and refine process-level understanding of U.S. regional air quality; (2) projections for the period 2041-2060 to quantify individual and combined impacts of global climate and emissions changes under multiple scenarios; (3) sensitivity analyses to determine future changes in pollution sources and their relative contributions from anthropogenic and natural emissions, long-range pollutant transport, and climate change effects. Here we will present results from the first experiment with the global model CESM1.2 (with fully coupled chemistry using CAM-chem5) driven by NASA Modern-Era Retrospective analysis for Research and Applications (MERRA) reanalysis data at 0.9° x 1.25° resolution. We will present the comparison between the results from the model simulation and observation data from the EPA database. Since there is always a challenge in comparing gridded predictions from model data with point data from observation databases, because the model simulations calculate the average outcome over a grid for a given set of conditions while the stochastic component (e.g. sub-grid variations) embedded in the observations is not accounted for, we are using extensive statistical measures to do the comparison. We will also determine relative contributions from multiscale (local, regional, global) processes, major source regions (Mexico, Canada, Asia, Africa) and types (natural, anthropogenic) and associated uncertainties (climate decadal oscillations/interannual variations, emissions and model structure errors).

  18. Challenges and Experiences of Building Multidisciplinary Datasets across Cultures

    NASA Astrophysics Data System (ADS)

    Jamiyansharav, K.; Laituri, M.; Fernandez-Gimenez, M.; Fassnacht, S. R.; Venable, N. B. H.; Allegretti, A. M.; Reid, R.; Baival, B.; Jamsranjav, C.; Ulambayar, T.; Linn, S.; Angerer, J.

    2017-12-01

    Efficient data sharing and management are key challenges to multidisciplinary scientific research. These challenges are further complicated by adding a multicultural component. We address the construction of a complex database for social-ecological analysis in Mongolia. Funded by the National Science Foundation (NSF) Dynamics of Coupled Natural and Human (CNH) Systems, the Mongolian Rangelands and Resilience (MOR2) project focuses on the vulnerability of Mongolian pastoral systems to climate change and adaptive capacity. The MOR2 study spans over three years of fieldwork in 36 paired districts (Soum) from 18 provinces (Aimag) of Mongolia that covers steppe, mountain forest steppe, desert steppe and eastern steppe ecological zones. Our project team is composed of hydrologists, social scientists, geographers, and ecologists. The MOR2 database includes multiple ecological, social, meteorological, geospatial and hydrological datasets, as well as archives of original data and survey in multiple formats. Managing this complex database requires significant organizational skills, attention to detail and ability to communicate within collective team members from diverse disciplines and across multiple institutions in the US and Mongolia. We describe the database's rich content, organization, structure and complexity. We discuss lessons learned, best practices and recommendations for complex database management, sharing, and archiving in creating a cross-cultural and multi-disciplinary database.

  19. Network-based drug discovery by integrating systems biology and computational technologies

    PubMed Central

    Leung, Elaine L.; Cao, Zhi-Wei; Jiang, Zhi-Hong; Zhou, Hua

    2013-01-01

    Network-based intervention has been a trend in curing systemic diseases, but it relies on regimen optimization and valid multi-target actions of the drugs. The complex multi-component nature of medicinal herbs may serve as a valuable resource for network-based multi-target drug discovery due to its potential treatment effects by synergy. Recently, robust multiple systems biology platforms have proven powerful in uncovering molecular mechanisms and connections between drugs and their targeted dynamic networks. However, optimization methods for drug combinations are insufficient, owing to the lack of tighter integration across multiple ‘-omics’ databases. Newly developed algorithm- or network-based computational models can tightly integrate ‘-omics’ databases and optimize combinational regimens of drug development, which encourages developing medicinal herbs into a new wave of network-based multi-target drugs. However, challenges remain for further integration of medicinal herb databases with multiple systems biology platforms for multi-target drug optimization, owing to the uncertain reliability of individual data sets and the limited breadth, depth, and degree of standardization of herbal medicine. Standardization of the methodology and terminology of multiple systems biology and herbal databases would facilitate this integration. Enhancing publicly accessible databases and increasing the number of studies that use systems biology platforms on herbal medicine would be helpful. Further integration across various ‘-omics’ platforms and computational tools would accelerate the development of network-based drug discovery and network medicine. PMID:22877768

  20. SPARQL-enabled identifier conversion with Identifiers.org

    PubMed Central

    Wimalaratne, Sarala M.; Bolleman, Jerven; Juty, Nick; Katayama, Toshiaki; Dumontier, Michel; Redaschi, Nicole; Le Novère, Nicolas; Hermjakob, Henning; Laibe, Camille

    2015-01-01

    Motivation: On the semantic web, in life sciences in particular, data is often distributed via multiple resources. Each of these sources is likely to use its own International Resource Identifier for conceptually the same resource or database record. The lack of correspondence between identifiers introduces a barrier when executing federated SPARQL queries across life science data. Results: We introduce a novel SPARQL-based service to enable on-the-fly integration of life science data. This service uses the identifier patterns defined in the Identifiers.org Registry to generate a plurality of identifier variants, which can then be used to match source identifiers with target identifiers. We demonstrate the utility of this identifier integration approach by answering queries across major producers of life science Linked Data. Availability and implementation: The SPARQL-based identifier conversion service is available without restriction at http://identifiers.org/services/sparql. Contact: sarala@ebi.ac.uk PMID:25638809
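
    The service can be queried with any SPARQL client. Below is a minimal sketch in Python, assuming only that the endpoint accepts standard SPARQL-protocol GET requests and returns JSON results; the example identifier URI and the query pattern are illustrative, not the service's documented data model.

    ```python
    # Hedged sketch: ask the identifiers.org SPARQL endpoint for statements about
    # one identifier URI. The URI and query shape are illustrative only.
    import requests

    ENDPOINT = "http://identifiers.org/services/sparql"  # from the abstract above

    QUERY = """
    SELECT ?p ?variant WHERE {
      <http://identifiers.org/uniprot/P01308> ?p ?variant .
    } LIMIT 50
    """

    resp = requests.get(
        ENDPOINT,
        params={"query": QUERY},
        headers={"Accept": "application/sparql-results+json"},
        timeout=30,
    )
    resp.raise_for_status()
    for row in resp.json()["results"]["bindings"]:
        print(row["p"]["value"], row["variant"]["value"])
    ```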

  1. Developing Surveillance Methodology for Agricultural and Logging Injury in New Hampshire Using Electronic Administrative Data Sets.

    PubMed

    Scott, Erika E; Hirabayashi, Liane; Krupa, Nicole L; Sorensen, Julie A; Jenkins, Paul L

    2015-08-01

    Agriculture and logging rank among industries with the highest rates of occupational fatality and injury. Establishing a nonfatal injury surveillance system is a top priority in the National Occupational Research Agenda. Sources of data such as patient care reports (PCRs) and hospitalization data have recently transitioned to electronic databases. Using narrative and location codes from PCRs, along with International Classification of Diseases, 9th Revision, external cause of injury codes (E-codes) in hospital data, researchers are designing a surveillance system to track farm and logging injury. A total of 357 true agricultural or logging cases were identified. These data indicate that it is possible to identify agricultural and logging injury events in PCR and hospital data. Multiple data sources increase catchment; nevertheless, limitations in methods of identification of agricultural and logging injury contribute to the likely undercount of injury events.
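
    As a hedged illustration of the case-identification step described above, the sketch below filters a flat file of hospital discharge records on external-cause-of-injury code prefixes; the column name and code list are hypothetical placeholders, not the study's actual case definition.

    ```python
    # Hypothetical illustration of E-code screening; codes and columns are placeholders.
    import pandas as pd

    CANDIDATE_PREFIXES = ("E919.0", "E849.2")  # example agricultural-machinery / farm-location codes

    records = pd.read_csv("hospital_discharges.csv", dtype=str)
    mask = records["ecode"].fillna("").str.startswith(CANDIDATE_PREFIXES)
    candidates = records[mask]
    print(f"{len(candidates)} candidate agricultural/logging injury records")
    ```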

  2. ScanRanker: Quality Assessment of Tandem Mass Spectra via Sequence Tagging

    PubMed Central

    Ma, Ze-Qiang; Chambers, Matthew C.; Ham, Amy-Joan L.; Cheek, Kristin L.; Whitwell, Corbin W.; Aerni, Hans-Rudolf; Schilling, Birgit; Miller, Aaron W.; Caprioli, Richard M.; Tabb, David L.

    2011-01-01

    In shotgun proteomics, protein identification by tandem mass spectrometry relies on bioinformatics tools. Despite recent improvements in identification algorithms, a significant number of high quality spectra remain unidentified for various reasons. Here we present ScanRanker, an open-source tool that evaluates the quality of tandem mass spectra via sequence tagging with reliable performance in data from different instruments. The superior performance of ScanRanker enables it not only to find unassigned high quality spectra that evade identification through database search, but also to select spectra for de novo sequencing and cross-linking analysis. In addition, we demonstrate that the distribution of ScanRanker scores predicts the richness of identifiable spectra among multiple LC-MS/MS runs in an experiment, and ScanRanker scores assist the process of peptide assignment validation to increase confident spectrum identifications. The source code and executable versions of ScanRanker are available from http://fenchurch.mc.vanderbilt.edu. PMID:21520941

  3. SPARQL-enabled identifier conversion with Identifiers.org.

    PubMed

    Wimalaratne, Sarala M; Bolleman, Jerven; Juty, Nick; Katayama, Toshiaki; Dumontier, Michel; Redaschi, Nicole; Le Novère, Nicolas; Hermjakob, Henning; Laibe, Camille

    2015-06-01

    On the semantic web, in life sciences in particular, data is often distributed via multiple resources. Each of these sources is likely to use its own International Resource Identifier for conceptually the same resource or database record. The lack of correspondence between identifiers introduces a barrier when executing federated SPARQL queries across life science data. We introduce a novel SPARQL-based service to enable on-the-fly integration of life science data. This service uses the identifier patterns defined in the Identifiers.org Registry to generate a plurality of identifier variants, which can then be used to match source identifiers with target identifiers. We demonstrate the utility of this identifier integration approach by answering queries across major producers of life science Linked Data. The SPARQL-based identifier conversion service is available without restriction at http://identifiers.org/services/sparql. © The Author 2015. Published by Oxford University Press.

  4. Designing a Clinical Data Warehouse Architecture to Support Quality Improvement Initiatives.

    PubMed

    Chelico, John D; Wilcox, Adam B; Vawdrey, David K; Kuperman, Gilad J

    2016-01-01

    Clinical data warehouses, initially directed towards clinical research or financial analyses, are evolving to support quality improvement efforts, and must now address the quality improvement life cycle. In addition, data that are needed for quality improvement often do not reside in a single database, requiring easier methods to query data across multiple disparate sources. We created a virtual data warehouse at NewYork Presbyterian Hospital that allowed us to bring together data from several source systems throughout the organization. We also created a framework to match the maturity of a data request in the quality improvement life cycle to the proper tools needed for each request. As projects progress through the Define, Measure, Analyze, Improve, Control stages of quality improvement, resources are matched to the data needs at each step. We describe the analysis and design that created a robust model for applying clinical data warehousing to quality improvement.
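
    The "virtual" warehouse pattern amounts to querying each source system in place and joining results in application code. The following is a minimal sketch under assumed database files, tables and columns; it is not the hospital's actual implementation.

    ```python
    # Hedged sketch of a federated ("virtual warehouse") query across two source
    # systems; database files, tables and columns are hypothetical.
    import sqlite3
    import pandas as pd

    labs = sqlite3.connect("lab_system.db")
    adt = sqlite3.connect("adt_system.db")

    lab_df = pd.read_sql_query("SELECT patient_id, test, value FROM lab_results", labs)
    adt_df = pd.read_sql_query("SELECT patient_id, unit, admit_date FROM admissions", adt)

    # Combine across sources on the shared patient identifier.
    merged = lab_df.merge(adt_df, on="patient_id", how="inner")
    print(merged.head())
    ```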

  5. Designing a Clinical Data Warehouse Architecture to Support Quality Improvement Initiatives

    PubMed Central

    Chelico, John D.; Wilcox, Adam B.; Vawdrey, David K.; Kuperman, Gilad J.

    2016-01-01

    Clinical data warehouses, initially directed towards clinical research or financial analyses, are evolving to support quality improvement efforts, and must now address the quality improvement life cycle. In addition, data that are needed for quality improvement often do not reside in a single database, requiring easier methods to query data across multiple disparate sources. We created a virtual data warehouse at NewYork Presbyterian Hospital that allowed us to bring together data from several source systems throughout the organization. We also created a framework to match the maturity of a data request in the quality improvement life cycle to the proper tools needed for each request. As projects progress through the Define, Measure, Analyze, Improve, Control stages of quality improvement, resources are matched to the data needs at each step. We describe the analysis and design that created a robust model for applying clinical data warehousing to quality improvement. PMID:28269833

  6. cMapper: gene-centric connectivity mapper for EBI-RDF platform.

    PubMed

    Shoaib, Muhammad; Ansari, Adnan Ahmad; Ahn, Sung-Min

    2017-01-15

    In this era of biological big data, data integration has become a common task and a challenge for biologists. The Resource Description Framework (RDF) was developed to enable interoperability of heterogeneous datasets. The EBI-RDF platform enables efficient data integration of six independent biological databases using RDF technologies and shared ontologies. However, to take advantage of this platform, biologists need to be familiar with RDF technologies and the SPARQL query language. To overcome this practical limitation of the EBI-RDF platform, we developed cMapper, a web-based tool that enables biologists to search the EBI-RDF databases in a gene-centric manner without a thorough knowledge of RDF and SPARQL. cMapper allows biologists to search data entities in the EBI-RDF platform that are connected to genes or small molecules of interest in multiple biological contexts. The input to cMapper consists of a set of genes or small molecules, and the output consists of data entities in the six independent EBI-RDF databases connected with the given genes or small molecules in the user's query. cMapper provides output to users in the form of a graph in which nodes represent data entities and edges represent connections between data entities and the input set of genes or small molecules. Furthermore, users can apply filters based on database, taxonomy, organ and pathways in order to focus on a core connectivity graph of their interest. Data entities from multiple databases are differentiated based on background colors. cMapper also enables users to investigate shared connections between genes or small molecules of interest. Users can view the output graph in a web browser or download it in either GraphML or JSON format. cMapper is available as a web application with an integrated MySQL database. The web application was developed using Java and deployed on a Tomcat server. We developed the user interface using HTML5, JQuery and the Cytoscape Graph API. cMapper can be accessed at http://cmapper.ewostech.net. Readers can download the development manual from http://cmapper.ewostech.net/docs/cMapperDocumentation.pdf. Source code is available at https://github.com/muhammadshoaib/cmapper. Contact: smahn@gachon.ac.kr. Supplementary information: Supplementary data are available at Bioinformatics online. © The Author 2016. Published by Oxford University Press. All rights reserved.

  7. Fast in-database cross-matching of high-cadence, high-density source lists with an up-to-date sky model

    NASA Astrophysics Data System (ADS)

    Scheers, B.; Bloemen, S.; Mühleisen, H.; Schellart, P.; van Elteren, A.; Kersten, M.; Groot, P. J.

    2018-04-01

    Coming high-cadence, wide-field optical telescopes will image hundreds of thousands of sources per minute. Besides inspecting the near real-time data streams for transient and variability events, the accumulated data archive is a rich laboratory for making complementary scientific discoveries. The goal of this work is to optimise column-oriented database techniques to enable the construction of a full-source and light-curve database for large-scale surveys that is accessible to the astronomical community. We adopted LOFAR's Transients Pipeline as the baseline and modified it to enable the processing of optical images that have much higher source densities. The pipeline adds new source lists to the archive database, while cross-matching them with the known catalogued sources in order to build a full light-curve archive. We investigated several techniques of indexing and partitioning the largest tables, allowing for faster positional source look-ups in the cross-matching algorithms. We monitored all query run times in long-term pipeline runs in which we processed a subset of IPHAS data with image source density peaks over 170,000 per field of view (500,000 deg^-2). Our analysis demonstrates that horizontal table partitions of one-degree declination width control the query run times. Using an index strategy in which the partitions are densely sorted by source declination yields a further improvement. Most queries run in sublinear time and a few (< 20%) run in linear time, because of dependencies on input source-list and result-set size. We observed that, for this logical database partitioning scheme, the limiting cadence the pipeline achieved when processing IPHAS data is 25 s.
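
    The partition-pruned positional look-up can be illustrated with a small, hedged sketch: restrict candidates to a narrow declination band (the partitioning and sort key) before applying the exact angular-distance test. Table and column names below are hypothetical, and the SQL is generic rather than the pipeline's actual schema.

    ```python
    # Hedged sketch of a declination-pruned cone search; schema is illustrative.
    import math
    import sqlite3

    src_ra, src_dec = 123.4560, 45.6780      # new source position (degrees)
    radius = 2.0 / 3600.0                    # 2 arcsec association radius

    con = sqlite3.connect("lightcurve_archive.db")
    candidates = con.execute(
        """
        SELECT id, ra, dec FROM catalogued_sources
        WHERE dec BETWEEN ? AND ?            -- prunes to one or two dec partitions
          AND ra  BETWEEN ? AND ?            -- coarse RA cut, widened by 1/cos(dec)
        """,
        (src_dec - radius, src_dec + radius,
         src_ra - radius / math.cos(math.radians(src_dec)),
         src_ra + radius / math.cos(math.radians(src_dec))),
    ).fetchall()

    # Exact small-angle separation test on the pruned candidate set.
    def sep_deg(ra1, dec1, ra2, dec2):
        dra = (ra1 - ra2) * math.cos(math.radians(0.5 * (dec1 + dec2)))
        return math.hypot(dra, dec1 - dec2)

    matches = [c for c in candidates if sep_deg(src_ra, src_dec, c[1], c[2]) <= radius]
    ```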

  8. Flow Analysis Tool White Paper

    NASA Technical Reports Server (NTRS)

    Boscia, Nichole K.

    2012-01-01

    Faster networks are continually being built to accommodate larger data transfers. While it is intuitive to think that implementing faster networks will result in higher throughput rates, this is often not the case. There are many elements involved in data transfer, many of which are beyond the scope of the network itself. Although networks may get bigger and support faster technologies, the presence of other legacy components, such as older application software or kernel parameters, can often cause bottlenecks. Engineers must be able to identify when data flows are reaching a bottleneck that is not imposed by the network and then troubleshoot it using the tools available to them. The current best practice is to collect as much information as possible on the network traffic flows so that analysis is quick and easy. Unfortunately, no single method of collecting this information can sufficiently capture the whole end-to-end picture. This becomes even more of a hurdle when large, multi-user systems are involved. In order to capture all the necessary information, multiple data sources are required. This paper presents a method for developing a flow analysis tool to effectively collect network flow data from multiple sources and provide that information to engineers in a clear, concise way for analysis. The purpose of this method is to collect enough information to quickly (and automatically) identify poorly performing flows along with the cause of the problem. The method involves the development of a set of database tables that can be populated with flow data from multiple sources, along with an easy-to-use, web-based front-end interface to help network engineers access, organize, analyze, and manage all the information.
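
    A minimal sketch of the underlying idea follows: normalize flow records from multiple collectors into one database table so that poorly performing flows can be flagged with a single query. The schema and the 10 Mbit/s threshold are illustrative assumptions, not the tool's actual design.

    ```python
    # Hedged sketch: one normalized flow table fed by multiple data sources,
    # plus a query that flags low-throughput flows. Schema/threshold are examples.
    import sqlite3

    con = sqlite3.connect("flows.db")
    con.execute("""
        CREATE TABLE IF NOT EXISTS flows (
            collector  TEXT,        -- e.g. router export vs. host-based capture
            src_host   TEXT,
            dst_host   TEXT,
            bytes      INTEGER,
            duration_s REAL
        )""")

    slow = con.execute("""
        SELECT src_host, dst_host, 8.0 * bytes / duration_s / 1e6 AS mbit_per_s
        FROM flows
        WHERE duration_s > 0
          AND 8.0 * bytes / duration_s / 1e6 < 10.0    -- illustrative threshold
        ORDER BY mbit_per_s
    """).fetchall()
    ```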

  9. California Fault Parameters for the National Seismic Hazard Maps and Working Group on California Earthquake Probabilities 2007

    USGS Publications Warehouse

    Wills, Chris J.; Weldon, Ray J.; Bryant, W.A.

    2008-01-01

    This report describes development of fault parameters for the 2007 update of the National Seismic Hazard Maps and the Working Group on California Earthquake Probabilities (WGCEP, 2007). These reference parameters are contained within a database intended to be a source of values for use by scientists interested in producing either seismic hazard or deformation models to better understand the current seismic hazards in California. These parameters include descriptions of the geometry and rates of movements of faults throughout the state. These values are intended to provide a starting point for development of more sophisticated deformation models which include known rates of movement on faults as well as geodetic measurements of crustal movement and the rates of movements of the tectonic plates. The values will be used in developing the next generation of the time-independent National Seismic Hazard Maps, and the time-dependent seismic hazard calculations being developed for the WGCEP. Due to the multiple uses of this information, development of these parameters has been coordinated between USGS, CGS and SCEC. SCEC provided the database development and editing tools, in consultation with USGS, Golden. This database has been implemented in Oracle and supports electronic access (e.g., for on-the-fly access). A GUI-based application has also been developed to aid in populating the database. Both the continually updated 'living' version of this database and any locked-down official releases (e.g., those used in a published model for calculating earthquake probabilities or seismic shaking hazards) are part of the USGS Quaternary Fault and Fold Database http://earthquake.usgs.gov/regional/qfaults/. CGS has been primarily responsible for updating and editing of the fault parameters, with extensive input from USGS and SCEC scientists.

  10. dbWFA: a web-based database for functional annotation of Triticum aestivum transcripts

    PubMed Central

    Vincent, Jonathan; Dai, Zhanwu; Ravel, Catherine; Choulet, Frédéric; Mouzeyar, Said; Bouzidi, M. Fouad; Agier, Marie; Martre, Pierre

    2013-01-01

    The functional annotation of genes based on sequence homology with genes from model species genomes is time-consuming because it is necessary to mine several unrelated databases. The aim of the present work was to develop a functional annotation database for common wheat Triticum aestivum (L.). The database, named dbWFA, is based on the reference NCBI UniGene set, an expressed gene catalogue built by expressed sequence tag clustering, and on full-length coding sequences retrieved from the TriFLDB database. Information from good-quality heterogeneous sources, including annotations for model plant species Arabidopsis thaliana (L.) Heynh. and Oryza sativa L., was gathered and linked to T. aestivum sequences through BLAST-based homology searches. Even though the complexity of the transcriptome cannot yet be fully appreciated, we developed a tool to easily and promptly obtain information from multiple functional annotation systems (Gene Ontology, MapMan bin codes, MIPS Functional Categories, PlantCyc pathway reactions and TAIR gene families). The use of dbWFA is illustrated here with several query examples. We were able to assign a putative function to 45% of the UniGenes and 81% of the full-length coding sequences from TriFLDB. Moreover, comparison of the annotation of the whole T. aestivum UniGene set along with curated annotations of the two model species assessed the accuracy of the annotation provided by dbWFA. To further illustrate the use of dbWFA, genes specifically expressed during the early cell division or late storage polymer accumulation phases of T. aestivum grain development were identified using a clustering analysis and then annotated using dbWFA. The annotation of these two sets of genes was consistent with previous analyses of T. aestivum grain transcriptomes and proteomes. Database URL: urgi.versailles.inra.fr/dbWFA/ PMID:23660284

  11. EpiCollect: linking smartphones to web applications for epidemiology, ecology and community data collection.

    PubMed

    Aanensen, David M; Huntley, Derek M; Feil, Edward J; al-Own, Fada'a; Spratt, Brian G

    2009-09-16

    Epidemiologists and ecologists often collect data in the field and, on returning to their laboratory, enter their data into a database for further analysis. The recent introduction of mobile phones that utilise the open source Android operating system, and which include (among other features) both GPS and Google Maps, provides new opportunities for developing mobile phone applications, which, in conjunction with web applications, allow two-way communication between field workers and their project databases. Here we describe a generic framework, consisting of mobile phone software, EpiCollect, and a web application located within www.spatialepidemiology.net. Data collected by multiple field workers can be submitted by phone, together with GPS data, to a common web database and can be displayed and analysed, along with previously collected data, using Google Maps (or Google Earth). Similarly, data from the web database can be requested and displayed on the mobile phone, again using Google Maps. Data filtering options allow the display of data submitted by individual field workers or, for example, those data within certain values of a measured variable or a time period. Data collection frameworks utilising mobile phones with data submission to and from central databases are widely applicable and can give field workers, on their mobile phones, display and analysis tools similar to those they would have when viewing the data in their laboratory via the web. We demonstrate their utility for epidemiological data collection and display, and briefly discuss their application in ecological and community data collection. Furthermore, such frameworks offer great potential for recruiting 'citizen scientists' to contribute data easily to central databases through their mobile phones.
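
    The two-way phone-to-database pattern described above reduces, at its core, to submitting GPS-tagged records over HTTP and querying them back for display. The sketch below shows only the submission half, with a hypothetical endpoint and field names; it is not EpiCollect's actual API.

    ```python
    # Hedged sketch of submitting one GPS-tagged field record to a project
    # web database; the URL and payload fields are hypothetical.
    import requests

    record = {
        "project": "demo_epi_survey",
        "collector_id": "fieldworker_07",
        "latitude": -1.2921,
        "longitude": 36.8219,
        "observation": "household visit, suspected case",
    }

    resp = requests.post("https://example.org/api/records", json=record, timeout=30)
    resp.raise_for_status()
    print("stored with id:", resp.json().get("id"))
    ```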

  12. The NASA Ames PAH IR Spectroscopic Database: Computational Version 3.00 with Updated Content and the Introduction of Multiple Scaling Factors

    NASA Astrophysics Data System (ADS)

    Bauschlicher, Charles W., Jr.; Ricca, A.; Boersma, C.; Allamandola, L. J.

    2018-02-01

    Version 3.00 of the library of computed spectra in the NASA Ames PAH IR Spectroscopic Database (PAHdb) is described. Version 3.00 introduces the use of multiple scale factors, instead of the single scaling factor used previously, to align the theoretical harmonic frequencies with the experimental fundamentals. The use of multiple scale factors permits the use of a variety of basis sets; this allows new PAH species to be included in the database, such as those containing oxygen, and yields an improved treatment of strained species and those containing nitrogen. In addition, the computed spectra of 2439 new PAH species have been added. The impact of these changes on the analysis of an astronomical spectrum through database-fitting is considered and compared with a fit using Version 2.00 of the library of computed spectra. Finally, astronomical constraints are defined for the PAH spectral libraries in PAHdb.
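
    The change from a single scale factor to multiple factors amounts to choosing the factor by vibrational-mode region before aligning computed harmonic frequencies with experimental fundamentals. The toy sketch below illustrates only that logic; the factor values and the band boundary are placeholders, not the values adopted in PAHdb.

    ```python
    # Hedged illustration of band-dependent frequency scaling; numbers are placeholders.
    SCALE_FACTORS = {"CH_stretch": 0.960, "other": 0.975}   # example values only
    CH_STRETCH_CUTOFF_CM1 = 2500.0                           # example band boundary

    def scale_frequency(nu_harmonic_cm1):
        band = "CH_stretch" if nu_harmonic_cm1 > CH_STRETCH_CUTOFF_CM1 else "other"
        return SCALE_FACTORS[band] * nu_harmonic_cm1

    print(scale_frequency(3100.0))   # C-H stretch region
    print(scale_frequency(1600.0))   # lower-frequency modes
    ```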

  13. Enhancing GADRAS Source Term Inputs for Creation of Synthetic Spectra.

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Horne, Steven M.; Harding, Lee

    The Gamma Detector Response and Analysis Software (GADRAS) team has enhanced the source term input for the creation of synthetic spectra. These enhancements include the following: allowing users to programmatically provide source information to GADRAS through memory, rather than through a string limited to 256 characters; allowing users to provide their own source decay database information; and updating the default GADRAS decay database to fix errors and include coincident gamma information.

  14. A Source-based Measurement Database for Occupational Exposure Assessment of Electromagnetic Fields in the INTEROCC Study: A Literature Review Approach

    PubMed Central

    Vila, Javier; Bowman, Joseph D.; Richardson, Lesley; Kincl, Laurel; Conover, Dave L.; McLean, Dave; Mann, Simon; Vecchia, Paolo; van Tongeren, Martie; Cardis, Elisabeth

    2016-01-01

    Introduction: To date, occupational exposure assessment of electromagnetic fields (EMF) has relied on occupation-based measurements and exposure estimates. However, misclassification due to between-worker variability remains an unsolved challenge. A source-based approach, supported by detailed subject data on determinants of exposure, may allow for a more individualized exposure assessment. Detailed information on the use of occupational sources of exposure to EMF was collected as part of the INTERPHONE-INTEROCC study. To support a source-based exposure assessment effort within this study, this work aimed to construct a measurement database for the occupational sources of EMF exposure identified, assembling available measurements from the scientific literature. Methods: First, a comprehensive literature search was performed for published and unpublished documents containing exposure measurements for the EMF sources identified, a priori as well as from answers of study subjects. Then, the measurements identified were assessed for quality and relevance to the study objectives. Finally, the measurements selected and complementary information were compiled into an Occupational Exposure Measurement Database (OEMD). Results: Currently, the OEMD contains 1624 sets of measurements (>3000 entries) for 285 sources of EMF exposure, organized by frequency band (0 Hz to 300 GHz) and dosimetry type. Ninety-five documents were selected from the literature (almost 35% of them are unpublished technical reports), containing measurements which were considered informative and valid for our purpose. Measurement data and complementary information collected from these documents came from 16 different countries and cover the time period between 1974 and 2013. Conclusion: We have constructed a database with measurements and complementary information for the most common sources of exposure to EMF in the workplace, based on the responses to the INTERPHONE-INTEROCC study questionnaire. This database covers the entire EMF frequency range and represents the most comprehensive resource of information on occupational EMF exposure. It is available at www.crealradiation.com/index.php/en/databases. PMID:26493616

  15. The Open Spectral Database: an open platform for sharing and searching spectral data.

    PubMed

    Chalk, Stuart J

    2016-01-01

    A number of websites make spectral data available for download (typically as JCAMP-DX text files), and one (ChemSpider) also allows users to contribute spectral files. As a result, searching for and retrieving such spectral data can be time-consuming, and the data can be difficult to reuse if they are compressed within the JCAMP-DX file. What is needed is a single resource that allows submission of JCAMP-DX files, export of the raw data in multiple formats, searching based on multiple chemical identifiers, and is open in terms of license and access. To address these issues a new online resource called the Open Spectral Database (OSDB) http://osdb.info/ has been developed and is now available. Built using open source tools, with open code (hosted on GitHub), providing open data, and open to community input about design and functionality, the OSDB allows anyone to submit spectral data, making it searchable and available to the scientific community. This paper details the concept and coding, internal architecture, export formats, Representational State Transfer (REST) Application Programming Interface and options for submission of data. The OSDB website went live in November 2015. Concurrently, the GitHub repository was made available at https://github.com/stuchalk/OSDB/, and is open for collaborators to join the project, submit issues, and contribute code. The combination of a scripting environment (PHPStorm), a PHP framework (CakePHP), a relational database (MySQL) and a code repository (GitHub) provides all the capabilities needed to develop REST-based websites for ingestion, curation and exposure of open chemical data to the community at all levels. It is hoped this software stack (or equivalent ones in other scripting languages) will be leveraged to make more chemical data available for both humans and computers.
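
    Because the OSDB exposes a REST API, retrieving a spectrum programmatically is a single HTTP request. The sketch below is a hedged illustration: the endpoint path, the query-by-InChIKey convention, and the response handling are assumptions for illustration, not the OSDB's documented routes.

    ```python
    # Hedged sketch of fetching spectral data from a REST endpoint by chemical
    # identifier; the route and response handling are hypothetical.
    import requests

    inchikey = "XLYOFNOQVPJJNP-UHFFFAOYSA-N"   # example identifier (water)
    resp = requests.get(f"http://osdb.info/api/spectra/{inchikey}", timeout=30)
    resp.raise_for_status()

    with open("spectrum.jdx", "wb") as fh:      # save the returned JCAMP-DX data
        fh.write(resp.content)
    ```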

  16. Online bibliographic sources in hydrology

    USGS Publications Warehouse

    Wild, Emily C.; Havener, W. Michael

    2001-01-01

    Traditional commercial bibliographic databases and indexes provide some access to hydrology materials produced by the government; however, these sources do not provide comprehensive coverage of relevant hydrologic publications. This paper discusses bibliographic information available from the federal government and state geological surveys, water resources agencies, and depositories. In addition to information in these databases, the paper describes the scope, styles of citing, subject terminology, and the ways these information sources are currently being searched, formally and informally, by hydrologists. Information available from the federal and state agencies and from the state depositories might be missed by limiting searches to commercially distributed databases.

  17. Validation of asthma recording in electronic health records: a systematic review

    PubMed Central

    Nissen, Francis; Quint, Jennifer K; Wilkinson, Samantha; Mullerova, Hana; Smeeth, Liam; Douglas, Ian J

    2017-01-01

    Objective To describe the methods used to validate asthma diagnoses in electronic health records and summarize the results of the validation studies. Background Electronic health records are increasingly being used for research on asthma to inform health services and health policy. Validation of the recording of asthma diagnoses in electronic health records is essential to use these databases for credible epidemiological asthma research. Methods We searched EMBASE and MEDLINE databases for studies that validated asthma diagnoses detected in electronic health records up to October 2016. Two reviewers independently assessed the full text against the predetermined inclusion criteria. Key data including author, year, data source, case definitions, reference standard, and validation statistics (including sensitivity, specificity, positive predictive value [PPV], and negative predictive value [NPV]) were summarized in two tables. Results Thirteen studies met the inclusion criteria. Most studies demonstrated a high validity using at least one case definition (PPV >80%). Ten studies used a manual validation as the reference standard; each had at least one case definition with a PPV of at least 63%, up to 100%. We also found two studies using a second independent database to validate asthma diagnoses. The PPVs of the best performing case definitions ranged from 46% to 58%. We found one study which used a questionnaire as the reference standard to validate a database case definition; the PPV of the case definition algorithm in this study was 89%. Conclusion Attaining high PPVs (>80%) is possible using each of the discussed validation methods. Identifying asthma cases in electronic health records is possible with high sensitivity, specificity or PPV, by combining multiple data sources, or by focusing on specific test measures. Studies testing a range of case definitions show wide variation in the validity of each definition, suggesting this may be important for obtaining asthma definitions with optimal validity. PMID:29238227
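
    For reference, the validation statistics summarized above are defined in the usual way from the two-by-two comparison of a case definition against the reference standard (true/false positives and negatives):

    ```latex
    \mathrm{PPV} = \frac{TP}{TP + FP}, \quad
    \mathrm{NPV} = \frac{TN}{TN + FN}, \quad
    \mathrm{sensitivity} = \frac{TP}{TP + FN}, \quad
    \mathrm{specificity} = \frac{TN}{TN + FP}.
    ```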

  18. Detecting and accounting for multiple sources of positional variance in peak list registration analysis and spin system grouping.

    PubMed

    Smelter, Andrey; Rouchka, Eric C; Moseley, Hunter N B

    2017-08-01

    Peak lists derived from nuclear magnetic resonance (NMR) spectra are commonly used as input data for a variety of computer-assisted and automated analyses. These include automated protein resonance assignment and protein structure calculation software tools. Prior to these analyses, peak lists must be aligned to each other and sets of related peaks must be grouped based on common chemical shift dimensions. Even when programs can perform peak grouping, they require the user to provide uniform match tolerances or use default values. However, peak grouping is further complicated by multiple sources of variance in peak position, limiting the effectiveness of grouping methods that utilize uniform match tolerances. In addition, no method currently exists for deriving peak positional variances from single peak lists for grouping peaks into spin systems, i.e. spin system grouping within a single peak list. Therefore, we developed a complementary pair of peak list registration analysis and spin system grouping algorithms designed to overcome these limitations. We have implemented these algorithms into an approach that can identify multiple dimension-specific positional variances that exist in a single peak list and group peaks from a single peak list into spin systems. The resulting software tools generate a variety of useful statistics on both a single peak list and pairwise peak list alignment, especially for quality assessment of peak list datasets. We used a range of low and high quality experimental solution NMR and solid-state NMR peak lists to assess performance of our registration analysis and grouping algorithms. Analyses show that an algorithm using a single-iteration, uniform-match-tolerance approach is only able to recover 50 to 80% of the spin systems due to the presence of multiple sources of variance. Our algorithm recovers additional spin systems by reevaluating match tolerances in multiple iterations. To facilitate evaluation of the algorithms, we developed a peak list simulator within our nmrstarlib package that generates user-defined assigned peak lists from a given BMRB entry or database of entries. In addition, over 100,000 simulated peak lists with one or two sources of variance were generated to evaluate the performance and robustness of these new registration analysis and peak grouping algorithms.
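
    As a hedged sketch of the grouping step discussed above, the function below clusters peaks whose shifts in one chosen dimension agree within a fixed tolerance; the published algorithms' registration analysis and iterative, dimension-specific tolerance re-estimation are not reproduced here.

    ```python
    # Hedged sketch: single-pass grouping of peaks by one chemical-shift dimension.
    def group_peaks(peaks, dim=0, tol=0.05):
        """peaks: list of tuples of chemical shifts (ppm); groups on dimension `dim`."""
        groups = []
        for peak in sorted(peaks, key=lambda p: p[dim]):
            if groups and abs(peak[dim] - groups[-1][-1][dim]) <= tol:
                groups[-1].append(peak)        # within tolerance of the previous peak
            else:
                groups.append([peak])          # start a new spin-system candidate
        return groups

    toy_peaks = [(118.20, 8.31), (118.23, 8.30), (121.90, 7.85)]   # toy (15N, 1H) ppm
    print(group_peaks(toy_peaks, dim=0, tol=0.10))
    ```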

  19. Dietary intake and main sources of plant lignans in five European countries

    PubMed Central

    Tetens, Inge; Turrini, Aida; Tapanainen, Heli; Christensen, Tue; Lampe, Johanna W.; Fagt, Sisse; Håkansson, Niclas; Lundquist, Annamari; Hallund, Jesper; Valsta, Liisa M.

    2013-01-01

    Background Dietary intakes of plant lignans have been hypothesized to be inversely associated with the risk of developing cardiovascular disease and cancer. Earlier studies were based on a Finnish lignan database (Fineli®) with two lignan precursors, secoisolariciresinol (SECO) and matairesinol (MAT). More recently, a Dutch database, including SECO and MAT and the newly recognized lignan precursors lariciresinol (LARI) and pinoresinol (PINO), was compiled. The objective was to re-estimate and re-evaluate plant lignan intakes and to identify the main sources of plant lignans in five European countries using the Finnish and Dutch lignan databases, respectively. Methods Forty-two food groups known to contribute to the total lignan intake were selected and attributed a value for SECO and MAT from the Finnish lignan database (Fineli®) or for SECO, MAT, LARI, and PINO from the Dutch database. Total intake of lignans was estimated from food consumption data for adult men and women (19–79 years) from Denmark, Finland, Italy, Sweden, and the United Kingdom, and the contribution of aggregated food groups was calculated using the Dutch lignan database. Results Mean dietary lignan intakes estimated using the Dutch database ranged from 1 to 2 mg/day, which was approximately four-fold higher than the intakes estimated from the Fineli® database. When LARI and PINO were included in the estimation of the total lignan intakes, cereals, grain products, vegetables, fruit and berries were the most important dietary sources of lignans. Conclusion Total lignan intake was approximately four-fold higher when estimated with the Dutch lignan database, which includes the lignan precursors LARI and PINO, compared to estimates based on the Finnish database, which includes only SECO and MAT. The main sources of lignans according to the Dutch database in the five countries studied were cereals and grain products, vegetables, fruit, berries, and beverages. PMID:23766759

  20. Understanding the Groundwater Hydrology of a Geographically-Isolated Prairie Fen: Implications for Conservation.

    PubMed

    Sampath, Prasanna Venkatesh; Liao, Hua-Sheng; Curtis, Zachary Kristopher; Doran, Patrick J; Herbert, Matthew E; May, Christopher A; Li, Shu-Guang

    2015-01-01

    The sources of water and corresponding delivery mechanisms to groundwater-fed fens are not well understood due to the multi-scale geo-morphologic variability of the glacial landscape in which they occur. This lack of understanding limits the ability to effectively conserve these systems and the ecosystem services they provide, including biodiversity and water provisioning. While fens tend to occur in clusters around regional groundwater mounds, Ives Road Fen in southern Michigan is an example of a geographically-isolated fen. In this paper, we apply a multi-scale groundwater modeling approach to understand the groundwater sources for Ives Road fen. We apply Transition Probability geo-statistics on more than 3000 well logs from a state-wide water well database to characterize the complex geology using conditional simulations. We subsequently implement a 3-dimensional reverse particle tracking to delineate groundwater contribution areas to the fen. The fen receives water from multiple sources: local recharge, regional recharge from an extensive till plain, a regional groundwater mound, and a nearby pond. The regional sources deliver water through a tortuous, 3-dimensional "pipeline" consisting of a confined aquifer lying beneath an extensive clay layer. Water in this pipeline reaches the fen by upwelling through openings in the clay layer. The pipeline connects the geographically-isolated fen to the same regional mound that provides water to other fen clusters in southern Michigan. The major implication of these findings is that fen conservation efforts must be expanded from focusing on individual fens and their immediate surroundings, to studying the much larger and inter-connected hydrologic network that sustains multiple fens.

  1. Survey on the use of information sources in the field of aging.

    PubMed

    Bird, G; Heekin, J M

    1994-01-01

    This article presents the results of a survey conducted over the summer of 1992 on the use of information sources by professionals in the field of aging. In particular, factors affecting the use of electronic information sources were investigated. The data provide a demographic profile of North American gerontologists, with a predictably wide range of disciplines and types of practice represented. Several factors were found to have an impact on the gerontologists' utilization of electronic information sources. Respondents who used a larger-than-average number of computer applications were found to make relatively more use of electronic sources, including online searches, CD-ROM indexes, library OPACs, and other databases searched by remote access. Attendance at library workshops was found to increase the amount of end-user searching but not the amount of library-mediated searching. Respondents also reported which databases they used and which they considered most important. MEDLINE was the most frequently mentioned database across all disciplines, including the health and social sciences. Computer databases were ranked least important out of six listed sources of information, and only 5% of respondents reported having used an electronic current awareness profile.

  2. Comparative and Predictive Multimedia Assessments Using Monte Carlo Uncertainty Analyses

    NASA Astrophysics Data System (ADS)

    Whelan, G.

    2002-05-01

    Multiple-pathway frameworks (sometimes referred to as multimedia models) provide a platform for combining medium-specific environmental models and databases, such that they can be utilized in a more holistic assessment of contaminant fate and transport in the environment. These frameworks provide a relatively seamless transfer of information from one model to the next and from databases to models. Within these frameworks, multiple models are linked, resulting in models that consume information from upstream models and produce information to be consumed by downstream models. The Framework for Risk Analysis in Multimedia Environmental Systems (FRAMES) is an example, which allows users to link their models to other models and databases. FRAMES is an icon-driven, site-layout platform that is an open-architecture, object-oriented system that interacts with environmental databases; helps the user construct a Conceptual Site Model that is real-world based; allows the user to choose the most appropriate models to solve simulation requirements; solves the standard risk paradigm of release, transport and fate, and exposure/risk assessment for people and ecology; and presents graphical packages for analyzing results. FRAMES is specifically designed to allow users to link their own models into a system which contains models developed by others. This paper will present the use of FRAMES to evaluate potential human health exposures using real site data and realistic assumptions from sources, through the vadose and saturated zones, to exposure and risk assessment at three real-world sites, using the Multimedia Environmental Pollutant Assessment System (MEPAS), which is a multimedia model contained within FRAMES. These real-world examples use predictive and comparative approaches coupled with a Monte Carlo analysis. A predictive analysis is one in which models are calibrated to monitored site data prior to the assessment; a comparative analysis is one in which models are not calibrated but are based solely on literature or judgement, and is usually used to compare alternatives. In many cases, a combination is employed where the model is calibrated to a portion of the data (e.g., to determine hydrodynamics), then used to compare alternatives. Three subsurface-based multimedia examples are presented, increasing in complexity. The first presents the application of a predictive, deterministic assessment; the second presents a predictive and comparative, Monte Carlo analysis; and the third presents a comparative, multi-dimensional Monte Carlo analysis. Endpoints are typically presented in terms of concentration, hazard, risk, and dose, and because the vadose zone model typically represents a connection between a source and the aquifer, it does not generally represent the final medium in a multimedia risk assessment.
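
    The Monte Carlo element of these assessments can be pictured with a deliberately tiny sketch: sample uncertain inputs, push each sample through a source-to-receptor transfer calculation, and summarize the output distribution. The toy model and distributions below are illustrative assumptions, not MEPAS or FRAMES themselves.

    ```python
    # Hedged, toy Monte Carlo uncertainty propagation for a source-to-receptor model.
    import numpy as np

    rng = np.random.default_rng(42)
    n = 10_000
    release = rng.lognormal(mean=np.log(100.0), sigma=0.5, size=n)   # kg/yr, assumed
    dilution = rng.uniform(1e-6, 1e-4, size=n)                       # (mg/L)/(kg/yr), assumed

    concentration = release * dilution    # mg/L at the receptor
    print("median (mg/L):", np.median(concentration))
    print("95th percentile (mg/L):", np.percentile(concentration, 95))
    ```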

  3. Evaluation of DNA mixtures from database search.

    PubMed

    Chung, Yuk-Ka; Hu, Yue-Qing; Fung, Wing K

    2010-03-01

    With the aim of bridging the gap between DNA mixture analysis and DNA database search, a novel approach is proposed to evaluate the forensic evidence of DNA mixtures when the suspect is identified by the search of a database of DNA profiles. General formulae are developed for the calculation of the likelihood ratio for a two-person mixture under general situations including multiple matches and imperfect evidence. The influence of the prior probabilities on the weight of evidence under the scenario of multiple matches is demonstrated by a numerical example based on Hong Kong data. Our approach is shown to be capable of presenting the forensic evidence of DNA mixtures in a comprehensive way when the suspect is identified through database search.
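
    For orientation, the weight of evidence is expressed as a likelihood ratio contrasting the prosecution and defence propositions; the paper's contribution is the extension of this quantity to database-search scenarios with multiple matches and imperfect evidence, which is not reproduced here:

    ```latex
    \mathrm{LR} = \frac{\Pr(\text{mixture profile} \mid H_p)}{\Pr(\text{mixture profile} \mid H_d)}
    ```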

  4. Building Databases for Education. ERIC Digest.

    ERIC Educational Resources Information Center

    Klausmeier, Jane A.

    This digest provides a brief explanation of what a database is; explains how a database can be used; identifies important factors that should be considered when choosing database management system software; and provides citations to sources for finding reviews and evaluations of database management software. The digest is concerned primarily with…

  5. Development of a data entry auditing protocol and quality assurance for a tissue bank database.

    PubMed

    Khushi, Matloob; Carpenter, Jane E; Balleine, Rosemary L; Clarke, Christine L

    2012-03-01

    Human transcription error is an acknowledged risk when extracting information from paper records for entry into a database. For a tissue bank, it is critical that accurate data are provided to researchers with approved access to tissue bank material. The challenges of tissue bank data collection include manual extraction of data from complex medical reports that are accessed from a number of sources and that differ in style and layout. As a quality assurance measure, the Breast Cancer Tissue Bank (http://www.abctb.org.au) has implemented an auditing protocol and, in order to execute the process efficiently, has developed an open source database plug-in tool (eAuditor) to assist in auditing of data held in our tissue bank database. Using eAuditor, we have identified that human entry errors range from 0.01% when entering donors' clinical follow-up details to 0.53% when entering pathological details, highlighting the importance of an audit protocol tool such as eAuditor in a tissue bank database. eAuditor was developed and tested on the Caisis open source clinical-research database; however, it can be integrated into other databases where similar functionality is required.

  6. Bioinformatics Approaches to Classifying Allergens and Predicting Cross-Reactivity

    PubMed Central

    Schein, Catherine H.; Ivanciuc, Ovidiu; Braun, Werner

    2007-01-01

    The major advances in understanding why patients respond to several seemingly different stimuli have been through the isolation, sequencing and structural analysis of proteins that induce an IgE response. The most significant finding is that allergenic proteins from very different sources can have nearly identical sequences and structures, and that this similarity can account for clinically observed cross-reactivity. The increasing amount of information on the sequence, structure and IgE epitopes of allergens is now available in several databases and powerful bioinformatics search tools allow user access to relevant information. Here, we provide an overview of these databases and describe state-of-the art bioinformatics tools to identify the common proteins that may be at the root of multiple allergy syndromes. Progress has also been made in quantitatively defining characteristics that discriminate allergens from non-allergens. Search and software tools for this purpose have been developed and implemented in the Structural Database of Allergenic Proteins (SDAP, http://fermi.utmb.edu/SDAP/). SDAP contains information for over 800 allergens and extensive bibliographic references in a relational database with links to other publicly available databases. SDAP is freely available on the Web to clinicians and patients, and can be used to find structural and functional relations among known allergens and to identify potentially cross-reacting antigens. Here we illustrate how these bioinformatics tools can be used to group allergens, and to detect areas that may account for common patterns of IgE binding and cross-reactivity. Such results can be used to guide treatment regimens for allergy sufferers. PMID:17276876

  7. Historical reconstructions of California wildfires vary by data source

    USGS Publications Warehouse

    Syphard, Alexandra D.; Keeley, Jon E.

    2016-01-01

    Historical data are essential for understanding how fire activity responds to different drivers. It is important that the source of data is commensurate with the spatial and temporal scale of the question addressed, but fire history databases are derived from different sources with different restrictions. In California, a frequently used fire history dataset is the State of California Fire and Resource Assessment Program (FRAP) fire history database, which circumscribes fire perimeters at a relatively fine scale. It includes large fires on both state and federal lands but only covers fires that were mapped or had other spatially explicit data. A different database is the state and federal governments’ annual reports of all fires. They are more complete than the FRAP database but are only spatially explicit to the level of county (California Department of Forestry and Fire Protection – Cal Fire) or forest (United States Forest Service – USFS). We found substantial differences between the FRAP database and the annual summaries, with the largest and most consistent discrepancy being in fire frequency. The FRAP database missed the majority of fires and is thus a poor indicator of fire frequency or of ignition sources. The FRAP database is also deficient in area burned, especially before 1950. Even in contemporary records, the huge number of smaller fires not included in the FRAP database accounts for substantial cumulative differences in area burned. Wildfires in California account for nearly half of the western United States fire suppression budget. Therefore, the conclusions about data discrepancies and the implications for fire research are of broad importance.

  8. Active fault databases: building a bridge between earthquake geologists and seismic hazard practitioners, the case of the QAFI v.3 database

    NASA Astrophysics Data System (ADS)

    García-Mayordomo, Julián; Martín-Banda, Raquel; Insua-Arévalo, Juan M.; Álvarez-Gómez, José A.; Martínez-Díaz, José J.; Cabral, João

    2017-08-01

    Active fault databases are a very powerful and useful tool in seismic hazard assessment, particularly when singular faults are considered seismogenic sources. Active fault databases are also a very relevant source of information for earth scientists, earthquake engineers and even teachers or journalists. Hence, active fault databases should be updated and thoroughly reviewed on a regular basis in order to maintain a standard of quality and uniform criteria. Desirably, active fault databases should somehow indicate the quality of the geological data and, particularly, the reliability attributed to crucial fault-seismic parameters, such as maximum magnitude and recurrence interval. In this paper we explain how we tackled these issues during the process of updating and reviewing the Quaternary Active Fault Database of Iberia (QAFI) to its current version 3. We devote particular attention to describing the scheme devised for classifying the quality and representativeness of the geological evidence of Quaternary activity and the accuracy of the slip rate estimation in the database. Subsequently, we use this information as input for a straightforward rating of the level of reliability of the maximum magnitude and recurrence interval fault seismic parameters. We conclude that QAFI v.3 is a much better database than version 2, either for proper use in seismic hazard applications or as an informative source for non-specialized users. However, we already envision new improvements for a future update.

  9. FishTraits Database

    USGS Publications Warehouse

    Angermeier, Paul L.; Frimpong, Emmanuel A.

    2009-01-01

    The need for integrated and widely accessible sources of species traits data to facilitate studies of ecology, conservation, and management has motivated development of traits databases for various taxa. In spite of the increasing number of traits-based analyses of freshwater fishes in the United States, no consolidated database of traits of this group exists publicly, and much useful information on these species is documented only in obscure sources. The largely inaccessible and unconsolidated traits information makes large-scale analysis involving many fishes and/or traits particularly challenging. FishTraits is a database of >100 traits for 809 (731 native and 78 exotic) fish species found in freshwaters of the conterminous United States, including 37 native families and 145 native genera. The database contains information on four major categories of traits: (1) trophic ecology, (2) body size and reproductive ecology (life history), (3) habitat associations, and (4) salinity and temperature tolerances. Information on geographic distribution and conservation status is also included. Together, we refer to the traits, distribution, and conservation status information as attributes. Descriptions of attributes are available here. Many sources were consulted to compile attributes, including state and regional species accounts and other databases.

  10. A Bayesian Multivariate Receptor Model for Estimating Source Contributions to Particulate Matter Pollution using National Databases.

    PubMed

    Hackstadt, Amber J; Peng, Roger D

    2014-11-01

    Time series studies have suggested that air pollution can negatively impact health. These studies have typically focused on the total mass of fine particulate matter air pollution or the individual chemical constituents that contribute to it, and not source-specific contributions to air pollution. Source-specific contribution estimates are useful from a regulatory standpoint by allowing regulators to focus limited resources on reducing emissions from sources that are major contributors to air pollution and are also desired when estimating source-specific health effects. However, researchers often lack direct observations of the emissions at the source level. We propose a Bayesian multivariate receptor model to infer information about source contributions from ambient air pollution measurements. The proposed model incorporates information from national databases containing data on both the composition of source emissions and the amount of emissions from known sources of air pollution. The proposed model is used to perform source apportionment analyses for two distinct locations in the United States (Boston, Massachusetts and Phoenix, Arizona). Our results mirror previous source apportionment analyses that did not utilize the information from national databases and provide additional information about uncertainty that is relevant to the estimation of health effects.
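
    A generic multivariate receptor (mass-balance) formulation helps fix ideas; the notation below is standard rather than the paper's exact Bayesian specification, in which national emissions and speciation databases inform priors on the source profiles and contributions:

    ```latex
    x_{tj} = \sum_{k=1}^{K} f_{tk}\,\lambda_{kj} + \varepsilon_{tj},
    \qquad f_{tk} \ge 0,\; \lambda_{kj} \ge 0,
    ```

    where x_{tj} is the ambient concentration of constituent j at time t, f_{tk} is the contribution of source k, and \lambda_{kj} is the fraction of constituent j in the profile of source k.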

  11. Catalogue of UV sources in the Galaxy

    NASA Astrophysics Data System (ADS)

    Beitia-Antero, L.; Gómez de Castro, A. I.

    2017-03-01

    The Galaxy Evolution Explorer (GALEX) ultraviolet (UV) database contains the largest photometric catalogue in the ultraviolet range; as a result, the GALEX photometric bands, the near-UV band (NUV) and the far-UV band (FUV), have become standards. Nevertheless, the GALEX catalogue does not include bright UV sources, owing to the high sensitivity of its detectors, nor sources in the Galactic plane. In order to extend the GALEX database for future UV missions, we have obtained synthetic FUV and NUV photometry using the database of UV spectra generated by the International Ultraviolet Explorer (IUE). This database contains 63,755 spectra in the low dispersion mode (λ/δλ ≈ 300) obtained during its 18-year lifetime. For stellar sources in the IUE database, we have selected spectra with a high signal-to-noise ratio (SNR) and computed FUV and NUV magnitudes using the GALEX transmission curves along with the conversion equations between flux and magnitudes provided by the mission. In addition, we have performed variability tests to determine whether the sources were variable during the IUE observations. As a result, we have generated two different catalogues: one for non-variable stars and another for variable sources. The former contains FUV and NUV magnitudes, while the latter gives the basic information and the FUV magnitude for each observation. The consistency of the magnitudes has been tested using white dwarfs contained in both the GALEX and IUE samples. The catalogues are available through the Centre de Données Stellaires. The sources are distributed throughout the whole sky, with special coverage of the Galactic plane.
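
    Synthetic photometry of the kind described above folds each IUE spectrum through a band transmission curve and converts the band-averaged flux density to a magnitude. The sketch below uses a generic photon-weighted mean and the AB zero point as a hedged illustration; GALEX's own count-rate-based conversion equations are not reproduced.

    ```python
    # Hedged sketch of synthetic AB photometry from a flux-calibrated spectrum.
    import numpy as np

    def synthetic_ab_mag(wave_A, flux_flam, band_wave_A, band_throughput):
        """wave_A in Angstrom; flux_flam in erg s^-1 cm^-2 A^-1."""
        t = np.interp(wave_A, band_wave_A, band_throughput, left=0.0, right=0.0)
        # Photon-weighted mean flux density over the band.
        f_lam = np.trapz(flux_flam * t * wave_A, wave_A) / np.trapz(t * wave_A, wave_A)
        # Pivot wavelength, then f_lambda -> f_nu (erg s^-1 cm^-2 Hz^-1).
        pivot = np.sqrt(np.trapz(t * wave_A, wave_A) / np.trapz(t / wave_A, wave_A))
        f_nu = f_lam * pivot**2 / 2.998e18
        return -2.5 * np.log10(f_nu) - 48.6
    ```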

  12. Integration of multiple DICOM Web servers into an enterprise-wide Web-based electronic medical record

    NASA Astrophysics Data System (ADS)

    Stewart, Brent K.; Langer, Steven G.; Martin, Kelly P.

    1999-07-01

    The purpose of this paper is to integrate multiple DICOM image web servers into the existing enterprise-wide web-browsable electronic medical record. Over the last six years the University of Washington has created a clinical data repository (MIND) combining, in a distributed relational database, information from multiple departmental databases. A character cell-based view of this data called the Mini Medical Record (MMR) has been available for four years. MINDscape, unlike the text-based MMR, provides a platform-independent, dynamic, web browser view of the MIND database that can be easily linked with medical knowledge resources on the network, like PubMed and the Federated Drug Reference. There are over 10,000 MINDscape user accounts at the University of Washington Academic Medical Centers. The weekday average number of hits to MINDscape is 35,302 and the weekday average number of individual users is 1,252. DICOM images from multiple web servers are now being viewed through the MINDscape electronic medical record.

  13. Distribution System Upgrade Unit Cost Database

    DOE Data Explorer

    Horowitz, Kelsey

    2017-11-30

    This database contains unit cost information for different components that may be used to integrate distributed photovoltaic (D-PV) systems onto distribution systems. Some of these upgrades and costs may also apply to integration of other distributed energy resources (DER). Which components are required, and how many of each, is system-specific and should be determined by analyzing the effects of distributed PV at a given penetration level on the circuit of interest in combination with engineering assessments on the efficacy of different solutions to increase the ability of the circuit to host additional PV as desired. The current state of the distribution system should always be considered in these types of analysis. The data in this database were collected from a variety of utilities, PV developers, technology vendors, and published research reports. Where possible, we have included information on the source of each data point and relevant notes. In some cases where data provided is sensitive or proprietary, we were not able to specify the source, but provide other information that may be useful to the user (e.g. year, location where equipment was installed). NREL has carefully reviewed these sources prior to inclusion in this database. Additional information about the database, data sources, and assumptions is included in the "Unit_cost_database_guide.doc" file included in this submission. This guide provides important information on what costs are included in each entry. Please refer to this guide before using the unit cost database for any purpose.

  14. Trichloroethylene: Mechanistic, epidemiologic and other supporting evidence of carcinogenic hazard.

    PubMed

    Rusyn, Ivan; Chiu, Weihsueh A; Lash, Lawrence H; Kromhout, Hans; Hansen, Johnni; Guyton, Kathryn Z

    2014-01-01

    The chlorinated solvent trichloroethylene (TCE) is a ubiquitous environmental pollutant. The carcinogenic hazard of TCE was the subject of a 2012 evaluation by a Working Group of the International Agency for Research on Cancer (IARC). Information on exposures, relevant data from epidemiologic studies, bioassays in experimental animals, and toxicity and mechanism of action studies was used to conclude that TCE is carcinogenic to humans (Group 1). This article summarizes the key evidence forming the scientific bases for the IARC classification. Exposure to TCE from environmental sources (including hazardous waste sites and contaminated water) is common throughout the world. While workplace use of TCE has been declining, occupational exposures remain of concern, especially in developing countries. The strongest human evidence is from studies of occupational TCE exposure and kidney cancer. Positive, although less consistent, associations were reported for liver cancer and non-Hodgkin lymphoma. TCE is carcinogenic at multiple sites in multiple species and strains of experimental animals. The mechanistic evidence includes extensive data on the toxicokinetics and genotoxicity of TCE and its metabolites. Together, available evidence provided a cohesive database supporting the human cancer hazard of TCE, particularly in the kidney. For other target sites of carcinogenicity, mechanistic and other data were found to be more limited. Important sources of susceptibility to TCE toxicity and carcinogenicity were also reviewed by the Working Group. In all, consideration of the multiple evidence streams presented herein informed the IARC conclusions regarding the carcinogenicity of TCE. © 2013.

  15. Trichloroethylene: Mechanistic, Epidemiologic and Other Supporting Evidence of Carcinogenic Hazard

    PubMed Central

    Rusyn, Ivan; Chiu, Weihsueh A.; Lash, Lawrence H.; Kromhout, Hans; Hansen, Johnni; Guyton, Kathryn Z.

    2013-01-01

    The chlorinated solvent trichloroethylene (TCE) is a ubiquitous environmental pollutant. The carcinogenic hazard of TCE was the subject of a 2012 evaluation by a Working Group of the International Agency for Research on Cancer (IARC). Information on exposures, relevant data from epidemiologic studies, bioassays in experimental animals, and toxicity and mechanism of action studies was used to conclude that TCE is carcinogenic to humans (Group 1). This article summarizes the key evidence forming the scientific bases for the IARC classification. Exposure to TCE from environmental sources (including from hazardous waste sites and contaminated water) is common throughout the world. While workplace use of TCE has been declining, occupational exposures remain of concern, especially in developing countries. The strongest human evidence is from studies of occupational TCE exposure and kidney cancer. Positive, although less consistent, associations were reported for liver cancer and non-Hodgkin's lymphoma. TCE is carcinogenic at multiple sites in multiple species and strains of experimental animals. The mechanistic evidence includes extensive data on the toxicokinetics and genotoxicity of TCE and its metabolites. Together, available evidence provided a cohesive database supporting the human cancer hazard of TCE, particularly in the kidney. For other target sites of carcinogenicity, mechanistic and other data were found to be more limited. Important sources of susceptibility to TCE toxicity and carcinogenicity were also reviewed by the Working Group. In all, consideration of the multiple evidence streams presented herein informed the IARC conclusions regarding the carcinogenicity of TCE. PMID:23973663

  16. System for Performing Single Query Searches of Heterogeneous and Dispersed Databases

    NASA Technical Reports Server (NTRS)

    Maluf, David A. (Inventor); Okimura, Takeshi (Inventor); Gurram, Mohana M. (Inventor); Tran, Vu Hoang (Inventor); Knight, Christopher D. (Inventor); Trinh, Anh Ngoc (Inventor)

    2017-01-01

    The present invention is a distributed computer system of heterogeneous databases joined in an information grid and configured with an Application Programming Interface hardware which includes a search engine component for performing user-structured queries on multiple heterogeneous databases in real time. This invention reduces overhead associated with the impedance mismatch that commonly occurs in heterogeneous database queries.

  17. Capture and Three Dimensional Projection of New South Wales Strata Plans in Landxml Format

    NASA Astrophysics Data System (ADS)

    Harding, B.; Foreman, A.

    2017-10-01

    New South Wales is embarking on a major reform program named Cadastre NSW. This reform aims to move to a single source of truth for the digital representation of the cadastre. The current lack of a single-source cadastre has hindered users from government and industry owing to duplication of effort and misalignment between databases from different sources. For this reform to be successful, several challenges need to be addressed. "Cadastre 2034 - Powering Land & Real Property" (2015), published by the Intergovernmental Committee on Surveying and Mapping (ICSM), identifies that current cadastres do not represent real property in three dimensions. In the future, vertical living lifestyles will create complex property scenarios that the Digital Cadastral Database (DCDB) will need to contend with. While the NSW DCDB currently holds over 3 million lots and 5 million features, one of its limitations is that it does not indicate land ownership above or below the ground surface. NSW Spatial Services is currently capturing survey plans in LandXML format. To prepare for the future, research is being undertaken to also capture multi-level Strata Plans through a modified recipe. During this research, multiple Strata Plans representing a range of ages and development types have been investigated and converted to LandXML. Since it is difficult to visualise the plans in a two-dimensional format, a method of displaying these plans in three dimensions is required for quality control purposes. Overall, the investigations have provided Spatial Services with enough information to confirm that the capture and display of Strata Plans in the LandXML format is possible.

  18. Database of potential sources for earthquakes larger than magnitude 6 in Northern California

    USGS Publications Warehouse

    ,

    1996-01-01

    The Northern California Earthquake Potential (NCEP) working group, composed of many contributors and reviewers in industry, academia and government, has pooled its collective expertise and knowledge of regional tectonics to identify potential sources of large earthquakes in northern California. We have created a map and database of active faults, both surficial and buried, that forms the basis for the northern California portion of the national map of probabilistic seismic hazard. The database contains 62 potential sources, including fault segments and areally distributed zones. The working group has integrated constraints from broadly based plate tectonic and VLBI models with local geologic slip rates, geodetic strain rates, and microseismicity. Our earthquake source database derives from a scientific consensus that accounts for conflicts in the diverse data. Our preliminary product, as described in this report, brings to light many gaps in the data, including a need for better information on the proportion of deformation in fault systems that is aseismic.

  19. Use of large healthcare databases for rheumatology clinical research.

    PubMed

    Desai, Rishi J; Solomon, Daniel H

    2017-03-01

    Large healthcare databases, which contain data collected during routinely delivered healthcare to patients, can serve as a valuable resource for generating actionable evidence to assist medical and healthcare policy decision-making. In this review, we summarize the use of large healthcare databases in rheumatology clinical research. Large healthcare data are critical to evaluating medication safety and effectiveness in patients with rheumatologic conditions. The three major sources of large healthcare data are electronic medical records, health insurance claims, and patient registries. Each of these sources offers unique advantages but also has some inherent limitations. To address some of these limitations and maximize the utility of these data sources for evidence generation, recent efforts have focused on linking different data sources. Innovations such as randomized registry trials, which aim to facilitate the design of low-cost randomized controlled trials built on the existing infrastructure provided by large healthcare databases, are likely to make clinical research more efficient in coming years. Harnessing the power of the information contained in large healthcare databases, while paying close attention to their inherent limitations, is critical to generating a rigorous evidence base for medical decision-making and ultimately enhancing patient care.

  20. [Exploration and construction of the full-text database of acupuncture literature in the Republic of China].

    PubMed

    Fei, Lin; Zhao, Jing; Leng, Jiahao; Zhang, Shujian

    2017-10-12

    The ALIPORC full-text database is a specialized full-text database of acupuncture literature from the Republic of China period. Its construction started in 2015 and is still being completed, focusing on acupuncture-related books, articles and advertising documents written or published during the Republic of China period. The database aims to achieve resource sharing of acupuncture medical literature from that era through diverse retrieval approaches and accurate content presentation; it facilitates exchange among scholars, reduces the paper damage caused by repeated page-turning, and simplifies the retrieval of rare literature. The authors describe the database in terms of its sources, characteristics and current state of construction, and discuss how to improve the efficiency and completeness of the database and deepen the development of acupuncture literature from the Republic of China period.

  1. The prevalence and incidence of systemic lupus erythematosus in children and adults: a population-based study in a mountain community in northern Italy.

    PubMed

    Tsioni, Vasiliki; Andreoli, Laura; Meini, Antonella; Frassi, Micol; Raffetti, Elena; Airò, Paolo; Allegri, Flavio; Donato, Francesco; Tincani, Angela

    2015-01-01

    To estimate the prevalence and incidence of systemic lupus erythematosus (SLE) in paediatric and adult populations in Italy. The study was carried out in 2009-2012 in Valtrompia, a valley in northern Italy where a relatively closed community lives. The only referral centre for SLE in the area is the Rheumatology Unit of the University Hospital of Brescia. The ascertainment of SLE cases was performed through the integration of three sources: 1) the hospital database; 2) the database of the Rheumatology laboratory; 3) the databases of general practitioners and general paediatricians practicing in the area. Each patient was evaluated by a rheumatologist for confirmation of SLE classification based on the presence of at least 4 criteria according to the American College of Rheumatology. Forty-four SLE patients (39 females, 89%) were identified. The prevalence of SLE at 31st December 2012 was 39.2 (95% C.I. 28.5-52.6) cases per 100,000 individuals in all subjects, and 42.3 (30.5-57.2) and 15.3 (1.8-55.1) in adults and children, respectively. Nine new cases of SLE were diagnosed over the 4 years of the study period, with an annual incidence rate of 2.0 (0.9-3.8) per 100,000 individuals. This is the first study estimating the prevalence and incidence of SLE in Italy in both the adult and paediatric populations. Prevalence and incidence rates were in line with those reported in other Mediterranean European countries. The accurate assessment of SLE frequency is supported by the choice of a well-defined area, the integration of multiple data sources and the revision of each case by a rheumatologist.

  2. Resources | Division of Cancer Prevention

    Cancer.gov

    Manual of Operations Version 3, 12/13/2012 (PDF, 162 KB); Database Sources: Consortium for Functional Glycomics databases; Design Studies Related to the Development of Distributed, Web-based European Carbohydrate Databases (EUROCarbDB)

  3. The Microphysiology Systems Database for Analyzing and Modeling Compound Interactions with Human and Animal Organ Models

    PubMed Central

    Vernetti, Lawrence; Bergenthal, Luke; Shun, Tong Ying; Taylor, D. Lansing

    2016-01-01

    Microfluidic human organ models, microphysiology systems (MPS), are currently being developed as predictive models of drug safety and efficacy in humans. To design and validate MPS as predictive of human safety liabilities requires safety data for a reference set of compounds, combined with in vitro data from the human organ models. To address this need, we have developed an internet database, the MPS database (MPS-Db), as a powerful platform for experimental design, data management, and analysis, and to combine experimental data with reference data, to enable computational modeling. The present study demonstrates the capability of the MPS-Db in early safety testing using a human liver MPS to relate the effects of tolcapone and entacapone in the in vitro model to human in vivo effects. These two compounds were chosen to be evaluated as a representative pair of marketed drugs because they are structurally similar, have the same target, and were found safe or had an acceptable risk in preclinical and clinical trials, yet tolcapone induced unacceptable levels of hepatotoxicity while entacapone was found to be safe. Results demonstrate the utility of the MPS-Db as an essential resource for relating in vitro organ model data to the multiple biochemical, preclinical, and clinical data sources on in vivo drug effects. PMID:28781990

  4. Web-based access to near real-time and archived high-density time-series data: cyber infrastructure challenges & developments in the open-source Waveform Server

    NASA Astrophysics Data System (ADS)

    Reyes, J. C.; Vernon, F. L.; Newman, R. L.; Steidl, J. H.

    2010-12-01

    The Waveform Server is an interactive web-based interface to multi-station, multi-sensor and multi-channel high-density time-series data stored in Center for Seismic Studies (CSS) 3.0 schema relational databases (Newman et al., 2009). In the last twelve months, based on expanded specifications and current user feedback, both the server-side infrastructure and the client-side interface have been extensively rewritten. The Python Twisted server-side code base has been fundamentally modified so that it now presents waveform data stored in cluster-based databases using a multi-threaded architecture, in addition to supporting the pre-existing single-database model. This allows interactive web-based access to high-density (broadband @ 40 Hz to strong motion @ 200 Hz) waveform data that can span multiple years, the common lifetime of broadband seismic networks. The client-side interface expands on its use of simple JSON-based AJAX queries to incorporate a variety of user interface (UI) improvements, including standardized calendars for defining time ranges, on-the-fly data calibration to display SI-unit data, and increased rendering speed. This presentation will outline the various cyber infrastructure challenges we have faced while developing this application, the use cases currently in existence, and the limitations of web-based application development.

  5. An "EAR" on environmental surveillance and monitoring: A ...

    EPA Pesticide Factsheets

    Current environmental monitoring approaches focus primarily on chemical occurrence. However, based on chemical concentration alone, it can be difficult to identify which compounds may be of toxicological concern and should be prioritized for further monitoring or management. This can be problematic because toxicological characterization is lacking for many emerging contaminants. New sources of high-throughput screening (HTS) data, such as the ToxCast™ database, which contains data for over 9,000 compounds screened through up to 1,100 assays, are now available. Integrated analysis of chemical occurrence data with HTS data offers new opportunities to prioritize chemicals, sites, or biological effects for further investigation based on concentrations detected in the environment linked to relative potencies in pathway-based bioassays. As a case study, chemical occurrence data from a 2012 study in the Great Lakes Basin, along with the ToxCast™ effects database, were used to calculate exposure-activity ratios (EARs) as a prioritization tool. Technical considerations of data processing and use of the ToxCast™ database are presented and discussed. EAR prioritization identified multiple sites, biological pathways, and chemicals that warrant further investigation. Biological pathways were then linked to adverse outcome pathways to identify potential adverse outcomes and biomarkers for use in subsequent monitoring efforts. Anthropogenic contaminants are frequently reported in environm
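
    The exposure-activity ratio itself is a simple quotient, and the sketch below illustrates the general idea under stated assumptions: measured environmental concentrations and in vitro activity concentrations are assumed to be expressed in the same molar units, and the function and variable names are hypothetical rather than part of the ToxCast tooling.

    ```python
    def exposure_activity_ratio(env_conc_um, activity_conc_um):
        """EAR = measured environmental concentration / in vitro activity concentration."""
        return env_conc_um / activity_conc_um

    # Hypothetical example: a chemical detected at 0.5 uM with an assay activity
    # concentration of 10 uM gives EAR = 0.05; summing EARs for all chemicals that
    # hit the same assay gives a site- and pathway-level priority score.
    chemicals = {"chem_A": (0.5, 10.0), "chem_B": (0.02, 4.0)}
    site_ear = sum(exposure_activity_ratio(c, a) for c, a in chemicals.values())
    print(site_ear)
    ```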

  6. The BioExtract Server: a web-based bioinformatic workflow platform

    PubMed Central

    Lushbough, Carol M.; Jennewein, Douglas M.; Brendel, Volker P.

    2011-01-01

    The BioExtract Server (bioextract.org) is an open, web-based system designed to aid researchers in the analysis of genomic data by providing a platform for the creation of bioinformatic workflows. Scientific workflows are created within the system by recording tasks performed by the user. These tasks may include querying multiple, distributed data sources, saving query results as searchable data extracts, and executing local and web-accessible analytic tools. The series of recorded tasks can then be saved as a reproducible, sharable workflow available for subsequent execution with the original or modified inputs and parameter settings. Integrated data resources include interfaces to the National Center for Biotechnology Information (NCBI) nucleotide and protein databases, the European Molecular Biology Laboratory (EMBL-Bank) non-redundant nucleotide database, the Universal Protein Resource (UniProt), and the UniProt Reference Clusters (UniRef) database. The system offers access to numerous preinstalled, curated analytic tools and also provides researchers with the option of selecting computational tools from a large list of web services including the European Molecular Biology Open Software Suite (EMBOSS), BioMoby, and the Kyoto Encyclopedia of Genes and Genomes (KEGG). The system further allows users to integrate local command line tools residing on their own computers through a client-side Java applet. PMID:21546552

  7. DOE Office of Scientific and Technical Information (OSTI.GOV)

    Blom, Philip Stephen; Marcillo, Omar Eduardo; Euler, Garrett Gene

    InfraPy is a Python-based analysis toolkit being developed at LANL. The algorithms are intended for ground-based nuclear detonation detection applications to detect, locate, and characterize explosive sources using infrasonic observations. The implementation is usable as a stand-alone Python library or as a command-line-driven tool operating directly on a database. With multiple scientists working on the project, we've begun using a LANL git repository for collaborative development and version control. Current and planned work on InfraPy focuses on the development of new algorithms and propagation models. Collaboration with Southern Methodist University (SMU) has helped identify bugs and limitations of the algorithms. Current usage development focuses on library imports and the CLI.

  8. Beyond MEDLINE for literature searches.

    PubMed

    Conn, Vicki S; Isaramalai, Sang-arun; Rath, Sabyasachi; Jantarakupt, Peeranuch; Wadhawan, Rohini; Dash, Yashodhara

    2003-01-01

    To describe strategies for a comprehensive literature search. MEDLINE searches result in limited numbers of studies that are often biased toward statistically significant findings. Diversified search strategies are needed. Empirical evidence about the recall and precision of diverse search strategies is presented. Challenges and strengths of each search strategy are identified. Search strategies vary in recall and precision. Often sensitivity and specificity are inversely related. Valuable search strategies include examination of multiple diverse computerized databases, ancestry searches, citation index searches, examination of research registries, journal hand searching, contact with the "invisible college," examination of abstracts, Internet searches, and contact with sources of synthesized information. Extending searches beyond MEDLINE enables researchers to conduct more systematic comprehensive searches.

  9. Noisy preferences in risky choice: A cautionary note.

    PubMed

    Bhatia, Sudeep; Loomes, Graham

    2017-10-01

    We examine the effects of multiple sources of noise in risky decision making. Noise in the parameters that characterize an individual's preferences can combine with noise in the response process to distort observed choice proportions. Thus, underlying preferences that conform to expected value maximization can appear to show systematic risk aversion or risk seeking. Similarly, core preferences that are consistent with expected utility theory, when perturbed by such noise, can appear to display nonlinear probability weighting. For this reason, modal choices cannot be used simplistically to infer underlying preferences. Quantitative model fits that do not allow for both sorts of noise can lead to wrong conclusions. (PsycINFO Database Record (c) 2017 APA, all rights reserved).

  10. The Ins and Outs of USDA Nutrient Composition

    USDA-ARS?s Scientific Manuscript database

    The USDA National Nutrient Database for Standard Reference (SR) is the major source of food composition data in the United States, providing the foundation for most food composition databases in the public and private sectors. Sources of data used in SR include analytical studies, food manufacturer...

  11. The USA Nr Inventory: Dominant Sources and Primary Transport Pathways

    NASA Astrophysics Data System (ADS)

    Sabo, R. D.; Clark, C.; Sobota, D. J.; Compton, J.; Cooter, E. J.; Schwede, D. B.; Bash, J. O.; Rea, A.; Dobrowolski, J. P.

    2016-12-01

    Mitigating the deleterious effects of excess reactive nitrogen (Nr) on human health and ecosystem goods and services, while ensuring food, biofuel, and fiber availability, is one of the most pressing environmental management challenges of this century. Effective management of Nr requires up-to-date inventories that quantitatively characterize the sources, transport, and transformation of Nr through the environment. The inherent complexity of the nitrogen cycle, however, with multiple exchange points across air, water, and terrestrial media, renders such inventories difficult to compile and manage. Previous Nr inventories are for 2002 and 2007 and used data sources that have since been improved. Thus, this recent inventory will substantially advance the methodology across many sectors of the inventory (e.g., deposition and biological fixation in crops and natural systems) and create a recent snapshot that is sorely needed for policy planning and trends analysis. Here we use a simple mass-balance approach to estimate the input-output budgets for all United States Geological Survey Hydrologic Unit Code-8 watersheds. We focus on a recent year (2012) to update the Nr inventory, but apply the analytical approach for multiple years where possible to assess trends through time. We also compare various sector estimates using multiple methodologies. Assembling datasets that account for new Nr inputs into watersheds (e.g., atmospheric NOy deposition, food imports, biological N fixation) and internal fluxes of recycled Nr (e.g., manure, Nr emissions/volatilization) provides an unprecedented, data-driven computation of N flux. Input-output budgets will offer insight into 1) the dominant sources of Nr in a watershed (e.g., food imports, atmospheric N deposition, or fertilizer), 2) the primary loss pathways for Nr (e.g., crop N harvest, volatilization/emissions), and 3) which watersheds are net sources versus sinks of Nr. These insights will provide needed clarity for managers looking to minimize the loss of Nr to atmospheric and aquatic compartments, while also providing a foundational database for researchers assessing the dominant controls of N retention and loss in natural and anthropogenically dominated ecosystems. Disclaimer: Views expressed are the authors' and not views or policies of the U.S. EPA.
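
    A watershed input-output budget of this kind reduces to a sum of inputs minus a sum of outputs. The sketch below is a minimal illustration under stated assumptions: the flux categories and field names are hypothetical simplifications of the inventory's actual sectors, and all values are assumed to be in consistent units (e.g., kg N per year).

    ```python
    def nitrogen_budget(inputs, outputs):
        """Net Nr balance for one watershed: positive = net source, negative = net sink."""
        return sum(inputs.values()) - sum(outputs.values())

    # Hypothetical HUC-8 watershed, fluxes in kg N per year
    inputs = {"atmospheric_deposition": 1.2e6, "biological_fixation": 0.8e6,
              "fertilizer": 2.5e6, "food_and_feed_imports": 0.6e6}
    outputs = {"crop_harvest": 2.9e6, "volatilization_emissions": 0.7e6}
    print(nitrogen_budget(inputs, outputs))   # net Nr remaining in the watershed
    ```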

  12. Brunn: an open source laboratory information system for microplates with a graphical plate layout design process.

    PubMed

    Alvarsson, Jonathan; Andersson, Claes; Spjuth, Ola; Larsson, Rolf; Wikberg, Jarl E S

    2011-05-20

    Compound profiling and drug screening generate large amounts of data and are generally based on microplate assays. Current information systems used for handling such data are mainly commercial, closed-source, expensive, and heavyweight, and there is a need for a flexible, lightweight, open system for handling plate design, validation and preparation of data. A Bioclipse plugin consisting of a client part and a relational database was constructed. A multiple-step plate layout point-and-click interface was implemented inside Bioclipse. The system contains a data validation step, where outliers can be removed, and finally a plate report with all relevant calculated data, including dose-response curves. Brunn is capable of handling the data from microplate assays. It can create dose-response curves and calculate IC50 values. Using a system of this sort facilitates work in the laboratory. Being able to reuse already constructed plates and plate layouts, by starting out from an earlier step in the plate layout design process, saves time and cuts down on error sources.
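
    Fitting dose-response curves and deriving IC50 values, as described above, is commonly done with a four-parameter logistic model. The sketch below shows that generic calculation on hypothetical survival data expressed as percent of control; it illustrates the technique, not Brunn's internal implementation.

    ```python
    import numpy as np
    from scipy.optimize import curve_fit

    def four_param_logistic(conc, bottom, top, ic50, hill):
        """Standard 4-parameter logistic dose-response model."""
        return bottom + (top - bottom) / (1.0 + (conc / ic50) ** hill)

    # Hypothetical survival data (% of control) at increasing concentrations (uM)
    conc = np.array([0.01, 0.1, 1.0, 10.0, 100.0])
    response = np.array([98.0, 92.0, 60.0, 15.0, 5.0])

    params, _ = curve_fit(four_param_logistic, conc, response,
                          p0=[0.0, 100.0, 1.0, 1.0], maxfev=10000)
    print(f"Estimated IC50: {params[2]:.2f} uM")
    ```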

  13. Calculating Proper Motions in the WFCAM Science Archive for the UKIRT Infrared Deep Sky Surveys

    NASA Astrophysics Data System (ADS)

    Collins, R.; Hambly, N.

    2012-09-01

    The ninth data release of the UKIRT Infrared Deep Sky Surveys (hereafter UKIDSS DR9) represents five years' worth of observations by its wide-field camera (WFCAM) and will be the first to include proper motion values in its source catalogues for the shallow, wide-area surveys: the Large Area Survey (LAS), the Galactic Clusters Survey (GCS) and (ultimately) the Galactic Plane Survey (GPS). We, the Wide Field Astronomy Unit (WFAU) at the University of Edinburgh, who prepare these regular data releases in the WFCAM Science Archive (WSA), describe in this paper how we make optimal use of the individual detection catalogues from each observation to derive high-quality astrometric fits for the positions of each detection, enabling us to calculate a proper motion solution across multiple epochs and passbands when constructing a merged source catalogue. We also describe how the proper motion solutions affect the calculation of the various attributes provided in the database source catalogue tables, the measures of data quality we provide, and a demonstration of the results for observations of the Pleiades cluster.
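
    At its simplest, a proper motion solution of the kind described above is a linear fit of position against epoch. The sketch below is a minimal illustration under stated assumptions: positions are treated as already calibrated onto a common astrometric frame, per-epoch errors are ignored, and the function name and units are illustrative rather than taken from the WSA pipeline.

    ```python
    import numpy as np

    def fit_proper_motion(epochs_yr, ra_deg, dec_deg):
        """Least-squares proper motion from multi-epoch positions.

        epochs_yr        : observation epochs in decimal years
        ra_deg, dec_deg  : source positions at each epoch (degrees)
        Returns (pm_ra_cosdec, pm_dec) in milliarcseconds per year.
        """
        t = np.asarray(epochs_yr) - np.mean(epochs_yr)    # reference epoch at the mean
        dec0 = np.mean(dec_deg)
        # Offsets from the mean position, converted to mas; RA scaled by cos(dec)
        dra = (np.asarray(ra_deg) - np.mean(ra_deg)) * np.cos(np.radians(dec0)) * 3.6e6
        ddec = (np.asarray(dec_deg) - np.mean(dec_deg)) * 3.6e6
        pm_ra = np.polyfit(t, dra, 1)[0]     # slope of the linear fit = proper motion
        pm_dec = np.polyfit(t, ddec, 1)[0]
        return pm_ra, pm_dec
    ```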

  14. THE NATURE AND FREQUENCY OF OUTFLOWS FROM STARS IN THE CENTRAL ORION NEBULA CLUSTER

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    O’Dell, C. R.; Ferland, G. J.; Henney, W. J.

    Recent Hubble Space Telescope images have allowed the determination with unprecedented accuracy of motions and changes of shocks within the inner Orion Nebula. These originate from collimated outflows from very young stars, some within the ionized portion of the nebula and others within the host molecular cloud. We have doubled the number of Herbig–Haro objects known within the inner Orion Nebula. We find that the best-known Herbig–Haro shocks originate from relatively few stars, with the optically visible X-ray source COUP 666 driving many of them. While some isolated shocks are driven by single collimated outflows, many groups of shocks are the result of a single stellar source having jets oriented in multiple directions at similar times. This explains the feature that shocks aligned in opposite directions in the plane of the sky are usually blueshifted, because the redshifted outflows pass into the optically thick photon-dominated region behind the nebula. There are two regions from which optical outflows originate for which there are no candidate sources in the SIMBAD database.

  15. Newly graduated nurses' use of knowledge sources: a meta-ethnography.

    PubMed

    Voldbjerg, Siri Lygum; Grønkjaer, Mette; Sørensen, Erik Elgaard; Hall, Elisabeth O C

    2016-08-01

    To advance evidence on newly graduated nurses' use of knowledge sources. Clinical decisions need to be evidence-based, and understanding the knowledge sources that newly graduated nurses use will inform both education and practice. Qualitative studies on newly graduated nurses' use of knowledge sources are increasing, though generated from scattered healthcare contexts. Therefore, a metasynthesis of qualitative research on what knowledge sources new graduates use in decision-making was conducted. Meta-ethnography. Nineteen reports, representing 17 studies, published from 2000-2014, were identified from iterative searches in relevant databases from May 2013-May 2014. Included reports were appraised for quality, and Noblit and Hare's meta-ethnography guided the interpretation and synthesis of data. Newly graduated nurses' use of knowledge sources during their first two years post-graduation was interpreted in the main theme 'self and others as knowledge sources,' with two subthemes, 'doing and following' and 'knowing and doing,' each with several elucidating categories. The metasynthesis revealed a line of argument among the report findings underscoring progression in knowledge use and in perception of competence and confidence among newly graduated nurses. The transition phase, the feeling of confidence, and the ability to use critical thinking and reflection have a great impact on the knowledge sources incorporated in clinical decisions. The synthesis accentuates that, to make use of newly graduated nurses' qualifications and skills in evidence-based practice, clinical practice needs to provide a supportive environment which nurtures critical thinking and questions and articulates the use of multiple knowledge sources. © 2016 John Wiley & Sons Ltd.

  16. Boosting Probabilistic Graphical Model Inference by Incorporating Prior Knowledge from Multiple Sources

    PubMed Central

    Praveen, Paurush; Fröhlich, Holger

    2013-01-01

    Inferring regulatory networks from experimental data via probabilistic graphical models is a popular framework to gain insights into biological systems. However, the inherent noise in experimental data coupled with a limited sample size reduces the performance of network reverse engineering. Prior knowledge from existing sources of biological information can address this low signal to noise problem by biasing the network inference towards biologically plausible network structures. Although integrating various sources of information is desirable, their heterogeneous nature makes this task challenging. We propose two computational methods to incorporate various information sources into a probabilistic consensus structure prior to be used in graphical model inference. Our first model, called Latent Factor Model (LFM), assumes a high degree of correlation among external information sources and reconstructs a hidden variable as a common source in a Bayesian manner. The second model, a Noisy-OR, picks up the strongest support for an interaction among information sources in a probabilistic fashion. Our extensive computational studies on KEGG signaling pathways as well as on gene expression data from breast cancer and yeast heat shock response reveal that both approaches can significantly enhance the reconstruction accuracy of Bayesian Networks compared to other competing methods as well as to the situation without any prior. Our framework allows for using diverse information sources, like pathway databases, GO terms and protein domain data, etc. and is flexible enough to integrate new sources, if available. PMID:23826291
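
    The Noisy-OR combination described above can be sketched compactly. The snippet below is a generic noisy-OR over per-source edge-confidence scores, assuming each source contributes an independent probability of supporting an interaction; the weighting scheme and variable names are illustrative assumptions, not the authors' exact formulation.

    ```python
    import numpy as np

    def noisy_or_prior(source_probs, reliabilities):
        """Noisy-OR combination of edge-confidence scores from several prior sources.

        source_probs  : per-source probability that the edge exists (0..1)
        reliabilities : per-source weight expressing how much each source is trusted
        """
        p = np.asarray(source_probs)
        q = np.asarray(reliabilities)
        # The edge is absent only if every source independently fails to support it
        return 1.0 - np.prod(1.0 - q * p)

    # Example: three prior sources (e.g. a pathway database, GO similarity, domain data)
    print(noisy_or_prior([0.8, 0.3, 0.5], [0.9, 0.6, 0.7]))
    ```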

  17. Electronic Reference Library: Silverplatter's Database Networking Solution.

    ERIC Educational Resources Information Center

    Millea, Megan

    Silverplatter's Electronic Reference Library (ERL) provides wide area network access to its databases using TCP/IP communications and client-server architecture. ERL has two main components: The ERL clients (retrieval interface) and the ERL server (search engines). ERL clients provide patrons with seamless access to multiple databases on multiple…

  18. Digital food photography: Dietary surveillance and beyond

    USDA-ARS?s Scientific Manuscript database

    The method used for creating a database of approximately 20,000 digital images of multiple portion sizes of foods linked to the USDA's Food and Nutrient Database for Dietary Studies (FNDDS) is presented. The creation of this database began in 2002, and its development has spanned 10 years. Initially...

  19. 77 FR 34211 - Modification of Multiple Compulsory Reporting Points; Continental United States, Alaska and Hawaii

    Federal Register 2010, 2011, 2012, 2013, 2014

    2012-06-11

    ... previously updated in the FAA aeronautical database without accompanying regulatory action being taken. The... to ensure it matches the information contained in the FAA's aeronautical database and to ensure the... position information contained in the FAA's aeronautical database for the reporting points. When these...

  20. 77 FR 16434 - Revocation of Multiple Domestic, Alaskan, and Hawaiian Compulsory Reporting Points

    Federal Register 2010, 2011, 2012, 2013, 2014

    2012-03-21

    ... previously removed from service and taken out of the FAA aeronautical database. The FAA is removing these... FAA's aeronautical database. This will avoid confusion and eliminate safety issues with existing fixes... and not contained in the FAA's aeronautical database as reporting points. The reporting points...

  1. 77 FR 40490 - Revocation and Modification of Multiple Domestic, Alaskan, and Hawaiian Compulsory Reporting Points

    Federal Register 2010, 2011, 2012, 2013, 2014

    2012-07-10

    ... FAA aeronautical database as compulsory reporting points. Additionally, this action also requires... aeronautical database. DATES: Effective date 0901 UTC July 10, 2012. The Director of the Federal Register... FAA's aeronautical database as reporting points. The reporting points included five Domestic Reporting...

  2. Reliability and validity assessment of administrative databases in measuring the quality of rectal cancer management.

    PubMed

    Corbellini, Carlo; Andreoni, Bruno; Ansaloni, Luca; Sgroi, Giovanni; Martinotti, Mario; Scandroglio, Ildo; Carzaniga, Pierluigi; Longoni, Mauro; Foschi, Diego; Dionigi, Paolo; Morandi, Eugenio; Agnello, Mauro

    2018-01-01

    Measurement and monitoring of the quality of care using a core set of quality measures are increasing in health service research. Although administrative databases include limited clinical data, they offer an attractive source for quality measurement. The purpose of this study, therefore, was to evaluate the completeness of different administrative data sources compared to a clinical survey in evaluating rectal cancer cases. Between May 2012 and November 2014, a clinical survey was done on 498 Lombardy patients who had rectal cancer and underwent surgical resection. These collected data were compared with the information extracted from administrative sources including the Hospital Discharge Dataset, drug database, daycare activity data, fee-exemption database, and regional screening program database. The agreement evaluation was performed using a set of 12 quality indicators. Patient complexity was a difficult indicator to measure owing to the lack of clinical data. Preoperative staging was another suboptimal indicator, due to frequent missing administrative registration of tests performed. The agreement between the 2 data sources regarding chemoradiotherapy treatments was high. Screening detection, minimally invasive techniques, length of stay, and unpreventable readmissions were detected as reliable quality indicators. Postoperative morbidity could be a useful indicator, but its agreement was lower, as expected. Healthcare administrative databases are large, real-time-collected repositories of data useful for measuring quality in a healthcare system. Our investigation reveals that the reliability of the indicators varies among them. Ideally, a combination of data from both sources could be used in order to improve the usefulness of the less reliable indicators.

  3. Encoding of low-quality DNA profiles as genotype probability matrices for improved profile comparisons, relatedness evaluation and database searches.

    PubMed

    Ryan, K; Williams, D Gareth; Balding, David J

    2016-11-01

    Many DNA profiles recovered from crime scene samples are of a quality that does not allow them to be searched against, nor entered into, databases. We propose a method for the comparison of profiles arising from two DNA samples, one or both of which can have multiple donors and be affected by low DNA template or degraded DNA. We compute likelihood ratios to evaluate the hypothesis that the two samples have a common DNA donor, and hypotheses specifying the relatedness of two donors. Our method uses a probability distribution for the genotype of the donor of interest in each sample. This distribution can be obtained from a statistical model, or we can exploit the ability of trained human experts to assess genotype probabilities, thus extracting much information that would be discarded by standard interpretation rules. Our method is compatible with established methods in simple settings, but is more widely applicable and can make better use of information than many current methods for the analysis of mixed-source, low-template DNA profiles. It can accommodate uncertainty arising from relatedness instead of or in addition to uncertainty arising from noisy genotyping. We describe a computer program GPMDNA, available under an open source licence, to calculate LRs using the method presented in this paper. Copyright © 2016 Elsevier Ireland Ltd. All rights reserved.

  4. [Effects of soil data and map scale on assessment of total phosphorus storage in upland soils].

    PubMed

    Li, Heng Rong; Zhang, Li Ming; Li, Xiao di; Yu, Dong Sheng; Shi, Xue Zheng; Xing, Shi He; Chen, Han Yue

    2016-06-01

    Accurate assessment of total phosphorus storage in farmland soils is of great significance to sustainable agriculture and non-point-source pollution control. However, previous studies have not considered the estimation errors arising from mapping scales and from databases built with different sources of soil profile data. In this study, a total of 393×10^4 hm^2 of upland in the 29 counties (or cities) of North Jiangsu was taken as a case study. We analysed how the four sources of soil profile data, namely "Soils of County", "Soils of Prefecture", "Soils of Province" and "Soils of China", and the six scales, i.e., 1:50000, 1:250000, 1:500000, 1:1000000, 1:4000000 and 1:10000000, used in the 24 soil databases established from these four sources, affected the assessment of soil total phosphorus. Compared with the most detailed 1:50000 soil database, established with 983 upland soil profiles, the relative deviation of the estimates of soil total phosphorus density (STPD) and soil total phosphorus storage (STPS) from the other soil databases varied from 4.8% to 48.9% and from 1.6% to 48.4%, respectively. The estimated STPD and STPS based on the 1:50000 database of "Soils of County" differed from most of the estimates based on the databases at each scale in "Soils of County" and "Soils of Prefecture", with significance levels of P<0.001 or P<0.05. Extremely significant differences (P<0.001) existed between the estimates based on the 1:50000 database of "Soils of County" and the estimates based on the databases at each scale in "Soils of Province" and "Soils of China". This study demonstrated the significance of appropriate soil data sources and appropriate mapping scales in estimating STPS.

  5. CBD: a biomarker database for colorectal cancer.

    PubMed

    Zhang, Xueli; Sun, Xiao-Feng; Cao, Yang; Ye, Benchen; Peng, Qiliang; Liu, Xingyun; Shen, Bairong; Zhang, Hong

    2018-01-01

    The colorectal cancer (CRC) biomarker database (CBD) was established based on 870 identified CRC biomarkers and their relevant information from 1115 original articles published in PubMed from 1986 to 2017. In this version of the CBD, CRC biomarker data were collected, sorted, displayed and analysed. With its credible contents, the CBD serves as a powerful and time-saving tool that provides more comprehensive and accurate information for further CRC biomarker research. The CBD was constructed under a MySQL server. HTML, PHP and JavaScript languages were used to implement the web interface. Apache was selected as the HTTP server. All of these web operations were implemented under the Windows system. The CBD provides users with information on individual biomarkers, categorized by the biological category, source and application of the biomarkers; the experimental methods, results, authors and publication resources; and the research region, average cohort age, gender, race, number of tumours, tumour location and stage. We only collected data from articles with clear and credible results showing that the biomarkers are useful in the diagnosis, treatment or prognosis of CRC. The CBD also provides a professional platform for researchers interested in CRC research to communicate, exchange their research ideas and design further high-quality research in CRC. They can submit their new findings to our database via the submission page and communicate with us in the CBD. Database URL: http://sysbio.suda.edu.cn/CBD/.

  6. CBD: a biomarker database for colorectal cancer

    PubMed Central

    Zhang, Xueli; Sun, Xiao-Feng; Ye, Benchen; Peng, Qiliang; Liu, Xingyun; Shen, Bairong; Zhang, Hong

    2018-01-01

    The colorectal cancer (CRC) biomarker database (CBD) was established based on 870 identified CRC biomarkers and their relevant information from 1115 original articles published in PubMed from 1986 to 2017. In this version of the CBD, CRC biomarker data were collected, sorted, displayed and analysed. With its credible contents, the CBD serves as a powerful and time-saving tool that provides more comprehensive and accurate information for further CRC biomarker research. The CBD was constructed under a MySQL server. HTML, PHP and JavaScript languages were used to implement the web interface. Apache was selected as the HTTP server. All of these web operations were implemented under the Windows system. The CBD provides users with information on individual biomarkers, categorized by the biological category, source and application of the biomarkers; the experimental methods, results, authors and publication resources; and the research region, average cohort age, gender, race, number of tumours, tumour location and stage. We only collected data from articles with clear and credible results showing that the biomarkers are useful in the diagnosis, treatment or prognosis of CRC. The CBD also provides a professional platform for researchers interested in CRC research to communicate, exchange their research ideas and design further high-quality research in CRC. They can submit their new findings to our database via the submission page and communicate with us in the CBD. Database URL: http://sysbio.suda.edu.cn/CBD/ PMID:29846545

  7. BioM2MetDisease: a manually curated database for associations between microRNAs, metabolites, small molecules and metabolic diseases

    PubMed Central

    Xu, Yanjun; Yang, Haixiu; Wu, Tan; Dong, Qun; Sun, Zeguo; Shang, Desi; Li, Feng; Xu, Yingqi; Su, Fei; Liu, Siyao

    2017-01-01

    BioM2MetDisease is a manually curated database that aims to provide a comprehensive and experimentally supported resource of associations between metabolic diseases and various biomolecules. Recently, metabolic diseases such as diabetes have become one of the leading threats to people's health. Metabolic diseases are associated with alterations in multiple types of biomolecules, such as miRNAs and metabolites. An integrated, high-quality data source collecting metabolic disease-associated biomolecules is essential for exploring the underlying molecular mechanisms and discovering novel therapeutics. Here, we developed the BioM2MetDisease database, which currently documents 2681 entries of relationships between 1147 biomolecules (miRNAs, metabolites and small molecules/drugs) and 78 metabolic diseases across 14 species. Each entry includes the biomolecule category, species, biomolecule name, disease name, dysregulation pattern, experimental technique, a brief description of the metabolic disease-biomolecule relationship, the reference, and additional annotation information. BioM2MetDisease provides a user-friendly interface to explore and retrieve all data conveniently. A submission page is also offered for researchers to submit new associations between biomolecules and metabolic diseases. BioM2MetDisease provides a comprehensive resource for studying the biomolecules that act in metabolic diseases, and it is helpful for understanding the molecular mechanisms of, and developing novel therapeutics for, metabolic diseases. Database URL: http://www.bio-bigdata.com/BioM2MetDisease/ PMID:28605773

  8. Advancements in web-database applications for rabies surveillance.

    PubMed

    Rees, Erin E; Gendron, Bruno; Lelièvre, Frédérick; Coté, Nathalie; Bélanger, Denise

    2011-08-02

    Protection of public health from rabies is informed by the analysis of surveillance data from human and animal populations. In Canada, public health, agricultural and wildlife agencies at the provincial and federal level are responsible for rabies disease control, and this has led to multiple agency-specific data repositories. Aggregation of agency-specific data into one database application would enable more comprehensive data analyses and effective communication among participating agencies. In Québec, RageDB was developed to house surveillance data for the raccoon rabies variant, representing the next generation in web-based database applications that provide a key resource for the protection of public health. RageDB incorporates data from, and grants access to, all agencies responsible for the surveillance of raccoon rabies in Québec. Technological advancements of RageDB to rabies surveillance databases include (1) automatic integration of multi-agency data and diagnostic results on a daily basis; (2) a web-based data editing interface that enables authorized users to add, edit and extract data; and (3) an interactive dashboard to help visualize data simply and efficiently, in table, chart, and cartographic formats. Furthermore, RageDB stores data from citizens who voluntarily report sightings of rabies suspect animals. We also discuss how sightings data can indicate public perception of the risk of raccoon rabies and thus aid in directing the allocation of disease control resources for protecting public health. RageDB provides an example in the evolution of spatio-temporal database applications for the storage, analysis and communication of disease surveillance data. The database was fast and inexpensive to develop by using open-source technologies, simple and efficient design strategies, and shared web hosting. The database increases communication among agencies collaborating to protect human health from raccoon rabies. Furthermore, health agencies have real-time access to a wide assortment of data documenting new developments in the raccoon rabies epidemic and this enables a more timely and appropriate response.

  9. Advancements in web-database applications for rabies surveillance

    PubMed Central

    2011-01-01

    Background Protection of public health from rabies is informed by the analysis of surveillance data from human and animal populations. In Canada, public health, agricultural and wildlife agencies at the provincial and federal level are responsible for rabies disease control, and this has led to multiple agency-specific data repositories. Aggregation of agency-specific data into one database application would enable more comprehensive data analyses and effective communication among participating agencies. In Québec, RageDB was developed to house surveillance data for the raccoon rabies variant, representing the next generation in web-based database applications that provide a key resource for the protection of public health. Results RageDB incorporates data from, and grants access to, all agencies responsible for the surveillance of raccoon rabies in Québec. Technological advancements of RageDB to rabies surveillance databases include 1) automatic integration of multi-agency data and diagnostic results on a daily basis; 2) a web-based data editing interface that enables authorized users to add, edit and extract data; and 3) an interactive dashboard to help visualize data simply and efficiently, in table, chart, and cartographic formats. Furthermore, RageDB stores data from citizens who voluntarily report sightings of rabies suspect animals. We also discuss how sightings data can indicate public perception of the risk of raccoon rabies and thus aid in directing the allocation of disease control resources for protecting public health. Conclusions RageDB provides an example in the evolution of spatio-temporal database applications for the storage, analysis and communication of disease surveillance data. The database was fast and inexpensive to develop by using open-source technologies, simple and efficient design strategies, and shared web hosting. The database increases communication among agencies collaborating to protect human health from raccoon rabies. Furthermore, health agencies have real-time access to a wide assortment of data documenting new developments in the raccoon rabies epidemic and this enables a more timely and appropriate response. PMID:21810215

  10. Tools for beach health data management, data processing, and predictive model implementation

    USGS Publications Warehouse

    ,

    2013-01-01

    This fact sheet describes utilities created for management of recreational waters to provide efficient data management, data aggregation, and predictive modeling as well as a prototype geographic information system (GIS)-based tool for data visualization and summary. All of these utilities were developed to assist beach managers in making decisions to protect public health. The Environmental Data Discovery and Transformation (EnDDaT) Web service identifies, compiles, and sorts environmental data from a variety of sources that help to define climatic, hydrologic, and hydrodynamic characteristics, including multiple data sources within the U.S. Geological Survey and the National Oceanic and Atmospheric Administration. The Great Lakes Beach Health Database (GLBH-DB) and Web application was designed to provide a flexible input, export, and storage platform for beach water quality and sanitary survey monitoring data to complement beach monitoring programs within the Great Lakes. A real-time predictive modeling strategy was implemented by combining the capabilities of EnDDaT and the GLBH-DB for timely, automated prediction of beach water quality. The GIS-based tool was developed to map beaches based on their physical and biological characteristics, which was shared with multiple partners to provide concepts and information for future Web-accessible beach data outlets.

  11. Database of Standardized Questionnaires About Walking & Bicycling

    Cancer.gov

    This database contains questionnaire items and a list of validation studies for standardized items related to walking and biking. The items come from multiple national and international physical activity questionnaires.

  12. A genotypic and phenotypic information source for marker-assisted selection of cereals: the CEREALAB database

    PubMed Central

    Milc, Justyna; Sala, Antonio; Bergamaschi, Sonia; Pecchioni, Nicola

    2011-01-01

    The CEREALAB database aims to store genotypic and phenotypic data obtained by the CEREALAB project and to integrate them with already existing data sources in order to create a tool for plant breeders and geneticists. The database can help them in unravelling the genetics of economically important phenotypic traits; in identifying and choosing molecular markers associated with key traits; and in choosing the desired parentals for breeding programs. The database is divided into three sub-schemas corresponding to the species of interest: wheat, barley and rice; each sub-schema is then divided into two sub-ontologies, regarding genotypic and phenotypic data, respectively. Database URL: http://www.cerealab.unimore.it/jws/cerealab.jnlp PMID:21247929

  13. Analysis of commercial and public bioactivity databases.

    PubMed

    Tiikkainen, Pekka; Franke, Lutz

    2012-02-27

    Activity data for small molecules are invaluable in chemoinformatics. Various bioactivity databases exist containing detailed information on target proteins and quantitative binding data for small molecules extracted from journals and patents. In the current work, we have merged several public and commercial bioactivity databases into one bioactivity metabase. The molecular representation, target information, and activity data of the vendor databases were standardized. The main motivation of the work was to create a single relational database which allows fast and simple data retrieval by in-house scientists. Second, we wanted to know the amount of overlap between databases by commercial and public vendors to see whether the former contain data complementing the latter. Third, we quantified the degree of inconsistency between data sources by comparing data points derived from the same scientific article cited by more than one vendor. We found that each data source contains unique data, owing to the different scientific articles cited by the vendors. When comparing data derived from the same article, we found that inconsistencies between the vendors are common. In conclusion, using databases of different vendors is still useful since the data overlap is not complete. It should be noted that this can be partially explained by the inconsistencies and errors in the source data.

  14. EPA’s SPECIATE 4.4 Database: Bridging Data Sources and Data Users

    EPA Science Inventory

    SPECIATE is the U.S. Environmental Protection Agency's (EPA)repository of volatile organic gas and particulate matter (PM) speciation profiles for air pollution sources. EPA released SPECIATE 4.4 in early 2014 and, in total, the SPECIATE 4.4 database includes 5,728 PM, VOC, total...

  15. Global Inventory of Gas Geochemistry Data from Fossil Fuel, Microbial and Burning Sources, version 2017

    NASA Astrophysics Data System (ADS)

    Sherwood, Owen A.; Schwietzke, Stefan; Arling, Victoria A.; Etiope, Giuseppe

    2017-08-01

    The concentration of atmospheric methane (CH4) has more than doubled over the industrial era. To help constrain global and regional CH4 budgets, inverse (top-down) models incorporate data on the concentration and stable carbon (δ13C) and hydrogen (δ2H) isotopic ratios of atmospheric CH4. These models depend on accurate δ13C and δ2H end-member source signatures for each of the main emissions categories. Compared with meticulous measurement and calibration of isotopic CH4 in the atmosphere, there has been relatively less effort to characterize globally representative isotopic source signatures, particularly for fossil fuel sources. Most global CH4 budget models have so far relied on outdated source signature values derived from globally nonrepresentative data. To correct this deficiency, we present a comprehensive, globally representative end-member database of the δ13C and δ2H of CH4 from fossil fuel (conventional natural gas, shale gas, and coal), modern microbial (wetlands, rice paddies, ruminants, termites, and landfills and/or waste) and biomass burning sources. Gas molecular compositional data for fossil fuel categories are also included with the database. The database comprises 10 706 samples (8734 fossil fuel, 1972 non-fossil) from 190 published references. Mean (unweighted) δ13C signatures for fossil fuel CH4 are significantly lighter than values commonly used in CH4 budget models, thus highlighting potential underestimation of fossil fuel CH4 emissions in previous CH4 budget models. This living database will be updated every 2-3 years to provide the atmospheric modeling community with the most complete CH4 source signature data possible. Database digital object identifier (DOI): https://doi.org/10.15138/G3201T.

  16. Constructing distributed Hippocratic video databases for privacy-preserving online patient training and counseling.

    PubMed

    Peng, Jinye; Babaguchi, Noboru; Luo, Hangzai; Gao, Yuli; Fan, Jianping

    2010-07-01

    Digital video now plays an important role in supporting more profitable online patient training and counseling, and integration of patient training videos from multiple competitive organizations in the health care network will result in better offerings for patients. However, privacy concerns often prevent multiple competitive organizations from sharing and integrating their patient training videos. In addition, patients with infectious or chronic diseases may not want the online patient training organizations to identify who they are or even which video clips they are interested in. Thus, there is an urgent need to develop more effective techniques to protect both video content privacy and access privacy. In this paper, we have developed a new approach to construct a distributed Hippocratic video database system for supporting more profitable online patient training and counseling. First, a new database modeling approach is developed to support concept-oriented video database organization and assign a degree of privacy of the video content for each database level automatically. Second, a new algorithm is developed to protect the video content privacy at the level of individual video clips by filtering out the privacy-sensitive human objects automatically. In order to integrate the patient training videos from multiple competitive organizations for constructing a centralized video database indexing structure, a privacy-preserving video sharing scheme is developed to support privacy-preserving distributed classifier training and prevent the statistical inferences from the videos that are shared for cross-validation of video classifiers. Our experiments on large-scale video databases have also provided very convincing results.

  17. Constructing a Database from Multiple 2D Images for Camera Pose Estimation and Robot Localization

    NASA Technical Reports Server (NTRS)

    Wolf, Michael; Ansar, Adnan I.; Brennan, Shane; Clouse, Daniel S.; Padgett, Curtis W.

    2012-01-01

    The LMDB (Landmark Database) Builder software identifies persistent image features (landmarks) in a scene viewed multiple times and precisely estimates the landmarks' 3D world positions. The software receives as input multiple 2D images of approximately the same scene, along with an initial guess of the camera poses for each image, and a table of features matched pair-wise in each frame. LMDB Builder aggregates landmarks across an arbitrarily large collection of frames with matched features. Range data from stereo vision processing can also be passed to improve the initial guess of the 3D point estimates. The LMDB Builder aggregates feature lists across all frames, manages the process to promote selected features to landmarks, and iteratively calculates the 3D landmark positions using the current camera pose estimates (via an optimal ray projection method), and then improves the camera pose estimates using the 3D landmark positions. Finally, it extracts image patches for each landmark from auto-selected key frames and constructs the landmark database. The landmark database can then be used to estimate future camera poses (and therefore localize a robotic vehicle that may be carrying the cameras) by matching current imagery to landmark database image patches and using the known 3D landmark positions to estimate the current pose.
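
    The abstract does not spell out the "optimal ray projection method" used for the landmark-position step; as a rough, generic illustration of that step only, the sketch below triangulates one landmark from several views by standard linear (DLT) triangulation, assuming known 3x4 camera projection matrices. It is not the LMDB Builder code.

      import numpy as np

      def triangulate_landmark(projections, pixels):
          """Linear (DLT) triangulation of one landmark seen in several frames."""
          rows = []
          for P, (u, v) in zip(projections, pixels):
              rows.append(u * P[2] - P[0])   # two linear constraints per view
              rows.append(v * P[2] - P[1])
          _, _, vt = np.linalg.svd(np.asarray(rows))
          X = vt[-1]                          # least-squares (null-space) solution
          return X[:3] / X[3]                 # de-homogenize to a 3D world point

      # Two toy cameras (identity intrinsics), translated along x, viewing the point (0, 0, 5)
      P1 = np.hstack([np.eye(3), np.zeros((3, 1))])
      P2 = np.hstack([np.eye(3), np.array([[-1.0], [0.0], [0.0]])])
      print(triangulate_landmark([P1, P2], [(0.0, 0.0), (-0.2, 0.0)]))  # ~[0, 0, 5]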

  18. CardioTF, a database of deconstructing transcriptional circuits in the heart system

    PubMed Central

    2016-01-01

    Background: Information on cardiovascular gene transcription is fragmented and far behind the present requirements of the systems biology field. To create a comprehensive source of data for cardiovascular gene regulation and to facilitate a deeper understanding of genomic data, the CardioTF database was constructed. The purpose of this database is to collate information on cardiovascular transcription factors (TFs), position weight matrices (PWMs), and enhancer sequences discovered using the ChIP-seq method. Methods: The Naïve-Bayes algorithm was used to classify literature and identify all PubMed abstracts on cardiovascular development. The natural language learning tool GNAT was then used to identify corresponding gene names embedded within these abstracts. Local Perl scripts were used to integrate and dump data from public databases into the MariaDB management system (MySQL). In-house R scripts were written to analyze and visualize the results. Results: Known cardiovascular TFs from humans and human homologs from fly, Ciona, zebrafish, frog, chicken, and mouse were identified and deposited in the database. PWMs from Jaspar, hPDI, and UniPROBE databases were deposited in the database and can be retrieved using their corresponding TF names. Gene enhancer regions from various sources of ChIP-seq data were deposited into the database and were able to be visualized by graphical output. Besides biocuration, mouse homologs of the 81 core cardiac TFs were selected using a Naïve-Bayes approach and then by intersecting four independent data sources: RNA profiling, expert annotation, PubMed abstracts and phenotype. Discussion: The CardioTF database can be used as a portal to construct transcriptional network of cardiac development. Availability and Implementation: Database URL: http://www.cardiosignal.org/database/cardiotf.html. PMID:27635320

  19. CardioTF, a database of deconstructing transcriptional circuits in the heart system.

    PubMed

    Zhen, Yisong

    2016-01-01

    Information on cardiovascular gene transcription is fragmented and far behind the present requirements of the systems biology field. To create a comprehensive source of data for cardiovascular gene regulation and to facilitate a deeper understanding of genomic data, the CardioTF database was constructed. The purpose of this database is to collate information on cardiovascular transcription factors (TFs), position weight matrices (PWMs), and enhancer sequences discovered using the ChIP-seq method. The Naïve-Bayes algorithm was used to classify literature and identify all PubMed abstracts on cardiovascular development. The natural language learning tool GNAT was then used to identify corresponding gene names embedded within these abstracts. Local Perl scripts were used to integrate and dump data from public databases into the MariaDB management system (MySQL). In-house R scripts were written to analyze and visualize the results. Known cardiovascular TFs from humans and human homologs from fly, Ciona, zebrafish, frog, chicken, and mouse were identified and deposited in the database. PWMs from Jaspar, hPDI, and UniPROBE databases were deposited in the database and can be retrieved using their corresponding TF names. Gene enhancer regions from various sources of ChIP-seq data were deposited into the database and were able to be visualized by graphical output. Besides biocuration, mouse homologs of the 81 core cardiac TFs were selected using a Naïve-Bayes approach and then by intersecting four independent data sources: RNA profiling, expert annotation, PubMed abstracts and phenotype. The CardioTF database can be used as a portal to construct transcriptional network of cardiac development. Database URL: http://www.cardiosignal.org/database/cardiotf.html.
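
    A minimal sketch of the literature-triage step described above (Naïve-Bayes classification of PubMed abstracts on cardiovascular development), using scikit-learn; the toy abstracts and labels are invented for illustration and this is not the CardioTF pipeline itself.

      from sklearn.feature_extraction.text import CountVectorizer
      from sklearn.naive_bayes import MultinomialNB
      from sklearn.pipeline import make_pipeline

      # Toy training data: 1 = about cardiovascular development, 0 = not
      abstracts = [
          "Nkx2-5 and GATA4 regulate cardiomyocyte differentiation in the embryonic heart",
          "Tbx5 enhancer activity during cardiac chamber formation",
          "Soil bacteria metabolize aromatic compounds under anaerobic conditions",
          "A new algorithm for image compression on mobile devices",
      ]
      labels = [1, 1, 0, 0]

      clf = make_pipeline(CountVectorizer(stop_words="english"), MultinomialNB())
      clf.fit(abstracts, labels)
      print(clf.predict(["Hand2 controls outflow tract development in the mouse heart"]))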

  20. 75 FR 4827 - Submission for OMB Review; Comment Request Clinical Trials Reporting Program (CTRP) Database (NCI)

    Federal Register 2010, 2011, 2012, 2013, 2014

    2010-01-29

    ...; Comment Request Clinical Trials Reporting Program (CTRP) Database (NCI) Summary: Under the provisions of... Collection: Title: Clinical Trials Reporting Program (CTRP) Database. Type of Information Collection Request... Program (CTRP) Database, to serve as a single, definitive source of information about all NCI-supported...

  1. An Interactive Online Database for Potato Varieties Evaluated in the Eastern U.S.

    USDA-ARS?s Scientific Manuscript database

    Online databases are no longer a novelty. However, for the potato growing and research community little effort has been put into collecting data from multiple states and provinces, and presenting it in a web-based database format for researchers and end users to utilize. The NE1031 regional potato v...

  2. Appendix A. Borderlands Site Database

    Treesearch

    A.C. MacWilliams

    2006-01-01

    The database includes modified components of the Arizona State Museum Site Recording System (Arizona State Museum 1993) and the New Mexico NMCRIS User's Guide (State of New Mexico 1993). When sites contain more than one recorded component, these instances were entered separately with the result that many sites have multiple entries. Information for this database...

  3. Social Science Data Bases and Data Banks in the United States and Canada.

    ERIC Educational Resources Information Center

    Black, John B.

    This overview of North American social science databases, including scope and services, identifies five trends: (1) growth--in the number of databases, subjects covered, and system availability; (2) increased competition in the retrieval systems marketplace with more databases being offered on multiple systems, improvements being made to the…

  4. Constructing Benchmark Databases and Protocols for Medical Image Analysis: Diabetic Retinopathy

    PubMed Central

    Kauppi, Tomi; Kämäräinen, Joni-Kristian; Kalesnykiene, Valentina; Sorri, Iiris; Uusitalo, Hannu; Kälviäinen, Heikki

    2013-01-01

    We address the performance evaluation practices for developing medical image analysis methods, in particular, how to establish and share databases of medical images with verified ground truth and solid evaluation protocols. Such databases support the development of better algorithms, execution of profound method comparisons, and, consequently, technology transfer from research laboratories to clinical practice. For this purpose, we propose a framework consisting of reusable methods and tools for the laborious task of constructing a benchmark database. We provide a software tool for medical image annotation that helps collect the class label, spatial span, and expert's confidence for lesions, together with a method to appropriately combine the manual segmentations from multiple experts. The tool and all necessary functionality for method evaluation are provided as public software packages. As a case study, we utilized the framework and tools to establish the DiaRetDB1 V2.1 database for benchmarking diabetic retinopathy detection algorithms. The database contains a set of retinal images, ground truth based on information from multiple experts, and a baseline algorithm for the detection of retinopathy lesions. PMID:23956787
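
    The abstract mentions a method to combine the manual segmentations from multiple experts without describing it; a simple placeholder for that step is an unweighted majority vote over the experts' binary lesion masks, sketched below. This is an illustration under that assumption, not the DiaRetDB1 fusion method.

      import numpy as np

      def combine_expert_masks(masks, min_agreement=0.5):
          """masks: list of equally sized binary arrays (one per expert).
          A pixel is kept as lesion if at least min_agreement of experts marked it."""
          stack = np.stack([np.asarray(m, dtype=float) for m in masks])
          return (stack.mean(axis=0) >= min_agreement).astype(np.uint8)

      # Three 2x3 toy masks from three "experts"
      a = [[1, 1, 0], [0, 0, 0]]
      b = [[1, 0, 0], [0, 1, 0]]
      c = [[1, 1, 0], [0, 0, 0]]
      print(combine_expert_masks([a, b, c]))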

  5. BioMart: a data federation framework for large collaborative projects.

    PubMed

    Zhang, Junjun; Haider, Syed; Baran, Joachim; Cros, Anthony; Guberman, Jonathan M; Hsu, Jack; Liang, Yong; Yao, Long; Kasprzyk, Arek

    2011-01-01

    BioMart is a freely available, open source, federated database system that provides a unified access to disparate, geographically distributed data sources. It is designed to be data agnostic and platform independent, such that existing databases can easily be incorporated into the BioMart framework. BioMart allows databases hosted on different servers to be presented seamlessly to users, facilitating collaborative projects between different research groups. BioMart contains several levels of query optimization to efficiently manage large data sets and offers a diverse selection of graphical user interfaces and application programming interfaces to ensure that queries can be performed in whatever manner is most convenient for the user. The software has now been adopted by a large number of different biological databases spanning a wide range of data types and providing a rich source of annotation available to bioinformaticians and biologists alike.

  6. Study of an External Neutron Source for an Accelerator-Driven System using the PHITS Code

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Sugawara, Takanori; Iwasaki, Tomohiko; Chiba, Takashi

    A code system for the Accelerator Driven System (ADS) has been under development for analyzing dynamic behaviors of a subcritical core coupled with an accelerator. This code system named DSE (Dynamics calculation code system for a Subcritical system with an External neutron source) consists of an accelerator part and a reactor part. The accelerator part employs a database, which is calculated by using PHITS, for investigating the effect related to the accelerator such as the changes of beam energy, beam diameter, void generation, and target level. This analysis method using the database may introduce some errors into dynamics calculations since the neutron source data derived from the database has some errors in fitting or interpolating procedures. In this study, the effects of various events are investigated to confirm that the method based on the database is appropriate.

  7. [Data sources, the data used, and the modality for collection].

    PubMed

    Mercier, G; Costa, N; Dutot, C; Riche, V-P

    2018-03-01

    The hospital costing process implies access to various sources of data. Whether a micro-costing or a gross-costing approach is used, the choice of the methodology is based on a compromise between the cost of data collection, data accuracy, and data transferability. This work describes the data sources available in France and the access modalities that are used, as well as the main advantages and shortcomings of: (1) the local unit costs, (2) the hospital analytical accounting, (3) the Angers database, (4) the National Health Cost Studies, (5) the INTER CHR/U databases, (6) the Program for Medicalizing Information Systems, and (7) the public health insurance databases. Copyright © 2018 Elsevier Masson SAS. All rights reserved.

  8. Comparing Global Influence: China’s and U.S. Diplomacy, Foreign Aid, Trade, and Investment in the Developing World

    DTIC Science & Technology

    2008-08-15

    [Extraction residue from report figures: charts of U.S. and China's exports of goods to the world, 1995-2007, with values in billions of dollars. Source: United Nations, COMTRADE Database.]

  9. Geopotential Field Anomaly Continuation with Multi-Altitude Observations

    NASA Technical Reports Server (NTRS)

    Kim, Jeong Woo; Kim, Hyung Rae; von Frese, Ralph; Taylor, Patrick; Rangelova, Elena

    2012-01-01

    Conventional gravity and magnetic anomaly continuation invokes the standard Poisson boundary condition of a zero anomaly at an infinite vertical distance from the observation surface. This simple continuation is limited, however, where multiple altitude slices of the anomaly field have been observed. Increasingly, areas are becoming available constrained by multiple boundary conditions from surface, airborne, and satellite surveys. This paper describes the implementation of continuation with multi-altitude boundary conditions in Cartesian and spherical coordinates and investigates the advantages and limitations of these applications. Continuations by EPS (Equivalent Point Source) inversion and the FT (Fourier Transform), as well as by SCHA (Spherical Cap Harmonic Analysis) are considered. These methods were selected because they are especially well suited for analyzing multi-altitude data over finite patches of the earth such as covered by the ADMAP database. In general, continuations constrained by multi-altitude data surfaces are invariably superior to those constrained by a single altitude data surface due to anomaly measurement errors and the non-uniqueness of continuation.
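
    For contrast with the multi-altitude boundary conditions discussed above, the sketch below shows the conventional single-surface Fourier-transform continuation, in which the anomaly spectrum is attenuated by exp(-|k| dz) to continue the gridded field upward by dz. The grid, spacing, and values are illustrative, not data from the surveys described here.

      import numpy as np

      def upward_continue(anomaly, spacing, dz):
          """Continue a gridded anomaly field upward by dz using the FT method."""
          ny, nx = anomaly.shape
          kx = 2.0 * np.pi * np.fft.fftfreq(nx, d=spacing)
          ky = 2.0 * np.pi * np.fft.fftfreq(ny, d=spacing)
          kxx, kyy = np.meshgrid(kx, ky)
          k = np.hypot(kxx, kyy)                      # radial wavenumber |k|
          spectrum = np.fft.fft2(anomaly)
          return np.real(np.fft.ifft2(spectrum * np.exp(-k * dz)))

      # Example: a 64x64 synthetic anomaly grid at 1 km spacing, continued up 5 km
      grid = np.random.default_rng(0).normal(size=(64, 64))
      print(upward_continue(grid, spacing=1.0, dz=5.0).shape)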

  10. Geopotential Field Anomaly Continuation with Multi-Altitude Observations

    NASA Technical Reports Server (NTRS)

    Kim, Jeong Woo; Kim, Hyung Rae; vonFrese, Ralph; Taylor, Patrick; Rangelova, Elena

    2011-01-01

    Conventional gravity and magnetic anomaly continuation invokes the standard Poisson boundary condition of a zero anomaly at an infinite vertical distance from the observation surface. This simple continuation is limited, however, where multiple altitude slices of the anomaly field have been observed. Increasingly, areas are becoming available constrained by multiple boundary conditions from surface, airborne, and satellite surveys. This paper describes the implementation of continuation with multi-altitude boundary conditions in Cartesian and spherical coordinates and investigates the advantages and limitations of these applications. Continuations by EPS (Equivalent Point Source) inversion and the FT (Fourier Transform), as well as by SCHA (Spherical Cap Harmonic Analysis) are considered. These methods were selected because they are especially well suited for analyzing multi-altitude data over finite patches of the earth such as covered by the ADMAP database. In general, continuations constrained by multi-altitude data surfaces are invariably superior to those constrained by a single altitude data surface due to anomaly measurement errors and the non-uniqueness of continuation.

  11. Explosive eruption records from Eastern Africa: filling in the gaps with tephra records from stratified lake sequences

    NASA Astrophysics Data System (ADS)

    Lane, Christine; Asrat, Asfawossen; Cohen, Andy; Cullen, Victoria; Johnson, Thomas; Lamb, Henry; Martin-Jones, Catherine; Poppe, Sam; Schaebitz, Frank; Scholz, Christopher

    2017-04-01

    On-going research into the preservation of volcanic ash fall in stratified Holocene lake sediments in Eastern Africa reveals the level of incompleteness of our explosive eruption record. Only nine eruptions with VEI >4 are recorded in the LaMEVE database (Crosweller et al., 2012) and of the 188 Holocene eruptions listed for East African volcanoes in the Global Volcanism Programme database, only 24 are dated to > 2000 years ago (GVP, 2013). Tephrostratigraphic investigation of Holocene sediments from a number of lakes, including Lake Kivu (south of the Virunga volcanic field), Lake Victoria (west of the Kenyan Rift volcanism) and palaeolake Chew Bahir (southern Ethiopia), all reveal multiple tephra layers, which indicate vastly underestimated eruption histories. Whereas the tephra layers in Lake Kivu were all located macroscopically, no visible tephra layers were observed in the sediments from Lake Victoria and Chew Bahir. Instead, tephra are preserved as non-visible horizons (cryptotephra), revealed only after laboratory processing. These results indicate that even where we do have stratified visible tephra records, the number of past eruptions may still be a minimum. Cryptotephra studies therefore play a fundamental role in building comprehensive records of past volcanism. Challenges remain, in this understudied region, to identify the volcanic source of each of the tephra layers, which requires geochemical correlation to proximal volcanic deposits. Where correlations to source can be achieved, explosive eruption frequencies and recurrence rates may be assessed for individual volcanoes. Furthermore, if a tephra layer can be traced into multiple sedimentary sequences, the potential exists to evaluate eruption magnitude, providing a more useful criterion for risk assessment. Filling in the gaps in our understanding of East African Rift volcanism and the associated hazards is therefore critically dependent upon bringing together this important data from distal tephrostratigraphic records with the work of volcanologists studying more proximal deposits, and hazard modellers. Crosweller et al (2012) "Global database on large magnitude explosive volcanic eruptions (LaMEVE)" Journal of Applied Volcanology 1:4, doi:10.1186/2191-5040-1-4 Global Volcanism Program, 2013. Volcanoes of the World, v. 4.5.3. Venzke, E (ed.). Smithsonian Institution. Downloaded 06 Jan 2017. http://dx.doi.org/10.5479/si.GVP.VOTW4-2013

  12. Understanding the Groundwater Hydrology of a Geographically-Isolated Prairie Fen: Implications for Conservation

    PubMed Central

    Sampath, Prasanna Venkatesh; Liao, Hua-Sheng; Curtis, Zachary Kristopher; Doran, Patrick J.; Herbert, Matthew E.; May, Christopher A.; Li, Shu-Guang

    2015-01-01

    The sources of water and corresponding delivery mechanisms to groundwater-fed fens are not well understood due to the multi-scale geo-morphologic variability of the glacial landscape in which they occur. This lack of understanding limits the ability to effectively conserve these systems and the ecosystem services they provide, including biodiversity and water provisioning. While fens tend to occur in clusters around regional groundwater mounds, Ives Road Fen in southern Michigan is an example of a geographically-isolated fen. In this paper, we apply a multi-scale groundwater modeling approach to understand the groundwater sources for Ives Road fen. We apply Transition Probability geo-statistics on more than 3000 well logs from a state-wide water well database to characterize the complex geology using conditional simulations. We subsequently implement a 3-dimensional reverse particle tracking to delineate groundwater contribution areas to the fen. The fen receives water from multiple sources: local recharge, regional recharge from an extensive till plain, a regional groundwater mound, and a nearby pond. The regional sources deliver water through a tortuous, 3-dimensional “pipeline” consisting of a confined aquifer lying beneath an extensive clay layer. Water in this pipeline reaches the fen by upwelling through openings in the clay layer. The pipeline connects the geographically-isolated fen to the same regional mound that provides water to other fen clusters in southern Michigan. The major implication of these findings is that fen conservation efforts must be expanded from focusing on individual fens and their immediate surroundings, to studying the much larger and inter-connected hydrologic network that sustains multiple fens. PMID:26452279

  13. Move Over, Word Processors--Here Come the Databases.

    ERIC Educational Resources Information Center

    Olds, Henry F., Jr.; Dickenson, Anne

    1985-01-01

    Discusses the use of beginning, intermediate, and advanced databases for instructional purposes. A table listing seven databases with information on ease of use, smoothness of operation, data capacity, speed, source, and program features is included. (JN)

  14. SPOT 5/HRS: A Key Source for Navigation Database

    DTIC Science & Technology

    2003-09-02

    [Report documentation page and presentation fragments; recoverable content: a presentation by Marc Bernard on producing navigation database data from SPOT 5/HRS imagery in partnership with IGN.]

  15. Database of Sources of Environmental Releases of Dioxin-Like Compounds in the United States

    EPA Science Inventory

    The Database of Sources of Environmental Releases of Dioxin-like Compounds in the United States (US)...

  African-American mitochondrial DNAs often match mtDNAs found in multiple African ethnic groups

    PubMed Central

    Ely, Bert; Wilson, Jamie Lee; Jackson, Fatimah; Jackson, Bruce A

    2006-01-01

    Background Mitochondrial DNA (mtDNA) haplotypes have become popular tools for tracing maternal ancestry, and several companies offer this service to the general public. Numerous studies have demonstrated that human mtDNA haplotypes can be used with confidence to identify the continent where the haplotype originated. Ideally, mtDNA haplotypes could also be used to identify a particular country or ethnic group from which the maternal ancestor emanated. However, the geographic distribution of mtDNA haplotypes is greatly influenced by the movement of both individuals and population groups. Consequently, common mtDNA haplotypes are shared among multiple ethnic groups. We have studied the distribution of mtDNA haplotypes among West African ethnic groups to determine how often mtDNA haplotypes can be used to reconnect Americans of African descent to a country or ethnic group of a maternal African ancestor. The nucleotide sequence of the mtDNA hypervariable segment I (HVS-I) usually provides sufficient information to assign a particular mtDNA to the proper haplogroup, and it contains most of the variation that is available to distinguish a particular mtDNA haplotype from closely related haplotypes. In this study, samples of general African-American and specific Gullah/Geechee HVS-I haplotypes were compared with two databases of HVS-I haplotypes from sub-Saharan Africa, and the incidence of perfect matches recorded for each sample. Results When two independent African-American samples were analyzed, more than half of the sampled HVS-I mtDNA haplotypes exactly matched common haplotypes that were shared among multiple African ethnic groups. Another 40% did not match any sequence in the database, and fewer than 10% were an exact match to a sequence from a single African ethnic group. Differences in the regional distribution of haplotypes were observed in the African database, and the African-American haplotypes were more likely to match haplotypes found in ethnic groups from West or West Central Africa than those found in eastern or southern Africa. Fewer than 14% of the African-American mtDNA sequences matched sequences from only West Africa or only West Central Africa. Conclusion Our database of sub-Saharan mtDNA sequences includes the most common haplotypes that are shared among ethnic groups from multiple regions of Africa. These common haplotypes have been found in half of all sub-Saharan Africans. More than 60% of the remaining haplotypes differ from the common haplotypes at a single nucleotide position in the HVS-I region, and they are likely to occur at varying frequencies within sub-Saharan Africa. However, the finding that 40% of the African-American mtDNAs analyzed had no match in the database indicates that only a small fraction of the total number of African haplotypes has been identified. In addition, the finding that fewer than 10% of African-American mtDNAs matched mtDNA sequences from a single African region suggests that few African Americans might be able to trace their mtDNA lineages to a particular region of Africa, and even fewer will be able to trace their mtDNA to a single ethnic group. However, no firm conclusions should be made until a much larger database is available. It is clear, however, that when identical mtDNA haplotypes are shared among many ethnic groups from different parts of Africa, it is impossible to determine which single ethnic group was the source of a particular maternal ancestor based on the mtDNA sequence. PMID:17038170
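
    A minimal sketch of the exact-match comparison described above: each sampled HVS-I haplotype is looked up in a reference table mapping haplotypes to the ethnic groups in which they have been observed, and matches are binned the way the study reports them (no match, match to a single group, match to multiple groups). The sequences and group names are invented placeholders, not entries from the actual database.

      from collections import Counter

      # Hypothetical reference database: HVS-I haplotype -> set of ethnic groups
      reference = {
          "16223T-16278T-16294T": {"Mende", "Temne", "Wolof"},
          "16129A-16223T-16311C": {"Kikuyu"},
      }

      samples = ["16223T-16278T-16294T", "16129A-16223T-16311C", "16189C-16223T"]

      bins = Counter()
      for hvs1 in samples:
          groups = reference.get(hvs1, set())
          if not groups:
              bins["no match"] += 1
          elif len(groups) == 1:
              bins["single ethnic group"] += 1
          else:
              bins["multiple ethnic groups"] += 1

      print(dict(bins))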

  16. Scale out databases for CERN use cases

    NASA Astrophysics Data System (ADS)

    Baranowski, Zbigniew; Grzybek, Maciej; Canali, Luca; Lanza Garcia, Daniel; Surdy, Kacper

    2015-12-01

    Data generation rates are expected to grow very fast for some database workloads going into LHC run 2 and beyond. In particular this is expected for data coming from controls, logging and monitoring systems. Storing, administering and accessing big data sets in a relational database system can quickly become a very hard technical challenge, as the size of the active data set and the number of concurrent users increase. Scale-out database technologies are a rapidly developing set of solutions for deploying and managing very large data warehouses on commodity hardware and with open source software. In this paper we will describe the architecture and tests on database systems based on Hadoop and the Cloudera Impala engine. We will discuss the results of our tests, including tests of data loading and integration with existing data sources and in particular with relational databases. We will report on query performance tests done with various data sets of interest at CERN, notably data from the accelerator log database.

  17. Does providing dental services reduce overall health care costs?: A systematic review of the literature.

    PubMed

    Elani, Hawazin W; Simon, Lisa; Ticku, Shenam; Bain, Paul A; Barrow, Jane; Riedy, Christine A

    2018-06-01

    The authors conducted a systematic review of the literature to assess the impact of dental treatment on overall health care costs for patients with chronic health conditions and patients who were pregnant. The authors searched multiple databases including MEDLINE, Embase, Web of Science, and Dentistry & Oral Sciences Source from the earliest date available through May 2017. Two reviewers conducted the initial screening of all retrieved titles and abstracts, read the full text of the eligible studies, and conducted data extraction and quality assessment of included studies. The authors found only 3 published studies that examined the effect of periodontal treatment on health care costs using medical and dental claims data from different insurance databases. Findings from the qualitative synthesis of those studies were inconclusive as 1 of the 3 studies showed a cost increase, whereas 2 studies showed a decrease. The small number of studies and their mixed outcomes demonstrate the need for high-quality studies to evaluate the effect of periodontal intervention on overall health care costs. Copyright © 2018 American Dental Association. Published by Elsevier Inc. All rights reserved.

  18. An Analysis of Database Replication Technologies with Regard to Deep Space Network Application Requirements

    NASA Technical Reports Server (NTRS)

    Connell, Andrea M.

    2011-01-01

    The Deep Space Network (DSN) has three communication facilities which handle telemetry, commands, and other data relating to spacecraft missions. The network requires these three sites to share data with each other and with the Jet Propulsion Laboratory for processing and distribution. Many database management systems have replication capabilities built in, which means that data updates made at one location will be automatically propagated to other locations. This project examines multiple replication solutions, looking for stability, automation, flexibility, performance, and cost. After comparing these features, Oracle Streams is chosen for closer analysis. Two Streams environments are configured - one with a Master/Slave architecture, in which a single server is the source for all data updates, and the second with a Multi-Master architecture, in which updates originating from any of the servers will be propagated to all of the others. These environments are tested for data type support, conflict resolution, performance, changes to the data structure, and behavior during and after network or server outages. Through this experimentation, it is determined which requirements of the DSN can be met by Oracle Streams and which cannot.

  19. Identifying functionally informative evolutionary sequence profiles.

    PubMed

    Gil, Nelson; Fiser, Andras

    2018-04-15

    Multiple sequence alignments (MSAs) can provide essential input to many bioinformatics applications, including protein structure prediction and functional annotation. However, the optimal selection of sequences to obtain biologically informative MSAs for such purposes is poorly explored, and has traditionally been performed manually. We present Selection of Alignment by Maximal Mutual Information (SAMMI), an automated, sequence-based approach to objectively select an optimal MSA from a large set of alternatives sampled from a general sequence database search. The hypothesis of this approach is that the mutual information among MSA columns will be maximal for those MSAs that contain the most diverse set possible of the most structurally and functionally homogeneous protein sequences. SAMMI was tested to select MSAs for functional site residue prediction by analysis of conservation patterns on a set of 435 proteins obtained from protein-ligand (peptides, nucleic acids and small substrates) and protein-protein interaction databases. Availability and implementation: A freely accessible program, including source code, implementing SAMMI is available at https://github.com/nelsongil92/SAMMI.git. andras.fiser@einstein.yu.edu. Supplementary data are available at Bioinformatics online.
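
    A minimal sketch of the selection criterion described above: score each candidate MSA by the average pairwise mutual information between its columns and keep the highest-scoring alignment. The column handling and absence of gap treatment here are simplifying assumptions, not SAMMI's exact implementation.

      from collections import Counter
      from itertools import combinations
      from math import log2

      def column_mi(col_a, col_b):
          n = len(col_a)
          pa, pb = Counter(col_a), Counter(col_b)
          pab = Counter(zip(col_a, col_b))
          return sum((c / n) * log2((c / n) / ((pa[x] / n) * (pb[y] / n)))
                     for (x, y), c in pab.items())

      def msa_score(msa):
          """msa: list of equal-length aligned sequences; returns mean pairwise column MI."""
          cols = list(zip(*msa))
          pairs = list(combinations(range(len(cols)), 2))
          return sum(column_mi(cols[i], cols[j]) for i, j in pairs) / len(pairs)

      candidates = {
          "broad": ["ACDE", "ACDF", "GHIK", "GHIL"],
          "narrow": ["ACDE", "ACDE", "ACDE", "ACDF"],
      }
      print(max(candidates, key=lambda name: msa_score(candidates[name])))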

  1. A geographically-diverse collection of 418 human gut microbiome pathway genome databases

    PubMed Central

    Hahn, Aria S.; Altman, Tomer; Konwar, Kishori M.; Hanson, Niels W.; Kim, Dongjae; Relman, David A.; Dill, David L.; Hallam, Steven J.

    2017-01-01

    Advances in high-throughput sequencing are reshaping how we perceive microbial communities inhabiting the human body, with implications for therapeutic interventions. Several large-scale datasets derived from hundreds of human microbiome samples sourced from multiple studies are now publicly available. However, idiosyncratic data processing methods between studies introduce systematic differences that confound comparative analyses. To overcome these challenges, we developed GutCyc, a compendium of environmental pathway genome databases (ePGDBs) constructed from 418 assembled human microbiome datasets using MetaPathways, enabling reproducible functional metagenomic annotation. We also generated metabolic network reconstructions for each metagenome using the Pathway Tools software, empowering researchers and clinicians interested in visualizing and interpreting metabolic pathways encoded by the human gut microbiome. For the first time, GutCyc provides consistent annotations and metabolic pathway predictions, making possible comparative community analyses between health and disease states in inflammatory bowel disease, Crohn’s disease, and type 2 diabetes. GutCyc data products are searchable online, or may be downloaded and explored locally using MetaPathways and Pathway Tools. PMID:28398290

  2. Central Appalachian basin natural gas database: distribution, composition, and origin of natural gases

    USGS Publications Warehouse

    Román Colón, Yomayra A.; Ruppert, Leslie F.

    2015-01-01

    The U.S. Geological Survey (USGS) has compiled a database consisting of three worksheets of central Appalachian basin natural gas analyses and isotopic compositions from published and unpublished sources of 1,282 gas samples from Kentucky, Maryland, New York, Ohio, Pennsylvania, Tennessee, Virginia, and West Virginia. The database includes field and reservoir names, well and State identification number, selected geologic reservoir properties, and the composition of natural gases (methane; ethane; propane; butane, iso-butane [i-butane]; normal butane [n-butane]; iso-pentane [i-pentane]; normal pentane [n-pentane]; cyclohexane, and hexanes). In the first worksheet, location and American Petroleum Institute (API) numbers from public or published sources are provided for 1,231 of the 1,282 gas samples. A second worksheet of 186 gas samples was compiled from published sources and augmented with public location information and contains carbon, hydrogen, and nitrogen isotopic measurements of natural gas. The third worksheet is a key for all abbreviations in the database. The database can be used to better constrain the stratigraphic distribution, composition, and origin of natural gas in the central Appalachian basin.

  3. Missing Modality Transfer Learning via Latent Low-Rank Constraint.

    PubMed

    Ding, Zhengming; Shao, Ming; Fu, Yun

    2015-11-01

    Transfer learning is usually exploited to leverage previously well-learned source domain for evaluating the unknown target domain; however, it may fail if no target data are available in the training stage. This problem arises when the data are multi-modal. For example, the target domain is in one modality, while the source domain is in another. To overcome this, we first borrow an auxiliary database with complete modalities, then consider knowledge transfer across databases and across modalities within databases simultaneously in a unified framework. The contributions are threefold: 1) a latent factor is introduced to uncover the underlying structure of the missing modality from the known data; 2) transfer learning in two directions allows the data alignment between both modalities and databases, giving rise to a very promising recovery; and 3) an efficient solution with theoretical guarantees to the proposed latent low-rank transfer learning algorithm. Comprehensive experiments on multi-modal knowledge transfer with missing target modality verify that our method can successfully inherit knowledge from both auxiliary database and source modality, and therefore significantly improve the recognition performance even when test modality is inaccessible in the training stage.

  4. Alaska IPASS database preparation manual.

    Treesearch

    P. McHugh; D. Olson; C. Schallau

    1989-01-01

    Describes the data, their sources, and the calibration procedures used in compiling a database for the Alaska IPASS (interactive policy analysis simulation system) model. Although this manual is for Alaska, it provides generic instructions for analysts preparing databases for other geographical areas.

  5. MyMolDB: a micromolecular database solution with open source and free components.

    PubMed

    Xia, Bing; Tai, Zheng-Fu; Gu, Yu-Cheng; Li, Bang-Jing; Ding, Li-Sheng; Zhou, Yan

    2011-10-01

    Managing chemical structures is one of the important daily tasks in small laboratories. Few solutions are available on the internet, and most of them are closed-source applications. The open-source applications typically have limited capability and only basic cheminformatics functionality. In this article, we describe an open-source solution to manage chemicals in research groups based on open source and free components. It has a user-friendly interface with functions for chemical handling and intensive searching. MyMolDB is a micromolecular database solution that supports exact, substructure, similarity, and combined searching. The solution is implemented mainly in the scripting language Python with a web-based interface for compound management and searching. Almost all searches are in essence done with pure SQL on the database, exploiting the high performance of the database engine. Thus, impressive search speed has been achieved on large data sets, since no external Central Processing Unit (CPU)-consuming languages are involved in the key part of the searching. MyMolDB is open-source software and can be modified and/or redistributed under GNU General Public License version 3 published by the Free Software Foundation (Free Software Foundation Inc. The GNU General Public License, Version 3, 2007. Available at: http://www.gnu.org/licenses/gpl.html). The software itself can be found at http://code.google.com/p/mymoldb/. Copyright © 2011 Wiley Periodicals, Inc.
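
    A minimal sketch of the similarity-search idea behind such systems, using RDKit Morgan fingerprints and Tanimoto similarity in place of MyMolDB's SQL-side implementation; the compound library and SMILES strings are arbitrary examples.

      from rdkit import Chem, DataStructs
      from rdkit.Chem import AllChem

      library = {
          "aspirin": "CC(=O)Oc1ccccc1C(=O)O",
          "paracetamol": "CC(=O)Nc1ccc(O)cc1",
          "ibuprofen": "CC(C)Cc1ccc(cc1)C(C)C(=O)O",
      }

      def fingerprint(smiles):
          return AllChem.GetMorganFingerprintAsBitVect(Chem.MolFromSmiles(smiles), 2, nBits=2048)

      query = fingerprint("CC(=O)Oc1ccccc1C(=O)O")  # query structure
      hits = sorted(
          ((name, DataStructs.TanimotoSimilarity(query, fingerprint(smi)))
           for name, smi in library.items()),
          key=lambda pair: pair[1], reverse=True)
      print(hits)  # library compounds ranked by similarity to the query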

  6. Assessing the number of fire fatalities in a defined population.

    PubMed

    Jonsson, Anders; Bergqvist, Anders; Andersson, Ragnar

    2015-12-01

    Fire-related fatalities and injuries have become a growing governmental concern in Sweden, and a national vision zero strategy has been adopted stating that nobody should get killed or seriously injured from fires. There is considerable uncertainty, however, regarding the numbers of both deaths and injuries due to fires. Different national sources present different numbers, even on deaths, which obstructs reliable surveillance of the problem over time. We assume the situation is similar in other countries. This study seeks to assess the true number of fire-related deaths in Sweden by combining sources, and to verify the coverage of each individual source. By doing so, we also wish to demonstrate the possibilities of improved surveillance practices. Data from three national sources were collected and matched; a special database on fatal fires held by The Swedish Contingencies Agency (nationally responsible for fire prevention), a database on forensic medical examinations held by the National Board of Forensic Medicine, and the cause of death register held by the Swedish National Board of Health and Welfare. The results disclose considerable underreporting in the single sources. The national database on fatal fires, serving as the principal source for policy making on fire prevention matters, underestimates the true situation by 20%. Its coverage of residential fires appears to be better than other fires. Systematic safety work and informed policy-making presuppose access to correct and reliable numbers. By combining several different sources, as suggested in this study, the national database on fatal fires is now considerably improved and includes regular matching with complementary sources.
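
    A minimal sketch of the source-combination step described above: match individuals across the three registers (here by a hypothetical personal identifier), take the union as the best estimate of the true death count, and express each single source's coverage against that union. The identifiers and counts are invented.

      # Hypothetical sets of personal IDs of fire deaths captured by each register
      fatal_fires_db = {"p01", "p02", "p03", "p04"}          # agency database on fatal fires
      forensic_db    = {"p02", "p03", "p04", "p05", "p06"}   # forensic medical examinations
      cause_of_death = {"p01", "p03", "p05", "p06"}          # cause of death register

      combined = fatal_fires_db | forensic_db | cause_of_death
      for name, source in [("fatal fires", fatal_fires_db),
                           ("forensic", forensic_db),
                           ("cause of death", cause_of_death)]:
          coverage = len(source & combined) / len(combined)
          print(f"{name}: {coverage:.0%} of the combined total")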

  7. Surgical research using national databases

    PubMed Central

    Leland, Hyuma; Heckmann, Nathanael

    2016-01-01

    Recent changes in healthcare and advances in technology have increased the use of large-volume national databases in surgical research. These databases have been used to develop perioperative risk stratification tools, assess postoperative complications, calculate costs, and investigate numerous other topics across multiple surgical specialties. The results of these studies contain variable information but are subject to unique limitations. The use of large-volume national databases is increasing in popularity, and thorough understanding of these databases will allow for a more sophisticated and better educated interpretation of studies that utilize such databases. This review will highlight the composition, strengths, and weaknesses of commonly used national databases in surgical research. PMID:27867945

  8. Surgical research using national databases.

    PubMed

    Alluri, Ram K; Leland, Hyuma; Heckmann, Nathanael

    2016-10-01

    Recent changes in healthcare and advances in technology have increased the use of large-volume national databases in surgical research. These databases have been used to develop perioperative risk stratification tools, assess postoperative complications, calculate costs, and investigate numerous other topics across multiple surgical specialties. The results of these studies contain variable information but are subject to unique limitations. The use of large-volume national databases is increasing in popularity, and thorough understanding of these databases will allow for a more sophisticated and better educated interpretation of studies that utilize such databases. This review will highlight the composition, strengths, and weaknesses of commonly used national databases in surgical research.

  9. The NCBI BioSystems database.

    PubMed

    Geer, Lewis Y; Marchler-Bauer, Aron; Geer, Renata C; Han, Lianyi; He, Jane; He, Siqian; Liu, Chunlei; Shi, Wenyao; Bryant, Stephen H

    2010-01-01

    The NCBI BioSystems database, found at http://www.ncbi.nlm.nih.gov/biosystems/, centralizes and cross-links existing biological systems databases, increasing their utility and target audience by integrating their pathways and systems into NCBI resources. This integration allows users of NCBI's Entrez databases to quickly categorize proteins, genes and small molecules by metabolic pathway, disease state or other BioSystem type, without requiring time-consuming inference of biological relationships from the literature or multiple experimental datasets.

  10. Measuring health system resource use for economic evaluation: a comparison of data sources.

    PubMed

    Pollicino, Christine; Viney, Rosalie; Haas, Marion

    2002-01-01

    A key challenge for evaluators and health system planners is the identification, measurement and valuation of resource use for economic evaluation. Accurately capturing all significant resource use is particularly difficult in the Australian context where there is no comprehensive database from which researchers can draw. Evaluators and health system planners need to consider different approaches to data collection for estimating resource use for economic evaluation, and the relative merits of the different data sources available. This paper illustrates the issues that arise when using different data sources, drawing on a sub-sample of the data being collected for an economic evaluation. Specifically, it compares the use of Australia's largest administrative database on resource use, the Health Insurance Commission database, with the use of patient-supplied data. The extent of agreement and discrepancies between the two data sources is investigated. Findings from this study and recommendations as to how to deal with different data sources are presented.

  11. An integrated modeling approach for estimating the water quality benefits of conservation practices at the river basin scale.

    PubMed

    Santhi, C; Kannan, N; White, M; Di Luzio, M; Arnold, J G; Wang, X; Williams, J R

    2014-01-01

    The USDA initiated the Conservation Effects Assessment Project (CEAP) to quantify the environmental benefits of conservation practices at regional and national scales. For this assessment, a sampling and modeling approach is used. This paper provides a technical overview of the modeling approach used in the CEAP cropland assessment to estimate the off-site water quality benefits of conservation practices, using the Ohio River Basin (ORB) as an example. The modeling approach uses a farm-scale model, the Agricultural Policy Environmental Extender (APEX), a watershed-scale model (the Soil and Water Assessment Tool [SWAT]), and databases in the Hydrologic Unit Modeling for the United States system. Databases of land use, soils, land use management, topography, weather, point sources, and atmospheric deposition were developed to derive model inputs. APEX simulates the cultivated cropland, Conservation Reserve Program land, and the practices implemented on them, whereas SWAT simulates the noncultivated land (e.g., pasture, range, urban, and forest) and point sources. Simulation results from APEX are input into SWAT. SWAT routes all sources, including APEX's, to the basin outlet through each eight-digit watershed. Each basin is calibrated for stream flow, sediment, and nutrient loads at multiple gaging sites and then used to simulate the effects of conservation practice scenarios on water quality. Results indicate that sediment, nitrogen, and phosphorus loads delivered to the Mississippi River from the ORB could be reduced by 16, 15, and 23%, respectively, due to current conservation practices. Modeling tools are useful for providing science-based information for assessing existing conservation programs, developing future programs, and developing insights on the load reductions necessary to address hypoxia in the Gulf of Mexico. Copyright © by the American Society of Agronomy, Crop Science Society of America, and Soil Science Society of America, Inc.

  12. Phenome-driven disease genetics prediction toward drug discovery.

    PubMed

    Chen, Yang; Li, Li; Zhang, Guo-Qiang; Xu, Rong

    2015-06-15

    Discerning genetic contributions to diseases not only enhances our understanding of disease mechanisms, but also leads to translational opportunities for drug discovery. Recent computational approaches incorporate disease phenotypic similarities to improve the prediction power of disease gene discovery. However, most current studies used only one data source of human disease phenotype. We present an innovative and generic strategy for combining multiple different data sources of human disease phenotype and predicting disease-associated genes from integrated phenotypic and genomic data. To demonstrate our approach, we explored a new phenotype database from biomedical ontologies and constructed Disease Manifestation Network (DMN). We combined DMN with mimMiner, which was a widely used phenotype database in disease gene prediction studies. Our approach achieved significantly improved performance over a baseline method, which used only one phenotype data source. In the leave-one-out cross-validation and de novo gene prediction analysis, our approach achieved the area under the curves of 90.7% and 90.3%, which are significantly higher than 84.2% (P < e(-4)) and 81.3% (P < e(-12)) for the baseline approach. We further demonstrated that our predicted genes have the translational potential in drug discovery. We used Crohn's disease as an example and ranked the candidate drugs based on the rank of drug targets. Our gene prediction approach prioritized druggable genes that are likely to be associated with Crohn's disease pathogenesis, and our rank of candidate drugs successfully prioritized the Food and Drug Administration-approved drugs for Crohn's disease. We also found literature evidence to support a number of drugs among the top 200 candidates. In summary, we demonstrated that a novel strategy combining unique disease phenotype data with system approaches can lead to rapid drug discovery. nlp. edu/public/data/DMN © The Author 2015. Published by Oxford University Press.

  13. Demyelinating disease in patients treated with TNF antagonists in rheumatology: data from BIOBADASER, a pharmacovigilance database, and a systematic review.

    PubMed

    Cruz Fernández-Espartero, María; Pérez-Zafrilla, Beatriz; Naranjo, Antonio; Esteban, Carmen; Ortiz, Ana M; Gómez-Reino, Juan J; Carmona, Loreto

    2011-12-01

    To estimate the rate of demyelinating diseases in patients with rheumatic diseases treated with tumor necrosis factor (TNF) antagonists and to describe the cases reported to 3 different pharmacovigilance sources. All confirmed cases of demyelinating disease, optic neuritis, and multiple sclerosis (MS) in patients with rheumatic diseases treated with TNF-antagonists were reviewed from 3 different sources: (1) the Spanish Registry of biological therapies in rheumatic diseases (BIOBADASER); (2) the Spanish Pharmacovigilance Database of Adverse Drug Reactions (FEDRA); and (3) a systematic review (PubMed, EMBASE, and the Cochrane Library). In BIOBADASER, the incidence rate per 1000 patients was estimated with a 95% confidence interval (95% CI). In 21,425 patient-years in BIOBADASER, there were 9 patients with confirmed demyelinating disease, 4 with optic neuritis, and 1 with MS. In addition, 22 patients presented polyneuropathies, paresthesias, dysesthesias, facial palsy, or vocal cord paralysis without confirmed demyelination. The incidence rate of demyelinating disease in patients with rheumatic diseases exposed to TNF-antagonists in BIOBADASER was 0.65 per 1000 patient-years (95% CI: 0.39-1.1). The incidence of MS in BIOBADASER was 0.05 (95% CI: 0.01-0.33), while the incidence in the general Spanish population was 0.02 to 0.04 cases per 1000. Compared with BIOBADASER, cases in FEDRA (n = 19) and in the literature (n = 48) tend to be younger, have shorter exposure to TNF-antagonists, and recover after discontinuation of the drug. It is not clear whether TNF antagonists increase the incidence of demyelinating diseases in patients with rheumatic diseases. Differences between cases depending on the pharmacovigilance source could be explained by selective reporting bias outside registries. Copyright © 2011. Published by Elsevier Inc.

  14. Demyelinating disease in patients treated with TNF antagonists in rheumatology: data from BIOBADASER, a pharmacovigilance database, and a systematic review.

    PubMed

    Fernández-Espartero, María Cruz; Pérez-Zafrilla, Beatriz; Naranjo, Antonio; Esteban, Carmen; Ortiz, Ana M; Gómez-Reino, Juan J; Carmona, Loreto

    2011-02-01

    To estimate the rate of demyelinating diseases in patients with rheumatic diseases treated with tumor necrosis factor (TNF) antagonists and to describe the cases reported to 3 different pharmacovigilance sources. All confirmed cases of demyelinating disease, optic neuritis, and multiple sclerosis (MS) in patients with rheumatic diseases treated with TNF-antagonists were reviewed from 3 different sources: (1) the Spanish Registry of biological therapies in rheumatic diseases (BIOBADASER); (2) the Spanish Pharmacovigilance Database of Adverse Drug Reactions (FEDRA); and (3) a systematic review (PubMed, EMBASE, and the Cochrane Library). In BIOBADASER, the incidence rate per 1000 patients was estimated with a 95% confidence interval (95%CI). In 21,425 patient-years in BIOBADASER, there were 9 patients with confirmed demyelinating disease, 4 with optic neuritis, and 1 with MS. In addition, 22 patients presented polyneuropathies, paresthesias, dysesthesias, facial palsy, or vocal cord paralysis without confirmed demyelination. The incidence rate of demyelinating disease in patients with rheumatic diseases exposed to TNF antagonists in BIOBADASER was 0.65 per 1000 patient-years (95%CI: 0.39-1.1). The incidence of MS in BIOBADASER was 0.05 (95%CI: 0.01-0.33), while the incidence in the general Spanish population was 0.02 to 0.04 cases per 1000. Compared with BIOBADASER, cases in FEDRA (n = 19) and in the literature (n = 48) tend to be younger, have shorter exposure to TNF-antagonists, and recover after discontinuation of the drug. It is not clear whether TNF antagonists increase the incidence of demyelinating diseases in patients with rheumatic diseases. Differences between cases depending on the pharmacovigilance source could be explained by selective reporting bias outside registries. Copyright © 2011 Elsevier Inc. All rights reserved.
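
    As a worked check of the rate reported above, the sketch below recomputes the incidence per 1000 patient-years from the 14 confirmed cases (9 demyelinating disease, 4 optic neuritis, 1 MS) over 21,425 patient-years, with an exact Poisson 95% CI; the choice of interval method is an assumption, since the paper does not state which one it used.

      from scipy.stats import chi2

      events, patient_years = 14, 21425

      rate = events / patient_years * 1000
      lower = chi2.ppf(0.025, 2 * events) / 2 / patient_years * 1000
      upper = chi2.ppf(0.975, 2 * (events + 1)) / 2 / patient_years * 1000
      print(f"{rate:.2f} per 1000 patient-years (95% CI {lower:.2f}-{upper:.2f})")
      # roughly 0.65 (0.36-1.10), close to the reported 0.65 (0.39-1.1)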

  15. Predicting Adverse Drug Effects from Literature- and Database-Mined Assertions.

    PubMed

    La, Mary K; Sedykh, Alexander; Fourches, Denis; Muratov, Eugene; Tropsha, Alexander

    2018-06-06

    Given that adverse drug effects (ADEs) have led to post-market patient harm and subsequent drug withdrawal, failure of candidate agents in the drug development process, and other negative outcomes, it is essential to attempt to forecast ADEs and other relevant drug-target-effect relationships as early as possible. Current pharmacologic data sources, providing multiple complementary perspectives on the drug-target-effect paradigm, can be integrated to facilitate the inference of relationships between these entities. This study aims to identify both existing and unknown relationships between chemicals (C), protein targets (T), and ADEs (E) based on evidence in the literature. Cheminformatics and data mining approaches were employed to integrate and analyze publicly available clinical pharmacology data and literature assertions interrelating drugs, targets, and ADEs. Based on these assertions, a C-T-E relationship knowledge base was developed. Known pairwise relationships between chemicals, targets, and ADEs were collected from several pharmacological and biomedical data sources. These relationships were curated and integrated according to Swanson's paradigm to form C-T-E triangles. Missing C-E edges were then inferred as C-E relationships. Unreported associations between drugs, targets, and ADEs were inferred, and inferences were prioritized as testable hypotheses. Several C-E inferences, including testosterone → myocardial infarction, were identified using inferences based on the literature sources published prior to confirmatory case reports. Timestamping approaches confirmed the predictive ability of this inference strategy on a larger scale. The presented workflow, based on free-access databases and an association-based inference scheme, provided novel C-E relationships that have been validated post hoc in case reports. With refinement of prioritization schemes for the generated C-E inferences, this workflow may provide an effective computational method for the early detection of potential drug candidate ADEs that can be followed by targeted experimental investigations.
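
    A minimal sketch of the Swanson-style inference described above: known chemical-target (C-T) and target-effect (T-E) assertions are joined on the shared target to propose chemical-effect (C-E) hypotheses that are not already reported. The triples below are invented placeholders, not curated assertions from the study.

      from collections import defaultdict

      chem_target = {("testosterone", "AR"), ("drugX", "HERG")}            # C-T edges
      target_effect = {("AR", "myocardial infarction"), ("HERG", "QT prolongation")}  # T-E edges
      known_chem_effect = {("drugX", "QT prolongation")}                   # already-reported C-E edges

      targets_to_effects = defaultdict(set)
      for target, effect in target_effect:
          targets_to_effects[target].add(effect)

      inferred = {
          (chem, effect)
          for chem, target in chem_target
          for effect in targets_to_effects[target]
      } - known_chem_effect
      print(inferred)  # e.g. {('testosterone', 'myocardial infarction')}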

  16. [Pharmacological treatment conciliation methodology in patients with multiple conditions].

    PubMed

    Alfaro-Lara, Eva Rocío; Vega-Coca, María Dolores; Galván-Banqueri, Mercedes; Nieto-Martín, María Dolores; Pérez-Guerrero, Concepción; Santos-Ramos, Bernardo

    2014-02-01

    To carry out a bibliographic review to identify the different methodologies used in the medication reconciliation process applicable to polypathological patients. We performed a literature review. Data sources: The bibliographic review (February 2012) included the following databases: PubMed, EMBASE, CINAHL, PsycINFO and the Spanish Medical Index (IME). The different methodologies identified in those databases for the reconciliation process in polypathological patients, or failing that in elderly or polymedicated patients, were studied. Study selection: Two hundred and seventy-three articles were retrieved, of which 25 were selected. Data extraction: Specifically, the level of care, the sources of information, the use of registration forms, the established time frame, the medical professional in charge, and the registered variables such as reconciliation errors. Most of the selected studies performed reconciliation at hospital admission and after hospital discharge. The main sources of information were the patient interview and the patient's medical history. Most studies do not explicitly state an established time frame, nor do they use a registration form. The main professional in charge is the clinical pharmacologist. Apart from the home medication, habits of self-medication and phytotherapy are also identified. Common reconciliation errors range from the omission of drugs to various drug-drug interactions. There is large heterogeneity in the methodologies used for reconciliation. No work has addressed the specific profile of the polypathological patient, who precisely requires a standardized methodology because of this patient's complexity and susceptibility to reconciliation errors. Copyright © 2012 Elsevier España, S.L. All rights reserved.

  17. Defining Care Patterns and Outcomes Among Persons Living with HIV in Washington, DC: Linkage of Clinical Cohort and Surveillance Data

    PubMed Central

    Terzian, Arpi; Opoku, Jenevieve; Happ, Lindsey Powers; Younes, Naji; Kharfen, Michael; Greenberg, Alan

    2018-01-01

    Background Triangulation of data from multiple sources such as clinical cohort and surveillance data can help improve our ability to describe care patterns, service utilization, comorbidities, and ultimately measure and monitor clinical outcomes among persons living with HIV infection. Objectives The objective of this study was to determine whether linkage of clinical cohort data and routinely collected HIV surveillance data would enhance the completeness and accuracy of each database and improve the understanding of care patterns and clinical outcomes. Methods We linked data from the District of Columbia (DC) Cohort, a large HIV observational clinical cohort, with Washington, DC, Department of Health (DOH) surveillance data between January 2011 and June 2015. We determined percent concordance between select variables in the pre- and postlinked databases using kappa test statistics. We compared retention in care (RIC), viral suppression (VS), sexually transmitted diseases (STDs), and non-HIV comorbid conditions (eg, hypertension) and compared HIV clinic visit patterns determined using the prelinked database (DC Cohort) versus the postlinked database (DC Cohort + DOH) using chi-square testing. Additionally, we compared sociodemographic characteristics, RIC, and VS among participants receiving HIV care at ≥3 sites versus <3 sites using chi-square testing. Results Of the 6054 DC Cohort participants, 5521 (91.19%) were included in the postlinked database and enrolled at a single DC Cohort site. The majority of the participants was male, black, and had men who have sex with men (MSM) as their HIV risk factor. In the postlinked database, 619 STD diagnoses previously unknown to the DC Cohort were identified. Additionally, the proportion of participants with RIC was higher compared with the prelinked database (59.83%, 2678/4476 vs 64.95%, 2907/4476; P<.001) and the proportion with VS was lower (87.85%, 2277/2592 vs 85.15%, 2391/2808; P<.001). Almost a quarter of participants (23.06%, 1279/5521) were identified as receiving HIV care at ≥2 sites (postlinked database). The participants using ≥3 care sites were more likely to achieve RIC (80.7%, 234/290 vs 62.61%, 2197/3509) but less likely to achieve VS (72.3%, 154/213 vs 89.51%, 1869/2088). The participants using ≥3 care sites were more likely to have unstable housing (15.1%, 64/424 vs 8.96%, 380/4242), public insurance (86.1%, 365/424 vs 57.57%, 2442/4242), comorbid conditions (eg, hypertension) (37.7%, 160/424 vs 22.98%, 975/4242), and have acquired immunodeficiency syndrome (77.8%, 330/424 vs 61.20%, 2596/4242) (all P<.001). Conclusions Linking surveillance and clinical data resulted in the improved completeness of each database and a larger volume of available data to evaluate HIV outcomes, allowing for refinement of HIV care continuum estimates. The postlinked database also highlighted important differences between participants who sought HIV care at multiple clinical sites. Our findings suggest that combined datasets can enhance evaluation of HIV-related outcomes across an entire metropolitan area. Future research will evaluate how to best utilize this information to improve outcomes in addition to monitoring them. PMID:29549065
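
    A minimal illustration of the concordance checks described above: agreement between how a categorical variable is recorded in the prelinked and postlinked databases, summarized as percent concordance and Cohen's kappa. The participant values are invented, and scikit-learn's cohen_kappa_score is used here purely for convenience rather than as the study's actual tooling.

    ```python
    # Sketch of variable concordance between the prelinked cohort data and the
    # postlinked (cohort + DOH) data, summarized with Cohen's kappa.
    from sklearn.metrics import cohen_kappa_score

    # One entry per participant: e.g. HIV transmission risk category in each source.
    prelinked  = ["MSM", "MSM", "IDU", "HET", "MSM", "HET"]
    postlinked = ["MSM", "MSM", "IDU", "MSM", "MSM", "HET"]

    kappa = cohen_kappa_score(prelinked, postlinked)
    percent_concordant = sum(a == b for a, b in zip(prelinked, postlinked)) / len(prelinked)
    print(f"percent concordance: {percent_concordant:.1%}, kappa: {kappa:.2f}")
    ```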

  18. Linking data to decision-making: applying qualitative data analysis methods and software to identify mechanisms for using outcomes data.

    PubMed

    Patel, Vaishali N; Riley, Anne W

    2007-10-01

    A multiple case study was conducted to examine how staff in child out-of-home care programs used data from an Outcomes Management System (OMS) and other sources to inform decision-making. Data collection consisted of thirty-seven semi-structured interviews with clinicians, managers, and directors from two treatment foster care programs and two residential treatment centers, and individuals involved with developing the OMS; and observations of clinical and quality management meetings. Case study and grounded theory methodology guided analyses. The application of qualitative data analysis software is described. Results show that although staff rarely used data from the OMS, they did rely on other sources of systematically collected information to inform clinical, quality management, and program decisions. Analyses of how staff used these data suggest that improving the utility of OMS will involve encouraging staff to participate in data-based decision-making, and designing and implementing OMS in a manner that reflects how decision-making processes operate.

  19. Data mining: childhood injury control and beyond.

    PubMed

    Tepas, Joseph J

    2009-08-01

    Data mining is defined as the automatic extraction of useful, often previously unknown information from large databases or data sets. It has become a major part of modern life and is extensively used in industry, banking, government, and health care delivery. The process requires a data collection system that integrates input from multiple sources containing critical elements that define outcomes of interest. Appropriately designed data mining processes identify and adjust for confounding variables. The statistical modeling used to manipulate accumulated data may involve any number of techniques. As predicted results are periodically analyzed against those observed, the model is consistently refined to optimize precision and accuracy. Whether applying integrated sources of clinical data to inferential probabilistic prediction of the risk of ventilator-associated pneumonia or to population surveillance for signs of bioterrorism, it is essential that modern health care providers have at least a rudimentary understanding of what the concept means, how it basically works, and what it means for current and future health care.

  20. Characterization of PTO and Idle Behavior for Utility Vehicles

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Duran, Adam W.; Konan, Arnaud M.; Miller, Eric S.

    This report presents the results of analyses performed on utility vehicle data composed primarily of aerial lift bucket trucks sampled from the National Renewable Energy Laboratory's Fleet DNA database to characterize power takeoff (PTO) and idle operating behavior for utility trucks. Two major data sources were examined in this study: a 75-vehicle sample of Odyne electric PTO (ePTO)-equipped vehicles drawn from multiple fleets spread across the United States and 10 conventional PTO-equipped Pacific Gas and Electric fleet vehicles operating in California. Novel data mining approaches were developed to identify PTO and idle operating states for each of the datasets using telematics and controller area network/onboard diagnostics data channels. These methods were applied to the individual datasets and aggregated to develop utilization curves and distributions describing PTO and idle behavior in both absolute and relative operating terms. This report also includes background information on the source vehicles, development of the analysis methodology, and conclusions regarding the study's findings.
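
    A rough sketch of the kind of per-record state labelling such an analysis might use, assuming the telematics/CAN channels have been decoded into tabular form. The column names and thresholds are illustrative assumptions, not the Fleet DNA schema or the report's actual rules.

    ```python
    # Label each decoded per-second record as key-off, PTO, idle, or drive, then
    # summarize the relative time spent in each operating state.
    import pandas as pd

    records = pd.DataFrame({
        "engine_rpm": [700, 720, 1500, 710, 0],
        "speed_mph":  [0,   0,   35,   0,   0],
        "pto_active": [0,   1,   0,    0,   0],
    })

    def label_state(row):
        if row.engine_rpm == 0:
            return "key-off"
        if row.pto_active:
            return "PTO"
        if row.speed_mph == 0:
            return "idle"
        return "drive"

    records["state"] = records.apply(label_state, axis=1)
    print(records["state"].value_counts(normalize=True))  # fraction of time per state
    ```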

  1. CROPPER: a metagene creator resource for cross-platform and cross-species compendium studies.

    PubMed

    Paananen, Jussi; Storvik, Markus; Wong, Garry

    2006-09-22

    Current genomic research methods provide researchers with enormous amounts of data. Combining data from different high-throughput research technologies commonly available in biological databases can lead to novel findings and increase research efficiency. However, combining data from different heterogeneous sources is often a very arduous task. These sources can be different microarray technology platforms, genomic databases, or experiments performed on various species. Our aim was to develop a software program that could facilitate the combining of data from heterogeneous sources, and thus allow researchers to perform genomic cross-platform/cross-species studies and to use existing experimental data for compendium studies. We have developed a web-based software resource, called CROPPER that uses the latest genomic information concerning different data identifiers and orthologous genes from the Ensembl database. CROPPER can be used to combine genomic data from different heterogeneous sources, allowing researchers to perform cross-platform/cross-species compendium studies without the need for complex computational tools or the requirement of setting up one's own in-house database. We also present an example of a simple cross-platform/cross-species compendium study based on publicly available Parkinson's disease data derived from different sources. CROPPER is a user-friendly and freely available web-based software resource that can be successfully used for cross-species/cross-platform compendium studies.
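
    To make the identifier-mapping idea concrete, the toy sketch below maps platform-specific probe IDs to gene identifiers and then across species via an orthology table, which is conceptually what a metagene creator must do. The lookup tables and function are invented stand-ins, not CROPPER's implementation or the Ensembl API.

    ```python
    # Illustration of cross-platform/cross-species identifier mapping using
    # made-up lookup tables in place of Ensembl data.
    probe_to_gene = {            # platform-specific probe IDs -> gene IDs
        "AFFX_001": "ENSG000A",
        "AGIL_777": "ENSG000B",
    }
    human_to_mouse_ortholog = {  # cross-species orthology table
        "ENSG000A": "ENSMUSG000A",
        "ENSG000B": "ENSMUSG000B",
    }

    def to_metagene(probe_id):
        """Map a platform probe to a cross-species 'metagene' key, if possible."""
        gene = probe_to_gene.get(probe_id)
        return human_to_mouse_ortholog.get(gene) if gene else None

    print(to_metagene("AFFX_001"))  # -> ENSMUSG000A
    ```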

  2. Implementation of the Regulatory Authority Information System in Egypt

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Carson, S.D.; Schetnan, R.; Hasan, A.

    2006-07-01

    As part of the implementation of a bar-code-based system to track radioactive sealed sources (RSS) in Egypt, the Regulatory Authority Information System Personal Digital Assistant (RAIS PDA) Application was developed to extend the functionality of the International Atomic Energy Agency's (IAEA's) RAIS database by allowing users to download RSS data from the database to a portable PDA equipped with a bar-code scanner. [1, 4] The system allows users in the field to verify radioactive sealed source data, gather radioactive sealed source audit information, and upload that data to the RAIS database. This paper describes the development of the RAIS PDA Application, its features, and how it will be implemented in Egypt. (authors)

  3. The Chandra Source Catalog: User Interface

    NASA Astrophysics Data System (ADS)

    Bonaventura, Nina; Evans, I. N.; Harbo, P. N.; Rots, A. H.; Tibbetts, M. S.; Van Stone, D. W.; Zografou, P.; Anderson, C. S.; Chen, J. C.; Davis, J. E.; Doe, S. M.; Evans, J. D.; Fabbiano, G.; Galle, E.; Gibbs, D. G.; Glotfelty, K. J.; Grier, J. D.; Hain, R.; Hall, D. M.; He, X.; Houck, J. C.; Karovska, M.; Lauer, J.; McCollough, M. L.; McDowell, J. C.; Miller, J. B.; Mitschang, A. W.; Morgan, D. L.; Nichols, J. S.; Nowak, M. A.; Plummer, D. A.; Primini, F. A.; Refsdal, B. L.; Siemiginowska, A. L.; Sundheim, B. A.; Winkelman, S. L.

    2009-01-01

    The Chandra Source Catalog (CSC) is the definitive catalog of all X-ray sources detected by Chandra. The CSC is presented to the user in two tables: the Master Chandra Source Table and the Table of Individual Source Observations. Each distinct X-ray source identified in the CSC is represented by a single master source entry and one or more individual source entries. If a source is unaffected by confusion and pile-up in multiple observations, the individual source observations are merged to produce a master source. In each table, a row represents a source, and each column a quantity that is officially part of the catalog. The CSC contains positions and multi-band fluxes for the sources, as well as derived spatial, spectral, and temporal source properties. The CSC also includes associated source region and full-field data products for each source, including images, photon event lists, light curves, and spectra. The master source properties represent the best estimates of the properties of a source, and are presented in the following categories: Position and Position Errors, Source Flags, Source Extent and Errors, Source Fluxes, Source Significance, Spectral Properties, and Source Variability. The CSC Data Access GUI provides direct access to the source properties and data products contained in the catalog. The user may query the catalog database via a web-style search or an SQL command-line query. Each query returns a table of source properties, along with the option to browse and download associated data products. The GUI is designed to run in a web browser with Java version 1.5 or higher, and may be accessed via a link on the CSC website homepage (http://cxc.harvard.edu/csc/). As an alternative to the GUI, the contents of the CSC may be accessed directly through a URL, using the command-line tool, cURL. Support: NASA contract NAS8-03060 (CXC).

  4. MINDMAP: establishing an integrated database infrastructure for research in ageing, mental well-being, and the urban environment.

    PubMed

    Beenackers, Mariëlle A; Doiron, Dany; Fortier, Isabel; Noordzij, J Mark; Reinhard, Erica; Courtin, Emilie; Bobak, Martin; Chaix, Basile; Costa, Giuseppe; Dapp, Ulrike; Diez Roux, Ana V; Huisman, Martijn; Grundy, Emily M; Krokstad, Steinar; Martikainen, Pekka; Raina, Parminder; Avendano, Mauricio; van Lenthe, Frank J

    2018-01-19

    Urbanization and ageing have important implications for public mental health and well-being. Cities pose major challenges for older citizens, but also offer opportunities to develop, test, and implement policies, services, infrastructure, and interventions that promote mental well-being. The MINDMAP project aims to identify the opportunities and challenges posed by urban environmental characteristics for the promotion and management of mental well-being and cognitive function of older individuals. MINDMAP aims to achieve its research objectives by bringing together longitudinal studies from 11 countries covering over 35 cities linked to databases of area-level environmental exposures and social and urban policy indicators. The infrastructure supporting integration of these data will allow multiple MINDMAP investigators to safely and remotely co-analyse individual-level and area-level data. Individual-level data are derived from baseline and follow-up measurements of ten participating cohort studies and provide information on mental well-being outcomes, sociodemographic variables, health behaviour characteristics, social factors, measures of frailty, physical function indicators, and chronic conditions, as well as blood-derived clinical biochemistry-based biomarkers and genetic biomarkers. Area-level information on physical environment characteristics (e.g. green spaces, transportation), socioeconomic and sociodemographic characteristics (e.g. neighbourhood income, residential segregation, residential density), and social environment characteristics (e.g. social cohesion, criminality) and national and urban social policies is derived from publicly available sources such as geoportals and administrative databases. The linkage, harmonization, and analysis of data from different sources are being carried out using piloted tools to optimize the validity of the research results and the transparency of the methodology. MINDMAP is a novel research collaboration that combines population-based cohort data with publicly available datasets not typically used for ageing and mental well-being research. Integration of various data sources and observational units into a single platform will help to explain the differences in ageing-related mental and cognitive disorders both within and between cities in Europe, the US, Canada, and Russia, and to assess the causal pathways and interactions between the urban environment and the individual determinants of mental well-being and cognitive ageing in older adults.

  5. Hubble Source Catalog

    NASA Astrophysics Data System (ADS)

    Lubow, S.; Budavári, T.

    2013-10-01

    We have created an initial catalog of objects observed by the WFPC2 and ACS instruments on the Hubble Space Telescope (HST). The catalog is based on observations taken on more than 6000 visits (telescope pointings) of ACS/WFC and more than 25000 visits of WFPC2. The catalog is obtained by cross matching by position in the sky all Hubble Legacy Archive (HLA) Source Extractor source lists for these instruments. The source lists describe properties of source detections within a visit. The calculations are performed on a SQL Server database system. First we collect overlapping images into groups, e.g., Eta Car, and determine nearby (approximately matching) pairs of sources from different images within each group. We then apply a novel algorithm for improving the cross matching of pairs of sources by adjusting the astrometry of the images. Next, we combine pairwise matches into maximal sets of possible multi-source matches. We apply a greedy Bayesian method to split the maximal matches into more reliable matches. We test the accuracy of the matches by comparing the fluxes of the matched sources. The result is a set of information that ties together multiple observations of the same object. A byproduct of the catalog is greatly improved relative astrometry for many of the HST images. We also provide information on nondetections that can be used to determine dropouts. With the catalog, for the first time, one can carry out time domain, multi-wavelength studies across a large set of HST data. The catalog is publicly available. Much more can be done to expand the catalog capabilities.
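
    The core of the cross-matching step is pairing detections whose angular separation falls below a matching radius. The toy sketch below shows that calculation for two small source lists; the coordinates and radius are invented, and the actual pipeline runs inside SQL Server and also adjusts per-image astrometry before matching.

    ```python
    # Pair up detections from two source lists when their angular separation is
    # below a matching radius (in arcseconds).
    import numpy as np

    def ang_sep_arcsec(ra1, dec1, ra2, dec2):
        """Angular separation (arcsec) between two points given in degrees."""
        ra1, dec1, ra2, dec2 = map(np.radians, (ra1, dec1, ra2, dec2))
        cos_sep = (np.sin(dec1) * np.sin(dec2)
                   + np.cos(dec1) * np.cos(dec2) * np.cos(ra1 - ra2))
        return np.degrees(np.arccos(np.clip(cos_sep, -1.0, 1.0))) * 3600.0

    list_a = [(161.2650, -59.6844), (161.2700, -59.6900)]   # (RA, Dec) in degrees
    list_b = [(161.2651, -59.6845), (200.0000, -10.0000)]
    radius = 0.5  # arcsec

    matches = [(i, j) for i, (ra1, dec1) in enumerate(list_a)
                      for j, (ra2, dec2) in enumerate(list_b)
                      if ang_sep_arcsec(ra1, dec1, ra2, dec2) < radius]
    print(matches)  # [(0, 0)]
    ```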

  6. Business Management System Support Analysis

    NASA Technical Reports Server (NTRS)

    Parikh, Jay

    2008-01-01

    The purpose of this research project was to develop a searchable database compiled from internal and external audit findings/observations. The data correspond to findings and observations from the date of Center-wide implementation of the ISO 9001-2000 standard to the present (2003-2008). They were derived and extracted from several sources and were in multiple formats. Once extracted, the findings/observations could be categorized. The final data were mapped to the ISO 9001-2000 standard with the understanding that they will be displayed graphically. The data will be used to verify trends, associate risks, and establish timelines to identify strengths and weaknesses and determine areas of improvement in the Kennedy Space Center Business Management System Internal Audit Program.

  7. The Ensembl Web Site: Mechanics of a Genome Browser

    PubMed Central

    Stalker, James; Gibbins, Brian; Meidl, Patrick; Smith, James; Spooner, William; Hotz, Hans-Rudolf; Cox, Antony V.

    2004-01-01

    The Ensembl Web site (http://www.ensembl.org/) is the principal user interface to the data of the Ensembl project, and currently serves >500,000 pages (∼2.5 million hits) per week, providing access to >80 GB (gigabyte) of data to users in more than 80 countries. Built atop an open-source platform comprising Apache/mod_perl and the MySQL relational database management system, it is modular, extensible, and freely available. It is being actively reused and extended in several different projects, and has been downloaded and installed in companies and academic institutions worldwide. Here, we describe some of the technical features of the site, with particular reference to its dynamic configuration that enables it to handle disparate data from multiple species. PMID:15123591

  8. The Ensembl Web site: mechanics of a genome browser.

    PubMed

    Stalker, James; Gibbins, Brian; Meidl, Patrick; Smith, James; Spooner, William; Hotz, Hans-Rudolf; Cox, Antony V

    2004-05-01

    The Ensembl Web site (http://www.ensembl.org/) is the principal user interface to the data of the Ensembl project, and currently serves >500,000 pages (approximately 2.5 million hits) per week, providing access to >80 GB (gigabyte) of data to users in more than 80 countries. Built atop an open-source platform comprising Apache/mod_perl and the MySQL relational database management system, it is modular, extensible, and freely available. It is being actively reused and extended in several different projects, and has been downloaded and installed in companies and academic institutions worldwide. Here, we describe some of the technical features of the site, with particular reference to its dynamic configuration that enables it to handle disparate data from multiple species.

  9. OpenElectrophy: An Electrophysiological Data- and Analysis-Sharing Framework

    PubMed Central

    Garcia, Samuel; Fourcaud-Trocmé, Nicolas

    2008-01-01

    Progress in experimental tools and design is allowing the acquisition of increasingly large datasets. Storage, manipulation and efficient analyses of such large amounts of data is now a primary issue. We present OpenElectrophy, an electrophysiological data- and analysis-sharing framework developed to fill this niche. It stores all experiment data and meta-data in a single central MySQL database, and provides a graphic user interface to visualize and explore the data, and a library of functions for user analysis scripting in Python. It implements multiple spike-sorting methods, and oscillation detection based on the ridge extraction methods due to Roux et al. (2007). OpenElectrophy is open source and is freely available for download at http://neuralensemble.org/trac/OpenElectrophy. PMID:19521545

  10. Device Data Ingestion for Industrial Big Data Platforms with a Case Study †

    PubMed Central

    Ji, Cun; Shao, Qingshi; Sun, Jiao; Liu, Shijun; Pan, Li; Wu, Lei; Yang, Chenglei

    2016-01-01

    Despite having played a significant role in the Industry 4.0 era, the Internet of Things is currently faced with the challenge of how to ingest large-scale heterogeneous and multi-type device data. In response to this problem we present a heterogeneous device data ingestion model for an industrial big data platform. The model includes device templates and four strategies for data synchronization, data slicing, data splitting and data indexing, respectively. We can ingest device data from multiple sources with this heterogeneous device data ingestion model, which has been verified on our industrial big data platform. In addition, we present a case study on device data-based scenario analysis of industrial big data. PMID:26927121

  11. The Main Anatomic Variations of the Hepatic Artery and Their Importance in Surgical Practice: Review of the Literature.

    PubMed

    Noussios, George; Dimitriou, Ioannis; Chatzis, Iosif; Katsourakis, Anastasios

    2017-04-01

    Anatomical variations of the hepatic artery are important in the planning and performance of abdominal surgical procedures. Normal hepatic anatomy occurs in approximately 80% of cases; in the remaining 20%, multiple variations have been described. The purpose of this study was to review the existing literature on hepatic anatomy and to stress its importance in surgical practice. Two main databases were searched for eligible articles published during the period 2000-2015, and results concerning more than 19,000 patients were included in the study. The most common variation was the replaced right hepatic artery (type III according to the Michels classification), which is the chief source of blood supply to the bile duct.

  12. Semantic Web Ontology and Data Integration: a Case Study in Aiding Psychiatric Drug Repurposing.

    PubMed

    Liang, Chen; Sun, Jingchun; Tao, Cui

    2015-01-01

    There remain significant difficulties in selecting probable candidate drugs from existing databases. We describe an ontology-oriented approach to represent the nexus between genes, drugs, phenotypes, symptoms, and diseases from multiple information sources. We also report a case study in which we attempted to explore candidate drugs effective for both bipolar disorder and epilepsy. We constructed an ontology incorporating knowledge about the two diseases and performed semantic reasoning tasks with the ontology. The results suggested 48 candidate drugs that hold promise for further investigation. The evaluation demonstrated the validity of our approach. Our approach prioritizes candidate drugs that have potential associations among genes, phenotypes and symptoms, and thus facilitates data integration and drug repurposing in psychiatric disorders.

  13. [Retrospective census of cancers between 1994 and 2002 around the municipal solid waste incinerator of Gilly-sur-Isère].

    PubMed

    Thabuis, A; Schmitt, M; Megas, F; Fabres, B

    2007-12-01

    The retrospective cancer incidence study carried out around the municipal solid waste incinerator of Gilly-sur-Isère (Savoie, France) was ordered in a context of crisis during its closure in late 2001. Its purpose was to determine whether or not there was an excessive number of cancers around the incinerator. In the absence of a cancer registry in Savoie, this study consisted in counting, as exhaustively as possible, the cancers that occurred between 1994 and 2002 in the study area, which was exposed to the atmospheric fallout from the incinerator. It was thus planned to compare the observed cancer incidence with that of the French cancer registries. This work describes the main difficulties encountered and the solutions found during the census of cancer cases; the results of the incidence study are not included. The medical data were collected from multiple sources of information: pathology and haematology laboratories, hospitals' and clinics' departments of medical information, health insurance funds, private practitioners and specialised cancer registries. The collected medical data files were processed by searching for missing addresses, selecting patients from the study area, harmonising cancer coding, merging files into a single database, analysing the available information on each cancer and de-duplicating the database. Most cancers were validated by consulting medical records so as to exclude false cases such as metastases of a known primary cancer or recurrences. Two thousand eight hundred and forty-five cancers were initially collected, and 28% of them were excluded because they did not correspond to the case definition (no proof of cancer, diagnosis date before the study period...); the final database comprised 2055 cancer cases. Quality indicators showed that the database could be considered as exhaustive and valid as a registry's. Three types of sources identified 94% of cases: laboratories, hospitals' departments of medical information and health insurance funds. Using administrative data and consulting medical records proved necessary given the uncertainties about the patients' residence at the time of diagnosis, errors in cancer coding in some of the collected databases, and the difficulty of identifying false cases. This census required substantial resources.
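
    The merge-and-de-duplicate step can be pictured as pooling records from the different source files and collapsing rows that plainly refer to the same case, as in the sketch below. The field names, matching key, and records are invented; in the actual study, ambiguous cases were resolved by consulting medical records.

    ```python
    # Pool case records from several sources, then collapse obvious duplicates on
    # a simple matching key (name, birth year, cancer site).
    import pandas as pd

    lab = pd.DataFrame([{"name": "DUPONT", "birth_year": 1941, "site": "lung",  "source": "lab"}])
    hosp = pd.DataFrame([{"name": "DUPONT", "birth_year": 1941, "site": "lung",  "source": "hospital"},
                         {"name": "MARTIN", "birth_year": 1956, "site": "colon", "source": "hospital"}])

    merged = pd.concat([lab, hosp], ignore_index=True)
    deduped = merged.drop_duplicates(subset=["name", "birth_year", "site"], keep="first")
    print(len(merged), "records collected ->", len(deduped), "distinct cases")
    ```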

  14. Overview of Historical Earthquake Document Database in Japan and Future Development

    NASA Astrophysics Data System (ADS)

    Nishiyama, A.; Satake, K.

    2014-12-01

    In Japan, damage and disasters from historical large earthquakes have been documented and preserved. The compilation of historical earthquake documents started in the early 20th century, and 33 volumes of historical document source books (about 27,000 pages) have been published. However, these source books are not used effectively by researchers, because low-reliability historical records are mixed in and keyword searching by characters and dates is difficult. To overcome these problems and to promote historical earthquake studies in Japan, the construction of text databases started in the 21st century. For historical earthquakes from the beginning of the 7th century to the early 17th century, the "Online Database of Historical Documents in Japanese Earthquakes and Eruptions in the Ancient and Medieval Ages" (Ishibashi, 2009) has already been constructed. Its authors investigated the source books or original texts of the historical literature, emended the descriptions, and assigned a reliability to each historical document on the basis of its written age. Another effort compiled the historical documents for seven damaging earthquakes that occurred along the Sea of Japan coast in Honshu, central Japan, in the Edo period (from the beginning of the 17th century to the middle of the 19th century) and constructed a text database and a seismic intensity database. These are now published on the web (in Japanese only). However, only about 9% of the earthquake source books have been digitized so far. We therefore plan to digitize all of the remaining historical documents under a research program that started in 2014. The specification of the database will be similar to that of the previous ones. We also plan to combine this database with a liquefaction-traces database, to be constructed by another research program, by adding the location information described in the historical documents. The constructed database will be used to estimate the distributions of seismic intensities and tsunami heights.

  15. Building a database for long-term monitoring of benthic macrofauna in the Pertuis-Charentais (2004-2014).

    PubMed

    Philippe, Anne S; Plumejeaud-Perreau, Christine; Jourde, Jérôme; Pineau, Philippe; Lachaussée, Nicolas; Joyeux, Emmanuel; Corre, Frédéric; Delaporte, Philippe; Bocher, Pierrick

    2017-01-01

    Long-term benthic monitoring is rewarding in terms of science, but labour-intensive, whether in the field, the laboratory, or behind the computer. Building and managing databases require multiple skills, including consistency over time as well as organisation via a systematic approach. Here, we introduce and share our spatially explicit benthic database, comprising 11 years of benthic data. It is the result of intensive benthic sampling that has been conducted on a regular grid (259 stations) covering the intertidal mudflats of the Pertuis-Charentais (Marennes-Oléron Bay and Aiguillon Bay). Samples were taken on foot or by boat during winter depending on tidal height, from December 2003 to February 2014. The present dataset includes abundances and biomass densities of all mollusc species of the study regions and principal polychaetes as well as their length, accessibility to shorebirds, energy content and shell mass when appropriate and available. This database has supported many studies dealing with the spatial distribution of benthic invertebrates and temporal variations in food resources for shorebird species as well as latitudinal comparisons with other databases. In this paper, we introduce our benthos monitoring, share our data, and present a "guide of good practices" for building, cleaning and using it efficiently, providing examples of results with associated R code. The dataset has been formatted into a geo-referenced relational database, using the PostgreSQL open-source DBMS. We provide density information, measurements, energy content and accessibility for thirteen bivalve, nine gastropod and two polychaete taxa (a total of 66,620 individuals) for 11 consecutive winters. Figures and maps are provided to describe how the dataset was built, cleaned, and how it can be used. This dataset can continue to support studies concerning spatial and temporal variations in species abundance, interspecific interactions as well as evaluations of the availability of food resources for small- and medium-sized shorebirds and, potentially, conservation and impact assessment studies.

  16. Lessons Learned Implementing DOORS in a Citrix Environment

    NASA Technical Reports Server (NTRS)

    Bussman, Marie

    2005-01-01

    NASA's James Webb Space Telescope (JWST) Project is a large multi-national project with geographically dispersed contractors that all need access to the Project's requirements database. Initially, the project utilized multiple DOORS databases with the built-in partitions feature to exchange modules amongst the various contractor sites. As the requirements databases matured, the use of partitions became extremely difficult. There were many issues, such as incompatible versions of DOORS, an inefficient mechanism for sharing modules, security concerns, performance issues, and inconsistent document import and export formats. Deployment of the client software with limited IT resources available was also an issue. The solution chosen by JWST was to integrate the use of a Citrix environment with the DOORS database to address most of the project's concerns. The Citrix solution allowed a single requirements database to be maintained in a secure environment and accessed via a web interface. The Citrix environment allows JWST to upgrade to the most current version of DOORS without having to coordinate multiple sites and user upgrades. The single requirements database eliminates a multitude of configuration management concerns and facilitates the standardization of documentation formats. This paper discusses the obstacles and the lessons learned throughout the installation, implementation, usage, and deployment process of a centralized DOORS database solution.

  17. Content-based video indexing and searching with wavelet transformation

    NASA Astrophysics Data System (ADS)

    Stumpf, Florian; Al-Jawad, Naseer; Du, Hongbo; Jassim, Sabah

    2006-05-01

    Biometric databases form an essential tool in the fight against international terrorism, organised crime and fraud. Various government and law enforcement agencies have their own biometric databases consisting of combinations of fingerprints, iris codes, face images/videos and speech records for an increasing number of persons. In many cases, personal data linked to biometric records are incomplete and/or inaccurate. Moreover, biometric data in different databases for the same individual may be recorded with different personal details. Following the recent terrorist atrocities, law enforcement agencies collaborate more than before and rely more heavily on database sharing. In such an environment, reliable biometric-based identification must determine not only who you are but also who else you are. In this paper we propose a compact content-based video signature and indexing scheme that can facilitate the retrieval of multiple records in face biometric databases that belong to the same person, even if their associated personal data are inconsistent. We shall assess the performance of our system using a benchmark audio-visual face biometric database that has multiple videos for each subject but with different identity claims. We shall demonstrate that retrieving a relatively small number of videos that are nearest, in terms of the proposed index, to any video in the database recovers a significant proportion of that individual's biometric data.
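
    As an illustration of the general idea (not the authors' specific signature), the sketch below builds a compact wavelet-based descriptor for a single grayscale frame by keeping the energy of each 2-D DWT sub-band; comparable descriptors can then be indexed and compared with a nearest-neighbour search. PyWavelets is assumed to be available.

    ```python
    # Build a short wavelet-energy signature for one grayscale frame.
    import numpy as np
    import pywt

    def frame_signature(frame, wavelet="haar", level=3):
        coeffs = pywt.wavedec2(frame, wavelet, level=level)
        energies = [float(np.sum(coeffs[0] ** 2))]          # approximation band
        for detail_bands in coeffs[1:]:                     # (cH, cV, cD) per level
            energies.extend(float(np.sum(band ** 2)) for band in detail_bands)
        sig = np.array(energies)
        return sig / np.linalg.norm(sig)                    # normalize for comparison

    frame = np.random.rand(64, 64)                          # stand-in for a video frame
    print(frame_signature(frame).round(3))
    ```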

  18. The NCBI BioSystems database

    PubMed Central

    Geer, Lewis Y.; Marchler-Bauer, Aron; Geer, Renata C.; Han, Lianyi; He, Jane; He, Siqian; Liu, Chunlei; Shi, Wenyao; Bryant, Stephen H.

    2010-01-01

    The NCBI BioSystems database, found at http://www.ncbi.nlm.nih.gov/biosystems/, centralizes and cross-links existing biological systems databases, increasing their utility and target audience by integrating their pathways and systems into NCBI resources. This integration allows users of NCBI’s Entrez databases to quickly categorize proteins, genes and small molecules by metabolic pathway, disease state or other BioSystem type, without requiring time-consuming inference of biological relationships from the literature or multiple experimental datasets. PMID:19854944

  19. Effect of multiple-source entry on price competition after patent expiration in the pharmaceutical industry.

    PubMed Central

    Suh, D C; Manning, W G; Schondelmeyer, S; Hadsall, R S

    2000-01-01

    OBJECTIVE: To analyze the effect of multiple-source drug entry on price competition after patent expiration in the pharmaceutical industry. DATA SOURCES: Originators and their multiple-source drugs selected from the 35 chemical entities whose patents expired from 1984 through 1987. Data were obtained from various primary and secondary sources for the patents' expiration dates, sales volume and units sold, and characteristics of drugs in the sample markets. STUDY DESIGN: The study was designed to determine significant factors using the study model developed under the assumption that the off-patented market is an imperfectly segmented market. PRINCIPAL FINDINGS: After patent expiration, the originators' prices continued to increase, while the price of multiple-source drugs decreased significantly over time. By the fourth year after patent expiration, originators' sales had decreased 12 percent in dollars and 30 percent in quantity. Multiple-source drugs increased their sales twofold in dollars and threefold in quantity, and possessed about one-fourth (in dollars) and half (in quantity) of the total market three years after entry. CONCLUSION: After patent expiration, multiple-source drugs compete largely with other multiple-source drugs in the price-sensitive sector, but indirectly with the originator in the price-insensitive sector. Originators have first-mover advantages, and therefore have a market that is less price sensitive after multiple-source drugs enter. On the other hand, multiple-source drugs target the price-sensitive sector, using their lower-priced drugs. This trend may indicate that the off-patented market is imperfectly segmented between the price-sensitive and insensitive sector. Consumers as a whole can gain from the entry of multiple-source drugs because the average price of the market continually declines after patent expiration. PMID:10857475

  20. WImpiBLAST: web interface for mpiBLAST to help biologists perform large-scale annotation using high performance computing.

    PubMed

    Sharma, Parichit; Mantri, Shrikant S

    2014-01-01

    The function of a newly sequenced gene can be discovered by determining its sequence homology with known proteins. BLAST is the most extensively used sequence analysis program for sequence similarity search in large databases of sequences. With the advent of next generation sequencing technologies it has now become possible to study genes and their expression at a genome-wide scale through RNA-seq and metagenome sequencing experiments. Functional annotation of all the genes is done by sequence similarity search against multiple protein databases. This annotation task is computationally very intensive and can take days to obtain complete results. The program mpiBLAST, an open-source parallelization of BLAST that achieves superlinear speedup, can be used to accelerate large-scale annotation by using supercomputers and high performance computing (HPC) clusters. Although many parallel bioinformatics applications using the Message Passing Interface (MPI) are available in the public domain, researchers are reluctant to use them due to lack of expertise in the Linux command line and relevant programming experience. With these limitations, it becomes difficult for biologists to use mpiBLAST for accelerating annotation. No web interface is available in the open-source domain for mpiBLAST. We have developed WImpiBLAST, a user-friendly open-source web interface for parallel BLAST searches. It is implemented in Struts 1.3 using a Java backbone and runs atop the open-source Apache Tomcat Server. WImpiBLAST supports script creation and job submission features and also provides a robust job management interface for system administrators. It combines script creation and modification features with job monitoring and management through the Torque resource manager on a Linux-based HPC cluster. Use case information highlights the acceleration of annotation analysis achieved by using WImpiBLAST. Here, we describe the WImpiBLAST web interface features and architecture, explain design decisions, describe workflows and provide a detailed analysis.

  1. Characteristics of work-related fatal and hospitalised injuries not captured in workers' compensation data.

    PubMed

    Koehoorn, M; Tamburic, L; Xu, F; Alamgir, H; Demers, P A; McLeod, C B

    2015-06-01

    (1) To identify work-related fatal and non-fatal hospitalised injuries using multiple data sources, (2) to compare case-ascertainment from external data sources with accepted workers' compensation claims and (3) to investigate the characteristics of work-related fatal and hospitalised injuries not captured by workers' compensation. Work-related fatal injuries were ascertained from vital statistics, coroners and hospital discharge databases using payment and diagnosis codes and injury and work descriptions; and work-related (non-fatal) injuries were ascertained from the hospital discharge database using admission, diagnosis and payment codes. Injuries for British Columbia residents aged 15-64 years from 1991 to 2009 ascertained from the above external data sources were compared to accepted workers' compensation claims using per cent captured, validity analyses and logistic regression. The majority of work-related fatal injuries identified in the coroners data (83%) and the majority of work-related hospitalised injuries (95%) were captured as an accepted workers' compensation claim. A work-related coroner report was a positive predictor (88%), and the responsibility of payment field in the hospital discharge record a sensitive indicator (94%), for a workers' compensation claim. Injuries not captured by workers' compensation were associated with female gender, type of work (natural resources and other unspecified work) and injury diagnosis (eg, airway-related, dislocations and undetermined/unknown injury). Some work-related injuries captured by external data sources were not found in workers' compensation data in British Columbia. This may be the result of capturing injuries or workers that are ineligible for workers' compensation, or the result of injuries that go unreported to the compensation system. Hospital discharge records and coroner reports may provide opportunities to identify workers (or family members) with an unreported work-related injury and to provide them with information for submitting a workers' compensation claim. Published by the BMJ Publishing Group Limited. For permission to use (where not already granted under a licence) please go to http://group.bmj.com/group/rights-licensing/permissions.

  2. WImpiBLAST: Web Interface for mpiBLAST to Help Biologists Perform Large-Scale Annotation Using High Performance Computing

    PubMed Central

    Sharma, Parichit; Mantri, Shrikant S.

    2014-01-01

    The function of a newly sequenced gene can be discovered by determining its sequence homology with known proteins. BLAST is the most extensively used sequence analysis program for sequence similarity search in large databases of sequences. With the advent of next generation sequencing technologies it has now become possible to study genes and their expression at a genome-wide scale through RNA-seq and metagenome sequencing experiments. Functional annotation of all the genes is done by sequence similarity search against multiple protein databases. This annotation task is computationally very intensive and can take days to obtain complete results. The program mpiBLAST, an open-source parallelization of BLAST that achieves superlinear speedup, can be used to accelerate large-scale annotation by using supercomputers and high performance computing (HPC) clusters. Although many parallel bioinformatics applications using the Message Passing Interface (MPI) are available in the public domain, researchers are reluctant to use them due to lack of expertise in the Linux command line and relevant programming experience. With these limitations, it becomes difficult for biologists to use mpiBLAST for accelerating annotation. No web interface is available in the open-source domain for mpiBLAST. We have developed WImpiBLAST, a user-friendly open-source web interface for parallel BLAST searches. It is implemented in Struts 1.3 using a Java backbone and runs atop the open-source Apache Tomcat Server. WImpiBLAST supports script creation and job submission features and also provides a robust job management interface for system administrators. It combines script creation and modification features with job monitoring and management through the Torque resource manager on a Linux-based HPC cluster. Use case information highlights the acceleration of annotation analysis achieved by using WImpiBLAST. Here, we describe the WImpiBLAST web interface features and architecture, explain design decisions, describe workflows and provide a detailed analysis. PMID:24979410

  3. Characteristics of work-related fatal and hospitalised injuries not captured in workers’ compensation data

    PubMed Central

    Koehoorn, M; Tamburic, L; Xu, F; Alamgir, H; Demers, P A; McLeod, C B

    2015-01-01

    Objectives (1) To identify work-related fatal and non-fatal hospitalised injuries using multiple data sources, (2) to compare case-ascertainment from external data sources with accepted workers’ compensation claims and (3) to investigate the characteristics of work-related fatal and hospitalised injuries not captured by workers’ compensation. Methods Work-related fatal injuries were ascertained from vital statistics, coroners and hospital discharge databases using payment and diagnosis codes and injury and work descriptions; and work-related (non-fatal) injuries were ascertained from the hospital discharge database using admission, diagnosis and payment codes. Injuries for British Columbia residents aged 15–64 years from 1991 to 2009 ascertained from the above external data sources were compared to accepted workers’ compensation claims using per cent captured, validity analyses and logistic regression. Results The majority of work-related fatal injuries identified in the coroners data (83%) and the majority of work-related hospitalised injuries (95%) were captured as an accepted workers’ compensation claim. A work-related coroner report was a positive predictor (88%), and the responsibility of payment field in the hospital discharge record a sensitive indicator (94%), for a workers’ compensation claim. Injuries not captured by workers’ compensation were associated with female gender, type of work (natural resources and other unspecified work) and injury diagnosis (eg, airway-related, dislocations and undetermined/unknown injury). Conclusions Some work-related injuries captured by external data sources were not found in workers’ compensation data in British Columbia. This may be the result of capturing injuries or workers that are ineligible for workers’ compensation, or the result of injuries that go unreported to the compensation system. Hospital discharge records and coroner reports may provide opportunities to identify workers (or family members) with an unreported work-related injury and to provide them with information for submitting a workers’ compensation claim. PMID:25713157

  4. ACToR Chemical Structure processing using Open Source ...

    EPA Pesticide Factsheets

    ACToR (Aggregated Computational Toxicology Resource) is a centralized database repository developed by the National Center for Computational Toxicology (NCCT) at the U.S. Environmental Protection Agency (EPA). Free and open source tools were used to compile toxicity data from over 1,950 public sources. ACToR contains chemical structure information and toxicological data for over 558,000 unique chemicals. The database primarily includes data from NCCT research programs, in vivo toxicity data from ToxRef, human exposure data from ExpoCast, high-throughput screening data from ToxCast and high quality chemical structure information from the EPA DSSTox program. The DSSTox database is a chemical structure inventory for the NCCT programs and currently has about 16,000 unique structures. Included are also data from PubChem, ChemSpider, USDA, FDA, NIH and several other public data sources. ACToR has been a resource to various international and national research groups. Most of our recent efforts on ACToR are focused on improving the structural identifiers and Physico-Chemical properties of the chemicals in the database. Organizing this huge collection of data and improving the chemical structure quality of the database has posed some major challenges. Workflows have been developed to process structures, calculate chemical properties and identify relationships between CAS numbers. The Structure processing workflow integrates web services (PubChem and NIH NCI Cactus) to d

  5. The Cancer Epidemiology Descriptive Cohort Database: A Tool to Support Population-Based Interdisciplinary Research

    PubMed Central

    Kennedy, Amy E.; Khoury, Muin J.; Ioannidis, John P.A.; Brotzman, Michelle; Miller, Amy; Lane, Crystal; Lai, Gabriel Y.; Rogers, Scott D.; Harvey, Chinonye; Elena, Joanne W.; Seminara, Daniela

    2017-01-01

    Background We report on the establishment of a web-based Cancer Epidemiology Descriptive Cohort Database (CEDCD). The CEDCD’s goals are to enhance awareness of resources, facilitate interdisciplinary research collaborations, and support existing cohorts for the study of cancer-related outcomes. Methods Comprehensive descriptive data were collected from large cohorts established to study cancer as primary outcome using a newly developed questionnaire. These included an inventory of baseline and follow-up data, biospecimens, genomics, policies, and protocols. Additional descriptive data extracted from publicly available sources were also collected. This information was entered in a searchable and publicly accessible database. We summarized the descriptive data across cohorts and reported the characteristics of this resource. Results As of December 2015, the CEDCD includes data from 46 cohorts representing more than 6.5 million individuals (29% ethnic/racial minorities). Overall, 78% of the cohorts have collected blood at least once, 57% at multiple time points, and 46% collected tissue samples. Genotyping has been performed by 67% of the cohorts, while 46% have performed whole-genome or exome sequencing in subsets of enrolled individuals. Information on medical conditions other than cancer has been collected in more than 50% of the cohorts. More than 600,000 incident cancer cases and more than 40,000 prevalent cases are reported, with 24 cancer sites represented. Conclusions The CEDCD assembles detailed descriptive information on a large number of cancer cohorts in a searchable database. Impact Information from the CEDCD may assist the interdisciplinary research community by facilitating identification of well-established population resources and large-scale collaborative and integrative research. PMID:27439404

  6. AIM: a comprehensive Arabidopsis interactome module database and related interologs in plants.

    PubMed

    Wang, Yi; Thilmony, Roger; Zhao, Yunjun; Chen, Guoping; Gu, Yong Q

    2014-01-01

    Systems biology analysis of protein modules is important for understanding the functional relationships between proteins in the interactome. Here, we present a comprehensive database named AIM for Arabidopsis (Arabidopsis thaliana) interactome modules. The database contains almost 250,000 modules that were generated using multiple analysis methods and integration of microarray expression data. All the modules in AIM are well annotated using multiple gene function knowledge databases. AIM provides a user-friendly interface for different types of searches and offers a powerful graphical viewer for displaying module networks linked to the enrichment annotation terms. Both interactive Venn diagram and power graph viewer are integrated into the database for easy comparison of modules. In addition, predicted interologs from other plant species (homologous proteins from different species that share a conserved interaction module) are available for each Arabidopsis module. AIM is a powerful systems biology platform for obtaining valuable insights into the function of proteins in Arabidopsis and other plants using the modules of the Arabidopsis interactome. Database URL:http://probes.pw.usda.gov/AIM Published by Oxford University Press 2014. This work is written by US Government employees and is in the public domain in the US.

  7. Quantifying Data Quality for Clinical Trials Using Electronic Data Capture

    PubMed Central

    Nahm, Meredith L.; Pieper, Carl F.; Cunningham, Maureen M.

    2008-01-01

    Background Historically, only partial assessments of data quality have been performed in clinical trials, for which the most common method of measuring database error rates has been to compare the case report form (CRF) to database entries and count discrepancies. Importantly, errors arising from medical record abstraction and transcription are rarely evaluated as part of such quality assessments. Electronic Data Capture (EDC) technology has had a further impact, as paper CRFs typically leveraged for quality measurement are not used in EDC processes. Methods and Principal Findings The National Institute on Drug Abuse Treatment Clinical Trials Network has developed, implemented, and evaluated methodology for holistically assessing data quality on EDC trials. We characterize the average source-to-database error rate (14.3 errors per 10,000 fields) for the first year of use of the new evaluation method. This error rate was significantly lower than the average of published error rates for source-to-database audits, and was similar to CRF-to-database error rates reported in the published literature. We attribute this largely to an absence of medical record abstraction on the trials we examined, and to an outpatient setting characterized by less acute patient conditions. Conclusions Historically, medical record abstraction is the most significant source of error by an order of magnitude, and should be measured and managed during the course of clinical trials. Source-to-database error rates are highly dependent on the amount of structured data collection in the clinical setting and on the complexity of the medical record, dependencies that should be considered when developing data quality benchmarks. PMID:18725958
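
    The error-rate metric quoted above is simply discrepancies per 10,000 audited fields; the snippet below shows the arithmetic with invented counts chosen to reproduce the reported 14.3 figure.

    ```python
    # Errors per 10,000 fields, the data-quality metric used above.
    fields_audited = 250_000   # illustrative count
    errors_found = 358         # illustrative count

    rate_per_10k = errors_found / fields_audited * 10_000
    print(f"{rate_per_10k:.1f} errors per 10,000 fields")  # ~14.3
    ```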

  8. Czech multicenter research database of severe COPD

    PubMed Central

    Novotna, Barbora; Koblizek, Vladimir; Zatloukal, Jaromir; Plutinsky, Marek; Hejduk, Karel; Zbozinkova, Zuzana; Jarkovsky, Jiri; Sobotik, Ondrej; Dvorak, Tomas; Safranek, Petr

    2014-01-01

    Purpose Chronic obstructive pulmonary disease (COPD) has been recognized as a heterogeneous disorder affecting multiple organ systems. The Global Initiative for Chronic Obstructive Lung Disease (GOLD) places emphasis on symptom and exacerbation management. The aim of this study is to examine the course of COPD and its impact on the morbidity and all-cause mortality of patients, with respect to individual phenotypes and GOLD categories. This study will also evaluate real-life COPD patient care in the Czech Republic. Patients and methods The Czech Multicentre Research Database of COPD is projected to last for 5 years, with the aim of enrolling 1,000 patients. This is a multicenter, observational, prospective study of patients with severe COPD (post-bronchodilator forced expiratory volume in 1 second ≤60%). Every consecutive patient who fulfils the inclusion criteria is asked to participate in the study. Patient recruitment is done on the basis of signed informed consent. The study was approved by the Multicentre Ethical Committee in Brno, Czech Republic. Results The objective of this paper was to outline the methodology of this study. Conclusion The establishment of the database is a useful step in improving care for COPD subjects. Additionally, it will serve as a source of data elucidating the natural course of COPD, its comorbidities, and its overall impact on patients. Moreover, it will provide information on the diverse course of the COPD syndrome in the Czech Republic. PMID:25419124

  9. Retrospective Mining of Toxicology Data to Discover ...

    EPA Pesticide Factsheets

    In vivo toxicology data is subject to multiple sources of uncertainty: observer severity bias (a pathologist may record only more severe effects and ignore less severe ones); dose spacing issues (this can lead to missing data, e.g. if a severe effect has a less severe precursor, but both occur at the same tested dose); imperfect control of key independent variables (in databases, one can rarely control key input variables such as animal strain or dosing schedules); effect description heterogeneity (terminology changes over time which can lead to information loss); statistical issues (too few chemicals with a given phenotype, or too few animals in dose groups). These issues directly contribute to uncertainties in models built from the data. We are investigating the use of collections of endpoints (toxicity syndromes) to address these issues. These are identical in concept to medical syndromes which allow a physician to diagnose an underlying disease more accurately than can be done when relying on examination of one symptom at a time. Our test case is anemia, for several reasons: most of the phenotypes (e.g. cell counts) are quantitative; related effects are measured in an automated way; anemia is relatively common, at least at high doses (~30% of chemicals in our database show significant drops in red cell count); the causes of anemia are well understood; and, there is a standard clinical decision tree to classify anemia. Using a database of 658 chemicals, we ha

  10. An “EAR” on environmental surveillance and monitoring: A case study on the use of Exposure–Activity Ratios (EARs) to prioritize sites, chemicals, and bioactivities of concern in Great Lakes waters

    USGS Publications Warehouse

    Blackwell, Brett R.; Ankley, Gerald T.; Corsi, Steven; DeCicco, Laura; Houck, Keith A.; Judson, Richard S.; Li, Shibin; Martin, Matthew T.; Murphy, Elizabeth; Schroeder, Anthony L.; Smith, Edwin R.; Swintek, Joe; Villeneuve, Daniel L.

    2017-01-01

    Current environmental monitoring approaches focus primarily on chemical occurrence. However, based on concentration alone, it can be difficult to identify which compounds may be of toxicological concern and should be prioritized for further monitoring, in-depth testing, or management. This can be problematic because toxicological characterization is lacking for many emerging contaminants. New sources of high-throughput screening (HTS) data, such as the ToxCast database, which contains information for over 9000 compounds screened through up to 1100 bioassays, are now available. Integrated analysis of chemical occurrence data with HTS data offers new opportunities to prioritize chemicals, sites, or biological effects for further investigation based on concentrations detected in the environment linked to relative potencies in pathway-based bioassays. As a case study, chemical occurrence data from a 2012 study in the Great Lakes Basin along with the ToxCast effects database were used to calculate exposure–activity ratios (EARs) as a prioritization tool. Technical considerations of data processing and use of the ToxCast database are presented and discussed. EAR prioritization identified multiple sites, biological pathways, and chemicals that warrant further investigation. Prioritized bioactivities from the EAR analysis were linked to discrete adverse outcome pathways to identify potential adverse outcomes and biomarkers for use in subsequent monitoring efforts.
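
    The EAR prioritization above hinges on a simple ratio: a measured environmental concentration divided by the concentration at which a ToxCast bioassay shows activity. The sketch below illustrates that calculation; the column names, the toy values, and the use of a single activity-concentration column are assumptions for illustration, not the study's actual data pipeline.

```python
# Minimal sketch of an exposure-activity ratio (EAR) calculation.
# Assumes EAR = measured environmental concentration / bioassay activity
# concentration, both in the same (molar) units; column names and the
# ToxCast extract used here are hypothetical.
import pandas as pd

def compute_ears(occurrence: pd.DataFrame, toxcast: pd.DataFrame) -> pd.DataFrame:
    """occurrence: columns [site, chemical, conc_uM]
    toxcast:    columns [chemical, assay, acc_uM] (activity concentration)"""
    merged = occurrence.merge(toxcast, on="chemical", how="inner")
    merged["EAR"] = merged["conc_uM"] / merged["acc_uM"]
    # Rank site/chemical/assay combinations by EAR to prioritize follow-up.
    return merged.sort_values("EAR", ascending=False)

# Example usage with toy data:
occ = pd.DataFrame({"site": ["A"], "chemical": ["bisphenol A"], "conc_uM": [0.02]})
tox = pd.DataFrame({"chemical": ["bisphenol A"], "assay": ["ER_agonist"], "acc_uM": [0.5]})
print(compute_ears(occ, tox))
```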

  11. QSAR Modeling Using Large-Scale Databases: Case Study for HIV-1 Reverse Transcriptase Inhibitors.

    PubMed

    Tarasova, Olga A; Urusova, Aleksandra F; Filimonov, Dmitry A; Nicklaus, Marc C; Zakharov, Alexey V; Poroikov, Vladimir V

    2015-07-27

    Large-scale databases are important sources of training sets for various QSAR modeling approaches. Generally, these databases contain information extracted from different sources. This variety of sources can produce inconsistency in the data, defined as sometimes widely diverging activity results for the same compound against the same target. Because such inconsistency can reduce the accuracy of predictive models built from these data, we are addressing the question of how best to use data from publicly and commercially accessible databases to create accurate and predictive QSAR models. We investigate the suitability of commercially and publicly available databases to QSAR modeling of antiviral activity (HIV-1 reverse transcriptase (RT) inhibition). We present several methods for the creation of modeling (i.e., training and test) sets from two, either commercially or freely available, databases: Thomson Reuters Integrity and ChEMBL. We found that the typical predictivities of QSAR models obtained using these different modeling set compilation methods differ significantly from each other. The best results were obtained using training sets compiled for compounds tested using only one method and material (i.e., a specific type of biological assay). Compound sets aggregated by target only typically yielded poorly predictive models. We discuss the possibility of "mix-and-matching" assay data across aggregating databases such as ChEMBL and Integrity and their current severe limitations for this purpose. One of them is the general lack of complete and semantic/computer-parsable descriptions of assay methodology carried by these databases that would allow one to determine mix-and-matchability of result sets at the assay level.
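
    To make the "one method and material" training-set strategy concrete, the sketch below filters a hypothetical activity table to a single assay and drops compounds whose replicate measurements disagree by more than a tolerance. The column names and the 1-log-unit tolerance are illustrative assumptions, not values from the paper.

```python
# Sketch of one modeling-set compilation strategy: keep only records measured
# with a single assay protocol, then keep compounds whose replicate activities
# agree within a tolerance and take their mean as the consensus value.
import pandas as pd

def build_modeling_set(records: pd.DataFrame, assay_id: str, tol_log_units: float = 1.0) -> pd.Series:
    """records: columns [compound_id, assay_id, pIC50]."""
    subset = records[records["assay_id"] == assay_id]
    stats = subset.groupby("compound_id")["pIC50"].agg(["mean", "min", "max"])
    consistent = stats[(stats["max"] - stats["min"]) <= tol_log_units]
    return consistent["mean"].rename("pIC50")  # one consensus value per compound
```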

  12. New UV-source catalogs, UV spectral database, UV variables and science tools from the GALEX surveys

    NASA Astrophysics Data System (ADS)

    Bianchi, Luciana; de la Vega, Alexander; Shiao, Bernard; Bohlin, Ralph

    2018-03-01

    We present a new, expanded and improved catalog of Ultraviolet (UV) sources from the GALEX All-Sky Imaging survey: GUVcat_AIS (Bianchi et al. in Astrophys. J. Suppl. Ser. 230:24, 2017). The catalog includes 83 million unique sources (duplicate measurements and rim artifacts are removed) measured in far-UV and near-UV. With respect to previous versions (Bianchi et al. in Mon. Not. R. Astron. Soc. 411:2770, 2011a; Adv. Space Res. 53:900-991, 2014), GUVcat_AIS covers a slightly larger area, 24,790 square degrees, and includes critical corrections and improvements, as well as new tags, in particular to identify sources in the footprint of extended objects, where pipeline source detection may fail and custom photometry may be necessary. The UV unique-source catalog facilitates studies of the density of sources, and matching of the UV samples with databases at other wavelengths. We also present first results from two ongoing projects, addressing respectively UV variability searches on time scales from seconds to years by mining the GALEX photon archive, and the construction of a database of ˜120,000 GALEX UV spectra (range ˜1300-3000 Å), including quality and calibration assessment and classification of the grism (hence serendipitous) spectral sources.

  13. Status of the CDS Services, SIMBAD, VizieR and Aladin

    NASA Astrophysics Data System (ADS)

    Genova, Francoise; Allen, M. G.; Bienayme, O.; Boch, T.; Bonnarel, F.; Cambresy, L.; Derriere, S.; Dubois, P.; Fernique, P.; Landais, G.; Lesteven, S.; Loup, C.; Oberto, A.; Ochsenbein, F.; Schaaff, A.; Vollmer, B.; Wenger, M.; Louys, M.; Davoust, E.; Jasniewicz, G.

    2006-12-01

    Major evolutions have been implemented in the three main CDS databases in 2006. SIMBAD 4, a new version of SIMBAD developed with Java and PostgreSQL, has been released. It is much more flexible than the previous version and offers in particular full search capabilities on all parameters. Wild cards can also be used in object names, which should ease searching for a given object in the frequent case of 'fuzzy' nomenclature. New information is progressively added, in particular a set of multiwavelength magnitudes (in progress), and other information from the Dictionary of Nomenclature such as the list of object types attached to each object name (available), or hierarchy and associations (in progress). A new version of VizieR, also in the open source PostgreSQL DBMS, has been completed, in order to simplify mirroring. The master database at CDS currently remains in the present Sybase implementation. A new simplified interface will be demonstrated, providing a more user-friendly navigation while retaining the multiple browsing capabilities. A new release of the Aladin Sky Atlas offers new capabilities, like the management of multipart FITS files and of data cubes, construction and execution of macros for processing a list of targets, and improved navigation within an image plane. This new version also allows easy and efficient manipulation of very large (>10^8 pixels) images, support for solar images display, and direct access to SExtractor to perform source extraction on displayed images.

  14. Database Design for the Evaluation of On-shore and Off-Shore Storm Characteristics over East Central Florida

    NASA Technical Reports Server (NTRS)

    Simpson, Amy A.; Wilson, Jennifer G.; Brown, Robert G.

    2015-01-01

    Data from multiple sources are needed to investigate lightning characteristics over differing terrain (on-shore vs. off-shore) by comparing natural cloud-to-ground lightning behavior differences depending on the characteristics of attachment mediums. The KSC Lightning Research Database (KLRD) was created to reduce manual data entry time and aid research by combining information from various data sources into a single record for each unique lightning event of interest. The KLRD uses automatic data handling functions to import data from a lightning detection network and identify and record lightning events of interest. Additional automatic functions import data from the NASA Buoy 41009 (located approximately 20 miles off the coast) and the KSC Electric Field Mill network, then match these electric field mill values to the corresponding lightning events. The KLRD calculates distances between each lightning event and the various electric field mills, aids in identifying the location type for each stroke (i.e., on-shore vs. off-shore, etc.), provides statistics on the number of strokes per flash, and produces customizable reports for quick retrieval and logical display of data. Data from February 2014 to date cover 48 unique storm dates with 2295 flashes containing 5700 strokes, of which 2612 are off-shore and 1003 are on-shore. The number of strokes per flash ranges from 1 to 22. The ratio of single to subsequent stroke flashes is 1.29 for off-shore strokes and 2.19 for on-shore strokes.
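
    Two of the automatic steps described above, matching each event to the nearest electric field mill by distance and summarizing strokes per flash, can be sketched as follows. This is an illustration of the idea, not the KLRD implementation; the data structures are hypothetical.

```python
# Illustrative sketch: great-circle distance to each field mill, nearest-mill
# lookup, and a simple strokes-per-flash summary.
from math import radians, sin, cos, asin, sqrt

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometers."""
    dlat, dlon = radians(lat2 - lat1), radians(lon2 - lon1)
    a = sin(dlat / 2) ** 2 + cos(radians(lat1)) * cos(radians(lat2)) * sin(dlon / 2) ** 2
    return 2 * 6371.0 * asin(sqrt(a))

def nearest_mill(event, mills):
    """event: (lat, lon); mills: dict name -> (lat, lon). Returns nearest mill name."""
    return min(mills, key=lambda m: haversine_km(*event, *mills[m]))

def mean_strokes_per_flash(flashes):
    """flashes: dict flash_id -> stroke count."""
    counts = list(flashes.values())
    return sum(counts) / len(counts)

print(nearest_mill((28.6, -80.6), {"FM01": (28.61, -80.62), "FM02": (28.5, -80.7)}))
print(mean_strokes_per_flash({"F1": 3, "F2": 1, "F3": 2}))
```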

  15. Environmental health impacts of tobacco farming: a review of the literature.

    PubMed

    Lecours, Natacha; Almeida, Guilherme E G; Abdallah, Jumanne M; Novotny, Thomas E

    2012-03-01

    To review the literature on environmental health impacts of tobacco farming and to summarise the findings and research gaps in this field. A standard literature search was performed using multiple electronic databases for identification of peer-reviewed articles. The internet and organisational databases were also used to find other types of documents (eg, books and reports). The reference lists of identified relevant documents were reviewed to find additional sources. The selected studies documented many negative environmental impacts of tobacco production at the local level, often linking them with associated social and health problems. The common agricultural practices related to tobacco farming, especially in low-income and middle-income countries, lead to deforestation and soil degradation. Agrochemical pollution and deforestation in turn lead to ecological disruptions that cause a loss of ecosystem services, including land resources, biodiversity and food sources, which negatively impact human health. Multinational tobacco companies' policies and practices contribute to environmental problems related to tobacco leaf production. Development and implementation of interventions against the negative environmental impacts of tobacco production worldwide are necessary to protect the health of farmers, particularly in low-income and middle-income countries. Transitioning these farmers out of tobacco production is ultimately the resolution to this environmental health problem. In order to inform policy, however, further research is needed to better quantify the health impacts of tobacco farming and evaluate the potential alternative livelihoods that may be possible for tobacco farmers globally.

  16. Quantification of the Uncertainties for the Ares I A106 Ascent Aerodynamic Database

    NASA Technical Reports Server (NTRS)

    Houlden, Heather P.; Favaregh, Amber L.

    2010-01-01

    A detailed description of the quantification of uncertainties for the Ares I ascent aero 6-DOF wind tunnel database is presented. The database was constructed from wind tunnel test data and CFD results. The experimental data came from tests conducted in the Boeing Polysonic Wind Tunnel in St. Louis and the Unitary Plan Wind Tunnel at NASA Langley Research Center. The major sources of error for this database were: experimental error (repeatability), database modeling errors, and database interpolation errors.

  17. WPBMB Entrez: An interface to NCBI Entrez for Wordpress.

    PubMed

    Gohara, David W

    2018-03-01

    Research-oriented websites are an important means for the timely communication of information. These websites fall under a number of categories including: research laboratories, training grant and program projects, and online service portals. Invariably there is content on a site, such as publication listings, that requires frequent updating. A number of content management systems exist to aid in the task of developing and managing a website, each with their strengths and weaknesses. One popular choice is Wordpress, a free, open source and actively developed application for the creation of web content. During a recent site redesign for our department, the need arose to ensure publications were up to date for each of the research labs and the department as a whole. Several plugins for Wordpress offer this type of functionality, but in many cases the plugins are either no longer maintained, are missing features that would require the use of several, possibly incompatible, plugins, or lack features for layout on a webpage. WPBMB Entrez was developed to address these needs. WPBMB Entrez utilizes a subset of NCBI Entrez and RCSB databases to maintain up-to-date records of publications and publication-related information on Wordpress-based websites. The core functionality uses the same search query syntax as on the NCBI Entrez site, including advanced query syntaxes. The plugin is extensible, allowing for rapid development and addition of new data sources as the need arises. WPBMB Entrez was designed to be easy to use, yet flexible enough to address more complex usage scenarios. Features of the plugin include: an easy-to-use interface, design customization, multiple templates for displaying publication results, a caching mechanism to reduce page load times, support for multiple distinct queries and retrieval modes, and the ability to aggregate multiple queries into unified lists. Additionally, developer documentation is provided to aid in customization of the plugin. WPBMB Entrez is available at no cost, is open source and works with all recent versions of Wordpress. Copyright © 2017 Elsevier B.V. All rights reserved.
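
    The plugin itself is a Wordpress (PHP) extension, but the NCBI Entrez query syntax it relies on can be exercised from Python via Biopython's Entrez module, as in the sketch below; the example query string and email address are only illustrative.

```python
# Sketch of running an NCBI Entrez (PubMed) query of the kind the plugin uses,
# via Biopython. Not the plugin's own code.
from Bio import Entrez

Entrez.email = "you@example.org"  # NCBI asks for a contact address

def pubmed_ids(query: str, retmax: int = 20):
    """Return PubMed IDs matching an Entrez query string."""
    handle = Entrez.esearch(db="pubmed", term=query, retmax=retmax)
    record = Entrez.read(handle)
    handle.close()
    return record["IdList"]

# Illustrative query; any valid Entrez syntax, including advanced fields, works here.
print(pubmed_ids('Gohara DW[Author] AND 2018[PDAT]'))
```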

  18. Predicting residential indoor concentrations of nitrogen dioxide, fine particulate matter, and elemental carbon using questionnaire and geographic information system based data

    NASA Astrophysics Data System (ADS)

    Baxter, Lisa K.; Clougherty, Jane E.; Paciorek, Christopher J.; Wright, Rosalind J.; Levy, Jonathan I.

    Previous studies have identified associations between traffic-related air pollution and adverse health effects. Most have used measurements from a few central ambient monitors and/or some measure of traffic as indicators of exposure, disregarding spatial variability and factors influencing personal exposure-ambient concentration relationships. This study seeks to utilize publicly available data (i.e., central site monitors, geographic information system, and property assessment data) and questionnaire responses to predict residential indoor concentrations of traffic-related air pollutants for lower socioeconomic status (SES) urban households. As part of a prospective birth cohort study in urban Boston, we collected indoor and outdoor 3-4 day samples of nitrogen dioxide (NO2) and fine particulate matter (PM2.5) in 43 low SES residences across multiple seasons from 2003 to 2005. Elemental carbon (EC) concentrations were determined via reflectance analysis. Multiple traffic indicators were derived using Massachusetts Highway Department data and traffic counts collected outside sampling homes. Home characteristics and occupant behaviors were collected via a standardized questionnaire. Additional housing information was collected through property tax records, and ambient concentrations were collected from a centrally located ambient monitor. The contributions of ambient concentrations, local traffic and indoor sources to indoor concentrations were quantified with regression analyses. PM2.5 was influenced less by local traffic but had significant indoor sources, while EC was associated with traffic and NO2 with both traffic and indoor sources. Comparing models based on covariate selection using p-values or a Bayesian approach yielded similar results, with traffic density within a 50 m buffer of a home and distance from a truck route as important contributors to indoor levels of NO2 and EC, respectively. The Bayesian approach also highlighted the uncertainty in the models. We conclude that by utilizing public databases and focused questionnaire data we can identify important predictors of indoor concentrations for multiple air pollutants in a high-risk population.
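
    The regression step described above can be sketched, under assumed variable names, as an ordinary least-squares model of indoor concentration on the ambient level, a traffic indicator, and a questionnaire-derived indoor-source term. The formula and columns are illustrative, not the study's model specification.

```python
# Minimal sketch of an indoor-concentration regression with statsmodels.
import pandas as pd
import statsmodels.formula.api as smf

def fit_indoor_model(df: pd.DataFrame):
    """df columns (hypothetical): indoor_no2, ambient_no2, traffic_density_50m, gas_stove (0/1)."""
    model = smf.ols("indoor_no2 ~ ambient_no2 + traffic_density_50m + gas_stove", data=df)
    return model.fit()  # result.summary() lists coefficients and p-values
```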

  19. Improved Dust Forecast Products for Southwest Asia Forecasters through Dust Source Database Advancements

    NASA Astrophysics Data System (ADS)

    Brooks, G. R.

    2011-12-01

    Dust storm forecasting is a critical part of military theater operations in Afghanistan and Iraq as well as other strategic areas of the globe. The Air Force Weather Agency (AFWA) has been using the Dust Transport Application (DTA) as a forecasting tool since 2001. Initially developed by The Johns Hopkins University Applied Physics Laboratory (JHUAPL), output products include dust concentration and reduction of visibility due to dust. The performance of the products depends on several factors including the underlying dust source database, treatment of soil moisture, parameterization of dust processes, and validity of the input atmospheric model data. Over many years of analysis, seasonal dust forecast biases of the DTA have been observed and documented. As these products are unique and indispensable for U.S. and NATO forces, amendments were required to provide the best forecasts possible. One of the quickest ways to scientifically address the dust concentration biases noted over time was to analyze the weaknesses in, and adjust, the dust source database. Dust source database strengths and weaknesses, the satellite analysis and adjustment process, and tests which confirmed the resulting improvements in the final dust concentration and visibility products will be shown.

  20. Development of expert systems for analyzing electronic documents

    NASA Astrophysics Data System (ADS)

    Abeer Yassin, Al-Azzawi; Shidlovskiy, S.; Jamal, A. A.

    2018-05-01

    The paper analyses a Database Management System (DBMS). Expert systems, databases, and database technology have become an essential component of everyday life in modern society. As databases are widely used in every organization with a computer system, data resource control and data management are very important [1]. A DBMS is the most significant tool developed to serve multiple users in a database environment, consisting of programs that enable users to create and maintain a database. This paper focuses on the development of a database management system for the General Directorate for Education of Diyala in Iraq (GDED) using CLIPS, Java NetBeans and Alfresco, together with system components which were previously developed at Tomsk State University at the Faculty of Innovative Technology.

  1. DEVELOPMENT AND APPLICATION OF THE DORIAN (DOSE-RESPONSE INFORMATION ANALYSIS) SYSTEM

    EPA Science Inventory

    • Migration of ArrayTrack from the proprietary Oracle database to an open-source Postgres database.
    • Making the public version of the ebKB available with provisions for soliciting input from collaborators and outside users.
    • Continued development ...

    • A structured vocabulary for indexing dietary supplements in databases in the United States

      PubMed Central

      Saldanha, Leila G; Dwyer, Johanna T; Holden, Joanne M; Ireland, Jayne D.; Andrews, Karen W; Bailey, Regan L; Gahche, Jaime J.; Hardy, Constance J; Møller, Anders; Pilch, Susan M.; Roseland, Janet M

      2011-01-01

      Food composition databases are critical to assess and plan dietary intakes. Dietary supplement databases are also needed because dietary supplements make significant contributions to total nutrient intakes. However, no uniform system exists for classifying dietary supplement products and indexing their ingredients in such databases. Differing approaches to classifying these products make it difficult to retrieve or link information effectively. A consistent approach to classifying information within food composition databases led to the development of LanguaL™, a structured vocabulary. LanguaL™ is being adapted as an interface tool for classifying and retrieving product information in dietary supplement databases. This paper outlines proposed changes to the LanguaL™ thesaurus for indexing dietary supplement products and ingredients in databases. The choice of 12 of the original 14 LanguaL™ facets pertinent to dietary supplements, modifications to their scopes, and applications are described. The 12 chosen facets are: Product Type; Source; Part of Source; Physical State, Shape or Form; Ingredients; Preservation Method, Packing Medium, Container or Wrapping; Contact Surface; Consumer Group/Dietary Use/Label Claim; Geographic Places and Regions; and Adjunct Characteristics of food. PMID:22611303

    • Domain Regeneration for Cross-Database Micro-Expression Recognition

      NASA Astrophysics Data System (ADS)

      Zong, Yuan; Zheng, Wenming; Huang, Xiaohua; Shi, Jingang; Cui, Zhen; Zhao, Guoying

      2018-05-01

      In this paper, we investigate the cross-database micro-expression recognition problem, where the training and testing samples are from two different micro-expression databases. In this setting, the training and testing samples would have different feature distributions and hence the performance of most existing micro-expression recognition methods may decrease greatly. To solve this problem, we propose a simple yet effective method called Target Sample Re-Generator (TSRG) in this paper. By using TSRG, we are able to re-generate the samples from the target micro-expression database, and the re-generated target samples would share the same or similar feature distributions with the original source samples. For this reason, we can then use the classifier learned from the labeled source samples to accurately predict the micro-expression categories of the unlabeled target samples. To evaluate the performance of the proposed TSRG method, extensive cross-database micro-expression recognition experiments designed based on the SMIC and CASME II databases are conducted. Compared with recent state-of-the-art cross-database emotion recognition methods, the proposed TSRG achieves more promising results.

    • CellBase, a comprehensive collection of RESTful web services for retrieving relevant biological information from heterogeneous sources.

      PubMed

      Bleda, Marta; Tarraga, Joaquin; de Maria, Alejandro; Salavert, Francisco; Garcia-Alonso, Luz; Celma, Matilde; Martin, Ainoha; Dopazo, Joaquin; Medina, Ignacio

      2012-07-01

      During the past years, the advances in high-throughput technologies have produced an unprecedented growth in the number and size of repositories and databases storing relevant biological data. Today, there is more biological information than ever but, unfortunately, the current status of many of these repositories is far from optimal. Some of the most common problems are that the information is spread out over many small databases; frequently there are different standards among repositories; and some databases are no longer supported or contain overly specific and unconnected information. In addition, data size is increasingly becoming an obstacle when accessing or storing biological data. All these issues make it very difficult to extract and integrate information from different sources, to analyze experiments or to access and query this information in a programmatic way. CellBase provides a solution to the growing necessity of integration by easing the access to biological data. CellBase implements a set of RESTful web services that query a centralized database containing the most relevant biological data sources. The database is hosted in our servers and is regularly updated. CellBase documentation can be found at http://docs.bioinfo.cipf.es/projects/cellbase.
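
      Calling a RESTful service like CellBase programmatically follows a standard pattern; the sketch below uses a placeholder base URL and route, since the actual CellBase endpoints and parameters should be taken from its documentation.

```python
# Generic sketch of querying a REST service with the requests library.
# BASE_URL and the example route are placeholders, not documented CellBase paths.
import requests

BASE_URL = "https://example.org/cellbase/webservices/rest"  # placeholder

def get_json(path: str, **params):
    """GET a JSON resource relative to BASE_URL."""
    response = requests.get(f"{BASE_URL}/{path}", params=params, timeout=30)
    response.raise_for_status()
    return response.json()

# e.g. get_json("v1/hsapiens/feature/gene/BRCA2/info")  # hypothetical route
```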

    • Assessment of COPD-related outcomes via a national electronic medical record database.

      PubMed

      Asche, Carl; Said, Quayyim; Joish, Vijay; Hall, Charles Oaxaca; Brixner, Diana

      2008-01-01

      The technology and sophistication of healthcare utilization databases have expanded over the last decade to include results of lab tests, vital signs, and other clinical information. This review provides an assessment of the methodological and analytical challenges of conducting chronic obstructive pulmonary disease (COPD) outcomes research in a national electronic medical records (EMR) dataset and its potential application towards the assessment of national health policy issues, as well as a description of the challenges or limitations. An EMR database and its application to measuring outcomes for COPD are described. The ability to measure adherence to the COPD evidence-based practice guidelines, generated by the NIH and HEDIS quality indicators, in this database was examined. Case studies, before and after their publication, were used to assess the adherence to guidelines and gauge the conformity to quality indicators. EMR was the only source of information for pulmonary function tests, but low frequency in ordering by primary care was an issue. The EMR data can be used to explore impact of variation in healthcare provision on clinical outcomes. The EMR database permits access to specific lab data and biometric information. The richness and depth of information on "real world" use of health services for large population-based analytical studies at relatively low cost render such databases an attractive resource for outcomes research. Various sources of information exist to perform outcomes research. It is important to understand the desired endpoints of such research and choose the appropriate database source.

    • The World Spatiotemporal Analytics and Mapping Project (WSTAMP): Discovering, Exploring, and Mapping Spatiotemporal Patterns Across Heterogenous Space-Time Data

      NASA Astrophysics Data System (ADS)

      Morton, A.; Stewart, R.; Held, E.; Piburn, J.; Allen, M. R.; McManamay, R.; Sanyal, J.; Sorokine, A.; Bhaduri, B. L.

      2017-12-01

      Spatiotemporal (ST) analytics applied to major spatio-temporal data sources from major vendors such as USGS, NOAA, World Bank and World Health Organization have tremendous value in shedding light on the evolution of physical, cultural, and geopolitical landscapes on a local and global level. Especially powerful is the integration of these physical and cultural datasets across multiple and disparate formats, facilitating new interdisciplinary analytics and insights. Realizing this potential first requires an ST data model that addresses challenges in properly merging data from multiple authors, with evolving ontological perspectives, semantical differences, changing attributes, and content that is textual, numeric, categorical, and hierarchical. Equally challenging is the development of analytical and visualization approaches that provide a serious exploration of this integrated data while remaining accessible to practitioners with varied backgrounds. The WSTAMP project at the Oak Ridge National Laboratory has yielded two major results in addressing these challenges: 1) development of the WSTAMP database, a significant advance in ST data modeling that integrates 16000+ attributes covering 200+ countries for over 50 years from over 30 major sources and 2) a novel online ST exploratory and analysis tool providing an array of modern statistical and visualization techniques for analyzing these data temporally, spatially, and spatiotemporally under a standard analytic workflow. We report on these advances, provide an illustrative case study, and inform how others may freely access the tool.

    • A new open-source Python-based Space Weather data access, visualization, and analysis toolkit

      NASA Astrophysics Data System (ADS)

      de Larquier, S.; Ribeiro, A.; Frissell, N. A.; Spaleta, J.; Kunduri, B.; Thomas, E. G.; Ruohoniemi, J.; Baker, J. B.

      2013-12-01

      Space weather research relies heavily on combining and comparing data from multiple observational platforms. Current frameworks exist to aggregate some of the data sources, most based on file downloads via web or FTP interfaces. Empirical models are mostly Fortran-based and lack interfaces with more useful scripting languages. In an effort to improve data and model access, the SuperDARN community has been developing a Python-based Space Science Data Visualization Toolkit (DaViTpy). At the center of this development was a redesign of how our data (from 30 years of SuperDARN radars) was made available. Several access solutions are now wrapped into one convenient Python interface which probes local directories, a new remote NoSQL database, and an FTP server to retrieve the requested data based on availability. Motivated by the efficiency of this interface and the inherent need for data from multiple instruments, we implemented similar modules for other space science datasets (POES, OMNI, Kp, AE...), and also included fundamental empirical models with Python interfaces to enhance data analysis (IRI, HWM, MSIS...). All these modules and more are gathered in a single convenient toolkit, which is collaboratively developed and distributed using Github and continues to grow. While still in its early stages, we expect this toolkit will facilitate multi-instrument space weather research and improve scientific productivity.
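
      The availability-based retrieval described above (local directories first, then a remote database, then FTP) follows a simple fallback pattern, sketched below with hypothetical stub functions rather than DaViTpy's real API.

```python
# Sketch of a source-fallback data fetch. All functions are stand-ins.
def read_local_cache(date, radar):
    return None  # stub: look in local directories

def query_remote_db(date, radar):
    return None  # stub: query the remote NoSQL database

def download_via_ftp(date, radar):
    return None  # stub: fall back to the FTP server

def fetch_radar_data(date, radar, sources=(read_local_cache, query_remote_db, download_via_ftp)):
    """Return data from the first available source, trying each in order."""
    for source in sources:
        try:
            data = source(date, radar)
        except OSError:
            continue  # source unreachable; fall through to the next one
        if data is not None:
            return data
    raise LookupError(f"No data found for {radar} on {date}")
```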

    • Dietary sources and determinants of soy isoflavone intake among midlife Chinese Women in Hong Kong.

      PubMed

      Chan, Sieu-gaen; Ho, Suzanne C; Kreiger, Nancy; Darlington, Gerarda; So, Kam F; Chong, Portia Y Y

      2007-11-01

      The dietary sources, intake levels, and determinants of soy isoflavone intake were examined using 3217 dietary recalls (DR) collected from 141 Hong Kong Chinese women aged 50-61 y. Multiple-pass 24-h DR were administered by phone by trained interviewers on 23 random, nonconsecutive days to participants over a 12-mo period from 2001 to 2002. We calculated isoflavone intake using analytical values in the Chinese University of Hong Kong Soy Isoflavone Database. Results indicated that the daily intake of total isoflavones was 7.8 +/- 5.6 mg in the study population. Non-Cantonese women had a higher intake of 10.7 +/- 7.6 mg compared with 7.3 +/- 5.0 mg in Cantonese women (P = 0.04). Altogether, 22 foods contributed approximately 90% of the total isoflavone intake. Soft tofu alone accounted for approximately 21% of the isoflavone intake, followed by bean curd skin (7.1%), name-brand soybean milk (6.3%), homemade soybean milk (6.2%), and generic soybean milk (5.8%). Combined, these 5 food items contributed 46% of the total dietary isoflavones. Multiple linear regression analysis indicated dialect group, self-reported health, and age group were significant independent predictors of soy isoflavone consumption. The data provide the basis for elucidating the patterns, determinants, and assessment of dietary soy isoflavone intake in Asian women.

    • Endeavour update: a web resource for gene prioritization in multiple species

      PubMed Central

      Tranchevent, Léon-Charles; Barriot, Roland; Yu, Shi; Van Vooren, Steven; Van Loo, Peter; Coessens, Bert; De Moor, Bart; Aerts, Stein; Moreau, Yves

      2008-01-01

      Endeavour (http://www.esat.kuleuven.be/endeavourweb; this web site is free and open to all users and there is no login requirement) is a web resource for the prioritization of candidate genes. Using a training set of genes known to be involved in a biological process of interest, our approach consists of (i) inferring several models (based on various genomic data sources), (ii) applying each model to the candidate genes to rank those candidates against the profile of the known genes and (iii) merging the several rankings into a global ranking of the candidate genes. In the present article, we describe the latest developments of Endeavour. First, we provide a web-based user interface, besides our Java client, to make Endeavour more universally accessible. Second, we support multiple species: in addition to Homo sapiens, we now provide gene prioritization for three major model organisms: Mus musculus, Rattus norvegicus and Caenorhabditis elegans. Third, Endeavour makes use of additional data sources and is now including numerous databases: ontologies and annotations, protein–protein interactions, cis-regulatory information, gene expression data sets, sequence information and text-mining data. We tested the novel version of Endeavour on 32 recent disease gene associations from the literature. Additionally, we describe a number of recent independent studies that made use of Endeavour to prioritize candidate genes for obesity and Type II diabetes, cleft lip and cleft palate, and pulmonary fibrosis. PMID:18508807
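
      Step (iii), merging several per-data-source rankings into one global ranking, is the heart of the approach. Endeavour itself uses an order-statistics formula for this step; the sketch below substitutes a simpler mean-rank aggregation just to illustrate the idea, with made-up gene names.

```python
# Toy rank-aggregation sketch (mean rank), not Endeavour's order-statistics method.
def merge_rankings(rankings):
    """rankings: list of rankings, each an ordered list of candidate gene names."""
    worst = max(len(r) for r in rankings) + 1
    positions = {}
    for ranking in rankings:
        for pos, gene in enumerate(ranking, start=1):
            positions.setdefault(gene, []).append(pos)
    # Genes absent from a source are treated as ranked just below its last entry.
    mean_rank = {
        gene: (sum(p) + (len(rankings) - len(p)) * worst) / len(rankings)
        for gene, p in positions.items()
    }
    return sorted(mean_rank, key=mean_rank.get)

print(merge_rankings([["TP53", "BRCA1", "EGFR"], ["BRCA1", "TP53"]]))
```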

    • A scalable architecture for extracting, aligning, linking, and visualizing multi-Int data

      NASA Astrophysics Data System (ADS)

      Knoblock, Craig A.; Szekely, Pedro

      2015-05-01

      An analyst today has a tremendous amount of data available, but each of the various data sources typically exists in their own silos, so an analyst has limited ability to see an integrated view of the data and has little or no access to contextual information that could help in understanding the data. We have developed the Domain-Insight Graph (DIG) system, an innovative architecture for extracting, aligning, linking, and visualizing massive amounts of domain-specific content from unstructured sources. Under the DARPA Memex program we have already successfully applied this architecture to multiple application domains, including the enormous international problem of human trafficking, where we extracted, aligned and linked data from 50 million online Web pages. DIG builds on our Karma data integration toolkit, which makes it easy to rapidly integrate structured data from a variety of sources, including databases, spreadsheets, XML, JSON, and Web services. The ability to integrate Web services allows Karma to pull in live data from the various social media sites, such as Twitter, Instagram, and OpenStreetMaps. DIG then indexes the integrated data and provides an easy to use interface for query, visualization, and analysis.

    • Design and implementation of a twin-family database for behavior genetics and genomics studies.

      PubMed

      Boomsma, Dorret I; Willemsen, Gonneke; Vink, Jacqueline M; Bartels, Meike; Groot, Paul; Hottenga, Jouke Jan; van Beijsterveldt, C E M Toos; Stroet, Therese; van Dijk, Rob; Wertheim, Rien; Visser, Marco; van der Kleij, Frank

      2008-06-01

      In this article we describe the design and implementation of a database for extended twin families. The database does not focus on probands or on index twins, as this approach becomes problematic when larger multigenerational families are included, when more than one set of multiples is present within a family, or when families turn out to be part of a larger pedigree. Instead, we present an alternative approach that uses a highly flexible notion of persons and relations. The relations among the subjects in the database have a one-to-many structure, are user-definable and extendible and support arbitrarily complicated pedigrees. Some additional characteristics of the database are highlighted, such as the storage of historical data, predefined expressions for advanced queries, output facilities for individuals and relations among individuals and an easy-to-use multi-step wizard for contacting participants. This solution presents a flexible approach to accommodate pedigrees of arbitrary size, multiple biological and nonbiological relationships among participants and dynamic changes in these relations that occur over time, which can be implemented for any type of multigenerational family study.
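
      The person-and-relation model described above can be sketched with plain data classes: persons are stored once, and typed, dated, one-to-many relations link them, so arbitrary pedigrees and changes over time fit naturally. The field names below are illustrative, not the production schema.

```python
# Sketch of a person/relation data model for extended pedigrees.
from dataclasses import dataclass
from datetime import date
from typing import Optional

@dataclass
class Person:
    person_id: int
    name: str

@dataclass
class Relation:
    from_person: int          # e.g. the parent, or the first-registered twin
    to_person: int
    relation_type: str        # user-definable: "mother_of", "monozygotic_twin_of", ...
    valid_from: date
    valid_to: Optional[date] = None   # historical relations are kept, not overwritten

pedigree = [
    Relation(1, 3, "mother_of", date(1990, 5, 1)),
    Relation(2, 3, "father_of", date(1990, 5, 1)),
    Relation(3, 4, "monozygotic_twin_of", date(1990, 5, 1)),
]
print(pedigree)
```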

    • Database integration of protocol-specific neurological imaging datasets

      PubMed Central

      Pacurar, Emil E.; Sethi, Sean K.; Habib, Charbel; Laze, Marius O.; Martis-Laze, Rachel; Haacke, E. Mark

      2016-01-01

      For many years now, Magnetic Resonance Innovations (MR Innovations), a magnetic resonance imaging (MRI) software development, technology, and research company, has been aggregating a multitude of MRI data from different scanning sites through its collaborations and research contracts. The majority of the data has adhered to neuroimaging protocols developed by our group which has helped ensure its quality and consistency. The protocols involved include the study of: traumatic brain injury, extracranial venous imaging for multiple sclerosis and Parkinson's disease, and stroke. The database has proven invaluable in helping to establish disease biomarkers, validate findings across multiple data sets, develop and refine signal processing algorithms, and establish both public and private research collaborations. Myriad Masters and PhD dissertations have been possible thanks to the availability of this database. As an example of a project that cuts across diseases, we have used the data and specialized software to develop new guidelines for detecting cerebral microbleeds. Ultimately, the database has been vital in our ability to provide tools and information for researchers and radiologists in diagnosing their patients, and we encourage collaborations and welcome sharing of similar data in this database. PMID:25959660

    • Effectiveness of virtual reality training for balance and gait rehabilitation in people with multiple sclerosis: a systematic review and meta-analysis.

      PubMed

      Casuso-Holgado, María Jesús; Martín-Valero, Rocío; Carazo, Ana F; Medrano-Sánchez, Esther M; Cortés-Vega, M Dolores; Montero-Bancalero, Francisco José

      2018-04-01

      To evaluate the evidence for the use of virtual reality to treat balance and gait impairments in multiple sclerosis rehabilitation. Systematic review and meta-analysis of randomized controlled trials and quasi-randomized clinical trials. An electronic search was conducted using the following databases: MEDLINE (PubMed), Physiotherapy Evidence Database (PEDro), Cochrane Database of Systematic Reviews (CDSR) and CINAHL. A quality assessment was performed using the PEDro scale. The data were pooled and a meta-analysis was completed. This systematic review was conducted in accordance with the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) guideline statement. It was registered in the PROSPERO database (CRD42016049360). A total of 11 studies were included. The data were pooled, allowing meta-analysis of seven outcomes of interest. A total of 466 participants clinically diagnosed with multiple sclerosis were analysed. Results showed that virtual reality balance training is more effective than no intervention for postural control improvement (standardized mean difference (SMD) = -0.64; 95% confidence interval (CI) = -1.05, -0.24; P = 0.002). However, a significant overall effect was not shown when compared with conventional training (SMD = -0.04; 95% CI = -0.70, 0.62; P = 0.90). Inconclusive results were also observed for gait rehabilitation. Virtual reality training could be considered at least as effective as conventional training and more effective than no intervention to treat balance and gait impairments in multiple sclerosis rehabilitation.
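
      Pooled effects like the SMDs quoted above are typically obtained by inverse-variance weighting; the sketch below shows a fixed-effect version of that calculation with made-up trial values, not the review's data.

```python
# Fixed-effect inverse-variance pooling of standardized mean differences (SMDs).
from math import sqrt

def pool_smd(effects, variances):
    """effects: per-trial SMDs; variances: their variances. Returns (pooled, 95% CI)."""
    weights = [1.0 / v for v in variances]
    pooled = sum(w * d for w, d in zip(weights, effects)) / sum(weights)
    se = sqrt(1.0 / sum(weights))
    return pooled, (pooled - 1.96 * se, pooled + 1.96 * se)

# Toy example, not data from the review:
smd, ci = pool_smd([-0.8, -0.5, -0.6], [0.10, 0.08, 0.12])
print(f"SMD = {smd:.2f}, 95% CI = ({ci[0]:.2f}, {ci[1]:.2f})")
```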

    • Creating a FIESTA (Framework for Integrated Earth Science and Technology Applications) with MagIC

      NASA Astrophysics Data System (ADS)

      Minnett, R.; Koppers, A. A. P.; Jarboe, N.; Tauxe, L.; Constable, C.

      2017-12-01

      The Magnetics Information Consortium (https://earthref.org/MagIC) has recently developed a containerized web application to considerably reduce the friction in contributing, exploring and combining valuable and complex datasets for the paleo-, geo- and rock magnetic scientific community. The data produced in this scientific domain are inherently hierarchical, and the community's evolving approaches to this scientific workflow, from sampling to taking measurements to multiple levels of interpretations, require a large and flexible data model to adequately annotate the results and ensure reproducibility. Historically, contributing such detail in a consistent format has been prohibitively time consuming and often resulted in only publishing the highly derived interpretations. The new open-source (https://github.com/earthref/MagIC) application provides a flexible upload tool integrated with the data model to easily create a validated contribution and a powerful search interface for discovering datasets and combining them to enable transformative science. MagIC is hosted at EarthRef.org along with several interdisciplinary geoscience databases. A FIESTA (Framework for Integrated Earth Science and Technology Applications) is being created by generalizing MagIC's web application for reuse in other domains. The application relies on a single configuration document that describes the routing, data model, component settings and external services integrations. The container hosts an isomorphic Meteor JavaScript application, MongoDB database and ElasticSearch search engine. Multiple containers can be configured as microservices to serve portions of the application or rely on externally hosted MongoDB, ElasticSearch, or third-party services to efficiently scale computational demands. FIESTA is particularly well suited for many Earth Science disciplines with its flexible data model, mapping, account management, upload tool to private workspaces, reference metadata, image galleries, full text searches and detailed filters. EarthRef's Seamount Catalog of bathymetry and morphology data, EarthRef's Geochemical Earth Reference Model (GERM) databases, and Oregon State University's Marine and Geology Repository (http://osu-mgr.org) will benefit from custom adaptations of FIESTA.

    • PIRIA: a general tool for indexing, search, and retrieval of multimedia content

      NASA Astrophysics Data System (ADS)

      Joint, Magali; Moellic, Pierre-Alain; Hede, P.; Adam, P.

      2004-05-01

      The Internet is a continuously expanding source of multimedia content and information. There are many products in development to search, retrieve, and understand multimedia content. But most current image search/retrieval engines rely on an image database manually pre-indexed with keywords. Computers are still powerless to understand the semantic meaning of still or animated image content. Piria (Program for the Indexing and Research of Images by Affinity), the search engine we have developed, brings this possibility closer to reality. Piria is a novel search engine that uses the query by example method. A user query is submitted to the system, which then returns a list of images ranked by similarity, obtained by a metric distance that operates on every indexed image signature. These indexed images are compared according to several different classifiers, not only Keywords, but also Form, Color and Texture, taking into account geometric transformations and invariances such as rotation, symmetry, mirroring, etc. Form - edges extracted by an efficient segmentation algorithm. Color - histogram, semantic color segmentation and spatial color relationship. Texture - texture wavelets and local edge patterns. If required, Piria is also able to fuse results from multiple classifiers with a new classification of index categories: Single Indexer Single Call (SISC), Single Indexer Multiple Call (SIMC), Multiple Indexers Single Call (MISC) or Multiple Indexers Multiple Call (MIMC). Commercial and industrial applications will be explored and discussed, as well as current and future development.
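
      The query-by-example idea, reducing each image to a signature and ranking the indexed images by a metric distance to the query's signature, can be sketched as follows with a toy grayscale-histogram signature; this is a stand-in for illustration, not PIRIA's actual indexers.

```python
# Toy query-by-example sketch: histogram signatures ranked by L1 distance.
import numpy as np

def signature(image: np.ndarray, bins: int = 8) -> np.ndarray:
    """Normalized grayscale histogram as a minimal image signature."""
    hist, _ = np.histogram(image, bins=bins, range=(0, 255))
    return hist / hist.sum()

def rank_by_similarity(query_sig: np.ndarray, indexed: dict) -> list:
    """indexed: dict name -> signature. Returns names sorted by L1 distance to the query."""
    return sorted(indexed, key=lambda name: np.abs(indexed[name] - query_sig).sum())

# Example with random "images":
rng = np.random.default_rng(0)
index = {f"img{i}": signature(rng.integers(0, 256, (32, 32))) for i in range(3)}
query = signature(rng.integers(0, 256, (32, 32)))
print(rank_by_similarity(query, index))
```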

    • How can the research potential of the clinical quality databases be maximized? The Danish experience.

      PubMed

      Nørgaard, M; Johnsen, S P

      2016-02-01

      In Denmark, the need for monitoring of clinical quality and patient safety with feedback to the clinical, administrative and political systems has resulted in the establishment of a network of more than 60 publicly financed nationwide clinical quality databases. Although primarily devoted to monitoring and improving quality of care, the potential of these databases as data sources in clinical research is increasingly being recognized. In this review, we describe these databases focusing on their use as data sources for clinical research, including their strengths and weaknesses as well as future concerns and opportunities. The research potential of the clinical quality databases is substantial but has so far only been explored to a limited extent. Efforts related to technical, legal and financial challenges are needed in order to take full advantage of this potential. © 2016 The Association for the Publication of the Journal of Internal Medicine.

    • Performance Evaluation of a Database System in a Multiple Backend Configurations,

      DTIC Science & Technology

      1984-10-01

      leaving a system process, the internal performance measurements of MMSD have been carried out. Methodologies for constructing test databases...access directory data via the AT, EDIT, and CDT. In designing the test database, one of the key concepts is the choice of the directory attributes in...internal timing. These requests are selected since they retrieve the smallest portion of the test database and the processing time for each request is

    • Multiple Household Water Sources and Their Use in Remote Communities With Evidence From Pacific Island Countries

      NASA Astrophysics Data System (ADS)

      Elliott, Mark; MacDonald, Morgan C.; Chan, Terence; Kearton, Annika; Shields, Katherine F.; Bartram, Jamie K.; Hadwen, Wade L.

      2017-11-01

      Global water research and monitoring typically focus on the household's "main source of drinking-water." Use of multiple water sources to meet daily household needs has been noted in many developing countries but rarely quantified or reported in detail. We gathered self-reported data using a cross-sectional survey of 405 households in eight communities of the Republic of the Marshall Islands (RMI) and five Solomon Islands (SI) communities. Over 90% of households used multiple sources, with differences in sources and uses between wet and dry seasons. Most RMI households had large rainwater tanks and rationed stored rainwater for drinking throughout the dry season, whereas most SI households collected rainwater in small pots, precluding storage across seasons. Use of a source for cooking was strongly positively correlated with use for drinking, whereas use for cooking was negatively correlated or uncorrelated with nonconsumptive uses (e.g., bathing). Dry season water uses implied greater risk of water-borne disease, with fewer (frequently zero) handwashing sources reported and more unimproved sources consumed. Use of multiple sources is fundamental to household water management and feasible to monitor using electronic survey tools. We contend that recognizing multiple water sources can greatly improve understanding of household-level and community-level climate change resilience, that use of multiple sources confounds health impact studies of water interventions, and that incorporating multiple sources into water supply interventions can yield heretofore-unrealized benefits. We propose that failure to consider multiple sources undermines the design and effectiveness of global water monitoring, data interpretation, implementation, policy, and research.

    • ReNE: A Cytoscape Plugin for Regulatory Network Enhancement

      PubMed Central

      Politano, Gianfranco; Benso, Alfredo; Savino, Alessandro; Di Carlo, Stefano

      2014-01-01

      One of the biggest challenges in the study of biological regulatory mechanisms is the integration, modeling, and analysis of the complex interactions which take place in biological networks. Although post-transcriptional regulatory elements (i.e., miRNAs) are widely investigated in current research, their usage and visualization in biological networks are very limited. Regulatory networks are commonly limited to gene entities. To integrate networks with post-transcriptional regulatory data, researchers are therefore forced to manually resort to specific third-party databases. In this context, we introduce ReNE, a Cytoscape 3.x plugin designed to automatically enrich a standard gene-based regulatory network with more detailed transcriptional, post-transcriptional, and translational data, resulting in an enhanced network that more precisely models the actual biological regulatory mechanisms. ReNE can automatically import a network layout from the Reactome or KEGG repositories, or work with custom pathways described using a standard OWL/XML data format that the Cytoscape import procedure accepts. Moreover, ReNE allows researchers to merge multiple pathways coming from different sources. The merged network structure is normalized to guarantee a consistent and uniform description of the network nodes and edges and to enrich all integrated data with additional annotations retrieved from genome-wide databases like NCBI, thus producing a pathway fully manageable through the Cytoscape environment. The normalized network is then analyzed to include missing transcription factors, miRNAs, and proteins. The resulting enhanced network is still a fully functional Cytoscape network where each regulatory element (transcription factor, miRNA, gene, protein) and regulatory mechanism (up-regulation/down-regulation) is clearly visually identifiable, thus enabling a better visual understanding of its role and effect in the network behavior. The enhanced network produced by ReNE is exportable in multiple formats for further analysis via third-party applications. ReNE can be freely installed from the Cytoscape App Store (http://apps.cytoscape.org/apps/rene) and the full source code is freely available for download through a SVN repository accessible at http://www.sysbio.polito.it/tools_svn/BioInformatics/Rene/releases/. ReNE enhances a network by only integrating data from public repositories, without any inference or prediction. The reliability of the introduced interactions depends only on the reliability of the source data, which is out of the control of the ReNE developers. PMID:25541727

    • Awareness, perception and barriers to seeking information from online academic databases and medical journals as sources of information.

      PubMed

      Wong, Li Ping; Mohamad Shakir, Sharina Mahavera; Tong, Wen Ting; Alias, Haridah; Aghamohammadi, Nasrin; Arumugam, Kulenthran

      2017-10-16

      Medical students' use of online medical journals as a source of information is crucial in the learning pathway to become medical doctors. We conducted a cross-sectional survey study among university medical students between December 2012 and March 2013 to assess their awareness, perceived usefulness, practices, and barriers to seeking information from online academic databases and medical journals. The response rate was 67.53%. The majority of the students knew of the availability of online academic databases and medical journals. The mean scores for awareness (4.25 of a possible 11.0), perceived usefulness (13.95 of a possible 33.0), and practice (10.67 of a possible 33.0) were low. The mean barrier score toward using online academic databases and medical journals was 25.41 (of a possible 45.0). Multivariate findings showed that significant barriers associated with overall usage of online databases and medical journals were 1) not knowing where or how to locate databases and 2) being unsure how to use Boolean operators. Availability of full-text subscriptions was found to be an important factor in using online databases. Study findings highlighted the need to increase awareness of academic databases' availability and increase training on ways to search online academic databases and medical journals.
