A Novel Approach: Chemical Relational Databases, and the ...
Benigni, Romualdo; Bossa, Cecilia; Richard, Ann M; Yang, Chihae
2008-01-01
Mutagenicity and carcinogenicity databases are crucial resources for toxicologists and regulators involved in chemicals risk assessment. Until recently, existing public toxicity databases have been constructed primarily as "look-up-tables" of existing data, and most often did not contain chemical structures. Concepts and technologies originated from the structure-activity relationships science have provided powerful tools to create new types of databases, where the effective linkage of chemical toxicity with chemical structure can facilitate and greatly enhance data gathering and hypothesis generation, by permitting: a) exploration across both chemical and biological domains; and b) structure-searchability through the data. This paper reviews the main public databases, together with the progress in the field of chemical relational databases, and presents the ISSCAN database on experimental chemical carcinogens.
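As a minimal illustration of such a chemical relational database, the following Python/sqlite3 sketch links structures (as SMILES strings) to toxicity outcomes so that a single query can span both the chemical and biological domains. The schema, data, and the naive SMILES substring match standing in for true substructure search are all illustrative assumptions, not the ISSCAN implementation.

```python
import sqlite3

# Illustrative schema: structures and assay outcomes linked by compound id.
con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE compound (id INTEGER PRIMARY KEY, name TEXT, smiles TEXT);
CREATE TABLE assay_result (compound_id INTEGER REFERENCES compound(id),
                           endpoint TEXT, outcome TEXT);
""")
con.executemany("INSERT INTO compound VALUES (?, ?, ?)",
                [(1, "benzene", "c1ccccc1"), (2, "aniline", "Nc1ccccc1")])
con.executemany("INSERT INTO assay_result VALUES (?, ?, ?)",
                [(1, "carcinogenicity", "positive"),
                 (2, "mutagenicity", "negative")])

# Cross-domain query: a structure pattern (naive SMILES substring here --
# a real system would use a substructure search engine) joined to biology.
rows = con.execute("""
SELECT c.name, a.endpoint, a.outcome
FROM compound c JOIN assay_result a ON a.compound_id = c.id
WHERE c.smiles LIKE '%c1ccccc1%'
""").fetchall()
print(rows)
```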
Toward unification of taxonomy databases in a distributed computer environment
DOE Office of Scientific and Technical Information (OSTI.GOV)
Kitakami, Hajime; Tateno, Yoshio; Gojobori, Takashi
1994-12-31
All the taxonomy databases constructed with the DNA databases of the international DNA data banks are powerful electronic dictionaries which aid in biological research by computer. The taxonomy databases are, however, not consistently unified with a relational format. If we can achieve consistent unification of the taxonomy databases, it will be useful in comparing many research results, and investigating future research directions from existing research results. In particular, it will be useful in comparing relationships between phylogenetic trees inferred from molecular data and those constructed from morphological data. The goal of the present study is to unify the existing taxonomy databases and eliminate inconsistencies (errors) that are present in them. Inconsistencies occur particularly in the restructuring of the existing taxonomy databases, since classification rules for constructing the taxonomy have rapidly changed with biological advancements. A repair system is needed to remove inconsistencies in each data bank and mismatches among data banks. This paper describes a new methodology for removing both inconsistencies and mismatches from the databases in a distributed computer environment. The methodology is implemented in a relational database management system, SYBASE.
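A toy sketch of the mismatch detection such a repair system performs: two data banks' taxonomy tables are joined on species and divergent lineages are reported. The schema and data are invented, and sqlite3 stands in here for the SYBASE system the paper actually used.

```python
import sqlite3

# Two banks' taxonomy tables (hypothetical minimal schema).
con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE bank_a (species TEXT PRIMARY KEY, lineage TEXT);
CREATE TABLE bank_b (species TEXT PRIMARY KEY, lineage TEXT);
""")
con.executemany("INSERT INTO bank_a VALUES (?, ?)",
                [("Homo sapiens", "Eukaryota;Chordata;Mammalia"),
                 ("Mus musculus", "Eukaryota;Chordata;Mammalia")])
con.executemany("INSERT INTO bank_b VALUES (?, ?)",
                [("Homo sapiens", "Eukaryota;Chordata;Mammalia"),
                 ("Mus musculus", "Eukaryota;Vertebrata;Mammalia")])

# Mismatches: the same species with different lineages across banks.
mismatches = con.execute("""
SELECT a.species, a.lineage, b.lineage
FROM bank_a a JOIN bank_b b ON a.species = b.species
WHERE a.lineage <> b.lineage
""").fetchall()
for species, la, lb in mismatches:
    print(f"mismatch for {species}: {la!r} vs {lb!r}")
```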
Monitoring of services with non-relational databases and map-reduce framework
NASA Astrophysics Data System (ADS)
Babik, M.; Souto, F.
2012-12-01
Service Availability Monitoring (SAM) is a well-established monitoring framework that performs regular measurements of the core site services and reports the corresponding availability and reliability of the Worldwide LHC Computing Grid (WLCG) infrastructure. One of the existing extensions of SAM is Site Wide Area Testing (SWAT), which gathers monitoring information from the worker nodes via instrumented jobs. This generates quite a lot of monitoring data to process, as there are several data points for every job and several million jobs are executed every day. The recent uptake of non-relational databases opens a new paradigm in the large-scale storage and distributed processing of systems with heavy read-write workloads. For SAM this brings new possibilities to improve its model, from performing aggregation of measurements to storing raw data and subsequent re-processing. Both SAM and SWAT are currently tuned to run at top performance, reaching some of the limits in storage and processing power of their existing Oracle relational database. We investigated the usability and performance of non-relational storage together with its distributed data processing capabilities. For this, several popular systems have been compared. In this contribution we describe our investigation of the existing non-relational databases suited for monitoring systems covering Cassandra, HBase and MongoDB. Further, we present our experiences in data modeling and prototyping map-reduce algorithms focusing on the extension of the already existing availability and reliability computations. Finally, possible future directions in this area are discussed, analyzing the current deficiencies of the existing Grid monitoring systems and proposing solutions to leverage the benefits of the non-relational databases to get more scalable and flexible frameworks.
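The following pure-Python sketch illustrates the map-reduce style of availability computation described above: a map phase emits per-(site, service) counts from job records, and a reduce phase aggregates them. The record format is a hypothetical simplification of SWAT test results, not the production SAM schema.

```python
from collections import defaultdict
from functools import reduce

# Hypothetical SWAT-style test results: (site, service, status).
records = [("CERN", "CE", "OK"), ("CERN", "CE", "OK"),
           ("CERN", "CE", "FAIL"), ("PIC", "SE", "OK")]

# Map: emit ((site, service), (ok_count, total_count)) pairs.
def mapper(rec):
    site, service, status = rec
    return ((site, service), (1 if status == "OK" else 0, 1))

# Shuffle: group values by key.
groups = defaultdict(list)
for key, value in map(mapper, records):
    groups[key].append(value)

# Reduce: sum counts and derive availability per (site, service).
for key, values in groups.items():
    ok, total = reduce(lambda a, b: (a[0] + b[0], a[1] + b[1]), values)
    print(key, "availability =", ok / total)
```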
The relational clinical database: a possible solution to the star wars in registry systems.
Michels, D K; Zamieroski, M
1990-12-01
In summary, having data from other service areas available in a relational clinical database could resolve many of the problems existing in today's registry systems. Uniting sophisticated information systems into a centralized database system could definitely be a corporate asset in managing the bottom line.
Sánchez-de-Madariaga, Ricardo; Muñoz, Adolfo; Castro, Antonio L; Moreno, Oscar; Pascual, Mario
2018-01-01
This research shows a protocol to assess the computational complexity of querying relational and non-relational (NoSQL (not only Structured Query Language)) standardized electronic health record (EHR) medical information database systems (DBMS). It uses a set of three doubling-sized databases, i.e. databases storing 5000, 10,000 and 20,000 realistic standardized EHR extracts, in three different database management systems (DBMS): relational MySQL object-relational mapping (ORM), document-based NoSQL MongoDB, and native extensible markup language (XML) NoSQL eXist. The average response times to six complexity-increasing queries were computed, and the results showed a linear behavior in the NoSQL cases. In the NoSQL field, MongoDB presents a much flatter linear slope than eXist. NoSQL systems may also be more appropriate for maintaining standardized medical information systems due to the special nature of the updating policies of medical information, which should not affect the consistency and efficiency of the data stored in NoSQL databases. One limitation of this protocol is the lack of direct results for improved relational systems such as archetype relational mapping (ARM) with the same data. However, the interpolation of the doubling-size database results to those presented in the literature and other published results suggests that NoSQL systems might be more appropriate in many specific scenarios and problems to be solved. For example, NoSQL may be appropriate for document-based tasks such as EHR extracts used in clinical practice, or editing and visualization, or situations where the aim is not only to query medical information, but also to restore the EHR in exactly its original form. PMID:29608174
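A minimal sketch of the timing protocol, using sqlite3 as a stand-in DBMS: databases of doubling size (5,000/10,000/20,000 synthetic extracts) are built and the mean response time of a query is measured. The table layout and query are illustrative, not the standardized EHR schema used in the study.

```python
import sqlite3
import time

def mean_response_time(n_extracts, query, repeats=5):
    """Build a database of n_extracts synthetic EHR rows and time a query."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE ehr_extract (id INTEGER PRIMARY KEY, payload TEXT)")
    con.executemany("INSERT INTO ehr_extract VALUES (?, ?)",
                    ((i, f"extract-{i}") for i in range(n_extracts)))
    start = time.perf_counter()
    for _ in range(repeats):
        con.execute(query).fetchall()
    return (time.perf_counter() - start) / repeats

# Doubling sizes, as in the protocol: 5,000 / 10,000 / 20,000 extracts.
for size in (5000, 10000, 20000):
    t = mean_response_time(size,
        "SELECT COUNT(*) FROM ehr_extract WHERE payload LIKE '%7%'")
    print(size, f"{t:.6f} s")
```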
Relational Databases and Biomedical Big Data.
de Silva, N H Nisansa D
2017-01-01
In various biomedical applications that collect, handle, and manipulate data, the amounts of data tend to build up and venture into the range identified as big data. In such cases, a design decision has to be made as to what type of database should be used to handle the data. More often than not, the default and classical solution in the biomedical domain, according to past research, has been the relational database. While this was the norm for a long while, there is an evident trend to move away from relational databases in favor of other types and paradigms of databases. However, it is still of paramount importance to understand the interrelation that exists between biomedical big data and relational databases. This chapter reviews the pros and cons of using relational databases to store biomedical big data that previous research has discussed and used.
NASA Technical Reports Server (NTRS)
Finley, Gail T.
1988-01-01
This report covers the study of the relational database implementation in the NASCAD computer program system. The existing system is used primarily for computer aided design. Attention is also directed to a hidden-surface algorithm for final drawing output.
Flexible network reconstruction from relational databases with Cytoscape and CytoSQL
2010-01-01
Background Molecular interaction networks can be efficiently studied using network visualization software such as Cytoscape. The relevant nodes, edges and their attributes can be imported into Cytoscape in various file formats, or directly from external databases through specialized third-party plugins. However, molecular data are often stored in relational databases with their own specific structure, for which dedicated plugins do not exist. Therefore, a more generic solution is presented. Results A new Cytoscape plugin, 'CytoSQL', is developed to connect Cytoscape to any relational database. It allows users to launch SQL ('Structured Query Language') queries from within Cytoscape, with the option to inject node or edge features of an existing network as SQL arguments, and to convert the retrieved data to Cytoscape network components. Supported by a set of case studies, we demonstrate the flexibility and the power of the CytoSQL plugin in converting specific data subsets into meaningful network representations. Conclusions CytoSQL offers a unified approach to let Cytoscape interact with relational databases. Thanks to the power of the SQL syntax, this tool can rapidly generate and enrich networks according to very complex criteria. The plugin is available at http://www.ptools.ua.ac.be/CytoSQL. PMID:20594316
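A rough sketch of the CytoSQL workflow in plain Python: run a parameterized SQL query, with a feature of an existing network injected as an argument, and convert the result set into node and edge components. The table, data, and dict-based network stand-ins are assumptions; the real plugin runs inside Cytoscape.

```python
import sqlite3

# Hypothetical interaction table; in CytoSQL the query text is written by
# the user and node/edge attributes come from the selected columns.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE interaction (a TEXT, b TEXT, kind TEXT)")
con.executemany("INSERT INTO interaction VALUES (?, ?, ?)",
                [("TP53", "MDM2", "binds"), ("TP53", "BAX", "activates")])

# Injecting features of an existing network as SQL arguments,
# as the plugin does with node names (parameter binding shown here).
seed_nodes = ("TP53",)
rows = con.execute("SELECT a, b, kind FROM interaction WHERE a = ?", seed_nodes)

# Convert the result set into network components.
nodes, edges = set(), []
for a, b, kind in rows:
    nodes.update((a, b))
    edges.append({"source": a, "target": b, "interaction": kind})
print(nodes, edges)
```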
NASA Astrophysics Data System (ADS)
Maffei, A. R.; Chandler, C. L.; Work, T.; Allen, J.; Groman, R. C.; Fox, P. A.
2009-12-01
Content Management Systems (CMSs) provide powerful features that can be of use to oceanographic (and other geo-science) data managers. However, in many instances, geo-science data management offices have previously designed customized schemas for their metadata. The WHOI Ocean Informatics initiative and the NSF-funded Biological and Chemical Oceanography Data Management Office (BCO-DMO) have jointly sponsored a project to port an existing relational database containing oceanographic metadata, along with an existing interface coded in Cold Fusion middleware, to a Drupal6 Content Management System. The goal was to translate all the existing database tables, input forms, website reports, and other features present in the existing system to employ Drupal CMS features. The replacement features include Drupal content types, CCK node-reference fields, themes, RDB, SPARQL, workflow, and a number of other supporting modules. Strategic use of some Drupal6 CMS features enables three separate but complementary interfaces that provide access to oceanographic research metadata via the MySQL database: 1) a Drupal6-powered front-end; 2) a standard SQL port (used to provide a Mapserver interface to the metadata and data); and 3) a SPARQL port (feeding a new faceted search capability being developed). Future plans include the creation of science ontologies, by scientist/technologist teams, that will drive semantically-enabled faceted search capabilities planned for the site. Incorporation of semantic technologies included in the future Drupal 7 core release is also anticipated. Using a public domain CMS as opposed to proprietary middleware, and taking advantage of the many features of Drupal 6 that are designed to support semantically-enabled interfaces, will help prepare the BCO-DMO database for interoperability with other ecosystem databases.
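A hedged sketch of what a client of such a SPARQL port might look like, using the SPARQLWrapper package; the endpoint URL and vocabulary are placeholders rather than the actual BCO-DMO service.

```python
from SPARQLWrapper import SPARQLWrapper, JSON

# Hypothetical endpoint and vocabulary -- a real SPARQL port would expose
# its own URL and schema.
sparql = SPARQLWrapper("http://example.org/bco-dmo/sparql")
sparql.setQuery("""
PREFIX dc: <http://purl.org/dc/terms/>
SELECT ?dataset ?title WHERE {
  ?dataset dc:title ?title .
} LIMIT 10
""")
sparql.setReturnFormat(JSON)

# Each binding row carries the variable values for one result.
for binding in sparql.query().convert()["results"]["bindings"]:
    print(binding["dataset"]["value"], "-", binding["title"]["value"])
```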
Wollbrett, Julien; Larmande, Pierre; de Lamotte, Frédéric; Ruiz, Manuel
2013-04-15
In recent years, a large amount of "-omics" data have been produced. However, these data are stored in many different species-specific databases that are managed by different institutes and laboratories. Biologists often need to find and assemble data from disparate sources to perform certain analyses. Searching for these data and assembling them is a time-consuming task. The Semantic Web helps to facilitate interoperability across databases. A common approach involves the development of wrapper systems that map a relational database schema onto existing domain ontologies. However, few attempts have been made to automate the creation of such wrappers. We developed a framework, named BioSemantic, for the creation of Semantic Web Services that are applicable to relational biological databases. This framework makes use of both Semantic Web and Web Services technologies and can be divided into two main parts: (i) the generation and semi-automatic annotation of an RDF view; and (ii) the automatic generation of SPARQL queries and their integration into Semantic Web Services backbones. We have used our framework to integrate genomic data from different plant databases. BioSemantic is a framework that was designed to speed integration of relational databases. We present how it can be used to speed the development of Semantic Web Services for existing relational biological databases. Currently, it creates and annotates RDF views that enable the automatic generation of SPARQL queries. Web Services are also created and deployed automatically, and the semantic annotations of our Web Services are added automatically using SAWSDL attributes. BioSemantic is downloadable at http://southgreen.cirad.fr/?q=content/Biosemantic.
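A minimal sketch of the first BioSemantic step, generating an RDF view over relational rows, using sqlite3 and the rdflib package; the table, namespace, and ontology terms are invented for illustration and are not BioSemantic's own mappings.

```python
import sqlite3
from rdflib import Graph, Literal, Namespace, RDF

# Hypothetical relational source.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE gene (id INTEGER PRIMARY KEY, symbol TEXT, chromosome TEXT)")
con.execute("INSERT INTO gene VALUES (1, 'OsMADS1', 'chr3')")

# Map each row to RDF triples under an invented ontology namespace.
EX = Namespace("http://example.org/plant-ontology#")
g = Graph()
for gid, symbol, chrom in con.execute("SELECT id, symbol, chromosome FROM gene"):
    subject = EX[f"gene/{gid}"]
    g.add((subject, RDF.type, EX.Gene))
    g.add((subject, EX.symbol, Literal(symbol)))
    g.add((subject, EX.locatedOn, Literal(chrom)))

# The resulting RDF view is what SPARQL queries would then be run against.
print(g.serialize(format="turtle"))
```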
Schuemie, Martijn J; Mons, Barend; Weeber, Marc; Kors, Jan A
2007-06-01
Gene and protein name identification in text requires a dictionary approach to relate synonyms to the same gene or protein, and to link names to external databases. However, existing dictionaries are incomplete. We investigate two complementary methods for automatic generation of a comprehensive dictionary: combination of information from existing gene and protein databases and rule-based generation of spelling variations. Both methods have been reported in literature before, but have hitherto not been combined and evaluated systematically. We combined gene and protein names from several existing databases of four different organisms. The combined dictionaries showed a substantial increase in recall on three different test sets, as compared to any single database. Application of 23 spelling variation rules to the combined dictionaries further increased recall. However, many rules appeared to have no effect and some appear to have a detrimental effect on precision.
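The following sketch illustrates the two steps: synonym lists from several (hypothetical) source databases are merged per gene identifier, and simple spelling-variation rules expand the result. The two rules shown stand in for the 23 applied in the paper.

```python
# Combine synonym lists from several (hypothetical) source databases.
db_a = {"GeneID:348": {"APOE", "apolipoprotein E"}}
db_b = {"GeneID:348": {"APO-E"}}

combined = {}
for source in (db_a, db_b):
    for gene_id, names in source.items():
        combined.setdefault(gene_id, set()).update(names)

# Rule-based spelling variations (two illustrative rules standing in for
# the 23 used in the paper): hyphen <-> space, Roman <-> Arabic numerals.
def variations(name):
    out = {name, name.replace("-", " "), name.replace(" ", "-")}
    if name.endswith(" II"):
        out.add(name[:-3] + " 2")
    if name.endswith(" 2"):
        out.add(name[:-2] + " II")
    return out

for gene_id, names in combined.items():
    expanded = set().union(*(variations(n) for n in names))
    print(gene_id, sorted(expanded))
```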
Migration from relational to NoSQL database
NASA Astrophysics Data System (ADS)
Ghotiya, Sunita; Mandal, Juhi; Kandasamy, Saravanakumar
2017-11-01
Data generated by real-time applications, social networking sites and sensor devices is huge in volume and largely unstructured, which makes it difficult for relational database management systems to handle. Data is a very precious component of any application and needs to be analysed after arranging it in some structure. Relational databases can only deal with structured data, so there is a need for NoSQL database management systems, which can also deal with semi-structured data. Relational databases provide the easiest way to manage data, but as the use of NoSQL increases it is becoming necessary to migrate data from relational to NoSQL databases. Various frameworks have been proposed previously that provide mechanisms for migrating data stored in SQL warehouses, as well as middle-layer solutions that allow data to be stored in NoSQL databases to handle unstructured data. This paper provides a literature review of some of the recent approaches proposed by various researchers to migrate data from relational to NoSQL databases. Some researchers have proposed mechanisms for the co-existence of NoSQL and relational databases. This paper provides a summary of mechanisms which can be used for mapping data stored in relational databases to NoSQL databases. Various techniques for data transformation and middle-layer solutions are summarised in the paper.
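One commonly proposed mapping rule from this migration literature, embedding one-to-many child rows as arrays inside the parent document, can be sketched as follows; plain dicts stand in for MongoDB documents, and the schema is invented.

```python
import sqlite3

# Relational source: customers with a one-to-many orders table.
con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, total REAL);
""")
con.execute("INSERT INTO customer VALUES (1, 'Ada')")
con.executemany("INSERT INTO orders VALUES (?, ?, ?)", [(10, 1, 9.5), (11, 1, 12.0)])

# Migration rule: fold child rows into the parent as an embedded array,
# producing a MongoDB-style document (a plain dict here).
documents = []
for cid, name in con.execute("SELECT id, name FROM customer"):
    orders = [{"order_id": oid, "total": total}
              for oid, _, total in con.execute(
                  "SELECT id, customer_id, total FROM orders WHERE customer_id = ?",
                  (cid,))]
    documents.append({"_id": cid, "name": name, "orders": orders})

print(documents)
# With pymongo one would then do: db.customers.insert_many(documents)
```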
A structural informatics approach to mine kinase knowledge bases.
Brooijmans, Natasja; Mobilio, Dominick; Walker, Gary; Nilakantan, Ramaswamy; Denny, Rajiah A; Feyfant, Eric; Diller, David; Bikker, Jack; Humblet, Christine
2010-03-01
In this paper, we describe a combination of structural informatics approaches developed to mine data extracted from existing structure knowledge bases (Protein Data Bank and the GVK database) with a focus on kinase ATP-binding site data. In contrast to existing systems that retrieve and analyze protein structures, our techniques are centered on a database of ligand-bound geometries in relation to residues lining the binding site and transparent access to ligand-based SAR data. We illustrate the systems in the context of the Abelson kinase and related inhibitor structures.
Khan, Aihab; Husain, Syed Afaq
2013-01-01
We put forward a fragile zero watermarking scheme to detect and characterize malicious modifications made to a database relation. Most of the existing watermarking schemes for relational databases introduce intentional errors or permanent distortions as marks into the database's original content. These distortions inevitably degrade data quality and usability, as the integrity of the relational database is violated. Moreover, these fragile schemes can detect malicious data modifications but do not characterize the tampering attack, that is, the nature of the tampering. The proposed fragile scheme is based on a zero watermarking approach to detect malicious modifications made to a database relation. In zero watermarking, the watermark is generated (constructed) from the contents of the original data rather than by introducing permanent distortions as marks into the data. As a result, the proposed scheme is distortion-free; thus, it also resolves the inherent conflict between security and imperceptibility. The proposed scheme also characterizes the malicious data modifications to quantify the nature of tampering attacks. Experimental results show that even minor malicious modifications made to a database relation can be detected and characterized successfully.
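A much-simplified sketch of the zero-watermarking idea: the watermark is derived from the relation's content (here, per-row digests keyed by primary key) and later recomputed to detect and localize modifications, with no distortion ever embedded in the data. The hashing scheme is illustrative, not the paper's construction.

```python
import hashlib

def zero_watermark(relation):
    """Derive a watermark from content: one digest per (primary key, row)."""
    return {row[0]: hashlib.sha256(repr(row).encode()).hexdigest()
            for row in relation}

original = [(1, "Alice", 50000), (2, "Bob", 62000)]
registered = zero_watermark(original)   # stored with a trusted party

# Later: the relation is tampered with (no distortion was ever embedded).
tampered = [(1, "Alice", 50000), (2, "Bob", 99000)]
current = zero_watermark(tampered)

# Detection and characterization: identify which tuples changed.
for key in registered:
    if current.get(key) != registered[key]:
        print(f"tuple with key {key} was modified or deleted")
```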
NASA Technical Reports Server (NTRS)
Maluf, David A.; Tran, Peter B.
2003-01-01
An object-relational database management system is an integrated hybrid cooperative approach that combines the best practices of both the relational model, utilizing SQL queries, and the object-oriented, semantic paradigm for supporting complex data creation. In this paper, a highly scalable, information-on-demand database framework, called NETMARK, is introduced. NETMARK takes advantage of the Oracle 8i object-relational database, using physical-address data types for very efficient keyword search of records spanning both context and content. NETMARK was originally developed in early 2000 as a research and development prototype to address the vast amounts of unstructured and semi-structured documents existing within NASA enterprises. Today, NETMARK is a flexible, high-throughput open database framework for managing, storing, and searching unstructured or semi-structured arbitrary hierarchical models, such as XML and HTML.
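The context-and-content keyword search can be sketched over a small XML document with Python's ElementTree: a match reports both the element path (context) and the text (content). This is only an illustration of the idea, not the NETMARK/Oracle implementation.

```python
import xml.etree.ElementTree as ET

doc = ET.fromstring("""
<report>
  <section title="Propulsion">
    <p>The engine test exceeded thermal limits.</p>
  </section>
  <section title="Avionics">
    <p>No anomalies observed.</p>
  </section>
</report>""")

# Keyword search across both context (element path and attributes) and
# content (text), in the spirit of NETMARK's context/content searches.
def search(root, keyword, path="report"):
    for child in root:
        here = f"{path}/{child.tag}[{child.get('title', '')}]"
        if keyword.lower() in (child.text or "").lower() or \
           keyword.lower() in here.lower():
            yield here, (child.text or "").strip()
        yield from search(child, keyword, here)

for context, content in search(doc, "thermal"):
    print(context, "->", content)
```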
A web based relational database management system for filariasis control
Murty, Upadhyayula Suryanarayana; Kumar, Duvvuri Venkata Rama Satya; Sriram, Kumaraswamy; Rao, Kadiri Madhusudhan; Bhattacharyulu, Chakravarthula Hayageeva Narasimha Venakata; Praveen, Bhoopathi; Krishna, Amirapu Radha
2005-01-01
The present study describes an RDBMS (relational database management system) for the effective management of filariasis, a vector-borne disease. Filariasis infects 120 million people in 83 countries. The possible re-emergence of the disease and the complexity of existing control programs warrant the development of new strategies. A database containing comprehensive data associated with filariasis finds utility in disease control. We have developed a database containing information on the socio-economic status of patients, mosquito collection procedures, mosquito dissection data, filariasis survey reports and mass blood data. The database can be searched using a user-friendly web interface. Availability: http://www.webfil.org (login and password can be obtained from the authors) PMID:17597846
Levy, C.; Beauchamp, C.
1996-01-01
This poster describes the methods used and the working prototype that was developed from an abstraction of the relational model from the VA's hierarchical DHCP database. Overlaying the relational model on DHCP permits multiple user views of the physical data structure, enhances access to the database by providing a link to commercial (SQL-based) software, and supports a conceptual managed care data model based on primary and longitudinal patient care. The goal of this work was to create a relational abstraction of the existing hierarchical database; to construct, using SQL data definition language, user views of the database which reflect the clinical conceptual view of DHCP; and to allow the user to work directly with the logical view of the data using GUI-based commercial software of their choosing. The workstation is intended to serve as a platform from which a managed care information model could be implemented and evaluated.
BioWarehouse: a bioinformatics database warehouse toolkit
Lee, Thomas J; Pouliot, Yannick; Wagner, Valerie; Gupta, Priyanka; Stringer-Calvert, David WJ; Tenenbaum, Jessica D; Karp, Peter D
2006-01-01
Background This article addresses the problem of interoperation of heterogeneous bioinformatics databases. Results We introduce BioWarehouse, an open source toolkit for constructing bioinformatics database warehouses using the MySQL and Oracle relational database managers. BioWarehouse integrates its component databases into a common representational framework within a single database management system, thus enabling multi-database queries using the Structured Query Language (SQL) but also facilitating a variety of database integration tasks such as comparative analysis and data mining. BioWarehouse currently supports the integration of a pathway-centric set of databases including ENZYME, KEGG, and BioCyc, and in addition the UniProt, GenBank, NCBI Taxonomy, and CMR databases, and the Gene Ontology. Loader tools, written in the C and JAVA languages, parse and load these databases into a relational database schema. The loaders also apply a degree of semantic normalization to their respective source data, decreasing semantic heterogeneity. The schema supports the following bioinformatics datatypes: chemical compounds, biochemical reactions, metabolic pathways, proteins, genes, nucleic acid sequences, features on protein and nucleic-acid sequences, organisms, organism taxonomies, and controlled vocabularies. As an application example, we applied BioWarehouse to determine the fraction of biochemically characterized enzyme activities for which no sequences exist in the public sequence databases. The answer is that no sequence exists for 36% of enzyme activities for which EC numbers have been assigned. These gaps in sequence data significantly limit the accuracy of genome annotation and metabolic pathway prediction, and are a barrier for metabolic engineering. Complex queries of this type provide examples of the value of the data warehousing approach to bioinformatics research. Conclusion BioWarehouse embodies significant progress on the database integration problem for bioinformatics. PMID:16556315
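The orphan-enzyme example above reduces to a single SQL query once the sources sit in one warehouse. A toy version follows, with invented table names; the real BioWarehouse schema is much richer.

```python
import sqlite3

# Toy stand-ins for warehouse tables (the real BioWarehouse schema differs).
con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE enzyme_activity (ec_number TEXT PRIMARY KEY);
CREATE TABLE protein_sequence (id INTEGER PRIMARY KEY, ec_number TEXT);
""")
con.executemany("INSERT INTO enzyme_activity VALUES (?)",
                [("1.1.1.1",), ("2.7.7.7",), ("4.2.1.99",)])
con.execute("INSERT INTO protein_sequence VALUES (1, '1.1.1.1')")

# The 'orphan enzyme' question: EC activities with no known sequence.
orphans, = con.execute("""
SELECT COUNT(*) FROM enzyme_activity e
WHERE NOT EXISTS (SELECT 1 FROM protein_sequence p
                  WHERE p.ec_number = e.ec_number)
""").fetchone()
total, = con.execute("SELECT COUNT(*) FROM enzyme_activity").fetchone()
print(f"{orphans}/{total} activities lack a sequence ({100*orphans/total:.0f}%)")
```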
An Extensible Schema-less Database Framework for Managing High-throughput Semi-Structured Documents
NASA Technical Reports Server (NTRS)
Maluf, David A.; Tran, Peter B.; La, Tracy; Clancy, Daniel (Technical Monitor)
2002-01-01
An object-relational database management system is an integrated hybrid cooperative approach that combines the best practices of both the relational model, utilizing SQL queries, and the object-oriented, semantic paradigm for supporting complex data creation. In this paper, a highly scalable, information-on-demand database framework, called NETMARK, is introduced. NETMARK takes advantage of the Oracle 8i object-relational database, using physical-address data types for very efficient keyword searches of records for both context and content. NETMARK was originally developed in early 2000 as a research and development prototype to address the vast amounts of unstructured and semi-structured documents existing within NASA enterprises. Today, NETMARK is a flexible, high-throughput open database framework for managing, storing, and searching unstructured or semi-structured arbitrary hierarchical models, such as XML and HTML.
Reliability database development for use with an object-oriented fault tree evaluation program
NASA Technical Reports Server (NTRS)
Heger, A. Sharif; Harringtton, Robert J.; Koen, Billy V.; Patterson-Hine, F. Ann
1989-01-01
A description is given of the development of a fault-tree analysis method using object-oriented programming. In addition, the authors discuss the programs that have been developed, or are under development, to connect a fault-tree analysis routine to a reliability database. To assess the performance of the routines, a relational database simulating one of the nuclear power industry databases has been constructed. For a realistic assessment of the results of this project, the use of one of the existing nuclear power reliability databases is planned.
Consulting report on the NASA technology utilization network system
NASA Technical Reports Server (NTRS)
Hlava, Marjorie M. K.
1992-01-01
The purposes of this consulting effort are: (1) to evaluate the existing management and production procedures and workflow as they each relate to the successful development, utilization, and implementation of the NASA Technology Utilization Network System (TUNS) database; (2) to identify, as requested by the NASA Project Monitor, the strengths, weaknesses, areas of bottlenecking, and previously unaddressed problem areas affecting TUNS; (3) to recommend changes or modifications of existing procedures as necessary in order to effect corrections for the overall benefit of NASA TUNS database production, implementation, and utilization; and (4) to recommend the addition of alternative procedures, routines, and activities that will consolidate and facilitate the production, implementation, and utilization of the NASA TUNS database.
Constructing a Geology Ontology Using a Relational Database
NASA Astrophysics Data System (ADS)
Hou, W.; Yang, L.; Yin, S.; Ye, J.; Clarke, K.
2013-12-01
In the geology community, the creation of a common geology ontology has become a useful means to solve problems of data integration, knowledge transformation and the interoperation of multi-source, heterogeneous and multiple-scale geological data. Currently, human-computer interaction methods and relational database-based methods are the primary ontology construction methods. Some human-computer interaction methods, such as the Geo-rule based method, the ontology life cycle method and the module design method, have been proposed for applied geological ontologies. Essentially, the relational database-based method is a reverse engineering of abstracted semantic information from an existing database. The key is to construct rules for the transformation of database entities into the ontology. Relative to the human-computer interaction method, relational database-based methods can use existing resources and the stated semantic relationships among geological entities. However, two problems challenge their development and application. One is the transformation of multiple inheritance and nested relationships and their representation in an ontology. The other is that most of these methods do not measure the semantic retention of the transformation process. In this study, we focused on constructing a rule set to convert the semantics in a geological database into a geological ontology. According to the relational schema of a geological database, a conversion approach is presented to convert a geological spatial database to an OWL-based geological ontology, based on identifying semantics such as entities, relationships, inheritance relationships, nested relationships and cluster relationships. The semantic integrity of the transformation was verified using an inverse mapping process. In the geological ontology, inheritance and union operations between superclass and subclass were used to represent the nested relationships in a geochronology and the multiple inheritance relationship. Based on a Quaternary database of the downtown of Foshan City, Guangdong Province, in Southern China, a geological ontology was constructed using the proposed method. To measure the retention of semantics in the conversion process and the results, an inverse mapping from the ontology to a relational database was tested based on a proposed conversion rule. The comparison of schema and entities, and the reduction of tables, between the inverse database and the original database illustrated that the proposed method retains the semantic information well during the conversion process. An application for abstracting sandstone information showed that semantic relationships among concepts in the geological database were successfully reorganized in the constructed ontology. Key words: geological ontology; geological spatial database; multiple inheritance; OWL. Acknowledgement: This research is jointly funded by the Specialized Research Fund for the Doctoral Program of Higher Education of China (RFDP) (20100171120001), NSFC (41102207) and the Fundamental Research Funds for the Central Universities (12lgpy19).
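The multiple-inheritance point can be made concrete with the rdflib package: a class with two superclasses is simply two rdfs:subClassOf triples, which a conversion rule can emit from two rows of a parent-child link table. The geology terms below are illustrative, not the Foshan ontology itself.

```python
from rdflib import Graph, Namespace, RDF, RDFS
from rdflib.namespace import OWL

GEO = Namespace("http://example.org/geology#")
g = Graph()

# Declare classes (terms illustrative only).
for cls in (GEO.Sandstone, GEO.ClasticRock, GEO.ReservoirRock):
    g.add((cls, RDF.type, OWL.Class))

# Multiple inheritance: one subclass, two superclasses -- the relational
# source would encode this as two rows in a parent-child link table.
g.add((GEO.Sandstone, RDFS.subClassOf, GEO.ClasticRock))
g.add((GEO.Sandstone, RDFS.subClassOf, GEO.ReservoirRock))

print(g.serialize(format="turtle"))
```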
Performance-Oriented Privacy-Preserving Data Integration
DOE Office of Scientific and Technical Information (OSTI.GOV)
Pon, R K; Critchlow, T
2004-09-15
Current solutions to integrating private data with public data have provided useful privacy metrics, such as relative information gain, that can be used to evaluate alternative approaches. Unfortunately, they have not addressed critical performance issues, especially when the public database is very large. The use of hashes and noise yields better performance than existing techniques while still making it difficult for unauthorized entities to distinguish which data items truly exist in the private database. As we show here, leveraging the uncertainty introduced by collisions caused by hashing and the injection of noise, we present a technique for performing a relational join operation between a massive public table and a relatively smaller private one.
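A simplified sketch of the hash-and-noise technique: the private side publishes short salted hashes of its join keys plus random decoys, and the public side keeps rows whose hashed key matches; deliberate collisions and noise leave an observer unsure which keys truly exist. Parameters and data are illustrative, not the paper's construction.

```python
import hashlib
import secrets

def h(value, salt):
    # Deliberately short hash -> collisions add uncertainty.
    return hashlib.sha256((salt + value).encode()).hexdigest()[:8]

salt = "shared-salt"            # agreed between the two parties
private_keys = ["alice@example.org", "bob@example.org"]

# Publish hashed keys plus random noise entries; an observer cannot tell
# which hashes correspond to real private records.
published = {h(k, salt) for k in private_keys}
published |= {secrets.token_hex(4) for _ in range(4)}   # injected noise

# Public side: stream the (large) table, keep rows whose hashed key matches.
public_rows = [("alice@example.org", "row-1"), ("carol@example.org", "row-2")]
candidates = [row for row in public_rows if h(row[0], salt) in published]
print(candidates)  # true matches, plus possible false positives from collisions
```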
Computerization of the Arkansas Fishes Database
Henry W. Robison; L. Gayle Henderson; Melvin L. Warren; Janet S. Rader
2004-01-01
Until recently, distributional data for the fishes of Arkansas existed in the form of museum records, field notebooks of various ichthyologists, and published fish survey data; none of which was in a digital format. In 1995, a relational database system was used to design a PC platform data entry module for the capture of information on...
Quantify spatial relations to discover handwritten graphical symbols
NASA Astrophysics Data System (ADS)
Li, Jinpeng; Mouchère, Harold; Viard-Gaudin, Christian
2012-01-01
To model a handwritten graphical language, spatial relations describe how the strokes are positioned in the 2-dimensional space. Most existing handwriting recognition systems make use of some predefined spatial relations. However, for a complex graphical language, it is hard to express all the spatial relations manually. Another possibility is to use a clustering technique to discover the spatial relations. In this paper, we discuss how to create a relational graph between strokes (nodes) labeled with graphemes in a graphical language. Then we vectorize spatial relations (edges) for clustering and quantization. As the targeted application, we extract the repetitive sub-graphs (graphical symbols) composed of graphemes and learned spatial relations. On two handwriting databases, a simple mathematical expression database and a complex flowchart database, the unsupervised spatial relations outperform the predefined spatial relations.
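A minimal sketch of the vectorization step: the spatial relation between two strokes is encoded from their bounding boxes and the vectors are clustered (scikit-learn's KMeans here). The four features and the cluster count are illustrative choices, not the paper's exact descriptor.

```python
import numpy as np
from sklearn.cluster import KMeans

def bbox(stroke):
    xs, ys = zip(*stroke)
    return min(xs), min(ys), max(xs), max(ys)

def relation_vector(s1, s2):
    """Offset and scale of stroke 2 relative to stroke 1's bounding box."""
    x0, y0, x1, y1 = bbox(s1)
    a0, b0, a1, b1 = bbox(s2)
    w, h = max(x1 - x0, 1e-6), max(y1 - y0, 1e-6)
    return [(a0 - x0) / w, (b0 - y0) / h, (a1 - a0) / w, (b1 - b0) / h]

# Toy stroke pairs: two 'right-of' layouts and one 'below' layout.
pairs = [([(0, 0), (1, 1)], [(2, 0), (3, 1)]),
         ([(0, 0), (2, 2)], [(5, 0), (7, 2)]),
         ([(0, 0), (1, 1)], [(0, 2), (1, 3)])]
X = np.array([relation_vector(a, b) for a, b in pairs])

# Unsupervised discovery of spatial-relation categories.
labels = KMeans(n_clusters=2, n_init=10).fit_predict(X)
print(labels)
```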
Solutions for medical databases optimal exploitation.
Branescu, I; Purcarea, V L; Dobrescu, R
2014-03-15
The paper discusses methods to apply OLAP techniques to multidimensional databases that leverage the existing performance-enhancing technique known as practical pre-aggregation, making this technique relevant to a much wider range of medical applications, as logistic support for data warehousing techniques. The transformations have low computational complexity in practice and may be implemented using standard relational database technology. The paper also describes how to integrate the transformed hierarchies in current OLAP systems, transparently to the user, and proposes a flexible, "multimodel" federated system for extending OLAP querying to external object databases.
The Danish Testicular Cancer database.
Daugaard, Gedske; Kier, Maria Gry Gundgaard; Bandak, Mikkel; Mortensen, Mette Saksø; Larsson, Heidi; Søgaard, Mette; Toft, Birgitte Groenkaer; Engvad, Birte; Agerbæk, Mads; Holm, Niels Vilstrup; Lauritsen, Jakob
2016-01-01
The nationwide Danish Testicular Cancer database consists of a retrospective research database (DaTeCa database) and a prospective clinical database (Danish Multidisciplinary Cancer Group [DMCG] DaTeCa database). The aim is to improve the quality of care for patients with testicular cancer (TC) in Denmark, that is, by identifying risk factors for relapse, toxicity related to treatment, and focusing on late effects. All Danish male patients with a histologically verified germ cell cancer diagnosis in the Danish Pathology Registry are included in the DaTeCa databases. Data collection has been performed from 1984 to 2007 and from 2013 onward, respectively. The retrospective DaTeCa database contains detailed information with more than 300 variables related to histology, stage, treatment, relapses, pathology, tumor markers, kidney function, lung function, etc. A questionnaire related to late effects has been conducted, which includes questions regarding social relationships, life situation, general health status, family background, diseases, symptoms, use of medication, marital status, psychosocial issues, fertility, and sexuality. TC survivors alive on October 2014 were invited to fill in this questionnaire including 160 validated questions. Collection of questionnaires is still ongoing. A biobank including blood/sputum samples for future genetic analyses has been established. Both samples related to DaTeCa and DMCG DaTeCa database are included. The prospective DMCG DaTeCa database includes variables regarding histology, stage, prognostic group, and treatment. The DMCG DaTeCa database has existed since 2013 and is a young clinical database. It is necessary to extend the data collection in the prospective database in order to answer quality-related questions. Data from the retrospective database will be added to the prospective data. This will result in a large and very comprehensive database for future studies on TC patients.
A new relational database structure and online interface for the HITRAN database
NASA Astrophysics Data System (ADS)
Hill, Christian; Gordon, Iouli E.; Rothman, Laurence S.; Tennyson, Jonathan
2013-11-01
A new format for the HITRAN database is proposed. By storing the line-transition data in a number of linked tables described by a relational database schema, it is possible to overcome the limitations of the existing format, which have become increasingly apparent over the last few years as new and more varied data are being used by radiative-transfer models. Although the database in the new format can be searched using the well-established Structured Query Language (SQL), a web service, HITRANonline, has been deployed to allow users to make most common queries of the database using a graphical user interface in a web page. The advantages of the relational form of the database to ensuring data integrity and consistency are explored, and the compatibility of the online interface with the emerging standards of the Virtual Atomic and Molecular Data Centre (VAMDC) project is discussed. In particular, the ability to access HITRAN data using a standard query language from other websites, command line tools and from within computer programs is described.
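The kind of query the relational schema supports can be sketched with a simplified, invented two-table layout (the real HITRANonline schema is more extensive): transitions joined to molecules, filtered by a wavenumber window, strongest lines first.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE molecule (id INTEGER PRIMARY KEY, formula TEXT);
CREATE TABLE transition (molecule_id INTEGER REFERENCES molecule(id),
                         nu REAL,        -- wavenumber, cm-1
                         sw REAL);       -- line intensity
""")
con.execute("INSERT INTO molecule VALUES (1, 'H2O')")
con.executemany("INSERT INTO transition VALUES (?, ?, ?)",
                [(1, 1554.35, 1.2e-21), (1, 2301.10, 4.0e-22)])

# The kind of query a graphical front-end would issue: lines of one
# molecule within a wavenumber window, strongest first.
rows = con.execute("""
SELECT m.formula, t.nu, t.sw
FROM transition t JOIN molecule m ON m.id = t.molecule_id
WHERE m.formula = 'H2O' AND t.nu BETWEEN 1000 AND 2000
ORDER BY t.sw DESC
""").fetchall()
print(rows)
```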
AgeFactDB--the JenAge Ageing Factor Database--towards data integration in ageing research.
Hühne, Rolf; Thalheim, Torsten; Sühnel, Jürgen
2014-01-01
AgeFactDB (http://agefactdb.jenage.de) is a database aimed at the collection and integration of ageing phenotype data including lifespan information. Ageing factors are considered to be genes, chemical compounds or other factors such as dietary restriction, whose action results in a changed lifespan or another ageing phenotype. Any information related to the effects of ageing factors is called an observation and is presented on observation pages. To provide concise access to the complete information for a particular ageing factor, corresponding observations are also summarized on ageing factor pages. In a first step, ageing-related data were primarily taken from existing databases such as the Ageing Gene Database--GenAge, the Lifespan Observations Database and the Dietary Restriction Gene Database--GenDR. In addition, we have started to include new ageing-related information. Based on homology data taken from the HomoloGene Database, AgeFactDB also provides observation and ageing factor pages of genes that are homologous to known ageing-related genes. These homologues are considered as candidate or putative ageing-related genes. AgeFactDB offers a variety of search and browse options, and also allows the download of ageing factor or observation lists in TSV, CSV and XML formats.
Karst database development in Minnesota: Design and data assembly
Gao, Y.; Alexander, E.C.; Tipping, R.G.
2005-01-01
The Karst Feature Database (KFD) of Minnesota is a relational GIS-based Database Management System (DBMS). Previous karst feature datasets used inconsistent attributes to describe karst features in different areas of Minnesota. Existing metadata were modified and standardized to represent comprehensive metadata for all the karst features in Minnesota. Microsoft Access 2000 and ArcView 3.2 were used to develop this working database. Existing county and sub-county karst feature datasets have been assembled into the KFD, which is capable of visualizing and analyzing the entire data set. By November 17, 2002, 11,682 karst features were stored in the KFD of Minnesota. Data tables are stored in a Microsoft Access 2000 DBMS and linked to corresponding ArcView applications. The current KFD of Minnesota has been moved from a Windows NT server to a Windows 2000 Citrix server accessible to researchers and planners through networked interfaces.
Frankewitsch, T; Prokosch, H U
2000-01-01
In information technology environments, knowledge is bound to structured vocabularies. Medical data dictionaries are necessary for uniquely describing findings such as diagnoses, procedures or functions. We therefore decided to locally install a version of the Unified Medical Language System (UMLS) of the U.S. National Library of Medicine as a repository for defining entries of a medical multimedia database. Because of the requirement to extend the vocabulary with new concepts and with relations between existing concepts, a graphical tool for appending new items to the database has been developed. Although the database is an instance of a semantic network, focusing on a single entry offers the opportunity to reduce the net to a tree within this local detail. Based on graph theory, there are definitions of nodes of concepts and nodes of knowledge. The UMLS additionally offers the specification of sub-relations, which can be represented too. Using this view, it is possible to manage these 1:n relations in a simple tree view. Against this background, an explorer-like graphical user interface has been realised to add new concepts and define new relationships between those and existing entries, adapting the UMLS for specific purposes such as describing medical multimedia objects.
Scale out databases for CERN use cases
NASA Astrophysics Data System (ADS)
Baranowski, Zbigniew; Grzybek, Maciej; Canali, Luca; Lanza Garcia, Daniel; Surdy, Kacper
2015-12-01
Data generation rates are expected to grow very fast for some database workloads going into LHC run 2 and beyond. In particular this is expected for data coming from controls, logging and monitoring systems. Storing, administering and accessing big data sets in a relational database system can quickly become a very hard technical challenge, as the size of the active data set and the number of concurrent users increase. Scale-out database technologies are a rapidly developing set of solutions for deploying and managing very large data warehouses on commodity hardware and with open source software. In this paper we will describe the architecture and tests on database systems based on Hadoop and the Cloudera Impala engine. We will discuss the results of our tests, including tests of data loading and integration with existing data sources and in particular with relational databases. We will report on query performance tests done with various data sets of interest at CERN, notably data from the accelerator log database.
Zhang, RuiJie; Li, Xia; Jiang, YongShuai; Liu, GuiYou; Li, ChuanXing; Zhang, Fan; Xiao, Yun; Gong, BinSheng
2009-02-01
High-throughput single nucleotide polymorphism detection technology and existing knowledge provide strong support for mining disease-related haplotypes and genes. In this study, first, we apply four kinds of haplotype identification methods (Confidence Intervals, Four Gamete Tests, Solid Spine of LD and a fusing method of haplotype blocks) to high-throughput SNP genotype data to identify blocks, then use cluster analysis to verify the effectiveness of the four methods, and select the alcoholism-related SNP haplotypes through risk analysis. Second, we establish a mapping from haplotypes to alcoholism-related genes. Third, we query the NCBI SNP and gene databases to locate the blocks and identify the candidate genes. Finally, we annotate gene function using the KEGG, BioCarta, and GO databases. We find 159 haplotype blocks on chromosomes 1-22 that are most likely related to alcoholism, including 227 haplotypes, of which 102 SNP haplotypes may increase the risk of alcoholism. We obtain 121 alcoholism-related genes and verify their reliability by functional annotation. In summary, we not only handle the SNP data easily, but also locate the disease-related genes precisely by combining our novel strategies for mining alcoholism-related haplotypes and genes with the existing knowledge framework.
This funding opportunity announcement (FOA) encourages applications that propose to conduct secondary data analysis and integration of existing datasets and database resources, with the ultimate aim to elucidate the genetic architecture of cancer risk and related outcomes. The goal of this initiative is to address key scientific questions relevant to cancer epidemiology by supporting the analysis of existing genetic or genomic datasets, possibly in combination with environmental, outcomes, behavioral, lifestyle, and molecular profiles data.
A literature search tool for intelligent extraction of disease-associated genes.
Jung, Jae-Yoon; DeLuca, Todd F; Nelson, Tristan H; Wall, Dennis P
2014-01-01
To extract disorder-associated genes from the scientific literature in PubMed with greater sensitivity for literature-based support than existing methods. We developed a PubMed query to retrieve disorder-related, original research articles. Then we applied a rule-based text-mining algorithm with keyword matching to extract target disorders, genes with significant results, and the type of study described by the article. We compared our resulting candidate disorder genes and supporting references with existing databases. We demonstrated that our candidate gene set covers nearly all genes in manually curated databases, and that the references supporting the disorder-gene link are more extensive and accurate than other general purpose gene-to-disorder association databases. We implemented a novel publication search tool to find target articles, specifically focused on links between disorders and genotypes. Through comparison against gold-standard manually updated gene-disorder databases and comparison with automated databases of similar functionality we show that our tool can search through the entirety of PubMed to extract the main gene findings for human diseases rapidly and accurately.
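A toy version of the rule-based matching step: a disorder lexicon, a gene-symbol pattern, and significance cues are applied to an abstract's text. The patterns and stop list are illustrative, not the published rule set.

```python
import re

abstract = ("We genotyped 200 families and found that variants in SHANK3 "
            "were significantly associated with autism (p < 0.001).")

# Illustrative rules: a disorder lexicon, a gene-symbol pattern, and
# keyword cues indicating a statistically significant result.
disorders = {"autism", "schizophrenia", "epilepsy"}
gene_pattern = re.compile(r"\b[A-Z][A-Z0-9]{2,5}\b")
significance_cues = ("significantly associated", "p <", "p<")

found_disorders = {d for d in disorders if d in abstract.lower()}
genes = set(gene_pattern.findall(abstract)) - {"DNA", "RNA"}  # crude stop list
significant = any(cue in abstract for cue in significance_cues)

if found_disorders and genes and significant:
    for d in found_disorders:
        for g in genes:
            print(f"candidate link: {g} -> {d}")
```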
[A relational database to store Poison Centers calls].
Barelli, Alessandro; Biondi, Immacolata; Tafani, Chiara; Pellegrini, Aristide; Soave, Maurizio; Gaspari, Rita; Annetta, Maria Giuseppina
2006-01-01
Italian Poison Centers answer approximately 100,000 calls per year. Potentially, this activity is a huge source of data for toxicovigilance and for syndromic surveillance. During the last decade, surveillance systems for early detection of outbreaks have drawn the attention of public health institutions due to the threat of terrorism and high-profile disease outbreaks. Poisoning surveillance needs the ongoing, systematic collection, analysis, interpretation, and dissemination of harmonised data about poisonings from all Poison Centers for use in public health action to reduce morbidity and mortality and to improve health. The entity-relationship model for a Poison Center relational database is extremely complex and has not been studied in detail. For this reason, data collection is not harmonised among Italian Poison Centers. Entities are recognizable concepts, either concrete or abstract, such as patients and poisons, or events which have relevance to the database, such as calls. The connectivity and cardinality of the relationships are complex as well. A one-to-many relationship exists between calls and patients: for one instance of the entity calls, there are zero, one, or many instances of the entity patients. At the same time, a one-to-many relationship exists between patients and poisons: for one instance of the entity patients, there are zero, one, or many instances of the entity poisons. This paper presents a relational model for a Poison Center database which allows the harmonised collection of data on Poison Center calls.
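The two one-to-many relationships described above map directly onto foreign keys; a minimal sqlite3 sketch follows, with columns invented for illustration.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE call    (id INTEGER PRIMARY KEY, received_at TEXT);
CREATE TABLE patient (id INTEGER PRIMARY KEY,
                      call_id INTEGER REFERENCES call(id),  -- calls 1:N patients
                      age INTEGER);
CREATE TABLE poison  (id INTEGER PRIMARY KEY,
                      patient_id INTEGER REFERENCES patient(id), -- patients 1:N poisons
                      substance TEXT, dose TEXT);
""")
con.execute("INSERT INTO call VALUES (1, '2006-05-01T14:32')")
con.executemany("INSERT INTO patient VALUES (?, ?, ?)", [(1, 1, 4), (2, 1, 6)])
con.executemany("INSERT INTO poison VALUES (?, ?, ?, ?)",
                [(1, 1, 'paracetamol', '500 mg'), (2, 2, 'bleach', 'unknown')])

# One call, its patients, and each patient's poisons.
for row in con.execute("""
SELECT c.id, p.age, x.substance
FROM call c JOIN patient p ON p.call_id = c.id
            JOIN poison  x ON x.patient_id = p.id
"""):
    print(row)
```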
NASA Astrophysics Data System (ADS)
Ehlmann, Bryon K.
Current scientific experiments are often characterized by massive amounts of very complex data and the need for complex data analysis software. Object-oriented database (OODB) systems have the potential of improving the description of the structure and semantics of this data and of integrating the analysis software with the data. This dissertation results from research to enhance OODB functionality and methodology to support scientific databases (SDBs) and, more specifically, to support a nuclear physics experiments database for the Continuous Electron Beam Accelerator Facility (CEBAF). This research to date has identified a number of problems related to the practical application of OODB technology to the conceptual design of the CEBAF experiments database and other SDBs: the lack of a generally accepted OODB design methodology, the lack of a standard OODB model, the lack of a clear conceptual level in existing OODB models, and the limited support in existing OODB systems for many common object relationships inherent in SDBs. To address these problems, the dissertation describes an Object-Relationship Diagram (ORD) and an Object-oriented Database Definition Language (ODDL) that provide tools that allow SDB design and development to proceed systematically and independently of existing OODB systems. These tools define multi-level, conceptual data models for SDB design, which incorporate a simple notation for describing common types of relationships that occur in SDBs. ODDL allows these relationships and other desirable SDB capabilities to be supported by an extended OODB system. A conceptual model of the CEBAF experiments database is presented in terms of ORDs and the ODDL to demonstrate their functionality and use and provide a foundation for future development of experimental nuclear physics software using an OODB approach.
Construction of a Linux based chemical and biological information system.
Molnár, László; Vágó, István; Fehér, András
2003-01-01
A chemical and biological information system with a Web-based, easy-to-use interface and corresponding databases has been developed. The constructed system incorporates all chemical, numerical and textual data related to the chemical compounds, including numerical biological screen results. Users can search the database by traditional textual/numerical and/or substructure or similarity queries through the web interface. To build our chemical database management system, we utilized existing IT components such as ORACLE or Tripos SYBYL for database management and the Zope application server for the web interface. We chose Linux as the main platform; however, almost every component can be used under various operating systems.
NASA Astrophysics Data System (ADS)
Shao, Weber; Kupelian, Patrick A.; Wang, Jason; Low, Daniel A.; Ruan, Dan
2014-03-01
We devise a paradigm for representing the DICOM-RT structure sets in a database management system, in such a way that secondary calculations of geometric information can be performed quickly from the existing contour definitions. The implementation of this paradigm is achieved using the PostgreSQL database system and the PostGIS extension, a geographic information system commonly used for encoding geographical map data. The proposed paradigm eliminates the overhead of retrieving large data records from the database, as well as the need to implement various numerical and data parsing routines, when additional information related to the geometry of the anatomy is desired.
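As a rough illustration of the idea, once each axial contour is stored as a PostGIS polygon, a secondary geometric quantity such as approximate structure volume can be computed in a single in-database query. The sketch below assumes a hypothetical `contour` table and a 2.5 mm slice spacing; it is not the paper's actual schema, and ST_Area is the standard PostGIS planar-area function.

```python
import psycopg2

# Requires a PostGIS-enabled PostgreSQL server; "rtplans" is a hypothetical database.
con = psycopg2.connect("dbname=rtplans")
cur = con.cursor()
cur.execute("""
    SELECT roi_id,
           SUM(ST_Area(contour_geom) * 2.5) AS approx_volume_mm3  -- area x slice spacing
    FROM contour          -- one row per axial contour polygon, coordinates in mm
    GROUP BY roi_id;
""")
for roi_id, volume in cur.fetchall():
    print(roi_id, volume)
```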
A database for the analysis of immunity genes in Drosophila: PADMA database.
Lee, Mark J; Mondal, Ariful; Small, Chiyedza; Paddibhatla, Indira; Kawaguchi, Akira; Govind, Shubha
2011-01-01
While microarray experiments generate voluminous data, discerning trends that support an existing or alternative paradigm is challenging. To synergize hypothesis building and testing, we designed the Pathogen Associated Drosophila MicroArray (PADMA) database for easy retrieval and comparison of microarray results from immunity-related experiments (www.padmadatabase.org). PADMA also allows biologists to upload their microarray results and compare them with datasets housed within PADMA. We tested PADMA using a preliminary dataset from Ganaspis xanthopoda-infected fly larvae, and uncovered unexpected trends in gene expression, reshaping our hypothesis. Thus, the PADMA database will be a useful resource for fly researchers to evaluate, revise, and refine hypotheses.
Materials engineering data base
NASA Technical Reports Server (NTRS)
1995-01-01
The various types of materials-related data that exist at the NASA Marshall Space Flight Center and have been compiled into databases, which can be accessed by all the NASA centers and by other contractors, are presented.
Initiation of a Database of CEUS Ground Motions for NGA East
NASA Astrophysics Data System (ADS)
Cramer, C. H.
2007-12-01
The Nuclear Regulatory Commission has funded the first stage of development of a database of central and eastern US (CEUS) broadband and accelerograph records, along the lines of the existing Next Generation Attenuation (NGA) database for active tectonic areas. This database will form the foundation of an NGA East project for the development of CEUS ground-motion prediction equations that include the effects of soils. This initial effort covers the development of a database design and the beginning of data collection to populate the database. It also includes some processing for important source parameters (Brune corner frequency and stress drop) and site parameters (kappa, Vs30). Besides collecting appropriate earthquake recordings and information, existing information about site conditions at recording sites will also be gathered, including geology and geotechnical information. The long-range goal of the database development is to complete the database and make it available in 2010. The database design is centered on CEUS ground motion information needs but is built on the Pacific Earthquake Engineering Research Center's (PEER) NGA experience. Documentation from the PEER NGA website was reviewed and relevant fields incorporated into the CEUS database design. CEUS database tables include ones for earthquake, station, component, record, and references. As was done for NGA, a CEUS ground-motion flat file of key information will be extracted from the CEUS database for use in attenuation relation development. A short report on the CEUS database and several initial design-definition files are available at https://umdrive.memphis.edu:443/xythoswfs/webui/_xy-7843974_docstore1. Comments and suggestions on the database design can be sent to the author. More details will be presented in a poster at the meeting.
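The flat-file extraction step mentioned above amounts to denormalizing key source, site, and record fields into one row per ground-motion record. The sketch below is an assumption-based illustration (table and column names are invented from the tables listed in the abstract, not the project's actual schema).

```python
import sqlite3

# Toy versions of three of the tables named in the abstract.
con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE earthquake (eq_id INTEGER PRIMARY KEY, magnitude REAL, stress_drop REAL);
CREATE TABLE station    (station_id INTEGER PRIMARY KEY, vs30 REAL, kappa REAL);
CREATE TABLE record     (record_id INTEGER PRIMARY KEY,
                         eq_id INTEGER REFERENCES earthquake(eq_id),
                         station_id INTEGER REFERENCES station(station_id),
                         epicentral_dist_km REAL);
""")

# Flat-file extraction: one denormalized row per record for regression work.
flat_rows = con.execute("""
    SELECT e.magnitude, e.stress_drop, s.vs30, s.kappa, r.epicentral_dist_km
    FROM record r
    JOIN earthquake e ON e.eq_id = r.eq_id
    JOIN station    s ON s.station_id = r.station_id
""").fetchall()
```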
Evolution of the use of relational and NoSQL databases in the ATLAS experiment
NASA Astrophysics Data System (ADS)
Barberis, D.
2016-09-01
The ATLAS experiment used for many years a large database infrastructure based on Oracle to store several different types of non-event data: time-dependent detector configuration and conditions data, calibrations and alignments, configurations of Grid sites, catalogues for data management tools, job records for distributed workload management tools, and run and event metadata. The rapid development of "NoSQL" databases (structured storage services) in the last five years has allowed an extended and complementary usage of traditional relational databases and new structured storage tools, in order to improve the performance of existing applications and to extend their functionalities using the possibilities offered by modern storage systems. The trend is towards using the best tool for each kind of data: separating, for example, intrinsically relational metadata from payload storage, and frequently updated records that benefit from transactions from archived information. Access to all components has to be orchestrated by specialised services that run on front-end machines and shield the user from the complexity of the data storage infrastructure. This paper describes this technology evolution in the ATLAS database infrastructure and presents a few examples of large database applications that benefit from it.
Development of a Dependency Theory Toolbox for Database Design.
1987-12-01
Much of the theory needed to design and study relational databases exists in the form of published algorithms and theorems. However, hand simulating these algorithms can be a tedious and error-prone chore. Therefore, a toolbox of dependency theory algorithms was developed to support relational database design.
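One classic dependency-theory algorithm of the kind such a toolbox automates is the closure of an attribute set under a set of functional dependencies, used for example to test whether an attribute set is a key. A minimal sketch:

```python
def closure(attrs, fds):
    """attrs: set of attributes; fds: list of (lhs, rhs) pairs of attribute sets."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            # If the left side is already derivable, add the right side.
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

# R(A, B, C, D) with A -> B and B -> C: the closure of {A} is {A, B, C},
# which does not contain D, so A alone is not a key of R.
fds = [({"A"}, {"B"}), ({"B"}, {"C"})]
print(closure({"A"}, fds))  # {'A', 'B', 'C'}
```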
An SQL query generator for CLIPS
NASA Technical Reports Server (NTRS)
Snyder, James; Chirica, Laurian
1990-01-01
As expert systems become more widely used, their access to large amounts of external information becomes increasingly important. This information exists in several forms, such as statistical and tabular data, knowledge gained by experts, and large databases of information maintained by companies. Because many expert systems, including CLIPS, do not provide access to this external information, much of the usefulness of expert systems is left untapped. The scope of this paper is to describe a database extension for the CLIPS expert system shell. The current industry standard database language is SQL. Due to SQL standardization, large amounts of information stored on various computers, potentially at different locations, will be more easily accessible. Expert systems should be able to directly access these existing databases rather than requiring information to be re-entered into the expert system environment. The ORACLE relational database management system (RDBMS) was used to provide a database connection within the CLIPS environment. To facilitate relational database access, a query generation system was developed as a CLIPS user function. Queries are entered in a CLIPS-like syntax and are passed to the query generator, which constructs an SQL query and submits it for execution to the ORACLE RDBMS. The query results are asserted as CLIPS facts. The query generator was developed primarily for use within the ICADS project (Intelligent Computer Aided Design System) currently being developed by the CAD Research Unit at California Polytechnic State University (Cal Poly). In ICADS, there are several parallel or distributed expert systems accessing a common knowledge base of facts. Each expert system has a narrow domain of interest and therefore needs only certain portions of the information. The query generator provides a common method of accessing this information and allows the expert system to specify what data is needed without specifying how to retrieve it.
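A toy illustration of the query-generation idea follows: an attribute/value specification is turned into an SQL statement, and each result row is rendered as an assert command for the expert system. The input and fact syntax here are hypothetical stand-ins, not the actual ICADS user-function syntax.

```python
def generate_sql(table, wanted, conditions):
    """Build a SELECT from a declarative spec of columns and equality conditions."""
    where = " AND ".join(f"{col} = '{val}'" for col, val in conditions.items())
    sql = f"SELECT {', '.join(wanted)} FROM {table}"
    return sql + (f" WHERE {where}" if where else "")

def rows_to_facts(relation, wanted, rows):
    """Render result rows as CLIPS-style assert commands (illustrative format)."""
    return [f"(assert ({relation} " +
            " ".join(f"({c} {v})" for c, v in zip(wanted, row)) + "))"
            for row in rows]

sql = generate_sql("rooms", ["name", "area"], {"floor": "2"})
# -> SELECT name, area FROM rooms WHERE floor = '2'
facts = rows_to_facts("room", ["name", "area"], [("office-201", 12.5)])
# -> ["(assert (room (name office-201) (area 12.5)))"]
```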
NASA Technical Reports Server (NTRS)
Shearrow, Charles A.
1999-01-01
One of the identified goals of EM3 is to implement virtual manufacturing by the time the year 2000 has ended. To realize this goal of a true virtual manufacturing enterprise, the initial development of a machinability database and the infrastructure must be completed. This will consist of the containment of the existing EM-NET problems and developing machine, tooling, and common materials databases. To integrate the virtual manufacturing enterprise with normal day-to-day operations, a parallel virtual manufacturing machinability database, a virtual manufacturing database, a virtual manufacturing paradigm, an implementation/integration procedure, and testable verification models must be constructed. Common and virtual machinability databases will include the four distinct areas of machine tools, available tooling, common machine tool loads, and a materials database. The machine tools database will include the machine envelope, special machine attachments, tooling capacity, location within NASA-JSC or with a contractor, and availability/scheduling. The tooling database will include available standard tooling, custom in-house tooling, tool properties, and availability. The common materials database will include materials thickness ranges, strengths, types, and their availability. The virtual manufacturing databases will consist of virtual machines and virtual tooling directly related to the common and machinability databases. The items to be completed are the design and construction of the machinability databases, the virtual manufacturing paradigm for NASA-JSC, an implementation timeline, a VNC model of one bridge mill, and troubleshooting of existing software and hardware problems with EN4NET. The final step of this virtual manufacturing project will be to integrate other production sites into the databases, bringing JSC's EM3 into a position of becoming a clearinghouse for NASA's digital manufacturing needs and creating a true virtual manufacturing enterprise.
Kiener, Joos
2013-12-11
Research in organic chemistry generates samples of novel chemicals together with their properties and other related data. The involved scientists must be able to store this data and search it by chemical structure. There are commercial solutions for common needs like chemical registration systems or electronic lab notebooks. However, for specific requirements of in-house databases and processes no such solutions exist. Another issue is that commercial solutions carry the risk of vendor lock-in and may require an expensive license for a proprietary relational database management system. To speed up and simplify development for applications that require chemical structure search capabilities, I have developed Molecule Database Framework. The framework abstracts the storing and searching of chemical structures into method calls. Therefore software developers do not require extensive knowledge about chemistry and the underlying database cartridge. This decreases application development time. Molecule Database Framework is written in Java and I created it by integrating existing free and open-source tools and frameworks. The core functionality includes support for multi-component compounds (mixtures), import and export of SD-files, and optional security (authorization). For chemical structure searching, Molecule Database Framework leverages the capabilities of the Bingo Cartridge for PostgreSQL and provides type-safe searching, caching, transactions and optional method-level security. Furthermore, the design of the entity classes and the reasoning behind it are explained. By means of a simple web application I describe how the framework can be used. I then benchmarked this example application to create some basic performance expectations for chemical structure searches and import and export of SD-files. By using a simple web application it was shown that Molecule Database Framework successfully abstracts chemical structure searches and SD-file import and export to simple method calls. The framework offers good search performance on a standard laptop without any database tuning, partly because chemical structure searches are paged and cached. Molecule Database Framework is available for download on the project's web page on bitbucket: https://bitbucket.org/kienerj/moleculedatabaseframework. PMID:24325762
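For context, the kind of raw query the framework abstracts away is a cartridge-level substructure search. The sketch below follows the operator/cast style documented for the Bingo cartridge on PostgreSQL, but the connection string and table are assumptions, so treat it as an approximation rather than the framework's internals.

```python
import psycopg2

con = psycopg2.connect("dbname=chemdb")  # hypothetical registration database
cur = con.cursor()
cur.execute("""
    SELECT id, smiles
    FROM molecules
    WHERE smiles @ ('c1ccccc1', '')::bingo.sub   -- benzene as substructure
    LIMIT 50;
""")
hits = cur.fetchall()
```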
Information integration for a sky survey by data warehousing
NASA Astrophysics Data System (ADS)
Luo, A.; Zhang, Y.; Zhao, Y.
The virtualization service of the data system for the sky survey LAMOST is very important for astronomers. The service needs to integrate information from data collections, catalogs and references, and to support simple federation of a set of distributed files and associated metadata. Data warehousing has been in existence for several years and has demonstrated superiority over traditional relational database management systems by providing novel indexing schemes that support efficient on-line analytical processing (OLAP) of large databases. Relational database systems such as Oracle now support the warehouse capability, including extensions to the SQL language to support OLAP operations, and a number of metadata management tools have been created. Applying data warehousing to the information integration of LAMOST aims to effectively provide data and knowledge on-line.
A Bioinformatics Workflow for Variant Peptide Detection in Shotgun Proteomics*
Li, Jing; Su, Zengliu; Ma, Ze-Qiang; Slebos, Robbert J. C.; Halvey, Patrick; Tabb, David L.; Liebler, Daniel C.; Pao, William; Zhang, Bing
2011-01-01
Shotgun proteomics data analysis usually relies on database search. However, commonly used protein sequence databases do not contain information on protein variants and thus prevent variant peptides and proteins from being identified. Including known coding variations in protein sequence databases could help alleviate this problem. Based on our recently published human Cancer Proteome Variation Database, we have created a protein sequence database that comprehensively annotates thousands of cancer-related coding variants collected in the Cancer Proteome Variation Database as well as noncancer-specific ones from the Single Nucleotide Polymorphism Database (dbSNP). Using this database, we then developed a data analysis workflow for variant peptide identification in shotgun proteomics. The high risk of false positive variant identifications was addressed by a modified false discovery rate estimation method. Analysis of colorectal cancer cell lines SW480, RKO, and HCT-116 revealed a total of 81 peptides that contain either noncancer-specific or cancer-related variations. Twenty-three out of 26 variants randomly selected from the 81 were confirmed by genomic sequencing. We further applied the workflow on data sets from three individual colorectal tumor specimens. A total of 204 distinct variant peptides were detected, and five carried known cancer-related mutations. Each individual showed a specific pattern of cancer-related mutations, suggesting potential use of this type of information for personalized medicine. Compatibility of the workflow has been tested with four popular database search engines including Sequest, Mascot, X!Tandem, and MyriMatch. In summary, we have developed a workflow that effectively uses existing genomic data to enable variant peptide detection in proteomics. PMID:21389108
REFOLDdb: a new and sustainable gateway to experimental protocols for protein refolding.
Mizutani, Hisashi; Sugawara, Hideaki; Buckle, Ashley M; Sangawa, Takeshi; Miyazono, Ken-Ichi; Ohtsuka, Jun; Nagata, Koji; Shojima, Tomoki; Nosaki, Shohei; Xu, Yuqun; Wang, Delong; Hu, Xiao; Tanokura, Masaru; Yura, Kei
2017-04-24
More than 7000 papers related to "protein refolding" have been published to date, with approximately 300 reports each year during the last decade. Whilst some of these papers provide experimental protocols for protein refolding, a survey in the structural life science communities showed a necessity for a comprehensive database of refolding techniques. We therefore have developed a new resource, "REFOLDdb", that collects refolding techniques into a single, searchable repository to help researchers develop refolding protocols for proteins of interest. We based our resource on the existing REFOLD database, which has not been updated since 2009. We redesigned the data format to be more concise, allowing consistent representations among data entries compared with the original REFOLD database. The remodeled data architecture enhances search efficiency and improves the sustainability of the database. After an exhaustive literature search we added experimental refolding protocols from reports published from 2009 to early 2017. In addition to this new data, we fully converted and integrated existing REFOLD data into our new resource. REFOLDdb contains 1877 entries as of March 17, 2017, and is freely available at http://p4d-info.nig.ac.jp/refolddb/. REFOLDdb is a unique database for the life sciences research community, providing annotated information for designing new refolding protocols and customizing existing methodologies. We envisage that this resource will find wide utility across broad disciplines that rely on the production of pure, active, recombinant proteins. Furthermore, the database also provides a useful overview of the recent trends and statistics in refolding technology development.
A UML Profile for Developing Databases that Conform to the Third Manifesto
NASA Astrophysics Data System (ADS)
Eessaar, Erki
The Third Manifesto (TTM) presents the principles of a relational database language that is free of the deficiencies and ambiguities of SQL. There are database management systems that are created according to TTM. Developers need tools that support the development of databases using these database management systems. UML is a widely used visual modeling language. It provides a built-in extension mechanism that makes it possible to extend UML by creating profiles. In this paper, we introduce a UML profile for designing databases that correspond to the rules of TTM. We created the first version of the profile by translating existing profiles for SQL database design. After that, we extended and improved the profile. We implemented the profile using the UML CASE system StarUML™. We present an example of using the new profile. In addition, we describe problems that occurred during the profile development.
NASA Technical Reports Server (NTRS)
Barbre, Robert E., Jr.
2012-01-01
This paper presents the process used by the Marshall Space Flight Center Natural Environments Branch (EV44) to quality control (QC) data from the Kennedy Space Center's 50-MHz Doppler Radar Wind Profiler (DRWP) for use in vehicle wind loads and steering commands. The database has been built to mitigate limitations of the currently archived databases from weather balloons. The DRWP database contains wind measurements from approximately 2.7-18.6 km altitude at roughly five minute intervals for the August 1997 to December 2009 period of record, and the extensive QC process was designed to remove spurious data caused by various atmospheric and non-atmospheric artifacts. The QC process is largely based on DRWP literature, but two new algorithms have been developed to remove data contaminated by convection and by excessive first-guess propagations from the Median Filter First Guess Algorithm. In addition to describing the automated and manual QC process in detail, this paper describes the extent of the data retained. Roughly 58% of all possible wind observations exist in the database, with approximately 100 times as many complete profile sets existing relative to the EV44 balloon databases. This increased sample of near-continuous wind profile measurements may help increase launch availability by reducing the uncertainty of wind changes during launch countdown.
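As a generic illustration of one automated QC idea used with profiler data (not EV44's actual algorithm), samples can be flagged when they deviate too far from a running median; the window length and threshold below are arbitrary placeholders.

```python
import statistics

def flag_outliers(series, window=5, max_dev=10.0):
    """Flag samples deviating from the running median by more than max_dev."""
    flags = []
    half = window // 2
    for i, v in enumerate(series):
        neighborhood = series[max(0, i - half): i + half + 1]
        flags.append(abs(v - statistics.median(neighborhood)) > max_dev)
    return flags

winds = [12.1, 12.4, 11.9, 55.0, 12.3, 12.0]   # m/s; 55.0 is a spurious gate
print(flag_outliers(winds))  # [False, False, False, True, False, False]
```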
Peng, Mingkai; Southern, Danielle A; Williamson, Tyler; Quan, Hude
2017-12-01
This study examined the coding validity of hypertension, diabetes, obesity and depression in relation to the presence of their co-existing conditions, death status and the number of diagnosis codes in the hospital discharge abstract database. We randomly selected 4007 discharge abstract database records from four teaching hospitals in Alberta, Canada and reviewed their charts to extract 31 conditions listed in the Charlson and Elixhauser comorbidity indices. Conditions associated with the four study conditions were identified through multivariable logistic regression. Coding validity (i.e. sensitivity, positive predictive value) of the four conditions was related to the presence of their associated conditions. Sensitivity increased with an increasing number of diagnosis codes. The impact of death status on coding validity was minimal. The coding validity of a condition is closely related to its clinical importance and to the complexity of the patients' case mix. We recommend mandatory coding of certain secondary diagnoses to meet the needs of health research based on administrative health data.
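For reference, the two validity measures reported here are computed from a chart-review gold standard as follows; the counts in the example are invented for illustration.

```python
def sensitivity_ppv(tp, fp, fn):
    """tp: coded and present in chart; fp: coded but absent; fn: present but not coded."""
    sensitivity = tp / (tp + fn)                 # share of true cases that were coded
    ppv = tp / (tp + fp)                         # share of coded cases that were true
    return sensitivity, ppv

# e.g. 320 true hypertension cases coded, 80 missed, 40 coded without chart evidence:
sens, ppv = sensitivity_ppv(tp=320, fp=40, fn=80)
print(f"sensitivity={sens:.2f}, PPV={ppv:.2f}")  # sensitivity=0.80, PPV=0.89
```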
Legacy2Drupal: Conversion of an existing relational oceanographic database to a Drupal 7 CMS
NASA Astrophysics Data System (ADS)
Work, T. T.; Maffei, A. R.; Chandler, C. L.; Groman, R. C.
2011-12-01
Content Management Systems (CMSs) such as Drupal provide powerful features that can be of use to oceanographic (and other geo-science) data managers. However, in many instances, geo-science data management offices have already designed and implemented customized schemas for their metadata. The NSF-funded Biological and Chemical Oceanography Data Management Office (BCO-DMO) has ported an existing relational database containing oceanographic metadata, along with an existing interface coded in Cold Fusion middleware, to a Drupal 7 Content Management System. This is an update on an effort described as a proof-of-concept in poster IN21B-1051, presented at AGU 2009. The BCO-DMO project has translated all the existing database tables, input forms, website reports, and other features present in the existing system into Drupal CMS features. The replacement features are made possible by the use of Drupal content types, CCK node-reference fields, a custom theme, and a number of other supporting modules. This presentation describes the process used to migrate content in the original BCO-DMO metadata database to Drupal 7, some problems encountered during migration, and the modules used to migrate the content successfully. Strategic use of Drupal 7 CMS features that enable three separate but complementary interfaces to provide access to oceanographic research metadata will also be covered: 1) a Drupal 7-powered user front-end; 2) REST-ful JSON web services (providing a MapServer interface to the metadata and data); and 3) a SPARQL interface to a semantic representation of the repository metadata (feeding a new faceted search capability currently under development). The existing BCO-DMO ontology, developed in collaboration with Rensselaer Polytechnic Institute's Tetherless World Constellation, makes strategic use of pre-existing ontologies and will be used to drive the semantically-enabled faceted search capabilities planned for the site. At this point, the use of semantic technologies included in the Drupal 7 core is anticipated. Using a public-domain CMS as opposed to proprietary middleware, and taking advantage of the many features of Drupal 7 that are designed to support semantically-enabled interfaces, will help prepare the BCO-DMO and other science data repositories for interoperability between systems that serve ecosystem research data.
Long Term Pavement Performance (LTPP) climatic database revision and expansion
DOT National Transportation Integrated Search
2014-03-07
In 2012, the Federal Highway Administration (FHWA) established the Health in Transportation Working Group to examine the agency's existing policies and programs and their impacts on health-related issues such as air quality, active transportation, ...
Archetype relational mapping - a practical openEHR persistence solution.
Wang, Li; Min, Lingtong; Wang, Rui; Lu, Xudong; Duan, Huilong
2015-11-05
One of the primary obstacles to the widespread adoption of openEHR methodology is the lack of practical persistence solutions for future-proof electronic health record (EHR) systems as described by the openEHR specifications. This paper presents an archetype relational mapping (ARM) persistence solution for the archetype-based EHR systems to support healthcare delivery in the clinical environment. First, the data requirements of the EHR systems are analysed and organized into archetype-friendly concepts. The Clinical Knowledge Manager (CKM) is queried for matching archetypes; when necessary, new archetypes are developed to reflect concepts that are not encompassed by existing archetypes. Next, a template is designed for each archetype to apply constraints related to the local EHR context. Finally, a set of rules is designed to map the archetypes to data tables and provide data persistence based on the relational database. A comparison study was conducted to investigate the differences among the conventional database of an EHR system from a tertiary Class A hospital in China, the generated ARM database, and the Node + Path database. Five data-retrieving tests were designed based on clinical workflow to retrieve exams and laboratory tests. Additionally, two patient-searching tests were designed to identify patients who satisfy certain criteria. The ARM database achieved better performance than the conventional database in three of the five data-retrieving tests, but was less efficient in the remaining two tests. The time difference of query executions conducted by the ARM database and the conventional database is less than 130 %. The ARM database was approximately 6-50 times more efficient than the conventional database in the patient-searching tests, while the Node + Path database requires far more time than the other two databases to execute both the data-retrieving and the patient-searching tests. The ARM approach is capable of generating relational databases using archetypes and templates for archetype-based EHR systems, thus successfully adapting to changes in data requirements. ARM performance is similar to that of conventionally-designed EHR systems, and can be applied in a practical clinical environment. System components such as ARM can greatly facilitate the adoption of openEHR architecture within EHR systems.
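To make the mapping idea concrete, here is a minimal sketch of generating one relational table per archetype/template, with one column per leaf node. The archetype structure shown is simplified and hypothetical; the actual ARM rules described in the paper are considerably richer.

```python
import sqlite3

# A drastically simplified stand-in for an openEHR archetype/template:
# a name plus typed leaf nodes.
blood_pressure_archetype = {
    "name": "blood_pressure",
    "leaves": {"systolic": "REAL", "diastolic": "REAL", "position": "TEXT"},
}

def arm_ddl(archetype):
    """Map an archetype to a CREATE TABLE statement: one column per leaf node."""
    cols = ", ".join(f"{path} {sqltype}"
                     for path, sqltype in archetype["leaves"].items())
    return (f"CREATE TABLE {archetype['name']} "
            f"(ehr_id INTEGER, recorded_at TEXT, {cols})")

con = sqlite3.connect(":memory:")
con.execute(arm_ddl(blood_pressure_archetype))
con.execute("INSERT INTO blood_pressure VALUES (1, '2015-11-05', 120, 80, 'sitting')")
```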
The Biomarker Knowledge System Informatics Pilot Project will develop network interfaces among databases that contain information about existing clinical populations and biospecimens, and data relating to those specimens that are important in biomarker assay validation. This protocol is one of two that comprise the Moffitt participation in the Biomarker Knowledge System Informatics Pilot Project. THIS PROTOCOL (58) is the Sput-Epi Database.
Project Ares: A Systems Engineering and Operations Architecture for the Exploration of Mars
1992-03-20
increased use of automation, experiential databases, expert systems, and fail-soft configurations and designs (33:252-253). Automatic communication relay and...communications satellite lifetimes, we assume that uplink data rates on the order of 10 Kbps should suffice for command and database uploads. Current...squashed, 20-sided polyhedron configuration which should be relatively easy to obtain. Thus, two extremes for configuration exist. At one end is the site
CVcat: An interactive database on cataclysmic variables
NASA Astrophysics Data System (ADS)
Kube, J.; Gänsicke, B. T.; Euchner, F.; Hoffmann, B.
2003-06-01
CVcat is a database that contains published data on cataclysmic variables and related objects. Unlike existing online sources, it allows users to add data to the catalogue. The concept of an "open catalogue" approach is reviewed, together with the experience from one year of public usage of CVcat. New concepts to be included in the upcoming AstroCat framework and the next CVcat implementation are presented. CVcat can be found at http://www.cvcat.org.
NASA Astrophysics Data System (ADS)
Bi, Jiantao; Luo, Guilin; Wang, Xingxing; Zhu, Zuojia
2014-03-01
As the bridge between Chinese and Western civilization, the ancient Silk Road made a huge contribution to cultural, economic and political exchanges between China and Western countries. In this paper, we treated the historical periods of the Western Han, Eastern Han and Tang Dynasties as the research time domain, and the Western Regions' states that existed along the Silk Road during those periods as the research spatial domain. We imported these data into the SQL Server database we constructed. By inputting the name of a Western Regions state, one can query attribute information such as its population, military force, the contemporaneous Central Plains empire, the significant events that took place in the state and related attributes of those events such as the calendar year in which they happened, as well as spatial information such as the present-day location, the coordinates of the capital and the territory. Likewise, by inputting a calendar year, one can query the significant events, the Central Plains government institutions and the Western Regions states existing at that time. Based on the database, and combining GIS, RS, Flex, C# and other related information and network technologies, we can browse, search and edit the information on the ancient Silk Road in Xinjiang during the Han and Tang Dynasties, and also perform preliminary analysis. This combination of archaeology and modern information technology can also serve as a reference for further study, research and practice in related fields.
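The two query paths described (lookup by state name, and lookup by calendar year) can be sketched as follows. The schema, column names and sample values are illustrative assumptions; the project used SQL Server, and SQLite is used here only to keep the example self-contained. Negative years stand for BC.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("""CREATE TABLE western_region_state (
    name TEXT, population INTEGER, military_force INTEGER,
    capital_lat REAL, capital_lon REAL, exist_from INTEGER, exist_to INTEGER)""")
# Illustrative row only; figures and dates are placeholders, not sourced values.
con.execute("INSERT INTO western_region_state VALUES "
            "('Loulan', 14100, 2912, 40.5, 89.8, -176, 448)")

# Query path 1: attributes and capital coordinates by state name.
by_name = con.execute(
    "SELECT population, capital_lat, capital_lon "
    "FROM western_region_state WHERE name = ?", ("Loulan",)).fetchone()

# Query path 2: which states existed in a given calendar year (here 100 AD).
in_year = con.execute(
    "SELECT name FROM western_region_state WHERE ? BETWEEN exist_from AND exist_to",
    (100,)).fetchall()
```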
Wang, Lei; Alpert, Kathryn I.; Calhoun, Vince D.; Cobia, Derin J.; Keator, David B.; King, Margaret D.; Kogan, Alexandr; Landis, Drew; Tallis, Marcelo; Turner, Matthew D.; Potkin, Steven G.; Turner, Jessica A.; Ambite, Jose Luis
2015-01-01
SchizConnect (www.schizconnect.org) is built to address the issues of multiple data repositories in schizophrenia neuroimaging studies. It includes a level of mediation (translating across data sources) so that the user can place one query, e.g. for diffusion images from male individuals with schizophrenia, and find out from across participating data sources how many datasets there are, as well as downloading the imaging and related data. The current version handles the Data Usage Agreements across different studies, as well as interpreting database-specific terminologies into a common framework. New data repositories can also be mediated to bring immediate access to existing datasets. Compared with centralized, upload data sharing models, SchizConnect is a unique, virtual database with a focus on schizophrenia and related disorders that can mediate live data as information is being updated at each data source. It is our hope that SchizConnect can facilitate testing new hypotheses through aggregated datasets, promoting discovery related to the mechanisms underlying schizophrenic dysfunction. PMID:26142271
NASA Astrophysics Data System (ADS)
Kadhem, Hasan; Amagasa, Toshiyuki; Kitagawa, Hiroyuki
Encryption can provide strong security for sensitive data against inside and outside attacks. This is especially true in the "Database as a Service" model, where confidentiality and privacy are important issues for the client. In fact, existing encryption approaches are vulnerable to statistical attack because each value is encrypted to another fixed value. This paper presents a novel database encryption scheme called MV-OPES (Multivalued Order-Preserving Encryption Scheme), which allows privacy-preserving queries over encrypted databases with an improved security level. Our idea is to encrypt a value to multiple different values to prevent statistical attacks. At the same time, MV-OPES preserves the order of integer values to allow comparison operations to be applied directly on encrypted data. Using a calculated distance (range), we propose a novel method that allows a join query between relations based on inequality over encrypted values. We also present techniques to offload query execution load to the database server as much as possible, thereby making better use of server resources in a database outsourcing environment. Our scheme can easily be integrated with current database systems as it is designed to work with existing indexing structures. It is robust against statistical attack and the estimation of true values. MV-OPES experiments show that security for sensitive data can be achieved with reasonable overhead, establishing the practicability of the scheme.
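A toy version of the core idea (not the actual MV-OPES construction, and deliberately insecure) helps see why order survives while equal plaintexts diverge: each plaintext integer owns a disjoint interval of ciphertext space, intervals are kept in plaintext order, and every encryption draws a fresh random value from the owning interval.

```python
import random

BUCKET = 1000  # ciphertext interval width per plaintext value (a stand-in parameter)

def encrypt(v):
    # v's interval is [v*BUCKET, (v+1)*BUCKET): intervals never overlap and
    # are ordered like the plaintexts, so ciphertext order mirrors plaintext order.
    return v * BUCKET + random.randrange(BUCKET)

def decrypt(c):
    return c // BUCKET

a, b = encrypt(41), encrypt(42)
assert a < b                       # order preserved across distinct values
assert decrypt(encrypt(7)) == 7
print(encrypt(7), encrypt(7))      # same plaintext, (almost surely) different ciphertexts
```

Range predicates keep working on ciphertexts (compare against the interval endpoints), which is what lets the server evaluate inequality joins without decrypting.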
Sun, Xiudong; Zhou, Shumei; Meng, Fanlu; Liu, Shiqi
2012-10-01
Garlic is widely used as a spice throughout the world for the culinary value of its flavor and aroma, which are created by the chemical transformation of a series of organic sulfur compounds. To analyze the transcriptome of Allium sativum and discover the genes involved in sulfur metabolism, cDNAs derived from the total RNA of Allium sativum buds were analyzed by Illumina sequencing. Approximately 26.67 million 90 bp paired-end clean reads were obtained from two libraries. A total of 127,933 unigenes were generated by de novo assembly and were compared with the sequences in public databases. Of these, 45,286 unigenes had significant hits to sequences in the Nr database, 29,514 showed significant similarity to known proteins in the Swiss-Prot database, and 20,706 and 21,952 unigenes had significant similarity to existing sequences in the KEGG and COG databases, respectively. Moreover, genes involved in organic sulfur biosynthesis were identified. These unigene data will provide the foundation for research on gene expression, genomics and functional genomics in Allium sativum. Key message: The obtained unigenes will provide the foundation for research on functional genomics in Allium sativum and its closely related species, and fill a gap in the existing plant EST databases.
Modeling biology using relational databases.
Peitzsch, Robert M
2003-02-01
There are several different methodologies that can be used for designing a database schema; none is best for all occasions. This unit demonstrates two different techniques for designing relational tables and discusses when each should be used. The two techniques presented are (1) traditional Entity-Relationship (E-R) modeling and (2) a hybrid method that combines aspects of data warehousing and E-R modeling. The method of choice depends on (1) how well the information and all its inherent relationships are understood, (2) what types of questions will be asked, (3) how many different types of data will be included, and (4) how much data exists.
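As a toy contrast between the two styles, consider a gene-expression example: a normalized E-R design versus a warehouse-style design with one denormalized fact table tuned for aggregate queries. All names are illustrative, not from the unit itself.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
-- (1) E-R style: each entity normalized into its own table, linked by keys.
CREATE TABLE gene       (gene_id INTEGER PRIMARY KEY, symbol TEXT);
CREATE TABLE experiment (exp_id  INTEGER PRIMARY KEY, tissue TEXT, treatment TEXT);
CREATE TABLE expression (gene_id INTEGER REFERENCES gene(gene_id),
                         exp_id  INTEGER REFERENCES experiment(exp_id),
                         level   REAL);

-- (2) Warehouse style: a central fact table with denormalized dimension
-- attributes, convenient when the questions are mostly aggregations.
CREATE TABLE expression_fact (symbol TEXT, tissue TEXT, treatment TEXT, level REAL);
""")
```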
Toolbox for Evaluating Residents as Teachers
ERIC Educational Resources Information Center
Coverdale, John H.; Ismail, Nadia; Mian, Ayesha; Dewey, Charlene
2010-01-01
Objective: The authors review existing assessment tools related to evaluating residents' teaching skills and teaching effectiveness. Methods: PubMed and PsycInfo databases were searched using combinations of keywords including "residents," "residents as teachers," "teaching skills," and "assessments" or "rating scales." Results: Eleven evaluation…
AgeFactDB—the JenAge Ageing Factor Database—towards data integration in ageing research
Hühne, Rolf; Thalheim, Torsten; Sühnel, Jürgen
2014-01-01
AgeFactDB (http://agefactdb.jenage.de) is a database aimed at the collection and integration of ageing phenotype data including lifespan information. Ageing factors are considered to be genes, chemical compounds or other factors such as dietary restriction, whose action results in a changed lifespan or another ageing phenotype. Any information related to the effects of ageing factors is called an observation and is presented on observation pages. To provide concise access to the complete information for a particular ageing factor, corresponding observations are also summarized on ageing factor pages. In a first step, ageing-related data were primarily taken from existing databases such as the Ageing Gene Database—GenAge, the Lifespan Observations Database and the Dietary Restriction Gene Database—GenDR. In addition, we have started to include new ageing-related information. Based on homology data taken from the HomoloGene Database, AgeFactDB also provides observation and ageing factor pages of genes that are homologous to known ageing-related genes. These homologues are considered as candidate or putative ageing-related genes. AgeFactDB offers a variety of search and browse options, and also allows the download of ageing factor or observation lists in TSV, CSV and XML formats. PMID:24217911
Haire-Joshu, Debra; Elliott, Michael; Schermbeck, Rebecca; Taricone, Elsa; Green, Scoie; Brownson, Ross C
2010-07-01
The objective of this study was to develop the Missouri Obesity, Nutrition, and Activity Policy Database, a geographically representative baseline of Missouri's existing obesity-related local policies on healthy eating and physical activity. The database is organized to reflect 7 local environments (government, community, health care, worksite, school, after school, and child care) and to describe the prevalence of obesity-related policies in these environments. We employed a stratified nested cluster design using key informant interviews and review of public records to sample 2,356 sites across the 7 target environments for the presence or absence of obesity-related policies. The school environment had the most policies (88%), followed by after school (47%) and health care (32%). Community, government, and child care environments reported smaller proportions of obesity-related policies but higher rates of funding for these policies. Worksite environments had low numbers of obesity-related policies and low funding levels (17% and 6%, respectively). Sixteen of the sampled counties had high obesity-related policy occurrence; 65 had moderate and 8 had low occurrences. Except in Missouri schools, the presence of obesity-related policies is limited. More obesity-related policies are needed so that people have access to environments that support the model behaviors necessary to halt the obesity epidemic. The Missouri Obesity, Nutrition, and Activity Policy Database provides a benchmark for evaluating progress toward the development of obesity-related policies across multiple environments in Missouri.
Tai, David; Fang, Jianwen
2012-08-27
The large sizes of today's chemical databases require efficient algorithms to perform similarity searches, and it can be very time consuming to compare two large chemical databases. This paper seeks to build upon existing research efforts by describing a novel strategy for accelerating existing search algorithms for comparing large chemical collections. The quest for efficiency has focused on developing better indexing algorithms, creating heuristics for searching an individual chemical against a chemical library by detecting and eliminating needless similarity calculations. For comparing two chemical collections, these algorithms simply execute searches for each chemical in the query set sequentially. The strategy presented in this paper achieves a speedup over these algorithms by indexing the set of all query chemicals, so that redundant calculations arising in sequential searches are eliminated. We implement this novel algorithm in a similarity search program called Symmetric inDexing, or SymDex. SymDex shows a maximum speedup of over 232% compared to the state-of-the-art single-query search algorithm on real data for various fingerprint lengths. Considerable speedup is seen even for batch searches, where query set sizes are relatively small compared to typical database sizes. To the best of our knowledge, SymDex is the first search algorithm designed specifically for comparing chemical libraries. It can be adapted to most, if not all, existing indexing algorithms and shows potential for accelerating future similarity search algorithms for comparing chemical databases.
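A generic illustration of the kind of pruning such indexes exploit (this is the standard bit-count bound, not SymDex itself): the Tanimoto similarity of two bit fingerprints is at most min(|A|,|B|)/max(|A|,|B|), so pairs whose bit counts differ too much can be skipped without computing the full similarity.

```python
def tanimoto(a, b):
    """Tanimoto similarity of two fingerprints encoded as Python ints."""
    return bin(a & b).count("1") / bin(a | b).count("1")

def similar_pairs(queries, library, threshold=0.8):
    hits = []
    for qa in queries:
        na = bin(qa).count("1")
        for lb in library:
            nb = bin(lb).count("1")
            if min(na, nb) / max(na, nb) < threshold:  # cheap popcount bound
                continue                               # prune: T(A,B) cannot reach threshold
            if tanimoto(qa, lb) >= threshold:
                hits.append((qa, lb))
    return hits

# 0b0111 is pruned against 0b1111 (bound 3/4 < 0.8) before any full comparison.
print(similar_pairs([0b1111], [0b1111, 0b0111, 0b0001]))  # [(15, 15)]
```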
NASA Astrophysics Data System (ADS)
Tifafi, Marwa; Guenet, Bertrand; Hatté, Christine
2018-01-01
Soils are the major component of the terrestrial ecosystem and the largest organic carbon reservoir on Earth. However, they are a nonrenewable natural resource, and one especially reactive to human disturbance and climate change. Soil carbon dynamics is a major source of uncertainty for future climate predictions, and there is a growing need for more precise information to better understand the mechanisms controlling soil carbon dynamics and to better constrain Earth system models. The aim of our work is to compare the soil organic carbon stocks given by different global and regional databases that already exist. We calculated global and regional soil carbon stocks at 1 m depth given by three existing databases (SoilGrids, the Harmonized World Soil Database, and the Northern Circumpolar Soil Carbon Database). We observed that the total stocks predicted by the products differ greatly: around 3,400 Pg according to SoilGrids and about 2,500 Pg according to the Harmonized World Soil Database. This difference is particularly marked in boreal regions, where it can be related to large disparities in soil organic carbon concentration. Differences in other regions are more limited and may be related to differences in bulk density estimates. Finally, evaluation of the three data sets against ground truth data shows that (i) there is a significant difference in spatial patterns between the ground truth data and the compared data sets and (ii) the data sets underestimate the soil organic carbon stock by more than 40% compared to field data.
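For orientation, the standard per-layer calculation behind such stock comparisons multiplies SOC concentration, bulk density and layer thickness, with a correction for coarse fragments; this is why bulk density disparities propagate directly into stock disparities. The values below are illustrative only.

```python
def soc_stock_kg_m2(soc_g_per_kg, bulk_density_g_cm3, depth_m, coarse_frac=0.0):
    """Per-layer SOC stock. (g C/kg) x (g/cm^3) is numerically kg C/m^3,
    so multiplying by depth in metres gives kg C per m^2."""
    return soc_g_per_kg * bulk_density_g_cm3 * depth_m * (1.0 - coarse_frac)

# 15 g C/kg, bulk density 1.3 g/cm^3, 0-1 m layer, 5% coarse fragments:
print(soc_stock_kg_m2(15, 1.3, 1.0, 0.05))  # ~18.5 kg C per m^2
```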
Dr. John H. Hopps Jr. Research Scholars Program
2014-10-20
Program staff, alumni and existing participants. Over the course of the last five months, SageFox has successfully obtained IRB approval for all...and awards. Progress made in the development of the HoppsNet system included design and implementation of a relational database in MySQL, development of
Hoderlein, Xenia; Moseley, Anne M; Elkins, Mark R
2017-08-01
Many clinical trials are reported without reference to the existing relevant high-quality research. This study aimed to investigate the extent to which authors of reports of clinical trials of physiotherapy interventions try to use high-quality clinical research to (1) help justify the need for the trial in the introduction and (2) help interpret the trial's results in the discussion. Data were extracted from 221 clinical trials that were randomly selected from the Physiotherapy Evidence Database: 70 published in 2001 (10% sample) and 151 published in 2015 (10% sample). The Physiotherapy Evidence Database score (which rates methodological quality and completeness of reporting) for each trial was also downloaded. Overall 41% of trial reports cited a systematic review or the results of a search for other evidence in the introduction section: 20% for 2001 and 50% for 2015 (relative risk = 2.3, 95% confidence interval = 1.5-3.8). For the discussion section, only 1 of 221 trials integrated the results of the trial into an existing meta-analysis, but citation of a relevant systematic review did increase from 17% in 2001 to 34% in 2015. There was no relationship between citation of existing research and the total Physiotherapy Evidence Database score. Published reports of clinical trials of physiotherapy interventions increasingly cite a systematic review or the results of a search for other evidence in the introduction, but integration with existing research in the discussion section is very rare. To encourage the use of existing research, stronger recommendations to refer to existing systematic reviews (where available) could be incorporated into reporting checklists and journal editorial guidelines.
Freshwater Biological Traits Database (Final Report)
EPA announced the release of the final report, Freshwater Biological Traits Database. This report discusses the development of a database of freshwater biological traits. The database combines several existing traits databases into an online format. The database is also...
The Magnetics Information Consortium (MagIC)
NASA Astrophysics Data System (ADS)
Johnson, C.; Constable, C.; Tauxe, L.; Koppers, A.; Banerjee, S.; Jackson, M.; Solheid, P.
2003-12-01
The Magnetics Information Consortium (MagIC) is a multi-user facility to establish and maintain a state-of-the-art relational database and digital archive for rock and paleomagnetic data. The goal of MagIC is to make such data generally available and to provide an information technology infrastructure for these and other research-oriented databases run by the international community. As its name implies, MagIC will not be restricted to paleomagnetic or rock magnetic data only, although MagIC will focus on these kinds of information during its setup phase. MagIC will be hosted under EarthRef.org at http://earthref.org/MAGIC/ where two "integrated" web portals will be developed, one for paleomagnetism (currently functional as a prototype that can be explored via the http://earthref.org/databases/PMAG/ link) and one for rock magnetism. The MagIC database will store all measurements and their derived properties for studies of paleomagnetic directions (inclination, declination) and their intensities, and for rock magnetic experiments (hysteresis, remanence, susceptibility, anisotropy). Ultimately, this database will allow researchers to study "on the internet" and to download important data sets that display paleo-secular variations in the intensity of the Earth's magnetic field over geological time, or that display magnetic data in typical Zijderveld, hysteresis/FORC and various magnetization/remanence diagrams. The MagIC database is completely integrated in the EarthRef.org relational database structure and thus benefits significantly from already-existing common database components, such as the EarthRef Reference Database (ERR) and Address Book (ERAB). The ERR allows researchers to find complete sets of literature resources as used in GERM (Geochemical Earth Reference Model), REM (Reference Earth Model) and MagIC. The ERAB contains addresses for all contributors to the EarthRef.org databases, and also for those who participated in data collection, archiving and analysis in the magnetic studies. Integration with these existing components will guarantee direct traceability to the original sources of the MagIC data and metadata. The MagIC database design focuses around the general workflow that results in the determination of typical paleomagnetic and rock magnetic analyses. This ensures that individual data points can be traced between the actual measurements and their associated specimen, sample, site, rock formation and locality. This permits a distinction between original and derived data, where the actual measurements are performed at the specimen level, and data at the sample level and higher are then derived products in the database. These relations will also allow recalculation of derived properties, such as site means, when new data becomes available for a specific locality. Data contribution to the MagIC database is critical in achieving a useful research tool. We have developed a standard data and metadata template that can be used to provide all data at the same time as publication. Software tools are provided to facilitate easy population of these templates. The tools allow for the import/export of data files in a delimited text format, and they provide some advanced functionality to validate data and to check internal coherence of the data in the template. During and after publication these standardized MagIC templates will be stored in the ERR database of EarthRef.org from where they can be downloaded at all times. 
Finally, the contents of these template files will be automatically parsed into the online relational database.
Wang, Lei; Alpert, Kathryn I; Calhoun, Vince D; Cobia, Derin J; Keator, David B; King, Margaret D; Kogan, Alexandr; Landis, Drew; Tallis, Marcelo; Turner, Matthew D; Potkin, Steven G; Turner, Jessica A; Ambite, Jose Luis
2016-01-01
SchizConnect (www.schizconnect.org) is built to address the issues of multiple data repositories in schizophrenia neuroimaging studies. It includes a level of mediation (translating across data sources) so that the user can place one query, e.g. for diffusion images from male individuals with schizophrenia, and find out from across participating data sources how many datasets there are, as well as downloading the imaging and related data. The current version handles the Data Usage Agreements across different studies, as well as interpreting database-specific terminologies into a common framework. New data repositories can also be mediated to bring immediate access to existing datasets. Compared with centralized, upload data sharing models, SchizConnect is a unique, virtual database with a focus on schizophrenia and related disorders that can mediate live data as information is being updated at each data source. It is our hope that SchizConnect can facilitate testing new hypotheses through aggregated datasets, promoting discovery related to the mechanisms underlying schizophrenic dysfunction.
Draft secure medical database standard.
Pangalos, George
2002-01-01
Medical database security is a particularly important issue for all healthcare establishments. Medical information systems are intended to support a wide range of pertinent health issues today, for example: assuring the quality of care, supporting effective management of the health services institutions, monitoring and containing the cost of care, implementing technology into care without violating social values, ensuring the equity and availability of care, and preserving humanity despite the proliferation of technology. In this context, medical database security aims primarily to support: high availability, accuracy and consistency of the stored data, the medical professional secrecy and confidentiality, and the protection of the privacy of the patient. These properties, though of a technical nature, basically require that the system is actually helpful for medical care and not harmful to patients. The latter properties require in turn not only that fundamental ethical principles are not violated by employing database systems, but that, instead, they are effectively enforced by technical means. This document reviews the existing and emerging work on the security of medical database systems. It presents in detail the problems and requirements related to medical database security. It addresses the problems of medical database security policies, secure design methodologies and implementation techniques. It also describes the current legal framework and regulatory requirements for medical database security. The issue of medical database security guidelines is also examined in detail. The current national and international efforts in the area are studied. It also gives an overview of the research work in the area. The document also presents in detail the most complete, to our knowledge, set of security guidelines for the development and operation of medical database systems.
NASA Astrophysics Data System (ADS)
Wolfgramm, Bettina; Hurni, Hans; Liniger, Hanspeter; Ruppen, Sebastian; Milne, Eleanor; Bader, Hans-Peter; Scheidegger, Ruth; Amare, Tadele; Yitaferu, Birru; Nazarmavloev, Farrukh; Conder, Malgorzata; Ebneter, Laura; Qadamov, Aslam; Shokirov, Qobiljon; Hergarten, Christian; Schwilch, Gudrun
2013-04-01
There is a fundamental mutual interest between enhancing soil organic carbon (SOC) in the world's soils and the objectives of the major global environmental conventions (UNFCCC, UNCBD, UNCCD). While there is evidence at the case study level that sustainable land management (SLM) technologies increase SOC stocks and SOC-related benefits, no quantitative data are available on the potential for increasing SOC benefits from different SLM technologies, especially from case studies in developing countries, and a clear understanding of the trade-offs related to SLM up-scaling is missing. This study aims at assessing the potential increase of SOC under SLM technologies worldwide, evaluating trade-offs and gains in up-scaling SLM for case studies in Tajikistan, Ethiopia and Switzerland. It makes use of the SLM technologies documented in the online database of the World Overview of Conservation Approaches and Technologies (WOCAT). The study consists of three components: 1) identifying SOC benefits contributing to the major global environmental issues for SLM technologies worldwide as documented in the WOCAT global database; 2) validating SOC storage potentials and SOC benefit predictions for SLM technologies from the WOCAT database against results from existing comparative case studies at the plot level, using soil spectral libraries and the standardized documentation of ecosystem services from the WOCAT database; and 3) understanding trade-offs and win-win scenarios of up-scaling SLM technologies from the plot to the household and landscape level using material flow analysis. This study builds on the premise that the most promising way to increase benefits from land management is to consider already existing sustainable strategies. SLM technologies documented from all over the world are accessible in a standardized way in the WOCAT online database. The study thus evaluates SLM technologies from the WOCAT database by calculating the potential SOC storage increase and related benefits, comparing SOC estimates before and after establishment of the SLM technology. These results are validated using comparative case studies of plots with and without SLM technologies (existing SLM systems versus surrounding, degrading systems). In view of up-scaling SLM technologies, it is crucial to understand the trade-offs and gains that support or hinder further spread. Systemic biomass management analysis using material flow analysis allows organic carbon flows and stores to be quantified for different land management options at the household as well as the landscape level. The study shows results relevant for science, policy and practice for accounting, monitoring and evaluating SOC-related ecosystem services: a comprehensive methodology for SLM impact assessments allowing quantification of SOC storage and SOC-related benefits under different SLM technologies, and an improved understanding of up-scaling options for SLM technologies and of trade-offs as well as win-win opportunities for biomass management, SOC content increase, and ecosystem services improvement at the plot and household level.
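As a minimal illustration of the before-and-after comparison just described, the snippet below computes the SOC gain per SLM technology from paired stock estimates. The technology names and numbers are invented for the example, not WOCAT data.

```python
# Potential SOC gain per SLM technology from paired (before, after)
# stock estimates, in tonnes of carbon per hectare.
soc_t_per_ha = {
    "terracing":       (28.0, 34.5),
    "agroforestry":    (30.0, 41.0),
    "minimum tillage": (25.0, 27.5),
}
for tech, (before, after) in soc_t_per_ha.items():
    gain = after - before
    print(f"{tech:16s} +{gain:.1f} t C/ha ({100 * gain / before:.0f}% increase)")
```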
DataSpread: Unifying Databases and Spreadsheets.
Bendre, Mangesh; Sun, Bofan; Zhang, Ding; Zhou, Xinyan; Chang, Kevin ChenChuan; Parameswaran, Aditya
2015-08-01
Spreadsheet software is often the tool of choice for ad hoc tabular data management, processing, and visualization, especially on tiny data sets. On the other hand, relational database systems offer significant power, expressivity, and efficiency over spreadsheet software for data management, while lacking the ease of use and ad hoc analysis capabilities. We demonstrate DataSpread, a data exploration tool that holistically unifies databases and spreadsheets. It continues to offer a Microsoft Excel-based spreadsheet front-end, while in parallel managing all the data in a back-end database, specifically, PostgreSQL. DataSpread retains all the advantages of spreadsheets, including ease of use, ad hoc analysis and visualization capabilities, and a schema-free nature, while also adding the advantages of traditional relational databases, such as scalability and the ability to use arbitrary SQL to import, filter, or join external or internal tables and have the results appear in the spreadsheet. DataSpread needs to reason about and reconcile differences in the notions of schema, addressing of cells and tuples, and the current "pane" (which exists in spreadsheets but not in traditional databases), and support data modifications at both the front-end and the back-end. Our demonstration will center on our first, early prototype of DataSpread, and will give attendees a sense of the enormous data exploration capabilities offered by unifying spreadsheets and databases.
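One of the reconciliation problems the demo mentions, mapping spreadsheet cell addresses onto tuples and attributes of a backing table, can be sketched as follows. The column list, table contents, and function names are illustrative assumptions, not DataSpread's implementation.

```python
# Translate a spreadsheet address like "B3" into a (row, attribute) lookup
# against a backing relational table with a positional schema.
import re

COLUMNS = ["name", "qty", "price"]          # positional schema for A, B, C

def cell_to_tuple(addr):
    """'B3' -> (row_index=2, attribute='qty'); sheet rows are 1-based."""
    m = re.fullmatch(r"([A-Z]+)(\d+)", addr)
    col_letters, row = m.group(1), int(m.group(2))
    col = 0
    for ch in col_letters:                  # base-26 column arithmetic
        col = col * 26 + (ord(ch) - ord("A") + 1)
    return row - 1, COLUMNS[col - 1]

table = [{"name": "bolt", "qty": 40, "price": 0.10},
         {"name": "nut",  "qty": 55, "price": 0.05}]
row, attr = cell_to_tuple("B2")
print(table[row][attr])   # 55
```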
Using ontology databases for scalable query answering, inconsistency detection, and data integration
Dou, Dejing
2011-01-01
An ontology database is a basic relational database management system that models an ontology plus its instances. To reason over the transitive closure of instances in the subsumption hierarchy, for example, an ontology database can either unfold views at query time or propagate assertions using triggers at load time. In this paper, we use existing benchmarks to evaluate our method—using triggers—and we demonstrate that by forward computing inferences, we not only improve query time, but the improvement appears to cost only more space (not time). However, we go on to show that the true penalties were simply opaque to the benchmark, i.e., the benchmark inadequately captures load-time costs. We have applied our methods to two case studies in biomedicine, using ontologies and data from genetics and neuroscience to illustrate two important applications: first, ontology databases answer ontology-based queries effectively; second, using triggers, ontology databases detect instance-based inconsistencies—something not possible using views. Finally, we demonstrate how to extend our methods to perform data integration across multiple, distributed ontology databases. PMID:22163378
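The load-time, trigger-based materialization the paper evaluates can be imitated in miniature with SQLite: inserting an instance of a subclass also asserts it for every superclass, so subsumption queries become plain lookups. The schema below is a toy stand-in, not the authors' actual ontology-database layout.

```python
# Forward-computing the subsumption closure with triggers at load time.
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("PRAGMA recursive_triggers = ON")   # let the trigger cascade
con.executescript("""
CREATE TABLE is_a (sub TEXT, super TEXT);       -- subsumption hierarchy
CREATE TABLE instance_of (individual TEXT, class TEXT,
                          UNIQUE (individual, class));
CREATE TRIGGER propagate AFTER INSERT ON instance_of
BEGIN
    INSERT OR IGNORE INTO instance_of
    SELECT NEW.individual, super FROM is_a WHERE sub = NEW.class;
END;
""")
con.execute("INSERT INTO is_a VALUES ('neuron', 'cell'), ('cell', 'entity')")
con.execute("INSERT INTO instance_of VALUES ('n1', 'neuron')")

# The trigger fires transitively: n1 is a neuron, hence a cell, hence an
# entity, and the query pays no inference cost at query time.
print(con.execute(
    "SELECT class FROM instance_of WHERE individual = 'n1'").fetchall())
```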
FlavonoidSearch: A system for comprehensive flavonoid annotation by mass spectrometry.
Akimoto, Nayumi; Ara, Takeshi; Nakajima, Daisuke; Suda, Kunihiro; Ikeda, Chiaki; Takahashi, Shingo; Muneto, Reiko; Yamada, Manabu; Suzuki, Hideyuki; Shibata, Daisuke; Sakurai, Nozomu
2017-04-28
Currently, in mass spectrometry-based metabolomics, limited reference mass spectra are available for flavonoid identification. In the present study, a database of probable mass fragments for 6,867 known flavonoids (FsDatabase) was manually constructed based on new structure- and fragmentation-related rules using new heuristics to overcome flavonoid complexity. We developed the FlavonoidSearch system for flavonoid annotation, which consists of the FsDatabase and a computational tool (FsTool) to automatically search the FsDatabase using the mass spectra of metabolite peaks as queries. This system showed the highest identification accuracy for the flavonoid aglycone when compared to existing tools and revealed accurate discrimination between the flavonoid aglycone and other compounds. Sixteen new flavonoids were found from parsley, and the diversity of the flavonoid aglycone among different fruits and vegetables was investigated.
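A drastically simplified version of the search step is sketched below: each candidate flavonoid is scored by how many of its predicted fragment masses occur in the query spectrum within a mass tolerance. The candidate names and fragment masses are invented placeholders, not FsDatabase content.

```python
# Score candidates by predicted-fragment hits in an observed m/z spectrum.
FRAGMENTS = {
    "kaempferol": [153.018, 165.018, 287.055],
    "quercetin":  [153.018, 179.000, 303.050],
}

def score(spectrum_mz, candidate_fragments, tol=0.01):
    """Count predicted fragments matched by any observed peak within tol."""
    return sum(any(abs(mz - f) <= tol for mz in spectrum_mz)
               for f in candidate_fragments)

query = [153.017, 287.056, 121.028]
ranked = sorted(FRAGMENTS, key=lambda c: score(query, FRAGMENTS[c]),
                reverse=True)
print(ranked)   # ['kaempferol', 'quercetin'] for this query
```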
Federal Register 2010, 2011, 2012, 2013, 2014
2011-07-11
..., Assistant General Counsel for Legislation, Regulation, and Energy Efficiency, U.S. Department of Energy... ineffectively used. Related to appliance efficiency standards rulemakings, two comments expressed concern that... encouraged DOE to streamline its reporting databases to improve efficiency and reduce maintenance costs...
Saokaew, Surasak; Sugimoto, Takashi; Kamae, Isao; Pratoomsoot, Chayanin; Chaiyakunapruk, Nathorn
2015-01-01
Health technology assessment (HTA) has been continuously used for value-based healthcare decisions over the last decade. Healthcare databases represent an important source of information for HTA and have seen a surge in use in Western countries. Although HTA agencies have been established in the Asia-Pacific region, application and understanding of healthcare databases for HTA is rather limited. Thus, we reviewed existing databases to assess their potential for HTA in Thailand, where HTA has been used officially, and Japan, where HTA is going to be officially introduced. Existing healthcare databases in Thailand and Japan were compiled and reviewed. The databases' characteristics, e.g., name of database, host, scope/objective, time/sample size, design, data collection method, population/sample, and variables, were described. Databases were assessed for their potential HTA use in terms of safety/efficacy/effectiveness, social/ethical, organization/professional, economic, and epidemiological domains. The request route for each database was also provided. Forty databases (20 from Thailand and 20 from Japan) were included. These comprised national censuses, surveys, registries, administrative data, and claims databases. All databases could potentially be used for epidemiological studies. In addition, data on mortality, morbidity, disability, adverse events, quality of life, service/technology utilization, length of stay, and economics were also found in some databases. However, access to patient-level data was limited, since information about the databases was not available from public sources. Our findings show that existing databases provide valuable information for HTA research, with limitations on accessibility. Mutual dialogue on healthcare database development and usage for HTA in the Asia-Pacific region is needed.
ReprDB and panDB: minimalist databases with maximal microbial representation.
Zhou, Wei; Gay, Nicole; Oh, Julia
2018-01-18
Profiling of shotgun metagenomic samples is hindered by a lack of unified microbial reference genome databases that (i) assemble genomic information from all open access microbial genomes, (ii) have relatively small sizes, and (iii) are compatible with various metagenomic read mapping tools. Moreover, computational tools to rapidly compile and update such databases to accommodate the rapid increase in new reference genomes do not exist. As a result, database-guided analyses often fail to profile a substantial fraction of metagenomic shotgun sequencing reads from complex microbiomes. We report pipelines that efficiently traverse all open access microbial genomes and assemble non-redundant genomic information. The pipelines result in two species-resolution microbial reference databases of relatively small sizes: reprDB, which assembles microbial representative or reference genomes, and panDB, for which we developed a novel iterative alignment algorithm to identify and assemble non-redundant genomic regions in multiple sequenced strains. With the databases, we managed to assign taxonomic labels and genome positions to the majority of metagenomic reads from human skin and gut microbiomes, demonstrating a significant improvement over a previous database-guided analysis on the same datasets. reprDB and panDB leverage the rapid increases in the number of open access microbial genomes to more fully profile metagenomic samples. Additionally, the databases exclude redundant sequence information to avoid inflated storage or memory space and indexing or analyzing time. Finally, the novel iterative alignment algorithm significantly increases efficiency in pan-genome identification and can be useful in comparative genomic analyses.
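The non-redundancy idea can be caricatured in a few lines: each strain added to the pan-genome contributes only sequence content not already represented. Real panDB identifies non-redundant regions via iterative whole-genome alignment; the exact k-mer bookkeeping below is only a stand-in for that step.

```python
# Iteratively grow a non-redundant k-mer set, one strain at a time.
def add_strain(pan_kmers, genome, k=8):
    """Add only k-mers not already in the pan set; return how many were new."""
    kmers = {genome[i:i + k] for i in range(len(genome) - k + 1)}
    new = kmers - pan_kmers
    pan_kmers |= new
    return len(new)

pan = set()
print(add_strain(pan, "ACGTACGTGGAA"))   # first strain: all 5 k-mers are new
print(add_strain(pan, "ACGTACGTGGAT"))   # near-identical strain: only 1 new
```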
MGDB: a comprehensive database of genes involved in melanoma.
Zhang, Di; Zhu, Rongrong; Zhang, Hanqian; Zheng, Chun-Hou; Xia, Junfeng
2015-01-01
The Melanoma Gene Database (MGDB) is a manually curated catalog of molecular genetic data relating to genes involved in melanoma. The main purpose of this database is to establish a network of melanoma-related genes and to facilitate the mechanistic study of melanoma tumorigenesis. The entries describing the relationships between melanoma and genes in the current release were manually extracted from PubMed abstracts, and to date comprise 527 human melanoma genes (422 protein-coding and 105 non-coding genes). Each melanoma gene is annotated in seven different aspects (General Information, Expression, Methylation, Mutation, Interaction, Pathway and Drug). In addition, manually curated literature references are provided to support the inclusion of each gene in MGDB and establish its association with melanoma. MGDB has a user-friendly web interface with multiple browse and search functions. We hope MGDB will enrich our knowledge about melanoma genetics and serve as a useful complement to existing public resources. Database URL: http://bioinfo.ahu.edu.cn:8080/Melanoma/index.jsp. © The Author(s) 2015. Published by Oxford University Press.
A Codasyl-Type Schema for Natural Language Medical Records
Sager, N.; Tick, L.; Story, G.; Hirschman, L.
1980-01-01
This paper describes a CODASYL (network) database schema for information derived from narrative clinical reports. The goal of this work is to create an automated process that accepts natural language documents as input and maps this information into a database of a type managed by existing database management systems. The schema described here represents the medical events and facts identified through the natural language processing. This processing decomposes each narrative into a set of elementary assertions, represented as MEDFACT records in the database. Each assertion in turn consists of a subject and a predicate classed according to a limited number of medical event types, e.g., signs/symptoms, laboratory tests, etc. The subject and predicate are represented by EVENT records which are owned by the MEDFACT record associated with the assertion. The CODASYL-type network structure was found to be suitable for expressing most of the relations needed to represent the natural language information. However, special mechanisms were developed for storing the time relations between EVENT records and for recording connections (such as causality) between certain MEDFACT records. This schema has been implemented using the UNIVAC DMS-1100 DBMS.
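In modern terms, the owner-member "sets" of a CODASYL schema can be pictured as records holding references to the records they own. The sketch below (invented field names, not the DMS-1100 schema) shows a MEDFACT assertion owning its subject and predicate EVENT records, plus the special causality connection between MEDFACT records that the paper mentions.

```python
# MEDFACT records own EVENT records; causality links connect MEDFACTs.
from dataclasses import dataclass, field
from typing import List

@dataclass
class Event:                      # EVENT record: one subject or predicate
    text: str
    event_type: str               # e.g. 'sign/symptom', 'laboratory test'

@dataclass
class MedFact:                    # MEDFACT record: one elementary assertion
    subject: Event
    predicate: Event
    caused_by: List["MedFact"] = field(default_factory=list)  # special link

fever = MedFact(Event("patient", "subject"),
                Event("fever 39C", "sign/symptom"))
culture = MedFact(Event("blood culture", "laboratory test"),
                  Event("positive", "result"))
fever.caused_by.append(culture)   # connection between MEDFACT records
print(fever.predicate.text, "<-", fever.caused_by[0].subject.text)
```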
Barron, Andrew D.; Ramsey, David W.; Smith, James G.
2014-01-01
This digital database contains information used to produce the geologic map published as Sheet 1 in U.S. Geological Survey Miscellaneous Investigations Series Map I-2005. (Sheet 2 of Map I-2005 shows sources of geologic data used in the compilation and is available separately). Sheet 1 of Map I-2005 shows the distribution and relations of volcanic and related rock units in the Cascade Range of Washington at a scale of 1:500,000. This digital release is produced from stable materials originally compiled at 1:250,000 scale that were used to publish Sheet 1. The database therefore contains more detailed geologic information than is portrayed on Sheet 1. This is most noticeable in the database as expanded polygons of surficial units and the presence of additional strands of concealed faults. No stable compilation materials exist for Sheet 1 at 1:500,000 scale. The main component of this digital release is a spatial database prepared using geographic information systems (GIS) applications. This release also contains links to files to view or print the map sheet, main report text, and accompanying mapping reference sheet from Map I-2005. For more information on volcanoes in the Cascade Range in Washington, Oregon, or California, please refer to the U.S. Geological Survey Volcano Hazards Program website.
Sharma, Amit K; Gohel, Sangeeta; Singh, Satya P
2012-01-01
Actinobase is a relational database of the molecular diversity, phylogeny and biocatalytic potential of haloalkaliphilic actinomycetes. The main objective of this database is to provide easy access to a range of information, along with data storage, comparison and analysis, while reducing data redundancy and the costs of data entry, storage and retrieval, and improving data security. Information related to habitat, cell morphology, Gram reaction, biochemical characterization and molecular features allows researchers to understand the identification and stress adaptation of existing and new candidates belonging to the salt-tolerant alkaliphilic actinomycetes. The user-friendly PHP front end facilitates the addition of nucleotide and protein sequences of reported entries, directly helping researchers obtain the required details. Analysis of the genus-wise status of the salt-tolerant alkaliphilic actinomycetes indicated 6 different genera among the 40 classified entries. The results show the widespread occurrence of salt-tolerant alkaliphilic actinomycetes belonging to diverse taxonomic positions. On ClustalW/X multiple sequence alignment of the alkaline protease gene sequences, different clusters emerged among the groups. The narrow search and limit options of the constructed database provide comparable information. Entries and information related to actinomycetes in the database are publicly accessible, free of charge, at http://www.actinobase.in.
CampusGIS of the University of Cologne: a tool for orientation, navigation, and management
NASA Astrophysics Data System (ADS)
Baaser, U.; Gnyp, M. L.; Hennig, S.; Hoffmeister, D.; Köhn, N.; Laudien, R.; Bareth, G.
2006-10-01
The working group for GIS and Remote Sensing at the Department of Geography at the University of Cologne has established a WebGIS called the CampusGIS of the University of Cologne. The overall task of the CampusGIS is to connect several existing databases at the University of Cologne with spatial data. These existing databases comprise data about staff, buildings, rooms, lectures, and general infrastructure such as bus stops. This information was not previously linked to its spatial location. Therefore, a GIS-based method was developed to link all the different databases to spatial entities. Following the philosophy of the CampusGIS, an online GUI was programmed that enables users to search for staff, buildings, or institutions. The query results are linked to the GIS database, which allows visualization of the spatial location of the searched entity. The system was established in 2005 and has been operational since early 2006. In this contribution, the focus is on further developments. First results are presented of (i) including routing services, (ii) programming GUIs for mobile devices, and (iii) integrating infrastructure management tools into the CampusGIS. Consequently, the CampusGIS is not only available for spatial information retrieval and orientation; it also serves for on-campus navigation and administrative management.
A Circular Dichroism Reference Database for Membrane Proteins
DOE Office of Scientific and Technical Information (OSTI.GOV)
Wallace,B.; Wien, F.; Stone, T.
2006-01-01
Membrane proteins are a major product of most genomes and the target of a large number of current pharmaceuticals, yet little information exists on their structures because of the difficulty of crystallising them; hence for the most part they have been excluded from structural genomics programme targets. Furthermore, even methods such as circular dichroism (CD) spectroscopy which seek to define secondary structure have not been fully exploited because of technical limitations to their interpretation for membrane embedded proteins. Empirical analyses of circular dichroism (CD) spectra are valuable for providing information on secondary structures of proteins. However, the accuracy of the results depends on the appropriateness of the reference databases used in the analyses. Membrane proteins have different spectral characteristics than do soluble proteins as a result of the low dielectric constants of membrane bilayers relative to those of aqueous solutions (Chen & Wallace (1997) Biophys. Chem. 65:65-74). To date, no CD reference database exists exclusively for the analysis of membrane proteins, and hence empirical analyses based on current reference databases derived from soluble proteins are not adequate for accurate analyses of membrane protein secondary structures (Wallace et al (2003) Prot. Sci. 12:875-884). We have therefore created a new reference database of CD spectra of integral membrane proteins whose crystal structures have been determined. To date it contains more than 20 proteins, and spans the range of secondary structures from mostly helical to mostly sheet proteins. This reference database should enable more accurate secondary structure determinations of membrane embedded proteins and will become one of the reference database options in the CD calculation server DICHROWEB (Whitmore & Wallace (2004) NAR 32:W668-673).
Fine-grained policy control in U.S. Army Research Laboratory (ARL) multimodal signatures database
NASA Astrophysics Data System (ADS)
Bennett, Kelly; Grueneberg, Keith; Wood, David; Calo, Seraphin
2014-06-01
The U.S. Army Research Laboratory (ARL) Multimodal Signatures Database (MMSDB) consists of a number of colocated relational databases representing a collection of data from various sensors. Role-based access to this data is granted to external organizations such as DoD contractors and other government agencies through a client Web portal. In the current MMSDB system, access control is only at the database and firewall level. In order to offer finer-grained security, changes to existing user profile schemas and authentication mechanisms are usually needed. In this paper, we describe a software middleware architecture and implementation that allows fine-grained access control to the MMSDB at a dataset, table, and row level. Result sets from MMSDB queries issued in the client portal are filtered with the use of a policy enforcement proxy, with minimal changes to the existing client software and database. Before resulting data is returned to the client, policies are evaluated to determine if the user or role is authorized to access the data. Policies can be authored to filter data at the row, table or column level of a result set. The system uses various technologies developed in the International Technology Alliance in Network and Information Science (ITA) for policy-controlled information sharing and dissemination. Use of the Policy Management Library provides a mechanism for the management and evaluation of policies to support finer-grained access to the data in the MMSDB system. The GaianDB is a policy-enabled, federated database that acts as a proxy between the client application and the MMSDB system.
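The core idea, filtering query results row- and column-wise against the caller's role before they leave the proxy, can be illustrated as follows. The policy format, role names, and fields are invented for the sketch and are not the ITA Policy Management Library's interface.

```python
# A policy-enforcement proxy: result sets are filtered per role, leaving
# the client application and the backing database unchanged.
POLICIES = {
    "contractor": {"allowed_columns": {"sensor_id", "timestamp"},
                   "row_filter": lambda r: r["classification"] == "public"},
    "analyst":    {"allowed_columns": {"sensor_id", "timestamp", "signature"},
                   "row_filter": lambda r: True},
}

def enforce(role, rows):
    """Drop unauthorized rows, then project away unauthorized columns."""
    policy = POLICIES[role]
    kept = filter(policy["row_filter"], rows)
    return [{k: v for k, v in row.items() if k in policy["allowed_columns"]}
            for row in kept]

rows = [{"sensor_id": 7, "timestamp": "2014-06-01", "signature": b"\x01",
         "classification": "public"},
        {"sensor_id": 9, "timestamp": "2014-06-02", "signature": b"\x02",
         "classification": "restricted"}]
print(enforce("contractor", rows))   # one row, two columns survive
```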
Ontology to relational database transformation for web application development and maintenance
NASA Astrophysics Data System (ADS)
Mahmudi, Kamal; Inggriani Liem, M. M.; Akbar, Saiful
2018-03-01
Ontology is used for knowledge representation, while a database records facts, in a KMS (Knowledge Management System). In most applications, data are managed in a database system, updated through the application, and then transformed into knowledge as needed. Once a domain conceptor defines the knowledge in the ontology, the application and database can be generated from it. Most existing frameworks generate the application from its database. In this research, the ontology is used to generate the application. As the data are updated through the application, a mechanism is designed to trigger an update to the ontology so that the application can be rebuilt based on the newest ontology. By this approach, a knowledge engineer has full flexibility to renew the application based on the latest ontology without depending on a software developer. In many cases, the concept needs to be updated when the data change. The framework was built and tested in a Spring Java environment. A case study was conducted to prove the concept.
High-quality unsaturated zone hydraulic property data for hydrologic applications
Perkins, Kimberlie; Nimmo, John R.
2009-01-01
In hydrologic studies, especially those using dynamic unsaturated zone moisture modeling, calculations based on property transfer models informed by hydraulic property databases are often used in lieu of measured data from the site of interest. Reliance on database-informed predicted values has become increasingly common with the use of neural networks. High-quality data are needed for databases used in this way and for theoretical and property transfer model development and testing. Hydraulic properties predicted on the basis of existing databases may be adequate in some applications but not others. An obvious problem occurs when the available database has few or no data for samples that are closely related to the medium of interest. The data set presented in this paper includes saturated and unsaturated hydraulic conductivity, water retention, particle-size distributions, and bulk properties. All samples are minimally disturbed, all measurements were performed using the same state-of-the-art techniques, and the environments represented are diverse.
PhamDB: a web-based application for building Phamerator databases.
Lamine, James G; DeJong, Randall J; Nelesen, Serita M
2016-07-01
PhamDB is a web application which creates databases of bacteriophage genes, grouped by gene similarity. It is backwards compatible with the existing Phamerator desktop software while providing an improved database creation workflow. Key features include a graphical user interface, validation of uploaded GenBank files, and the ability to import phages from existing databases, modify existing databases and queue multiple jobs. Source code and installation instructions for Linux, Windows and Mac OSX are freely available at https://github.com/jglamine/phage. PhamDB is also distributed as a Docker image which can be managed via Kitematic. This Docker image contains the application and all third-party software dependencies as a pre-configured system, and is freely available via the installation instructions provided. Contact: snelesen@calvin.edu. © The Author 2016. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com.
Forensic DNA databases in Western Balkan region: retrospectives, perspectives, and initiatives
Marjanović, Damir; Konjhodžić, Rijad; Butorac, Sara Sanela; Drobnič, Katja; Merkaš, Siniša; Lauc, Gordan; Primorac, Damir; Anđelinović, Šimun; Milosavljević, Mladen; Karan, Željko; Vidović, Stojko; Stojković, Oliver; Panić, Bojana; Vučetić Dragović, Anđelka; Kovačević, Sandra; Jakovski, Zlatko; Asplen, Chris; Primorac, Dragan
2011-01-01
The European Network of Forensic Science Institutes (ENFSI) recommended the establishment of forensic DNA databases and specific implementation and management legislations for all EU/ENFSI members. Therefore, forensic institutions from Bosnia and Herzegovina, Serbia, Montenegro, and Macedonia launched a wide set of activities to support these recommendations. To assess the current state, a regional expert team completed detailed screening and investigation of the existing forensic DNA data repositories and associated legislation in these countries. The scope also included relevant concurrent projects and a wide spectrum of different activities in relation to forensics DNA use. The state of forensic DNA analysis was also determined in the neighboring Slovenia and Croatia, which already have functional national DNA databases. There is a need for a ‘regional supplement’ to the current documentation and standards pertaining to forensic application of DNA databases, which should include regional-specific preliminary aims and recommendations. PMID:21674821
DOE Office of Scientific and Technical Information (OSTI.GOV)
Starr, D. L.; Wozniak, P. R.; Vestrand, W. T.
2002-01-01
SkyDOT (Sky Database for Objects in Time-Domain) is a Virtual Observatory currently comprising data from the RAPTOR, ROTSE I, and OGLE II survey projects. This makes it a very large time-domain database. In addition, the RAPTOR project provides SkyDOT with real-time variability data as well as stereoscopic information. With its web interface, we believe SkyDOT will be a very useful tool for both astronomers and the public. Our main task has been to construct an efficient relational database containing all existing data, while handling a real-time inflow of data. We also provide a useful web interface allowing easy access to both astronomers and the public. Initially, this server will allow common searches, specific queries, and access to light curves. In the future we will include machine learning classification tools and access to spectral information.
Planners and decision makers are challenged to consider not only direct market costs, but also ecological externalities. There is an increasing emphasis on ecosystem services in the context of human well-being, and therefore the valuation and accounting of ecosystem services is b...
Agreement among Response to Intervention Criteria for Identifying Responder Status
ERIC Educational Resources Information Center
Barth, Amy E.; Stuebing, Karla K.; Anthony, Jason L.; Denton, Carolyn A.; Mathes, Patricia G.; Fletcher, Jack M.; Francis, David J.
2008-01-01
In order to better understand the extent to which operationalizations of response to intervention (RTI) overlap and agree in identifying adequate and inadequate responders, an existing database of 399 first grade students was evaluated in relation to cut-points, measures, and methods frequently cited for the identification of inadequate responders…
Carcinogenicity and Mutagenicity Data: New Initiatives to ...
Currents models for prediction of chemical carcinogenicity and mutagenicity rely upon a relatively small number of publicly available data resources, where the data being modeled are highly summarized and aggregated representations of the actual experimental results. A number of new initiatives are underway to improve access to existing public carcinogenicity and mutagenicity data for use in modeling, as well as to encourage new approaches to the use of data in modeling. Rodent bioassay results from the NIEHS National Toxicology Program (NTP) and the Berkeley Carcinogenic Potency Database (CPDB) have provided the largest public data resources for building carcinogenicity prediction models to date. However, relatively few and limited representations of these data have actually informed existing models. Initiatives, such as EPA's DSSTox Database Network, offer elaborated and quality reviewed presentations of the CPDB and expanded data linkages and coverage of chemical space for carcinogenicity and mutagenicity. In particular the latest published DSSTox CPDBAS structure-data file includes a number of species-specific and summary activity fields, including a species-specific normalized score for carcinogenic potency (TD50) and various weighted summary activities. These data are being incorporated into PubChem to provide broad
Padliya, Neerav D; Garrett, Wesley M; Campbell, Kimberly B; Tabb, David L; Cooper, Bret
2007-11-01
LC-MS/MS has demonstrated potential for detecting plant pathogens. Unlike PCR or ELISA, LC-MS/MS does not require pathogen-specific reagents for the detection of pathogen-specific proteins and peptides. However, the MS/MS approach we and others have explored does require a protein sequence reference database and database-search software to interpret tandem mass spectra. To evaluate the limitations of database composition on pathogen identification, we analyzed proteins from cultured Ustilago maydis, Phytophthora sojae, Fusarium graminearum, and Rhizoctonia solani by LC-MS/MS. When the search database did not contain sequences for a target pathogen, or contained sequences to related pathogens, target pathogen spectra were reliably matched to protein sequences from nontarget organisms, giving an illusion that proteins from nontarget organisms were identified. Our analysis demonstrates that when database-search software is used as part of the identification process, a paradox exists whereby additional sequences needed to detect a wide variety of possible organisms may lead to more cross-species protein matches and misidentification of pathogens.
Cadastral Database Positional Accuracy Improvement
NASA Astrophysics Data System (ADS)
Hashim, N. M.; Omar, A. H.; Ramli, S. N. M.; Omar, K. M.; Din, N.
2017-10-01
Positional Accuracy Improvement (PAI) is the process of refining the geometry of features in a geospatial dataset to improve their actual positions. The actual position relates to the absolute position in a specific coordinate system and to the relation with neighborhood features. With the growth of spatial technologies, especially Geographical Information Systems (GIS) and Global Navigation Satellite Systems (GNSS), a PAI campaign is inevitable, especially for legacy cadastral databases. Integrating a legacy dataset with a higher-accuracy dataset such as GNSS observations is a potential solution for improving the legacy dataset. However, merely merging both datasets will distort the relative geometry. The improved dataset should be further treated to minimize inherent errors and to fit it to the new, accurate dataset. The main focus of this study is to describe a method of angular-based Least Squares Adjustment (LSA) for the PAI of legacy datasets. The existing high-accuracy dataset, known as the National Digital Cadastral Database (NDCDB), is then used as a benchmark to validate the results. It was found that the proposed technique is well suited to the positional accuracy improvement of legacy spatial datasets.
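The paper's adjustment is angular-based; as a simpler stand-in for the same least-squares idea, the sketch below fits a four-parameter similarity transform (scale and rotation plus two shifts) to control points shared by the legacy and reference datasets, then applies it to any legacy coordinate. The coordinates are invented, and this is not the authors' NDCDB workflow.

```python
# Least-squares fit of x' = a*x - b*y + tx, y' = b*x + a*y + ty,
# which is linear in the unknowns (a, b, tx, ty).
import numpy as np

legacy   = np.array([[100.0, 200.0], [150.0, 260.0], [220.0, 210.0]])
accurate = np.array([[102.1, 201.9], [152.0, 262.1], [222.2, 212.0]])

rows = []
for (x, y) in legacy:
    rows.append([x, -y, 1.0, 0.0])   # equation for x'
    rows.append([y,  x, 0.0, 1.0])   # equation for y'
A = np.array(rows)
L = accurate.reshape(-1)
(a, b, tx, ty), *_ = np.linalg.lstsq(A, L, rcond=None)

def transform(pt):
    """Apply the fitted transform to one legacy coordinate pair."""
    x, y = pt
    return np.array([a * x - b * y + tx, b * x + a * y + ty])

print(transform([130.0, 240.0]))   # improved position for a legacy point
```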
Nakayama, Takeo; Imanaka, Yuichi; Okuno, Yasushi; Kato, Genta; Kuroda, Tomohiro; Goto, Rei; Tanaka, Shiro; Tamura, Hiroshi; Fukuhara, Shunichi; Fukuma, Shingo; Muto, Manabu; Yanagita, Motoko; Yamamoto, Yosuke
2017-06-06
As Japan becomes a super-aging society, presentation of the best ways to provide medical care for the elderly, and the direction of that care, are important national issues. Elderly people have multi-morbidity, with numerous medical conditions, and use many medical resources through complex treatment patterns. This increases the likelihood of inappropriate medical practices and an evidence-practice gap. The present study aimed to: derive findings that are applicable to policy from an elucidation of the actual state of medical care for the elderly; establish a foundation for the utilization of the National Database of Health Insurance Claims and Specific Health Checkups of Japan (NDB); and present measures for the utilization of existing databases in parallel with NDB validation. Cross-sectional and retrospective cohort studies were conducted using the NDB built by the Ministry of Health, Labour and Welfare of Japan, private health insurance claims databases, and the Kyoto University Hospital database (including related hospitals). Medical practices (drug prescription, interventional procedures, testing) related to four issues (potentially inappropriate medication, cancer therapy, chronic kidney disease treatment, and end-of-life care) will be described. The relationships between these issues and clinical outcomes (death, initiation of dialysis and other adverse events) will be evaluated, if possible.
Assessment of COPD-related outcomes via a national electronic medical record database.
Asche, Carl; Said, Quayyim; Joish, Vijay; Hall, Charles Oaxaca; Brixner, Diana
2008-01-01
The technology and sophistication of healthcare utilization databases have expanded over the last decade to include results of lab tests, vital signs, and other clinical information. This review assesses the methodological and analytical challenges of conducting chronic obstructive pulmonary disease (COPD) outcomes research in a national electronic medical records (EMR) dataset, its potential application to the assessment of national health policy issues, and the associated limitations. An EMR database and its application to measuring outcomes for COPD are described. The ability to measure, in this database, adherence to the evidence-based COPD practice guidelines generated by the NIH and to HEDIS quality indicators was examined. Case studies, before and after their publication, were used to assess adherence to guidelines and gauge conformity to quality indicators. The EMR was the only source of information for pulmonary function tests, but the low frequency of test ordering by primary care was an issue. The EMR data can be used to explore the impact of variation in healthcare provision on clinical outcomes. The EMR database permits access to specific lab data and biometric information. The richness and depth of information on "real world" use of health services for large population-based analytical studies at relatively low cost render such databases an attractive resource for outcomes research. Various sources of information exist for performing outcomes research. It is important to understand the desired endpoints of such research and choose the appropriate database source.
Batista Rodríguez, Gabriela; Balla, Andrea; Corradetti, Santiago; Martinez, Carmen; Hernández, Pilar; Bollo, Jesús; Targarona, Eduard M
2018-06-01
"Big data" refers to large amount of dataset. Those large databases are useful in many areas, including healthcare. The American College of Surgeons National Surgical Quality Improvement Program (ACS-NSQIP) and the National Inpatient Sample (NIS) are big databases that were developed in the USA in order to record surgical outcomes. The aim of the present systematic review is to evaluate the type and clinical impact of the information retrieved through NISQP and NIS big database articles focused on laparoscopic colorectal surgery. A systematic review was conducted using The Meta-Analysis Of Observational Studies in Epidemiology (MOOSE) guidelines. The research was carried out on PubMed database and revealed 350 published papers. Outcomes of articles in which laparoscopic colorectal surgery was the primary aim were analyzed. Fifty-five studies, published between 2007 and February 2017, were included. Articles included were categorized in groups according to the main topic as: outcomes related to surgical technique comparisons, morbidity and perioperatory results, specific disease-related outcomes, sociodemographic disparities, and academic training impact. NSQIP and NIS databases are just the tip of the iceberg for the potential application of Big Data technology and analysis in MIS. Information obtained through big data is useful and could be considered as external validation in those situations where a significant evidence-based medicine exists; also, those databases establish benchmarks to measure the quality of patient care. Data retrieved helps to inform decision-making and improve healthcare delivery.
USE OF EXISTING DATABASES FOR THE PURPOSE OF HAZARD IDENTIFICATION: AN EXAMPLE
Keywords: existing databases, hazard identification, cancer mortality, birth malformations
Background: Associations between adverse health effects and environmental exposures are difficult to study, because exposures may be widespread, low-dose in nature, and common thro...
MTO-like reference mask modeling for advanced inverse lithography technology patterns
NASA Astrophysics Data System (ADS)
Park, Jongju; Moon, Jongin; Son, Suein; Chung, Donghoon; Kim, Byung-Gook; Jeon, Chan-Uk; LoPresti, Patrick; Xue, Shan; Wang, Sonny; Broadbent, Bill; Kim, Soonho; Hur, Jiuk; Choo, Min
2017-07-01
Advanced Inverse Lithography Technology (ILT) can result in mask post-OPC databases with very small address units, all-angle figures, and very high vertex counts. This creates mask inspection issues for existing mask inspection database rendering. These issues include: large data volumes, low transfer rate, long data preparation times, slow inspection throughput, and marginal rendering accuracy leading to high false detections. This paper demonstrates the application of a new rendering method including a new OASIS-like mask inspection format, new high-speed rendering algorithms, and related hardware to meet the inspection challenges posed by Advanced ILT masks.
Ontological interpretation of biomedical database content.
Santana da Silva, Filipe; Jansen, Ludger; Freitas, Fred; Schulz, Stefan
2017-06-26
Biological databases store data about laboratory experiments, together with semantic annotations, in order to support data aggregation and retrieval. The exact meaning of such annotations in the context of a database record is often ambiguous. We address this problem by grounding implicit and explicit database content in a formal-ontological framework. By using a typical extract from the databases UniProt and Ensembl, annotated with content from GO, PR, ChEBI and NCBI Taxonomy, we created four ontological models (in OWL), which generate explicit, distinct interpretations under the BioTopLite2 (BTL2) upper-level ontology. The first three models interpret database entries as individuals (IND), defined classes (SUBC), and classes with dispositions (DISP), respectively; the fourth model (HYBR) is a combination of SUBC and DISP. For the evaluation of these four models, we consider (i) database content retrieval, using ontologies as query vocabulary; (ii) information completeness; and, (iii) DL complexity and decidability. The models were tested under these criteria against four competency questions (CQs). IND does not raise any ontological claim, besides asserting the existence of sample individuals and relations among them. Modelling patterns have to be created for each type of annotation referent. SUBC is interpreted regarding maximally fine-grained defined subclasses under the classes referred to by the data. DISP attempts to extract truly ontological statements from the database records, claiming the existence of dispositions. HYBR is a hybrid of SUBC and DISP and is more parsimonious regarding expressiveness and query answering complexity. For each of the four models, the four CQs were submitted as DL queries. This shows the ability to retrieve individuals with IND, and classes in SUBC and HYBR. DISP does not retrieve anything because the axioms with disposition are embedded in General Class Inclusion (GCI) statements. Ambiguity of biological database content is addressed by a method that identifies implicit knowledge behind semantic annotations in biological databases and grounds it in an expressive upper-level ontology. The result is a seamless representation of database structure, content and annotations as OWL models.
Autonomous mission planning and scheduling: Innovative, integrated, responsive
NASA Technical Reports Server (NTRS)
Sary, Charisse; Liu, Simon; Hull, Larry; Davis, Randy
1994-01-01
Autonomous mission scheduling, a new concept for NASA ground data systems, is a decentralized and distributed approach to scientific spacecraft planning, scheduling, and command management. Systems and services are provided that enable investigators to operate their own instruments. In autonomous mission scheduling, separate nodes exist for each instrument and one or more operations nodes exist for the spacecraft. Each node is responsible for its own operations which include planning, scheduling, and commanding; and for resolving conflicts with other nodes. One or more database servers accessible to all nodes enable each to share mission and science planning, scheduling, and commanding information. The architecture for autonomous mission scheduling is based upon a realistic mix of state-of-the-art and emerging technology and services, e.g., high performance individual workstations, high speed communications, client-server computing, and relational databases. The concept is particularly suited to the smaller, less complex missions of the future.
Fahy, Michael; Doyle, Orla; Denny, Kevin; McAuliffe, Fionnuala M; Robson, Michael
2013-05-01
Increasing birth rates have raised questions for policy makers and hospital management about the economic costs of childbirth. The purpose of this article is to identify and review all existing scientific studies in relation to the economic costs of alternative modes of childbirth delivery and to highlight deficiencies in the existing scientific research. We searched Cochrane, Centre for Reviews and Dissemination, EconLit, the Excerpta Medica Database, the Health Economic Evaluations Database, MEDLINE and PubMed. Thirty articles are included in this review. The main findings suggest that there is no internationally acceptable childbirth cost and clinical outcome classification system that allows for comparisons across different delivery modes. This review demonstrates that a better understanding and classification of the costs and associated clinical outcomes of childbirth is required to allow for valid comparisons between maternity units, and to inform policy makers and hospital management. © 2013 The Authors Acta Obstetricia et Gynecologica Scandinavica © 2013 Nordic Federation of Societies of Obstetrics and Gynecology.
Teaching Case: Adapting the Access Northwind Database to Support a Database Course
ERIC Educational Resources Information Center
Dyer, John N.; Rogers, Camille
2015-01-01
A common problem encountered when teaching database courses is that few large illustrative databases exist to support teaching and learning. Most database textbooks have small "toy" databases that are chapter objective specific, and thus do not support application over the complete domain of design, implementation and management concepts…
BBN technical memorandum W1291 infrasound model feasibility study
DOE Office of Scientific and Technical Information (OSTI.GOV)
Farrell, T., BBN Systems and Technologies
1998-05-01
The purpose of this study is to determine the need and level of effort required to add existing atmospheric databases and infrasound propagation models to the DOE's Hydroacoustic Coverage Assessment Model (HydroCAM) [1,2]. The rationale for the study is that the performance of the infrasound monitoring network will be an important factor for both the International Monitoring System (IMS) and US national monitoring capability. Many of the technical issues affecting the design and performance of the infrasound network are directly related to the variability of the atmosphere and the corresponding uncertainties in infrasound propagation. It is clear that the study of these issues will be enhanced by the availability of software tools for easy manipulation and interfacing of various atmospheric databases and infrasound propagation models. In addition, since there are many similarities between propagation in the oceans and in the atmosphere, it is anticipated that much of the software infrastructure developed for hydroacoustic database manipulation and propagation modeling in HydroCAM will be directly extendible to an infrasound capability. The study approach was to talk to the acknowledged domain experts in the infrasound monitoring area to determine: 1. The major technical issues affecting infrasound monitoring network performance. 2. The need for an atmospheric database/infrasound propagation modeling capability similar to HydroCAM. 3. The state of existing infrasound propagation codes and atmospheric databases. 4. A recommended approach for developing the required capabilities. A list of the people who contributed information to this study is provided in Table 1. We also relied on our knowledge of oceanographic and meteorological data sources to determine the availability of atmospheric databases and the feasibility of incorporating this information into the existing HydroCAM geographic database software. This report presents a summary of the need for an integrated infrasound modeling capability in Section 2.0. Section 3.0 provides a recommended approach for developing this capability in two stages: a basic capability and an extended capability. This section includes a discussion of the available static and dynamic databases, and the various modeling tools which are available or could be developed under such a task. The conclusions and recommendations of the study are provided in Section 4.0.
Lindsley, Kristina; Li, Tianjing; Ssemanda, Elizabeth; Virgili, Gianni; Dickersin, Kay
2016-04-01
Are existing systematic reviews of interventions for age-related macular degeneration incorporated into clinical practice guidelines? High-quality systematic reviews should be used to underpin evidence-based clinical practice guidelines and clinical care. We examined the reliability of systematic reviews of interventions for age-related macular degeneration (AMD) and described the main findings of reliable reviews in relation to clinical practice guidelines. Eligible publications were systematic reviews of the effectiveness of treatment interventions for AMD. We searched a database of systematic reviews in eyes and vision without language or date restrictions; the database was up to date as of May 6, 2014. Two authors independently screened records for eligibility and abstracted and assessed the characteristics and methods of each review. We classified reviews as reliable when they reported eligibility criteria, comprehensive searches, methodologic quality of included studies, appropriate statistical methods for meta-analysis, and conclusions based on results. We mapped treatment recommendations from the American Academy of Ophthalmology (AAO) Preferred Practice Patterns (PPPs) for AMD to systematic reviews and citations of reliable systematic reviews to support each treatment recommendation. Of 1570 systematic reviews in our database, 47 met inclusion criteria; most targeted neovascular AMD and investigated anti-vascular endothelial growth factor (VEGF) interventions, dietary supplements, or photodynamic therapy. We classified 33 (70%) reviews as reliable. The quality of reporting varied, with criteria for reliable reporting met more often by Cochrane reviews and reviews whose authors disclosed conflicts of interest. Anti-VEGF agents and photodynamic therapy were the only interventions identified as effective by reliable reviews. Of 35 treatment recommendations extracted from the PPPs, 15 could have been supported with reliable systematic reviews; however, only 1 recommendation cited a reliable intervention systematic review. No reliable systematic review was identified for 20 treatment recommendations, highlighting areas of evidence gaps. For AMD, reliable systematic reviews exist for many treatment recommendations in the AAO PPPs and should be cited to support these recommendations. We also identified areas where no high-level evidence exists. Mapping clinical practice guidelines to existing systematic reviews is one way to highlight areas where evidence generation or evidence synthesis is either available or needed. Copyright © 2016 American Academy of Ophthalmology. Published by Elsevier Inc. All rights reserved.
Development of an electronic database for Acute Pain Service outcomes
Love, Brandy L; Jensen, Louise A; Schopflocher, Donald; Tsui, Ban CH
2012-01-01
BACKGROUND: Quality assurance is increasingly important in the current health care climate. An electronic database can be used for tracking patient information and as a research tool to provide quality assurance for patient care. OBJECTIVE: An electronic database was developed for the Acute Pain Service, University of Alberta Hospital (Edmonton, Alberta) to record patient characteristics, identify at-risk populations, compare treatment efficacies and guide practice decisions. METHOD: Steps in the database development involved identifying the goals for use, relevant variables to include, and a plan for data collection, entry and analysis. Protocols were also created for data cleaning quality control. The database was evaluated with a pilot test using existing data to assess data collection burden, accuracy and functionality of the database. RESULTS: A literature review resulted in an evidence-based list of demographic, clinical and pain management outcome variables to include. Time to assess patients and collect the data was 20 min to 30 min per patient. Limitations were primarily software related, although initial data collection completion was only 65% and accuracy of data entry was 96%. CONCLUSIONS: The electronic database was found to be relevant and functional for the identified goals of data storage and research. PMID:22518364
48 CFR 5.601 - Governmentwide database of contracts.
Code of Federal Regulations, 2014 CFR
2014-10-01
... 48 Federal Acquisition Regulations System 1 2014-10-01 2014-10-01 false Governmentwide database of... database of contracts. (a) A Governmentwide database of contracts and other procurement instruments.../contractdirectory/. This searchable database is a tool that may be used to identify existing contracts and other...
48 CFR 5.601 - Governmentwide database of contracts.
Code of Federal Regulations, 2011 CFR
2011-10-01
... 48 Federal Acquisition Regulations System 1 2011-10-01 2011-10-01 false Governmentwide database of... database of contracts. (a) A Governmentwide database of contracts and other procurement instruments.../contractdirectory/. This searchable database is a tool that may be used to identify existing contracts and other...
48 CFR 5.601 - Governmentwide database of contracts.
Code of Federal Regulations, 2012 CFR
2012-10-01
... 48 Federal Acquisition Regulations System 1 2012-10-01 2012-10-01 false Governmentwide database of... database of contracts. (a) A Governmentwide database of contracts and other procurement instruments.../contractdirectory/. This searchable database is a tool that may be used to identify existing contracts and other...
48 CFR 5.601 - Governmentwide database of contracts.
Code of Federal Regulations, 2013 CFR
2013-10-01
... 48 Federal Acquisition Regulations System 1 2013-10-01 2013-10-01 false Governmentwide database of... database of contracts. (a) A Governmentwide database of contracts and other procurement instruments.../contractdirectory/. This searchable database is a tool that may be used to identify existing contracts and other...
ERIC Educational Resources Information Center
DeLong, Richard A.
1984-01-01
Unusually hard hit by the 1970s recession, the University of Michigan accumulated more deferred maintenance problems than could be analyzed efficiently either by hand or with existing computer systems. Using an existing microcomputer and a database management software package, the maintenance service developed its own database to support…
Quadcopter Control Using Speech Recognition
NASA Astrophysics Data System (ADS)
Malik, H.; Darma, S.; Soekirno, S.
2018-04-01
This research reports a comparison of the success rates of a speech recognition system, used for quadcopter motion control, trained on two types of databases: an existing database and a newly created one. The speech recognition system used the Mel-frequency cepstral coefficient (MFCC) method for feature extraction and was trained with a recurrent neural network (RNN). MFCC is one of the feature extraction methods most commonly used for speech recognition, with reported success rates of 80%-95%. The existing database was used to measure the success rate of the RNN method. The new database was created in the Indonesian language, and its success rate was compared with the results from the existing database. Sound input from the microphone was processed on a DSP module with the MFCC method to obtain the characteristic values. These characteristic values were then fed to the trained RNN, whose output was a command. The command became a control input to a single-board computer (SBC), whose output was the movement of the quadcopter. On the SBC, we used the Robot Operating System (ROS) as the operating environment.
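To make the pipeline above concrete, here is a minimal sketch of the MFCC feature-extraction step, written with the librosa library as an assumption for illustration (the authors computed MFCCs on a DSP module, not in Python); the file name is hypothetical.

```python
# Minimal MFCC sketch, assuming librosa is installed; the original system
# computed MFCCs on a DSP module, so this only illustrates the idea.
import librosa
import numpy as np

def extract_mfcc(wav_path, n_mfcc=13, sr=16000):
    """Return a fixed-length MFCC feature vector for one recorded command."""
    signal, rate = librosa.load(wav_path, sr=sr)  # load and resample
    mfcc = librosa.feature.mfcc(y=signal, sr=rate, n_mfcc=n_mfcc)
    return np.mean(mfcc, axis=1)  # average over time for a simple classifier

# features = extract_mfcc("takeoff.wav")  # hypothetical command recording
```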
Methods for structuring scientific knowledge from many areas related to aging research.
Zhavoronkov, Alex; Cantor, Charles R
2011-01-01
Aging and age-related disease represent a substantial share of current natural, social and behavioral science research efforts. Presently, no centralized system exists for tracking aging research projects across numerous research disciplines. The multidisciplinary nature of this research complicates the understanding of underlying project categories, the establishment of project relations, and the development of a unified project classification scheme. We have developed a highly visual database, the International Aging Research Portfolio (IARP), available at AgingPortfolio.org, to address this issue. The database integrates information on research grants, peer-reviewed publications, and issued patent applications from multiple sources. Additionally, the database uses flexible project classification mechanisms and tools for analyzing project associations and trends. This system enables scientists to search the centralized project database, to classify and categorize aging projects, and to analyze funding across multiple research disciplines. The IARP is designed to provide improved allocation and prioritization of scarce research funding, to reduce project overlap, and to improve scientific collaboration, thereby accelerating scientific and medical progress in a rapidly growing area of research. Grant applications often precede publications, and some grants do not result in publications; thus, this system offers an earlier and broader view of research activity in many research disciplines. This project is a first attempt to provide a centralized database system for research grants and to categorize aging research projects into multiple subcategories utilizing both advanced machine algorithms and a hierarchical environment for scientific collaboration.
Duda, Jeffrey J.; Wieferich, Daniel J.; Bristol, R. Sky; Bellmore, J. Ryan; Hutchison, Vivian B.; Vittum, Katherine M.; Craig, Laura; Warrick, Jonathan A.
2016-08-18
The removal of dams has recently increased over historical levels due to aging infrastructure, changing societal needs, and modern safety standards rendering some dams obsolete. Where possibilities for river restoration, or improved safety, exceed the benefits of retaining a dam, removal is more often being considered as a viable option. Yet, as this is a relatively new development in the history of river management, science is just beginning to guide our understanding of the physical and ecological implications of dam removal. Ultimately, the “lessons learned” from previous scientific studies on the outcomes of dam removal could inform future scientific understanding of ecosystem outcomes, as well as aid in decision-making by stakeholders. We created a database visualization tool, the Dam Removal Information Portal (DRIP), to display map-based, interactive information about the scientific studies associated with dam removals. Serving both as a bibliographic source as well as a link to other existing databases like the National Hydrography Dataset, the derived National Dam Removal Science Database serves as the foundation for a Web-based application that synthesizes the existing scientific studies associated with dam removals. Thus, using the DRIP application, users can explore information about completed dam removal projects (for example, their location, height, and date removed), as well as discover sources and details of associated scientific studies. As such, DRIP is intended to be a dynamic collection of scientific information related to dams that have been removed in the United States and elsewhere. This report describes the architecture and concepts of this “metaknowledge” database and the DRIP visualization tool.
NASA Technical Reports Server (NTRS)
Abiteboul, Serge
1997-01-01
The amount of data of all kinds available electronically has increased dramatically in recent years. The data resides in different forms, ranging from unstructured data in file systems to highly structured data in relational database systems. Data is accessible through a variety of interfaces including Web browsers, database query languages, application-specific interfaces, or data exchange formats. Some of this data is raw data, e.g., images or sound. Some of it has structure even if the structure is often implicit, and not as rigid or regular as that found in standard database systems. Sometimes the structure exists but has to be extracted from the data. Sometimes also it exists but we prefer to ignore it for certain purposes such as browsing. We use the term semi-structured data here for data that is (from a particular viewpoint) neither raw data nor strictly typed, i.e., not table-oriented as in a relational model or sorted-graph as in object databases. As will be seen later, when the notion of semi-structured data is more precisely defined, the need for semi-structured data arises naturally in the context of data integration, even when the data sources are themselves well-structured. Although data integration is an old topic, the need to integrate a wider variety of data formats (e.g., SGML or ASN.1 data) and data found on the Web has brought the topic of semi-structured data to the forefront of research. The main purpose of the paper is to isolate the essential aspects of semi-structured data. We also survey some proposals of models and query languages for semi-structured data. In particular, we consider recent works at Stanford U. and U. Penn on semi-structured data. In both cases, the motivation is found in the integration of heterogeneous data.
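The distinction the abstract draws between strictly typed and semi-structured data is easy to see in a small example; the records below are invented for illustration.

```python
# Illustrative semi-structured records: the same conceptual entity, but
# fields vary in presence and shape, unlike rows of a relational table.
person_a = {"name": "Alice", "email": "alice@example.org"}
person_b = {
    "name": {"first": "Bob", "last": "Smith"},          # structured name
    "emails": ["bob@example.org", "bs@work.example"],   # repeated field
    "homepage": "http://example.org/~bob",              # absent in person_a
}

# A query over semi-structured data must tolerate the irregularity
# instead of assuming a fixed schema.
def get_emails(record):
    value = record.get("email") or record.get("emails") or []
    return [value] if isinstance(value, str) else list(value)

print(get_emails(person_a), get_emails(person_b))
```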
Planners and decision makers are challenged to consider not only direct market costs, but also ecological externalities. There is an increasing emphasis on ecosystem services in the context of human well-being, and therefore the valuation and accounting of ecosystem services is b...
Is the Class Schedule the Only Difference between Morning and Afternoon Shift Schools in Mexico?
ERIC Educational Resources Information Center
Cardenas Denham, Sergio
2009-01-01
Double-shift schooling has been implemented in Mexico for several decades as a strategy to achieve universal access to basic education. This study provides evidence on the existence of social inequalities related to the implementation of this schooling model. Using quantitative data from several databases including the National Census, the…
Son, K H; Lim, N-K; Lee, J-W; Cho, M-C; Park, H-Y
2015-01-01
Aims: To evaluate the effects of gestational diabetes and pre-existing diabetes on maternal morbidity and medical costs, using data from the Korea National Health Insurance Claims Database of the Health Insurance Review and Assessment Service. Methods: Delivery cases in 2010, 2011 and 2012 (459 842, 442 225 and 380 431 deliveries) were extracted from the Health Insurance Review and Assessment Service database. The complications and medical costs were compared among the following three pregnancy groups: normal, gestational diabetes and pre-existing diabetes. Results: Although the rates of pre-existing diabetes did not fluctuate (2.5, 2.4 and 2.7%) throughout the study, the rate of gestational diabetes steadily increased (4.6, 6.2 and 8.0%). Furthermore, the rates of pre-existing diabetes and gestational diabetes increased in conjunction with maternal age, pre-existing hypertension and cases of multiple pregnancy. The risks of pregnancy-induced hypertension, urinary tract infections, premature delivery, liver disease and chronic renal disease were greater in the gestational diabetes and pre-existing diabetes groups than in the normal group. The risks of venous thromboembolism, antepartum haemorrhage, shoulder dystocia and placenta disorder were greater in the pre-existing diabetes group, but not the gestational diabetes group, compared with the normal group. The medical costs associated with delivery, the costs during pregnancy and the number of in-hospital days for the subjects in the pre-existing diabetes group were the highest among the three groups. Conclusions: The study showed that the rates of pre-existing diabetes and gestational diabetes increased with maternal age at pregnancy and were associated with increases in medical costs and pregnancy-related complications. PMID:25472691
Palaeo-sea-level and palaeo-ice-sheet databases: problems, strategies, and perspectives
NASA Astrophysics Data System (ADS)
Düsterhus, André; Rovere, Alessio; Carlson, Anders E.; Horton, Benjamin P.; Klemann, Volker; Tarasov, Lev; Barlow, Natasha L. M.; Bradwell, Tom; Clark, Jorie; Dutton, Andrea; Gehrels, W. Roland; Hibbert, Fiona D.; Hijma, Marc P.; Khan, Nicole; Kopp, Robert E.; Sivan, Dorit; Törnqvist, Torbjörn E.
2016-04-01
Sea-level and ice-sheet databases have driven numerous advances in understanding the Earth system. We describe the challenges and offer best strategies that can be adopted to build self-consistent and standardised databases of geological and geochemical information used to archive palaeo-sea-levels and palaeo-ice-sheets. There are three phases in the development of a database: (i) measurement, (ii) interpretation, and (iii) database creation. Measurement should include the objective description of the position and age of a sample, description of associated geological features, and quantification of uncertainties. Interpretation of the sample may have a subjective component, but it should always include uncertainties and alternative or contrasting interpretations, with any exclusion of existing interpretations requiring a full justification. During the creation of a database, an approach based on accessibility, transparency, trust, availability, continuity, completeness, and communication of content (ATTAC3) must be adopted. It is essential to consider the community that creates and benefits from a database. We conclude that funding agencies should not only consider the creation of original data in specific research-question-oriented projects, but also include the possibility of using part of the funding for IT-related and database creation tasks, which are essential to guarantee accessibility and maintenance of the collected data.
A comparative study of six European databases of medically oriented Web resources.
Abad García, Francisca; González Teruel, Aurora; Bayo Calduch, Patricia; de Ramón Frias, Rosa; Castillo Blasco, Lourdes
2005-10-01
The paper describes six European medically oriented databases of Web resources, pertaining to five quality-controlled subject gateways, and compares their performance. The characteristics, coverage, procedure for selecting Web resources, record structure, searching possibilities, and existence of user assistance were described for each database. Performance indicators for each database were obtained by means of searches carried out using the key words, "myocardial infarction." Most of the databases originated in the 1990s in an academic or library context and include all types of Web resources of an international nature. Five databases use Medical Subject Headings. The number of fields per record varies between three and nineteen. The language of the search interfaces is mostly English, and some of them allow searches in other languages. In some databases, the search can be extended to Pubmed. Organizing Medical Networked Information, Catalogue et Index des Sites Médicaux Francophones, and Diseases, Disorders and Related Topics produced the best results. The usefulness of these databases as quick reference resources is clear. In addition, their lack of content overlap means that, for the user, they complement each other. Their continued survival faces three challenges: the instability of the Internet, maintenance costs, and lack of use in spite of their potential usefulness.
ARCTOS: a relational database relating specimens, specimen-based science, and archival documentation
Jarrell, Gordon H.; Ramotnik, Cindy A.; McDonald, D.L.
2010-01-01
Data are preserved when they are perpetually discoverable, but even in the Information Age, discovery of legacy data appropriate to particular investigations is uncertain. Secure Internet storage is necessary but insufficient. Data can be discovered only when they are adequately described, and visibility increases markedly if the data are related to other data that are receiving usage. Such relationships can be built (1) within the framework of a relational database, or (2) among separate resources, within the framework of the Internet. Evolving primarily around biological collections, Arctos is a database that does both of these tasks. It includes data structures for a diversity of specimen attributes, essentially all collection-management tasks, plus literature citations, project descriptions, etc. As a centralized collaboration of several university museums, Arctos is an ideal environment for capitalizing on the many relationships that often exist between items in separate collections. Arctos is related to NIH’s DNA-sequence repository (GenBank) with record-to-record reciprocal linkages, and it serves data to several discipline-specific web portals, including the Global Biodiversity Information Facility (GBIF). The University of Alaska Museum’s paleontological collection is Arctos’s recent extension beyond the constraints of neontology. The database holds about 1.3 million cataloged items, and additional collections are added each year.
Therrell, Bradford L
2003-01-01
At birth, patient demographic and health information begin to accumulate in varied databases. There are often multiple sources of the same or similar data. New public health programs are often created without considering data linkages. Recently, newborn hearing screening (NHS) programs and immunization programs have virtually ignored the existence of newborn dried blood spot (DBS) screening databases containing similar demographic data, creating data duplication in their 'new' systems. Some progressive public health departments are developing data warehouses of basic, recurrent patient information, and linking these databases to other health program databases where programs and services can benefit from such linkages. Demographic data warehousing saves time (and money) by eliminating duplicative data entry and reducing the chances of data errors. While newborn screening data are usually the first data available, they should not be the only data source considered for early data linkage or for populating a data warehouse. Birth certificate information should also be considered, along with other data sources, for infants that may not have received newborn screening or who may have been born outside of the jurisdiction and not have birth certificate information locally available. A newborn screening serial number provides a convenient identification number for use in the DBS program and for linking with other systems. As a minimum, data linkages should exist between newborn dried blood spot screening, newborn hearing screening, immunizations, birth certificates and birth defect registries.
Experiment on building Sundanese lexical database based on WordNet
NASA Astrophysics Data System (ADS)
Dewi Budiwati, Sari; Nurani Setiawan, Novihana
2018-03-01
Sundanese is the second most widely used local language in Indonesia. Currently, Sundanese is rarely used in everyday conversation, since Indonesian serves as the national language. We built a Sundanese lexical database based on WordNet and the Indonesian WordNet as an alternative way to preserve the language as part of the local culture. WordNet was chosen because the Sundanese language has three levels of word delivery, called the language code of conduct. Web user participants were involved in this research to specify Sundanese semantic relations, and an expert linguist validated the relations. The merge methodology was implemented in this experiment. Some words have equivalents in WordNet, while others do not, since some words do not exist in other cultures.
Distributed structure-searchable toxicity (DSSTox) public database network: a proposal.
Richard, Ann M; Williams, ClarLynda R
2002-01-29
The ability to assess the potential genotoxicity, carcinogenicity, or other toxicity of pharmaceutical or industrial chemicals based on chemical structure information is a highly coveted and shared goal of varied academic, commercial, and government regulatory groups. These diverse interests often employ different approaches and have different criteria and use for toxicity assessments, but they share a need for unrestricted access to existing public toxicity data linked with chemical structure information. Currently, there exists no central repository of toxicity information, commercial or public, that adequately meets the data requirements for flexible analogue searching, Structure-Activity Relationship (SAR) model development, or building of chemical relational databases (CRD). The distributed structure-searchable toxicity (DSSTox) public database network is being proposed as a community-supported, web-based effort to address these shared needs of the SAR and toxicology communities. The DSSTox project has the following major elements: (1) to adopt and encourage the use of a common standard file format (structure data file (SDF)) for public toxicity databases that includes chemical structure, text and property information, and that can easily be imported into available CRD applications; (2) to implement a distributed source approach, managed by a DSSTox Central Website, that will enable decentralized, free public access to structure-toxicity data files, and that will effectively link knowledgeable toxicity data sources with potential users of these data from other disciplines (such as chemistry, modeling, and computer science); and (3) to engage public/commercial/academic/industry groups in contributing to and expanding this community-wide, public data sharing and distribution effort. The DSSTox project's overall aims are to effect the closer association of chemical structure information with existing toxicity data, and to promote and facilitate structure-based exploration of these data within a common chemistry-based framework that spans toxicological disciplines.
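As a concrete illustration of the proposed SDF-based exchange, the sketch below reads a structure-data file with RDKit; RDKit, the file name, and the property tag are our assumptions for illustration, since the proposal itself is toolkit-agnostic.

```python
# Sketch of reading a structure-data file (SDF) of the kind DSSTox proposes,
# using RDKit (our choice for illustration; the file name and the property
# tag "CARCINOGENICITY" are hypothetical).
from rdkit import Chem

supplier = Chem.SDMolSupplier("dsstox_sample.sdf")  # hypothetical file
for mol in supplier:
    if mol is None:                 # skip unparsable records
        continue
    props = mol.GetPropsAsDict()    # text/property fields stored with the structure
    smiles = Chem.MolToSmiles(mol)  # canonical structure representation
    print(smiles, props.get("CARCINOGENICITY"))
```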
Mining functionally relevant gene sets for analyzing physiologically novel clinical expression data.
Turcan, Sevin; Vetter, Douglas E; Maron, Jill L; Wei, Xintao; Slonim, Donna K
2011-01-01
Gene set analyses have become a standard approach for increasing the sensitivity of transcriptomic studies. However, analytical methods incorporating gene sets require the availability of pre-defined gene sets relevant to the underlying physiology being studied. For novel physiological problems, relevant gene sets may be unavailable or existing gene set databases may bias the results towards only the best-studied of the relevant biological processes. We describe a successful attempt to mine novel functional gene sets for translational projects where the underlying physiology is not necessarily well characterized in existing annotation databases. We choose targeted training data from public expression data repositories and define new criteria for selecting biclusters to serve as candidate gene sets. Many of the discovered gene sets show little or no enrichment for informative Gene Ontology terms or other functional annotation. However, we observe that such gene sets show coherent differential expression in new clinical test data sets, even if derived from different species, tissues, and disease states. We demonstrate the efficacy of this method on a human metabolic data set, where we discover novel, uncharacterized gene sets that are diagnostic of diabetes, and on additional data sets related to neuronal processes and human development. Our results suggest that our approach may be an efficient way to generate a collection of gene sets relevant to the analysis of data for novel clinical applications where existing functional annotation is relatively incomplete.
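One simple way to operationalize the "coherent differential expression" the authors observe is a mean pairwise correlation score over the genes of a candidate set; this statistic is our illustrative choice, not necessarily the authors' measure.

```python
# Score the expression coherence of a candidate gene set as the mean
# pairwise Pearson correlation across samples. Illustrative only; the
# authors' bicluster-selection criteria may differ.
import numpy as np

def coherence_score(expr):
    """expr: (genes x samples) array for the genes in one candidate set."""
    corr = np.corrcoef(expr)                          # gene-by-gene correlations
    upper = corr[np.triu_indices_from(corr, k=1)]     # unique gene pairs
    return float(np.mean(upper))

rng = np.random.default_rng(0)
print(coherence_score(rng.normal(size=(10, 40))))     # near 0 for random genes
```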
A Chronostratigraphic Relational Database Ontology
NASA Astrophysics Data System (ADS)
Platon, E.; Gary, A.; Sikora, P.
2005-12-01
A chronostratigraphic research database was donated by British Petroleum to the Stratigraphy Group at the Energy and Geoscience Institute (EGI), University of Utah. These data consist of over 2,000 measured sections representing over three decades of research into the application of the graphic correlation method. The data are global and include both microfossil (foraminifera, calcareous nannoplankton, spores, pollen, dinoflagellate cysts, etc.) and macrofossil data. The objective of the donation was to make the research data available to the public in order to encourage additional chronostratigraphy studies, specifically regarding graphic correlation. As part of the National Science Foundation's Cyberinfrastructure for the Geosciences (GEON) initiative, these data have been made available to the public at http://css.egi.utah.edu. To encourage further research using the graphic correlation method, EGI has developed a software package, StrataPlot, that will soon be publicly available from the GEON website as a standalone software download. The EGI chronostratigraphy research database, although relatively large, has many data holes relative to some paleontological disciplines and geographical areas, so the challenge becomes how to expand the data available for chronostratigraphic studies using graphic correlation. There are several public or soon-to-be-public databases available to chronostratigraphic research, but they have their own data structures and modes of presentation. The heterogeneous nature of these database schemas hinders their integration and makes it difficult for the user to retrieve and consolidate potentially valuable chronostratigraphic data. The integration of these data sources would facilitate rapid and comprehensive data searches, thus helping advance studies in chronostratigraphy. The GEON project will host a number of databases within the geology domain, some of which contain biostratigraphic data. Ontologies are being developed to provide an integrated query system for searching across GEON's biostratigraphy databases, as well as databases available in the public domain. Although creating an ontology directly from the existing database metadata would have been effective and straightforward, our effort was directed towards creating a more efficient representation of our database, as well as a general representation of the biostratigraphic domain.
Promberger, Marianne; Marteau, Theresa M
2013-09-01
To review existing evidence on the potential of incentives to undermine or "crowd out" intrinsic motivation, in order to establish whether and when it predicts that financial incentives will crowd out motivation for health-related behaviors. We conducted a conceptual analysis to compare definitions and operationalizations of the effect, and reviewed existing evidence to identify potential moderators of the effect. In the psychological literature, we find strong evidence for an undermining effect of tangible rewards on intrinsic motivation for simple tasks when motivation manifest in behavior is initially high. In the economic literature, evidence for undermining effects exists for a broader variety of behaviors, in settings that involve a conflict of interest between parties. By contrast, for health-related behaviors, baseline levels of incentivized behaviors are usually low, and only a subset involve an interpersonal conflict of interest. Correspondingly, we find no evidence for crowding out of incentivized health behaviors. The existing evidence does not warrant a priori predictions that an undermining effect would be found for health-related behaviors. Health-related behaviors and incentive schemes differ greatly in moderating characteristics, which should be the focus of future research. PsycINFO Database Record (c) 2013 APA, all rights reserved.
Harris, Eric S J; Erickson, Sean D; Tolopko, Andrew N; Cao, Shugeng; Craycroft, Jane A; Scholten, Robert; Fu, Yanling; Wang, Wenquan; Liu, Yong; Zhao, Zhongzhen; Clardy, Jon; Shamu, Caroline E; Eisenberg, David M
2011-05-17
Ethnobotanically driven drug-discovery programs include data related to many aspects of the preparation of botanical medicines, from initial plant collection to chemical extraction and fractionation. The Traditional Medicine Collection Tracking System (TM-CTS) was created to organize and store data of this type for an international collaborative project involving the systematic evaluation of commonly used Traditional Chinese Medicinal plants. The system was developed using domain-driven design techniques, and is implemented using Java, Hibernate, PostgreSQL, Business Intelligence and Reporting Tools (BIRT), and Apache Tomcat. The TM-CTS relational database schema contains over 70 data types, comprising over 500 data fields. The system incorporates a number of unique features that are useful in the context of ethnobotanical projects such as support for information about botanical collection, method of processing, quality tests for plants with existing pharmacopoeia standards, chemical extraction and fractionation, and historical uses of the plants. The database also accommodates data provided in multiple languages and integration with a database system built to support high throughput screening based drug discovery efforts. It is accessed via a web-based application that provides extensive, multi-format reporting capabilities. This new database system was designed to support a project evaluating the bioactivity of Chinese medicinal plants. The software used to create the database is open source, freely available, and could potentially be applied to other ethnobotanically driven natural product collection and drug-discovery programs. Copyright © 2011 Elsevier Ireland Ltd. All rights reserved.
Harris, Eric S. J.; Erickson, Sean D.; Tolopko, Andrew N.; Cao, Shugeng; Craycroft, Jane A.; Scholten, Robert; Fu, Yanling; Wang, Wenquan; Liu, Yong; Zhao, Zhongzhen; Clardy, Jon; Shamu, Caroline E.; Eisenberg, David M.
2011-01-01
Aim of the study. Ethnobotanically-driven drug-discovery programs include data related to many aspects of the preparation of botanical medicines, from initial plant collection to chemical extraction and fractionation. The Traditional Medicine-Collection Tracking System (TM-CTS) was created to organize and store data of this type for an international collaborative project involving the systematic evaluation of commonly used Traditional Chinese Medicinal plants. Materials and Methods. The system was developed using domain-driven design techniques, and is implemented using Java, Hibernate, PostgreSQL, Business Intelligence and Reporting Tools (BIRT), and Apache Tomcat. Results. The TM-CTS relational database schema contains over 70 data types, comprising over 500 data fields. The system incorporates a number of unique features that are useful in the context of ethnobotanical projects such as support for information about botanical collection, method of processing, quality tests for plants with existing pharmacopoeia standards, chemical extraction and fractionation, and historical uses of the plants. The database also accommodates data provided in multiple languages and integration with a database system built to support high throughput screening based drug discovery efforts. It is accessed via a web-based application that provides extensive, multi-format reporting capabilities. Conclusions. This new database system was designed to support a project evaluating the bioactivity of Chinese medicinal plants. The software used to create the database is open source, freely available, and could potentially be applied to other ethnobotanically-driven natural product collection and drug-discovery programs. PMID:21420479
User’s guide and metadata for the PICES Nonindigenous Species Information System
Lee; Reusser, Deborah A.; Marko; Ranelletti
2012-01-01
The overall goal of both the database and Atlas was to simplify and standardize the dissemination of distributional, habitat, and life history characteristics of near-coastal and estuarine nonindigenous species. This database provides a means of querying these data and displaying the information in a consistent format. The specific classes of information the database captures include: regional and global ranges of native and nonindigenous near-coastal and estuarine species at different hierarchical spatial scales; habitat and physiological requirements of near-coastal and estuarine species; life history characteristics of near-coastal and estuarine species; and invasion history and vectors for nonindigenous species. These standardized and synthesized data in the database and the Atlas provide the basic information needed to address a number of managerial and scientific needs. Thus, users will be able to: create a baseline on the extent of invasion by region in order to assess new invasions; use existing geographical patterns of invasion to gain some insights into potential new invaders; use existing geographical patterns of invasion to gain some insights into mechanisms affecting the relative invasibility of different areas; use life history attributes and environmental requirements of the reported nonindigenous species to evaluate traits of invaders; understand the potential spread of invaders based on their habitat and environmental requirements; and understand the importance of different vectors of introduction of nonindigenous species by region. The data in the Atlas of Nonindigenous Marine and Estuarine Species in the North Pacific (Lee and Reusser, 2012) are up-to-date as of June 2012. Updates to the PICES database were made in September 2012.
CRITICA: coding region identification tool invoking comparative analysis
NASA Technical Reports Server (NTRS)
Badger, J. H.; Olsen, G. J.; Woese, C. R. (Principal Investigator)
1999-01-01
Gene recognition is essential to understanding existing and future DNA sequence data. CRITICA (Coding Region Identification Tool Invoking Comparative Analysis) is a suite of programs for identifying likely protein-coding sequences in DNA by combining comparative analysis of DNA sequences with more common noncomparative methods. In the comparative component of the analysis, regions of DNA are aligned with related sequences from the DNA databases; if the translation of the aligned sequences has greater amino acid identity than expected for the observed percentage nucleotide identity, this is interpreted as evidence for coding. CRITICA also incorporates noncomparative information derived from the relative frequencies of hexanucleotides in coding frames versus other contexts (i.e., dicodon bias). The dicodon usage information is derived by iterative analysis of the data, such that CRITICA is not dependent on the existence or accuracy of coding sequence annotations in the databases. This independence makes the method particularly well suited for the analysis of novel genomes. CRITICA was tested by analyzing the available Salmonella typhimurium DNA sequences. Its predictions were compared with the DNA sequence annotations and with the predictions of GenMark. CRITICA proved to be more accurate than GenMark, and moreover, many of its predictions that would seem to be errors instead reflect problems in the sequence databases. The source code of CRITICA is freely available by anonymous FTP (rdp.life.uiuc.edu in /pub/critica) and on the World Wide Web (http://rdpwww.life.uiuc.edu).
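A hedged sketch of the comparative criterion described above: compare the amino-acid identity of the translated alignment with the nucleotide identity. Biopython is assumed for translation, the sequences are assumed equal-length, ungapped, and in-frame, and the neutral-model expectation is reduced to a simple proxy.

```python
# Sketch of CRITICA's comparative evidence for coding, under simplifying
# assumptions: equal-length, ungapped, in-frame sequences; Biopython for
# translation; and nucleotide identity as a stand-in for the expected
# amino-acid identity under a neutral (noncoding) model.
from Bio.Seq import Seq

def identity(a, b):
    return sum(x == y for x, y in zip(a, b)) / min(len(a), len(b))

def coding_evidence(dna_a, dna_b):
    nt_id = identity(dna_a, dna_b)
    aa_id = identity(str(Seq(dna_a).translate()), str(Seq(dna_b).translate()))
    # Real CRITICA compares aa_id with the identity *expected* for nt_id;
    # exceeding that expectation is evidence that the frame is coding.
    return aa_id > nt_id

print(coding_evidence("ATGGCTGCAAAAGCT", "ATGGCAGCGAAGGCC"))  # True: synonymous changes
```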
A novel data storage logic in the cloud
Mátyás, Bence; Szarka, Máté; Járvás, Gábor; Kusper, Gábor; Argay, István; Fialowski, Alice
2016-01-01
Databases which store and manage long-term scientific information related to life science are used to store huge amounts of quantitative attributes. Introduction of a new entity attribute requires modification of the existing data tables and the programs that use these data tables. The solution is to increase the number of virtual data tables while the number of screens remains the same. The main objective of the present study was to introduce a logic called Joker Tao (JT), which provides universal data storage for cloud-based databases. This means that all types of input data can be interpreted as an entity and an attribute at the same time, in the same data table. PMID:29026521
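The Joker Tao logic of treating every datum as both entity and attribute in one table resembles an entity-attribute-value (EAV) layout; the sketch below is our EAV-style reading of the idea, with invented example rows, not the authors' implementation.

```python
# A hedged sketch of the "one universal table" idea: every datum is stored
# as an (entity, attribute, value) row, so adding a new attribute needs no
# schema change. This is our EAV-style reading, not the authors' code.
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE jt (entity TEXT, attribute TEXT, value TEXT)")
rows = [
    ("sample42", "species", "Zea mays"),
    ("sample42", "ph", "6.8"),
    ("ph", "unit", "dimensionless"),  # an attribute reused as an entity
]
con.executemany("INSERT INTO jt VALUES (?, ?, ?)", rows)

for row in con.execute("SELECT attribute, value FROM jt WHERE entity='sample42'"):
    print(row)
```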
Interconnecting heterogeneous database management systems
NASA Technical Reports Server (NTRS)
Gligor, V. D.; Luckenbaugh, G. L.
1984-01-01
It is pointed out that there is still a great need for the development of improved communication between remote, heterogeneous database management systems (DBMS). Problems regarding the effective communication between distributed DBMSs are primarily related to significant differences between local data managers, local data models and representations, and local transaction managers. A system of interconnected DBMSs which exhibit such differences is called a network of distributed, heterogeneous DBMSs. In order to achieve effective interconnection of remote, heterogeneous DBMSs, the users must have uniform, integrated access to the different DBMSs. The present investigation is mainly concerned with an analysis of the existing approaches to interconnecting heterogeneous DBMSs, taking into account four experimental DBMS projects.
Metabolonote: A Wiki-Based Database for Managing Hierarchical Metadata of Metabolome Analyses
Ara, Takeshi; Enomoto, Mitsuo; Arita, Masanori; Ikeda, Chiaki; Kera, Kota; Yamada, Manabu; Nishioka, Takaaki; Ikeda, Tasuku; Nihei, Yoshito; Shibata, Daisuke; Kanaya, Shigehiko; Sakurai, Nozomu
2015-01-01
Metabolomics – technology for comprehensive detection of small molecules in an organism – lags behind the other “omics” in terms of publication and dissemination of experimental data. Among the reasons for this are the difficulty of precisely recording information about complicated analytical experiments (metadata), the existence of various databases with their own metadata descriptions, and the low reusability of the published data, resulting in submitters (the researchers who generate the data) being insufficiently motivated. To tackle these issues, we developed Metabolonote, a Semantic MediaWiki-based database designed specifically for managing metabolomic metadata. We also defined a metadata and data description format, called “Togo Metabolome Data” (TogoMD), with an ID system that is required for unique access to each level of the tree-structured metadata such as study purpose, sample, analytical method, and data analysis. Separation of the management of metadata from that of data and permission to attach related information to the metadata provide advantages for submitters, readers, and database developers. The metadata are enriched with information such as links to comparable data, thereby functioning as a hub of related data resources. They also enhance not only readers’ understanding and use of data but also submitters’ motivation to publish the data. The metadata are computationally shared among other systems via APIs, which facilitate the construction of novel databases by database developers. A permission system that allows publication of immature metadata and feedback from readers also helps submitters to improve their metadata. Hence, this aspect of Metabolonote, as a metadata preparation tool, is complementary to high-quality and persistent data repositories such as MetaboLights. A total of 808 metadata records for analyzed data obtained from 35 biological species are currently published. Metabolonote and related tools are available free of cost at http://metabolonote.kazusa.or.jp/. PMID:25905099
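The TogoMD ID system described above gives each level of the tree-structured metadata its own addressable identifier; the sketch below illustrates the idea with an invented dotted-ID scheme, which is not the actual TogoMD format.

```python
# Tree-structured metadata with level-wise IDs, in the spirit of TogoMD.
# The ID format shown (e.g., "SE1", "SE1.S2") is illustrative only.
metadata = {
    "SE1":          {"title": "Tomato fruit ripening study"},   # study
    "SE1.S2":       {"organism": "Solanum lycopersicum"},       # sample
    "SE1.S2.M1":    {"instrument": "LC-MS", "column": "C18"},   # method
    "SE1.S2.M1.D1": {"file": "run_001.mzML"},                   # data analysis
}

def level(meta_id):
    """Depth in the metadata tree, derived from the dotted ID."""
    return meta_id.count(".") + 1

assert level("SE1.S2.M1") == 3  # each level is uniquely addressable
```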
PMAG: Relational Database Definition
NASA Astrophysics Data System (ADS)
Keizer, P.; Koppers, A.; Tauxe, L.; Constable, C.; Genevey, A.; Staudigel, H.; Helly, J.
2002-12-01
The Scripps Center for Physical and Chemical Earth References (PACER) was established to help create databases for reference data and make them available to the Earth science community. As part of these efforts, PACER supports GERM, REM and PMAG and maintains multiple online databases under the http://earthref.org umbrella website. This website has been built on top of a relational database that allows for the archiving and electronic access to a great variety of data types and formats, permitting data queries using a wide range of metadata. These online databases are designed in Oracle 8.1.5 and they are maintained at the San Diego Supercomputer Center. They are directly available via http://earthref.org/databases/. A prototype of the PMAG relational database is now operational within the existing EarthRef.org framework under http://earthref.org/databases/PMAG/. As will be shown in our presentation, the PMAG design focuses around the general workflow that results in the determination of typical paleo-magnetic analyses. This ensures that individual data points can be traced between the actual analysis and the specimen, sample, site, locality and expedition it belongs to. These relations guarantee traceability of the data by distinguishing between original and derived data, where the actual (raw) measurements are performed on the specimen level, and data on the sample level and higher are then derived products in the database. These relations may also serve to recalculate site means when new data becomes available for that locality. The PMAG data records are extensively described in terms of metadata. These metadata are used when scientists search through this online database in order to view and download the data they need. They minimally include method descriptions for field sampling, laboratory techniques and statistical analyses. They also include selection criteria used during the interpretation of the data and, most importantly, critical information about the site location (latitude, longitude, elevation), geography (continent, country, region), geological setting (lithospheric plate or block, tectonic setting), geological age (age range, timescale name, stratigraphic position) and materials (rock type, classification, alteration state). Each data point and method description is also related to its peer-reviewed reference [citation ID] as archived in the EarthRef Reference Database (ERR). This guarantees direct traceability all the way to its original source, where the user can find the bibliography of each PMAG reference along with every abstract, data table, technical note and/or appendix that are available in digital form and that can be downloaded as PDF/JPEG images and Microsoft Excel/Word data files. This may help scientists and teachers in performing their research since they have easy access to all the scientific data. It also allows for checking potential errors during the digitization process. Please visit the PMAG website at http://earthref.org/PMAG/ for more information.
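The specimen-to-site traceability described above can be made concrete with a toy relational schema; the tables, columns, and the naive site-mean query below are illustrative only (the real database runs on Oracle, and actual paleomagnetic site means use Fisher statistics on unit vectors rather than simple averages).

```python
# Toy schema sketch of the traceability chain (specimen -> sample -> site);
# names are invented, not the actual PMAG/EarthRef schema.
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE site     (site_id INTEGER PRIMARY KEY, lat REAL, lon REAL);
CREATE TABLE sample   (sample_id INTEGER PRIMARY KEY,
                       site_id INTEGER REFERENCES site(site_id));
CREATE TABLE specimen (specimen_id INTEGER PRIMARY KEY,
                       sample_id INTEGER REFERENCES sample(sample_id),
                       declination REAL, inclination REAL);
""")
# Raw measurements live at the specimen level; site means are derived data
# that can be recomputed on demand (a real system would use Fisher
# statistics, not the naive averages shown here).
site_mean_sql = """
SELECT s.site_id, AVG(sp.declination), AVG(sp.inclination)
FROM specimen sp JOIN sample s ON sp.sample_id = s.sample_id
GROUP BY s.site_id
"""
for row in con.execute(site_mean_sql):
    print(row)
```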
NASA Astrophysics Data System (ADS)
Kuznetsov, Valentin; Riley, Daniel; Afaq, Anzar; Sekhri, Vijay; Guo, Yuyi; Lueking, Lee
2010-04-01
The CMS experiment has implemented a flexible and powerful system enabling users to find data within the CMS physics data catalog. The Dataset Bookkeeping Service (DBS) comprises a database and the services used to store and access metadata related to CMS physics data. To this, we have added a generalized query system in addition to the existing web and programmatic interfaces to the DBS. This query system is based on a query language that hides the complexity of the underlying database structure by discovering the join conditions between database tables. This provides a way of querying the system that is simple and straightforward for CMS data managers and physicists to use without requiring knowledge of the database tables or keys. The DBS Query Language uses the ANTLR tool to build the input query parser and tokenizer, followed by a query builder that uses a graph representation of the DBS schema to construct the SQL query sent to underlying database. We will describe the design of the query system, provide details of the language components and overview of how this component fits into the overall data discovery system architecture.
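The essence of the query system described above is that users name the data they want and the system discovers the join conditions. A minimal sketch of that discovery step, treating the schema as a graph and searching it breadth-first, is shown below; the table names are invented, and the ANTLR parsing stage is omitted.

```python
# Sketch of join-path discovery over a schema graph, in the spirit of the
# DBS Query Language. Edges stand for foreign-key relations; names invented.
from collections import deque

schema_edges = {
    "dataset": ["block", "primary_dataset"],
    "block": ["dataset", "file"],
    "file": ["block"],
    "primary_dataset": ["dataset"],
}

def join_path(start, goal):
    """Breadth-first search for the shortest chain of joins."""
    queue, seen = deque([[start]]), {start}
    while queue:
        path = queue.popleft()
        if path[-1] == goal:
            return path
        for nxt in schema_edges.get(path[-1], []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(path + [nxt])
    return None

print(join_path("file", "primary_dataset"))  # ['file', 'block', 'dataset', 'primary_dataset']
```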
Bare, Jane; Gloria, Thomas; Norris, Gregory
2006-08-15
Normalization is an optional step within Life Cycle Impact Assessment (LCIA) that may be used to assist in the interpretation of life cycle inventory data as well as life cycle impact assessment results. Normalization transforms the magnitude of LCI and LCIA results into relative contribution by substance and life cycle impact category. Normalization thus can significantly influence LCA-based decisions when tradeoffs exist. The U.S. Environmental Protection Agency (EPA) has developed a normalization database based on the spatial scale of the 48 continental U.S. states, Hawaii, Alaska, the District of Columbia, and Puerto Rico, with a one-year reference time frame. Data within the normalization database were compiled based on the impact methodologies and lists of stressors used in TRACI, the EPA's Tool for the Reduction and Assessment of Chemical and other environmental Impacts. The new normalization database published within this article may be used for LCIA case studies within the United States, and can be used to assist in the further development of a global normalization database. The underlying data analyzed for the development of this database are included to allow the development of normalization data consistent with other impact assessment methodologies as well.
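A worked toy example of the normalization step: each impact-category result is divided by the corresponding annual U.S. reference total, so categories become comparable as relative contributions. All numbers below are invented; real reference values come from the database described above.

```python
# Toy normalization: divide each impact-category result by an annual
# reference total. Values are invented for illustration only.
impact_results = {"global_warming": 1.2e4, "acidification": 3.0e2}      # product system
reference_totals = {"global_warming": 7.0e12, "acidification": 1.8e10}  # U.S. per year

normalized = {
    category: impact_results[category] / reference_totals[category]
    for category in impact_results
}
# Each value is now a fraction of total annual U.S. impacts in that
# category, which makes tradeoffs between categories easier to compare.
print(normalized)
```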
DOE Office of Scientific and Technical Information (OSTI.GOV)
Xu, Tengfang; Piette, Mary Ann
2004-08-05
The original scope of work was to obtain and analyze existing and emerging data in four states: California, Florida, New York, and Wisconsin. The goal of this data collection was to deliver a baseline database or recommendations for such a database that could possibly contain window and daylighting features and energy performance characteristics of Kindergarten through 12th grade (K-12) school buildings (or those of classrooms when available). In particular, data analyses were performed based upon the California Commercial End-Use Survey (CEUS) databases to understand school energy use, features of window glazing, and availability of daylighting in California K-12 schools. The outcomes from this baseline task can be used to assist in establishing a database of school energy performance, assessing applications of existing technologies relevant to window and daylighting design, and identifying future R&D needs. These are in line with the overall project goals as outlined in the proposal. Through the review and analysis of this data, it is clear that there are many compounding factors impacting energy use in K-12 school buildings in the U.S., and that there are various challenges in understanding the impact of K-12 classroom energy use associated with design features of window glazing and skylight. First, the energy data in the existing CEUS databases has, at most, provided the aggregated electricity and/or gas usages for the building establishments that include other school facilities on top of the classroom spaces. Although the percentage of classroom floor area in schools is often available from the databases, there is no additional information that can be used to quantitatively segregate the EUI for classroom spaces. In order to quantify the EUI for classrooms, sub-metering of energy usage by classrooms must be obtained. Second, magnitudes of energy use for electric lighting are not attainable from the existing databases, nor are the lighting levels contributed by artificial lighting or daylight. It is impossible to reasonably estimate the lighting energy consumption for classroom areas in the sample of schools studied in this project. Third, there are many other compounding factors that may as well influence the overall classroom energy use, e.g., ventilation, insulation, system efficiency, occupancy, control, schedules, and weather. Fourth, although we have examined the school EUI grouped by various factors such as climate zones, window and daylighting design features from the California databases, no statistically significant associations can be identified from the sampled California K-12 schools in the current California CEUS. There are opportunities to expand such analyses by developing and including more powerful CEUS databases in the future. Finally, a list of parameters is recommended for future database development and for use of future investigation in K-12 classroom energy use, window and skylight design, and possible relations between them. Some of the key parameters include: (1) Energy end use data for lighting systems, classrooms, and schools; (2) Building design and operation including features for windows and daylighting; and (3) Other key parameters and information that would be available to investigate overall energy uses, building and systems design, their operation, and services provided.
ERIC Educational Resources Information Center
Blackwell, Michael Lind
This study evaluates the "Education Resources Information Center" (ERIC), "Library and Information Science Abstracts" (LISA), and "Library Literature" (LL) databases, determining how long the databases take to enter records (indexing delay), how much duplication of effort exists among the three databases (indexing…
Soranno, Patricia A; Bissell, Edward G; Cheruvelil, Kendra S; Christel, Samuel T; Collins, Sarah M; Fergus, C Emi; Filstrup, Christopher T; Lapierre, Jean-Francois; Lottig, Noah R; Oliver, Samantha K; Scott, Caren E; Smith, Nicole J; Stopyak, Scott; Yuan, Shuai; Bremigan, Mary Tate; Downing, John A; Gries, Corinna; Henry, Emily N; Skaff, Nick K; Stanley, Emily H; Stow, Craig A; Tan, Pang-Ning; Wagner, Tyler; Webster, Katherine E
2015-01-01
Although there are considerable site-based data for individual or groups of ecosystems, these datasets are widely scattered, have different data formats and conventions, and often have limited accessibility. At the broader scale, national datasets exist for a large number of geospatial features of land, water, and air that are needed to fully understand variation among these ecosystems. However, such datasets originate from different sources and have different spatial and temporal resolutions. By taking an open-science perspective and by combining site-based ecosystem datasets and national geospatial datasets, science gains the ability to ask important research questions related to grand environmental challenges that operate at broad scales. Documentation of such complicated database integration efforts, through peer-reviewed papers, is recommended to foster reproducibility and future use of the integrated database. Here, we describe the major steps, challenges, and considerations in building an integrated database of lake ecosystems, called LAGOS (LAke multi-scaled GeOSpatial and temporal database), that was developed at the sub-continental study extent of 17 US states (1,800,000 km(2)). LAGOS includes two modules: LAGOSGEO, with geospatial data on every lake with surface area larger than 4 ha in the study extent (~50,000 lakes), including climate, atmospheric deposition, land use/cover, hydrology, geology, and topography measured across a range of spatial and temporal extents; and LAGOSLIMNO, with lake water quality data compiled from ~100 individual datasets for a subset of lakes in the study extent (~10,000 lakes). Procedures for the integration of datasets included: creating a flexible database design; authoring and integrating metadata; documenting data provenance; quantifying spatial measures of geographic data; quality-controlling integrated and derived data; and extensively documenting the database. Our procedures make a large, complex, and integrated database reproducible and extensible, allowing users to ask new research questions with the existing database or through the addition of new data. The largest challenge of this task was the heterogeneity of the data, formats, and metadata. Many steps of data integration need manual input from experts in diverse fields, requiring close collaboration.
Soranno, Patricia A.; Bissell, E.G.; Cheruvelil, Kendra S.; Christel, Samuel T.; Collins, Sarah M.; Fergus, C. Emi; Filstrup, Christopher T.; Lapierre, Jean-Francois; Lotting, Noah R.; Oliver, Samantha K.; Scott, Caren E.; Smith, Nicole J.; Stopyak, Scott; Yuan, Shuai; Bremigan, Mary Tate; Downing, John A.; Gries, Corinna; Henry, Emily N.; Skaff, Nick K.; Stanley, Emily H.; Stow, Craig A.; Tan, Pang-Ning; Wagner, Tyler; Webster, Katherine E.
2015-01-01
Although there are considerable site-based data for individual or groups of ecosystems, these datasets are widely scattered, have different data formats and conventions, and often have limited accessibility. At the broader scale, national datasets exist for a large number of geospatial features of land, water, and air that are needed to fully understand variation among these ecosystems. However, such datasets originate from different sources and have different spatial and temporal resolutions. By taking an open-science perspective and by combining site-based ecosystem datasets and national geospatial datasets, science gains the ability to ask important research questions related to grand environmental challenges that operate at broad scales. Documentation of such complicated database integration efforts, through peer-reviewed papers, is recommended to foster reproducibility and future use of the integrated database. Here, we describe the major steps, challenges, and considerations in building an integrated database of lake ecosystems, called LAGOS (LAke multi-scaled GeOSpatial and temporal database), that was developed at the sub-continental study extent of 17 US states (1,800,000 km2). LAGOS includes two modules: LAGOSGEO, with geospatial data on every lake with surface area larger than 4 ha in the study extent (~50,000 lakes), including climate, atmospheric deposition, land use/cover, hydrology, geology, and topography measured across a range of spatial and temporal extents; and LAGOSLIMNO, with lake water quality data compiled from ~100 individual datasets for a subset of lakes in the study extent (~10,000 lakes). Procedures for the integration of datasets included: creating a flexible database design; authoring and integrating metadata; documenting data provenance; quantifying spatial measures of geographic data; quality-controlling integrated and derived data; and extensively documenting the database. Our procedures make a large, complex, and integrated database reproducible and extensible, allowing users to ask new research questions with the existing database or through the addition of new data. The largest challenge of this task was the heterogeneity of the data, formats, and metadata. Many steps of data integration need manual input from experts in diverse fields, requiring close collaboration.
Multi-viewpoint Coronal Mass Ejection Catalog Based on STEREO COR2 Observations
NASA Astrophysics Data System (ADS)
Vourlidas, Angelos; Balmaceda, Laura A.; Stenborg, Guillermo; Dal Lago, Alisson
2017-04-01
We present the first multi-viewpoint coronal mass ejection (CME) catalog. The events are identified visually in simultaneous total brightness observations from the twin SECCHI/COR2 coronagraphs on board the Solar Terrestrial Relations Observatory mission. The Multi-View CME Catalog differs from past catalogs in three key aspects: (1) all events between the two viewpoints are cross-linked, (2) each event is assigned a physics-motivated morphological classification (e.g., jet, wave, or flux rope), and (3) kinematic and geometric information is extracted semi-automatically via a supervised image segmentation algorithm. The database extends from the beginning of the COR2 synoptic program (2007 March) to the end of dual-viewpoint observations (2014 September). It contains 4473 unique events, with 3358 events identified in both COR2s. Kinematic properties currently exist for 1747 events (26% of COR2-A events and 17% of COR2-B events). We examine several issues made possible by this cross-linked CME database, including the role of projection in the perceived morphology of events, the missing CME rate, the existence of cool material in CMEs, the solar cycle dependence of CME rate, speeds, and widths, and the existence of flux ropes within CMEs. We discuss the implications for past single-viewpoint studies and for space weather research. The database, including all available measurements, is publicly available on the web. We hope that it will become a useful resource for the community.
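As an illustration of the kind of kinematic property such a catalog stores, the sketch below fits a constant speed to leading-edge height-time measurements; the sample points are invented, and the catalog's actual semi-automatic extraction pipeline is more involved.

import numpy as np

RSUN_KM = 6.957e5                                  # solar radius in km
t = np.array([0.0, 900.0, 1800.0, 2700.0])         # seconds since first frame
h = np.array([4.1, 4.8, 5.4, 6.0]) * RSUN_KM       # leading-edge height in km

speed_km_s, h0 = np.polyfit(t, h, 1)   # slope of linear fit = plane-of-sky speed
print(f"linear-fit speed: {speed_km_s:.0f} km/s")  # ~490 km/s for these points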
Map-Based Querying for Multimedia Database
2014-09-01
existing assets in a custom multimedia database based on an area of interest. It also describes the augmentation of an Android Tactical Assault Kit (ATAK) ...
Metu, Somiya (Computational and Information Sciences Directorate, ARL)
StreptomycesInforSys: A web-enabled information repository
Jain, Chakresh Kumar; Gupta, Vidhi; Gupta, Ashvarya; Gupta, Sanjay; Wadhwa, Gulshan; Sharma, Sanjeev Kumar; Sarethy, Indira P
2012-01-01
Members of Streptomyces produce 70% of natural bioactive products. A considerable amount of information is available based on the polyphasic approach to the classification of Streptomyces. This information, drawn from phenotypic, genotypic and bioactive component production profiles, is crucial for pharmacological screening programmes, yet it is scattered across various journals, books and other resources, many of which are not freely accessible. The designed database incorporates polyphasic typing information with combinations of search options to aid in the efficient screening of new isolates. This will help in the preliminary categorization of isolates into appropriate groups. It is a free relational database compatible with existing operating systems. A cross-platform technology with the XAMPP web server has been used to develop and manage the database and to serve user queries effectively. PHP, a platform-independent scripting language embedded in HTML, together with the database management software MySQL, facilitates dynamic information storage and retrieval. The user-friendly, open and flexible freeware stack (PHP, MySQL and Apache) is expected to reduce running and maintenance costs. Availability: www.sis.biowaves.org PMID:23275736
Development of a video tampering dataset for forensic investigation.
Ismael Al-Sanjary, Omar; Ahmed, Ahmed Abdullah; Sulong, Ghazali
2016-09-01
Forgery is an act of modifying a document, product, image or video, among other media. Video tampering detection research requires an inclusive database of modified videos. This paper discusses a comprehensive proposal for creating a dataset of tampered videos for forensic investigation, in order to standardize existing techniques for detecting video tampering. The primary purpose of developing and designing this new video library is its use in video forensics, where it supports reliable verification using dynamic and static camera recognition. To the best of the authors' knowledge, no similar library exists within the research community. Videos were sourced from YouTube and by exploring social networking sites extensively, observing posted videos and rating their feedback. The video tampering dataset (VTD) comprises a total of 33 videos, divided among three categories of video tampering: (1) copy-move, (2) splicing, and (3) frame swapping. Compared to existing datasets, this is a higher number of tampered videos, with longer durations. The duration of every video is 16 s, with a 1280×720 resolution and a frame rate of 30 frames per second. Moreover, all videos have the same format and quality (720p HD, .avi). Both temporal and spatial video features were considered carefully during the selection of the videos, and complete information on the doctored regions is available for every modified video in the VTD dataset. The database, with its ground truth, has been made publicly available for research on splicing, frame swapping, copy-move tampering, and other video tampering detection issues. The database has been utilised by many international researchers and research groups. Copyright © 2016 Elsevier Ireland Ltd. All rights reserved.
Analysis of commercial and public bioactivity databases.
Tiikkainen, Pekka; Franke, Lutz
2012-02-27
Activity data for small molecules are invaluable in chemoinformatics. Various bioactivity databases exist that contain detailed information on target proteins and quantitative binding data for small molecules extracted from journals and patents. In the current work, we have merged several public and commercial bioactivity databases into one bioactivity metabase. The molecular representations, target information, and activity data of the vendor databases were standardized. The main motivation of the work was to create a single relational database allowing fast and simple data retrieval by in-house scientists. Second, we wanted to know the amount of overlap between the commercial and public databases, to see whether the former contain data complementing the latter. Third, we quantified the degree of inconsistency between data sources by comparing data points derived from the same scientific article cited by more than one vendor. We found that each data source contains unique data, owing to the different sets of scientific articles cited by the vendors. When comparing data derived from the same article, we found that inconsistencies between the vendors are common. In conclusion, using databases from different vendors remains useful, since the data overlap is not complete; it should be noted that this can be partially explained by the inconsistencies and errors in the source data.
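A minimal sketch of the third analysis, under the assumption that a "data point" is keyed by article, compound, and target: group activity values by that key and flag groups whose values disagree beyond a tolerance. All records, identifiers, and the two-fold tolerance are invented for illustration.

from collections import defaultdict
from math import log10

records = [  # (vendor, article_doi, compound_id, target_id, ic50_nM)
    ("vendorA", "10.1021/example", "CHEM1", "P35372", 12.0),
    ("vendorB", "10.1021/example", "CHEM1", "P35372", 15.0),
    ("vendorC", "10.1021/example", "CHEM1", "P35372", 1200.0),  # outlier
]

by_point = defaultdict(list)
for vendor, doi, cmpd, target, ic50 in records:
    by_point[(doi, cmpd, target)].append((vendor, ic50))

TOL_LOG = 0.3   # ~2-fold tolerance on a log10 scale (an assumption)
for point, vals in by_point.items():
    logs = [log10(v) for _, v in vals]
    if len(vals) > 1 and max(logs) - min(logs) > TOL_LOG:
        print("inconsistent:", point, vals)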
TFBSshape: a motif database for DNA shape features of transcription factor binding sites.
Yang, Lin; Zhou, Tianyin; Dror, Iris; Mathelier, Anthony; Wasserman, Wyeth W; Gordân, Raluca; Rohs, Remo
2014-01-01
Transcription factor binding sites (TFBSs) are most commonly characterized by the nucleotide preferences at each position of the DNA target. Whereas these sequence motifs are quite accurate descriptions of DNA binding specificities of transcription factors (TFs), proteins recognize DNA as a three-dimensional object. DNA structural features refine the description of TF binding specificities and provide mechanistic insights into protein-DNA recognition. Existing motif databases contain extensive nucleotide sequences identified in binding experiments based on their selection by a TF. To utilize DNA shape information when analysing the DNA binding specificities of TFs, we developed a new tool, the TFBSshape database (available at http://rohslab.cmb.usc.edu/TFBSshape/), for calculating DNA structural features from nucleotide sequences provided by motif databases. The TFBSshape database can be used to generate heat maps and quantitative data for DNA structural features (i.e., minor groove width, roll, propeller twist and helix twist) for 739 TF datasets from 23 different species derived from the motif databases JASPAR and UniPROBE. As demonstrated for the basic helix-loop-helix and homeodomain TF families, our TFBSshape database can be used to compare, qualitatively and quantitatively, the DNA binding specificities of closely related TFs and, thus, uncover differential DNA binding specificities that are not apparent from nucleotide sequence alone.
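For intuition, DNA shape features of this kind can be derived with a sliding-pentamer lookup; the sketch below shows that pattern for minor groove width, with placeholder table values rather than TFBSshape's actual ones.

MGW_TABLE = {"AATTC": 4.0, "ATTCG": 4.6}   # pentamer -> MGW in Angstrom (placeholders)

def mgw_profile(seq, table=MGW_TABLE):
    """MGW for the central base pair of each 5-bp window; None if the pentamer is unknown."""
    return [table.get(seq[i:i + 5]) for i in range(len(seq) - 4)]

print(mgw_profile("AATTCG"))   # [4.0, 4.6]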
Konias, Sokratis; Chouvarda, Ioanna; Vlahavas, Ioannis; Maglaveras, Nicos
2005-09-01
Current approaches for mining association rules usually assume that the mining is performed in a static database, where the problem of missing attribute values does not practically exist. However, these assumptions do not hold in some medical databases, such as a home care system. In this paper, a novel uncertainty rule algorithm, URG-2 (Uncertainty Rule Generator), is illustrated, which addresses the problem of mining dynamic databases containing missing values. This algorithm requires only one pass over the initial dataset in order to generate the item sets, while new metrics corresponding to the notions of support and confidence are used. URG-2 was evaluated over two medical databases, randomly introducing multiple missing values for each record's attributes (rate: 5-20% in 5% increments) in the initial dataset. Compared with the classical approach (records with missing values are ignored), the proposed algorithm was more robust in mining rules from datasets containing missing values. In all cases, the difference in preserving the initial rules ranged between 30% and 60% in favour of URG-2. Moreover, due to its incremental nature, URG-2 saved over 90% of the time required for thorough re-mining. Thus, the proposed algorithm can offer a preferable solution for mining in dynamic relational databases.
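The abstract does not define URG-2's metrics, so the following is only a rough sketch of the general idea: support and confidence that treat missing values (None) as "unknown" rather than "false", by restricting each count to records where the relevant items are observed. Data and definitions are illustrative assumptions.

def support(records, items):
    # count only records where every item of interest is actually observed
    known = [r for r in records if all(r.get(i) is not None for i in items)]
    if not known:
        return 0.0
    hits = sum(all(r[i] for i in items) for r in known)
    return hits / len(known)

def confidence(records, antecedent, consequent):
    s_ab = support(records, antecedent + consequent)
    s_a = support(records, antecedent)
    return s_ab / s_a if s_a else 0.0

data = [                      # None marks a missing value
    {"fever": True,  "cough": True},
    {"fever": True,  "cough": None},
    {"fever": False, "cough": False},
]
print(confidence(data, ["fever"], ["cough"]))   # 0.75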
Mather, M. E.; Parrish, D.L.; Dettmers, J.M.
2008-01-01
In the last 25 years, the number and scope of fish-related journals have changed. New and existing journals are increasingly specialized. Journals that are read and cited are changing because of differential accessibility via electronic databases. In this review, we examine shifts in numbers and foci of existing fish-related journals. We ask how these fish-related metrics differ across type of application, ecological system, taxa, and discipline. Although many journals overlap to some extent in content, there are distinct groups of journals for authors to consider. By systematically reviewing the focus of an individual manuscript, comparing it to the suite of journals available and examining the audience for the manuscript, we believe that authors can make informed decisions about which journals are most suitable for their work. Our goal here is to help authors find relevant journals and deliver scientific publications to the appropriate readership.
NASA Astrophysics Data System (ADS)
Wajszczyk, Bronisław; Biernacki, Konrad
2018-04-01
The increase in interoperability of the radio-electronic systems used in the Armed Forces requires the processing of very large amounts of data. Requirements for the integration of information from many systems and sensors, including radar, electronic, and optical reconnaissance, force a search for more efficient methods of information retrieval in ever-larger database resources. This paper presents the results of research on methods of improving the efficiency of databases using various types of indexes. Indexing of data structures is a technique used in RDBMS (relational database management system) products. However, analyses of index performance, descriptions of potential applications, and in particular presentations of the actual scale of the performance gains for individual index types are limited to few studies in this field. This paper analyses methods affecting the efficiency of a relational database management system. As a result of the research, a significant increase in the efficiency of operations on data was achieved through a strategy of indexing data structures. The research mainly consists of testing the behaviour of various indexes against different queries and data structures. The conclusions from the experiments allow an assessment of the effectiveness of the proposed and applied solutions. The results indicate a real increase in the performance of operations on data when data structures are indexed, and quantify that increase broken down by index type.
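A minimal, self-contained illustration of this kind of experiment (using SQLite for convenience; the study's RDBMS, data, and index types differ): time the same range query before and after adding a B-tree index.

import sqlite3, time, random

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE track (id INTEGER, freq_mhz REAL)")
con.executemany("INSERT INTO track VALUES (?, ?)",
                ((i, random.uniform(30, 3000)) for i in range(200_000)))

def bench(label):
    t0 = time.perf_counter()
    for _ in range(200):   # repeat the same range query to get a stable timing
        con.execute("SELECT COUNT(*) FROM track "
                    "WHERE freq_mhz BETWEEN 100 AND 101").fetchone()
    print(label, f"{time.perf_counter() - t0:.3f}s")

bench("no index:")
con.execute("CREATE INDEX idx_freq ON track(freq_mhz)")
bench("b-tree index:")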
Building Inventory Database on the Urban Scale Using GIS for Earthquake Risk Assessment
NASA Astrophysics Data System (ADS)
Kaplan, O.; Avdan, U.; Guney, Y.; Helvaci, C.
2016-12-01
The majority of existing buildings in most developing countries are not safe against earthquakes. Before a devastating earthquake, existing buildings need to be assessed and the vulnerable ones identified. Determining the seismic performance of existing buildings, which usually involves collecting building attributes, running the analyses and the necessary queries, and producing the result maps, is a hard and complicated procedure that can be simplified with a Geographic Information System (GIS). The aim of this study is to produce a building inventory database using GIS for assessing the earthquake risk of existing buildings. In this paper, a building inventory database for 310 buildings located in Eskisehir, Turkey, was produced in order to assess the earthquake risk of the buildings. The results from this study show that 26% of the buildings have high earthquake risk, 33% have medium earthquake risk, and 41% have low earthquake risk. The produced building inventory database can be very useful, especially for governments, in determining seismically vulnerable buildings in large existing building stocks. With the help of such methods, identifying the buildings that may collapse and cause loss of life and property during a possible future earthquake will be quick, cheap, and reliable.
Development of a database for Louisiana highway bridge scour data : technical summary.
DOT National Transportation Integrated Search
1999-10-01
The objectives of the project included: 1) developing a database with manipulation capabilities such as data retrieval, visualization, and update; and 2) inputting the existing scour data from DOTD files into the database.
Over 20 years of reaction access systems from MDL: a novel reaction substructure search algorithm.
Chen, Lingran; Nourse, James G; Christie, Bradley D; Leland, Burton A; Grier, David L
2002-01-01
From REACCS, to MDL ISIS/Host Reaction Gateway, and most recently to MDL Relational Chemistry Server, a new product based on Oracle data cartridge technology, MDL's reaction database management and retrieval systems have undergone great changes. The evolution of the system architecture is briefly discussed. The evolution of MDL reaction substructure search (RSS) algorithms is detailed. This article mainly describes a novel RSS algorithm. This algorithm is based on a depth-first search approach and is able to fully and prospectively use reaction specific information, such as reacting center and atom-atom mapping (AAM) information. The new algorithm has been used in the recently released MDL Relational Chemistry Server and allows the user to precisely find reaction instances in databases while minimizing unrelated hits. Finally, the existing and new RSS algorithms are compared with several examples.
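To illustrate the core idea, here is a small, self-contained sketch of depth-first substructure matching that prunes using reacting-center flags, in the spirit of (but much simpler than) the MDL algorithm; the graph encoding and toy molecules are assumptions, not MDL's data structures. Atoms are dicts of id -> (element, is_reacting_center, set of neighbour ids).

def compatible(p_atom, t_atom):
    p_elem, p_rc, _ = p_atom
    t_elem, t_rc, _ = t_atom
    # element must match; a reacting-center query atom may only map onto
    # a target atom that is also a reacting center
    return p_elem == t_elem and (not p_rc or t_rc)

def dfs_match(pattern, target, order, mapping):
    if len(mapping) == len(order):          # all pattern atoms mapped: a hit
        return dict(mapping)
    p = order[len(mapping)]
    for t in target:
        if t in mapping.values() or not compatible(pattern[p], target[t]):
            continue
        # adjacency consistency: already-mapped pattern neighbours of p
        # must map onto target neighbours of t
        ok = all(mapping[q] in target[t][2]
                 for q in pattern[p][2] if q in mapping)
        if ok:
            mapping[p] = t
            hit = dfs_match(pattern, target, order, mapping)
            if hit:
                return hit
            del mapping[p]
    return None

# toy query: a reacting-center carbon bonded to an oxygen
pattern = {0: ("C", True, {1}), 1: ("O", False, {0})}
# toy target fragment with the C-O carbon flagged as the reacting center
target = {0: ("C", False, {1}), 1: ("C", True, {0, 2}), 2: ("O", False, {1})}
print(dfs_match(pattern, target, [0, 1], {}))   # {0: 1, 1: 2}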
United States Army Medical Materiel Development Activity: 1997 Annual Report.
1997-01-01
business planning and execution information management systems (Project Management Division Database (PMDD) and Product Management Database System (PMDS)) ...
MANAGEMENT: Project Management Division Database (PMDD), Product Management Database System (PMDS), and Special Users Database System. The existing ... System (FMS) were investigated. New Product Managers and Project Managers were added into PMDS and PMDD. A separate division, Support, was ...
Wickham, James; Riitters, Kurt; Vogt, Peter; Costanza, Jennifer; Neale, Anne
2017-11-01
Landscape context is an important factor in restoration ecology, but the use of landscape context for site prioritization has not been as fully developed. We used morphological image processing to identify candidate ecological restoration areas based on their proximity to existing natural vegetation. We identified 1,102,720 candidate ecological restoration areas across the continental United States. Candidate ecological restoration areas were concentrated in the Great Plains and eastern United States. We populated the database of candidate ecological restoration areas with 17 attributes related to site content and context, including factors such as soil fertility and roads (site content), and number and area of potentially conjoined vegetated regions (site context) to facilitate its use for site prioritization. We demonstrate the utility of the database in the state of North Carolina, U.S.A. for a restoration objective related to restoration of water quality (mandated by the U.S. Clean Water Act), wetlands, and forest. The database will be made publicly available on the U.S. Environmental Protection Agency's EnviroAtlas website (http://enviroatlas.epa.gov) for stakeholders interested in ecological restoration.
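A minimal sketch of the morphological idea on a toy land-cover grid: dilate the mask of existing natural vegetation and keep the non-vegetated cells inside the dilated region as candidates. The paper's actual criteria (land-cover classes, size thresholds, the 17 attributes) are much richer.

import numpy as np
from scipy.ndimage import binary_dilation

veg = np.array([[1, 1, 0, 0],
                [1, 0, 0, 0],
                [0, 0, 0, 1]], dtype=bool)        # existing natural vegetation

# one-cell dilation with 8-connectivity marks everything adjacent to vegetation
near_veg = binary_dilation(veg, structure=np.ones((3, 3), dtype=bool))
candidates = near_veg & ~veg                      # near, but not already, vegetation
print(candidates.astype(int))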
Leveraging Semantic Knowledge in IRB Databases to Improve Translation Science
Hurdle, John F.; Botkin, Jeffery; Rindflesch, Thomas C.
2007-01-01
We introduce the notion that research administrative databases (RADs), such as those increasingly used to manage information flow in the Institutional Review Board (IRB), offer a novel, useful, and mine-able data source overlooked by informaticists. As a proof of concept, using an IRB database we extracted all titles and abstracts from system startup through January 2007 (n=1,876); formatted these in a pseudo-MEDLINE format; and processed them through the SemRep semantic knowledge extraction system. Even though SemRep is tuned to find semantic relations in MEDLINE citations, we found that it performed comparably well on the IRB texts. When adjusted to eliminate non-healthcare IRB submissions (e.g., economic and education studies), SemRep extracted an average of 7.3 semantic relations per IRB abstract (compared to an average of 11.1 for MEDLINE citations) with a precision of 70% (compared to 78% for MEDLINE). We conclude that RADs, as represented by IRB data, are mine-able with existing tools, but that performance will improve as these tools are tuned for RAD structures. PMID:18693856
Covariance analysis for evaluating head trackers
NASA Astrophysics Data System (ADS)
Kang, Donghoon
2017-10-01
Existing methods for evaluating the performance of head trackers usually rely on publicly available face databases, which contain facial images and the ground truths of their corresponding head orientations. However, most existing publicly available face databases are constructed by assuming that a frontal head orientation can be determined by compelling the person under examination to look straight ahead at the camera on the first video frame. Since nobody can accurately direct their head toward the camera, this assumption may be unrealistic. Rather than obtaining estimation errors, we present a method for computing the covariance of estimation error rotations to evaluate the reliability of head trackers. As an uncertainty measure of estimators, the Schatten 2-norm of a square root of the error covariance (or the algebraic average of the relative error angles) can be used. The merit of the proposed method is that it does not disturb the person under examination by asking him or her to direct the head toward certain directions. Experimental results using real data validate the usefulness of our method.
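One plausible formalization of that measure, in notation chosen here (the abstract gives no formulas): let $\mathbf{e}_i \in \mathbb{R}^3$ be the rotation vector (axis times angle) of the $i$-th relative error rotation among $N$ trials. Then

$$\Sigma = \frac{1}{N}\sum_{i=1}^{N} \mathbf{e}_i\,\mathbf{e}_i^{\top}, \qquad u = \bigl\lVert \Sigma^{1/2} \bigr\rVert_{S_2} = \sqrt{\operatorname{tr}\,\Sigma} = \sqrt{\frac{1}{N}\sum_{i=1}^{N} \lVert \mathbf{e}_i \rVert^{2}},$$

since the Schatten 2-norm coincides with the Frobenius norm. Because $\lVert \mathbf{e}_i \rVert$ is the $i$-th error angle, $u$ is the root-mean-square error angle, consistent with the abstract's description of the norm as an aggregate of relative error angles.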
Data Model and Relational Database Design for Highway Runoff Water-Quality Metadata
Granato, Gregory E.; Tessler, Steven
2001-01-01
A National highway and urban runoff water-quality metadatabase was developed by the U.S. Geological Survey in cooperation with the Federal Highway Administration as part of the National Highway Runoff Water-Quality Data and Methodology Synthesis (NDAMS). The database was designed to catalog available literature and to document results of the synthesis in a format that would facilitate current and future research on highway and urban runoff. This report documents the design and implementation of the NDAMS relational database, which was designed to provide a catalog of available information and the results of an assessment of the available data. All the citations and the metadata collected during the review process are presented in a stratified metadatabase that contains citations for relevant publications, abstracts (or previa), and report-review metadata for a sample of selected reports that document results of runoff-quality investigations. The database is referred to as a metadatabase because it contains information about available data sets rather than a record of the original data. The database contains the metadata needed to evaluate and characterize how valid, current, complete, comparable, and technically defensible published and available information may be when evaluated for application to the different data-quality objectives as defined by decision makers. This database is a relational database, in that all information is ultimately linked to a given citation in the catalog of available reports. The main database file contains 86 tables consisting of 29 data tables, 11 association tables, and 46 domain tables. The data tables all link to a particular citation, and each data table is focused on one aspect of the information collected in the literature search and the evaluation of available information. This database is implemented in the Microsoft (MS) Access database software because it is widely used within and outside of government and is familiar to many existing and potential customers. The stratified metadatabase design for the NDAMS program is presented in the MS Access file DBDESIGN.mdb and documented with a data dictionary in the NDAMS_DD.mdb file recorded on the CD-ROM. The data dictionary file includes complete documentation of the table names, table descriptions, and information about each of the 419 fields in the database.
Lindsley, Kristina; Li, Tianjing; Ssemanda, Elizabeth; Virgili, Gianni; Dickersin, Kay
2016-01-01
Topic: Are existing systematic reviews of interventions for age-related macular degeneration incorporated into clinical practice guidelines? Clinical relevance: High-quality systematic reviews should be used to underpin evidence-based clinical practice guidelines and clinical care. We have examined the reliability of systematic reviews of interventions for age-related macular degeneration (AMD) and described the main findings of reliable reviews in relation to clinical practice guidelines. Methods: Eligible publications are systematic reviews of the effectiveness of treatment interventions for AMD. We searched a database of systematic reviews in eyes and vision and employed no language or date restrictions; the database is up-to-date as of May 6, 2014. Two authors independently screened records for eligibility and abstracted and assessed the characteristics and methods of each review. We classified reviews as “reliable” when they reported eligibility criteria, comprehensive searches, appraisal of methodological quality of included studies, appropriate statistical methods for meta-analysis, and conclusions based on results. We mapped treatment recommendations from the American Academy of Ophthalmology Preferred Practice Patterns (AAO PPP) for AMD to the identified systematic reviews and assessed whether any reliable systematic review was cited or could have been cited to support each treatment recommendation. Results: Of 1,570 systematic reviews in our database, 47 met our inclusion criteria. Most of the systematic reviews targeted neovascular AMD and investigated anti-vascular endothelial growth factor (anti-VEGF) interventions, dietary supplements or photodynamic therapy. We classified over two-thirds (33/47) of the reports as reliable. The quality of reporting varied, with criteria for reliable reporting met more often for Cochrane reviews and for reviews whose authors disclosed conflicts of interest. Although most systematic reviews were reliable, anti-VEGF agents and photodynamic therapy were the only interventions identified as effective by reliable reviews. Of 35 treatment recommendations extracted from the AAO PPP, 15 could have been supported with reliable systematic reviews; however, only one recommendation had an accompanying intervention systematic review citation, which we assessed as a reliable systematic review. No reliable systematic review was identified for 20 treatment recommendations, highlighting areas of evidence gaps. Conclusions: For AMD, reliable systematic reviews exist for many treatment recommendations in the AAO PPP and should be used to support these recommendations. We also identified areas where no high-level evidence exists. Mapping clinical practice guidelines to existing systematic reviews is one way to highlight areas where evidence generation or evidence synthesis is either available or needed. PMID:26804762
Combinational Reasoning of Quantitative Fuzzy Topological Relations for Simple Fuzzy Regions
Liu, Bo; Li, Dajun; Xia, Yuanping; Ruan, Jian; Xu, Lili; Wu, Huanyi
2015-01-01
In recent years, formalization and reasoning of topological relations have become a hot topic as a means to generate knowledge about the relations between spatial objects at the conceptual and geometrical levels. These mechanisms have been widely used in spatial data query, spatial data mining, evaluation of equivalence and similarity in a spatial scene, as well as for consistency assessment of the topological relations of multi-resolution spatial databases. The concept of computational fuzzy topological space is applied to simple fuzzy regions to efficiently and more accurately solve fuzzy topological relations. Thus, extending the existing research and improving upon the previous work, this paper presents a new method to describe fuzzy topological relations between simple spatial regions in Geographic Information Science (GIS) and Artificial Intelligence (AI). Firstly, we propose new definitions for simple fuzzy line segments and simple fuzzy regions based on computational fuzzy topology. Then, based on the new definitions, we propose a new combinational reasoning method to compute the topological relations between simple fuzzy regions. This study finds that there are: (1) 23 different topological relations between a simple crisp region and a simple fuzzy region; and (2) 152 different topological relations between two simple fuzzy regions. Finally, we discuss examples that demonstrate the validity of the new method; through comparisons with existing fuzzy models, we show that the proposed method can compute more relations than the existing models, as it is more expressive. PMID:25775452
Use of Graph Database for the Integration of Heterogeneous Biological Data.
Yoon, Byoung-Ha; Kim, Seon-Kyu; Kim, Seon-Young
2017-03-01
Understanding complex relationships among heterogeneous biological data is one of the fundamental goals in biology. In most cases, diverse biological data are stored in relational databases, such as MySQL and Oracle, which store data in multiple tables and then infer relationships by multiple-join statements. Recently, a new type of database, called the graph-based database, was developed to natively represent various kinds of complex relationships, and it is widely used among computer science communities and IT industries. Here, we demonstrate the feasibility of using a graph-based database for complex biological relationships by comparing the performance between MySQL and Neo4j, one of the most widely used graph databases. We collected various biological data (protein-protein interaction, drug-target, gene-disease, etc.) from several existing sources, removed duplicate and redundant data, and finally constructed a graph database containing 114,550 nodes and 82,674,321 relationships. When we tested the query execution performance of MySQL versus Neo4j, we found that Neo4j outperformed MySQL in all cases. While Neo4j exhibited a very fast response for various queries, MySQL exhibited latent or unfinished responses for complex queries with multiple-join statements. These results show that using graph-based databases, such as Neo4j, is an efficient way to store complex biological relationships. Moreover, querying a graph database in diverse ways has the potential to reveal novel relationships among heterogeneous biological data.
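For flavor, the sketch below contrasts the two query styles on a hypothetical drug-gene-disease path: the relational form needs one join per hop, while the graph form states the path directly. The schema, labels, identifiers, and credentials are illustrative assumptions; the snippet uses the official neo4j Python driver and requires a running server.

from neo4j import GraphDatabase

# relational style: one join per hop (illustrative table names)
SQL_MULTI_JOIN = """
SELECT DISTINCT dz.disease_id
FROM drug_target dt
JOIN gene_disease gd ON gd.gene_id = dt.gene_id
JOIN disease dz      ON dz.disease_id = gd.disease_id
WHERE dt.drug_id = %s;
"""

# graph style: the same two-hop path expressed directly in Cypher
CYPHER = """
MATCH (d:Drug {id: $drug_id})-[:TARGETS]->(:Gene)-[:ASSOCIATED_WITH]->(z:Disease)
RETURN DISTINCT z.id
"""

driver = GraphDatabase.driver("bolt://localhost:7687", auth=("neo4j", "secret"))
with driver.session() as session:
    for record in session.run(CYPHER, drug_id="DB00945"):
        print(record["z.id"])
driver.close()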
ERIC Educational Resources Information Center
Bell, Steven J.
2003-01-01
Discusses full-text databases and whether existing aggregator databases are meeting user needs. Topics include the need for better search interfaces; concepts of quality research and information retrieval; information overload; full text in electronic journal collections versus aggregator databases; underrepresentation of certain disciplines; and…
Spatio-Temporal Data Model for Integrating Evolving Nation-Level Datasets
NASA Astrophysics Data System (ADS)
Sorokine, A.; Stewart, R. N.
2017-10-01
The ability to easily combine data from diverse sources in a single analytical workflow is one of the greatest promises of Big Data technologies. However, such integration is often challenging, as datasets originate from different vendors, governments, and research communities, which results in multiple incompatibilities of data representations, formats, and semantics. Semantic differences are the hardest to handle: different communities often use different attribute definitions and associate records with different sets of evolving geographic entities. Analysis of global socioeconomic variables across multiple datasets over prolonged periods is often complicated by differences in how the boundaries and histories of countries or other geographic entities are represented. Here we propose an event-based data model for depicting and tracking the histories of evolving geographic units (countries, provinces, etc.) and their representations in disparate data. The model addresses the semantic challenge of preserving the identity of geographic entities over time by defining criteria for an entity's existence, a set of events that may affect its existence, and rules for mapping between different representations (datasets). The proposed model is used to maintain an evolving compound database of global socioeconomic and environmental data harvested from multiple sources. A practical implementation of our model is demonstrated using a PostgreSQL object-relational database with temporal, geospatial, and NoSQL database extensions.
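A toy sketch of the event-based idea, in Python with SQLite (the schema is an assumption, not the authors'): entities carry validity intervals, events record successions, and a query resolves which entities exist at a given date.

import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE entity (id TEXT, valid_from TEXT, valid_to TEXT);
CREATE TABLE event  (date TEXT, kind TEXT, predecessor TEXT, successor TEXT);
""")
con.executemany("INSERT INTO entity VALUES (?,?,?)", [
    ("Czechoslovakia", "1918-10-28", "1992-12-31"),
    ("Czech Republic", "1993-01-01", None),
    ("Slovakia",       "1993-01-01", None),
])
con.executemany("INSERT INTO event VALUES (?,?,?,?)", [
    ("1993-01-01", "dissolution", "Czechoslovakia", "Czech Republic"),
    ("1993-01-01", "dissolution", "Czechoslovakia", "Slovakia"),
])

def entities_at(date):
    # ISO dates compare correctly as strings
    q = """SELECT id FROM entity
           WHERE valid_from <= ? AND (valid_to IS NULL OR valid_to >= ?)"""
    return [r[0] for r in con.execute(q, (date, date))]

print(entities_at("1990-06-01"))   # ['Czechoslovakia']
print(entities_at("2000-06-01"))   # ['Czech Republic', 'Slovakia']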
NASA Astrophysics Data System (ADS)
Reyes, J. C.; Vernon, F. L.; Newman, R. L.; Steidl, J. H.
2010-12-01
The Waveform Server is an interactive web-based interface to multi-station, multi-sensor and multi-channel high-density time-series data stored in Center for Seismic Studies (CSS) 3.0 schema relational databases (Newman et al., 2009). In the last twelve months, based on expanded specifications and current user feedback, both the server-side infrastructure and the client-side interface have been extensively rewritten. The Python Twisted server-side code base has been fundamentally modified and now presents waveform data stored in cluster-based databases using a multi-threaded architecture, in addition to supporting the pre-existing single-database model. This allows interactive web-based access to high-density (broadband at 40 Hz to strong motion at 200 Hz) waveform data that can span multiple years, the common lifetime of broadband seismic networks. The client-side interface expands on its use of simple JSON-based AJAX queries to incorporate a variety of user interface (UI) improvements, including standardized calendars for defining time ranges, on-the-fly data calibration to display SI-unit data, and increased rendering speed. This presentation will outline the cyberinfrastructure challenges we have faced while developing this application, the use cases currently in existence, and the limitations of web-based application development.
Human Chromosome Y and Haplogroups; introducing YDHS Database.
Tiirikka, Timo; Moilanen, Jukka S
2015-12-01
As high-throughput sequencing efforts generate more biological information, scientists from different disciplines are interpreting the polymorphisms that make us unique. In addition, there is an increasing trend among the general public to research their own genealogy, find distant relatives, and learn more about their biological background. Commercial vendors provide analyses of mitochondrial and Y-chromosomal markers for such purposes. Clearly, an easy-to-use free interface to the existing data on identified variants would be in the interest of the general public and of professionals less familiar with the field. Here we introduce a novel metadatabase, YDHS, that aims to provide such an interface for Y-chromosomal DNA (Y-DNA) haplogroups and sequence variants. The database uses the ISOGG Y-DNA tree as the source of mutations and haplogroups, and by using the genomic positions of the mutations it links them to genes and other biological entities. YDHS contains analysis tools for deeper Y-SNP analysis and addresses the shortage of Y-DNA-related databases. We have tested our database using a set of cases from the literature ranging from infertility to autism. The database is available at http://www.semanticgen.net/ydhs. Y-chromosomal DNA (Y-DNA) haplogroups and sequence variants have not been in the scientific limelight, excluding certain specialized fields like forensics, mainly because there is not much freely available information or it is scattered across different sources. However, as we have demonstrated, Y-SNPs do play a role in various cases at the haplogroup level, and it is possible to create a free Y-DNA-dedicated bioinformatics resource.
Meta-analysis of the relative sensitivity of semi-natural vegetation species to ozone.
Hayes, F; Jones, M L M; Mills, G; Ashmore, M
2007-04-01
This study identified 83 species from existing publications suitable for inclusion in a database of sensitivity of species to ozone (OZOVEG database). An index, the relative sensitivity to ozone, was calculated for each species based on changes in biomass in order to test for species traits associated with ozone sensitivity. Meta-analysis of the ozone sensitivity data showed a wide inter-specific range in response to ozone. Some relationships in comparison to plant physiological and ecological characteristics were identified. Plants of the therophyte lifeform were particularly sensitive to ozone. Species with higher mature leaf N concentration were more sensitive to ozone than those with lower leaf N concentration. Some relationships between relative sensitivity to ozone and Ellenberg habitat requirements were also identified. In contrast, no relationships between relative sensitivity to ozone and mature leaf P concentration, Grime's CSR strategy, leaf longevity, flowering season, stomatal density and maximum altitude were found. The relative sensitivity of species and relationships with plant characteristics identified in this study could be used to predict sensitivity to ozone of untested species and communities.
Kang, Hong; Wang, Frank; Zhou, Sicheng; Miao, Qi; Gong, Yang
2017-01-01
Health information technology (HIT) events, a subtype of patient safety events, pose a major threat and barrier toward a safer healthcare system. It is crucial to gain a better understanding of the nature of the errors and adverse events caused by current HIT systems. The scarcity of HIT event-exclusive databases and event reporting systems indicates the challenge of identifying the HIT events from existing resources. FDA Manufacturer and User Facility Device Experience (MAUDE) database is a potential resource for HIT events. However, the low proportion and the rapid evolvement of HIT-related events present challenges for distinguishing them from other equipment failures and hazards. We proposed a strategy to identify and synchronize HIT events from MAUDE by using a filter based on structured features and classifiers based on unstructured features. The strategy will help us develop and grow an HIT event-exclusive database, keeping pace with updates to MAUDE toward shared learning.
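As a sketch of the unstructured-feature step, the snippet below trains a bag-of-words classifier to separate HIT-related report narratives from other device hazards; the training texts, labels, and model choice are invented for illustration, since the abstract does not specify the paper's actual features or classifier.

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

texts = [
    "interface displayed wrong patient record after software update",
    "order entry screen froze and medication order was lost",
    "infusion pump motor failed during operation",
    "battery overheated in portable monitor",
]
labels = [1, 1, 0, 0]                      # 1 = HIT-related, 0 = other hazard

clf = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)), LogisticRegression())
clf.fit(texts, labels)
print(clf.predict(["EHR software crashed while charting vitals"]))  # likely [1]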
Spatiotemporal database of US congressional elections, 1896–2014
Wolf, Levi John
2017-01-01
High-quality historical data about US Congressional elections has long provided common ground for electoral studies. However, advances in geographic information science have recently made it efficient to compile, distribute, and analyze large spatio-temporal data sets on the structure of US Congressional districts. A single spatio-temporal data set that relates US Congressional election results to the spatial extent of the constituencies has not yet been developed. To address this, existing high-quality data sets of elections returns were combined with a spatiotemporal data set on Congressional district boundaries to generate a new spatio-temporal database of US Congressional election results that are explicitly linked to the geospatial data about the districts themselves. PMID:28809849
An Algorithm of Association Rule Mining for Microbial Energy Prospection
Shaheen, Muhammad; Shahbaz, Muhammad
2017-01-01
The presence of hydrocarbons beneath the earth's surface produces microbiological anomalies in soils and sediments. The detection of such microbial populations involves purely biochemical processes, which are specialized, expensive, and time consuming. This paper proposes a new algorithm for context-based association rule mining on non-spatial data. The algorithm is a modified form of an algorithm previously developed for spatial databases only. It is applied to mine context-based association rules on a microbial database, to extract interesting and useful associations of microbial attributes with the existence of hydrocarbon reserves. The surface and soil manifestations caused by the presence of hydrocarbon-oxidizing microbes were selected from the existing literature and stored in a shared database. The algorithm is applied to this database to generate direct and indirect associations among the stored microbial indicators. These associations are then correlated with the probability of hydrocarbon existence. The numerical evaluation shows better accuracy for non-spatial data compared to conventional algorithms at generating reliable and robust rules. PMID:28393846
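A minimal sketch of mining simple one-to-one associations between boolean microbial indicators and a hydrocarbon label; the records and thresholds are invented, and the paper's context-based algorithm is richer (it also derives indirect associations).

from itertools import permutations

records = [
    {"methanotrophs": 1, "sulfate_reducers": 1, "hydrocarbon": 1},
    {"methanotrophs": 1, "sulfate_reducers": 0, "hydrocarbon": 1},
    {"methanotrophs": 0, "sulfate_reducers": 1, "hydrocarbon": 0},
    {"methanotrophs": 1, "sulfate_reducers": 1, "hydrocarbon": 1},
]
MIN_SUPPORT, MIN_CONF = 0.5, 0.8   # illustrative thresholds
items = list(records[0])
n = len(records)
for a, b in permutations(items, 2):
    both = sum(r[a] and r[b] for r in records)   # records containing a and b
    cov = sum(r[a] for r in records)             # records containing a
    s, conf = both / n, (both / cov if cov else 0.0)
    if s >= MIN_SUPPORT and conf >= MIN_CONF:
        print(f"{a} -> {b}  support={s:.2f} confidence={conf:.2f}")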
Haile, Michael; Anderson, Kim; Evans, Alex; Crawford, Angela
2012-01-01
In part 1 of this series, we outlined the rationale behind the development of a centralized electronic database used to maintain nonsterile compounding formulation records in the Mission Health System, which is a union of several independent hospitals and satellite and regional pharmacies that form the cornerstone of advanced medical care in several areas of western North Carolina. Hospital providers in many healthcare systems require compounded formulations to meet the needs of their patients (in particular, pediatric patients). Before a centralized electronic compounding database was implemented in the Mission Health System, each satellite or regional pharmacy affiliated with that system had a specific set of formulation records, but no standardized format for those records existed. In this article, we describe the quality control, database platform selection, description, implementation, and execution of our intranet database system, which is designed to maintain, manage, and disseminate nonsterile compounding formulation records in the hospitals and affiliated pharmacies of the Mission Health System. The objectives of that project were to standardize nonsterile compounding formulation records, create a centralized computerized database that would increase healthcare staff members' access to formulation records, establish beyond-use dates based on published stability studies, improve quality control, reduce the potential for medication errors related to compounding medications, and (ultimately) improve patient safety.
Zhang, Yanqiong; Yang, Chunyuan; Wang, Shaochuang; Chen, Tao; Li, Mansheng; Wang, Xue; Li, Dongsheng; Wang, Kang; Ma, Jie; Wu, Songfeng; Zhang, Xueli; Zhu, Yunping; Wu, Jinsheng; He, Fuchu
2013-09-01
A large amount of liver-related physiological and pathological data exist in publicly available biological and bibliographic databases, which are usually far from comprehensive or integrated. Data collection, integration and mining processes pose a great challenge to scientific researchers and clinicians interested in the liver. To address these problems, we constructed LiverAtlas (http://liveratlas.hupo.org.cn), a comprehensive resource of biomedical knowledge related to the liver and various hepatic diseases by incorporating 53 databases. In the present version, LiverAtlas covers data on liver-related genomics, transcriptomics, proteomics, metabolomics and hepatic diseases. Additionally, LiverAtlas provides a wealth of manually curated information, relevant literature citations and cross-references to other databases. Importantly, an expert-confirmed Human Liver Disease Ontology, including relevant information for 227 types of hepatic disease, has been constructed and is used to annotate LiverAtlas data. Furthermore, we have demonstrated two examples of applying LiverAtlas data to identify candidate markers for hepatocellular carcinoma (HCC) at the systems level and to develop a systems biology-based classifier by combining the differential gene expression with topological features of human protein interaction networks to enhance the ability of HCC differential diagnosis. LiverAtlas is the most comprehensive liver and hepatic disease resource, which helps biologists and clinicians to analyse their data at the systems level and will contribute much to the biomarker discovery and diagnostic performance enhancement for liver diseases. © 2013 John Wiley & Sons A/S. Published by John Wiley & Sons Ltd.
Geographic Information Systems and Web Page Development
NASA Technical Reports Server (NTRS)
Reynolds, Justin
2004-01-01
The Facilities Engineering and Architectural Branch is responsible for the design and maintenance of buildings, laboratories, and civil structures. In order to improve efficiency and quality, the FEAB has dedicated itself to establishing a data infrastructure based on Geographic Information Systems, GIS. The value of GIS was explained in an article dating back to 1980 entitled "Need for a Multipurpose Cadastre," which stated, "There is a critical need for a better land-information system in the United States to improve land-conveyance procedures, furnish a basis for equitable taxation, and provide much-needed information for resource management and environmental planning." Scientists and engineers both point to GIS as the solution. What is GIS? According to most textbooks, a Geographic Information System is a class of software that stores, manages, and analyzes mappable features on, above, or below the surface of the earth. GIS software is essentially database management software applied to spatial data and information. Simply put, Geographic Information Systems manage, analyze, chart, graph, and map spatial information. At the outset, I was given goals and expectations from my branch and from my mentor with regard to the further implementation of GIS. Those goals are as follows: (1) continue the development of GIS for the underground structures; (2) extract and export annotated data from AutoCAD drawing files and construct a database (to serve as a prototype for future work); (3) examine existing underground record drawings to determine existing and non-existing underground tanks. Once this data was collected and analyzed, I set out on the task of creating a user-friendly database that could be accessed by all members of the branch. It was important that the database be built using programs that most employees already possess, ruling out most AutoCAD-based viewers. Therefore, I set out to create an Access database presented on the web using Internet Explorer as the foundation. After some programming, it was possible to view AutoCAD files and other GIS-related applications in Internet Explorer, while providing the user with a variety of editing commands and setting options. I was also given the task of launching a divisional website using Macromedia Flash and other web-development programs.
NPACT: Naturally Occurring Plant-based Anti-cancer Compound-Activity-Target database
Mangal, Manu; Sagar, Parul; Singh, Harinder; Raghava, Gajendra P. S.; Agarwal, Subhash M.
2013-01-01
Plant-derived molecules have been highly valued by biomedical researchers and pharmaceutical companies for developing drugs, as they are thought to be optimized during evolution. Therefore, we have collected and compiled a central resource Naturally Occurring Plant-based Anti-cancer Compound-Activity-Target database (NPACT, http://crdd.osdd.net/raghava/npact/) that gathers the information related to experimentally validated plant-derived natural compounds exhibiting anti-cancerous activity (in vitro and in vivo), to complement the other databases. It currently contains 1574 compound entries, and each record provides information on their structure, manually curated published data on in vitro and in vivo experiments along with reference for users referral, inhibitory values (IC50/ED50/EC50/GI50), properties (physical, elemental and topological), cancer types, cell lines, protein targets, commercial suppliers and drug likeness of compounds. NPACT can easily be browsed or queried using various options, and an online similarity tool has also been made available. Further, to facilitate retrieval of existing data, each record is hyperlinked to similar databases like SuperNatural, Herbal Ingredients’ Targets, Comparative Toxicogenomics Database, PubChem and NCI-60 GI50 data. PMID:23203877
Making your database available through Wikipedia: the pros and cons.
Finn, Robert D; Gardner, Paul P; Bateman, Alex
2012-01-01
Wikipedia, the online encyclopedia, is the most famous wiki in use today. It contains over 3.7 million pages of content; with many pages written on scientific subject matters that include peer-reviewed citations, yet are written in an accessible manner and generally reflect the consensus opinion of the community. In this, the 19th Annual Database Issue of Nucleic Acids Research, there are 11 articles that describe the use of a wiki in relation to a biological database. In this commentary, we discuss how biological databases can be integrated with Wikipedia, thereby utilising the pre-existing infrastructure, tools and above all, large community of authors (or Wikipedians). The limitations to the content that can be included in Wikipedia are highlighted, with examples drawn from articles found in this issue and other wiki-based resources, indicating why other wiki solutions are necessary. We discuss the merits of using open wikis, like Wikipedia, versus other models, with particular reference to potential vandalism. Finally, we raise the question about the future role of dedicated database biocurators in context of the thousands of crowdsourced, community annotations that are now being stored in wikis.
Spatial Data Integration Using Ontology-Based Approach
NASA Astrophysics Data System (ADS)
Hasani, S.; Sadeghi-Niaraki, A.; Jelokhani-Niaraki, M.
2015-12-01
In today's world, the need for spatial data has become so crucial that many organizations have begun to produce such data themselves. In some circumstances, the need to obtain integrated data in real time requires a sustainable mechanism for real-time integration. A case in point is disaster management, which requires obtaining real-time data from various sources of information. One of the problematic challenges in such situations is the high degree of heterogeneity between the data of different organizations. To solve this issue, we introduce an ontology-based method that provides sharing and integration capabilities for existing databases. In addition to resolving semantic heterogeneity, our proposed method also provides better access to information. Our approach consists of three steps. In the first step, the objects in a relational database are identified, the semantic relationships between them are modelled, and the ontology of each database is created. In the second step, the corresponding ontology is inserted into the database, and the relationship of each ontology class is recorded in a newly created column in the database tables. The last step consists of a platform based on service-oriented architecture, which allows the integration of data. This is done by using the concept of ontology mapping. The proposed approach, in addition to being fast and low cost, makes the process of data integration easy, and the data remain unchanged, so existing legacy applications continue to work.
ALDB: a domestic-animal long noncoding RNA database.
Li, Aimin; Zhang, Junying; Zhou, Zhongyin; Wang, Lei; Liu, Yujuan; Liu, Yajun
2015-01-01
Long noncoding RNAs (lncRNAs) have attracted significant attention in recent years due to their important roles in many biological processes. Domestic animals constitute a unique resource for understanding the genetic basis of phenotypic variation and are ideal models relevant to diverse areas of biomedical research. With improving sequencing technologies, numerous domestic-animal lncRNAs are now available. Thus, there is an immediate need for a database resource that can assist researchers to store, organize, analyze and visualize domestic-animal lncRNAs. The domestic-animal lncRNA database, named ALDB, is the first comprehensive database focused on domestic-animal lncRNAs. It currently archives 12,103 pig intergenic lncRNAs (lincRNAs), 8,923 chicken lincRNAs and 8,250 cow lincRNAs. In addition to the annotations of lincRNAs, it offers related data not yet available in existing lncRNA databases (lncRNAdb and NONCODE), such as genome-wide expression profiles and animal quantitative trait loci (QTLs) of domestic animals. Moreover, a collection of interfaces and applications, such as the Basic Local Alignment Search Tool (BLAST), the Generic Genome Browser (GBrowse) and flexible search functionalities, are available to help users effectively explore, analyze and download data related to domestic-animal lncRNAs. ALDB enables the exploration and comparative analysis of lncRNAs in domestic animals. A user-friendly web interface, integrated information and tools make it valuable to researchers in their studies. ALDB is freely available from http://res.xaut.edu.cn/aldb/index.jsp.
The Papillomavirus Episteme: a central resource for papillomavirus sequence data and analysis.
Van Doorslaer, Koenraad; Tan, Qina; Xirasagar, Sandhya; Bandaru, Sandya; Gopalan, Vivek; Mohamoud, Yasmin; Huyen, Yentram; McBride, Alison A
2013-01-01
The goal of the Papillomavirus Episteme (PaVE) is to provide an integrated resource for the analysis of papillomavirus (PV) genome sequences and related information. The PaVE is a freely accessible, web-based tool (http://pave.niaid.nih.gov) created around a relational database, which enables storage, analysis and exchange of sequence information. From a design perspective, the PaVE adopts an Open Source software approach and stresses the integration and reuse of existing tools. Reference PV genome sequences have been extracted from publicly available databases and reannotated using a custom-created tool. To date, the PaVE contains 241 annotated PV genomes, 2245 genes and regions, 2004 protein sequences and 47 protein structures, which users can explore, analyze or download. The PaVE provides scientists with the data and tools needed to accelerate scientific progress for the study and treatment of diseases caused by PVs.
Integrated Array/Metadata Analytics
NASA Astrophysics Data System (ADS)
Misev, Dimitar; Baumann, Peter
2015-04-01
Data comes in various forms and types, and integration usually presents a problem that is often simply ignored and solved with ad hoc solutions. Multidimensional arrays are a ubiquitous data type that we find at the core of virtually all science and engineering domains, as sensor, model, image, and statistics data. Naturally, arrays are richly described by and intertwined with additional metadata (alphanumeric relational data, XML, JSON, etc.). Database systems, however, a fundamental building block of what we call "Big Data", lack adequate support for modelling and expressing these array data/metadata relationships. Array analytics is hence primitive at best, or missing entirely, in modern relational DBMSs. Recognizing this, we extended SQL with a new SQL/MDA part, seamlessly integrating multidimensional array analytics into the standard database query language. We demonstrate the benefits of SQL/MDA with real-world examples executed in ASQLDB, an open-source mediator system based on HSQLDB and rasdaman that already implements SQL/MDA.
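As a rough illustration of the array/metadata coupling being argued for, the following Python sketch filters datasets by relational metadata and then aggregates over the array cells themselves; the data are invented, and the SQL-like query in the comment is only an approximation of the style of query SQL/MDA integrates into the language, not its actual syntax:

```python
# A minimal emulation of a combined array/metadata query, assuming
# invented sensor data. SQL/MDA expresses this in the query language.
import numpy as np

datasets = [
    {"sensor": "A", "region": "north", "values": np.random.rand(4, 4)},
    {"sensor": "B", "region": "south", "values": np.random.rand(4, 4)},
]

# Roughly: "SELECT avg(values[0:2, :]) FROM datasets WHERE region = 'north'"
# done by hand here; an array-aware DBMS would evaluate it natively.
result = [d["values"][0:2, :].mean() for d in datasets if d["region"] == "north"]
print(result)
```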
Content based information retrieval in forensic image databases.
Geradts, Zeno; Bijhold, Jurrien
2002-03-01
This paper gives an overview of the available image databases and of ways of searching them by image content. Research developments in image-database searching are evaluated and compared with the forensic databases that exist. Forensic image databases of fingerprints, faces, shoeprints, handwriting, cartridge cases, drug tablets, and tool marks are described. The developments in these fields appear to be valuable for forensic databases, especially the MPEG-7 framework, which standardizes searching in image databases. In the future, combining these databases (including DNA databases) can result in stronger forensic evidence.
[HIV and the nursing professional in the face of needlestick accidents].
Vieira, Mariana; Padilha, Maria Itayra Coelho de Souza
2008-12-01
The goal of this study was to survey the scientific literature on work-related needlestick accidents among nursing professionals involving HIV-contaminated biological material, to characterize the factors predisposing to such accidents, and to describe the procedures that follow exposure to potentially HIV-contaminated needlestick material. This is a literature review based on keyword searches of the LILACS database from the year 2000 onward. The study confirms that the predisposing factors for work-related needlestick accidents relate to working conditions as much as to individual conditions. In the face of these accidents, nursing workers need to know the procedures to follow after exposure to potentially HIV-contaminated needlestick material. We conclude that the adoption of standardized precautions in healthcare work is a fundamental condition for worker safety, regardless of the worker's area of expertise, given the increasing number of HIV cases.
De Backer, Tine L M; Vander Stichele, Robert H; Van Bortel, Luc M
2009-01-01
Benefit-risk assessment should be ongoing during the life cycle of a pharmaceutical agent. New products are subjected to rigorous registration laws and rules, which attempt to assure the availability and validity of evidence. For older products, bias in benefit-risk assessment is more likely, as a number of safeguards were not in place at the time these products were registered. This issue of bias in benefit-risk assessment of older products is illustrated here with an example: buflomedil in intermittent claudication. Data on efficacy were retrieved from a Cochrane systematic review. Data on safety were obtained by comparing the number of reports of serious adverse events and fatalities published in the literature with those reported in postmarketing surveillance databases. In the case of efficacy, the slim basis of evidence for the benefit of buflomedil is undermined by documented publication bias. In the case of safety, bias in reporting to international safety databases is illustrated by the discrepancy between the number of drug-related deaths published in the literature (20), the potentially drug-related deaths in the WHO database (20) and deaths attributed to buflomedil in the database of the international marketing authorization holder (11). In older products, efficacy cannot be evaluated without a thorough search for publication bias. For safety, case reporting of drug-related serious events and deaths in the literature remains a necessary instrument for risk appraisal of older medicines, despite the existence of postmarketing safety databases. The enforcement of efficient communication between healthcare workers, drug companies, national centres of pharmacovigilance, national poison centres and the WHO is necessary to ensure the validity of postmarketing surveillance reporting systems. Drugs considered obsolete because of unfavourable benefit-risk assessment should not be allowed to stay on the market.
Lee, Casey J.; Glysson, G. Douglas
2013-01-01
Human-induced and natural changes to the transport of sediment and sediment-associated constituents can degrade aquatic ecosystems and limit human uses of streams and rivers. The lack of a dedicated, easily accessible, quality-controlled database of sediment and ancillary data has made it difficult to identify sediment-related water-quality impairments and has limited understanding of how human actions affect suspended-sediment concentrations and transport. The purpose of this report is to describe the creation of a quality-controlled U.S. Geological Survey suspended-sediment database, provide guidance for its use, and summarize characteristics of suspended-sediment data through 2010. The database is provided as an online application at http://cida.usgs.gov/sediment to allow users to view, filter, and retrieve available suspended-sediment and ancillary data. A data recovery, filtration, and quality-control process was performed to expand the availability, representativeness, and utility of existing suspended-sediment data collected by the U.S. Geological Survey in the United States before January 1, 2011. Information on streamflow condition, sediment grain size, and upstream landscape condition was matched to sediment data and sediment-sampling sites to place data in context with factors that may influence sediment transport. Suspended-sediment and selected ancillary data are presented from across the United States with respect to time, streamflow, and landscape condition. Examples of potential uses of this database for identifying sediment-related impairments, assessing trends, and designing new data collection activities are provided. This report and database can support local and national-level decision making, project planning, and data mining activities related to the transport of suspended sediment and sediment-associated constituents.
A structured vocabulary for indexing dietary supplements in databases in the United States
USDA-ARS?s Scientific Manuscript database
Food composition databases are critical to assess and plan dietary intakes. Dietary supplement databases are also needed because dietary supplements make significant contributions to total nutrient intakes. However, no uniform system exists for classifying dietary supplement products and indexing ...
Dynamic XML-based exchange of relational data: application to the Human Brain Project.
Tang, Zhengming; Kadiyska, Yana; Li, Hao; Suciu, Dan; Brinkley, James F
2003-01-01
This paper discusses an approach to exporting relational data in XML format for data exchange over the web. We describe the first real-world application of SilkRoute, a middleware program that dynamically converts existing relational data to a user-defined XML DTD. The application, called XBrain, wraps SilkRoute in a Java Server Pages framework, thus permitting a web-based XQuery interface to a legacy relational database. The application is demonstrated as a query interface to the University of Washington Brain Project's Language Map Experiment Management System, which is used to manage data about language organization in the brain.
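The following minimal Python sketch shows the general pattern of dynamically serializing relational rows into a user-defined XML structure; the table, column and tag names are invented for illustration, not the XBrain schema or SilkRoute's actual mapping language:

```python
# A minimal sketch of relational-to-XML export, assuming an invented
# experiment table. SilkRoute drives this from a user-defined DTD.
import sqlite3
import xml.etree.ElementTree as ET

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE experiment (id INTEGER, subject TEXT, site TEXT)")
conn.execute("INSERT INTO experiment VALUES (1, 'P12', 'temporal lobe')")

root = ET.Element("experiments")  # root element fixed by the target structure
for exp_id, subject, site in conn.execute(
        "SELECT id, subject, site FROM experiment"):
    exp = ET.SubElement(root, "experiment", id=str(exp_id))
    ET.SubElement(exp, "subject").text = subject
    ET.SubElement(exp, "site").text = site

print(ET.tostring(root, encoding="unicode"))
```

Generating the XML on demand from the live database, rather than materializing it, is what keeps such an export "dynamic": the XML view always reflects the current relational content.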
Intelligent communication assistant for databases
DOE Office of Scientific and Technical Information (OSTI.GOV)
Jakobson, G.; Shaked, V.; Rowley, S.
1983-01-01
An intelligent communication assistant for databases, called FRED (front end for databases), is explored. FRED is designed to facilitate access to database systems by users of varying levels of experience. It is a second-generation natural language front end for databases, intended to solve two critical interface problems between end users and databases: connectivity and communication. The authors report their experiences in developing software for natural language query processing, dialog control, and knowledge representation, as well as the direction of future work. 10 references.
Importance of Data Management in a Long-term Biological Monitoring Program
DOE Office of Scientific and Technical Information (OSTI.GOV)
Christensen, Sigurd W; Brandt, Craig C; McCracken, Kitty
2011-01-01
The long-term Biological Monitoring and Abatement Program (BMAP) has always needed to collect and retain high-quality data on which to base its assessments of the ecological status of streams and their recovery after remediation. Its formal quality assurance, data processing, and data management components all contribute to this need. The Quality Assurance Program comprehensively addresses requirements from various institutions, funders, and regulators, and includes a data management component. Centralized data management began a few years into the program. An existing relational database was adapted and extended to handle biological data. Data modeling enabled the program's database to process, store, and retrieve its data. The database's main data tables and several key reference tables are described. One of the most important related activities supporting long-term analyses was the establishment of standards for sampling site names, taxonomic identification, flagging, and other components. There are limitations. Some types of program data were not easily accommodated in the central systems, and many possible data-sharing and integration options are not easily accessible to investigators. The implemented relational database supports the transmittal of data to the Oak Ridge Environmental Information System (OREIS) as the permanent repository. From our experience we offer data management advice to other biologically oriented long-term environmental sampling and analysis programs.
Importance of Data Management in a Long-Term Biological Monitoring Program
NASA Astrophysics Data System (ADS)
Christensen, Sigurd W.; Brandt, Craig C.; McCracken, Mary K.
2011-06-01
The long-term Biological Monitoring and Abatement Program (BMAP) has always needed to collect and retain high-quality data on which to base its assessments of ecological status of streams and their recovery after remediation. Its formal quality assurance, data processing, and data management components all contribute to meeting this need. The Quality Assurance Program comprehensively addresses requirements from various institutions, funders, and regulators, and includes a data management component. Centralized data management began a few years into the program when an existing relational database was adapted and extended to handle biological data. The database's main data tables and several key reference tables are described. One of the most important related activities supporting long-term analyses was the establishment of standards for sampling site names, taxonomic identification, flagging, and other components. The implemented relational database supports the transmittal of data to the Oak Ridge Environmental Information System (OREIS) as the permanent repository. We also discuss some limitations to our implementation. Some types of program data were not easily accommodated in the central systems, and many possible data-sharing and integration options are not easily accessible to investigators. From our experience we offer data management advice to other biologically oriented long-term environmental sampling and analysis programs.
Chemical databases evaluated by order theoretical tools.
Voigt, Kristina; Brüggemann, Rainer; Pudenz, Stefan
2004-10-01
Data on environmental chemicals are urgently needed to comply with the future chemicals policy in the European Union. The availability of data on parameters and chemicals can be evaluated by chemometrical and environmetrical methods. Different mathematical and statistical methods are taken into account in this paper. The emphasis is placed on a new discrete mathematical method called METEOR (method of evaluation by order theory). Application of the Hasse diagram technique (HDT) to the complete data matrix, comprising 12 objects (databases) × 27 attributes (parameters + chemicals), reveals that ECOTOX (ECO), the environmental fate database (EFD) and Extoxnet (EXT), the so-called multi-database databases, rank best. Most specialised single databases are found in a minimal position in the Hasse diagram; these are the biocatalysis/biodegradation database (BID), the pesticide database (PES) and UmweltInfo (UMW). Aggregating the environmental parameters and chemicals (with equal weights) leads to a slimmer data matrix on the attribute side; however, no significant differences are found among the "best" and "worst" objects. The whole approach indicates a rather poor situation regarding the availability of data on existing chemicals, and hence an alarming signal concerning the new and existing chemicals policies of the EEC.
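For intuition, here is a minimal Python sketch of the partial-order comparison that underlies the Hasse diagram technique: one database dominates another if it is at least as good on every attribute. The attribute scores below are invented for illustration, not the paper's data:

```python
# A minimal sketch, assuming invented availability scores per database
# over a few attribute groups. HDT draws the resulting order as a diagram.
scores = {
    "ECO": (3, 4, 5),
    "EFD": (3, 3, 5),
    "BID": (1, 2, 2),
}

def dominates(a, b):
    """a >= b componentwise, and strictly better somewhere."""
    return all(x >= y for x, y in zip(a, b)) and a != b

for name_a, va in scores.items():
    for name_b, vb in scores.items():
        if dominates(va, vb):
            print(f"{name_a} > {name_b}")  # an order relation in the diagram
# Pairs where neither dominates are incomparable: they sit on different
# branches of the Hasse diagram rather than above/below one another.
```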
DEXTER: Disease-Expression Relation Extraction from Text.
Gupta, Samir; Dingerdissen, Hayley; Ross, Karen E; Hu, Yu; Wu, Cathy H; Mazumder, Raja; Vijay-Shanker, K
2018-01-01
Gene expression levels affect biological processes and play a key role in many diseases. Characterizing expression profiles is useful for clinical research and for the diagnostics and prognostics of diseases. There are currently several high-quality databases that capture gene expression information, obtained mostly from large-scale studies such as microarray and next-generation sequencing technologies, in the context of disease. The scientific literature is another rich source of information on gene expression-disease relationships, captured not only from large-scale studies but also from thousands of small-scale studies. Expression information obtained from the literature through manual curation can extend expression databases. While many of the existing databases include information from the literature, they are limited by the time-consuming nature of manual curation and have difficulty keeping up with the explosion of publications in the biomedical field. In this work, we describe an automated text-mining tool, Disease-Expression Relation Extraction from Text (DEXTER), to extract information from the literature on gene and microRNA expression in the context of disease. One of the motivations in developing DEXTER was to extend the BioXpress database, a cancer-focused gene expression database that includes data derived from large-scale experiments and manual curation of publications. The literature-based portion of BioXpress lags significantly behind the expression information obtained from large-scale studies and can benefit from our text-mined results. We conducted two different evaluations to measure the accuracy of our text-mining tool and achieved average F-scores of 88.51% and 81.81% for the two evaluations, respectively. Also, to demonstrate the ability to extract rich expression information in different disease-related scenarios, we used DEXTER to extract information on differential expression for 2024 genes in lung cancer, 115 glycosyltransferases in 62 cancers and 826 microRNAs in 171 cancers. All extractions using DEXTER are integrated in the literature-based portion of BioXpress. Database URL: http://biotm.cis.udel.edu/DEXTER
Chang, Sun Ju; Im, Eun-Ok
2014-01-01
The purpose of the study was to develop a situation-specific theory for explaining health-related quality of life (QOL) among older South Korean adults with type 2 diabetes. To develop a situation-specific theory, three sources were considered: (a) the conceptual model of health promotion and QOL for people with chronic and disabling conditions (an existing theory related to the QOL in patients with chronic diseases); (b) a literature review using multiple databases including Cumulative Index for Nursing and Allied Health Literature (CINAHL), PubMed, PsycINFO, and two Korean databases; and (c) findings from our structural equation modeling study on health-related QOL in older South Korean adults with type 2 diabetes. The proposed situation-specific theory is constructed with six major concepts including barriers, resources, perceptual factors, psychosocial factors, health-promoting behaviors, and health-related QOL. The theory also provides the interrelationships among concepts. Health care providers and nurses could incorporate the proposed situation-specific theory into development of diabetes education programs for improving health-related QOL in older South Korean adults with type 2 diabetes.
Most of the existing arsenic dietary databases were developed from the analysis of total arsenic in water and dietary samples. These databases have been used to estimate arsenic exposure and in turn human health risk. However, these dietary databases are becoming obsolete as the ...
NREL Opens Large Database of Inorganic Thin-Film Materials | News | NREL
April 3, 2018. An extensive experimental database of inorganic thin-film materials developed at the National Renewable Energy Laboratory (NREL) is now publicly available: the High Throughput Experimental Materials (HTEM) database. "All existing experimental databases either contain many entries or have all this..."
77 FR 16434 - Revocation of Multiple Domestic, Alaskan, and Hawaiian Compulsory Reporting Points
Federal Register 2010, 2011, 2012, 2013, 2014
2012-03-21
... previously removed from service and taken out of the FAA aeronautical database. The FAA is removing these... FAA's aeronautical database. This will avoid confusion and eliminate safety issues with existing fixes... and not contained in the FAA's aeronautical database as reporting points. The reporting points...
Designing Corporate Databases to Support Technology Innovation
ERIC Educational Resources Information Center
Gultz, Michael Jarett
2012-01-01
Based on a review of the existing literature on database design, this study proposed a unified database model to support corporate technology innovation. This study assessed potential support for the model based on the opinions of 200 technology industry executives, including Chief Information Officers, Chief Knowledge Officers and Chief Learning…
miBLAST: scalable evaluation of a batch of nucleotide sequence queries with BLAST
Kim, You Jung; Boyd, Andrew; Athey, Brian D.; Patel, Jignesh M.
2005-01-01
A common task in many modern bioinformatics applications is to match a set of nucleotide query sequences against a large sequence dataset. Existing tools, such as BLAST, are designed to evaluate a single query at a time and can be unacceptably slow when the number of sequences in the query set is large. In this paper, we present a new algorithm, called miBLAST, that evaluates such batch workloads efficiently. At its core, miBLAST employs q-gram filtering and an index join for efficiently detecting similarity between the query sequences and database sequences. This set-oriented technique, which indexes both the query and the database sets, results in substantial performance improvements over existing methods. Our results show that miBLAST is significantly faster than BLAST in many cases. For example, miBLAST aligned 247,965 oligonucleotide sequences in the Affymetrix probe set against the Human UniGene in 1.26 days, compared with 27.27 days with BLAST (an improvement by a factor of 22). The relative performance of miBLAST increases for larger word sizes; however, it decreases for longer queries. miBLAST employs the familiar BLAST statistical model and output format, guaranteeing the same accuracy as BLAST and facilitating a seamless transition for existing BLAST users. PMID:16061938
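The core filtering idea can be sketched in a few lines of Python; this is illustrative only (invented toy sequences), and miBLAST's actual index join and statistics are far more elaborate:

```python
# A minimal sketch of q-gram filtering: index the q-grams of all queries
# at once, then use shared q-grams to shortlist candidate database
# sequences before any expensive alignment step.
from collections import defaultdict

def qgrams(seq, q=3):
    return {seq[i:i + q] for i in range(len(seq) - q + 1)}

queries = {"q1": "ACGTACGT", "q2": "TTGGCCAA"}
index = defaultdict(set)            # q-gram -> ids of queries containing it
for qid, seq in queries.items():
    for g in qgrams(seq):
        index[g].add(qid)

database = {"d1": "ACGTACGTTT", "d2": "GGGGGGGG"}
for did, seq in database.items():
    hits = {qid for g in qgrams(seq) if g in index for qid in index[g]}
    if hits:
        print(did, "candidate for queries:", hits)  # only d1 survives
```

Indexing the whole query batch once is what makes the approach set-oriented: every database sequence is screened against all queries in a single pass.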
Determining the semantic similarities among Gene Ontology terms.
Taha, Kamal
2013-05-01
We present in this paper novel techniques that determine the semantic relationships among Gene Ontology (GO) terms. We implemented these techniques in a prototype system called GoSE, which resides between the user application and the GO database. Given a set S of GO terms, GoSE returns another set S' of GO terms, where each term in S' is semantically related to each term in S. Most current research focuses on determining the semantic similarities among GO terms based solely on their IDs and proximity to one another in the GO graph structure, while overlooking the contexts of the terms, which may lead to erroneous results. The context of a GO term T is the set of other terms whose existence in the GO graph structure is dependent on T. We propose novel techniques that determine the contexts of terms based on the concept of existence dependency, and we present a stack-based sort-merge algorithm employing these techniques for determining the semantic similarities among GO terms. We evaluated GoSE experimentally and compared it with three existing methods. The results of measuring the semantic similarities among genes in KEGG and Pfam pathways, retrieved from the DBGET and Sanger Pfam databases, respectively, have shown that our method outperforms the other three methods in recall and precision.
Yu, Kebing; Salomon, Arthur R
2009-12-01
Recently, dramatic progress has been achieved in expanding the sensitivity, resolution, mass accuracy, and scan rate of mass spectrometers able to fragment and identify peptides through MS/MS. Unfortunately, this enhanced ability to acquire proteomic data has not been accompanied by a concomitant increase in the availability of flexible tools allowing users to rapidly assimilate, explore, and analyze this data and adapt to various experimental workflows with minimal user intervention. Here we fill this critical gap by providing a flexible relational database called PeptideDepot for organization of expansive proteomic data sets, collation of proteomic data with available protein information resources, and visual comparison of multiple quantitative proteomic experiments. Our software design, built upon the synergistic combination of a MySQL database for safe warehousing of proteomic data with a FileMaker-driven graphical user interface for flexible adaptation to diverse workflows, enables proteomic end-users to directly tailor the presentation of proteomic data to the unique analysis requirements of the individual proteomics lab. PeptideDepot may be deployed as an independent software tool or integrated directly with our high throughput autonomous proteomic pipeline used in the automated acquisition and post-acquisition analysis of proteomic data.
Xu, Qifang; Dunbrack, Roland L
2012-11-01
Automating the assignment of existing domain and protein family classifications to new sets of sequences is an important task. Current methods often miss assignments because remote relationships fail to achieve statistical significance. Some assignments are not as long as the actual domain definitions because local alignment methods often cut alignments short. Long insertions in query sequences often erroneously result in two copies of the domain assigned to the query. Divergent repeat sequences in proteins are often missed. We have developed a multilevel procedure to produce nearly complete assignments of protein families of an existing classification system to a large set of sequences. We apply this to the task of assigning Pfam domains to sequences and structures in the Protein Data Bank (PDB). We found that HHsearch alignments frequently scored more remotely related Pfams in Pfam clans higher than closely related Pfams, thus leading to erroneous assignment at the Pfam family level. A greedy algorithm allowing for partial overlaps was therefore applied first to sequence/HMM alignments, then to HMM-HMM alignments, and then to structure alignments, taking care to join partial alignments split by large insertions into single-domain assignments. Additional assignment of repeat Pfams with weaker E-values was allowed after stronger assignments of the repeat HMM. Our assignments, presented in a database called PDBfam, contain Pfams for 99.4% of chains >50 residues. The Pfam assignment data in PDBfam are available at http://dunbrack2.fccc.edu/ProtCid/PDBfam, which can be searched by PDB codes and Pfam identifiers. They will be updated regularly.
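A minimal Python sketch of greedy assignment with tolerated partial overlap, the first stage of the procedure described above; the family names, coordinates, scores and the overlap tolerance are invented for illustration, and the real pipeline adds HMM-HMM and structure-alignment stages:

```python
# A minimal sketch: accept hits best-score-first, rejecting any hit that
# overlaps already-accepted assignments by more than a tolerance.
hits = [  # (Pfam family, start, end, E-value); lower E-value = stronger
    ("PF00069", 10, 280, 1e-50),
    ("PF07714", 15, 285, 1e-48),   # clan-mate overlapping the first hit
    ("PF00017", 300, 390, 1e-20),
]

def overlap(a, b):
    return max(0, min(a[2], b[2]) - max(a[1], b[1]) + 1)

accepted, max_overlap = [], 20  # residues of tolerated partial overlap
for hit in sorted(hits, key=lambda h: h[3]):  # strongest E-value first
    if all(overlap(hit, kept) <= max_overlap for kept in accepted):
        accepted.append(hit)

print(accepted)  # PF00069 and PF00017; the redundant clan-mate is dropped
```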
Pregnancy and Parenthood in the Navy: Results of the 2012-2013 Survey
2016-05-12
NPRST-TN-16-1, May 2016 (www.nprst.navy.mil). Pregnancy and Parenthood in the Navy: Results of the 2012-2013 Survey. The 2012-2013 Pregnancy and Parenthood Survey was conducted to gather both attitudinal and objective data not readily accessible in existing databases. The Navy-wide biennial Pregnancy and Parenthood Survey has served as the primary source of metrics related...
Comet: an open-source MS/MS sequence database search tool.
Eng, Jimmy K; Jahan, Tahmina A; Hoopmann, Michael R
2013-01-01
Proteomics research routinely involves identifying peptides and proteins via MS/MS sequence database search. Thus the database search engine is an integral tool in many proteomics research groups. Here, we introduce the Comet search engine to the existing landscape of commercial and open-source database search tools. Comet is open source, freely available, and based on one of the original sequence database search tools that has been widely used for many years. © 2012 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.
Dietary Exposure Potential Model
Existing food consumption and contaminant residue databases, typically products of nutrition and regulatory monitoring, contain useful information to characterize dietary intake of environmental chemicals. A PC-based model with resident database system, termed the Die...
75 FR 72873 - Privacy Act Of 1974; System of Records
Federal Register 2010, 2011, 2012, 2013, 2014
2010-11-26
...) is amending two existing systems of records 121VA19, ``National Patient Databases--VA'', and 136VA19E... being amended for additional databases. DATES: Comments on the amendment of these systems of records... system identified as 121VA19, ``National Patient Databases--VA,'' as set forth in the Federal Register...
Rattner, B.A.; Pearson, J.L.; Golden, N.H.; Erwin, R.M.; Ottinger, M.A.
1998-01-01
The Biomonitoring of Environmental Status and Trends (BEST) program of the Department of the Interior focuses on identifying and understanding the effects of contaminant stressors on biological resources under its stewardship. One BEST program activity involves evaluation of retrospective data to assess and predict the condition of biota in Atlantic coast estuaries. A 'Contaminant Exposure and Effects--Terrestrial Vertebrates' (CEE-TV) database has been compiled through computerized literature searches of Fish and Wildlife Reviews, BIOSIS, AGRICOLA, and TOXLINE, review of existing databases (e.g., US EPA Ecological Incident Information System, USGS Diagnostic and Epizootic Databases), and solicitation of unpublished reports from conservation agencies, private groups, and universities. Summary information has been entered into the CEE-TV database, including species, collection date (1965-present), site coordinates, sample matrix, contaminant concentrations, biomarker and bioindicator responses, and reference source, utilizing a 96-field dBase format. Currently, the CEE-TV database contains 3,500 georeferenced records representing >200 vertebrate species and >100,000 individuals residing in estuaries from Maine through Florida. This relational database can be directly queried, imported into the ARC/INFO geographic information system (GIS) to examine spatial tendencies, and used to identify 'hot spots', generate hypotheses, and focus ecotoxicological assessments. An overview of temporal, phylogenetic, and geographic contaminant exposure and effects information, trends, and data gaps will be presented for terrestrial vertebrates residing in estuaries in the northeast United States.
Collision Cross Section (CCS) Database: An Additional Measure to Characterize Steroids.
Hernández-Mesa, Maykel; Le Bizec, Bruno; Monteau, Fabrice; García-Campaña, Ana M; Dervilly-Pinel, Gaud
2018-04-03
Ion mobility spectrometry enhances the performance of liquid chromatography-mass spectrometry workflows intended for steroid profiling by providing a new separation dimension and a novel characterization parameter, the so-called collision cross section (CCS). This work proposes the first CCS database for 300 steroids (i.e., endogenous compounds, including phase I and phase II metabolites, and exogenous synthetic compounds), which involves 1080 ions and covers the CCS of 127 androgens, 84 estrogens, 50 corticosteroids, and 39 progestagens. This large database provides information on all the ionized species identified for each steroid in positive electrospray ionization mode, as well as for estrogens in negative ionization mode. CCS values have been measured using nitrogen as the drift gas in the ion mobility cell. Generally, a direct correlation exists between the mass-to-charge ratio (m/z) and CCS, because the two are related parameters. However, several steroids, mainly steroid glucuronides and steroid esters, have been characterized as more compact or more elongated molecules than expected. In such cases, CCS provides relevant information complementary to retention time and mass spectral data for the identification of steroids. Moreover, several isomeric steroid pairs (e.g., 5β-androstane-3,17-dione and 5α-androstane-3,17-dione) have been separated based on their CCS differences. These results indicate that adding CCS to databases in analytical workflows increases selectivity, thus improving confidence in steroid analysis. Consequences in terms of identification and quantification are discussed. Quality criteria and an interlaboratory reproducibility approach are also reported for the obtained CCS values. The CCS database described here is made publicly available.
Databases for the Global Dynamics of Multiparameter Nonlinear Systems
2014-03-05
AFRL-OSR-VA-TR-2014-0078. Konstantin Mischaikow, Rutgers, The State University of New Jersey, ASB III, Rutgers Plaza, New Brunswick, NJ 08807. ...dynamical systems. We refer to the output as a Database for Global Dynamics, since it allows the user to query for information about the existence and...
Evaluation of Sub Query Performance in SQL Server
NASA Astrophysics Data System (ADS)
Oktavia, Tanty; Sujarwo, Surya
2014-03-01
The paper explores several subquery methods used in a query and their impact on query performance. The study uses an experimental approach to evaluate the performance of each subquery method combined with an indexing strategy. The subquery methods consist of IN, EXISTS, a relational operator, and a relational operator combined with the TOP operator. The experiments show that using a relational operator combined with an indexing strategy in a subquery gives greater performance than the same method without an indexing strategy, and than the other methods. In summary, for applications that emphasize the performance of retrieving data from a database, it is better to use a relational operator combined with an indexing strategy. This study was done on Microsoft SQL Server 2012.
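For reference, the compared subquery shapes look roughly as follows. This Python/SQLite sketch uses an invented two-table schema and is meant only to show the query forms, not to reproduce the SQL Server 2012 measurements:

```python
# A minimal sketch of the IN, EXISTS, and join (relational-operator)
# forms of the same question, with an index on the join column, which
# the study found to be the decisive factor for performance.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders(id INTEGER, customer_id INTEGER);
    CREATE TABLE customers(id INTEGER, name TEXT);
    CREATE INDEX idx_orders_customer ON orders(customer_id);
    INSERT INTO customers VALUES (1, 'Ann'), (2, 'Bo');
    INSERT INTO orders VALUES (10, 1);
""")

q_in     = "SELECT name FROM customers WHERE id IN (SELECT customer_id FROM orders)"
q_exists = """SELECT name FROM customers c
              WHERE EXISTS (SELECT 1 FROM orders o WHERE o.customer_id = c.id)"""
q_join   = """SELECT DISTINCT c.name FROM customers c
              JOIN orders o ON o.customer_id = c.id"""

for q in (q_in, q_exists, q_join):
    print(conn.execute(q).fetchall())  # all three return [('Ann',)]
```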
2013-01-01
Background: Though India has sequenced the water buffalo genome, its draft assembly is based on the cattle genome BTau 4.0; thus, a de novo chromosome-wise assembly remains a major pending issue for the global community. The existing buffalo radiation hybrid panel and the STRs reported here can be used in the final gap plugging and “finishing” expected in a de novo genome assembly. QTL and gene mapping need the mining of putative STRs from the buffalo genome at equal intervals on each and every chromosome. Such markers have a potential role in the improvement of desirable characteristics, such as high milk yields, resistance to diseases and high growth rate. The mining of STRs from the whole genome and the development of a user-friendly database were yet to be done to reap the benefit of the whole-genome sequence. Description: By in silico microsatellite mining of the whole genome, we have developed the first STR database of water buffalo, BuffSatDb (Buffalo MicroSatellite Database, http://cabindb.iasri.res.in/buffsatdb/), a web-based relational database of 910,529 microsatellite markers developed using PHP and the MySQL database. Microsatellite markers have been generated using the MIcroSAtellite (MISA) tool. The database offers a simple and systematic web-based search for customised retrieval of chromosome-wise and genome-wide microsatellites. Searches can be based on chromosome, motif type (mono- to hexanucleotide), repeat motif and repeat kind (simple and composite), and may be customised by limiting the location of the STR on the chromosome as well as the number of markers in that range. This is a novel approach that has not been implemented in any existing marker database. The database has been further appended with Primer3 for primer design for the selected markers, enabling researchers to select markers of choice at desired intervals over the chromosome. The unique add-on of degenerate bases further helps in resolving the presence of degenerate bases in the current buffalo assembly. Conclusion: Being the first buffalo STR database in the world, it would not only pave the way to resolving the current assembly problem but shall also be of immense use to the global community in QTL/gene mapping, critically required to increase knowledge in the endeavour to increase buffalo productivity, especially for third-world countries where the rural economy is significantly dependent on buffalo productivity. PMID:23336431
Sarika; Arora, Vasu; Iquebal, Mir Asif; Rai, Anil; Kumar, Dinesh
2013-01-19
Though India has sequenced the water buffalo genome, its draft assembly is based on the cattle genome BTau 4.0; thus, a de novo chromosome-wise assembly remains a major pending issue for the global community. The existing buffalo radiation hybrid panel and the STRs reported here can be used in the final gap plugging and "finishing" expected in a de novo genome assembly. QTL and gene mapping need the mining of putative STRs from the buffalo genome at equal intervals on each and every chromosome. Such markers have a potential role in the improvement of desirable characteristics, such as high milk yields, resistance to diseases and high growth rate. The mining of STRs from the whole genome and the development of a user-friendly database were yet to be done to reap the benefit of the whole-genome sequence. By in silico microsatellite mining of the whole genome, we have developed the first STR database of water buffalo, BuffSatDb (Buffalo MicroSatellite Database, http://cabindb.iasri.res.in/buffsatdb/), a web-based relational database of 910,529 microsatellite markers developed using PHP and the MySQL database. Microsatellite markers have been generated using the MIcroSAtellite (MISA) tool. The database offers a simple and systematic web-based search for customised retrieval of chromosome-wise and genome-wide microsatellites. Searches can be based on chromosome, motif type (mono- to hexanucleotide), repeat motif and repeat kind (simple and composite), and may be customised by limiting the location of the STR on the chromosome as well as the number of markers in that range. This is a novel approach that has not been implemented in any existing marker database. The database has been further appended with Primer3 for primer design for the selected markers, enabling researchers to select markers of choice at desired intervals over the chromosome. The unique add-on of degenerate bases further helps in resolving the presence of degenerate bases in the current buffalo assembly. Being the first buffalo STR database in the world, it would not only pave the way to resolving the current assembly problem but shall also be of immense use to the global community in QTL/gene mapping, critically required to increase knowledge in the endeavour to increase buffalo productivity, especially for third-world countries where the rural economy is significantly dependent on buffalo productivity.
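The basic in silico STR mining step can be illustrated with a short Python sketch. This uses a backreference regex with a single uniform repeat threshold; tools such as MISA apply motif-length-specific thresholds, so this is only an approximation of the idea, on an invented sequence:

```python
# A minimal sketch of microsatellite (STR) mining: find runs of a
# repeated 1-6 bp motif occurring at least 5 times in a row.
import re

STR_PATTERN = re.compile(r"(([ACGT]{1,6}?)\2{4,})")  # motif repeated >= 5x

sequence = "TTACACACACACACGGAGATAGATAGATAGATAGATAGGC"
for run, motif in STR_PATTERN.findall(sequence):
    print(f"motif={motif} repeats={len(run)//len(motif)} locus={run}")
# Finds an (AC)6 dinucleotide run and an (AGAT)5 tetranucleotide run.
```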
Filling Terrorism Gaps: VEOs, Evaluating Databases, and Applying Risk Terrain Modeling to Terrorism
DOE Office of Scientific and Technical Information (OSTI.GOV)
Hagan, Ross F.
2016-08-29
This paper aims to address three issues: the lack of literature differentiating terrorism and violent extremist organizations (VEOs), terrorism incident databases, and the applicability of Risk Terrain Modeling (RTM) to terrorism. Current open source literature and publicly available government sources do not differentiate between terrorism and VEOs; furthermore, they fail to define them. Addressing the lack of a comprehensive comparison of existing terrorism data sources, a matrix comparing a dozen terrorism databases is constructed, providing insight toward the array of data available. RTM, a method for spatial risk analysis at a micro level, has some applicability to terrorism research, particularly for studies looking at risk indicators of terrorism. Leveraging attack data from multiple databases, combined with RTM, offers one avenue for closing existing research gaps in terrorism literature.
Adaptation of Decoy Fusion Strategy for Existing Multi-Stage Search Workflows
NASA Astrophysics Data System (ADS)
Ivanov, Mark V.; Levitsky, Lev I.; Gorshkov, Mikhail V.
2016-09-01
A number of proteomic database search engines implement multi-stage strategies aiming at increasing the sensitivity of proteome analysis. These approaches often employ a subset of the original database for the secondary stage of analysis. However, if the target-decoy approach (TDA) is used for false discovery rate (FDR) estimation, multi-stage strategies may violate the underlying assumption of TDA that false matches are distributed uniformly across the target and decoy databases. This violation occurs if the numbers of target and decoy proteins selected for the second search are not equal. Here, we propose a method of decoy database generation based on the previously reported decoy fusion strategy. This method allows unbiased TDA-based FDR estimation in multi-stage searches and can be easily integrated into existing workflows utilizing popular search engines and post-search algorithms.
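A minimal Python sketch of the TDA estimate whose uniformity assumption is at stake; the peptide-spectrum match (PSM) scores below are invented:

```python
# A minimal sketch of target-decoy FDR estimation: with equal-size target
# and decoy databases, the decoy-hit count above a score threshold
# estimates the number of false target hits above that threshold.
psms = [  # (score, is_decoy)
    (92, False), (88, False), (85, True), (80, False),
    (77, False), (70, True), (65, False),
]

def fdr_at(threshold, psms):
    targets = sum(1 for s, d in psms if s >= threshold and not d)
    decoys = sum(1 for s, d in psms if s >= threshold and d)
    return decoys / max(targets, 1)  # relies on uniform false matches

print(fdr_at(80, psms))  # 1 decoy / 3 targets ~ 0.33
# A second-stage search over an unequal target/decoy subset breaks the
# uniformity assumption; decoy fusion is designed to avoid exactly that.
```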
Large-scale annotation of small-molecule libraries using public databases.
Zhou, Yingyao; Zhou, Bin; Chen, Kaisheng; Yan, S Frank; King, Frederick J; Jiang, Shumei; Winzeler, Elizabeth A
2007-01-01
While many large publicly accessible databases provide excellent annotation for biological macromolecules, the same is not true for small chemical compounds. Commercial data sources also fail to provide an annotation interface for large numbers of compounds and tend to be too cost-prohibitive to be widely available to biomedical researchers. Therefore, using annotation information for the selection of lead compounds from a modern-day high-throughput screening (HTS) campaign presently occurs only on a very limited scale. The recent rapid expansion of the NIH PubChem database provides an opportunity to link existing biological databases with compound catalogs and provides relevant information that could potentially improve the information garnered from large-scale screening efforts. Using the 2.5 million compound collection at the Genomics Institute of the Novartis Research Foundation (GNF) as a model, we determined that approximately 4% of the library contained compounds with potential annotation in databases such as PubChem and the World Drug Index (WDI) as well as related databases such as the Kyoto Encyclopedia of Genes and Genomes (KEGG) and ChemIDplus. Furthermore, exact structure match analysis showed that 32% of GNF compounds can be linked to third-party databases via PubChem. We also showed that annotations such as MeSH (medical subject headings) terms can be applied to in-house HTS databases to identify signature biological inhibition profiles of interest as well as to expedite the assay validation process. The automated annotation of thousands of screening hits in batch is becoming feasible and has the potential to play an essential role in the hit-to-lead decision-making process.
Rattner, B.A.; Pearson, J.L.; Golden, N.H.; Cohen, J.B.; Erwin, R.M.; Ottinger, M.A.
2000-01-01
In order to examine the condition of biota in Atlantic coast estuaries, a 'Contaminant Exposure and Effects--Terrestrial Vertebrates' (CEE-TV) database has been compiled through computerized searches of the published literature, review of existing databases, and solicitation of unpublished reports from conservation agencies, private groups, and universities. Summary information has been entered into the database, including species, collection date (1965-present), site coordinates, estuary name, hydrologic unit catalogue code, sample matrix, contaminant concentrations, biomarker and bioindicator responses, and reference source, utilizing a 98-field character and numeric format. Currently, the CEE-TV database contains 3,699 georeferenced records representing 190 vertebrate species and >145,000 individuals residing in estuaries from Maine through Florida. This relational database can be directly queried, imported into a Geographic Information System to examine spatial patterns, identify data gaps and areas of concern, generate hypotheses, and focus ecotoxicological field assessments. Information on birds made up the vast majority (83%) of the database, with only a modicum of data on amphibians. Of the >75,000 chemical compounds in commerce, only 118 commonly measured environmental contaminants were quantified in tissues of terrestrial vertebrates. There were no CEE-TV data records in 15 of the 67 estuaries located along the Atlantic coast and Florida Gulf coast. The CEE-TV database has a number of potential applications, including focusing biomonitoring efforts to generate critically needed ecotoxicological data in the numerous 'gaps' along the coast, reducing uncertainty about contaminant risk, identifying areas for mitigation, restoration or special management, and ranking the ecological conditions of estuaries.
Lee, Langho; Wang, Kai; Li, Gang; Xie, Zhi; Wang, Yuli; Xu, Jiangchun; Sun, Shaoxian; Pocalyko, David; Bhak, Jong; Kim, Chulhong; Lee, Kee-Ho; Jang, Ye Jin; Yeom, Young Il; Yoo, Hyang-Sook; Hwang, Seungwoo
2011-11-30
Hepatocellular carcinoma (HCC) is the fifth most common cancer worldwide. A number of molecular profiling studies have investigated the changes in gene and protein expression that are associated with various clinicopathological characteristics of HCC and generated a wealth of scattered information, usually in the form of gene signature tables. A database of the published HCC gene signatures would be useful to liver cancer researchers seeking to retrieve existing differential expression information on a candidate gene and to make comparisons between signatures for prioritization of common genes. A challenge in constructing such a database is that a direct import of the signatures as they appeared in articles would lead to a loss or ambiguity of their context information that is essential for a correct biological interpretation of a gene's expression change. This challenge arises because the designation of compared sample groups is most often abbreviated, ad hoc, or even missing from published signature tables. Without manual curation, the context information becomes lost, leading to uninformative database contents. Although several databases of gene signatures are available, none of them contains an informative form of the signatures or shows comprehensive coverage of liver cancer. Thus we constructed Liverome, a curated database of liver cancer-related gene signatures with self-contained context information. Liverome's data coverage is more than three times larger than any other signature database, consisting of 143 signatures taken from 98 HCC studies, mostly microarray and proteome, and involving 6,927 genes. The signatures were post-processed into an informative and uniform representation and annotated with an itemized summary so that all context information is unambiguously self-contained within the database. The signatures were further informatively named and meaningfully organized according to ten functional categories for guided browsing. Its web interface enables a straightforward retrieval of known differential expression information on a query gene and a comparison of signatures to prioritize common genes. The utility of Liverome-collected data is shown by case studies in which useful biological insights on HCC are produced. Liverome database provides a comprehensive collection of well-curated HCC gene signatures and straightforward interfaces for gene search and signature comparison as well. Liverome is available at http://liverome.kobic.re.kr.
Virus taxonomy: the database of the International Committee on Taxonomy of Viruses (ICTV)
Dempsey, Donald M; Hendrickson, Robert Curtis; Orton, Richard J; Siddell, Stuart G; Smith, Donald B
2018-01-01
The International Committee on Taxonomy of Viruses (ICTV) is charged with the task of developing, refining, and maintaining a universal virus taxonomy. This task encompasses the classification of virus species and higher-level taxa according to the genetic and biological properties of their members; naming virus taxa; maintaining a database detailing the currently approved taxonomy; and providing the database, supporting proposals, and other virus-related information from an open-access, public web site. The ICTV web site (http://ictv.global) provides access to the current taxonomy database in online and downloadable formats, and maintains a complete history of virus taxa back to the first release in 1971. The ICTV has also published the ICTV Report on Virus Taxonomy starting in 1971. This Report provides a comprehensive description of all virus taxa covering virus structure, genome structure, biology and phylogenetics. The ninth ICTV report, published in 2012, is available as an open-access online publication from the ICTV web site. The current, 10th report (http://ictv.global/report/), is being published online, and is replacing the previous hard-copy edition with a completely open access, continuously updated publication. No other database or resource exists that provides such a comprehensive, fully annotated compendium of information on virus taxa and taxonomy. PMID:29040670
CCDB: a curated database of genes involved in cervix cancer.
Agarwal, Subhash M; Raghav, Dhwani; Singh, Harinder; Raghava, G P S
2011-01-01
The Cervical Cancer gene DataBase (CCDB, http://crdd.osdd.net/raghava/ccdb) is a manually curated catalog of experimentally validated genes that are thought, or known, to be involved in the different stages of cervical carcinogenesis. In spite of the large population of women presently affected by this malignancy, no database has existed until now that catalogs information on genes associated with cervical cancer. Therefore, we have compiled 537 genes in CCDB that are linked with cervical cancer causation processes such as methylation, gene amplification, mutation, polymorphism and change in expression level, as evident from the published literature. Each record contains gene details such as architecture (exon-intron structure), location, function, sequences (mRNA/CDS/protein), ontology, interacting partners, homology to other eukaryotic genomes, structure and links to other public databases, thus augmenting CCDB with external data. Also, manually curated literature references have been provided to support the inclusion of each gene in the database and establish its association with cervix cancer. In addition, CCDB provides information on microRNAs altered in cervical cancer as well as a search facility for querying, several browse options and an online tool for sequence similarity search, thereby providing researchers with easy access to the latest information on genes involved in cervix cancer.
Yoo, Danny; Xu, Iris; Berardini, Tanya Z; Rhee, Seung Yon; Narayanasamy, Vijay; Twigger, Simon
2006-03-01
For most systems in biology, a large body of literature exists that describes the complexity of the system based on experimental results. Manual review of this literature to extract targeted information into biological databases is difficult and time consuming. To address this problem, we developed PubSearch and PubFetch, which store literature, keyword, and gene information in a relational database, index the literature with keywords and gene names, and provide a Web user interface for annotating the genes from experimental data found in the associated literature. A set of protocols is provided in this unit for installing, populating, running, and using PubSearch and PubFetch. In addition, we provide support protocols for performing controlled vocabulary annotations. Intended users of PubSearch and PubFetch are database curators and biology researchers interested in tracking the literature and capturing information about genes of interest in a more effective way than with conventional spreadsheets and lab notebooks.
NASA Astrophysics Data System (ADS)
Stewart, Brent K.; Langer, Steven G.; Martin, Kelly P.
1999-07-01
The purpose of this paper is to integrate multiple DICOM image webservers into the existing enterprise-wide web-browsable electronic medical record. Over the last six years the University of Washington has created a clinical data repository (MIND) that combines, in a distributed relational database, information from multiple departmental databases. A character cell-based view of this data, called the Mini Medical Record (MMR), has been available for four years. MINDscape, unlike the text-based MMR, provides a platform-independent, dynamic, web browser view of the MIND database that can be easily linked with medical knowledge resources on the network, like PubMed and the Federated Drug Reference. There are over 10,000 MINDscape user accounts at the University of Washington Academic Medical Centers. The weekday average number of hits to MINDscape is 35,302 and the weekday average number of individual users is 1,252. DICOM images from multiple webservers are now being viewed through the MINDscape electronic medical record.
The eNanoMapper database for nanomaterial safety information
Chomenidis, Charalampos; Doganis, Philip; Fadeel, Bengt; Grafström, Roland; Hardy, Barry; Hastings, Janna; Hegi, Markus; Jeliazkov, Vedrin; Kochev, Nikolay; Kohonen, Pekka; Munteanu, Cristian R; Sarimveis, Haralambos; Smeets, Bart; Sopasakis, Pantelis; Tsiliki, Georgia; Vorgrimmler, David; Willighagen, Egon
2015-01-01
Background: The NanoSafety Cluster, a cluster of projects funded by the European Commission, identified the need for a computational infrastructure for toxicological data management of engineered nanomaterials (ENMs). Ontologies, open standards, and interoperable designs were envisioned to empower a harmonized approach to European research in nanotechnology. This setting provides a number of opportunities and challenges in the representation of nanomaterials data and the integration of ENM information originating from diverse systems. Within this cluster, eNanoMapper works towards supporting the collaborative safety assessment for ENMs by creating a modular and extensible infrastructure for data sharing, data analysis, and building computational toxicology models for ENMs. Results: The eNanoMapper database solution builds on the previous experience of the consortium partners in supporting diverse data through flexible data storage, open source components and web services. We have recently described the design of the eNanoMapper prototype database along with a summary of challenges in the representation of ENM data and an extensive review of existing nano-related data models, databases, and nanomaterials-related entries in chemical and toxicogenomic databases. This paper continues with a focus on the database functionality exposed through its application programming interface (API), and its use in visualisation and modelling. Considering the preferred community practice of using spreadsheet templates, we developed a configurable spreadsheet parser facilitating user friendly data preparation and data upload. We further present a web application able to retrieve the experimental data via the API and analyze it with multiple data preprocessing and machine learning algorithms. Conclusion: We demonstrate how the eNanoMapper database is used to import and publish online ENM and assay data from several data sources, how the “representational state transfer” (REST) API enables building user friendly interfaces and graphical summaries of the data, and how these resources facilitate the modelling of reproducible quantitative structure–activity relationships for nanomaterials (NanoQSAR). PMID:26425413
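As an illustration of consuming such a REST API from a script, here is a minimal Python sketch; the base URL, endpoint path, parameters and response fields are assumptions chosen for illustration, not the documented eNanoMapper routes:

```python
# A minimal sketch of querying a REST API for substance records,
# assuming a hypothetical deployment and endpoint layout.
import requests

BASE = "https://example-enanomapper.org/api"  # hypothetical base URL

resp = requests.get(
    f"{BASE}/substance",
    params={"search": "TiO2", "page": 0, "pagesize": 10},  # assumed params
    headers={"Accept": "application/json"},
    timeout=30,
)
resp.raise_for_status()
for substance in resp.json().get("substance", []):  # assumed response shape
    print(substance.get("name"))
```

The value of a stable REST layer, as the abstract notes, is that the same endpoints can back both interactive visualisations and downstream modelling pipelines.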
NASA Astrophysics Data System (ADS)
Endres, Christian P.; Schlemmer, Stephan; Schilke, Peter; Stutzki, Jürgen; Müller, Holger S. P.
2016-09-01
The Cologne Database for Molecular Spectroscopy, CDMS, was founded in 1998 to provide in its catalog section line lists of mostly molecular species which are or may be observed in various astronomical sources (usually) by radio astronomical means. The line lists contain transition frequencies with qualified accuracies, intensities, quantum numbers, as well as further auxiliary information. They have been generated from critically evaluated experimental line lists, mostly from laboratory experiments, employing established Hamiltonian models. Separate entries exist for different isotopic species and usually also for different vibrational states. As of December 2015, the number of entries is 792. They are available online as ascii tables with additional files documenting information on the entries. The Virtual Atomic and Molecular Data Centre, VAMDC, was founded more than 5 years ago as a common platform for atomic and molecular data. This platform facilitates exchange not only between spectroscopic databases related to astrophysics or astrochemistry, but also with collisional and kinetic databases. A dedicated infrastructure was developed to provide a common data format in the various databases, enabling queries to a large variety of databases on atomic and molecular data at once. For CDMS, the incorporation in VAMDC was combined with several modifications to the generation of CDMS catalog entries. Here we introduce the related changes to the data structure and the data content in the CDMS. The new data scheme allows us to incorporate all previous data entries but in addition also allows us to include entries based on new theoretical descriptions. Moreover, the CDMS entries have been transferred into a MySQL database format. These developments within the VAMDC framework have in part been driven by the needs of the astronomical community to be able to deal efficiently with large data sets obtained with the Herschel Space Telescope or, more recently, with the Atacama Large Millimeter Array.
Zhang, Bofei; Hu, Senyang; Baskin, Elizabeth; Patt, Andrew; Siddiqui, Jalal K.; Mathé, Ewy A.
2018-01-01
The value of metabolomics in translational research is undeniable, and metabolomics data are increasingly generated in large cohorts. The functional interpretation of disease-associated metabolites though is difficult, and the biological mechanisms that underlie cell type or disease-specific metabolomics profiles are oftentimes unknown. To help fully exploit metabolomics data and to aid in its interpretation, analysis of metabolomics data with other complementary omics data, including transcriptomics, is helpful. To facilitate such analyses at a pathway level, we have developed RaMP (Relational database of Metabolomics Pathways), which combines biological pathways from the Kyoto Encyclopedia of Genes and Genomes (KEGG), Reactome, WikiPathways, and the Human Metabolome DataBase (HMDB). To the best of our knowledge, an off-the-shelf, public database that maps genes and metabolites to biochemical/disease pathways and can readily be integrated into other existing software is currently lacking. For consistent and comprehensive analysis, RaMP enables batch and complex queries (e.g., list all metabolites involved in glycolysis and lung cancer), can readily be integrated into pathway analysis tools, and supports pathway overrepresentation analysis given a list of genes and/or metabolites of interest. For usability, we have developed a RaMP R package (https://github.com/Mathelab/RaMP-DB), including a user-friendly RShiny web application, that supports basic simple and batch queries, pathway overrepresentation analysis given a list of genes or metabolites of interest, and network visualization of gene-metabolite relationships. The package also includes the raw database file (mysql dump), thereby providing a stand-alone downloadable framework for public use and integration with other tools. In addition, the Python code needed to recreate the database on another system is also publicly available (https://github.com/Mathelab/RaMP-BackEnd). Updates for databases in RaMP will be checked multiple times a year and RaMP will be updated accordingly. PMID:29470400
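RaMP's overrepresentation analysis boils down to a per-pathway contingency test. A minimal sketch of that computation, using Fisher's exact test on toy counts, not RaMP's actual implementation:

```python
# Per-pathway overrepresentation as a 2x2 Fisher's exact test (toy counts).
from scipy.stats import fisher_exact

def overrepresentation(hits_in_pathway, pathway_size, hits_total, universe):
    # rows: in/out of the pathway; columns: in/out of the hit list
    table = [[hits_in_pathway, pathway_size - hits_in_pathway],
             [hits_total - hits_in_pathway,
              universe - pathway_size - (hits_total - hits_in_pathway)]]
    odds, p = fisher_exact(table, alternative="greater")
    return p

p = overrepresentation(hits_in_pathway=8, pathway_size=40,
                       hits_total=50, universe=5000)
print(f"p = {p:.3g}")
```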
Knowledge-rich temporal relation identification and classification in clinical notes
D’Souza, Jennifer; Ng, Vincent
2014-01-01
Motivation: We examine the task of temporal relation classification for the clinical domain. Our approach to this task departs from existing ones in that it is (i) ‘knowledge-rich’, employing sophisticated knowledge derived from discourse relations as well as both domain-independent and domain-dependent semantic relations, and (ii) ‘hybrid’, combining the strengths of rule-based and learning-based approaches. Evaluation results on the i2b2 Clinical Temporal Relations Challenge corpus show that our approach yields a 17–24% and 8–14% relative reduction in error over a state-of-the-art learning-based baseline system when gold-standard and automatically identified temporal relations are used, respectively. Database URL: http://www.hlt.utdallas.edu/~jld082000/temporal-relations/ PMID:25414383
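A toy sketch of the 'hybrid' idea described above: high-precision rules fire first, and a learned model serves as fallback. The cue list and the stubbed fallback are illustrative only, not the paper's actual system.

```python
# Toy hybrid classifier: rules first; otherwise fall back to a learned model.
RULES = {"prior to": "BEFORE", "before": "BEFORE",
         "after": "AFTER", "during": "OVERLAP"}

def rule_label(context):
    for cue, label in RULES.items():
        if cue in context.lower():
            return label
    return None

def classify(context, fallback=None):
    label = rule_label(context)
    if label is not None:
        return label, "rule"
    if fallback is not None:          # e.g. a trained statistical model
        return fallback(context), "learned"
    return "OVERLAP", "default"

print(classify("Chest pain resolved prior to admission."))
```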
The Nursing Diagnosis Disturbed Thought Processes: An Integrative Review.
Escalada-Hermández, Paula; Marín-Fernández, Blanca
2017-09-08
To analyze and synthesize the existing scientific literature in relation to the nursing diagnosis disturbed thought processes (DTPs) (00130). An integrative review was developed, identifying relevant papers through a search of international and Spanish databases and the examination of key manuals. Theoretical papers propose modifications for the nursing diagnosis DTPs. Most of the research papers offer data about its frequency in different clinical settings. There exists an interest in the nursing diagnosis DTPs. However, the available evidence is not very extensive and further work is necessary in order to refine this nursing diagnosis. The re-inclusion of DTPs in the NANDA-I classification will especially contribute to increasing its utility in mental healthcare. © 2017 NANDA International, Inc.
Development of a geotechnical information database.
DOT National Transportation Integrated Search
2009-06-01
The purpose of this project was to create a database for existing, current, and future geotechnical records and data. The project originated from the Geotechnical Design Section at the Louisiana Department of Transportation and Development (LADOT...
To provide global guidance on the establishment and maintenance of LCA databases, as the basis for improved dataset exchangeability and interlinkages of databases worldwide. Increase the credibility of existing LCA data, the generation of more data and their overall accessibilit...
MaizeGDB update: New tools, data, and interface for the maize model organism database
USDA-ARS?s Scientific Manuscript database
MaizeGDB is a highly curated, community-oriented database and informatics service to researchers focused on the crop plant and model organism Zea mays ssp. mays. Although some form of the maize community database has existed over the last 25 years, there have only been two major releases. In 1991, ...
WikiPEATia - a web based platform for assembling peatland data through ‘crowd sourcing’
NASA Astrophysics Data System (ADS)
Wisser, D.; Glidden, S.; Fieseher, C.; Treat, C. C.; Routhier, M.; Frolking, S. E.
2009-12-01
The Earth System Science community is realizing that peatlands are an important and unique terrestrial ecosystem that has not yet been well-integrated into large-scale earth system analyses. A major hurdle is the lack of accessible, geospatial data of peatland distribution, coupled with data on peatland properties (e.g., vegetation composition, peat depth, basal dates, soil chemistry, peatland class) at the global scale. Such data, however, are available at the local scale. Although a comprehensive global database on peatlands probably lags similar data on more economically important ecosystems such as forests, grasslands and croplands, a large amount of field data has been collected over the past several decades. A few efforts have been made to map peatlands at large scales, but existing data either have not been assembled into a single geospatial database that is publicly accessible or do not depict data with the level of detail that is needed in the Earth System Science community. A global peatland database would contribute to advances in a number of research fields such as hydrology, vegetation and ecosystem modeling, permafrost modeling, and earth system modeling. We present a Web 2.0 approach that uses state-of-the-art webserver and innovative online mapping technologies and is designed to create such a global database through ‘crowd-sourcing’. Primary functions of the online system include form-driven textual user input of peatland research metadata, spatial data input of peatland areas via a mapping interface, database editing and querying capabilities, as well as advanced visualization and data analysis tools. WikiPEATia provides an integrated information technology platform for assembling, integrating, and posting peatland-related geospatial datasets, and facilitates and encourages research community involvement. A successful effort will make existing peatland data much more useful to the research community, and will help to identify significant data gaps.
DOE Office of Scientific and Technical Information (OSTI.GOV)
McPherson, Brian J.; Pan, Feng
2014-09-24
This report summarizes development of a coupled-process reservoir model for simulating enhanced geothermal systems (EGS) that utilize supercritical carbon dioxide as a working fluid. Specifically, the project team developed an advanced chemical kinetic model for evaluating important processes in EGS reservoirs, such as mineral precipitation and dissolution at elevated temperature and pressure, and for evaluating potential impacts on EGS surface facilities by related chemical processes. We assembled a new database for better-calibrated simulation of water/brine/rock/CO2 interactions in EGS reservoirs. This database utilizes existing kinetic and other chemical data, and we updated those data to reflect corrections for elevated temperature and pressure conditions of EGS reservoirs.
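When adapting room-temperature kinetic data to reservoir conditions, a standard move is an Arrhenius temperature correction. A hedged sketch of that calculation; the rate constant and activation energy below are placeholders, not values from the database described above.

```python
# Arrhenius extrapolation of a 25 C rate constant to reservoir temperature.
import math

R = 8.314  # gas constant, J/(mol K)

def rate_at_T(k25, Ea, T_kelvin):
    """Extrapolate a 25 C rate constant k25 to T via the Arrhenius relation."""
    return k25 * math.exp(-(Ea / R) * (1.0 / T_kelvin - 1.0 / 298.15))

# e.g. a dissolution rate with Ea = 60 kJ/mol, evaluated at 200 C (473.15 K)
print(rate_at_T(k25=1e-10, Ea=60e3, T_kelvin=473.15))
```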
Tabak, Ying P.; Johannes, Richard S.; Sun, Xiaowu; Crosby, Cynthia T.
2016-01-01
The Centers for Medicare and Medicaid Services (CMS) Hospital Compare central line-associated bloodstream infection (CLABSI) data and private databases containing new-generation intravenous needleless connector (study NC) use at the hospital level were linked. The relative risk (RR) of CLABSI associated with the study NCs was estimated, adjusting for hospital characteristics. Among 3074 eligible hospitals in the 2013 CMS database, 758 (25%) hospitals used the study NCs. The study NC hospitals had a lower unadjusted CLABSI rate (1.03 vs 1.13 CLABSIs per 1000 central line days, P < .0001) compared with comparator hospitals. The adjusted RR for CLABSI was 0.94 (95% confidence interval: 0.86, 1.02; P = .11). PMID:27598072
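The unadjusted comparison above is a simple rate ratio. A sketch of that computation with a log-scale confidence interval; the event counts are invented to reproduce the quoted rates and are not the study's data.

```python
# Rate ratio of two Poisson rates with a 95% log-normal confidence interval.
import math

def rate_ratio(events_a, days_a, events_b, days_b, z=1.96):
    rr = (events_a / days_a) / (events_b / days_b)
    se = math.sqrt(1 / events_a + 1 / events_b)  # SE of log(rate ratio)
    lo, hi = rr * math.exp(-z * se), rr * math.exp(z * se)
    return rr, lo, hi

# toy counts giving 1.03 vs 1.13 CLABSIs per 1000 central line days
rr, lo, hi = rate_ratio(events_a=515, days_a=500_000,
                        events_b=565, days_b=500_000)
print(f"RR = {rr:.2f} (95% CI {lo:.2f}, {hi:.2f})")
```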
PathNER: a tool for systematic identification of biological pathway mentions in the literature
2013-01-01
Background Biological pathways are central to many biomedical studies and are frequently discussed in the literature. Several curated databases have been established to collate the knowledge of molecular processes constituting pathways. Yet, there has been little focus on enabling systematic detection of pathway mentions in the literature. Results We developed a tool, named PathNER (Pathway Named Entity Recognition), for the systematic identification of pathway mentions in the literature. PathNER is based on soft dictionary matching and rules, with the dictionary generated from public pathway databases. The rules utilise general pathway-specific keywords, syntactic information and gene/protein mentions. Detection results from both components are merged. On a gold-standard corpus, PathNER achieved an F1-score of 84%. To illustrate its potential, we applied PathNER on a collection of articles related to Alzheimer's disease to identify associated pathways, highlighting cases that can complement an existing manually curated knowledgebase. Conclusions In contrast to existing text-mining efforts that target the automatic reconstruction of pathway details from molecular interactions mentioned in the literature, PathNER focuses on identifying specific named pathway mentions. These mentions can be used to support large-scale curation and pathway-related systems biology applications, as demonstrated in the example of Alzheimer's disease. PathNER is implemented in Java and made freely available online at http://sourceforge.net/projects/pathner/. PMID:24555844
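A minimal sketch of 'soft' dictionary matching: candidate phrases are scored against a pathway-name dictionary with a similarity threshold rather than exact string equality. The dictionary, scorer and threshold are toy stand-ins, not PathNER's implementation.

```python
# Soft dictionary matching with a similarity threshold (toy dictionary).
from difflib import SequenceMatcher

DICTIONARY = {"mapk signaling pathway", "wnt signaling pathway",
              "apoptosis", "notch signaling pathway"}

def soft_match(phrase, threshold=0.85):
    phrase = phrase.lower()
    best = max(DICTIONARY,
               key=lambda name: SequenceMatcher(None, phrase, name).ratio())
    score = SequenceMatcher(None, phrase, best).ratio()
    return (best, score) if score >= threshold else None

print(soft_match("MAPK signalling pathway"))  # tolerates spelling variants
```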
Chatonnet, A; Hotelier, T; Cousin, X
1999-05-14
Cholinesterases are targets for organophosphorus compounds which are used as insecticides, chemical warfare agents and drugs for the treatment of disease such as glaucoma, or parasitic infections. The widespread use of these chemicals explains the growing of this area of research and the ever increasing number of sequences, structures, or biochemical data available. Future advances will depend upon effective management of existing information as well as upon creation of new knowledge. The ESTHER database goal is to facilitate retrieval and comparison of data about structure and function of proteins presenting the alpha/beta hydrolase fold. Protein engineering and in vitro production of enzymes allow direct comparison of biochemical parameters. Kinetic parameters of enzymatic reactions are now included in the database. These parameters can be searched and compared with a table construction tool. ESTHER can be reached through internet (http://www.ensam.inra.fr/cholinesterase). The full database or the specialised X-window Client-server system can be downloaded from our ftp server (ftp://ftp.toulouse.inra.fr./pub/esther). Forms can be used to send updates or corrections directly from the web.
DSSTOX WEBSITE LAUNCH: IMPROVING PUBLIC ACCESS ...
DSSTox Website Launch: Improving Public Access to Databases for Building Structure-Toxicity Prediction Models
Ann M. Richard
US Environmental Protection Agency, Research Triangle Park, NC, USA
Distributed: Decentralized set of standardized, field-delimited databases, each separately authored and maintained, that are able to accommodate diverse toxicity data content;
Structure-Searchable: Standard format (SDF) structure-data files that can be readily imported into available chemical relational databases and structure-searched;
Tox: Toxicity data as it exists in widely disparate forms in current public databases, spanning diverse toxicity endpoints, test systems, levels of biological content, degrees of summarization, and information content.
INTRODUCTION
The economic and social pressures to reduce the need for animal testing and to better anticipate the potential for human and eco-toxicity of environmental, industrial, or pharmaceutical chemicals are as pressing today as at any time prior. However, the goal of predicting chemical toxicity in its many manifestations, the 'T' in 'ADMET' (absorption, distribution, metabolism, elimination, toxicity), remains one of the most difficult and largely unmet challenges in a chemical screening paradigm [1]. It is widely acknowledged that the single greatest hurdle to improving structure-activity relationship (SAR) toxicity prediction capabilities, in both the pharmaceutical and environmental regulation arenas, is the lack of suffici
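Because the files are standard SDF, they can be loaded with common cheminformatics toolkits. A sketch using RDKit (our choice of toolkit, not one named above); the file name is a placeholder.

```python
# Read an SDF structure-data file and print each record's name and SMILES.
from rdkit import Chem

supplier = Chem.SDMolSupplier("dsstox_subset.sdf")  # placeholder file name
for mol in supplier:
    if mol is None:              # skip unparsable records
        continue
    name = mol.GetProp("_Name")  # molecule title from the SDF record
    smiles = Chem.MolToSmiles(mol)
    print(name, smiles)
```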
Telemedicine information analysis center.
Zajtchuk, Joan T; Zajtchuk, Russ; Petrovic, Joseph J; Gutz, Ryan P; Walrath, Benjamin D
2004-01-01
Congress mandated a pilot project to demonstrate the feasibility of establishing a Department of Defense (DoD) telemedicine information analysis center (TIAC). The project developed a medical information support system to show the core capabilities of a TIAC. The productivity and effectiveness of telemedicine researchers and clinical practitioners can be enhanced by the existence of an information analysis center (IAC) devoted to the collection, analysis, synthesis, and dissemination of worldwide scientific and technical information related to the field of telemedicine. The work conducted under the TIAC pilot project establishes the basic IAC functions and assesses the utility of the TIAC to the military medical departments. The pilot project capabilities are Web-based and include: (1) applying the science of classification (taxonomy) to telemedicine to identify key words; (2) creating a relational database linking this taxonomy to a bibliographic database using these key words; (3) developing and disseminating information via a public TIAC Web site; (4) performing a specific baseline technical area task for the U.S. Army Medical Command; and (5) providing analyses by subject matter experts.
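Point (2) above amounts to a small relational schema joining a taxonomy of key words to bibliographic records. A toy sqlite sketch of that structure; the schema and rows are invented for illustration.

```python
# Toy keyword-to-bibliography relational schema with a join query.
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE keyword (id INTEGER PRIMARY KEY, term TEXT UNIQUE);
CREATE TABLE article (id INTEGER PRIMARY KEY, title TEXT, year INTEGER);
CREATE TABLE article_keyword (article_id INTEGER REFERENCES article(id),
                              keyword_id INTEGER REFERENCES keyword(id));
""")
con.execute("INSERT INTO keyword VALUES (1, 'teleradiology')")
con.execute("INSERT INTO article VALUES (1, 'Store-and-forward imaging', 2003)")
con.execute("INSERT INTO article_keyword VALUES (1, 1)")

rows = con.execute("""
    SELECT a.title, a.year FROM article a
    JOIN article_keyword ak ON ak.article_id = a.id
    JOIN keyword k ON k.id = ak.keyword_id
    WHERE k.term = 'teleradiology'""").fetchall()
print(rows)
```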
Piersol, Catherine Verrier; Canton, Kerry; Connor, Susan E; Giller, Ilana; Lipman, Stacy; Sager, Suzanne
The goal of the evidence review was to evaluate the effectiveness of interventions for caregivers of people with Alzheimer's disease and related major neurocognitive disorders that facilitate the ability to maintain participation in the caregiver role. Scientific literature published in English between January 2006 and April 2014 was reviewed. Databases included MEDLINE, PsycINFO, CINAHL, OTseeker, and the Cochrane Database of Systematic Reviews. Of 2,476 records screened, 43 studies met inclusion criteria. Strong evidence shows that multicomponent psychoeducational interventions improve caregiver quality of life (QOL), confidence, and self-efficacy and reduce burden; cognitive reframing reduces caregiver anxiety, depression, and stress; communication skills training improves caregiver skill and QOL in persons with dementia; mindfulness-based training improves caregiver mental health and reduces stress and burden; and professionally led support groups enhance caregiver QOL. Strong evidence exists for a spectrum of caregiver interventions. Translation of effective interventions into practice and evaluation of sustainability is necessary. Copyright © 2017 by the American Occupational Therapy Association, Inc.
Integration of air traffic databases : a case study
DOT National Transportation Integrated Search
1995-03-01
This report describes a case study to show the benefits from maximum utilization of existing air traffic databases. The study demonstrates the utility of integrating available data through developing and demonstrating a methodology addressing the iss...
Virtual Atomic and Molecular Data Center (VAMDC) and Stark-B Database
NASA Astrophysics Data System (ADS)
Dimitrijevic, M. S.; Sahal-Brechot, S.; Kovacevic, A.; Jevremovic, D.; Popovic, L. C.; VAMDC Consortium; Dubernet, Marie-Lise
2012-01-01
The Virtual Atomic and Molecular Data Center (VAMDC) is a European FP7 project which aims to build a flexible and interoperable e-science environment providing an interface to existing atomic and molecular data. The VAMDC will be built upon the expertise of existing atomic and molecular databases, data producers and service providers, with the specific aim of creating an infrastructure that is easily tuned to the requirements of a wide variety of users in academic, governmental, industrial or public communities. VAMDC will also include the STARK-B database, which contains Stark broadening parameters for a large number of lines, obtained by the semiclassical perturbation method during more than 30 years of collaboration between two of the authors of this work (MSD and SSB) and their co-workers. In this contribution we review the VAMDC project and the STARK-B database, and discuss the benefits of both for the corresponding data users.
The making of a pan-European organ transplant registry.
Smits, Jacqueline M; Niesing, Jan; Breidenbach, Thomas; Collett, Dave
2013-03-01
A European patient registry to track the outcomes of organ transplant recipients does not exist. As knowledge gleaned from large registries has already led to the creation of standards of care that gained widespread support from patients and healthcare providers, the European Union initiated a project that would enable the creation of a European Registry linking currently existing national databases. This report contains a description of all functional, technical, and legal prerequisites, which upon fulfillment should allow for the seamless sharing of national longitudinal data across temporal, geographical, and subspecialty boundaries. To create a platform that can effortlessly link multiple databases and maintain the integrity of the existing national databases, crucial elements were described during the project. These elements are: (i) use of a common dictionary, (ii) use of a common database and refined data uploading technology, (iii) use of standard methodology to allow uniform protocol driven and meaningful long-term follow-up analyses, (iv) use of a quality assurance mechanism to guarantee completeness and accuracy of the data collected, and (v) establishment of a solid legal framework that allows for safe data exchange. © 2012 The Authors Transplant International © 2012 European Society for Organ Transplantation. Published by Blackwell Publishing Ltd.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Rupcich, Franco; Badal, Andreu; Kyprianou, Iacovos
Purpose: The purpose of this study was to develop a database for estimating organ dose in a voxelized patient model for coronary angiography and brain perfusion CT acquisitions with any spectra and angular tube current modulation setting. The database enables organ dose estimation for existing and novel acquisition techniques without requiring Monte Carlo simulations. Methods: The study simulated transport of monoenergetic photons between 5 and 150 keV for 1000 projections over 360° through anthropomorphic voxelized female chest and head (0° and 30° tilt) phantoms and standard head and body CTDI dosimetry cylinders. The simulations resulted in tables of normalized dose deposition for several radiosensitive organs quantifying the organ dose per emitted photon for each incident photon energy and projection angle for coronary angiography and brain perfusion acquisitions. The values in a table can be multiplied by an incident spectrum and number of photons at each projection angle and then summed across all energies and angles to estimate total organ dose. Scanner-specific organ dose may be approximated by normalizing the database-estimated organ dose by the database-estimated CTDIvol and multiplying by a physical CTDIvol measurement. Two examples are provided demonstrating how to use the tables to estimate relative organ dose. In the first, the change in breast and lung dose during coronary angiography CT scans is calculated for reduced kVp, angular tube current modulation, and partial angle scanning protocols relative to a reference protocol. In the second example, the change in dose to the eye lens is calculated for a brain perfusion CT acquisition in which the gantry is tilted 30° relative to a nontilted scan. Results: Our database provides tables of normalized dose deposition for several radiosensitive organs irradiated during coronary angiography and brain perfusion CT scans. Validation results indicate total organ doses calculated using our database are within 1% of those calculated using Monte Carlo simulations with the same geometry and scan parameters for all organs except red bone marrow (within 6%), and within 23% of published estimates for different voxelized phantoms. Results from the example of using the database to estimate organ dose for coronary angiography CT acquisitions show 2.1%, 1.1%, and -32% change in breast dose and 2.1%, -0.74%, and 4.7% change in lung dose for reduced kVp, tube current modulated, and partial angle protocols, respectively, relative to the reference protocol. Results show -19.2% difference in dose to eye lens for a tilted scan relative to a nontilted scan. The reported relative changes in organ doses are presented without quantification of image quality and are for the sole purpose of demonstrating the use of the proposed database. Conclusions: The proposed database and calculation method enable the estimation of organ dose for coronary angiography and brain perfusion CT scans utilizing any spectral shape and angular tube current modulation scheme by taking advantage of the precalculated Monte Carlo simulation results. The database can be used in conjunction with image quality studies to develop optimized acquisition techniques and may be particularly beneficial for optimizing dual kVp acquisitions for which numerous kV, mA, and filtration combinations may be investigated.
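The dose-estimation recipe described above is a weighted sum over the precalculated table: multiply the energy-by-angle dose table by the spectrum and the photons per projection, sum over both axes, then rescale by a measured CTDIvol. A sketch with random placeholder arrays standing in for the database values:

```python
# Organ dose as a weighted sum over a (energy x angle) normalized dose table.
import numpy as np

rng = np.random.default_rng(0)
n_energies, n_angles = 146, 1000            # 5-150 keV bins, 1000 projections
dose_table = rng.random((n_energies, n_angles)) * 1e-17  # Gy per photon
spectrum = rng.random(n_energies)           # relative photons per energy bin
spectrum /= spectrum.sum()
photons_per_angle = np.full(n_angles, 1e6)  # tube-current modulation goes here

organ_dose = np.einsum("ea,e,a->", dose_table, spectrum, photons_per_angle)

# scanner-specific scaling: divide by database CTDIvol, multiply by measured
ctdi_database, ctdi_measured = 1.0, 0.93    # placeholder values
print(organ_dose / ctdi_database * ctdi_measured)
```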
A first proposal for a general description model of forensic traces
NASA Astrophysics Data System (ADS)
Lindauer, Ina; Schäler, Martin; Vielhauer, Claus; Saake, Gunter; Hildebrandt, Mario
2012-06-01
In recent years, the amount of digitally captured traces at crime scenes increased rapidly. There are various kinds of such traces, like pick marks on locks, latent fingerprints on various surfaces as well as different micro traces. Those traces differ from each other not only in kind but also in the information they provide. Every kind of trace has its own properties (e.g., minutiae for fingerprints, or raking traces for locks), but there are also large amounts of metadata which all traces have in common, like location, time and other additional information in relation to crime scenes. For selected types of crime scene traces, type-specific databases already exist, such as the ViCLAS for sexual offences, the IBIS for ballistic forensics or the AFIS for fingerprints. These existing forensic databases strongly differ in their trace description models. For forensic experts it would be beneficial to work with only one database capable of handling all possible forensic traces acquired at a crime scene. This is especially the case when different kinds of traces are interrelated (e.g., fingerprints and ballistic marks on a bullet casing). Unfortunately, current research on interrelated traces as well as general forensic data models and structures is not mature enough to build such an encompassing forensic database. Nevertheless, recent advances in the field of contact-less scanning make it possible to acquire different kinds of traces with the same device. The data of these traces is therefore structured similarly, which simplifies the design of a general forensic data model for different kinds of traces. In this paper we introduce a first common description model for different forensic trace types. Furthermore, for selected trace types we apply the phases of the well-established database schema development process, transferring expert knowledge in the corresponding forensic fields into an extendible, database-driven, generalised forensic description model. The trace types considered here are fingerprint traces, traces at locks, micro traces and ballistic traces. Based on these basic trace types, combined traces (multiple or overlapping fingerprints, fingerprints on bullet casings, etc.) and partial traces are also considered.
FARME DB: a functional antibiotic resistance element database
Wallace, James C.; Port, Jesse A.; Smith, Marissa N.; Faustman, Elaine M.
2017-01-01
Antibiotic resistance (AR) is a major global public health threat but few resources exist that catalog AR genes outside of a clinical context. Current AR sequence databases are assembled almost exclusively from genomic sequences derived from clinical bacterial isolates and thus do not include many microbial sequences derived from environmental samples that confer resistance in functional metagenomic studies. These environmental metagenomic sequences often show little or no similarity to AR sequences from clinical isolates using standard classification criteria. In addition, existing AR databases provide no information about flanking sequences containing regulatory or mobile genetic elements. To help address this issue, we created an annotated database of DNA and protein sequences derived exclusively from environmental metagenomic sequences showing AR in laboratory experiments. Our Functional Antibiotic Resistant Metagenomic Element (FARME) database is a compilation of publicly available DNA sequences and predicted protein sequences conferring AR as well as regulatory elements, mobile genetic elements and predicted proteins flanking antibiotic resistant genes. FARME is the first database to focus on functional metagenomic AR gene elements and provides a resource to better understand AR in the 99% of bacteria which cannot be cultured and the relationship between environmental AR sequences and antibiotic resistant genes derived from cultured isolates. Database URL: http://staff.washington.edu/jwallace/farme PMID:28077567
Geer, Lewis Y.; Marchler-Bauer, Aron; Geer, Renata C.; Han, Lianyi; He, Jane; He, Siqian; Liu, Chunlei; Shi, Wenyao; Bryant, Stephen H.
2010-01-01
The NCBI BioSystems database, found at http://www.ncbi.nlm.nih.gov/biosystems/, centralizes and cross-links existing biological systems databases, increasing their utility and target audience by integrating their pathways and systems into NCBI resources. This integration allows users of NCBI’s Entrez databases to quickly categorize proteins, genes and small molecules by metabolic pathway, disease state or other BioSystem type, without requiring time-consuming inference of biological relationships from the literature or multiple experimental datasets. PMID:19854944
A Communication Framework for Collaborative Defense
2009-02-28
We have been able to provide sufficient automation to be able to build up the most extensive application signature database in the world with a fraction of... ...techniques that are well understood in the context of databases. These techniques allow users to quickly scan for the existence of a key in a database.
Sánchez-de-Madariaga, Ricardo; Muñoz, Adolfo; Lozano-Rubí, Raimundo; Serrano-Balazote, Pablo; Castro, Antonio L; Moreno, Oscar; Pascual, Mario
2017-08-18
The objective of this research is to compare relational and non-relational (NoSQL) database systems approaches in order to store, recover, query and persist standardized medical information in the form of ISO/EN 13606 normalized Electronic Health Record XML extracts, both in isolation and concurrently. NoSQL database systems have recently attracted much attention, but few studies in the literature address their direct comparison with relational databases when applied to build the persistence layer of a standardized medical information system. One relational and two NoSQL databases (one document-based and one native XML database) of three different sizes have been created in order to evaluate and compare the response times (algorithmic complexity) of six queries of growing complexity, which have been performed on them. Similar appropriate results available in the literature have also been considered. Relational and non-relational NoSQL database systems show almost linear algorithmic complexity in query execution. However, they show very different linear slopes, the former being much steeper than the latter two. Document-based NoSQL databases perform better in concurrency than in isolation, and also better than relational databases in concurrency. Non-relational NoSQL databases seem to be more appropriate than standard relational SQL databases when database size is extremely high (secondary use, research applications). Document-based NoSQL databases perform in general better than native XML NoSQL databases. EHR extract visualization and edition are also document-based tasks more appropriate to NoSQL database systems. However, the appropriate database solution much depends on each particular situation and specific problem.
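For flavor, the same lookup in both paradigms is sketched below; the EHR-extract fields are invented, and the document-store half assumes a MongoDB server running on localhost.

```python
# Same lookup in a relational store and a document store (toy data).
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE extract (id TEXT PRIMARY KEY, patient TEXT, xml TEXT)")
con.execute("INSERT INTO extract VALUES ('e1', 'p42', '<extract>...</extract>')")
print(con.execute("SELECT xml FROM extract WHERE patient = 'p42'").fetchall())

# Document-oriented equivalent (requires: pip install pymongo, local mongod)
from pymongo import MongoClient
db = MongoClient("localhost", 27017).ehr
db.extracts.insert_one({"_id": "e1", "patient": "p42",
                        "extract": {"sections": []}})
print(db.extracts.find_one({"patient": "p42"}))
```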
Enhancing Geoscience Research Discovery Through the Semantic Web
NASA Astrophysics Data System (ADS)
Rowan, Linda R.; Gross, M. Benjamin; Mayernik, Matthew; Khan, Huda; Boler, Frances; Maull, Keith; Stott, Don; Williams, Steve; Corson-Rikert, Jon; Johns, Erica M.; Daniels, Michael; Krafft, Dean B.; Meertens, Charles
2016-04-01
UNAVCO, UCAR, and Cornell University are working together to leverage semantic web technologies to enable discovery of people, datasets, publications and other research products, as well as the connections between them. The EarthCollab project, a U.S. National Science Foundation EarthCube Building Block, is enhancing an existing open-source semantic web application, VIVO, to improve connectivity across distributed networks of researchers and resources related to the following two geoscience-based communities: (1) the Bering Sea Project, an interdisciplinary field program whose data archive is hosted by NCAR's Earth Observing Laboratory (EOL), and (2) UNAVCO, a geodetic facility and consortium that supports diverse research projects informed by geodesy. People, publications, datasets and grant information have been mapped to an extended version of the VIVO-ISF ontology and ingested into VIVO's database. Much of the VIVO ontology was built for the life sciences, so we have added some components of existing geoscience-based ontologies and a few terms from a local ontology that we created. The UNAVCO VIVO instance, connect.unavco.org, utilizes persistent identifiers whenever possible; for example using ORCIDs for people, publication DOIs, data DOIs and unique NSF grant numbers. Data is ingested using a custom set of scripts that include the ability to perform basic automated and curated disambiguation. VIVO can display a page for every object ingested, including connections to other objects in the VIVO database. A dataset page, for example, includes the dataset type, time interval, DOI, related publications, and authors. The dataset type field provides a connection to all other datasets of the same type. The author's page shows, among other information, related datasets and co-authors. Information previously spread across several unconnected databases is now stored in a single location. In addition to VIVO's default display, the new database can be queried using SPARQL, a query language for semantic data. EarthCollab is extending the VIVO web application. One such extension is the ability to cross-link separate VIVO instances across institutions, allowing local display of externally curated information. For example, Cornell's VIVO faculty pages will display UNAVCO's dataset information and UNAVCO's VIVO will display Cornell faculty member contact and position information. About half of UNAVCO's membership is international and we hope to connect our data to institutions in other countries with a similar approach. Additional extensions, including enhanced geospatial capabilities, will be developed based on task-centered usability testing.
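Once the data are in a VIVO triple store, cross-cutting questions become SPARQL queries. A sketch with SPARQLWrapper; the endpoint URL and the property names are assumptions patterned on VIVO-ISF, not the documented connect.unavco.org interface.

```python
# List dataset labels from a (hypothetical) VIVO SPARQL endpoint.
from SPARQLWrapper import SPARQLWrapper, JSON

sparql = SPARQLWrapper("https://connect.unavco.org/vivo/api/sparqlQuery")
sparql.setQuery("""
PREFIX vivo: <http://vivoweb.org/ontology/core#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
SELECT ?dataset ?label WHERE {
  ?dataset a vivo:Dataset ;
           rdfs:label ?label .
} LIMIT 10
""")
sparql.setReturnFormat(JSON)
for row in sparql.query().convert()["results"]["bindings"]:
    print(row["label"]["value"])
```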
Marshall, Roger J; Zhang, Zhongqian; Broad, Joanna B; Wells, Sue
2007-06-01
To assess agreement between ethnicity as recorded by two independent databases in New Zealand, PREDICT and the National Health Index (NHI), and to assess sensitivity of ethnic-specific measures of health outcomes to either ethnicity record. Patients assessed using PREDICT form the study cohort. Ethnicity was recorded for PREDICT and an associated NHI ethnicity code was identified by merge-match linking on an encrypted NHI number. Agreement between ethnicity measures was assessed by kappa scores and scaled rectangle diagrams. A cohort of 18,239 individuals was linked in both PREDICT and NHI databases. The agreement between ethnicity classifications was reasonably good, with an overall kappa coefficient of 0.82. There was better agreement for women than men, and agreement improved with age and with time since the PREDICT system became operational. Ethnic-specific cardiovascular (CVD) hospital admission rates were sensitive to ethnicity coding by NHI or PREDICT; rate ratios for ethnic groups, relative to European, based on PREDICT were attenuated towards the null relative to the NHI classification. Agreement between the two ethnicity records was moderately good. Discordances that do exist do not have a substantial effect on prevalence-based measures of effect; however, they do affect measures of CVD admission. Different categorisations of ethnicity data from routine (and other) databases can lead to different ethnic-specific estimates of epidemiological effects. There is an imperative to record ethnicity in a rational, systematic and consistent way.
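The agreement statistic used above is Cohen's kappa. A sketch of the computation on toy label pairs, not the study's data:

```python
# Cohen's kappa between two ethnicity codings of the same patients (toy data).
from sklearn.metrics import cohen_kappa_score

predict_codes = ["European", "Maori", "Pacific", "European", "Maori"]
nhi_codes     = ["European", "Maori", "European", "European", "Maori"]
print(cohen_kappa_score(predict_codes, nhi_codes))
```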
L.N. Hudson; T. Newbold; S. Contu
2014-01-01
Biodiversity continues to decline in the face of increasing anthropogenic pressures such as habitat destruction, exploitation, pollution and introduction of alien species. Existing global databases of species' threat status or population time series are dominated by charismatic species. The collation of datasets with broad taxonomic and biogeographic extents, and that...
Solving Relational Database Problems with ORDBMS in an Advanced Database Course
ERIC Educational Resources Information Center
Wang, Ming
2011-01-01
This paper introduces how to use the object-relational database management system (ORDBMS) to solve relational database (RDB) problems in an advanced database course. The purpose of the paper is to provide a guideline for database instructors who desire to incorporate the ORDB technology in their traditional database courses. The paper presents…
HCE Research Coordination Directorate (ReCoorD Database)
2016-04-27
Portfolio management is often hidden within broader mission scopes, and visibility into those portfolios is often limited at best. Current specialty-specific tracking databases do not exist. Current broad-sweeping portfolio management tools do not exist (not true--define terms?). The HCE receives requests from a variety of oversight bodies for reports on the current state of project-through-portfolio efforts. Tools such as NIH's Reporter, while still in development, do not yet appear to meet HCE element requirements.
Fault displacement hazard assessment for nuclear installations based on IAEA safety standards
NASA Astrophysics Data System (ADS)
Fukushima, Y.
2016-12-01
In the IAEA Safety Requirements NS-R-3, surface fault displacement hazard assessment (FDHA) is required for the siting of nuclear installations. If any capable faults exist at a candidate site, IAEA recommends the consideration of alternative sites. However, owing to progress in palaeoseismological investigations, capable faults may be found at existing sites. In such cases, IAEA recommends evaluating safety using probabilistic FDHA (PFDHA), an empirical approach based on a still quite limited database. A basic and crucial improvement is therefore to increase the database. In 2015, IAEA produced TecDoc-1767 on palaeoseismology as a reference for the identification of capable faults. Another IAEA Safety Report (No. 85), on ground motion simulation based on fault rupture modelling, provides an annex introducing recent PFDHAs and fault displacement simulation methodologies. The IAEA expanded the FDHA project to cover both the probabilistic approach and physics-based fault rupture modelling. The first approach needs a refinement of the empirical methods by building a worldwide database, and the second needs to shift from a kinematic to a dynamic scheme. The two approaches can complement each other, since simulated displacements can fill the gaps of a sparse database and geological observations can be used to calibrate the simulations. The IAEA already supported a workshop in October 2015 to discuss the existing databases with the aim of creating a common worldwide database; consensus on a unified database was reached. The next milestone is to fill the database with as many fault rupture data sets as possible. Another IAEA working group held a workshop in November 2015 to discuss state-of-the-art PFDHA as well as simulation methodologies. The two groups joined a consultancy meeting in February 2016, shared information, identified issues, discussed goals and outputs, and scheduled future meetings. We may now aim at coordinating activities for the whole set of FDHA tasks jointly.
Dunbrack, Roland L.
2012-01-01
Motivation: Automating the assignment of existing domain and protein family classifications to new sets of sequences is an important task. Current methods often miss assignments because remote relationships fail to achieve statistical significance. Some assignments are not as long as the actual domain definitions because local alignment methods often cut alignments short. Long insertions in query sequences often erroneously result in two copies of the domain assigned to the query. Divergent repeat sequences in proteins are often missed. Results: We have developed a multilevel procedure to produce nearly complete assignments of protein families of an existing classification system to a large set of sequences. We apply this to the task of assigning Pfam domains to sequences and structures in the Protein Data Bank (PDB). We found that HHsearch alignments frequently scored more remotely related Pfams in Pfam clans higher than closely related Pfams, thus, leading to erroneous assignment at the Pfam family level. A greedy algorithm allowing for partial overlaps was, thus, applied first to sequence/HMM alignments, then HMM–HMM alignments and then structure alignments, taking care to join partial alignments split by large insertions into single-domain assignments. Additional assignment of repeat Pfams with weaker E-values was allowed after stronger assignments of the repeat HMM. Our database of assignments, presented in a database called PDBfam, contains Pfams for 99.4% of chains >50 residues. Availability: The Pfam assignment data in PDBfam are available at http://dunbrack2.fccc.edu/ProtCid/PDBfam, which can be searched by PDB codes and Pfam identifiers. They will be updated regularly. Contact: Roland.Dunbracks@fccc.edu PMID:22942020
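The greedy step can be sketched as score-ordered interval selection with an overlap tolerance; the hit tuples and the tolerance below are toy values, not PDBfam's parameters.

```python
# Greedy domain assignment: accept hits in score order, allowing a new
# assignment only if it overlaps already accepted ones by less than a
# tolerance fraction of its own length.
def greedy_assign(hits, max_overlap_frac=0.2):
    """hits: list of (score, start, end, family); higher score is better."""
    accepted = []
    for score, start, end, family in sorted(hits, reverse=True):
        length = end - start + 1
        overlap = sum(max(0, min(end, e) - max(start, s) + 1)
                      for _, s, e, _ in accepted)
        if overlap <= max_overlap_frac * length:
            accepted.append((score, start, end, family))
    return accepted

hits = [(250.0, 10, 120, "PF00069"), (240.0, 100, 210, "PF07714"),
        (80.0, 15, 110, "PF00071")]
print(greedy_assign(hits))  # the weak, fully overlapped hit is rejected
```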
Proteomics data exchange and storage: the need for common standards and public repositories.
Jiménez, Rafael C; Vizcaíno, Juan Antonio
2013-01-01
Both the existence of data standards and public databases or repositories have been key factors behind the development of the existing "omics" approaches. In this book chapter we first review the main existing mass spectrometry (MS)-based proteomics resources: PRIDE, PeptideAtlas, GPMDB, and Tranche. Second, we report on the current status of the different proteomics data standards developed by the Proteomics Standards Initiative (PSI): the formats mzML, mzIdentML, mzQuantML, TraML, and PSI-MI XML are then reviewed. Finally, we present an easy way to query and access MS proteomics data in the PRIDE database, as a representative of the existing repositories, using the workflow management system (WMS) tool Taverna. Two different publicly available workflows are explained and described.
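Files in the PSI formats can be read with open toolkits. A sketch of iterating an mzML file with pyteomics; the file name is a placeholder.

```python
# Iterate spectra in an mzML file with pyteomics (placeholder file name).
from pyteomics import mzml

with mzml.read("example.mzML") as spectra:
    for spectrum in spectra:
        print(spectrum["id"], len(spectrum["m/z array"]))
        break  # just the first spectrum for this sketch
```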
Towards linked open gene mutations data.
Zappa, Achille; Splendiani, Andrea; Romano, Paolo
2012-03-28
With the advent of high-throughput technologies, a great wealth of variation data is being produced. Such information may constitute the basis for correlation analyses between genotypes and phenotypes and, in the future, for personalized medicine. Several databases on gene variation exist, but this kind of information is still scarce in the Semantic Web framework. In this paper, we discuss issues related to the integration of mutation data in the Linked Open Data infrastructure, part of the Semantic Web framework. We present the development of a mapping from the IARC TP53 Mutation database to RDF and the implementation of servers publishing this data. A version of the IARC TP53 Mutation database implemented in a relational database was used as first test set. Automatic mappings to RDF were first created by using D2RQ and later manually refined by introducing concepts and properties from domain vocabularies and ontologies, as well as links to Linked Open Data implementations of various systems of biomedical interest. Since D2RQ query performances are lower than those that can be achieved by using an RDF archive, generated data was also loaded into a dedicated system based on tools from the Jena software suite. We have implemented a D2RQ Server for TP53 mutation data, providing data on a subset of the IARC database, including gene variations, somatic mutations, and bibliographic references. The server allows to browse the RDF graph by using links both between classes and to external systems. An alternative interface offers improved performances for SPARQL queries. The resulting data can be explored by using any Semantic Web browser or application. This has been the first case of a mutation database exposed as Linked Data. A revised version of our prototype, including further concepts and IARC TP53 Mutation database data sets, is under development. The publication of variation information as Linked Data opens new perspectives: the exploitation of SPARQL searches on mutation data and other biological databases may support data retrieval which is presently not possible. Moreover, reasoning on integrated variation data may support discoveries towards personalized medicine.
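The core of the relational-to-RDF step is emitting triples per row. A hand-rolled sketch with rdflib in place of D2RQ; the URIs and the mini-vocabulary are invented placeholders, not the IARC TP53 schema.

```python
# Map one relational row to RDF triples and serialize as Turtle.
from rdflib import Graph, Literal, Namespace, URIRef
from rdflib.namespace import RDF

EX = Namespace("http://example.org/tp53/")  # placeholder vocabulary
g = Graph()

row = {"mutation_id": "M123", "codon": 175, "effect": "missense"}
subj = URIRef(EX[f"mutation/{row['mutation_id']}"])
g.add((subj, RDF.type, EX.Mutation))
g.add((subj, EX.codon, Literal(row["codon"])))
g.add((subj, EX.effect, Literal(row["effect"])))

print(g.serialize(format="turtle"))
```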
DECADE Web Portal: Integrating MaGa, EarthChem and GVP Will Further Our Knowledge on Earth Degassing
NASA Astrophysics Data System (ADS)
Cardellini, C.; Frigeri, A.; Lehnert, K. A.; Ash, J.; McCormick, B.; Chiodini, G.; Fischer, T. P.; Cottrell, E.
2014-12-01
The release of gases from the Earth's interior to the exosphere takes place in both volcanic and non-volcanic areas of the planet. Fully understanding this complex process requires the integration of geochemical, petrological and volcanological data. At present, major online data repositories relevant to studies of degassing are not linked and interoperable. We are developing interoperability between three of those, which will support more powerful synoptic studies of degassing. The three data systems that will make their data accessible via the DECADE portal are: (1) the Smithsonian Institution's Global Volcanism Program database (GVP) of volcanic activity data, (2) EarthChem databases for geochemical and geochronological data of rocks and melt inclusions, and (3) the MaGa database (Mapping Gas emissions) which contains compositional and flux data of gases released at volcanic and non-volcanic degassing sites. These databases are developed and maintained by institutions or groups of experts in a specific field, and data are archived in formats specific to these databases. In the framework of the Deep Earth Carbon Degassing (DECADE) initiative of the Deep Carbon Observatory (DCO), we are developing a web portal that will create a powerful search engine of these databases from a single entry point. The portal will return comprehensive multi-component datasets, based on the search criteria selected by the user. For example, a single geographic or temporal search will return data relating to compositions of emitted gases and erupted products, the age of the erupted products, and coincident activity at the volcano. The development of this level of capability for the DECADE Portal requires complete synergy between these databases, including availability of standard-based web services (WMS, WFS) at all data systems. Data and metadata can thus be extracted from each system without interfering with each database's local schema or being replicated to achieve integration at the DECADE web portal. The DECADE portal will enable new synoptic perspectives on the Earth degassing process. Other data systems can be easily plugged in using the existing framework. Our vision is to explore Earth degassing related datasets over previously unexplored spatial or temporal ranges.
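Interoperability via standards-based web services means features can be pulled programmatically. A sketch using OWSLib's WFS client; the service URL and layer name are placeholders, not actual MaGa or EarthChem endpoints.

```python
# Fetch degassing-site features from a (hypothetical) WFS endpoint.
from owslib.wfs import WebFeatureService

wfs = WebFeatureService("https://example.org/maga/wfs", version="1.1.0")
response = wfs.getfeature(typename=["maga:gas_emission_sites"],
                          bbox=(12.0, 40.0, 16.0, 43.0),  # lon/lat window
                          outputFormat="application/json")
print(response.read()[:200])  # first bytes of the GeoJSON payload
```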
Term Coverage of Dietary Supplements Ingredients in Product Labels.
Wang, Yefeng; Adam, Terrence J; Zhang, Rui
2016-01-01
As the clinical application and consumption of dietary supplements have grown, their side effects and possible interactions with prescribed medications have become a serious issue. Information extraction of dietary supplement related information is a critical need to support dietary supplement research. However, no terminology for dietary supplements currently exists, posing a barrier to informatics research in this field. The terms related to dietary supplement ingredients should be collected and normalized before a terminology can be established to facilitate convenient searches of safety information and control of possible adverse effects of dietary supplements. In this study, the Dietary Supplement Label Database (DSLD) was chosen as the data source from which the ingredient information was extracted and normalized. The distributions based on the product type and the ingredient type of the dietary supplements were analyzed. The ingredient terms were then mapped to the existing terminologies, including UMLS, RxNorm and NDF-RT, by using MetaMap and RxMix. A large gap between existing terminologies and the ingredients was found: only 14.67%, 19.65%, and 12.88% of ingredient terms were covered by UMLS, RxNorm and NDF-RT, respectively.
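The coverage figures above come down to normalizing ingredient strings and testing membership in each terminology's term set. A toy sketch of that computation; the normalization rule and the term sets are ours, far simpler than MetaMap or RxMix.

```python
# Term coverage: fraction of normalized ingredient strings found in a
# terminology's term set (toy stand-ins for DSLD and UMLS/RxNorm terms).
def normalize(term):
    return " ".join(term.lower().replace("-", " ").split())

ingredients = {"Vitamin C", "vitamin-C", "Echinacea purpurea", "ZMA blend"}
terminology = {normalize(t) for t in ["Vitamin C", "Echinacea purpurea"]}

mapped = {i for i in ingredients if normalize(i) in terminology}
print(f"coverage: {len(mapped)}/{len(ingredients)} "
      f"({100 * len(mapped) / len(ingredients):.1f}%)")
```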
User assumptions about information retrieval systems: Ethical concerns
DOE Office of Scientific and Technical Information (OSTI.GOV)
Froehlich, T.J.
Information professionals, whether designers, intermediaries, database producers or vendors, bear some responsibility for the information that they make available to users of information systems. The users of such systems may tend to make many assumptions about the information that a system provides, such as believing: that the data are comprehensive, current and accurate; that the information resources or databases have the same degree of quality and consistency of indexing; that the abstracts, if they exist, correctly and adequately reflect the content of the article; that there is consistency in forms of author names or journal titles or indexing within and across databases; that there is standardization in and across databases; that once errors are detected, they are corrected; that appropriate choices of databases or information resources are a relatively easy matter, etc. The truth is that few of these assumptions are valid in commercial or corporate or organizational databases. However, given these beliefs and assumptions by many users, often promoted by information providers, information professionals should, if possible, intervene to warn users about the limitations and constraints of the databases they are using. With the growth of the Internet and end-user products (e.g., CD-ROMs), such interventions have significantly declined. In such cases, information should be provided on start-up or through interface screens, indicating to users the constraints and orientation of the system they are using. The principle of "caveat emptor" is naive and socially irresponsible: information professionals and systems have an obligation to provide some framework or context for the information that users are accessing.
Malin, Bradley; Karp, David; Scheuermann, Richard H
2010-01-01
Clinical researchers need to share data to support scientific validation and information reuse and to comply with a host of regulations and directives from funders. Various organizations are constructing informatics resources in the form of centralized databases to ensure reuse of data derived from sponsored research. The widespread use of such open databases is contingent on the protection of patient privacy. We review privacy-related problems associated with data sharing for clinical research from technical and policy perspectives. We investigate existing policies for secondary data sharing and privacy requirements in the context of data derived from research and clinical settings. In particular, we focus on policies specified by the US National Institutes of Health and the Health Insurance Portability and Accountability Act and touch on how these policies are related to current and future use of data stored in public database archives. We address aspects of data privacy and identifiability from a technical, although approachable, perspective and summarize how biomedical databanks can be exploited and seemingly anonymous records can be reidentified using various resources without hacking into secure computer systems. We highlight which clinical and translational data features, specified in emerging research models, are potentially vulnerable or exploitable. In the process, we recount a recent privacy-related concern associated with the publication of aggregate statistics from pooled genome-wide association studies that have had a significant impact on the data sharing policies of National Institutes of Health-sponsored databanks. Based on our analysis and observations we provide a list of recommendations that cover various technical, legal, and policy mechanisms that open clinical databases can adopt to strengthen data privacy protection as they move toward wider deployment and adoption.
Accessing and distributing EMBL data using CORBA (common object request broker architecture).
Wang, L; Rodriguez-Tomé, P; Redaschi, N; McNeil, P; Robinson, A; Lijnzaad, P
2000-01-01
The EMBL Nucleotide Sequence Database is a comprehensive database of DNA and RNA sequences and related information traditionally made available in flat-file format. Queries through tools such as SRS (Sequence Retrieval System) also return data in flat-file format. Flat files have a number of shortcomings, however, and the resources therefore currently lack a flexible environment to meet individual researchers' needs. The Object Management Group's common object request broker architecture (CORBA) is an industry standard that provides platform-independent programming interfaces and models for portable distributed object-oriented computing applications. Its independence from programming languages, computing platforms and network protocols makes it attractive for developing new applications for querying and distributing biological data. A CORBA infrastructure developed by EMBL-EBI provides an efficient means of accessing and distributing EMBL data. The EMBL object model is defined such that it provides a basis for specifying interfaces in interface definition language (IDL) and thus for developing the CORBA servers. The mapping from the object model to the relational schema in the underlying Oracle database uses the facilities provided by PersistenceTM, an object/relational tool. The techniques of developing loaders and 'live object caching' with persistent objects achieve a smart live object cache where objects are created on demand. The objects are managed by an evictor pattern mechanism. The CORBA interfaces to the EMBL database address some of the problems of traditional flat-file formats and provide an efficient means for accessing and distributing EMBL data. CORBA also provides a flexible environment for users to develop their applications by building clients to our CORBA servers, which can be integrated into existing systems.
Accessing and distributing EMBL data using CORBA (common object request broker architecture)
Wang, Lichun; Rodriguez-Tomé, Patricia; Redaschi, Nicole; McNeil, Phil; Robinson, Alan; Lijnzaad, Philip
2000-01-01
Background: The EMBL Nucleotide Sequence Database is a comprehensive database of DNA and RNA sequences and related information traditionally made available in flat-file format. Queries through tools such as SRS (Sequence Retrieval System) also return data in flat-file format. Flat files have a number of shortcomings, however, and the resources therefore currently lack a flexible environment to meet individual researchers' needs. The Object Management Group's common object request broker architecture (CORBA) is an industry standard that provides platform-independent programming interfaces and models for portable distributed object-oriented computing applications. Its independence from programming languages, computing platforms and network protocols makes it attractive for developing new applications for querying and distributing biological data. Results: A CORBA infrastructure developed by EMBL-EBI provides an efficient means of accessing and distributing EMBL data. The EMBL object model is defined such that it provides a basis for specifying interfaces in interface definition language (IDL) and thus for developing the CORBA servers. The mapping from the object model to the relational schema in the underlying Oracle database uses the facilities provided by PersistenceTM, an object/relational tool. The techniques of developing loaders and 'live object caching' with persistent objects achieve a smart live object cache where objects are created on demand. The objects are managed by an evictor pattern mechanism. Conclusions: The CORBA interfaces to the EMBL database address some of the problems of traditional flat-file formats and provide an efficient means for accessing and distributing EMBL data. CORBA also provides a flexible environment for users to develop their applications by building clients to our CORBA servers, which can be integrated into existing systems. PMID:11178259
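The "live object cache" managed by an evictor pattern, described in both versions of this abstract, is a generic design: objects are materialized from the relational store on first access, and the stalest are evicted when the cache fills. A minimal sketch in Python (the EMBL-EBI servers were built on CORBA and Persistence; the loader below is a hypothetical stand-in):

```python
from collections import OrderedDict

class EvictorCache:
    """LRU-style live object cache: create objects on demand, evict the stalest."""

    def __init__(self, loader, capacity=10000):
        self._loader = loader          # callable that builds an object from the DB
        self._capacity = capacity
        self._cache = OrderedDict()    # recency order drives eviction

    def get(self, object_id):
        if object_id in self._cache:
            self._cache.move_to_end(object_id)   # mark as recently used
            return self._cache[object_id]
        obj = self._loader(object_id)            # materialize from the relational store
        self._cache[object_id] = obj
        if len(self._cache) > self._capacity:
            self._cache.popitem(last=False)      # evict the least recently used object
        return obj

# Usage (hypothetical loader): cache = EvictorCache(lambda acc: fetch_entry(acc))
```

The design choice mirrors the abstract's point: clients always receive an object, but only a bounded working set is ever held in memory.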
Malin, Bradley; Karp, David; Scheuermann, Richard H.
2010-01-01
Clinical researchers need to share data to support scientific validation and information reuse, and to comply with a host of regulations and directives from funders. Various organizations are constructing informatics resources in the form of centralized databases to ensure widespread availability of data derived from sponsored research. The widespread use of such open databases is contingent on the protection of patient privacy. In this paper, we review several aspects of the privacy-related problems associated with data sharing for clinical research from technical and policy perspectives. We begin with a review of existing policies for secondary data sharing and privacy requirements in the context of data derived from research and clinical settings. In particular, we focus on policies specified by the U.S. National Institutes of Health and the Health Insurance Portability and Accountability Act and touch upon how these policies are related to current, as well as future, use of data stored in public database archives. Next, we address aspects of data privacy and “identifiability” from a more technical perspective, and review how biomedical databanks can be exploited and seemingly anonymous records can be “re-identified” using various resources without compromising or hacking into secure computer systems. We highlight which data features specified in clinical research data models are potentially vulnerable or exploitable. In the process, we recount a recent privacy-related concern associated with the publication of aggregate statistics from pooled genome-wide association studies that has had a significant impact on the data sharing policies of NIH-sponsored databanks. Finally, we conclude with a list of recommendations that cover various technical, legal, and policy mechanisms that open clinical databases can adopt to strengthen data privacy protections as they move toward wider deployment and adoption. PMID:20051768
BIOZON: a system for unification, management and analysis of heterogeneous biological data.
Birkland, Aaron; Yona, Golan
2006-02-15
Integration of heterogeneous data types is a challenging problem, especially in biology, where the number of databases and data types increases rapidly. Amongst the problems that one has to face are integrity, consistency, redundancy, connectivity, expressiveness and updatability. Here we present a system (Biozon) that addresses these problems, and offers biologists a new knowledge resource to navigate through and explore. Biozon unifies multiple biological databases consisting of a variety of data types (such as DNA sequences, proteins, interactions and cellular pathways). It is fundamentally different from previous efforts as it uses a single extensive and tightly connected graph schema wrapped with a hierarchical ontology of documents and relations. Beyond warehousing existing data, Biozon computes and stores novel derived data, such as similarity relationships and functional predictions. The integration of similarity data allows propagation of knowledge through inference and fuzzy searches. Sophisticated methods of query that span multiple data types were implemented and first-of-a-kind biological ranking systems were explored and integrated. The Biozon system is an extensive knowledge resource of heterogeneous biological data. Currently, it holds more than 100 million biological documents and 6.5 billion relations between them. The database is accessible through an advanced web interface that supports complex queries, "fuzzy" searches, data materialization and more, online at http://biozon.org.
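Biozon's single tightly connected graph schema can be pictured as typed nodes (documents) joined by typed relations, so a query that spans data types becomes a graph traversal. A toy sketch using networkx, with invented identifiers and relation names (Biozon's actual schema and query engine are far richer):

```python
import networkx as nx

g = nx.MultiDiGraph()
g.add_node("P12345", kind="protein")
g.add_node("P99999", kind="protein")
g.add_node("path:glycolysis", kind="pathway")
g.add_edge("P12345", "path:glycolysis", relation="member_of")
g.add_edge("P12345", "P99999", relation="similar_to", score=0.92)

# Cross-data-type query: proteins similar to members of a given pathway.
members = [u for u, v, d in g.in_edges("path:glycolysis", data=True)
           if d["relation"] == "member_of"]
similar = [v for m in members
           for _, v, d in g.out_edges(m, data=True)
           if d["relation"] == "similar_to"]
print(similar)   # -> ['P99999']
```

Storing derived "similar_to" edges alongside curated ones is what enables the propagation of knowledge through inference that the abstract describes.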
Yu, Kebing; Salomon, Arthur R.
2010-01-01
Recently, dramatic progress has been achieved in expanding the sensitivity, resolution, mass accuracy, and scan rate of mass spectrometers able to fragment and identify peptides through tandem mass spectrometry (MS/MS). Unfortunately, this enhanced ability to acquire proteomic data has not been accompanied by a concomitant increase in the availability of flexible tools allowing users to rapidly assimilate, explore, and analyze this data and adapt to a variety of experimental workflows with minimal user intervention. Here we fill this critical gap by providing a flexible relational database called PeptideDepot for organization of expansive proteomic data sets, collation of proteomic data with available protein information resources, and visual comparison of multiple quantitative proteomic experiments. Our software design, built upon the synergistic combination of a MySQL database for safe warehousing of proteomic data with a FileMaker-driven graphical user interface for flexible adaptation to diverse workflows, enables proteomic end-users to directly tailor the presentation of proteomic data to the unique analysis requirements of the individual proteomics lab. PeptideDepot may be deployed as an independent software tool or integrated directly with our High Throughput Autonomous Proteomic Pipeline (HTAPP) used in the automated acquisition and post-acquisition analysis of proteomic data. PMID:19834895
Cruella: developing a scalable tissue microarray data management system.
Cowan, James D; Rimm, David L; Tuck, David P
2006-06-01
Compared with DNA microarray technology, relatively little information is available concerning the special requirements, design influences, and implementation strategies of data systems for tissue microarray technology. These issues include the requirement to accommodate new and different data elements for each new project as well as the need to interact with pre-existing models for clinical, biological, and specimen-related data. To design and implement a flexible, scalable tissue microarray data storage and management system that could accommodate information regarding different disease types, different clinical investigators, and different clinical investigation questions, all of which could potentially contribute unforeseen data types requiring dynamic integration with existing data. The unpredictability of the data elements combined with the novelty of automated analysis algorithms and controlled vocabulary standards in this area require flexible designs and practical decisions. Our design includes a custom Java-based persistence layer to mediate and facilitate interaction with an object-relational database model and a novel database schema. User interaction is provided through a Java Servlet-based Web interface. Cruella has become an indispensable resource and is used by dozens of researchers every day. The system stores millions of experimental values covering more than 300 biological markers and more than 30 disease types. The experimental data are merged with clinical data that have been aggregated from multiple sources and are available to the researchers for management, analysis, and export. Cruella addresses many of the special considerations for managing tissue microarray experimental data and the associated clinical information. A metadata-driven approach provides a practical solution to many of the unique issues inherent in tissue microarray research, and allows relatively straightforward interoperability with and accommodation of new data models.
[Effects of soil data and map scale on assessment of total phosphorus storage in upland soils].
Li, Heng Rong; Zhang, Li Ming; Li, Xiao di; Yu, Dong Sheng; Shi, Xue Zheng; Xing, Shi He; Chen, Han Yue
2016-06-01
Accurate assessment of total phosphorus storage in farmland soils is of great significance to sustainable agriculture and non-point source pollution control. However, previous studies have not considered the estimation errors arising from mapping scales and from databases built on different sources of soil profile data. In this study, a total of 393×10⁴ hm² of upland in the 29 counties (or cities) of North Jiangsu was taken as a case study. We analyzed how the four sources of soil profile data, namely "Soils of County", "Soils of Prefecture", "Soils of Province" and "Soils of China", and the six scales, i.e. 1:50000, 1:250000, 1:500000, 1:1000000, 1:4000000 and 1:10000000, used in the 24 soil databases established from the four sources, affected the assessment of soil total phosphorus. Compared with the most detailed 1:50000 soil database established with 983 upland soil profiles, the relative deviation of the estimates of soil total phosphorus density (STPD) and soil total phosphorus storage (STPS) from the other soil databases varied from 4.8% to 48.9% and from 1.6% to 48.4%, respectively. The estimates of STPD and STPS based on the 1:50000 database of "Soils of County" differed from most of the estimates based on the databases of each scale in "Soils of County" and "Soils of Prefecture", at significance levels of P<0.001 or P<0.05. Extremely significant differences (P<0.001) existed between the estimates based on the 1:50000 database of "Soils of County" and the estimates based on the databases of each scale in "Soils of Province" and "Soils of China". This study demonstrates the importance of appropriate soil data sources and appropriate mapping scales in estimating STPS.
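The relative deviations reported above follow from comparing each coarser-scale estimate against the 1:50000 reference. The arithmetic, with hypothetical values for illustration:

```python
def relative_deviation(estimate, reference):
    """Relative deviation (%) of a coarse-scale estimate from the reference estimate."""
    return abs(estimate - reference) / reference * 100

# Hypothetical STPS values; only the 4.8%-48.9% (STPD) and 1.6%-48.4% (STPS)
# deviation ranges are reported in the study itself.
print(relative_deviation(estimate=2.9, reference=3.1))  # ~6.5%
```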
The use of a medico economic database as a part of French apheresis registry.
Kanouni, T; Aubas, P; Heshmati, F
2017-02-01
An apheresis registry is a part of each learned apheresis society. Its interest is obvious, in terms of knowledge of the practice of apheresis, adverse events, and technical issues. However, because of the burden of data entry, such a registry can never be exhaustive and some data will be missing. While continuing our registry efforts and our efforts to match with other existing registries, we decided to extend the data collection to a medico-economic database available in France, the Programme de Médicalisation du Système d'Information (PMSI), which has covered reimbursement information for each public or private hospital since 2007. It contains almost all apheresis procedures in all apheresis fields, demographic patient data, and primary and related diagnoses, among other data. Although this database does not include technical apheresis issues or other complications of the procedures, it is of great interest and is complementary to the registry. From 2003-2014, we recorded 250,585 apheresis procedures for 48,428 patients. We showed that the data are reliable and exhaustive. The information gives an accurate picture of real-life apheresis practice, regarding indications and the rhythm and duration of apheresis treatment. This prospective data collection is sustainable and allows us to assess the impact of healthcare guidelines. Our objective is to extend the data collection and match it to other existing databases; this will allow us to conduct, for example, a cohort study specifically for ECP. Copyright © 2016 Elsevier Ltd. All rights reserved.
Identifying work-related motor vehicle crashes in multiple databases.
Thomas, Andrea M; Thygerson, Steven M; Merrill, Ray M; Cook, Lawrence J
2012-01-01
To compare and estimate the magnitude of work-related motor vehicle crashes in Utah using 2 probabilistically linked statewide databases. Data from 2006 and 2007 motor vehicle crash and hospital databases were joined through probabilistic linkage. Summary statistics and capture-recapture were used to describe occupants injured in work-related motor vehicle crashes and estimate the size of this population. There were 1597 occupants in the motor vehicle crash database and 1673 patients in the hospital database identified as being in a work-related motor vehicle crash. We identified 1443 occupants with at least one record from either the motor vehicle crash or hospital database indicating work-relatedness that linked to any record in the opposing database. We found that 38.7 percent of occupants injured in work-related motor vehicle crashes identified in the motor vehicle crash database did not have a primary payer code of workers' compensation in the hospital database, and 40.0 percent of patients injured in work-related motor vehicle crashes identified in the hospital database did not meet our definition of a work-related motor vehicle crash in the motor vehicle crash database. Depending on how occupants injured in work-related motor vehicle crashes are identified, we estimate the population to be between 1852 and 8492 in Utah for the years 2006 and 2007. Research on single databases may lead to biased interpretations of work-related motor vehicle crashes. Combining 2 population-based databases may still result in an underestimate of the magnitude of work-related motor vehicle crashes. Improved coding of work-related incidents is needed in current databases.
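The abstract does not name the capture-recapture estimator used, but the classic two-source Lincoln-Petersen estimator reproduces the reported lower bound when all 1443 linked occupants are treated as matches:

```python
def lincoln_petersen(n1, n2, m):
    """Two-source capture-recapture population estimate: N = n1 * n2 / m."""
    return n1 * n2 / m

# Counts reported in the abstract: 1597 crash-database occupants,
# 1673 hospital patients, 1443 linked records.
print(round(lincoln_petersen(1597, 1673, 1443)))  # -> 1852, the study's lower bound
```

The upper bound of 8492 presumably follows from a stricter definition of a match (fewer overlap records m), which inflates the estimate; the exact definitions are not given in the abstract.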
Performance assessment of EMR systems based on post-relational database.
Yu, Hai-Yan; Li, Jing-Song; Zhang, Xiao-Guang; Tian, Yu; Suzuki, Muneou; Araki, Kenji
2012-08-01
Post-relational databases provide high performance and are currently widely used in American hospitals. As few hospital information systems (HIS) in either China or Japan are based on post-relational databases, here we introduce a new-generation electronic medical records (EMR) system called Hygeia, which was developed with the post-relational database Caché and the latest platform Ensemble. Utilizing the benefits of a post-relational database, Hygeia is equipped with an "integration" feature that allows all the system users to access data, with a fast response time, anywhere and at any time. Performance tests of databases in EMR systems were implemented in both China and Japan. First, a comparison test was conducted between a post-relational database, Caché, and a relational database, Oracle, embedded in the EMR systems of a medium-sized first-class hospital in China. Second, a user terminal test was done on the EMR system Izanami, which is based on the identical database Caché and operates efficiently at the Miyazaki University Hospital in Japan. The results proved that the post-relational database Caché works faster than the relational database Oracle and showed perfect performance in the real-time EMR system.
Privacy-preserving heterogeneous health data sharing.
Mohammed, Noman; Jiang, Xiaoqian; Chen, Rui; Fung, Benjamin C M; Ohno-Machado, Lucila
2013-05-01
Privacy-preserving data publishing addresses the problem of disclosing sensitive data when mining for useful information. Among existing privacy models, ε-differential privacy provides one of the strongest privacy guarantees and makes no assumptions about an adversary's background knowledge. All existing solutions that ensure ε-differential privacy handle the problem of disclosing relational and set-valued data in a privacy-preserving manner separately. In this paper, we propose an algorithm that considers both relational and set-valued data in differentially private disclosure of healthcare data. The proposed approach makes a simple yet fundamental switch in differentially private algorithm design: instead of listing all possible records (ie, a contingency table) for noise addition, records are generalized before noise addition. The algorithm first generalizes the raw data in a probabilistic way, and then adds noise to guarantee ε-differential privacy. We showed that the disclosed data could be used effectively to build a decision tree induction classifier. Experimental results demonstrated that the proposed algorithm is scalable and performs better than existing solutions for classification analysis. The resulting utility may degrade when the output domain size is very large, making it potentially inappropriate to generate synthetic data for large health databases. Unlike existing techniques, the proposed algorithm allows the disclosure of health data containing both relational and set-valued data in a differentially private manner, and can retain essential information for discriminative analysis.
Privacy-preserving heterogeneous health data sharing
Mohammed, Noman; Jiang, Xiaoqian; Chen, Rui; Fung, Benjamin C M; Ohno-Machado, Lucila
2013-01-01
Objective Privacy-preserving data publishing addresses the problem of disclosing sensitive data when mining for useful information. Among existing privacy models, ε-differential privacy provides one of the strongest privacy guarantees and makes no assumptions about an adversary's background knowledge. All existing solutions that ensure ε-differential privacy handle the problem of disclosing relational and set-valued data in a privacy-preserving manner separately. In this paper, we propose an algorithm that considers both relational and set-valued data in differentially private disclosure of healthcare data. Methods The proposed approach makes a simple yet fundamental switch in differentially private algorithm design: instead of listing all possible records (ie, a contingency table) for noise addition, records are generalized before noise addition. The algorithm first generalizes the raw data in a probabilistic way, and then adds noise to guarantee ε-differential privacy. Results We showed that the disclosed data could be used effectively to build a decision tree induction classifier. Experimental results demonstrated that the proposed algorithm is scalable and performs better than existing solutions for classification analysis. Limitation The resulting utility may degrade when the output domain size is very large, making it potentially inappropriate to generate synthetic data for large health databases. Conclusions Unlike existing techniques, the proposed algorithm allows the disclosure of health data containing both relational and set-valued data in a differentially private manner, and can retain essential information for discriminative analysis. PMID:23242630
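The two-step design in the Methods, generalize first and then add noise, can be sketched generically: records are coarsened up a taxonomy, grouped into counts, and each count is perturbed with Laplace noise calibrated to ε. This is a minimal illustration of that idea under add/remove-one-record neighbors, not the authors' actual algorithm:

```python
from collections import Counter
import numpy as np

def generalize(record, taxonomy):
    """Replace each attribute value with its parent in a generalization hierarchy."""
    return tuple(taxonomy.get(v, v) for v in record)

def dp_counts(records, taxonomy, epsilon):
    """Counts over generalized records, each perturbed with Laplace(1/epsilon) noise
    (sensitivity 1 per count when one record is added or removed)."""
    counts = Counter(generalize(r, taxonomy) for r in records)
    rng = np.random.default_rng()
    return {k: c + rng.laplace(0.0, 1.0 / epsilon) for k, c in counts.items()}

# Toy taxonomy: specific diagnoses generalize to a disease group.
taxonomy = {"flu": "respiratory", "asthma": "respiratory"}
records = [("flu", "30-40"), ("asthma", "30-40"), ("flu", "40-50")]
print(dp_counts(records, taxonomy, epsilon=1.0))
```

Generalizing before noise addition is the paper's key move: it shrinks the output domain, so far less noise is spread across empty cells than in a full contingency table.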
Choosing a genome browser for a Model Organism Database: surveying the Maize community
Sen, Taner Z.; Harper, Lisa C.; Schaeffer, Mary L.; Andorf, Carson M.; Seigfried, Trent E.; Campbell, Darwin A.; Lawrence, Carolyn J.
2010-01-01
As the B73 maize genome sequencing project neared completion, MaizeGDB began to integrate a graphical genome browser with its existing web interface and database. To ensure that maize researchers would optimally benefit from the potential addition of a genome browser to the existing MaizeGDB resource, personnel at MaizeGDB surveyed researchers’ needs. Collected data indicate that existing genome browsers for maize were inadequate and suggest implementation of a browser with quick interface and intuitive tools would meet most researchers’ needs. Here, we document the survey’s outcomes, review functionalities of available genome browser software platforms and offer our rationale for choosing the GBrowse software suite for MaizeGDB. Because the genome as represented within the MaizeGDB Genome Browser is tied to detailed phenotypic data, molecular marker information, available stocks, etc., the MaizeGDB Genome Browser represents a novel mechanism by which the researchers can leverage maize sequence information toward crop improvement directly. Database URL: http://gbrowse.maizegdb.org/ PMID:20627860
Gupta, Priyanka; Schomburg, John; Krishna, Suprita; Adejoro, Oluwakayode; Wang, Qi; Marsh, Benjamin; Nguyen, Andrew; Genere, Juan Reyes; Self, Patrick; Lund, Erik; Konety, Badrinath R
2017-01-01
To examine the Manufacturer and User Facility Device Experience (MAUDE) database to capture adverse events experienced with the da Vinci Surgical System, and to design a standardized classification system to categorize the complications and machine failures associated with the device. Overall, 1,057,000 da Vinci procedures were performed in the United States between 2009 and 2012. Currently, no system exists for classifying and comparing device-related errors and complications with which to evaluate adverse events associated with the da Vinci Surgical System. The MAUDE database was queried for event reports related to the da Vinci Surgical System between the years 2009 and 2012. A classification system was developed and tested among 14 robotic surgeons to associate a level of severity with each event and its relationship to the da Vinci Surgical System. Events were then classified according to this system and examined by using chi-square analysis. Two thousand eight hundred thirty-seven events were identified, of which 34% were obstetrics and gynecology (Ob/Gyn); 19%, urology; 11%, other; and 36%, not specified. Our classification system had moderate agreement, with a Kappa score of 0.52. Using our classification system, we identified 75% of the events as mild, 18% as moderate, 4% as severe, and 3% as life threatening or resulting in death. Seventy-seven percent were classified as definitely related to the device, 15% as possibly related, and 8% as not related. Urology procedures were associated with more severe events than Ob/Gyn procedures (38% vs 26%, p < 0.0001). Energy instruments were associated with less severe events compared with the surgical system (8% vs 87%, p < 0.0001). Events that were definitely associated with the device tended to be less severe (81% vs 19%, p < 0.0001). Our classification system is a valid tool with moderate inter-rater agreement that can be used to better understand device-related adverse events. The majority of robot-related events were mild but associated with the device.
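The agreement and severity comparisons reported above can be reproduced with standard tools. A sketch with hypothetical data; note that cohen_kappa_score handles two raters, whereas the study pooled ratings from 14 surgeons, so this is illustrative only:

```python
from sklearn.metrics import cohen_kappa_score
from scipy.stats import chi2_contingency

# Hypothetical severity ratings from two surgeons for the same event reports.
rater_a = ["mild", "mild", "moderate", "severe", "mild"]
rater_b = ["mild", "moderate", "moderate", "severe", "mild"]
print(cohen_kappa_score(rater_a, rater_b))   # agreement beyond chance

# Hypothetical counts: severe vs non-severe events by specialty.
table = [[38, 62],    # urology: severe, non-severe
         [26, 74]]    # Ob/Gyn:  severe, non-severe
chi2, p, dof, expected = chi2_contingency(table)
print(chi2, p)
```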
Saminsky, Michael
2017-12-01
Dental caries and periodontal disease are the most common oral diseases. Their link to disorders of the endocrine system is of high interest. Most of the available data relate to the adult population, though the issue is also of paramount importance among children and adolescents. The aim was to review the existing evidence examining the link between these clinical conditions among children and adolescents. Searches of electronic bibliographic databases and hand searches of relevant publications, based on a prepared list of relevant keywords, were performed. The paucity of existing data leaves the question of an association between most endocrine disorders of youth and dental caries or periodontal disease inconclusive, apart from obesity and diabetes mellitus, for which the link seems to be elucidated. Thorough research should be done in order to improve our understanding of to what extent, if at all, a link exists between these oral maladies and different pediatric endocrine disorders. Copyright© of YS Medical Media ltd.
Rothwell, Joseph A.; Perez-Jimenez, Jara; Neveu, Vanessa; Medina-Remón, Alexander; M'Hiri, Nouha; García-Lobato, Paula; Manach, Claudine; Knox, Craig; Eisner, Roman; Wishart, David S.; Scalbert, Augustin
2013-01-01
Polyphenols are a major class of bioactive phytochemicals whose consumption may play a role in the prevention of a number of chronic diseases such as cardiovascular diseases, type II diabetes and cancers. Phenol-Explorer, launched in 2009, is the only freely available web-based database on the content of polyphenols in food and their in vivo metabolism and pharmacokinetics. Here we report the third release of the database (Phenol-Explorer 3.0), which adds data on the effects of food processing on polyphenol contents in foods. Data on >100 foods, covering 161 polyphenols or groups of polyphenols before and after processing, were collected from 129 peer-reviewed publications and entered into new tables linked to the existing relational design. The effect of processing on polyphenol content is expressed in the form of retention factor coefficients, or the proportion of a given polyphenol retained after processing, adjusted for change in water content. The result is the first database on the effects of food processing on polyphenol content and, following the model initially defined for Phenol-Explorer, all data may be traced back to original sources. The new update will allow polyphenol scientists to more accurately estimate polyphenol exposure from dietary surveys. Database URL: http://www.phenol-explorer.eu PMID:24103452
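A retention factor, as defined above, is essentially a processed-to-raw content ratio corrected for the change in water content. One plausible version of that arithmetic, on a dry-weight basis (the abstract does not spell out Phenol-Explorer's exact adjustment procedure):

```python
def retention_factor(content_raw, content_processed, water_raw, water_processed):
    """Proportion of a polyphenol retained after processing, on a dry-weight basis.

    content_* : polyphenol content per 100 g fresh weight
    water_*   : water fraction (0-1) before/after processing
    """
    per_dry_raw = content_raw / (1.0 - water_raw)
    per_dry_processed = content_processed / (1.0 - water_processed)
    return per_dry_processed / per_dry_raw

# E.g. 50 mg/100 g raw (80% water) vs 30 mg/100 g cooked (90% water):
print(retention_factor(50, 30, 0.80, 0.90))  # -> 1.2 per dry weight
```

Correcting for water content is what keeps a food that merely loses water during cooking from appearing to gain polyphenols.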
Development of a Life History Database for Upper Mississippi River Fishes
2007-05-01
…prevailing ecological and river theories with existing empirical data, investigating anthropogenic controls on functional attributes of ecosystems… (2001; 2005a). …the attribute classes in the database closely reflect the ecological attributes of UMRS fish species. Finally, the life history database will allow the… The "Functional Feeding Guilds" attribute class… provide information on reproductive capacity, timing and mode for UMRS fish species. Our first example used the…
Hammond, Davyda; Conlon, Kathryn; Barzyk, Timothy; Chahine, Teresa; Zartarian, Valerie; Schultz, Brad
2011-03-01
Communities are concerned over pollution levels and seek methods to systematically identify and prioritize the environmental stressors in their communities. Geographic information system (GIS) maps of environmental information can be useful tools for communities in their assessment of environmental-pollution-related risks. Databases and mapping tools that supply community-level estimates of ambient concentrations of hazardous pollutants, risk, and potential health impacts can provide relevant information for communities to understand, identify, and prioritize potential exposures and risk from multiple sources. An assessment of existing databases and mapping tools was conducted as part of this study to explore the utility of publicly available databases, and three of these databases were selected for use in a community-level GIS mapping application. Queried data from the U.S. EPA's National-Scale Air Toxics Assessment, Air Quality System, and National Emissions Inventory were mapped at the appropriate spatial and temporal resolutions for identifying risks of exposure to air pollutants in two communities. The maps combine monitored and model-simulated pollutant and health risk estimates, along with local survey results, to assist communities with the identification of potential exposure sources and pollution hot spots. Findings from this case study analysis will provide information to advance the development of new tools to assist communities with environmental risk assessments and hazard prioritization. © 2010 Society for Risk Analysis.
CoReCG: a comprehensive database of genes associated with colon-rectal cancer
Agarwal, Rahul; Kumar, Binayak; Jayadev, Msk; Raghav, Dhwani; Singh, Ashutosh
2016-01-01
Cancer of the large intestine is commonly referred to as colorectal cancer, which is the third most frequently prevailing neoplasm across the globe. Although much work has been carried out to understand the mechanism of carcinogenesis and the advancement of this disease, fewer studies have been performed to collate the scattered information on alterations in tumorigenic cells, such as genes, mutations, expression changes, epigenetic alterations, post-translational modifications and genetic heterogeneity. Earlier findings mostly focused on understanding the etiology of colorectal carcinogenesis, and less emphasis was given to a comprehensive review of the existing findings of individual studies, which could provide better diagnostics based on the markers suggested in discrete studies. The Colon Rectal Cancer Gene Database (CoReCG) contains information on 2056 colon-rectal cancer genes involved in distinct colorectal cancer stages, sourced from published literature, with an effective knowledge-based information retrieval system. Additionally, an interactive web interface enriched with various browsing sections, augmented with an advanced search facility for querying the database, is provided for user-friendly browsing; online tools for sequence similarity searches and a knowledge-based schema ensure a researcher-friendly information retrieval mechanism. The colorectal cancer gene database (CoReCG) is expected to be a single-point source for the identification of colorectal cancer-related genes, thereby helping with the improvement of classification, diagnosis and treatment of human cancers. Database URL: lms.snu.edu.in/corecg PMID:27114494
SSER: Species specific essential reactions database.
Labena, Abraham A; Ye, Yuan-Nong; Dong, Chuan; Zhang, Fa-Z; Guo, Feng-Biao
2017-04-19
Essential reactions are vital components of cellular networks. They are the foundations of synthetic biology and are potential candidate targets for antimetabolic drug design. In particular, if a single reaction is catalyzed by multiple enzymes, then inhibiting the reaction would be a better option than targeting the enzymes or the corresponding enzyme-encoding genes. Existing databases such as BRENDA, BiGG, KEGG, Bio-models, Biosilico, and many others offer useful and comprehensive information on biochemical reactions, but none of them focuses specifically on essential reactions. Therefore, building a centralized repository for this class of reactions would be of great value. Here, we present a species-specific essential reactions database (SSER). The current version comprises essential biochemical and transport reactions of twenty-six organisms, identified via flux balance analysis (FBA) combined with manual curation on experimentally validated metabolic network models. Quantitative data on the number of essential reactions, the number of essential reactions associated with their respective enzyme-encoding genes, and shared essential reactions across organisms are the main contents of the database. SSER is intended to be a prime source of essential reaction data and related gene and metabolite information, and it can significantly facilitate metabolic network model reconstruction and analysis, as well as drug target discovery studies. Users can browse, search, compare and download the essential reactions of organisms of their interest through the website http://cefg.uestc.edu.cn/sser .
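Reaction essentiality via FBA amounts to knocking out one reaction at a time and checking whether the model can still grow. A sketch using the cobrapy toolkit; the model file name and the 5% growth threshold are assumptions, and SSER's pipeline added manual curation on top of this:

```python
import cobra

model = cobra.io.read_sbml_model("e_coli_core.xml")   # any curated SBML model
wild_type_growth = model.optimize().objective_value

essential = []
for rxn in model.reactions:
    with model:                       # changes inside the block are reverted on exit
        rxn.knock_out()               # constrain the reaction's flux to zero
        sol = model.optimize()
        if sol.status != "optimal" or sol.objective_value < 0.05 * wild_type_growth:
            essential.append(rxn.id)  # growth (near-)abolished -> essential reaction

print(len(essential), "essential reactions of", len(model.reactions))
```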
Virus taxonomy: the database of the International Committee on Taxonomy of Viruses (ICTV).
Lefkowitz, Elliot J; Dempsey, Donald M; Hendrickson, Robert Curtis; Orton, Richard J; Siddell, Stuart G; Smith, Donald B
2018-01-04
The International Committee on Taxonomy of Viruses (ICTV) is charged with the task of developing, refining, and maintaining a universal virus taxonomy. This task encompasses the classification of virus species and higher-level taxa according to the genetic and biological properties of their members; naming virus taxa; maintaining a database detailing the currently approved taxonomy; and providing the database, supporting proposals, and other virus-related information from an open-access, public web site. The ICTV web site (http://ictv.global) provides access to the current taxonomy database in online and downloadable formats, and maintains a complete history of virus taxa back to the first release in 1971. The ICTV has also published the ICTV Report on Virus Taxonomy starting in 1971. This Report provides a comprehensive description of all virus taxa covering virus structure, genome structure, biology and phylogenetics. The ninth ICTV report, published in 2012, is available as an open-access online publication from the ICTV web site. The current, 10th report (http://ictv.global/report/), is being published online, and is replacing the previous hard-copy edition with a completely open access, continuously updated publication. No other database or resource exists that provides such a comprehensive, fully annotated compendium of information on virus taxa and taxonomy. © The Author(s) 2017. Published by Oxford University Press on behalf of Nucleic Acids Research.
Monteiro, Pedro Tiago; Pais, Pedro; Costa, Catarina; Manna, Sauvagya; Sá-Correia, Isabel; Teixeira, Miguel Cacho
2017-01-04
We present the PATHOgenic YEAst Search for Transcriptional Regulators And Consensus Tracking (PathoYeastract - http://pathoyeastract.org) database, a tool for the analysis and prediction of transcription regulatory associations at the gene and genomic levels in the pathogenic yeasts Candida albicans and C. glabrata. Upon data retrieval from hundreds of publications, followed by curation, the database currently includes 28 000 unique documented regulatory associations between transcription factors (TF) and target genes and 107 DNA binding sites, considering 134 TFs in both species. Following the structure used for the YEASTRACT database, PathoYeastract makes available bioinformatics tools that enable the user to exploit the existing information to predict the TFs involved in the regulation of a gene or genome-wide transcriptional response, while ranking those TFs in order of their relative importance. Each search can be filtered based on the selection of specific environmental conditions, experimental evidence or positive/negative regulatory effect. Promoter analysis tools and interactive visualization tools for the representation of TF regulatory networks are also provided. The PathoYeastract database further provides simple tools for the prediction of gene and genomic regulation based on orthologous regulatory associations described for other yeast species, a comparative genomics setup for the study of cross-species evolution of regulatory networks. © The Author(s) 2016. Published by Oxford University Press on behalf of Nucleic Acids Research.
Differences among Major Taxa in the Extent of Ecological Knowledge across Four Major Ecosystems
Fisher, Rebecca; Knowlton, Nancy; Brainard, Russell E.; Caley, M. Julian
2011-01-01
Existing knowledge shapes our understanding of ecosystems and is critical for ecosystem-based management of the world's natural resources. Typically this knowledge is biased among taxa, with some taxa far better studied than others, but the extent of this bias is poorly known. In conjunction with the publicly available World Register of Marine Species database (WoRMS) and one of the world's premier electronic scientific literature databases (Web of Science®), a text mining approach is used to examine the distribution of existing ecological knowledge among taxa in coral reef, mangrove, seagrass and kelp bed ecosystems. We found that for each of these ecosystems, most research has been limited to a few groups of organisms. While this bias clearly reflects the perceived importance of some taxa as commercially or ecologically valuable, the relative lack of research of other taxonomic groups highlights the problem that some key taxa and associated ecosystem processes they affect may be poorly understood or completely ignored. The approach outlined here could be applied to any type of ecosystem for analyzing previous research effort and identifying knowledge gaps in order to improve ecosystem-based conservation and management. PMID:22073172
Intrusion Detection in Database Systems
NASA Astrophysics Data System (ADS)
Javidi, Mohammad M.; Sohrabi, Mina; Rafsanjani, Marjan Kuchaki
Data represent today a valuable asset for organizations and companies and must be protected. Ensuring the security and privacy of data assets is a crucial and very difficult problem in our modern networked world. Despite the necessity of protecting information stored in database systems (DBS), existing security models are insufficient to prevent misuse, especially insider abuse by legitimate users. One mechanism to safeguard the information in these databases is to use an intrusion detection system (IDS). The purpose of intrusion detection in database systems is to detect transactions that access data without permission. In this paper, several database intrusion detection approaches are evaluated.
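A common thread in the approaches surveyed is anomaly detection against learned access profiles: transactions touching combinations of user, table and operation never seen during training are flagged. A deliberately simple profile-based sketch (real systems also model SQL structure, result sizes and transaction sequences):

```python
def build_profiles(audit_log):
    """Learn the set of (user, table, operation) triples seen in normal operation."""
    return {(e["user"], e["table"], e["op"]) for e in audit_log}

def is_suspicious(event, profiles):
    """Flag any transaction outside the learned access profile."""
    return (event["user"], event["table"], event["op"]) not in profiles

normal = [{"user": "clerk1", "table": "orders", "op": "SELECT"},
          {"user": "clerk1", "table": "orders", "op": "INSERT"}]
profiles = build_profiles(normal)
print(is_suspicious({"user": "clerk1", "table": "salaries", "op": "SELECT"}, profiles))
# -> True: a legitimate user reaching outside its usual tables, the insider-abuse pattern
```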
ProbOnto: ontology and knowledge base of probability distributions.
Swat, Maciej J; Grenon, Pierre; Wimalaratne, Sarala
2016-09-01
Probability distributions play a central role in mathematical and statistical modelling. The encoding, annotation and exchange of such models could be greatly simplified by a resource providing a common reference for the definition of probability distributions. Although some resources exist, no suitably detailed and complex ontology exists, nor any database allowing programmatic access. ProbOnto is an ontology-based knowledge base of probability distributions, featuring more than 80 uni- and multivariate distributions with their defining functions, characteristics, relationships and re-parameterization formulas. It can be used for model annotation and facilitates the encoding of distribution-based models, related functions and quantities. http://probonto.org mjswat@ebi.ac.uk Supplementary data are available at Bioinformatics online. © The Author 2016. Published by Oxford University Press.
Applying the archetype approach to the database of a biobank information management system.
Späth, Melanie Bettina; Grimson, Jane
2011-03-01
The purpose of this study is to investigate the feasibility of applying the openEHR archetype approach to modelling the data in the database of an existing proprietary biobank information management system. A biobank information management system stores the clinical/phenotypic data of the sample donor and sample related information. The clinical/phenotypic data is potentially sourced from the donor's electronic health record (EHR). The study evaluates the reuse of openEHR archetypes that have been developed for the creation of an interoperable EHR in the context of biobanking, and proposes a new set of archetypes specifically for biobanks. The ultimate goal of the research is the development of an interoperable electronic biomedical research record (eBMRR) to support biomedical knowledge discovery. The database of the prostate cancer biobank of the Irish Prostate Cancer Research Consortium (PCRC), which supports the identification of novel biomarkers for prostate cancer, was taken as the basis for the modelling effort. First the database schema of the biobank was analyzed and reorganized into archetype-friendly concepts. Then, archetype repositories were searched for matching archetypes. Some existing archetypes were reused without change, some were modified or specialized, and new archetypes were developed where needed. The fields of the biobank database schema were then mapped to the elements in the archetypes. Finally, the archetypes were arranged into templates specifically to meet the requirements of the PCRC biobank. A set of 47 archetypes was found to cover all the concepts used in the biobank. Of these, 29 (62%) were reused without change, 6 were modified and/or extended, 1 was specialized, and 11 were newly defined. These archetypes were arranged into 8 templates specifically required for this biobank. A number of issues were encountered in this research. Some arose from the immaturity of the archetype approach, such as immature modelling support tools, difficulties in defining high-quality archetypes and the problem of overlapping archetypes. In addition, the identification of suitable existing archetypes was time-consuming and many semantic conflicts were encountered during the process of mapping the PCRC BIMS database to existing archetypes. These include differences in the granularity of documentation, in metadata-level versus data-level modelling, in terminologies and vocabularies used, and in the amount of structure imposed on the information to be recorded. Furthermore, the current way of modelling the sample entity was found to be cumbersome in the sample-centric activity of biobanking. The archetype approach is a promising approach to create a shareable eBMRR based on the study participant/donor for biobanks. Many archetypes originally developed for the EHR domain can be reused to model the clinical/phenotypic and sample information in the biobank context, which validates the genericity of these archetypes and their potential for reuse in the context of biomedical research. However, finding suitable archetypes in the repositories and establishing an exact mapping between the fields in the PCRC BIMS database and the elements of existing archetypes that have been designed for clinical practice can be challenging and time-consuming and involves resolving many common system integration conflicts. These may be attributable to differences in the requirements for information documentation between clinical practice and biobanking. 
This research also recognized the need for better support tools, modelling guidelines and best practice rules and reconfirmed the need for better domain knowledge governance. Furthermore, the authors propose that the establishment of an independent sample record with the sample as record subject should be investigated. The research presented in this paper is limited by the fact that the new archetypes developed during this research are based on a single biobank instance. These new archetypes may not be complete, representing only those subsets of items required by this particular database. Nevertheless, this exercise exposes some of the gaps that exist in the archetype modelling landscape and highlights the concepts that need to be modelled with archetypes to enable the development of an eBMRR. Copyright © 2010 Elsevier Ireland Ltd. All rights reserved.
NASA Astrophysics Data System (ADS)
Ghiorso, M. S.
2014-12-01
Computational thermodynamics (CT) has now become an essential tool of petrologic and geochemical research. CT is the basis for the construction of phase diagrams, the application of geothermometers and geobarometers, the equilibrium speciation of solutions, the construction of pseudosections, calculations of mass transfer between minerals, melts and fluids, and it provides a means of estimating materials properties for the evaluation of constitutive relations in fluid dynamical simulations. The practical application of CT to Earth science problems requires data: data on the thermochemical properties and the equation of state of relevant materials, and data on the relative stability and partitioning of chemical elements between phases as a function of temperature and pressure. These data must be evaluated and synthesized into a self-consistent collection of theoretical models and model parameters that is colloquially known as a thermodynamic database. Quantitative outcomes derived from CT rely on the existence, maintenance and integrity of thermodynamic databases. Unfortunately, the community is reliant on too few such databases, developed by a small number of research groups, and mostly under circumstances where refinement and updates to the database lag behind or are unresponsive to need. Given the increasing level of reliance on CT calculations, what is required is a paradigm shift in the way thermodynamic databases are developed, maintained and disseminated. They must become community resources, with flexible and accessible software interfaces that permit easy modification, while at the same time maintaining theoretical integrity and fidelity to the underlying experimental observations. Advances in computational and data science give us the tools and resources to address this problem, allowing CT results to be obtained at the speed of thought, and permitting geochemical and petrological intuition to play a key role in model development and calibration.
A reservoir morphology database for the conterminous United States
Rodgers, Kirk D.
2017-09-13
The U.S. Geological Survey, in cooperation with the Reservoir Fisheries Habitat Partnership, combined multiple national databases to create one comprehensive national reservoir database and to calculate new morphological metrics for 3,828 reservoirs. These new metrics include, but are not limited to, shoreline development index, index of basin permanence, development of volume, and other descriptive metrics based on established morphometric formulas. The new database also contains modeled chemical and physical metrics. Because of the nature of the existing databases used to compile the Reservoir Morphology Database and the inherent missing data, some metrics were not populated. One comprehensive database will assist water-resource managers in their understanding of local reservoir morphology and water chemistry characteristics throughout the continental United States.
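Two of the metrics named above have standard limnological definitions: the shoreline development index compares shoreline length with the circumference of a circle of equal area, and development of volume compares the basin shape with a cone. A sketch of both textbook formulas (the USGS implementation details are not reproduced here):

```python
import math

def shoreline_development_index(shoreline_length_m, surface_area_m2):
    """SDI = L / (2 * sqrt(pi * A)); 1.0 for a perfect circle, larger if convoluted."""
    return shoreline_length_m / (2.0 * math.sqrt(math.pi * surface_area_m2))

def development_of_volume(mean_depth_m, max_depth_m):
    """Dv = 3 * (mean depth / max depth); compares basin shape with a cone (Dv = 1)."""
    return 3.0 * mean_depth_m / max_depth_m

print(shoreline_development_index(12_000, 2_500_000))  # ~2.14 for this example
print(development_of_volume(4.2, 15.0))                # 0.84 for this example
```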
Mu, Wei-Qiang; Huang, Xia-Yu; Zhang, Jiang; Liu, Xiao-Cong; Huang, Mao-Mao
2018-04-09
Osteoporosis (OP) has been defined as a degenerative bone disease characterised by low bone mass and microstructural deterioration of bone tissue, leading to fragility and an increased risk of fractures, especially of the hip, spine and wrist. Exercise has been shown to benefit the maintenance of bone health and improvement of muscle strength, balance and coordination, thereby reducing the risk of falls and fractures. However, prior findings regarding the optimal types and regimens of exercise for treating low bone mineral density (BMD) in elderly people are not consistent. As an important component of traditional Chinese Qigong exercises, Tai Chi (TC) is an ancient art and science of healthcare derived from the martial arts. The objective of this study is to attempt to conduct a systematic review and meta-analysis of the existing studies on TC exercise as an intervention for the prevention or treatment of OP in elderly adults and to draw more useful conclusions regarding the safety and the effectiveness of TC in preventing or treating OP. Eight electronic databases (Science Citation Index, PubMed Database, Embase (Ovid) Database, the Cochrane Central Register of Controlled Trials, and Chinese databases, including Chinese BioMedical Database, China National Knowledge Infrastructure, Wanfang database and the Chongqing VIP Chinese Science and Technology Periodical Database) will be searched from the beginning of each database to 1 April 2018. Potential outcomes of interest will include rates of fractures or falls, BMD at the total hip and the total spine, bone formation biomarkers, bone resorption biomarkers, bone biomarkers, health-related quality of life and adverse events. Only randomised controlled trials comparing TC exercise against each other or non-intervention will be included. The Cochrane risk of bias assessment tool will be used for quality assessment. Ethical approval is not required as the study will be a review of existing studies. This review may help to elucidate whether TC exercise is effective for the prevention or treatment of OP in elderly adults. The findings of the study will be published in a peer-reviewed publication and will be disseminated electronically or in print. We will share the findings in the fourth quarter of 2018. CRD42018084950. © Article author(s) (or their employer(s) unless otherwise stated in the text of the article) 2018. All rights reserved. No commercial use is permitted unless otherwise expressly granted.
An algorithm of discovering signatures from DNA databases on a computer cluster.
Lee, Hsiao Ping; Sheu, Tzu-Fang
2014-10-05
Signatures are short sequences that are unique and not similar to any other sequence in a database, and they can be used as the basis for identifying different species. Although several signature discovery algorithms have been proposed in the past, these algorithms require entire databases to be loaded into memory, restricting the amount of data they can process and making them unable to handle databases with large amounts of data. These algorithms also use sequential models and have slower discovery speeds, leaving room for efficiency improvements. In this research, we introduce the use of a divide-and-conquer strategy in signature discovery and propose a parallel signature discovery algorithm for a computer cluster. The algorithm applies the divide-and-conquer strategy to overcome the existing algorithms' inability to process large databases, and uses a parallel computing mechanism to effectively improve the efficiency of signature discovery. Even when run with just the memory of regular personal computers, the algorithm can still process large databases, such as the human whole-genome EST database, that the existing algorithms were previously unable to process. The algorithm proposed in this research is not limited by the amount of usable memory and can rapidly find signatures in large databases, making it useful in applications such as Next Generation Sequencing and other large database analysis and processing. The implementation of the proposed algorithm is available at http://www.cs.pu.edu.tw/~fang/DDCSDPrograms/DDCSD.htm.
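The divide-and-conquer idea can be illustrated with exact unique k-mer discovery: split the database into chunks, count k-mer occurrences per chunk in parallel workers, merge the partial counts, and keep k-mers seen exactly once. A toy sketch only; the published algorithm also handles similarity-based uniqueness and cluster-scale data:

```python
from collections import Counter
from multiprocessing import Pool

K = 12  # signature length (illustrative)

def count_kmers(sequences):
    """Partial k-mer counts for one chunk of the database."""
    c = Counter()
    for seq in sequences:
        for i in range(len(seq) - K + 1):
            c[seq[i:i + K]] += 1
    return c

def find_signatures(chunks, workers=4):
    """Merge per-chunk counts; a signature occurs exactly once in the whole database."""
    with Pool(workers) as pool:
        partials = pool.map(count_kmers, chunks)
    total = Counter()
    for p in partials:
        total.update(p)
    return [kmer for kmer, n in total.items() if n == 1]

if __name__ == "__main__":
    chunks = [["ACGTACGTACGTAA"], ["ACGTACGTACGTCC"]]  # toy database split into chunks
    print(find_signatures(chunks, workers=2))
```

Because each worker only ever holds one chunk's counts, memory use is bounded by the chunk size rather than the whole database, which is the property the paper exploits.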
The STEP (Safety and Toxicity of Excipients for Paediatrics) database: part 2 - the pilot version.
Salunke, Smita; Brandys, Barbara; Giacoia, George; Tuleu, Catherine
2013-11-30
The screening and careful selection of excipients is a critical step in paediatric formulation development, as certain excipients acceptable in adult formulations may not be appropriate for paediatric use. While there is extensive toxicity data that could help in better understanding and highlighting the gaps in toxicity studies, the data are often scattered across information sources and saddled with incompatible data types and formats. This paper is the second in a series that presents an update on the Safety and Toxicity of Excipients for Paediatrics ("STEP") database being developed by Eu-US PFIs, and describes the architecture, data fields and functions of the database. The STEP database is a user-designed resource that compiles the safety and toxicity data of excipients scattered over various sources and presents it in one freely accessible source. Currently, the pilot database contains data from over 2000 references covering 10 excipients, presenting preclinical, clinical and regulatory information and toxicological reviews, with references and source links. The STEP database allows searching "FOR" excipients and "BY" excipients. This dual nature of the STEP database, in which toxicity and safety information can be searched in both directions, distinguishes it from existing sources. If the pilot is successful, the aim is to increase the number of excipients in the existing database so that a database large enough to be of practical research use will be available. It is anticipated that this source will prove to be a useful platform for data management and data exchange of excipient safety information. Copyright © 2013 Elsevier B.V. All rights reserved.
The Space Systems Environmental Test Facility Database (SSETFD), Website Development Status
NASA Technical Reports Server (NTRS)
Snyder, James M.
2008-01-01
The Aerospace Corporation has been developing a database of U.S. environmental test laboratory capabilities utilized by the space systems hardware development community. To date, 19 sites have been visited by The Aerospace Corporation and verbal agreements reached to include their capability descriptions in the database. A website is being developed to make this database accessible to all interested government, civil, university and industry personnel, and to anyone interested in learning more about the extensive collective capability that the US-based space industry has to offer. The Environments, Test & Assessment Department within The Aerospace Corporation will be responsible for overall coordination and maintenance of the database. Several US government agencies are interested in utilizing this database to assist in the source selection process for future spacecraft programs. This paper introduces the website by providing an overview of its development, location and search capabilities. It will show how the aerospace community can apply this new tool as a way to increase the utilization of existing lab facilities, and as a starting point for capital expenditure/upgrade trade studies. The long-term result is expected to be increased utilization of existing laboratory capability and reduced overall development cost of space systems hardware. Finally, the paper will present the process for adding new participants, and how the database will be maintained.
Short Fiction on Film: A Relational DataBase.
ERIC Educational Resources Information Center
May, Charles
Short Fiction on Film is a database that was created and will run on DataRelator, a relational database manager created by Bill Finzer for the California State Department of Education in 1986. DataRelator was designed for use in teaching students database management skills and to provide teachers with examples of how a database manager might be…
Development Of New Databases For Tsunami Hazard Analysis In California
NASA Astrophysics Data System (ADS)
Wilson, R. I.; Barberopoulou, A.; Borrero, J. C.; Bryant, W. A.; Dengler, L. A.; Goltz, J. D.; Legg, M.; McGuire, T.; Miller, K. M.; Real, C. R.; Synolakis, C.; Uslu, B.
2009-12-01
The California Geological Survey (CGS) has partnered with other tsunami specialists to produce two statewide databases to facilitate the evaluation of tsunami hazard products for both emergency response and land-use planning and development. A robust, State-run tsunami deposit database is being developed that complements and expands on existing databases from the National Geophysical Data Center (global) and the USGS (Cascadia). Whereas these existing databases focus on references or individual tsunami layers, the new State-maintained database concentrates on the location and contents of individual borings/trenches that sample tsunami deposits. These data provide an important observational benchmark for evaluating the results of tsunami inundation modeling. CGS is collaborating with and sharing the database entry form with other states to encourage its continued development beyond California’s coastline so that historic tsunami deposits can be evaluated on a regional basis. CGS is also developing an internet-based tsunami source scenario database and forum where tsunami source experts and hydrodynamic modelers can discuss the validity of tsunami sources and their contribution to hazard assessments for California and other coastal areas bordering the Pacific Ocean. The database includes all distant and local tsunami sources relevant to California, starting with the forty scenarios evaluated during the creation of the recently completed statewide series of tsunami inundation maps for emergency response planning. Factors germane to probabilistic tsunami hazard analyses (PTHA), such as event histories and recurrence intervals, are also addressed in the database and discussed in the forum. Discussions with other tsunami source experts will help CGS determine what additional scenarios should be considered in PTHA for assessing the feasibility of generating products of value to local land-use planning and development.
77 FR 28391 - Announcement of Requirements and Registration for “Ocular Imaging Challenge”
Federal Register 2010, 2011, 2012, 2013, 2014
2012-05-14
..., color, zoom, pan) Integrate with existing EHRs (e.g. "single sign-on") Where applicable, leverage and... existing office hardware platforms, and to integrate with existing EHR systems (e.g. "single sign-on"... on the acquisition devices in proprietary databases and file formats, and therefore have limited...
Class dependency of fuzzy relational database using relational calculus and conditional probability
NASA Astrophysics Data System (ADS)
Deni Akbar, Mohammad; Mizoguchi, Yoshihiro; Adiwijaya
2018-03-01
In this paper, we propose a design for a fuzzy relational database that deals with a conditional probability relation using fuzzy relational calculus. Previously, several studies have investigated equivalence classes in fuzzy databases using similarity or approximate relations, and it is an interesting topic to investigate fuzzy dependency using equivalence classes. Our goal is to introduce a formulation of a fuzzy relational database model using the relational calculus on the category of fuzzy relations. We also introduce general formulas of the relational calculus for database operations such as 'projection', 'selection', 'injection' and 'natural join'. Using the fuzzy relational calculus and conditional probabilities, we introduce notions of equivalence class, redundancy, and dependency in the theory of fuzzy relational databases.
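As a rough illustration of the operations named above, the sketch below models a fuzzy relation as a mapping from tuples to membership degrees, with max-merging for projection and min as the t-norm for natural join; this is a generic fuzzy relational algebra convention, not necessarily the paper's categorical formulation.

```python
# Illustrative fuzzy relations: {tuple: membership degree in [0, 1]}.
def select(rel, pred):
    """Fuzzy selection: keep tuples satisfying the (crisp) predicate."""
    return {t: m for t, m in rel.items() if pred(t)}

def project(rel, idxs):
    """Fuzzy projection: max membership over tuples that collapse together."""
    out = {}
    for t, m in rel.items():
        key = tuple(t[i] for i in idxs)
        out[key] = max(out.get(key, 0.0), m)
    return out

def natural_join(r, s):
    """Join on the first attribute; min acts as the t-norm for memberships."""
    out = {}
    for t1, m1 in r.items():
        for t2, m2 in s.items():
            if t1[0] == t2[0]:
                out[t1 + t2[1:]] = min(m1, m2)
    return out

patients = {("p1", "feverish"): 0.8, ("p2", "feverish"): 0.3}
labs = {("p1", "high_crp"): 0.9}
print(natural_join(patients, labs))  # {('p1', 'feverish', 'high_crp'): 0.8}
```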
Berthold, Michael R.; Hedrick, Michael P.; Gilson, Michael K.
2015-01-01
Today’s large, public databases of protein–small molecule interaction data are creating important new opportunities for data mining and integration. At the same time, new graphical user interface-based workflow tools offer facile alternatives to custom scripting for informatics and data analysis. Here, we illustrate how the large protein-ligand database BindingDB may be incorporated into KNIME workflows as a step toward the integration of pharmacological data with broader biomolecular analyses. Thus, we describe a collection of KNIME workflows that access BindingDB data via RESTful webservices and, for more intensive queries, via a local distillation of the full BindingDB dataset. We focus in particular on the KNIME implementation of knowledge-based tools to generate informed hypotheses regarding protein targets of bioactive compounds, based on notions of chemical similarity. A number of variants of this basic approach are tested for seven existing drugs with relatively ill-defined therapeutic targets, leading to replication of some previously confirmed results and discovery of new, high-quality hits. Implications for future development are discussed. Database URL: www.bindingdb.org PMID:26384374
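A minimal sketch of the RESTful access pattern such a workflow node might wrap is shown below; the service root, operation name and parameter names are assumptions for illustration and should be checked against BindingDB's published webservice documentation.

```python
import requests

# Assumed service root; not a documented guarantee of the actual API.
BASE = "https://bindingdb.org/axis2/services/BDBService"

def ligands_for_target(uniprot_id, affinity_cutoff_nm=10000):
    """Fetch ligands reported against one UniProt target (assumed endpoint)."""
    resp = requests.get(
        f"{BASE}/getLigandsByUniprots",   # assumed operation name
        params={"uniprot": uniprot_id,
                "cutoff": affinity_cutoff_nm,
                "response": "application/json"},
        timeout=30,
    )
    resp.raise_for_status()
    return resp.json()
```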
Molecular Oxygen in the Thermosphere: Issues and Measurement Strategies
NASA Astrophysics Data System (ADS)
Picone, J. M.; Hedin, A. E.; Drob, D. P.; Meier, R. R.; Bishop, J.; Budzien, S. A.
2002-05-01
We review the state of empirical knowledge regarding the distribution of molecular oxygen in the lower thermosphere (100-200 km), as embodied by the new NRLMSISE-00 empirical atmospheric model, its predecessors, and the underlying databases. For altitudes above 120 km, the two major classes of data (mass spectrometer and solar ultraviolet [UV] absorption) disagree significantly regarding the magnitude of the O2 density and the dependence on solar activity. As a result, the addition of the Solar Maximum Mission (SMM) data set (based on solar UV absorption) to the NRLMSIS database has directly impacted the new model, increasing the complexity of the model's formulation and generally reducing the thermospheric O2 density relative to MSISE-90. Beyond interest in the thermosphere itself, this issue materially affects detailed models of ionospheric chemistry and dynamics as well as modeling of the upper atmospheric airglow. Because these are key elements of both experimental and operational systems which measure and forecast the near-Earth space environment, we present strategies for augmenting the database through analysis of existing data and through future measurements in order to resolve this issue.
English semantic word-pair norms and a searchable Web portal for experimental stimulus creation.
Buchanan, Erin M; Holmes, Jessica L; Teasley, Marilee L; Hutchison, Keith A
2013-09-01
As researchers explore the complexity of memory and language hierarchies, the need to expand normed stimulus databases is growing. Therefore, we present 1,808 words, paired with their features and concept-concept information, that were collected using previously established norming methods (McRae, Cree, Seidenberg, & McNorgan Behavior Research Methods 37:547-559, 2005). This database supplements existing stimuli and complements the Semantic Priming Project (Hutchison, Balota, Cortese, Neely, Niemeyer, Bengson, & Cohen-Shikora 2010). The data set includes many types of words (including nouns, verbs, adjectives, etc.), expanding on previous collections of nouns and verbs (Vinson & Vigliocco Journal of Neurolinguistics 15:317-351, 2008). We describe the relation between our and other semantic norms, as well as giving a short review of word-pair norms. The stimuli are provided in conjunction with a searchable Web portal that allows researchers to create a set of experimental stimuli without prior programming knowledge. When researchers use this new database in tandem with previous norming efforts, precise stimuli sets can be created for future research endeavors.
Affective norms for 720 French words rated by children and adolescents (FANchild).
Monnier, Catherine; Syssau, Arielle
2017-10-01
FANchild (French Affective Norms for Children) provides norms of valence and arousal for a large corpus of French words (N = 720) rated by 908 French children and adolescents (ages 7, 9, 11, and 13). The ratings were made using the Self-Assessment Manikin (Lang, 1980). Because it combines evaluations of arousal and valence and includes ratings provided by 7-, 9-, 11-, and 13-year-olds, this database complements and extends existing French-language databases. Good response reliability was observed in each of the four age groups. Despite a significant level of consensus, we found age differences in both the valence and arousal ratings: Seven- and 9-year-old children gave higher mean valence and arousal ratings than did the other age groups. Moreover, the tendency to judge words positively (i.e., positive bias) decreased with age. This age- and sex-related database will enable French-speaking researchers to study how the emotional character of words influences their cognitive processing, and how this influence evolves with age. FANchild is available at https://www.researchgate.net/profile/Catherine_Monnier/contributions .
Price, Ronald N; Chandrasekhar, Arcot J; Tamirisa, Balaji
1990-01-01
The Department of Medicine at Loyola University Medical Center (LUMC) of Chicago has implemented a local area network (LAN) based Patient Information Management System (PIMS) as part of its integrated departmental database management system. PIMS consists of related database applications encompassing demographic information, current medications, problem lists, clinical data, prior events, and on-line procedure results. Integration into the existing departmental database system permits PIMS to capture and manipulate data in other departmental applications. Standardization of clinical data is accomplished through three data tables that verify diagnosis codes, procedure codes and a standardized set of clinical data elements. The modularity of the system, coupled with standardized data formats, allowed the development of a Patient Information Protocol System (PIPS). PIPS, a user-definable protocol processor, provides physicians with individualized data entry or review screens customized for their specific research protocols or practice habits. Physician feedback indicates that the PIMS/PIPS combination enhances their ability to collect and review specific patient information by filtering large amounts of clinical data.
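The table-driven standardization described above might look like the following sketch, where an entry is accepted only if its codes appear in the verification tables; the table contents and field names are illustrative, not LUMC's actual data.

```python
# Hypothetical code-verification tables (illustrative entries only).
DIAGNOSIS_CODES = {"250.00": "Diabetes mellitus", "401.9": "Hypertension"}
PROCEDURE_CODES = {"93000": "Electrocardiogram"}

def validate_entry(entry):
    """Reject clinical entries whose codes are absent from the code tables."""
    if entry["diagnosis"] not in DIAGNOSIS_CODES:
        raise ValueError(f"unknown diagnosis code {entry['diagnosis']}")
    if entry["procedure"] not in PROCEDURE_CODES:
        raise ValueError(f"unknown procedure code {entry['procedure']}")
    return entry

print(validate_entry({"diagnosis": "401.9", "procedure": "93000"}))
```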
NASA Astrophysics Data System (ADS)
Yang, YuGuang; Liu, ZhiChao; Chen, XiuBo; Zhou, YiHua; Shi, WeiMin
2017-12-01
For existing QKD-based quantum private query (QPQ) protocols, quantum channel noise may cause the user to obtain a wrong answer and thus misunderstand the database holder. In addition, an outside attacker may conceal his attack by exploiting the channel noise. We propose a new, robust QPQ protocol based on four-qubit decoherence-free (DF) states. In contrast to existing QPQ protocols against channel noise, the user (Alice) needs only an alternative fixed sequence of single-qubit measurements to measure the received DF states. This property makes the proposed protocol easy to implement with current technologies. Moreover, to retain the advantage of flexible database queries, we reconstruct Alice's measurement operators so that Alice needs only conditioned sequences of single-qubit measurements.
ERIC Educational Resources Information Center
Lamothe, Alain R.
2011-01-01
The purpose of this paper is to report the results of a quantitative analysis exploring the interaction and relationship between the online database and electronic journal collections at the J. N. Desmarais Library of Laurentian University. A very strong relationship exists between the number of searches and the size of the online database…
A Database for Decision-Making in Training and Distributed Learning Technology
1998-04-01
developer must answer these questions: ♦ Who will develop the courseware? Should we outsource? ♦ What media should we use? How much will it cost? ♦ What...to develop, the database can be useful for answering staffing questions and planning transitions to technology-assisted courses. The database...of distributed learning curricula in comparison to traditional methods. To develop a military-wide distributed learning plan, the existing course
Acoustic Propagation Modeling in Shallow Water
1996-10-01
Oceanography, La Jolla, California 92093-0701 (Received April 15, 1996). This paper provides references for the Navy's existing databases. Various... a compilation of many aspects of high-frequency acoustics. (OAML) contains a description of Navy models and databases. The Navy's use of shallow... become significant because the propagation path may involve many tens of bounces. A description of a reflectivity database is...
37 CFR 1.105 - Requirements for information.
Code of Federal Regulations, 2010 CFR
2010-07-01
... databases: The existence of any particularly relevant commercial database known to any of the inventors that... improvement, identification of what is being improved. (vii) In use: Identification of any use of the claimed... the use. (viii) Technical information known to applicant. Technical information known to applicant...
Resource Purpose: The Watershed Information Network is a set of about 30 web pages that are organized by topic. These pages access existing databases such as the American Heritage Rivers Services database and Surf Your Watershed. WIN itself has no data or data sets.
Heterogeneous distributed query processing: The DAVID system
NASA Technical Reports Server (NTRS)
Jacobs, Barry E.
1985-01-01
The objective of the Distributed Access View Integrated Database (DAVID) project is the development of an easy to use computer system with which NASA scientists, engineers and administrators can uniformly access distributed heterogeneous databases. Basically, DAVID will be a database management system that sits alongside already existing database and file management systems. Its function is to enable users to access the data in other languages and file systems without having to learn the data manipulation languages. Given here is an outline of a talk on the DAVID project and several charts.
Shah, Sachin D.; Maltby, David R.
2010-01-01
The U.S. Geological Survey, in cooperation with the U.S. Army Corps of Engineers, compiled salinity-related water-quality data and information in a geodatabase containing more than 6,000 sampling sites. The geodatabase was designed as a tool for water-resource management and includes readily available digital data sources from the U.S. Geological Survey, U.S. Environmental Protection Agency, New Mexico Interstate Stream Commission, Sustainability of semi-Arid Hydrology and Riparian Areas, Paso del Norte Watershed Council, numerous other State and local databases, and selected databases maintained by the University of Arizona and New Mexico State University. Salinity information was compiled for an approximately 26,000-square-mile area of the Rio Grande Basin from the Rio Arriba-Sandoval County line, New Mexico, to Presidio, Texas. The geodatabase relates the spatial location of sampling sites with salinity-related water-quality data reported by multiple agencies. The sampling sites are stored in a geodatabase feature class; each site is linked by a relationship class to the corresponding sample and results stored in data tables.
The Binding Database: data management and interface design.
Chen, Xi; Lin, Yuhmei; Liu, Ming; Gilson, Michael K
2002-01-01
The large and growing body of experimental data on biomolecular binding is of enormous value in developing a deeper understanding of molecular biology, in developing new therapeutics, and in various molecular design applications. However, most of these data are found only in the published literature and are therefore difficult to access and use. No existing public database has focused on measured binding affinities and has provided query capabilities that include chemical structure and sequence homology searches. We have created Binding DataBase (BindingDB), a public, web-accessible database of measured binding affinities. BindingDB is based upon a relational data specification for describing binding measurements via Isothermal Titration Calorimetry (ITC) and enzyme inhibition. A corresponding XML Document Type Definition (DTD) is used to create and parse intermediate files during the on-line deposition process and will also be used for data interchange, including collection of data from other sources. The on-line query interface, which is constructed with Java Servlet technology, supports standard SQL queries as well as searches for molecules by chemical structure and sequence homology. The on-line deposition interface uses Java Server Pages and JavaBean objects to generate dynamic HTML and to store intermediate results. The resulting data resource provides a range of functionality with brisk response-times, and lends itself well to continued development and enhancement.
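A toy version of such a relational specification, reduced to three tables and an affinity-filtered query, is sketched below; the table and column names are illustrative and do not reproduce BindingDB's actual schema.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE protein (protein_id INTEGER PRIMARY KEY, name TEXT, sequence TEXT);
CREATE TABLE ligand  (ligand_id  INTEGER PRIMARY KEY, name TEXT, smiles TEXT);
CREATE TABLE itc_measurement (
    measurement_id INTEGER PRIMARY KEY,
    protein_id INTEGER REFERENCES protein(protein_id),
    ligand_id  INTEGER REFERENCES ligand(ligand_id),
    kd_nm REAL,           -- dissociation constant, nM
    delta_h REAL,         -- enthalpy change, kcal/mol
    temperature_k REAL
);
INSERT INTO protein VALUES (1, 'protein_x', 'MKT...');
INSERT INTO ligand  VALUES (1, 'ligand_y', 'c1ccccc1O');
INSERT INTO itc_measurement VALUES (1, 1, 1, 42.0, -4.5, 298.15);
""")

# An affinity-filtered query joins measurements back to both entities.
for row in con.execute("""
    SELECT p.name, l.smiles, m.kd_nm
    FROM itc_measurement m
    JOIN protein p USING (protein_id)
    JOIN ligand  l USING (ligand_id)
    WHERE m.kd_nm < 100
"""):
    print(row)
```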
Integration and management of massive remote-sensing data based on GeoSOT subdivision model
NASA Astrophysics Data System (ADS)
Li, Shuang; Cheng, Chengqi; Chen, Bo; Meng, Li
2016-07-01
Owing to the rapid development of earth observation technology, the volume of spatial information is growing rapidly; therefore, improving query retrieval speed from large, rich data sources for remote-sensing data management systems is quite urgent. A global subdivision model, geographic coordinate subdivision grid with one-dimension integer coding on 2n-tree, which we propose as a solution, has been used in data management organizations. However, because a spatial object may cover several grids, ample data redundancy will occur when data are stored in relational databases. To solve this redundancy problem, we first combined the subdivision model with the spatial array database containing the inverted index. We proposed an improved approach for integrating and managing massive remote-sensing data. By adding a spatial code column in an array format in a database, spatial information in remote-sensing metadata can be stored and logically subdivided. We implemented our method in a Kingbase Enterprise Server database system and compared the results with the Oracle platform by simulating worldwide image data. Experimental results showed that our approach performed better than Oracle in terms of data integration and time and space efficiency. Our approach also offers an efficient storage management system for existing storage centers and management systems.
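The inverted-index idea can be sketched as below, with integer stand-ins for GeoSOT-style grid codes (the encoding itself is omitted); a scene covering several grids is indexed once per grid rather than duplicated as whole rows, which is the redundancy the authors set out to avoid.

```python
from collections import defaultdict

index = defaultdict(set)  # grid code -> ids of scenes covering that grid

def add_scene(scene_id, grid_codes):
    # Index the scene once per grid it touches; metadata lives elsewhere,
    # so nothing but the posting is duplicated.
    for code in grid_codes:
        index[code].add(scene_id)

def query(region_codes):
    # Union of postings for every grid touching the query region.
    hits = set()
    for code in region_codes:
        hits |= index[code]
    return hits

add_scene("scene_001", [0x1A2B, 0x1A2C])
add_scene("scene_002", [0x1A2C])
print(query([0x1A2C]))  # {'scene_001', 'scene_002'}
```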
Rothwell, Joseph A; Perez-Jimenez, Jara; Neveu, Vanessa; Medina-Remón, Alexander; M'hiri, Nouha; García-Lobato, Paula; Manach, Claudine; Knox, Craig; Eisner, Roman; Wishart, David S; Scalbert, Augustin
2013-01-01
Polyphenols are a major class of bioactive phytochemicals whose consumption may play a role in the prevention of a number of chronic diseases such as cardiovascular diseases, type II diabetes and cancers. Phenol-Explorer, launched in 2009, is the only freely available web-based database on the content of polyphenols in food and their in vivo metabolism and pharmacokinetics. Here we report the third release of the database (Phenol-Explorer 3.0), which adds data on the effects of food processing on polyphenol contents in foods. Data on >100 foods, covering 161 polyphenols or groups of polyphenols before and after processing, were collected from 129 peer-reviewed publications and entered into new tables linked to the existing relational design. The effect of processing on polyphenol content is expressed in the form of retention factor coefficients, or the proportion of a given polyphenol retained after processing, adjusted for change in water content. The result is the first database on the effects of food processing on polyphenol content and, following the model initially defined for Phenol-Explorer, all data may be traced back to original sources. The new update will allow polyphenol scientists to more accurately estimate polyphenol exposure from dietary surveys.
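As an illustration of the retention-factor concept, the function below computes the proportion retained after correcting contents to a dry-matter basis; this is one plausible convention for the water-content adjustment, and the database's exact calculation may differ.

```python
def retention_factor(c_raw, c_processed, dm_raw, dm_processed):
    """Proportion of a polyphenol retained after processing.

    Contents c_* are per fresh weight; dm_* are dry-matter fractions, used
    here to correct for the change in water content (one plausible
    convention; the numbers below are hypothetical)."""
    return (c_processed / dm_processed) / (c_raw / dm_raw)

# Boiling leaches some polyphenol and raises the water content:
print(round(retention_factor(c_raw=50.0, c_processed=30.0,
                             dm_raw=0.20, dm_processed=0.15), 2))  # 0.8
```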
Disability Diversity Training in the Workplace: Systematic Review and Future Directions.
Phillips, Brian N; Deiches, Jon; Morrison, Blaise; Chan, Fong; Bezyak, Jill L
2016-09-01
Purpose Misinformation and negative attitudes toward disability contribute to lower employment rates among people with disabilities. Diversity training is an intervention intended to improve intergroup relations and reduce prejudice. We conducted a systematic review to determine the use and effectiveness of disability diversity training aimed at improving employment outcomes for employees with disabilities. Methods Five databases were searched for peer-reviewed studies of disability diversity training interventions provided within the workplace. Studies identified for inclusion were assessed for quality of methodology. Results Of the total of 1322 articles identified by the search, three studies met the criteria for inclusion. Two of the three articles focused specifically on training to improve outcomes related to workplace injuries among existing employees. The other study provided an initial test of a more general disability diversity training program. Conclusions There is currently a lack of empirically validated diversity training programs that focus specifically on disability. A number of disability diversity trainings and resources exist, but none have been well researched. Related literature on diversity training and disability awareness suggests the possibility for enhancing diversity training practices through training design, content, participant, and outcomes considerations. By integrating best practices in workplace diversity training with existing disability training resources, practitioners and researchers may be able to design effective disability diversity training programs.
Albreht, T; Paulin, M
1999-01-01
The article describes the possibilities for planning the health care providers' network enabled by the use of information technology. The cornerstone of such planning is the development and establishment of a quality database on health care providers, health care professionals and their employment statuses. Based on an analysis of information needs, a new database was developed for various users in health care delivery as well as for those in health insurance. The method of information engineering was used in the standard four steps of information system construction, while the whole project was run in accordance with the principles of two internationally approved project management methods. Special attention was dedicated to a careful analysis of the users' requirements, which we believe to be fulfilled to a very large degree. The new NHCPD is a relational database which is set up in two important state institutions, the National Institute of Public Health and the Health Insurance Institute of Slovenia. The former is responsible for updating the database, while the latter is responsible for the technological side as well as for the implementation of data security and protection. NHCPD will be interlinked with several other existing applications in the area of health care, public health and health insurance. Several important state institutions and professional chambers are users of the database in question, thus integrating various aspects of the health care system in Slovenia. The setting up of a completely revised health care providers' database in Slovenia is an important step in the development of a uniform and integrated information system that would support top decision-making processes at the national level.
PIPEMicroDB: microsatellite database and primer generation tool for pigeonpea genome
Sarika; Arora, Vasu; Iquebal, M. A.; Rai, Anil; Kumar, Dinesh
2013-01-01
Molecular markers play a significant role in crop improvement for desirable characteristics, such as high yield and resistance to disease, that will benefit the crop in the long term. Pigeonpea (Cajanus cajan L.) is a legume recently sequenced by a global consortium led by ICRISAT (Hyderabad, India) and has been analysed for gene prediction, synteny maps, markers, etc. We present the PIgeonPEa Microsatellite DataBase (PIPEMicroDB) with an automated primer designing tool for the pigeonpea genome, based on chromosome-wise as well as location-wise searches for primers. A total of 123 387 Short Tandem Repeats (STRs) were extracted from the pigeonpea genome, available in the public domain, using the MIcroSAtellite tool (MISA). The database is an online relational database based on a ‘three-tier architecture’ that catalogues information on microsatellites in MySQL, and a user-friendly interface is developed using PHP. Searches for STRs may be customized by limiting their location on a chromosome as well as the number of markers in that range. This is a novel approach that has not been implemented in any existing marker database. The database has been further appended with Primer3 for primer designing of selected markers with left and right flankings of up to 500 bp. This will enable researchers to select markers of choice at a desired interval over the chromosome. Furthermore, one can use individual STRs of a targeted region over the chromosome to narrow down the location of a gene of interest or linked Quantitative Trait Loci (QTLs). Although it is an in silico approach, marker searches based on the characteristics and location of STRs are expected to be beneficial for researchers. Database URL: http://cabindb.iasri.res.in/pigeonpea/ PMID:23396298
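A location-bounded marker search of the kind described might be expressed as in the sketch below, using an in-memory SQLite table; the column names, linkage-group labels and data are illustrative, not the actual PIPEMicroDB schema.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE str_marker (chrom TEXT, start INTEGER, motif TEXT)")
con.executemany("INSERT INTO str_marker VALUES (?,?,?)",
                [("CcLG01", 10500, "AT"), ("CcLG01", 98200, "AGG"),
                 ("CcLG02", 4100, "TTA")])

def markers_in_range(chrom, lo, hi, limit=10):
    """Return up to `limit` STRs on one chromosome within a position range."""
    return con.execute(
        "SELECT * FROM str_marker WHERE chrom=? AND start BETWEEN ? AND ? "
        "ORDER BY start LIMIT ?", (chrom, lo, hi, limit)).fetchall()

print(markers_in_range("CcLG01", 0, 100000))
```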
Efthimiadis, E N; Afifi, M
1996-01-01
OBJECTIVES: This study examined methods of accessing (for indexing and retrieval purposes) medical research on population groups in the major abstracting and indexing services of the health sciences literature. DESIGN: The study of diseases in specific population groups is facilitated by the indexing of both diseases and populations in a database. The MEDLINE, PsycINFO, and Embase databases were selected for the study. The published thesauri for these databases were examined to establish the vocabulary in use. Indexing terms were identified and examined as to their representation in the current literature. Terms were clustered further into groups thought to reflect an end user's perspective and to facilitate subsequent analysis. The medical literature contained in the three online databases was searched with both controlled vocabulary and natural language terms. RESULTS: The three thesauri revealed shallow pre-coordinated hierarchical structures, rather difficult-to-use terms for post-coordination, and a blurring of cultural, genetic, and racial facets of populations. Post-coordination is difficult because of the system-oriented terminology, which is intended mostly for information professionals. The terminology unintentionally restricts access by the end users who lack the knowledge needed to use the thesauri effectively for information retrieval. CONCLUSIONS: Population groups are not represented adequately in the index languages of health sciences databases. Users of these databases need to be alerted to the difficulties that may be encountered in searching for information on population groups. Information and health professionals may not be able to access the literature if they are not familiar with the indexing policies on population groups. Consequently, the study points to a problem that needs to be addressed, through either the redesign of existing systems or the design of new ones to meet the goals of Healthy People 2000 and beyond. PMID:8883987
USDA-ARS?s Scientific Manuscript database
No comprehensive protocols exist for the collection, standardization, and storage of agronomic management information into a database that preserves privacy, maintains data uncertainty, and translates everyday decisions into quantitative values. This manuscript describes the development of a databas...
Readiness of food composition databases and food component analysis systems for nutrigenomics
USDA-ARS?s Scientific Manuscript database
The study objective was to discuss the international implications of using nutrigenomics as the basis for individualized health promotion and chronic disease prevention and the challenges it presents to existing nutrient databases and nutrient analysis systems. Definitions and research methods of nu...
USEPA is modifying and enhancing existing software for the depiction of metabolic maps to provide access via structures to metabolism information and associated data in EPA's Office of Pesticide Programs (OPP). The database includes information submitted to EPA in support of pest...
Database crime to crime match rate calculation.
Buckleton, John; Bright, Jo-Anne; Walsh, Simon J
2009-06-01
Guidance exists on how to count matches between samples in a crime sample database but we are unable to locate a definition of how to estimate a match rate. We propose a method that does not proceed from the match counting definition but which has a strong logic.
Development of stormwater utilities requires information on existing stormwater infrastructure and impervious cover as well as costs and benefits of stormwater management options. US EPA has developed a suite of databases and tools that can inform decision-making by regional sto...
Gao, Xiang; Lin, Huaiying; Revanna, Kashi; Dong, Qunfeng
2017-05-10
Species-level classification for 16S rRNA gene sequences remains a serious challenge for microbiome researchers, because existing taxonomic classification tools for 16S rRNA gene sequences either do not provide species-level classification, or their classification results are unreliable. The unreliable results are due to the limitations in the existing methods which either lack solid probabilistic-based criteria to evaluate the confidence of their taxonomic assignments, or use nucleotide k-mer frequency as the proxy for sequence similarity measurement. We have developed a method that shows significantly improved species-level classification results over existing methods. Our method calculates true sequence similarity between query sequences and database hits using pairwise sequence alignment. Taxonomic classifications are assigned from the species to the phylum levels based on the lowest common ancestors of multiple database hits for each query sequence, and further classification reliabilities are evaluated by bootstrap confidence scores. The novelty of our method is that the contribution of each database hit to the taxonomic assignment of the query sequence is weighted by a Bayesian posterior probability based upon the degree of sequence similarity of the database hit to the query sequence. Our method does not need any training datasets specific for different taxonomic groups. Instead only a reference database is required for aligning to the query sequences, making our method easily applicable for different regions of the 16S rRNA gene or other phylogenetic marker genes. Reliable species-level classification for 16S rRNA or other phylogenetic marker genes is critical for microbiome research. Our software shows significantly higher classification accuracy than the existing tools and we provide probabilistic-based confidence scores to evaluate the reliability of our taxonomic classification assignments based on multiple database matches to query sequences. Despite its higher computational costs, our method is still suitable for analyzing large-scale microbiome datasets for practical purposes. Furthermore, our method can be applied for taxonomic classification of any phylogenetic marker gene sequences. Our software, called BLCA, is freely available at https://github.com/qunfengdong/BLCA .
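The weighting-and-voting idea can be sketched as follows, with an exponential function of alignment identity standing in for the Bayesian posterior described by the authors and a simple bootstrap over resampled hits for the confidence score; this is an illustration of the scheme, not BLCA's actual code.

```python
import math
import random

def assign(hits, rank):
    """hits: list of (identity_fraction, lineage dict rank -> taxon).
    Vote for a taxon at the given rank, weighting hits by similarity."""
    weights = {}
    for ident, lineage in hits:
        # Heuristic stand-in for the Bayesian posterior: weight grows
        # sharply with alignment identity.
        w = math.exp(10 * ident)
        taxon = lineage.get(rank)
        weights[taxon] = weights.get(taxon, 0.0) + w
    return max(weights, key=weights.get)

def bootstrap_confidence(hits, rank, n=100):
    """Fraction of bootstrap resamples that reproduce the original call."""
    call = assign(hits, rank)
    agree = sum(
        assign(random.choices(hits, k=len(hits)), rank) == call
        for _ in range(n)
    )
    return call, agree / n

hits = [(0.99, {"genus": "Bacteroides"}), (0.97, {"genus": "Prevotella"}),
        (0.98, {"genus": "Bacteroides"})]
print(bootstrap_confidence(hits, "genus"))
```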
Zhao, Lei; Guo, Yi; Wang, Wei; Yan, Li-juan
2011-08-01
To evaluate the effectiveness of acupuncture as a treatment for neurovascular headache and to analyze the current situation related to acupuncture treatment. The PubMed database (1966-2010), EMBASE database (1986-2010), Cochrane Library (Issue 1, 2010), Chinese Biomedical Literature Database (1979-2010), China HowNet Knowledge Database (1979-2010), VIP Journals Database (1989-2010), and Wanfang database (1998-2010) were searched. Randomized or quasi-randomized controlled studies were included, with priority given to high-quality randomized controlled trials. Statistical outcome indicators were analyzed using RevMan 5.0.20 software. A total of 16 articles and 1,535 cases were included. Meta-analysis showed a significant difference between acupuncture therapy and Western medicine therapy [combined RR (random effects model)=1.46, 95% CI (1.21, 1.75), Z=3.96, P<0.0001], indicating an obviously superior effect of acupuncture therapy; a significant difference also existed between comprehensive acupuncture therapy and acupuncture therapy alone [combined RR (fixed effects model)=3.35, 95% CI (1.92, 5.82), Z=4.28, P<0.0001], indicating that acupuncture combined with other therapies, such as point injection, scalp acupuncture, auricular acupuncture, etc., was superior to conventional body acupuncture therapy alone. The limited clinical studies included verified the efficacy of acupuncture in the treatment of neurovascular headache. Although acupuncture and its combined therapies provide certain advantages, most clinical studies are of small sample size; large-sample, randomized, controlled trials are needed in the future for more definitive results.
Relational Database for the Geology of the Northern Rocky Mountains - Idaho, Montana, and Washington
Causey, J. Douglas; Zientek, Michael L.; Bookstrom, Arthur A.; Frost, Thomas P.; Evans, Karl V.; Wilson, Anna B.; Van Gosen, Bradley S.; Boleneus, David E.; Pitts, Rebecca A.
2008-01-01
A relational database was created to prepare and organize geologic map-unit and lithologic descriptions for input into a spatial database for the geology of the northern Rocky Mountains, a compilation of forty-three geologic maps for parts of Idaho, Montana, and Washington in U.S. Geological Survey Open File Report 2005-1235. Not all of the information was transferred to and incorporated in the spatial database due to physical file limitations. This report releases that part of the relational database that was completed for that earlier product. In addition to descriptive geologic information for the northern Rocky Mountains region, the relational database contains a substantial bibliography of geologic literature for the area. The relational database nrgeo.mdb (linked below) is available in Microsoft Access version 2000, a proprietary database program. The relational database contains data tables and other tables used to define terms, relationships between the data tables, and hierarchical relationships in the data; forms used to enter data; and queries used to extract data.
Enhancing user privacy in SARG04-based private database query protocols
NASA Astrophysics Data System (ADS)
Yu, Fang; Qiu, Daowen; Situ, Haozhen; Wang, Xiaoming; Long, Shun
2015-11-01
The well-known SARG04 protocol can be used in a private query application to generate an oblivious key. By usage of the key, the user can retrieve one out of N items from a database without revealing which one he/she is interested in. However, the existing SARG04-based private query protocols are vulnerable to attacks of faked data from the database, since in its canonical form the SARG04 protocol lacks means for one party to defend against attacks from the other. While such attacks can cause significant loss of user privacy, a variant of the SARG04 protocol is proposed in this paper with new mechanisms designed to help the user protect its privacy in private query applications. In the protocol, it is the user who starts the session with the database, trying to learn from it bits of a raw key in an oblivious way. An honesty test is used to detect a cheating database that has transmitted faked data. The whole private query protocol has O(N) communication complexity for conveying at least N encrypted items. Compared with the existing SARG04-based protocols, it is efficient in communication for per-bit learning.
A web-based system architecture for ontology-based data integration in the domain of IT benchmarking
NASA Astrophysics Data System (ADS)
Pfaff, Matthias; Krcmar, Helmut
2018-03-01
In the domain of IT benchmarking (ITBM), a variety of data and information are collected. Although these data serve as the basis for business analyses, no unified semantic representation of such data yet exists. Consequently, data analysis across different distributed data sets and different benchmarks is almost impossible. This paper presents a system architecture and prototypical implementation for an integrated data management of distributed databases based on a domain-specific ontology. To preserve the semantic meaning of the data, the ITBM ontology is linked to data sources and functions as the central concept for database access. Thus, additional databases can be integrated by linking them to this domain-specific ontology and are directly available for further business analyses. Moreover, the web-based system supports the process of mapping ontology concepts to external databases by introducing a semi-automatic mapping recommender and by visualizing possible mapping candidates. The system also provides a natural language interface to easily query linked databases. The expected result of this ontology-based approach of knowledge representation and data access is an increase in knowledge and data sharing in this domain, which will enhance existing business analysis methods.
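A stripped-down version of ontology-mediated access might look like the sketch below, where a concept IRI is mapped to the concrete tables and columns holding it and queries are phrased against the concept rather than any one schema; all identifiers are illustrative.

```python
# Hypothetical ontology-to-schema mapping (illustrative names only).
CONCEPT_MAP = {
    "itbm:ServerOperatingCost": [
        ("benchmark_db_a", "costs", "server_opex_eur"),
        ("benchmark_db_b", "it_spend", "srv_operating_cost"),
    ],
}

def fetch(concept, executors):
    """executors: {db_name: callable(sql) -> rows}. Runs the concept's query
    against every database the ontology links it to."""
    rows = []
    for db, table, column in CONCEPT_MAP.get(concept, []):
        sql = f"SELECT {column} FROM {table}"
        rows.extend(executors[db](sql))
    return rows

if __name__ == "__main__":
    fake = {"benchmark_db_a": lambda sql: [("row", sql)],
            "benchmark_db_b": lambda sql: []}
    print(fetch("itbm:ServerOperatingCost", fake))
```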
U.S. Geological Survey coal quality (COALQUAL) database; version 2.0
Bragg, L.J.; Oman, J.K.; Tewalt, S.J.; Oman, C.L.; Rega, N.H.; Washington, P.M.; Finkelman, R.B.
1997-01-01
The USGS Coal Quality database is an interactive, computerized component of the NCRDS. It contains comprehensive analyses of more than 13,000 samples of coal and associated rocks from every major coal-bearing basin and coal bed in the U.S. The data in the coal quality database represent analyses of the coal as it exists in the ground. The data commonly are presented on an as-received whole-coal basis.
Precipitation from the GPM Microwave Imager and Constellation Radiometers
NASA Astrophysics Data System (ADS)
Kummerow, Christian; Randel, David; Kirstetter, Pierre-Emmanuel; Kulie, Mark; Wang, Nai-Yu
2014-05-01
Satellite precipitation retrievals from microwave sensors are fundamentally underconstrained, requiring either implicit or explicit a-priori information to constrain solutions. The radiometer algorithm designed for the GPM core and constellation satellites makes this a-priori information explicit in the form of a database of possible rain structures from the GPM core satellite and a Bayesian retrieval scheme. The a-priori database will eventually come from the GPM core satellite's combined radar/radiometer retrieval algorithm. That product is physically constrained to ensure radiometric consistency between the radars and radiometers and is thus ideally suited to create the a-priori databases for all radiometers in the GPM constellation. Until a robust product exists, however, the a-priori databases are being generated from the combination of existing sources over land and oceans. Over oceans, the Day-1 GPM radiometer algorithm uses the TRMM PR/TMI physically derived hydrometeor profiles that are available from the tropics through sea surface temperatures of approximately 285K. For colder sea surface temperatures, the existing profiles are used with lower hydrometeor layers removed to correspond to colder conditions. While not ideal, the results appear to be reasonable placeholders until the full GPM database can be constructed. It is more difficult to construct physically consistent profiles over land due to ambiguities in surface emissivities as well as details of the ice scattering that dominates brightness temperature signatures over land. Over land, the a-priori databases have therefore been constructed by matching satellite overpasses to surface radar data derived from the WSR-88D network over the continental United States through the National Mosaic and Multi-Sensor QPE (NMQ) initiative. Databases are generated as a function of land type (4 categories of increasing vegetation cover as well as 4 categories of increasing snow depth), land surface temperature and total precipitable water. One year of coincident observations, generating between 20 and 80 million database entries depending upon the sensor, is used in the retrieval algorithm. The remaining areas such as sea ice and high latitude coastal zones are filled with a combination of CloudSat and AMSR-E plus MHS observations together with a model to create the equivalent databases for other radiometers in the constellation. The most noteworthy result from the Day-1 algorithm is the quality of the land products when compared to existing products. Unlike previous versions of land algorithms that depended upon complex screening routines to decide if pixels were precipitating or not, the current scheme is free of conditional rain statements and appears to produce rain rates with much greater fidelity than previous schemes. These results will be shown.
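The Bayesian retrieval at the core of such schemes can be sketched as a database-weighted average, where each a-priori profile is weighted by how closely its simulated brightness temperatures match the observation; the Gaussian weighting and the numbers below are illustrative simplifications.

```python
import numpy as np

def bayesian_rain_rate(tb_obs, tb_db, rain_db, sigma=2.0):
    """tb_obs: (n_channels,) observed brightness temperatures;
    tb_db: (n_entries, n_channels) database Tb; rain_db: (n_entries,)
    rain rates; sigma: assumed observation/model error in kelvin."""
    d2 = np.sum((tb_db - tb_obs) ** 2, axis=1)     # squared Tb distance
    w = np.exp(-0.5 * d2 / sigma**2)               # Gaussian weights
    return np.sum(w * rain_db) / np.sum(w)         # posterior-mean rain rate

tb_obs = np.array([250.0, 230.0])
tb_db = np.array([[251.0, 231.0], [240.0, 220.0], [252.0, 229.0]])
rain_db = np.array([1.2, 8.5, 0.9])
print(bayesian_rain_rate(tb_obs, tb_db, rain_db))  # dominated by close entries
```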
Automating Relational Database Design for Microcomputer Users.
ERIC Educational Resources Information Center
Pu, Hao-Che
1991-01-01
Discusses issues involved in automating the relational database design process for microcomputer users and presents a prototype of a microcomputer-based system (RA, Relation Assistant) that is based on expert systems technology and helps avoid database maintenance problems. Relational database design is explained and the importance of easy input…
Woo, Patrick C Y; Chung, Liliane M W; Teng, Jade L L; Tse, Herman; Pang, Sherby S Y; Lau, Veronica Y T; Wong, Vanessa W K; Kam, Kwok‐ling; Lau, Susanna K P; Yuen, Kwok‐Yung
2007-01-01
This study is the first to provide useful guidelines to clinical microbiologists and technicians on the usefulness of full 16S rRNA sequencing, 5′‐end 527‐bp 16S rRNA sequencing and the existing MicroSeq full and 500 16S rDNA bacterial identification system (MicroSeq, Perkin‐Elmer Applied Biosystems Division, Foster City, California, USA) databases for the identification of all existing medically important anaerobic bacteria. Full and 527‐bp 16S rRNA sequencing are able to identify 52–63% of 130 Gram‐positive anaerobic rods, 72–73% of 86 Gram‐negative anaerobic rods and 78% of 23 anaerobic cocci. The existing MicroSeq databases are able to identify only 19–25% of 130 Gram‐positive anaerobic rods, 38% of 86 Gram‐negative anaerobic rods and 39% of 23 anaerobic cocci. These represent only 45–46% of those that should be confidently identified by full and 527‐bp 16S rRNA sequencing. To improve the usefulness of MicroSeq, bacterial species that should be confidently identified by full and/or 527‐bp 16S rRNA sequencing but are not included in the existing MicroSeq databases should be added. PMID:17046845
An ontology for major histocompatibility restriction.
Vita, Randi; Overton, James A; Seymour, Emily; Sidney, John; Kaufman, Jim; Tallmadge, Rebecca L; Ellis, Shirley; Hammond, John; Butcher, Geoff W; Sette, Alessandro; Peters, Bjoern
2016-01-01
MHC molecules are a highly diverse family of proteins that play a key role in cellular immune recognition. Over time, different techniques and terminologies have been developed to identify the specific type(s) of MHC molecule involved in a specific immune recognition context. No consistent nomenclature exists across different vertebrate species. To correctly represent MHC-related data in The Immune Epitope Database (IEDB), we built upon a previously established MHC ontology and created an ontology to represent MHC molecules as they relate to immunological experiments. This ontology models MHC protein chains from 16 species, deals with different approaches used to identify MHC, such as direct sequencing versus serotyping, relates engineered MHC molecules to naturally occurring ones, connects genetic loci, alleles, protein chains and multi-chain proteins, and establishes evidence codes for MHC restriction. Where available, this work is based on existing ontologies from the OBO Foundry. Overall, representing MHC molecules provides a challenging and practically important test case for ontology building, and could serve as an example of how to integrate other ontology building efforts into web resources.
Mahakkanukrauh, Ajanee; Thavornpitak, Yupa; Foocharoen, Chingching; Suwannaroj, Siraphop; Nanagara, Ratanavadee
2013-08-01
Pyogenic arthritis (PA) is still a problematic arthritic disease that requires hospitalization. To study the epidemiological characteristics and predictors of treatment outcomes for Thai patients hospitalized with PA. The nationwide hospital database from the 2010 fiscal year was analyzed. Patients 18 years of age onward who had a primary diagnosis of pyogenic arthritis were included in this study. There were a total of 6242 PA admissions during 2010, ranking it third among hospitalized musculoskeletal patients after osteoarthritis (OA) and gouty arthritis. The estimated prevalence of PA was 13.5 per 100 000 adult population. The geographic distribution of PA was related to the population density of each region; however, it seemed more frequent in the northern and northeastern regions of Thailand. The prevalence increased with age, 3.6 and 43.6 per 100 000 in young adults and the elderly, respectively. Among the 2877 co-morbidities coded, diabetes was the most common, followed by crystal-induced arthritis, other existing foci of infection (urinary tract infection, skin and soft tissue infections and pneumonia) and pre-existing chronic joint diseases (OA, rheumatoid arthritis), respectively. The overall hospital mortality rate was 2.6%. Poorer outcomes were found among patients with chronic liver disease and other existing foci of infection. The prevalence of hospitalized PA is still modest in Thailand, with the highest prevalence in the advanced age group. Diabetes was the most common co-morbidity found; however, poorer outcomes were noted among patients with chronic liver disease and existing multiple sites of infection. © 2013 Asia Pacific League of Associations for Rheumatology and Wiley Publishing Asia Pty Ltd.
Thematic relatedness production norms for 100 object concepts.
Jouravlev, Olessia; McRae, Ken
2016-12-01
Knowledge of thematic relations is an area of increased interest in semantic memory research because it is crucial to many cognitive processes. One methodological issue that researchers face is how to identify pairs of thematically related concepts that are well-established in semantic memory for most people. In this article, we review existing methods of assessing thematic relatedness and provide thematic relatedness production norming data for 100 object concepts. In addition, 1,174 related concept pairs obtained from the production norms were classified as reflecting one of the five subtypes of relations: attributive, argument, coordinate, locative, and temporal. The database and methodology will be useful for researchers interested in the effects of thematic knowledge on language processing, analogical reasoning, similarity judgments, and memory. These data will also benefit researchers interested in investigating potential processing differences among the five types of semantic relations.
Hudson, Lawrence N; Newbold, Tim; Contu, Sara; Hill, Samantha L L; Lysenko, Igor; De Palma, Adriana; Phillips, Helen R P; Senior, Rebecca A; Bennett, Dominic J; Booth, Hollie; Choimes, Argyrios; Correia, David L P; Day, Julie; Echeverría-Londoño, Susy; Garon, Morgan; Harrison, Michelle L K; Ingram, Daniel J; Jung, Martin; Kemp, Victoria; Kirkpatrick, Lucinda; Martin, Callum D; Pan, Yuan; White, Hannah J; Aben, Job; Abrahamczyk, Stefan; Adum, Gilbert B; Aguilar-Barquero, Virginia; Aizen, Marcelo A; Ancrenaz, Marc; Arbeláez-Cortés, Enrique; Armbrecht, Inge; Azhar, Badrul; Azpiroz, Adrián B; Baeten, Lander; Báldi, András; Banks, John E; Barlow, Jos; Batáry, Péter; Bates, Adam J; Bayne, Erin M; Beja, Pedro; Berg, Åke; Berry, Nicholas J; Bicknell, Jake E; Bihn, Jochen H; Böhning-Gaese, Katrin; Boekhout, Teun; Boutin, Céline; Bouyer, Jérémy; Brearley, Francis Q; Brito, Isabel; Brunet, Jörg; Buczkowski, Grzegorz; Buscardo, Erika; Cabra-García, Jimmy; Calviño-Cancela, María; Cameron, Sydney A; Cancello, Eliana M; Carrijo, Tiago F; Carvalho, Anelena L; Castro, Helena; Castro-Luna, Alejandro A; Cerda, Rolando; Cerezo, Alexis; Chauvat, Matthieu; Clarke, Frank M; Cleary, Daniel F R; Connop, Stuart P; D'Aniello, Biagio; da Silva, Pedro Giovâni; Darvill, Ben; Dauber, Jens; Dejean, Alain; Diekötter, Tim; Dominguez-Haydar, Yamileth; Dormann, Carsten F; Dumont, Bertrand; Dures, Simon G; Dynesius, Mats; Edenius, Lars; Elek, Zoltán; Entling, Martin H; Farwig, Nina; Fayle, Tom M; Felicioli, Antonio; Felton, Annika M; Ficetola, Gentile F; Filgueiras, Bruno K C; Fonte, Steven J; Fraser, Lauchlan H; Fukuda, Daisuke; Furlani, Dario; Ganzhorn, Jörg U; Garden, Jenni G; Gheler-Costa, Carla; Giordani, Paolo; Giordano, Simonetta; Gottschalk, Marco S; Goulson, Dave; Gove, Aaron D; Grogan, James; Hanley, Mick E; Hanson, Thor; Hashim, Nor R; Hawes, Joseph E; Hébert, Christian; Helden, Alvin J; Henden, John-André; Hernández, Lionel; Herzog, Felix; Higuera-Diaz, Diego; Hilje, Branko; Horgan, Finbarr G; Horváth, Roland; Hylander, Kristoffer; Isaacs-Cubides, Paola; Ishitani, Masahiro; Jacobs, Carmen T; Jaramillo, Víctor J; Jauker, Birgit; Jonsell, Mats; Jung, Thomas S; Kapoor, Vena; Kati, Vassiliki; Katovai, Eric; Kessler, Michael; Knop, Eva; Kolb, Annette; Kőrösi, Ádám; Lachat, Thibault; Lantschner, Victoria; Le Féon, Violette; LeBuhn, Gretchen; Légaré, Jean-Philippe; Letcher, Susan G; Littlewood, Nick A; López-Quintero, Carlos A; Louhaichi, Mounir; Lövei, Gabor L; Lucas-Borja, Manuel Esteban; Luja, Victor H; Maeto, Kaoru; Magura, Tibor; Mallari, Neil Aldrin; Marin-Spiotta, Erika; Marshall, E J P; Martínez, Eliana; Mayfield, Margaret M; Mikusinski, Grzegorz; Milder, Jeffrey C; Miller, James R; Morales, Carolina L; Muchane, Mary N; Muchane, Muchai; Naidoo, Robin; Nakamura, Akihiro; Naoe, Shoji; Nates-Parra, Guiomar; Navarrete Gutierrez, Dario A; Neuschulz, Eike L; Noreika, Norbertas; Norfolk, Olivia; Noriega, Jorge Ari; Nöske, Nicole M; O'Dea, Niall; Oduro, William; Ofori-Boateng, Caleb; Oke, Chris O; Osgathorpe, Lynne M; Paritsis, Juan; Parra-H, Alejandro; Pelegrin, Nicolás; Peres, Carlos A; Persson, Anna S; Petanidou, Theodora; Phalan, Ben; Philips, T Keith; Poveda, Katja; Power, Eileen F; Presley, Steven J; Proença, Vânia; Quaranta, Marino; Quintero, Carolina; Redpath-Downing, Nicola A; Reid, J Leighton; Reis, Yana T; Ribeiro, Danilo B; Richardson, Barbara A; Richardson, Michael J; Robles, Carolina A; Römbke, Jörg; Romero-Duque, Luz Piedad; Rosselli, Loreta; Rossiter, Stephen J; Roulston, T'ai H; Rousseau, Laurent; 
Sadler, Jonathan P; Sáfián, Szabolcs; Saldaña-Vázquez, Romeo A; Samnegård, Ulrika; Schüepp, Christof; Schweiger, Oliver; Sedlock, Jodi L; Shahabuddin, Ghazala; Sheil, Douglas; Silva, Fernando A B; Slade, Eleanor M; Smith-Pardo, Allan H; Sodhi, Navjot S; Somarriba, Eduardo J; Sosa, Ramón A; Stout, Jane C; Struebig, Matthew J; Sung, Yik-Hei; Threlfall, Caragh G; Tonietto, Rebecca; Tóthmérész, Béla; Tscharntke, Teja; Turner, Edgar C; Tylianakis, Jason M; Vanbergen, Adam J; Vassilev, Kiril; Verboven, Hans A F; Vergara, Carlos H; Vergara, Pablo M; Verhulst, Jort; Walker, Tony R; Wang, Yanping; Watling, James I; Wells, Konstans; Williams, Christopher D; Willig, Michael R; Woinarski, John C Z; Wolf, Jan H D; Woodcock, Ben A; Yu, Douglas W; Zaitsev, Andrey S; Collen, Ben; Ewers, Rob M; Mace, Georgina M; Purves, Drew W; Scharlemann, Jörn P W; Purvis, Andy
2014-01-01
Biodiversity continues to decline in the face of increasing anthropogenic pressures such as habitat destruction, exploitation, pollution and introduction of alien species. Existing global databases of species’ threat status or population time series are dominated by charismatic species. The collation of datasets with broad taxonomic and biogeographic extents, and that support computation of a range of biodiversity indicators, is necessary to enable better understanding of historical declines and to project – and avert – future declines. We describe and assess a new database of more than 1.6 million samples from 78 countries representing over 28,000 species, collated from existing spatial comparisons of local-scale biodiversity exposed to different intensities and types of anthropogenic pressures, from terrestrial sites around the world. The database contains measurements taken in 208 (of 814) ecoregions, 13 (of 14) biomes, 25 (of 35) biodiversity hotspots and 16 (of 17) megadiverse countries. The database contains more than 1% of the total number of all species described, and more than 1% of the described species within many taxonomic groups – including flowering plants, gymnosperms, birds, mammals, reptiles, amphibians, beetles, lepidopterans and hymenopterans. The dataset, which is still being added to, is therefore already considerably larger and more representative than those used by previous quantitative models of biodiversity trends and responses. The database is being assembled as part of the PREDICTS project (Projecting Responses of Ecological Diversity In Changing Terrestrial Systems – http://www.predicts.org.uk). We make site-level summary data available alongside this article. The full database will be publicly available in 2015. PMID:25558364
Hudson, Lawrence N; Newbold, Tim; Contu, Sara; Hill, Samantha L L; Lysenko, Igor; De Palma, Adriana; Phillips, Helen R P; Senior, Rebecca A; Bennett, Dominic J; Booth, Hollie; Choimes, Argyrios; Correia, David L P; Day, Julie; Echeverría-Londoño, Susy; Garon, Morgan; Harrison, Michelle L K; Ingram, Daniel J; Jung, Martin; Kemp, Victoria; Kirkpatrick, Lucinda; Martin, Callum D; Pan, Yuan; White, Hannah J; Aben, Job; Abrahamczyk, Stefan; Adum, Gilbert B; Aguilar-Barquero, Virginia; Aizen, Marcelo A; Ancrenaz, Marc; Arbeláez-Cortés, Enrique; Armbrecht, Inge; Azhar, Badrul; Azpiroz, Adrián B; Baeten, Lander; Báldi, András; Banks, John E; Barlow, Jos; Batáry, Péter; Bates, Adam J; Bayne, Erin M; Beja, Pedro; Berg, Åke; Berry, Nicholas J; Bicknell, Jake E; Bihn, Jochen H; Böhning-Gaese, Katrin; Boekhout, Teun; Boutin, Céline; Bouyer, Jérémy; Brearley, Francis Q; Brito, Isabel; Brunet, Jörg; Buczkowski, Grzegorz; Buscardo, Erika; Cabra-García, Jimmy; Calviño-Cancela, María; Cameron, Sydney A; Cancello, Eliana M; Carrijo, Tiago F; Carvalho, Anelena L; Castro, Helena; Castro-Luna, Alejandro A; Cerda, Rolando; Cerezo, Alexis; Chauvat, Matthieu; Clarke, Frank M; Cleary, Daniel F R; Connop, Stuart P; D'Aniello, Biagio; da Silva, Pedro Giovâni; Darvill, Ben; Dauber, Jens; Dejean, Alain; Diekötter, Tim; Dominguez-Haydar, Yamileth; Dormann, Carsten F; Dumont, Bertrand; Dures, Simon G; Dynesius, Mats; Edenius, Lars; Elek, Zoltán; Entling, Martin H; Farwig, Nina; Fayle, Tom M; Felicioli, Antonio; Felton, Annika M; Ficetola, Gentile F; Filgueiras, Bruno K C; Fonte, Steven J; Fraser, Lauchlan H; Fukuda, Daisuke; Furlani, Dario; Ganzhorn, Jörg U; Garden, Jenni G; Gheler-Costa, Carla; Giordani, Paolo; Giordano, Simonetta; Gottschalk, Marco S; Goulson, Dave; Gove, Aaron D; Grogan, James; Hanley, Mick E; Hanson, Thor; Hashim, Nor R; Hawes, Joseph E; Hébert, Christian; Helden, Alvin J; Henden, John-André; Hernández, Lionel; Herzog, Felix; Higuera-Diaz, Diego; Hilje, Branko; Horgan, Finbarr G; Horváth, Roland; Hylander, Kristoffer; Isaacs-Cubides, Paola; Ishitani, Masahiro; Jacobs, Carmen T; Jaramillo, Víctor J; Jauker, Birgit; Jonsell, Mats; Jung, Thomas S; Kapoor, Vena; Kati, Vassiliki; Katovai, Eric; Kessler, Michael; Knop, Eva; Kolb, Annette; Kőrösi, Ádám; Lachat, Thibault; Lantschner, Victoria; Le Féon, Violette; LeBuhn, Gretchen; Légaré, Jean-Philippe; Letcher, Susan G; Littlewood, Nick A; López-Quintero, Carlos A; Louhaichi, Mounir; Lövei, Gabor L; Lucas-Borja, Manuel Esteban; Luja, Victor H; Maeto, Kaoru; Magura, Tibor; Mallari, Neil Aldrin; Marin-Spiotta, Erika; Marshall, E J P; Martínez, Eliana; Mayfield, Margaret M; Mikusinski, Grzegorz; Milder, Jeffrey C; Miller, James R; Morales, Carolina L; Muchane, Mary N; Muchane, Muchai; Naidoo, Robin; Nakamura, Akihiro; Naoe, Shoji; Nates-Parra, Guiomar; Navarrete Gutierrez, Dario A; Neuschulz, Eike L; Noreika, Norbertas; Norfolk, Olivia; Noriega, Jorge Ari; Nöske, Nicole M; O'Dea, Niall; Oduro, William; Ofori-Boateng, Caleb; Oke, Chris O; Osgathorpe, Lynne M; Paritsis, Juan; Parra-H, Alejandro; Pelegrin, Nicolás; Peres, Carlos A; Persson, Anna S; Petanidou, Theodora; Phalan, Ben; Philips, T Keith; Poveda, Katja; Power, Eileen F; Presley, Steven J; Proença, Vânia; Quaranta, Marino; Quintero, Carolina; Redpath-Downing, Nicola A; Reid, J Leighton; Reis, Yana T; Ribeiro, Danilo B; Richardson, Barbara A; Richardson, Michael J; Robles, Carolina A; Römbke, Jörg; Romero-Duque, Luz Piedad; Rosselli, Loreta; Rossiter, Stephen J; Roulston, T'ai H; Rousseau, Laurent; 
Sadler, Jonathan P; Sáfián, Szabolcs; Saldaña-Vázquez, Romeo A; Samnegård, Ulrika; Schüepp, Christof; Schweiger, Oliver; Sedlock, Jodi L; Shahabuddin, Ghazala; Sheil, Douglas; Silva, Fernando A B; Slade, Eleanor M; Smith-Pardo, Allan H; Sodhi, Navjot S; Somarriba, Eduardo J; Sosa, Ramón A; Stout, Jane C; Struebig, Matthew J; Sung, Yik-Hei; Threlfall, Caragh G; Tonietto, Rebecca; Tóthmérész, Béla; Tscharntke, Teja; Turner, Edgar C; Tylianakis, Jason M; Vanbergen, Adam J; Vassilev, Kiril; Verboven, Hans A F; Vergara, Carlos H; Vergara, Pablo M; Verhulst, Jort; Walker, Tony R; Wang, Yanping; Watling, James I; Wells, Konstans; Williams, Christopher D; Willig, Michael R; Woinarski, John C Z; Wolf, Jan H D; Woodcock, Ben A; Yu, Douglas W; Zaitsev, Andrey S; Collen, Ben; Ewers, Rob M; Mace, Georgina M; Purves, Drew W; Scharlemann, Jörn P W; Purvis, Andy
2014-12-01
Compatibility between livestock databases used for quantitative biosecurity response in New Zealand.
Jewell, C P; van Andel, M; Vink, W D; McFadden, A M J
2016-05-01
To characterise New Zealand's livestock biosecurity databases, and investigate their compatibility and capacity to provide a single integrated data source for quantitative outbreak analysis. Contemporary snapshots of the data in three national livestock biosecurity databases, AgriBase, FarmsOnLine (FOL) and the National Animal Identification and Tracing Scheme (NAIT), were obtained on 16 September, 1 September and 30 April 2014, respectively, and loaded into a relational database. A frequency table of animal numbers per farm was calculated for the AgriBase and FOL datasets. A two-dimensional kernel density estimate was calculated for farms reporting the presence of cattle, pigs, deer, and small ruminants in each database, and the ratio of farm densities for AgriBase versus FOL calculated. The extent to which records in the three databases could be matched and linked was quantified, and the level of agreement amongst them for the presence of different species on properties was assessed using Cohen's kappa statistic. AgriBase contained fewer records than FOL but recorded the animal numbers present on each farm, whereas FOL contained more records but captured only the presence/absence of animals. The ratio of farm densities in AgriBase relative to FOL for pigs and deer was reasonably homogeneous across New Zealand, with AgriBase having a farm density approximately 80% of FOL's. For cattle and small ruminants, there was considerable heterogeneity, with AgriBase showing a density of cattle farms in the Central Otago region that was 20% of FOL's, and a density of small ruminant farms in the central West Coast area that was twice that of FOL. Only 37% of records in FOL could be linked to AgriBase, but the level of agreement for the presence of different species between these databases was substantial (kappa>0.6). Both NAIT and FOL shared common farm identifiers which could be used to georeference animal movements, and there was fair to substantial agreement (kappa 0.32-0.69) between these databases for the presence of cattle and deer on properties. The three databases broadly agreed with each other, but important differences existed in both species composition and spatial coverage, which raises concerns about their accuracy. Importantly, they cannot be reliably linked together to provide a single picture of New Zealand's livestock industry, limiting the ability to use advanced quantitative techniques to provide effective decision support during disease outbreaks. We recommend that a single integrated database be developed, with alignment of resources and legislation for its upkeep.
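The agreement statistic used above is standard and easy to reproduce. Below is a minimal Python sketch, on invented farm records rather than AgriBase/FOL data, of Cohen's kappa for two databases scoring species presence/absence on the same matched farms.

```python
# Minimal sketch: Cohen's kappa for presence/absence agreement between two
# databases, computed over farms matched in both. All records are invented.

def cohens_kappa(pairs):
    """pairs: list of (present_in_db_a, present_in_db_b) booleans."""
    n = len(pairs)
    observed = sum(a == b for a, b in pairs) / n        # observed agreement
    p_a = sum(a for a, _ in pairs) / n                  # P(present) in database A
    p_b = sum(b for _, b in pairs) / n                  # P(present) in database B
    expected = p_a * p_b + (1 - p_a) * (1 - p_b)        # agreement expected by chance
    return (observed - expected) / (1 - expected)

# Toy example: cattle presence on five farms linked across both databases.
pairs = [(True, True), (True, False), (False, False), (True, True), (False, False)]
print(round(cohens_kappa(pairs), 2))   # 0.62, i.e. "substantial" agreement
```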
Kochanov, R. V.; Gordon, I. E.; Rothman, L. S.; ...
2015-08-25
In the recent article by Byrne and Goldblatt, "Radiative forcing for 28 potential Archean greenhouse gases", Clim. Past 10, 1779–1801 (2014), the authors employ the HITRAN2012 spectroscopic database to evaluate the radiative forcing of 28 Archean gases. As part of the evaluation of the status of the spectroscopy of these gases in the selected spectral region (50–1800 cm⁻¹), the cross sections generated from the HITRAN line-by-line parameters were compared with those of the PNNL database of experimental cross sections recorded at moderate resolution. The authors claimed that for NO2, HNO3, H2CO, H2O2, HCOOH, C2H4, CH3OH and CH3Br there exist large or sometimes severe disagreements between the databases. In this work we show that for only three of these eight gases does a modest discrepancy exist between the two databases, and we explain the origin of the differences. For the other five gases, the disagreements are not nearly at the scale suggested by the authors, although we explain some of the differences that do exist. In summary, the agreement between the HITRAN and PNNL databases is very good, although not perfect; typically, differences do not exceed 10%, provided that HITRAN data exist for the bands/wavelengths of interest. It appears that a molecule-dependent combination of errors affected the conclusions of the authors. In at least one case it appears that they did not take the correct file from PNNL (N2O4 (dimer) + NO2 was used in place of the monomer). Finally, cross sections of HO2 from HITRAN (which have no PNNL counterpart) were not calculated correctly by Byrne and Goldblatt, while in the case of HF a misleading discussion was presented there, based on confusion caused by extraneous or noise features in the experimental PNNL spectra.
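In its simplest form, the cross-database check described here amounts to putting two cross-section spectra on a common wavenumber grid and summarizing their relative differences. A minimal sketch of that step, with synthetic spectra standing in for the HITRAN-derived and PNNL data:

```python
# Minimal sketch: compare two absorption cross-section spectra on a common
# wavenumber grid and report the typical relative difference. Arrays are toys.

import numpy as np

def typical_relative_difference(nu_a, xs_a, nu_b, xs_b, n=2000):
    grid = np.linspace(max(nu_a.min(), nu_b.min()),
                       min(nu_a.max(), nu_b.max()), n)
    a = np.interp(grid, nu_a, xs_a)   # e.g., cross sections from line-by-line parameters
    b = np.interp(grid, nu_b, xs_b)   # e.g., experimental cross sections
    mask = b > 0                      # avoid division where the spectrum is empty
    return np.median(np.abs(a[mask] - b[mask]) / b[mask])

# Toy spectra that agree to within a few percent.
nu = np.linspace(50.0, 1800.0, 5000)
xs = np.exp(-((nu - 900.0) / 200.0) ** 2)
print(typical_relative_difference(nu, xs, nu, xs * 1.03))  # ~0.03, i.e. ~3%
```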
Scale effects of STATSGO and SSURGO databases on flow and water quality predictions
USDA-ARS?s Scientific Manuscript database
Soil information is one of the crucial inputs needed to assess the impacts of existing and alternative agricultural management practices on water quality. Therefore, it is important to understand the effects of spatial scale at which soil databases are developed on water quality evaluations. In the ...
Expanding Academic Vocabulary with an Interactive On-Line Database
ERIC Educational Resources Information Center
Horst, Marlise; Cobb, Tom; Nicolae, Ioana
2005-01-01
University students used a set of existing and purpose-built on-line tools for vocabulary learning in an experimental ESL course. The resources included concordance, dictionary, cloze-builder, hypertext, and a database with interactive self-quizzing feature (all freely available at www.lextutor.ca). The vocabulary targeted for learning consisted…
Data, Data Everywhere but Not a Byte to Read: Managing Monitoring Information.
ERIC Educational Resources Information Center
Stafford, Susan G.
1993-01-01
Describes the Forest Science Data Bank that contains 2,400 data sets from over 350 existing ecological studies. Database features described include involvement of the scientific community; database documentation; data quality assurance; security; data access and retrieval; and data import/export flexibility. Appendices present the Quantitative…
Teaching Database Design with Constraint-Based Tutors
ERIC Educational Resources Information Center
Mitrovic, Antonija; Suraweera, Pramuditha
2016-01-01
Design tasks are difficult to teach, due to large, unstructured solution spaces, underspecified problems, non-existent problem solving algorithms and stopping criteria. In this paper, we comment on our approach to develop KERMIT, a constraint-based tutor that taught database design. In later work, we re-implemented KERMIT as EER-Tutor, and…
Getting Skills Right: Skills for Jobs Indicators
ERIC Educational Resources Information Center
OECD Publishing, 2017
2017-01-01
This report describes the construction of the database of skill needs indicators, i.e. the OECD Skills for Jobs Database, and presents initial results and analysis. It identifies the existing knowledge gaps concerning skills imbalances, providing the rationale for the development of the new skill needs and mismatch indicators. Moreover, it…
50 CFR 600.1415 - Procedures for designating exempted states-general provisions.
Code of Federal Regulations, 2014 CFR
2014-10-01
... MANAGEMENT, NATIONAL OCEANIC AND ATMOSPHERIC ADMINISTRATION, DEPARTMENT OF COMMERCE MAGNUSON-STEVENS ACT... database; or (2) Participate in regional surveys of recreational catch and effort and make the data from.... 1417; (iii) A description of the database in which the data exists and will be transmitted; and (iv...
50 CFR 600.1415 - Procedures for designating exempted states-general provisions.
Code of Federal Regulations, 2011 CFR
2011-10-01
... MANAGEMENT, NATIONAL OCEANIC AND ATMOSPHERIC ADMINISTRATION, DEPARTMENT OF COMMERCE MAGNUSON-STEVENS ACT... database; or (2) Participate in regional surveys of recreational catch and effort and make the data from.... 1417; (iii) A description of the database in which the data exists and will be transmitted; and (iv...
50 CFR 600.1415 - Procedures for designating exempted states-general provisions.
Code of Federal Regulations, 2012 CFR
2012-10-01
... MANAGEMENT, NATIONAL OCEANIC AND ATMOSPHERIC ADMINISTRATION, DEPARTMENT OF COMMERCE MAGNUSON-STEVENS ACT... database; or (2) Participate in regional surveys of recreational catch and effort and make the data from.... 1417; (iii) A description of the database in which the data exists and will be transmitted; and (iv...
50 CFR 600.1415 - Procedures for designating exempted states-general provisions.
Code of Federal Regulations, 2013 CFR
2013-10-01
... MANAGEMENT, NATIONAL OCEANIC AND ATMOSPHERIC ADMINISTRATION, DEPARTMENT OF COMMERCE MAGNUSON-STEVENS ACT... database; or (2) Participate in regional surveys of recreational catch and effort and make the data from.... 1417; (iii) A description of the database in which the data exists and will be transmitted; and (iv...
50 CFR 600.1415 - Procedures for designating exempted states-general provisions.
Code of Federal Regulations, 2010 CFR
2010-10-01
..., NATIONAL OCEANIC AND ATMOSPHERIC ADMINISTRATION, DEPARTMENT OF COMMERCE MAGNUSON-STEVENS ACT PROVISIONS...-hire vessel license holder data to NMFS for inclusion in a national or regional registry database; or.... 1417; (iii) A description of the database in which the data exists and will be transmitted; and (iv...
USDA-ARS?s Scientific Manuscript database
Our objective is to discuss the implications internationally of the increased focus on nutrigenomics as the underlying basis for individualized health promotion and chronic disease prevention and the challenges presented to existing nutrient database and nutrient analysis systems by these trends. De...
Newer Antibacterials in Therapy and Clinical Trials
Paknikar, Simi S; Narayana, Sarala
2012-01-01
In order to deal with the rising problem of antibiotic resistance, newer antibacterials are being discovered and added to the existing pool. Since the year 2000, however, only four new classes of antibacterials have been discovered: the oxazolidinones, glycolipopeptides, glycolipodepsipeptides and pleuromutilins. Newer drugs were added to existing classes of antibiotics, such as streptogramins, quinolones, beta-lactam antibiotics, and macrolide-, tetracycline- and trimethoprim-related drugs. Most of the antibacterials are directed against resistant S. aureus infections, with very few against resistant gram-negative infections. The following article reviews the antibacterials approved by the FDA after the year 2000 as well as some of those in clinical trials. Data were obtained through a literature search via PubMed and Google, as well as a detailed search of our library database. PMID:23181224
Content-level deduplication on mobile internet datasets
NASA Astrophysics Data System (ADS)
Hou, Ziyu; Chen, Xunxun; Wang, Yang
2017-06-01
Various systems and applications involve a large volume of duplicate items. Given the high data redundancy in real-world datasets, data deduplication can reduce storage requirements and improve the utilization of network bandwidth. However, the chunks used by existing deduplication systems range in size from 4 KB to over 16 KB, so those systems are not applicable to datasets consisting of short records. In this paper, we propose a new framework called SF-Dedup that implements deduplication over large sets of Mobile Internet records whose size can be smaller than 100 B, or even smaller than 10 B. SF-Dedup is a short-fingerprint, in-line, hash-collision-resolving deduplication scheme. Experimental results illustrate that SF-Dedup is able to reduce storage capacity and shorten query time on a relational database.
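As an illustration of the short-fingerprint idea (the SF-Dedup internals are not given in the abstract, so the details below are assumptions), here is a minimal Python sketch: a truncated hash serves as the fingerprint, and exact comparison within a bucket resolves hash collisions.

```python
# Minimal sketch of short-fingerprint deduplication for short records:
# a 4-byte truncated hash is the fingerprint; full-record comparison
# within each bucket resolves collisions exactly. Not the SF-Dedup code.

import hashlib

def dedup_short_records(records):
    seen = {}       # short fingerprint -> list of distinct full records
    unique = []
    for rec in records:
        fp = hashlib.sha1(rec.encode()).digest()[:4]   # "short fingerprint"
        bucket = seen.setdefault(fp, [])
        if rec not in bucket:                          # collision resolution
            bucket.append(rec)
            unique.append(rec)
    return unique

print(dedup_short_records(["GET /a", "GET /a", "GET /b"]))  # ['GET /a', 'GET /b']
```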
Multi-Sensor Scene Synthesis and Analysis
1981-09-01
[Table-of-contents fragment; recoverable section titles include: Quad Trees for Image Representation and Processing; Databases: Definitions and Basic Concepts; Use of Databases in Hierarchical Scene Analysis; Use of Relational Tables; Multisensor Image Database Systems (MIDAS); Relational Database System for Pictures; Relational Pictorial Databases.]
Ahmetovic, Dragan; Manduchi, Roberto; Coughlan, James M.; Mascetti, Sergio
2016-01-01
In this paper we propose a computer vision-based technique that mines existing spatial image databases for discovery of zebra crosswalks in urban settings. Knowing the location of crosswalks is critical for a blind person planning a trip that includes street crossing. By augmenting existing spatial databases (such as Google Maps or OpenStreetMap) with this information, a blind traveler may make more informed routing decisions, resulting in greater safety during independent travel. Our algorithm first searches for zebra crosswalks in satellite images; all candidates thus found are validated against spatially registered Google Street View images. This cascaded approach enables fast and reliable discovery and localization of zebra crosswalks in large image datasets. While fully automatic, our algorithm could also be complemented by a final crowdsourcing validation stage for increased accuracy. PMID:26824080
Enhanced DIII-D Data Management Through a Relational Database
NASA Astrophysics Data System (ADS)
Burruss, J. R.; Peng, Q.; Schachter, J.; Schissel, D. P.; Terpstra, T. B.
2000-10-01
A relational database is being used to serve data about DIII-D experiments. The database is optimized for queries across multiple shots, allowing for rapid data mining by SQL-literate researchers. The relational database relates different experiments and datasets, thus providing a big picture of DIII-D operations. Users are encouraged to add their own tables to the database. Summary physics quantities about DIII-D discharges are collected and stored in the database automatically. Meta-data about code runs, MDSplus usage, and visualization tool usage are collected, stored in the database, and later analyzed to improve computing. The database may be accessed through programming languages such as C, Java, and IDL, or through ODBC-compliant applications such as Excel and Access. A database-driven web page also provides a convenient means for viewing database quantities through the World Wide Web. Demonstrations will be given at the poster.
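The cross-shot query pattern described above looks like the following sketch. The table and column names are invented for illustration (the actual DIII-D schema is not given here), and sqlite3 stands in for the production RDBMS.

```python
# Minimal sketch of a cross-shot query against a hypothetical shot-summary
# table; schema and values are invented, sqlite3 stands in for the real RDBMS.

import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE shot_summary (shot INTEGER, ip_mA REAL, beta_n REAL)")
conn.executemany("INSERT INTO shot_summary VALUES (?, ?, ?)",
                 [(101234, 1.2, 1.8), (101235, 1.5, 2.4), (101236, 0.9, 1.1)])

# Data mining across multiple shots with one SQL statement.
rows = conn.execute(
    "SELECT shot, beta_n FROM shot_summary WHERE ip_mA > 1.0 ORDER BY beta_n DESC"
).fetchall()
print(rows)   # [(101235, 2.4), (101234, 1.8)]
```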
Memory and long-range correlations in chess games
NASA Astrophysics Data System (ADS)
Schaigorodsky, Ana L.; Perotti, Juan I.; Billoni, Orlando V.
2014-01-01
In this paper we report the existence of long-range memory in the opening moves of a chronologically ordered set of chess games, using an extensive chess database. We used two mapping rules to build discrete time series and analyzed them using two methods for detecting long-range correlations: rescaled range analysis and detrended fluctuation analysis. We found that long-range memory is related to the level of the players. When the database is filtered according to player levels, we found differences in the persistence of the different subsets. For high-level players, correlations are stronger at long time scales, whereas in intermediate- and low-level players they reach their maximum value at shorter time scales. This can be interpreted as a signature of the different strategies used by players with different levels of expertise. These results are robust against the mapping rules and the method employed in the analysis of the time series.
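Detrended fluctuation analysis, one of the two methods named above, is compact enough to sketch. The version below (first-order DFA on a synthetic series; not the authors' code) estimates the scaling exponent alpha, where alpha > 0.5 signals long-range persistence.

```python
# Minimal sketch of first-order detrended fluctuation analysis (DFA):
# integrate the series, detrend it in windows of size s, and read the
# scaling exponent off the slope of log F(s) versus log s.

import numpy as np

def dfa(x, scales):
    y = np.cumsum(x - np.mean(x))              # integrated profile
    fluctuations = []
    for s in scales:
        n_seg = len(y) // s
        f2 = []
        for i in range(n_seg):
            seg = y[i * s:(i + 1) * s]
            t = np.arange(s)
            coef = np.polyfit(t, seg, 1)       # local linear trend
            f2.append(np.mean((seg - np.polyval(coef, t)) ** 2))
        fluctuations.append(np.sqrt(np.mean(f2)))
    # Slope estimates alpha; alpha > 0.5 indicates long-range persistence.
    return np.polyfit(np.log(scales), np.log(fluctuations), 1)[0]

rng = np.random.default_rng(0)
print(dfa(rng.normal(size=10000), [16, 32, 64, 128, 256]))  # ~0.5 for white noise
```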
Thermodynamic assessment of the LiF-NaF-BeF2-ThF4-UF4 system
NASA Astrophysics Data System (ADS)
Capelli, E.; Beneš, O.; Konings, R. J. M.
2014-06-01
The present study describes the full thermodynamic assessment of the LiF-NaF-BeF2-ThF4-UF4 system, which is one of the key systems considered for a molten salt reactor fuel. The work is an extension of the previously assessed LiF-NaF-ThF4-UF4 system with the addition of BeF2, which is characterized by a very low neutron capture cross section and a relatively low melting point. To extend the database, the binary BeF2-ThF4 and BeF2-UF4 systems were optimized, and the novel data were used for the thermodynamic assessment of BeF2-containing ternary systems for which experimental data exist in the literature. The obtained database is used to optimize the molten salt reactor fuel composition and to assess its properties, with emphasis on the melting behaviour.
Multiple Image Arrangement for Subjective Quality Assessment
NASA Astrophysics Data System (ADS)
Wang, Yan; Zhai, Guangtao
2017-12-01
Subjective quality assessment serves as the foundation for almost all visual-quality-related research. The size of image quality databases has expanded from dozens to thousands of images over the last decades. Since each subjective rating therein has to be averaged over quite a few participants, the ever-increasing overall size of those databases calls for an evolution of existing subjective test methods. Traditional single/double-stimulus approaches are being replaced by multiple-image tests, where several distorted versions of the original image are displayed and rated at once. This naturally raises the question of how to arrange those multiple images on screen during the test. In this paper, we answer this question by performing subjective viewing tests with an eye tracker for different types of arrangements. Our research indicates that an isometric arrangement imposes less strain on participants and yields a more uniform distribution of eye fixations and movements, and is therefore expected to generate more reliable subjective ratings.
Bourassa, Dominic; Gauthier, François; Abdul-Nour, Georges
2016-01-01
Accidental events in manufacturing industries can be caused by many factors, including work methods, lack of training, equipment design, maintenance and reliability. This study aimed to determine the contribution of failures of commonly used industrial equipment, such as machines, tools and material handling equipment, to the chain of causality of industrial accidents and incidents. Based on a case study analyzing an existing pulp and paper company's accident database, this paper examines the number, type and severity of the failures involved in these events and their causes. Results from this study show that equipment failures had a major effect on the number and severity of the accidents recorded in the database: 272 of 773 accidental events were related to equipment failure, and 13 of these had direct human consequences. Failures that contributed directly or indirectly to these events are analyzed.
Toward a Bio-Medical Thesaurus: Building the Foundation of the UMLS
Tuttle, Mark S.; Blois, Marsden S.; Erlbaum, Mark S.; Nelson, Stuart J.; Sherertz, David D.
1988-01-01
The Unified Medical Language System (UMLS) is being designed to provide a uniform user interface to heterogeneous machine-readable bio-medical information resources, such as bibliographic databases, genetic databases, expert systems and patient records. Such an interface will have to recognize different ways of saying the same thing, and provide links to ways of saying related things. One way to represent the necessary associations is via a domain thesaurus. As no such thesaurus exists, and because, once built, it will be both sizable and in need of continuous maintenance, its design should include a methodology for building and maintaining it. We propose a methodology, utilizing lexically expanded schema inversion, and a design, called T. Lex, which together form one approach to the problem of defining and building a bio-medical thesaurus. We argue that the semantic locality implicit in such a thesaurus will support model-based reasoning in bio-medicine.
Major technology issues in surgical data collection.
Kirschenbaum, I H
1995-10-01
Surgical scheduling and data collection is a field that has a long history as well as a bright future. Historically, surgical cases have always involved some amount of data collection. Surgical cases are scheduled and then reviewed. The classic method, that large black surgical log, actually still exists in many hospitals. In fact, there is nothing new about the recording or reporting of surgical cases. If we only needed to record the information and produce a variety of reports on the data, then modern electronic technology would function as a glorified fast index card box--or, in computer database terms, a simple flat file database. But, this is not the future of technology in surgical case management. This article makes the general case for integrating surgical data systems. Instead of reviewing specific software, it essentially addresses the issues of strategic planning related to this important aspect of medical information systems.
Assistive technology for ultrasound-guided central venous catheter placement.
Ikhsan, Mohammad; Tan, Kok Kiong; Putra, Andi Sudjana
2018-01-01
This study evaluated the existing technology used to improve the safety and ease of ultrasound-guided central venous catheterization. Electronic database searches were conducted in Scopus, IEEE, Google Patents, and relevant conference databases (SPIE, MICCAI, and IEEE conferences) for related articles on assistive technology for ultrasound-guided central venous catheterization. A total of 89 articles were examined and pointed to several fields that are currently the focus of improvements to ultrasound-guided procedures. These include improving needle visualization, needle guides and localization technology, image processing algorithms to enhance and segment important features within the ultrasound image, robotic assistance using probe-mounted manipulators, and improving procedure ergonomics through in situ projections of important information. Probe-mounted robotic manipulators provide a promising avenue for assistive technology developed for freehand ultrasound-guided percutaneous procedures. However, there is currently a lack of clinical trials to validate the effectiveness of these devices.
Multi-source and ontology-based retrieval engine for maize mutant phenotypes
Green, Jason M.; Harnsomburana, Jaturon; Schaeffer, Mary L.; Lawrence, Carolyn J.; Shyu, Chi-Ren
2011-01-01
Model Organism Databases, including the various plant genome databases, collect and enable access to massive amounts of heterogeneous information, including sequence data, gene product information, images of mutant phenotypes, etc., as well as textual descriptions of many of these entities. While a variety of basic browsing and search capabilities are available to allow researchers to query and peruse the names and attributes of phenotypic data, next-generation search mechanisms that allow querying and ranking of text descriptions are much less common. In addition, the plant community needs an innovative way to leverage the existing links in these databases to search groups of text descriptions simultaneously. Furthermore, though much time and effort have been devoted to the development of plant-related ontologies, the knowledge embedded in these ontologies remains largely unused in available plant search mechanisms. Addressing these issues, we have developed a unique search engine for mutant phenotypes from MaizeGDB. This advanced search mechanism integrates various text description sources in MaizeGDB to aid a user in retrieving desired mutant phenotype information. Currently, descriptions of mutant phenotypes, loci and gene products are utilized collectively for each search, though expansion of the search mechanism to include other sources is straightforward. The retrieval engine, to our knowledge, is the first engine to exploit the content and structure of available domain ontologies, currently the Plant and Gene Ontologies, to expand and enrich retrieval results in major plant genomic databases. Database URL: http://www.PhenomicsWorld.org/QBTA.php PMID:21558151
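Ontology-driven expansion of a query term, the core idea exploited above, can be sketched in a few lines. The is-a fragment and mutant descriptions below are invented; a real system would walk the Plant Ontology graph instead.

```python
# Minimal sketch of ontology-based query expansion: expand a query term with
# its descendants in a toy is-a hierarchy, then match text descriptions.

from collections import deque

children = {                       # hypothetical is-a edges, parent -> children
    "leaf": ["leaf blade", "leaf sheath"],
    "leaf blade": ["liguleless leaf blade"],
}

def expand(term):
    terms, queue = {term}, deque([term])
    while queue:
        for child in children.get(queue.popleft(), []):
            if child not in terms:
                terms.add(child)
                queue.append(child)
    return terms

descriptions = {"mu1": "liguleless leaf blade, upright", "mu2": "dwarf plant"}
query = expand("leaf")
print([mid for mid, text in descriptions.items()
       if any(t in text for t in query)])     # ['mu1']
```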
A Benchmark and Comparative Study of Video-Based Face Recognition on COX Face Database.
Huang, Zhiwu; Shan, Shiguang; Wang, Ruiping; Zhang, Haihong; Lao, Shihong; Kuerban, Alifu; Chen, Xilin
2015-12-01
Face recognition with still face images has been widely studied, while research on video-based face recognition is relatively inadequate, especially in terms of benchmark datasets and comparisons. Real-world video-based face recognition applications require techniques for three distinct scenarios: 1) Video-to-Still (V2S); 2) Still-to-Video (S2V); and 3) Video-to-Video (V2V), respectively taking video or still images as query or target. To the best of our knowledge, few datasets and evaluation protocols have been benchmarked for all three scenarios. In order to facilitate the study of this specific topic, this paper contributes a benchmarking and comparative study based on a newly collected still/video face database, named COX Face DB. Specifically, we make three contributions. First, we collect and release a large-scale still/video face database to simulate video surveillance with three different video-based face recognition scenarios (i.e., V2S, S2V, and V2V). Second, for benchmarking the three scenarios designed on our database, we review and experimentally compare a number of existing set-based methods. Third, we further propose a novel Point-to-Set Correlation Learning (PSCL) method, and experimentally show that it can be used as a promising baseline method for V2S/S2V face recognition on COX Face DB. Extensive experimental results clearly demonstrate that video-based face recognition needs more effort, and our COX Face DB is a good benchmark database for evaluation.
2005-09-01
aid this thesis would not have come into existence. First, we would like to thank Eric Chaum for his enthusiasm and recommendation for doing this... by David Hina [HINA 00], discusses the use of available Commercial-Off-The-Shelf (COTS) XML technology to provide for the exchange of data between... air and maritime components and therefore forms the basis of a joint command and control data model [CHAUM 04]. The data model is one of the two
Collaborative Data Publication Utilizing the Open Data Repository's (ODR) Data Publisher
NASA Technical Reports Server (NTRS)
Stone, N.; Lafuente, B.; Bristow, T.; Keller, R. M.; Downs, R. T.; Blake, D.; Fonda, M.; Dateo, C.; Pires, A.
2017-01-01
Introduction: For small communities in diverse fields such as astrobiology, publishing and sharing data can be a difficult challenge. While large, homogenous fields often have repositories and existing data standards, small groups of independent researchers have few options for publishing standards and data that can be utilized within their community. In conjunction with teams at NASA Ames and the University of Arizona, the Open Data Repository's (ODR) Data Publisher has been conducting ongoing pilots to assess the needs of diverse research groups and to develop software to allow them to publish and share their data collaboratively. Objectives: The ODR's Data Publisher aims to provide an easy-to-use and easy-to-implement software tool that will allow researchers to create and publish database templates and related data. The end product will facilitate both human-readable interfaces (web-based with embedded images, files, and charts) and machine-readable interfaces utilizing semantic standards. Characteristics: The Data Publisher software runs on the standard LAMP (Linux, Apache, MySQL, PHP) stack to provide the widest server base available. The software is based on Symfony (www.symfony.com), which provides a robust framework for creating extensible, object-oriented software in PHP. The software interface consists of a template designer where individual or master database templates can be created. A master database template can be shared by many researchers to provide a common metadata standard that will set a compatibility standard for all derivative databases. Individual researchers can then extend their instance of the template with custom fields, file storage, or visualizations that may be unique to their studies. This allows groups to create compatible databases for data discovery and sharing purposes while still providing the flexibility needed to meet the needs of scientists in rapidly evolving areas of research. Research: As part of this effort, a number of pilot and test projects are currently in progress. The Astrobiology Habitable Environments Database Working Group is developing a shared database standard using the ODR's Data Publisher and has a number of example databases where astrobiology data are shared. Soon these databases will be integrated via the template-based standard. Work with this group helps determine what data researchers in these diverse fields need to share and archive. Additionally, this pilot helps determine which standards are viable for sharing these types of data, from internally developed standards to existing open standards such as the Dublin Core (http://dublincore.org) and Darwin Core (http://rs.tdwg.org) metadata standards. Further studies are ongoing with the University of Arizona Department of Geosciences, where a number of mineralogy databases are being constructed within the ODR Data Publisher system. Conclusions: Through the ongoing pilots and discussions with individual researchers and small research teams, a definition of the tools desired by these groups is coming into focus. As the software development moves forward, the goal is to meet the publication and collaboration needs of these scientists in an unobtrusive and functional way.
A survey of commercial object-oriented database management systems
NASA Technical Reports Server (NTRS)
Atkins, John
1992-01-01
The object-oriented data model is the culmination of over thirty years of database research. Initially, database research focused on the need to provide information in a consistent and efficient manner to the business community. Early data models such as the hierarchical model and the network model met the goal of consistent and efficient access to data and were substantial improvements over simple file mechanisms for storing and accessing data. However, these models required highly skilled programmers to provide access to the data. Consequently, in the early 1970s E. F. Codd, an IBM research computer scientist, proposed a new data model based on the simple mathematical notion of the relation. This model is known as the Relational Model. In the relational model, data is represented in flat tables (or relations) which have no physical or internal links between them. The simplicity of this model fostered the development of powerful but relatively simple query languages that made data directly accessible to the general database user. Except for large, multi-user database systems, a database professional was in general no longer necessary. Database professionals found that traditional data in the form of character data, dates, and numeric data were easily represented and managed via the relational model. Commercial relational database management systems proliferated and the performance of relational databases improved dramatically. However, there was a growing community of potential database users whose needs were not met by the relational model. These users needed to store data with data types not available in the relational model and required a far richer modelling environment than that provided by the relational model. Indeed, the complexity of the objects to be represented in the model mandated a new approach to database technology. The Object-Oriented Model was the result.
Development of the Tensoral Computer Language
NASA Technical Reports Server (NTRS)
Ferziger, Joel; Dresselhaus, Eliot
1996-01-01
The research scientist or engineer wishing to perform large scale simulations or to extract useful information from existing databases is required to have expertise in the details of the particular database, the numerical methods and the computer architecture to be used. This poses a significant practical barrier to the use of simulation data. The goal of this research was to develop a high-level computer language called Tensoral, designed to remove this barrier. The Tensoral language provides a framework in which efficient generic data manipulations can be easily coded and implemented. First of all, Tensoral is general. The fundamental objects in Tensoral represent tensor fields and the operators that act on them. The numerical implementation of these tensors and operators is completely and flexibly programmable. New mathematical constructs and operators can be easily added to the Tensoral system. Tensoral is compatible with existing languages. Tensoral tensor operations co-exist in a natural way with a host language, which may be any sufficiently powerful computer language such as Fortran, C, or Vectoral. Tensoral is very-high-level. Tensor operations in Tensoral typically act on entire databases (i.e., arrays) at one time and may, therefore, correspond to many lines of code in a conventional language. Tensoral is efficient. Tensoral is a compiled language. Database manipulations are simplified, optimized, and scheduled by the compiler, eventually resulting in efficient machine code to implement them.
Simons, Johannes WIM
2009-01-01
Background: We have previously shown that deviations from the average transcription profile of a group of functionally related genes are not only heritable, but also demonstrate specific patterns associated with age, gender and differentiation, thereby implicating genome-wide nuclear programming as the cause. To determine whether these results could be reproduced, a different micro-array database (obtained from two types of muscle tissue, derived from 81 human donors aged between 16 and 89 years) was studied. Results: This new database also revealed the existence of age-, gender- and tissue-specific features in a small group of functionally related genes. In order to further analyze this phenomenon, a method was developed for quantifying the contribution of different factors to the variability in gene expression, and for generating a database limited to residual values reflecting constitutional differences between individuals. These constitutional differences, presumably epigenetic in origin, contribute to about 50% of the observed residual variance, which is connected with a network of interrelated changes in gene expression, with some genes displaying a decrease or increase in residual variation with age. Conclusion: Epigenetic variation in gene expression without a clear concomitant relation to gene function appears to be a widespread phenomenon. This variation is connected with interactions between genes, is gender and tissue specific, and is related to cellular aging. This finding, together with the method developed for analysis, might contribute to the elucidation of the role of nuclear programming in differentiation, aging and carcinogenesis. Reviewers: This article was reviewed by Thiago M. Venancio (nominated by Aravind Iyer), Hua Li (nominated by Arcady Mushegian) and Arcady Mushegian, and J. P. de Magelhaes (nominated by G. Church). PMID:19796384
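The factor-versus-residual decomposition described above can be illustrated with ordinary least squares: regress expression on the known factors and treat what is left over as the constitutional component. The data below are synthetic, not the study's microarray values.

```python
# Minimal sketch, on synthetic data: regress expression on age, gender and
# tissue, keep the residuals as the "constitutional" component, and report
# the share of variance the known factors explain.

import numpy as np

rng = np.random.default_rng(1)
n = 81
age = rng.uniform(16, 89, n)
gender = rng.integers(0, 2, n)            # 0/1 encoded
tissue = rng.integers(0, 2, n)            # two muscle tissue types
expr = 0.02 * age + 0.5 * gender + rng.normal(0, 1, n)   # synthetic expression

X = np.column_stack([np.ones(n), age, gender, tissue])
beta, *_ = np.linalg.lstsq(X, expr, rcond=None)
residual = expr - X @ beta                # constitutional (residual) component

explained = 1 - residual.var() / expr.var()
print(f"variance explained by age/gender/tissue: {explained:.2f}")
```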
Cataloging the biomedical world of pain through semi-automated curation of molecular interactions
Jamieson, Daniel G.; Roberts, Phoebe M.; Robertson, David L.; Sidders, Ben; Nenadic, Goran
2013-01-01
The vast collection of biomedical literature and its continued expansion has presented a number of challenges to researchers who require structured findings to stay abreast of and analyze molecular mechanisms relevant to their domain of interest. By structuring literature content into topic-specific machine-readable databases, the aggregate data from multiple articles can be used to infer trends that can be compared and contrasted with similar findings from topic-independent resources. Our study presents a generalized procedure for semi-automatically creating a custom topic-specific molecular interaction database through the use of text mining to assist manual curation. We apply the procedure to capture molecular events that underlie ‘pain’, a complex phenomenon with a large societal burden and unmet medical need. We describe how existing text mining solutions are used to build a pain-specific corpus, extract molecular events from it, add context to the extracted events and assess their relevance. The pain-specific corpus contains 765 692 documents from Medline and PubMed Central, from which we extracted 356 499 unique normalized molecular events, with 261 438 single protein events and 93 271 molecular interactions supplied by BioContext. Event chains are annotated with negation, speculation, anatomy, Gene Ontology terms, mutations, pain and disease relevance, which collectively provide detailed insight into how that event chain is associated with pain. The extracted relations are visualized in a wiki platform (wiki-pain.org) that enables efficient manual curation and exploration of the molecular mechanisms that underlie pain. Curation of 1500 grouped event chains ranked by pain relevance revealed 613 accurately extracted unique molecular interactions that in the future can be used to study the underlying mechanisms involved in pain. Our approach demonstrates that combining existing text mining tools with domain-specific terms and wiki-based visualization can facilitate rapid curation of molecular interactions to create a custom database. Database URL: ••• PMID:23707966
2013-01-01
We compare it with Riak, a widely adopted commercial NoSQL database system. The results show that IndexedHBase provides a data loading speed that is 6 times faster than Riak... This chapter describes our research towards building an efficient and scalable storage platform for Truthy. Many existing NoSQL databases
Application GIS on university planning: building a spatial database aided spatial decision
NASA Astrophysics Data System (ADS)
Miao, Lei; Wu, Xiaofang; Wang, Kun; Nong, Yu
2007-06-01
With the development of universities and their growing size, many kinds of resources urgently require effective management. A spatial database is the right tool to assist administrators' spatial decision-making, and it becomes ready for the digital campus by integrating with existing OMS. Campus planning is first examined in detail. Then, using South China Agricultural University as a case study, the paper demonstrates how to build a geographic database of campus buildings and housing to support university administrators' spatial decisions.
Reference System of DNA and Protein Sequences on CD-ROM
NASA Astrophysics Data System (ADS)
Nasu, Hisanori; Ito, Toshiaki
DNASIS-DBREF31 is a database of DNA and protein sequences in the form of an optical Compact Disk (CD) ROM, developed and commercialized by Hitachi Software Engineering Co., Ltd. Both nucleic acid base sequences and protein amino acid sequences can be retrieved from a single CD-ROM. Existing databases are offered in the form of on-line services, floppy disks, or magnetic tape, all of which have problems of one kind or another, such as usability or storage capacity. DNASIS-DBREF31 newly adopts a CD-ROM as the database medium, realizing mass storage and personal use of the database.
The LSST Data Mining Research Agenda
NASA Astrophysics Data System (ADS)
Borne, K.; Becla, J.; Davidson, I.; Szalay, A.; Tyson, J. A.
2008-12-01
We describe features of the LSST science database that are amenable to scientific data mining, object classification, outlier identification, anomaly detection, image quality assurance, and survey science validation. The data mining research agenda includes: scalability (at petabyte scales) of existing machine learning and data mining algorithms; development of grid-enabled parallel data mining algorithms; design of a robust system for brokering classifications from the LSST event pipeline (which may produce 10,000 or more event alerts per night); multi-resolution methods for exploration of petascale databases; indexing of multi-attribute, multi-dimensional astronomical databases (beyond spatial indexing) for rapid querying of petabyte databases; and more.
Technical Aspects of Interfacing MUMPS to an External SQL Relational Database Management System
Kuzmak, Peter M.; Walters, Richard F.; Penrod, Gail
1988-01-01
This paper describes an interface connecting InterSystems MUMPS (M/VX) to an external relational DBMS, the SYBASE Database Management System. The interface enables MUMPS to operate in a relational environment and gives the MUMPS language full access to a complete set of SQL commands. MUMPS generates SQL statements as ASCII text and sends them to the RDBMS. The RDBMS executes the statements and returns ASCII results to MUMPS. The interface suggests that the language features of MUMPS make it an attractive tool for use in the relational database environment. The approach described in this paper separates MUMPS from the relational database. Positioning the relational database outside of MUMPS promotes data sharing and permits a number of different options to be used for working with the data. Other languages like C, FORTRAN, and COBOL can access the RDBMS database. Advanced tools provided by the relational database vendor can also be used. SYBASE is an advanced high-performance transaction-oriented relational database management system for the VAX/VMS and UNIX operating systems. SYBASE is designed using a distributed open-systems architecture, and is relatively easy to interface with MUMPS.
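The round trip the interface implements (SQL built as ASCII text on one side, rows returned as text on the other) can be mimicked in a few lines. The sketch below uses Python with sqlite3 as a stand-in for the MUMPS/SYBASE pair, so everything except the text-in, text-out pattern is an assumption.

```python
# Minimal sketch of the pattern described above: the host language builds an
# SQL statement as plain ASCII text, the external RDBMS executes it, and the
# results come back as text. sqlite3 stands in for SYBASE; the MUMPS side and
# the actual wire protocol are not reproduced.

import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE patient (id INTEGER, name TEXT)")
conn.execute("INSERT INTO patient VALUES (1, 'SMITH,JOHN')")

sql_text = "SELECT id, name FROM patient WHERE id = 1"   # statement as ASCII text
ascii_rows = ["|".join(map(str, row)) for row in conn.execute(sql_text)]
print(ascii_rows)   # ['1|SMITH,JOHN']; results returned to the caller as text
```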
Wu, Tsung-Jung; Shamsaddini, Amirhossein; Pan, Yang; Smith, Krista; Crichton, Daniel J; Simonyan, Vahan; Mazumder, Raja
2014-01-01
Years of sequence feature curation by UniProtKB/Swiss-Prot, PIR-PSD, NCBI-CDD, RefSeq and other database biocurators have led to a rich repository of information on functional sites of genes and proteins. This information, along with variation-related annotation, can be used to scan human short sequence reads from next-generation sequencing (NGS) pipelines for the presence of non-synonymous single-nucleotide variations (nsSNVs) that affect functional sites. This and similar workflows are becoming more important because thousands of NGS data sets are being made available through projects such as The Cancer Genome Atlas (TCGA), and researchers want to evaluate their biomarkers in genomic data. BioMuta, an integrated sequence feature database, provides a framework for automated and manual curation and integration of cancer-related sequence features so that they can be used in NGS analysis pipelines. Sequence feature information in BioMuta is collected from the Catalogue of Somatic Mutations in Cancer (COSMIC), ClinVar, UniProtKB and through biocuration of information available from publications. Additionally, nsSNVs identified through automated analysis of NGS data from TCGA are also included in the database. Because of the petabytes of data and information present in NGS primary repositories, a platform, HIVE (High-performance Integrated Virtual Environment), for storing, analyzing, computing and curating NGS data and associated metadata has been developed. Using HIVE, 31 979 nsSNVs were identified in TCGA-derived NGS data from breast cancer patients. All variations identified through this process are stored in a Curated Short Read archive, and the nsSNVs from the tumor samples are included in BioMuta. Currently, BioMuta has 26 cancer types with 13 896 small-scale and 308 986 large-scale study-derived variations. Integration of variation data allows identification of novel or common nsSNVs that can be prioritized in validation studies. Database URL: BioMuta: http://hive.biochemistry.gwu.edu/tools/biomuta/index.php; CSR: http://hive.biochemistry.gwu.edu/dna.cgi?cmd=csr; HIVE: http://hive.biochemistry.gwu.edu.
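At its core, the scan described above asks whether a variant's protein position falls inside any curated functional-site interval. A minimal sketch with invented annotations (P04637 is the UniProt accession for human p53, used here only as a plausible key):

```python
# Minimal sketch: flag nsSNVs whose protein positions fall inside curated
# functional-site ranges. Site annotations below are illustrative, not BioMuta.

functional_sites = {                       # protein -> list of (start, end, label)
    "P04637": [(175, 175, "hotspot"), (94, 312, "DNA-binding domain")],
}

def sites_hit(protein, position):
    return [label for start, end, label in functional_sites.get(protein, [])
            if start <= position <= end]

for protein, pos in [("P04637", 175), ("P04637", 20)]:
    print(protein, pos, sites_hit(protein, pos))
# P04637 175 ['hotspot', 'DNA-binding domain']
# P04637 20 []
```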
Degli Esposti, Luca; Saragoni, Stefania; Buda, Stefano; Sturani, Alessandra; Degli Esposti, Ezio
2013-01-01
Diabetes is one of the most prevalent chronic diseases, and its prevalence is predicted to increase in the next two decades. Diabetes imposes a staggering financial burden on the health care system, so information about the costs and experiences of collecting and reporting quality measures of data is vital for practices deciding whether to adopt quality improvements or monitor existing initiatives. The aim of this study was to quantify the association between health care costs and level of glycemic control in patients with type 2 diabetes using clinical and administrative databases. A retrospective analysis using a large administrative database and a clinical registry containing laboratory results was performed. Patients were subdivided according to their glycated hemoglobin level. Multivariate analyses were used to control for differences in potential confounding factors, including age, gender, Charlson comorbidity index, presence of dyslipidemia, hypertension, or cardiovascular disease, and degree of adherence with antidiabetic drugs among the study groups. Of the total population of 700,000 subjects, 31,022 were identified as being diabetic (4.4% of the entire population). Of these, 21,586 met the study inclusion criteria. In total, 31.5% of patients had very poor glycemic control and 25.7% had excellent control. Over 2 years, the mean diabetes-related cost per person was: €1291.56 in patients with excellent control; €1545.99 in those with good control; €1584.07 in those with fair control; €1839.42 in those with poor control; and €1894.80 in those with very poor control. After adjustment, compared with the group having excellent control, the estimated excess cost per person associated with the groups with good control, fair control, poor control, and very poor control was €219.28, €264.65, €513.18, and €564.79, respectively. Many patients showed suboptimal glycemic control. Lower levels of glycated hemoglobin were associated with lower diabetes-related health care costs. Integration of administrative databases and a laboratory database appears to be suitable for showing that appropriate management of diabetes can help to achieve better resource allocation.
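The unadjusted comparison underlying these figures is a group-by-band mean; the multivariate adjustment is a separate regression step not shown here. A minimal sketch on invented patient records:

```python
# Minimal sketch: group patients by glycated haemoglobin band and compare
# mean diabetes-related cost per person. Records are invented.

from collections import defaultdict

patients = [("excellent", 1100.0), ("excellent", 1480.0),
            ("poor", 1700.0), ("poor", 1980.0), ("very poor", 1900.0)]

totals = defaultdict(lambda: [0.0, 0])     # band -> [cost sum, patient count]
for band, cost in patients:
    totals[band][0] += cost
    totals[band][1] += 1

for band, (total, count) in totals.items():
    print(f"{band}: mean cost {total / count:.2f} EUR")
```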
Pharmacokinetic interactions of herbal medicines for the treatment of chronic hepatitis.
Hsueh, Tun-Pin; Lin, Wan-Ling; Tsai, Tung-Hu
2017-04-01
Chronic liver disease is a serious global health problem, and an increasing number of patients are seeking alternative medicines or complementary treatment. Herbal medicines account for 16.8% of patients with chronic liver disease who use complementary and alternative therapies. A survey of the National Health Insurance Research Database in Taiwan reported that Long-Dan-Xie-Gan-Tang, Jia-Wei-Xiao-Yao-San, and Xiao-Chai-Hu-Tang (Sho-saiko-to) were the formulas most frequently prescribed for chronic hepatitis by traditional Chinese medicine physicians. Bioanalytical methods for herbal medicines used to treat chronic hepatitis have been developed to investigate pharmacokinetic properties, but multicomponent herbal formulas have seldom been discussed. The pharmacokinetics of herbal formulas is closely related to the efficacy, efficiency, and patient safety of traditional herbal medicines. Potential herbal formula-drug interactions are another essential issue during herbal formula administration in chronic hepatitis patients. Drawing on a survey of the PubMed database, this review article evaluates the existing evidence-based data associated with the documented pharmacokinetic profiles and potential herb-drug interactions of herbal formulas for the treatment of chronic hepatitis. In addition, the existing pharmacokinetic profiles are further linked with clinical practice to provide insight into the safety and specific use of traditional herbal medicines.
Sellami-Kaaniche, Emna; de Gouvello, Bernard; Gromaire, Marie-Christine; Chebbo, Ghassan
2014-04-01
Today, urban runoff is considered an important source of environmental pollution. Roofing materials, in particular metallic ones, are considered a major source of the metal contamination of urban runoff. In the context of the European Water Framework Directive (2000/60/EC), an accurate evaluation of contaminant flows from roofs is thus required on the city scale, and therefore the development of assessment tools is needed. However, on this scale, there is an important diversity of roofing materials. In addition, given the size of a city, a complete census of the materials of the different roofing elements represents a difficult task. Information relating roofing materials to their surface areas at the urban-district scale does not currently exist in urban databases. The objective of this paper is to develop a new method for evaluating annual contaminant flow emissions from the different roofing material elements (e.g., gutter, rooftop) on the city scale. This method is based on using and adapting existing urban databases combined with a statistical approach. Different rules for identifying the materials of the different roofing elements on the city scale have been defined. The methodology is explained through its application to the evaluation of zinc emissions for the city of Créteil.
An integrated database-pipeline system for studying single nucleotide polymorphisms and diseases.
Yang, Jin Ok; Hwang, Sohyun; Oh, Jeongsu; Bhak, Jong; Sohn, Tae-Kwon
2008-12-12
Studies on the relationship between disease and genetic variations such as single nucleotide polymorphisms (SNPs) are important. Genetic variations can cause disease by influencing important biological regulation processes. Despite the need to analyze SNP-disease correlations, most existing databases provide information only on functional variants at specific locations on the genome, or deal with only a few genes associated with disease. There is no combined resource that broadly supports gene-, SNP-, and disease-related information and captures the relationships among such data. Therefore, we developed an integrated database-pipeline system for studying SNPs and diseases. To implement the pipeline system for the integrated database, we first unified complicated and redundant disease terms and gene names using the Unified Medical Language System (UMLS) for classification and noun modification, and the HUGO Gene Nomenclature Committee (HGNC) and NCBI gene databases. Next, we collected and integrated representative databases for three categories of information. For genes and proteins, we examined the NCBI mRNA, UniProt, UCSC Table Track and MitoDat databases. For genetic variants we used the dbSNP, JSNP, ALFRED, and HGVbase databases. For disease, we employed the OMIM, GAD, and HGMD databases. The database-pipeline system provides a disease thesaurus, including genes and SNPs associated with disease. The search results for these categories are available on the web page http://diseasome.kobic.re.kr/, and a genome browser is also available to highlight findings, as well as to permit convenient review of potentially deleterious SNPs among genes strongly associated with specific diseases and clinical phenotypes. Our system is designed to capture the relationships between SNPs associated with disease and disease-causing genes. The integrated database-pipeline provides a list of candidate genes and SNP markers for evaluation in both epidemiological and molecular biological approaches to disease-gene association studies. Furthermore, researchers can then semi-automatically select data sets for association studies while considering the relationships between genetic variation and diseases. The database can also make disease-association studies more economical, as well as facilitate an understanding of the processes that cause disease. Currently, the database contains 14,674 SNP records and 109,715 gene records associated with human diseases, and it is updated at regular intervals.
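The gene-mediated SNP-to-disease linkage the pipeline provides boils down to a two-table join. A minimal sketch with sqlite3 and an invented two-table schema (rs429358 and rs7412 are real APOE variants, used only as plausible rows):

```python
# Minimal sketch: SNPs join to genes, genes join to diseases, so candidate
# SNP markers for a disease fall out of one query. Schema is invented.

import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE snp (rsid TEXT, gene TEXT);
CREATE TABLE gene_disease (gene TEXT, disease TEXT);
INSERT INTO snp VALUES ('rs429358', 'APOE'), ('rs7412', 'APOE');
INSERT INTO gene_disease VALUES ('APOE', 'Alzheimer disease');
""")

rows = conn.execute("""
    SELECT s.rsid, s.gene FROM snp s
    JOIN gene_disease gd ON gd.gene = s.gene
    WHERE gd.disease = 'Alzheimer disease'
""").fetchall()
print(rows)   # [('rs429358', 'APOE'), ('rs7412', 'APOE')]
```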
The CSB Incident Screening Database: description, summary statistics and uses.
Gomez, Manuel R; Casper, Susan; Smith, E Allen
2008-11-15
This paper briefly describes the Chemical Incident Screening Database currently used by the CSB to identify and evaluate chemical incidents for possible investigations, and summarizes descriptive statistics from this database that can potentially help to estimate the number, character, and consequences of chemical incidents in the US. The report compares some of the information in the CSB database to roughly similar information available from databases operated by EPA and the Agency for Toxic Substances and Disease Registry (ATSDR), and explores the possible implications of these comparisons with regard to the dimension of the chemical incident problem. Finally, the report explores in a preliminary way whether a system modeled after the existing CSB screening database could be developed to serve as a national surveillance tool for chemical incidents.
DOE technology information management system database study report
DOE Office of Scientific and Technical Information (OSTI.GOV)
Widing, M.A.; Blodgett, D.W.; Braun, M.D.
1994-11-01
To support the missions of the US Department of Energy (DOE) Special Technologies Program, Argonne National Laboratory is defining the requirements for an automated software system that will search electronic databases on technology. This report examines the work done and results to date. Argonne studied existing commercial and government sources of technology databases in five general areas: on-line services, patent database sources, government sources, aerospace technology sources, and general technology sources. First, it conducted a preliminary investigation of these sources to obtain information on the content, cost, frequency of updates, and other aspects of their databases. The Laboratory then performed detailed examinations of at least one source in each area. On this basis, Argonne recommended which databases should be incorporated in DOE's Technology Information Management System.
Mycobacteriophage genome database.
Joseph, Jerrine; Rajendran, Vasanthi; Hassan, Sameer; Kumar, Vanaja
2011-01-01
Mycobacteriophage genome database (MGDB) is an exclusive repository of the 64 completely sequenced mycobacteriophages with annotated information. It is a comprehensive compilation of the various gene parameters captured from several databases, pooled together to empower mycobacteriophage researchers. The MGDB (Version 1.0) comprises 6086 genes from 64 mycobacteriophages classified into 72 families based on the ACLAME database. Manual curation was aided by information available from public databases and enriched further by analysis. The web interface allows browsing as well as querying the classification. The main objective is to collect and organize the complexity inherent in mycobacteriophage protein classification in a rational way. The other objective is to allow browsing of existing and new genomes and to describe their functional annotation. The database is available for free at http://mpgdb.ibioinformatics.org/mpgdb.php.
If you build it, they will come: unintended future uses of organised health data collections.
O'Doherty, Kieran C; Christofides, Emily; Yen, Jeffery; Bentzen, Heidi Beate; Burke, Wylie; Hallowell, Nina; Koenig, Barbara A; Willison, Donald J
2016-09-06
Health research increasingly relies on organized collections of health data and biological samples. There are many types of sample and data collections that are used for health research, though these are collected for many purposes, not all of which are health-related. These collections exist under different jurisdictional and regulatory arrangements and include: (1) population biobanks, cohort studies, and genome databases; (2) clinical and public health data; (3) direct-to-consumer genetic testing; (4) social media; and (5) fitness trackers, health apps, and biometric data sensors. Ethical, legal, and social challenges of such collections are well recognized, but there has been limited attention to the broader societal implications of the existence of these collections. Although health research conducted using these collections is broadly recognized as beneficent, secondary uses of these data and samples may be controversial. We examine both documented and hypothetical scenarios of secondary uses of health data and samples. In particular, we focus on the use of health data for purposes of: forensic investigations; civil lawsuits; identification of victims of mass casualty events; denial of entry for border security and immigration; making health resource rationing decisions; and facilitating human rights abuses in autocratic regimes. Current safeguards relating to the use of health data and samples include research ethics oversight and privacy laws. These safeguards have a strong focus on informed consent and anonymization, which are aimed at the protection of the individual research subject. They are not intended to address broader societal implications of health data and sample collections. As such, existing arrangements are insufficient to protect against subversion of health databases for non-sanctioned secondary uses, or to provide guidance for reasonable but controversial secondary uses. We are concerned that existing debate in the scholarly literature and beyond has not sufficiently recognized the secondary data uses we outline in this paper. Our main purpose, therefore, is to raise awareness of the potential for unforeseen and unintended consequences, in particular negative consequences, of the increased availability and development of health data collections for research, by providing a comprehensive review of documented and hypothetical non-health research uses of such data.
Managing Heterogeneous Information Systems through Discovery and Retrieval of Generic Concepts.
ERIC Educational Resources Information Center
Srinivasan, Uma; Ngu, Anne H. H.; Gedeon, Tom
2000-01-01
Introduces a conceptual integration approach to heterogeneous databases or information systems that exploits the similarity in metalevel information and performs metadata mining on database objects to discover a set of concepts that serve as a domain abstraction and provide a conceptual layer above existing legacy systems. Presents results of…
Using sampling theory as the basis for a conceptual data model
Fred C. Martin; Tonya Baggett; Tom Wolfe
2000-01-01
Greater demands on forest resources require that larger amounts of information be readily available to decisionmakers. To provide more information faster, databases must be developed that are more comprehensive and easier to use. Data modeling is a process for building more complete and flexible databases by emphasizing fundamental relationships over existing or...
76 FR 19524 - Privacy Act of 1974; Deletion of System of Records
Federal Register 2010, 2011, 2012, 2013, 2014
2011-04-07
... Affairs (VA) is deleting a system of records entitled ``PROS/KEYS User Permissions Database-VA'' (67VA30... requirement for VA to maintain this system of records no longer exists because the PROS/ KEYS Database was... DEPARTMENT OF VETERANS AFFAIRS Privacy Act of 1974; Deletion of System of Records AGENCY...
Heterogeneous database integration in biomedicine.
Sujansky, W
2001-08-01
The rapid expansion of biomedical knowledge, reduction in computing costs, and spread of internet access have created an ocean of electronic data. The decentralized nature of our scientific community and healthcare system, however, has resulted in a patchwork of diverse, or heterogeneous, database implementations, making access to and aggregation of data across databases very difficult. The database heterogeneity problem applies equally to clinical data describing individual patients and biological data characterizing our genome. Specifically, databases are highly heterogeneous with respect to the data models they employ, the data schemas they specify, the query languages they support, and the terminologies they recognize. Heterogeneous database systems attempt to unify disparate databases by providing uniform conceptual schemas that resolve representational heterogeneities, and by providing querying capabilities that aggregate and integrate distributed data. Research in this area has applied a variety of database and knowledge-based techniques, including semantic data modeling, ontology definition, query translation, query optimization, and terminology mapping. Existing systems have addressed heterogeneous database integration in the realms of molecular biology, hospital information systems, and application portability.
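To make the mediated-schema idea concrete, here is a minimal Python sketch of query translation over two hypothetical heterogeneous sources: a uniform query posed against a mediator is rewritten into each source's field names and vocabulary. All field names, codes, and source identifiers are illustrative assumptions, not any specific system's schema.

    # A minimal mediator sketch, assuming two hypothetical sources whose
    # schemas and terminologies differ; it illustrates the schema- and
    # terminology-mapping step described above, not a specific system.
    TERM_MAP = {"myocardial infarction": {"src_a": "MI", "src_b": "410.9"}}
    FIELD_MAP = {"diagnosis": {"src_a": "dx_code", "src_b": "icd9"}}

    def translate(uniform_query, source):
        """Rewrite a {field: value} query posed against the mediated
        schema into the field names and vocabulary of one source."""
        out = {}
        for field, value in uniform_query.items():
            out[FIELD_MAP[field][source]] = TERM_MAP[value][source]
        return out

    q = {"diagnosis": "myocardial infarction"}
    print(translate(q, "src_a"))  # {'dx_code': 'MI'}
    print(translate(q, "src_b"))  # {'icd9': '410.9'}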
Spatial Designation of Critical Habitats for Endangered and Threatened Species in the United States
DOE Office of Scientific and Technical Information (OSTI.GOV)
Tuttle, Mark A; Singh, Nagendra; Sabesan, Aarthy
Establishing biological reserves or "hot spots" for endangered and threatened species is critical to support real-world species regulatory and management problems. Geographic data on the distribution of endangered and threatened species can be used to improve ongoing efforts for species conservation in the United States. At present, no spatial database exists that maps out the locations of endangered species for the US. Spatial descriptions do exist for the habitat associated with all endangered species, but in a form not readily usable in a geographic information system (GIS). In our study, the principal challenge was extracting spatial data describing these critical habitats for 472 species from over 1000 pages of the Federal Register. In addition, an appropriate database schema was designed to accommodate the different tiers of information associated with the species, along with the confidence of designation; the interpreted location data were geo-referenced to the county enumeration unit, producing a spatial database of endangered species for the whole of the US. The significance of these critical habitat designations, the database schema and the methodologies will be discussed.
Computer-Aided Systems Engineering for Flight Research Projects Using a Workgroup Database
NASA Technical Reports Server (NTRS)
Mizukami, Masahi
2004-01-01
An online systems engineering tool for flight research projects has been developed through the use of a workgroup database. Capabilities are implemented for typical flight research systems engineering needs in document library, configuration control, hazard analysis, hardware database, requirements management, action item tracking, project team information, and technical performance metrics. Repetitive tasks are automated to reduce workload and errors. Current data and documents are instantly available online and can be worked on collaboratively. Existing forms and conventional processes are used, rather than inventing or changing processes to fit the tool. An integrated tool set offers advantages by automatically cross-referencing data, minimizing redundant data entry, and reducing the number of programs that must be learned. With a simplified approach, significant improvements are attained over existing capabilities for minimal cost. By using a workgroup-level database platform, personnel most directly involved in the project can develop, modify, and maintain the system, thereby saving time and money. As a pilot project, the system has been used to support an in-house flight experiment. Options are proposed for developing and deploying this type of tool on a more extensive basis.
Using decision-tree classifier systems to extract knowledge from databases
NASA Technical Reports Server (NTRS)
St.clair, D. C.; Sabharwal, C. L.; Hacke, Keith; Bond, W. E.
1990-01-01
One difficulty in applying artificial intelligence techniques to the solution of real world problems is that the development and maintenance of many AI systems, such as those used in diagnostics, require large amounts of human resources. At the same time, databases frequently exist which contain information about the process(es) of interest. Recently, efforts to reduce development and maintenance costs of AI systems have focused on using machine learning techniques to extract knowledge from existing databases. Research is described in the area of knowledge extraction using a class of machine learning techniques called decision-tree classifier systems. Results of this research suggest ways of performing knowledge extraction which may be applied in numerous situations. In addition, a measurement called the concept strength metric (CSM) is described which can be used to determine how well the resulting decision tree can differentiate between the concepts it has learned. The CSM can be used to determine whether or not additional knowledge needs to be extracted from the database. An experiment involving real world data is presented to illustrate the concepts described.
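A brief sketch of the general approach described above: records pulled from a database are used to train a decision-tree classifier, and per-leaf class purity is then summarized. The purity summary below is only an intuition-building stand-in; it is not the paper's concept strength metric, whose exact definition is not reproduced here, and the data are synthetic.

    from sklearn.tree import DecisionTreeClassifier
    import numpy as np

    # Toy "database" records: two attribute columns and a class label,
    # e.g. fault / no-fault diagnoses.
    X = np.array([[0, 1], [1, 1], [0, 0], [1, 0], [1, 1], [0, 0]])
    y = np.array([0, 1, 0, 1, 1, 0])

    tree = DecisionTreeClassifier(max_depth=3).fit(X, y)

    # Summarize how cleanly each leaf separates the learned concepts;
    # a stand-in for, not an implementation of, the paper's CSM.
    leaf_ids = tree.apply(X)
    for leaf in np.unique(leaf_ids):
        members = y[leaf_ids == leaf]
        purity = np.mean(members == np.bincount(members).argmax())
        print(f"leaf {leaf}: n={len(members)}, purity={purity:.2f}")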
A bibliometric analysis of systematic reviews on vaccines and immunisation.
Fernandes, Silke; Jit, Mark; Bozzani, Fiammetta; Griffiths, Ulla K; Scott, J Anthony G; Burchett, Helen E D
2018-04-19
SYSVAC is an online bibliographic database of systematic reviews and systematic review protocols on vaccines and immunisation, compiled by the London School of Hygiene & Tropical Medicine and hosted by the World Health Organization (WHO) through their National Immunization Technical Advisory Groups (NITAG) resource centre (www.nitag-resource.org). Here the development of the database and a bibliometric review of its content are presented, describing trends in the publication of policy-relevant systematic reviews on vaccines and immunisation from 2008 to 2016. Searches were conducted in seven scientific databases according to a standardized search protocol, initially in 2014 with the most recent update in January 2017. Abstracts and titles were screened according to specific inclusion criteria. All included publications were coded into relevant categories based on a standardized protocol and subsequently analysed for trends in time, topic, area of focus, population and geographic location. After screening against the inclusion criteria, 1285 systematic reviews were included in the database. While in 2008 there were only 34 systematic reviews on vaccine-related topics, this increased to 322 in 2016. The most frequently studied pathogens/diseases were influenza, human papillomavirus and pneumococcus. There were several areas of duplication and overlap. As more systematic reviews are published, it becomes increasingly time-consuming for decision-makers to identify relevant information among the ever-increasing volume available. The risk of duplication also increases, particularly given the current lack of coordination of systematic reviews on vaccine-related questions, both in terms of their commissioning and their execution. The SYSVAC database offers an accessible catalogue of vaccine-relevant systematic reviews with, where possible, access or a link to the full text. SYSVAC provides a freely searchable platform for identifying existing vaccine-policy-relevant systematic reviews; individual reviews will still need to be assessed for their relevance to each specific question and for quality.
NASA Astrophysics Data System (ADS)
Dabiru, L.; O'Hara, C. G.; Shaw, D.; Katragadda, S.; Anderson, D.; Kim, S.; Shrestha, B.; Aanstoos, J.; Frisbie, T.; Policelli, F.; Keblawi, N.
2006-12-01
The Research Project Knowledge Base (RPKB) is currently being designed and will be implemented in a manner that is fully compatible and interoperable with enterprise architecture tools developed to support NASA's Applied Sciences Program. Through user needs assessments and collaboration with Stennis Space Center, Goddard Space Flight Center, and NASA DEVELOP staff, insight into the information needs for the RPKB was gathered from across NASA scientific communities of practice. To enable efficient, consistent, standard, structured, and managed data entry and research-results compilation, a prototype RPKB has been designed and fully integrated with the existing NASA Earth Science Systems Components database. The RPKB will compile research project and keyword information of relevance to the six major science focus areas, 12 national applications, and the Global Change Master Directory (GCMD). The RPKB will include information about projects awarded from NASA research solicitations, project investigator information, research publications, NASA data products employed, and model or decision support tools used or developed, as well as new data product information. The RPKB will be developed in a multi-tier architecture that will include a SQL Server relational database backend, middleware, and front-end client interfaces for data entry. The purpose of this project is to intelligently harvest the results of research sponsored by the NASA Applied Sciences Program and related research programs. We present various approaches for a wide spectrum of knowledge discovery of research results, publications, projects, etc., from the NASA Systems Components database and global information systems, and show how this is implemented in a SQL Server database. The application of knowledge discovery is useful for intelligent query answering and multiple-layered database construction. Using advanced EA tools such as the Earth Science Architecture Tool (ESAT), RPKB will enable NASA and partner agencies to efficiently identify significant results for new experiment directions, and principal investigators to formulate experiment directions for new proposals.
NASA Astrophysics Data System (ADS)
Cervato, C.; Fils, D.; Bohling, G.; Diver, P.; Greer, D.; Reed, J.; Tang, X.
2006-12-01
The federation of databases is not a new endeavor. Great strides have been made, for example, in the health and astrophysics communities. Reviews of those successes indicate that they have been able to leverage key cross-community core concepts. In its simplest implementation, a federation of databases with identical base schemas that can be extended to address individual efforts is relatively easy to accomplish. Efforts of groups like the Open Geospatial Consortium have shown methods to geospatially relate data between different sources. We present here a summary of CHRONOS's (http://www.chronos.org) experience with highly heterogeneous data. Our experience with the federation of very diverse databases shows that the wide variety of encoding options for items like locality, time scale, taxon ID, and other key parameters makes it difficult to effectively join data across them. However, the response to this is not to develop one large, monolithic database, which would suffer growth pains due to social, national, and operational issues, but rather to systematically develop the architecture that will enable cross-resource (database, repository, tool, interface) interaction. CHRONOS has cleared the major hurdle of federating small IT database efforts with service-oriented and XML-based approaches. The application of easy-to-use procedures that allow groups of all sizes to implement and experiment with searches across various databases and to use externally created tools is vital. We are sharing with the geoinformatics community the difficulties with application frameworks, user authentication, standards compliance, and data storage encountered in setting up web sites and portals for various science initiatives (e.g., ANDRILL, EARTHTIME). The ability to incorporate CHRONOS data, services, and tools into the existing framework of a group is crucial to the development of a model that supports and extends the vitality of the small- to medium-sized research effort that is essential for a vibrant scientific community. This presentation will directly address issues of portal development related to JSR-168 and other portal APIs, as well as issues related to both federated and local directory-based authentication. The application of service-oriented architecture in connection with ReST-based approaches is vital to facilitate service use by experienced and less experienced information technology groups. Application of these services with XML-based schemas allows for the connection to third-party tools such as GIS-based tools and software designed to perform a specific scientific analysis. The connection of all these capabilities into a combined framework based on the standard XHTML Document Object Model and CSS 2.0 standards used in traditional web development will be demonstrated. CHRONOS also utilizes newer client techniques such as AJAX and cross-domain scripting, along with traditional server-side database, application, and web servers. The combination of the various components of this architecture creates an environment based on open and free standards that allows for the discovery, retrieval, and integration of tools and data.
Psychology's struggle for existence: Second edition, 1913.
Wundt, Wilhelm; Lamiell, James T
2013-08-01
Presents an English translation, by James T. Lamiell (August 2012), of Wilhelm Wundt's Psychology's Struggle for Existence: Second Edition, 1913. In his essay, Wundt advised against the impending divorce of psychology from philosophy.
Hahn, Lars; Leimeister, Chris-André; Ounit, Rachid; Lonardi, Stefano; Morgenstern, Burkhard
2016-10-01
Many algorithms for sequence analysis rely on word matching or word statistics. Often, these approaches can be improved if binary patterns representing match and don't-care positions are used as a filter, such that only those positions of words are considered that correspond to the match positions of the patterns. The performance of these approaches, however, depends on the underlying patterns. Herein, we show that the overlap complexity of a pattern set, a measure introduced by Ilie and Ilie, is closely related to the variance of the number of matches between two evolutionarily related sequences with respect to this pattern set. We propose a modified hill-climbing algorithm to optimize pattern sets for database searching, read mapping and alignment-free sequence comparison of nucleic-acid sequences; our implementation of this algorithm is called rasbhari. Depending on the application at hand, rasbhari can either minimize the overlap complexity of pattern sets, maximize their sensitivity in database searching or minimize the variance of the number of pattern-based matches in alignment-free sequence comparison. We show that, for database searching, rasbhari generates pattern sets with slightly higher sensitivity than existing approaches. In our Spaced Words approach to alignment-free sequence comparison, pattern sets calculated with rasbhari led to more accurate estimates of phylogenetic distances than the randomly generated pattern sets that we previously used. Finally, we used rasbhari to generate patterns for short read classification with CLARK-S. Here too, the sensitivity of the results could be improved, compared to the default patterns of the program. We integrated rasbhari into Spaced Words; the source code of rasbhari is freely available at http://rasbhari.gobics.de/.
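The sketch below illustrates the general idea under stated assumptions: a greedy hill-climbing search over binary patterns that tries to reduce a pattern set's overlap complexity, using one common reading of Ilie and Ilie's definition (a sum over relative shifts of 2 raised to the number of aligned match positions). It is a simplified illustration, not the rasbhari implementation.

    import random

    def overlap_complexity(p, q):
        """One common reading of overlap complexity: sum over relative
        shifts of 2**(number of aligned '1' match positions)."""
        n, m = len(p), len(q)
        total = 0
        for shift in range(-(m - 1), n):
            aligned = sum(
                1 for i in range(n)
                if 0 <= i - shift < m and p[i] == "1" and q[i - shift] == "1"
            )
            total += 2 ** aligned
        return total

    def oc_set(patterns):
        # Pairwise overlap complexity, self-overlaps included.
        return sum(overlap_complexity(p, q)
                   for i, p in enumerate(patterns) for q in patterns[i:])

    def hill_climb(patterns, iters=2000):
        best = oc_set(patterns)
        for _ in range(iters):
            cand = list(patterns)
            k = random.randrange(len(cand))
            s = list(cand[k])
            ones = [i for i, c in enumerate(s) if c == "1"]
            zeros = [i for i, c in enumerate(s) if c == "0"]
            if not ones or not zeros:
                continue
            i, j = random.choice(ones), random.choice(zeros)
            s[i], s[j] = s[j], s[i]   # move one match position (weight kept)
            cand[k] = "".join(s)
            score = oc_set(cand)
            if score < best:          # greedy: accept only improvements
                patterns, best = cand, score
        return patterns, best

    seeds, oc = hill_climb(["110101", "101011", "111001"])
    print(seeds, oc)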
NASA Astrophysics Data System (ADS)
Weatherill, G. A.; Pagani, M.; Garcia, J.
2016-09-01
The creation of a magnitude-homogenized catalogue is often one of the most fundamental steps in seismic hazard analysis. The process of homogenizing multiple catalogues of earthquakes into a single unified catalogue typically requires careful appraisal of available bulletins, identification of common events within multiple bulletins and the development and application of empirical models to convert from each catalogue's native scale into the required target. The database of the International Seismological Center (ISC) provides the most exhaustive compilation of records from local bulletins, in addition to its reviewed global bulletin. New open-source tools are developed that can utilize this, or any other compiled database, to explore the relations between earthquake solutions provided by different recording networks, and to build and apply empirical models in order to harmonize magnitude scales for the purpose of creating magnitude-homogeneous earthquake catalogues. These tools are described and their application illustrated in two different contexts. The first is a simple application in the Sub-Saharan Africa region where the spatial coverage and magnitude scales for different local recording networks are compared, and their relation to global magnitude scales explored. In the second application the tools are used on a global scale for the purpose of creating an extended magnitude-homogeneous global earthquake catalogue. Several existing high-quality earthquake databases, such as the ISC-GEM and the ISC Reviewed Bulletins, are harmonized into moment magnitude to form a catalogue of more than 562 840 events. This extended catalogue, while not an appropriate substitute for a locally calibrated analysis, can help in studying global patterns in seismicity and hazard, and is therefore released with the accompanying software.
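As a hedged illustration of the empirical-conversion step, the sketch below fits a general orthogonal regression Mw = a*mb + b with scipy.odr on synthetic data, treating both magnitude scales as observed with error. It is not the released toolkit, and the coefficients and error levels are arbitrary assumptions.

    import numpy as np
    from scipy import odr

    # Synthetic paired magnitudes: both scales observed with error.
    rng = np.random.default_rng(0)
    mb = rng.uniform(4.0, 6.5, 200)
    mw_true = 0.85 * mb + 1.03                   # assumed "true" relation
    mb_obs = mb + rng.normal(0, 0.2, mb.size)
    mw_obs = mw_true + rng.normal(0, 0.15, mb.size)

    # Orthogonal distance regression accounts for uncertainty on both axes,
    # unlike ordinary least squares.
    linear = odr.Model(lambda beta, x: beta[0] * x + beta[1])
    data = odr.RealData(mb_obs, mw_obs, sx=0.2, sy=0.15)
    fit = odr.ODR(data, linear, beta0=[1.0, 0.0]).run()
    a, b = fit.beta
    print(f"Mw = {a:.3f} * mb + {b:.3f}")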
Acromegaly at diagnosis in 3173 patients from the Liège Acromegaly Survey (LAS) Database.
Petrossians, Patrick; Daly, Adrian F; Natchev, Emil; Maione, Luigi; Blijdorp, Karin; Sahnoun-Fathallah, Mona; Auriemma, Renata; Diallo, Alpha M; Hulting, Anna-Lena; Ferone, Diego; Hana, Vaclav; Filipponi, Silvia; Sievers, Caroline; Nogueira, Claudia; Fajardo-Montañana, Carmen; Carvalho, Davide; Hana, Vaclav; Stalla, Günter K; Jaffrain-Réa, Marie-Lise; Delemer, Brigitte; Colao, Annamaria; Brue, Thierry; Neggers, Sebastian J C M M; Zacharieva, Sabina; Chanson, Philippe; Beckers, Albert
2017-10-01
Acromegaly is a rare disorder caused by chronic growth hormone (GH) hypersecretion. While diagnostic and therapeutic methods have advanced, little information exists on trends in acromegaly characteristics over time. The Liège Acromegaly Survey (LAS) Database, a relational database, is designed to assess the profile of acromegaly patients at diagnosis and during long-term follow-up at multiple treatment centers. The following results were obtained at diagnosis. The study population consisted of 3173 acromegaly patients from ten countries; 54.5% were female. Males were significantly younger at diagnosis than females (43.5 vs 46.4 years; P < 0.001). The median delay from first symptoms to diagnosis was 2 years longer in females (P = 0.015). Ages at diagnosis and first symptoms increased significantly over time (P < 0.001). Tumors were larger in males than females (P < 0.001); tumor size and invasion were inversely related to patient age (P < 0.001). Random GH at diagnosis correlated with nadir GH levels during OGTT (P < 0.001). GH was inversely related to age in both sexes (P < 0.001). Diabetes mellitus was present in 27.5%, hypertension in 28.8%, sleep apnea syndrome in 25.5% and cardiac hypertrophy in 15.5%. Serious cardiovascular outcomes like stroke, heart failure and myocardial infarction were present in <5% at diagnosis. Erythrocyte levels were increased and correlated with IGF-1 values. Thyroid nodules were frequent (34.0%); 820 patients had colonoscopy at diagnosis and 13% had polyps. Osteoporosis was present at diagnosis in 12.3%, and 0.6-4.4% had experienced a fracture. In conclusion, this study of >3100 patients is the largest international acromegaly database and shows clinically relevant trends in the characteristics of acromegaly at diagnosis.
Martín-González, Sofía; Navarro-Mesa, Juan L; Juliá-Serdá, Gabriel; Ramírez-Ávila, G Marcelo; Ravelo-García, Antonio G
2018-01-01
Our contribution focuses on the characterization of sleep apnea from a cardiac rate point of view, using Recurrence Quantification Analysis (RQA) based on a Heart Rate Variability (HRV) feature selection process. Three parameters are crucial in RQA: those related to the embedding process (dimension and delay) and the threshold distance. There are no generally accepted parameters for the study of HRV using RQA in sleep apnea. We focus on finding an overall acceptable combination, sweeping a range of values for each of them simultaneously. Together with the commonly used RQA measures, we include features related to recurrence times and features originating in complex network theory. To the best of our knowledge, no author has used them all for sleep apnea previously. The best performing feature subset is entered into a Linear Discriminant classifier. The best results in the "Apnea-ECG Physionet database" and the "HuGCDN2014 database" are, according to the area under the receiver operating characteristic curve, 0.93 (accuracy: 86.33%) and 0.86 (accuracy: 84.18%), respectively. Using a relatively small set of features, our system outperforms previously existing studies in the context of sleep apnea. We conclude that working with dimensions around 7-8 and delays of about 4-5, and using the Fixed Amount of Nearest Neighbours (FAN) method with 5% of neighbours for the threshold distance, yields the best results. We would therefore recommend these reference values for future work applying RQA to the analysis of HRV in sleep apnea. We also conclude that, together with the commonly used vertical and diagonal RQA measures, the newly used features contribute valuable information for discriminating apnea minutes, and are therefore especially interesting for characterization purposes. Using two different databases supports that the conclusions reached are potentially generalizable and are not limited by database variability.
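A minimal sketch of the recurrence construction these results rest on, assuming a synthetic RR-interval series: time-delay embedding with the recommended dimension (7) and delay (4), then a Fixed Amount of Nearest Neighbours threshold so each point recurs with 5% of the others.

    import numpy as np

    def embed(x, dim=7, delay=4):
        """Time-delay embedding of a 1-D series into dim-dimensional vectors."""
        n = len(x) - (dim - 1) * delay
        return np.column_stack([x[i * delay:i * delay + n] for i in range(dim)])

    # Synthetic RR-interval-like series (stand-in for real HRV data).
    rr = np.sin(np.linspace(0, 40, 600)) + 0.1 * np.random.default_rng(1).normal(size=600)
    v = embed(rr)
    d = np.linalg.norm(v[:, None, :] - v[None, :, :], axis=-1)  # pairwise distances

    # FAN threshold: each point recurs with its 5% nearest neighbours.
    k = max(1, int(0.05 * len(v)))
    nn = np.argsort(d, axis=1)[:, 1:k + 1]   # skip self (column 0)
    R = np.zeros_like(d, dtype=bool)
    np.put_along_axis(R, nn, True, axis=1)   # note: FAN matrices are asymmetric

    print("recurrence rate:", R.mean())      # ~0.05 by construction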
NASA Astrophysics Data System (ADS)
Gross, M. B.; Mayernik, M. S.; Rowan, L. R.; Khan, H.; Boler, F. M.; Maull, K. E.; Stott, D.; Williams, S.; Corson-Rikert, J.; Johns, E. M.; Daniels, M. D.; Krafft, D. B.
2015-12-01
UNAVCO, UCAR, and Cornell University are working together to leverage semantic web technologies to enable discovery of people, datasets, publications and other research products, as well as the connections between them. The EarthCollab project, an EarthCube Building Block, is enhancing an existing open-source semantic web application, VIVO, to address connectivity gaps across distributed networks of researchers and resources related to the following two geoscience-based communities: (1) the Bering Sea Project, an interdisciplinary field program whose data archive is hosted by NCAR's Earth Observing Laboratory (EOL), and (2) UNAVCO, a geodetic facility and consortium that supports diverse research projects informed by geodesy. People, publications, datasets and grant information have been mapped to an extended version of the VIVO-ISF ontology and ingested into VIVO's database. Data is ingested using a custom set of scripts that include the ability to perform basic automated and curated disambiguation. VIVO can display a page for every object ingested, including connections to other objects in the VIVO database. A dataset page, for example, includes the dataset type, time interval, DOI, related publications, and authors. The dataset type field provides a connection to all other datasets of the same type. The author's page will show, among other information, related datasets and co-authors. Information previously spread across several unconnected databases is now stored in a single location. In addition to VIVO's default display, the new database can also be queried using SPARQL, a query language for semantic data. EarthCollab will also extend the VIVO web application. One such extension is the ability to cross-link separate VIVO instances across institutions, allowing local display of externally curated information. For example, Cornell's VIVO faculty pages will display UNAVCO's dataset information and UNAVCO's VIVO will display Cornell faculty member contact and position information. Additional extensions, including enhanced geospatial capabilities, will be developed following task-centered usability testing.
Information system of mineral deposits in Slovenia
NASA Astrophysics Data System (ADS)
Hribernik, K.; Rokavec, D.; Šinigioj, J.; Šolar, S.
2010-03-01
At the Geological Survey of Slovenia, the need for a comprehensive overview and control of the deposits of available non-metallic mineral raw materials and of their exploitation became urgent. Within the framework of the Geologic Information System, we established the Database of non-metallic mineral deposits, comprising all important data on deposits and concession holders. The relational database was built with the MS Access program package, with a planned transfer to SQL Server in 2008. The register contains 272 deposits and 200 concession holders. The mineral resources information system of Slovenia, started in 2002, consists of two integrated parts: the aforementioned relational database of mineral deposits, which relates information in tabular form so that the rules of relational algebra can be applied, and a geographic information system (GIS), which relates the spatial information of the deposits. The complex relationships between objects, together with the concept of normalized data structures, lead to a practical, informative and useful data model that is transparent to the user and supports better decision-making by allowing future scenarios to be developed and inspected. The computerized storage and display system is, as noted, developed and managed with the support of the Geological Survey of Slovenia, which conducts research on the occurrence, quality, quantity and availability of mineral resources in order to help the nation make informed decisions using earth-science information. Information about each deposit is stored in a record of approximately one hundred data fields, and a numeric record number uniquely identifies each site. The data fields are grouped under principal categories. Each record comprises elementary data on the deposit (name, type, location, prospect, rock), administrative data (concession holder, number of the decree in the official gazette, object of the decree, number of the contract and its duration) and data on the mineral resource (produced amount and size of the exploration area). The data can also be searched, sorted and printed using any of these fields. New records are added annually, and existing records are updated or upgraded. The relational database is connected with scanned exploration/exploitation areas of the deposits, defined on the basis of digital orthophotos. A register of these areas is indispensable for spatial planning and for municipal and regional spatial development strategies. The database is also part of an internet application for quick search and review of the data, and part of the web page on the mineral resources of Slovenia. The technology chosen for the internet application is ESRI's ArcIMS Internet Map Server. ArcIMS allows users to readily display, analyze and interpret spatial data from the desktop using a web browser connected to the Internet. We believe there is an opportunity for cooperation within this activity: we can offer a single location where users can browse relatively simply for geoscience-related digital data sets.
Nonparametric Bayesian Modeling for Automated Database Schema Matching
DOE Office of Scientific and Technical Information (OSTI.GOV)
Ferragut, Erik M; Laska, Jason A
2015-01-01
The problem of merging databases arises in many government and commercial applications. Schema matching, a common first step, identifies equivalent fields between databases. We introduce a schema matching framework that builds nonparametric Bayesian models for each field and compares them by computing the probability that a single model could have generated both fields. Our experiments show that our method is more accurate and faster than the existing instance-based matching algorithms in part because of the use of nonparametric Bayesian models.
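To illustrate the flavor of this model comparison (not the authors' actual nonparametric models), the toy sketch below treats two categorical fields as Dirichlet-multinomial samples and computes the posterior probability, under equal priors, that a single shared model generated both.

    import numpy as np
    from scipy.special import gammaln

    def log_marginal(counts, alpha=1.0):
        """log P(data | one Dirichlet-multinomial model) for a vector of
        category counts, with a symmetric Dirichlet(alpha) prior."""
        counts = np.asarray(counts, dtype=float)
        k, n = counts.size, counts.sum()
        return (gammaln(k * alpha) - gammaln(n + k * alpha)
                + np.sum(gammaln(counts + alpha)) - k * gammaln(alpha))

    def p_same_source(counts_a, counts_b):
        """Posterior probability (equal priors) that both fields were
        generated by a single shared model rather than two models."""
        joint = log_marginal(np.add(counts_a, counts_b))
        split = log_marginal(counts_a) + log_marginal(counts_b)
        return 1.0 / (1.0 + np.exp(split - joint))

    # Value histograms over a shared category vocabulary, e.g. ("M","F","U"):
    print(p_same_source([40, 55, 5], [38, 57, 5]))   # similar fields -> high
    print(p_same_source([40, 55, 5], [90, 5, 5]))    # dissimilar -> low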
yStreX: yeast stress expression database
Wanichthanarak, Kwanjeera; Nookaew, Intawat; Petranovic, Dina
2014-01-01
Over the past decade, genome-wide expression analyses have often been used to study how the expression of genes changes in response to various environmental stresses. Many of these studies (such as effects of oxygen concentration, temperature stress, low pH stress, osmotic stress, depletion or limitation of nutrients, addition of different chemical compounds, etc.) have been conducted in the unicellular eukaryal model, the yeast Saccharomyces cerevisiae. However, the lack of a unifying, integrated bioinformatics platform that would permit efficient and rapid use of all these existing data remains an important issue. To facilitate research by exploiting existing transcription data in the field of yeast physiology, we have developed the yStreX database. It is an online repository of analyzed gene expression data from curated data sets from different studies that capture genome-wide transcriptional changes in response to diverse environmental transitions. The first aim of this online database is to facilitate comparison of cross-platform and cross-laboratory gene expression data. Additionally, we performed different expression analyses, meta-analyses and gene set enrichment analyses, and the results are also deposited in this database. Lastly, we constructed a user-friendly web interface with interactive visualization to provide intuitive access and to display the queried data for users with no background in bioinformatics. Database URL: http://www.ystrexdb.com PMID:25024351
Integrating In Silico Resources to Map a Signaling Network
Liu, Hanqing; Beck, Tim N.; Golemis, Erica A.; Serebriiskii, Ilya G.
2013-01-01
The abundance of publicly available life science databases offer a wealth of information that can support interpretation of experimentally derived data and greatly enhance hypothesis generation. Protein interaction and functional networks are not simply new renditions of existing data: they provide the opportunity to gain insights into the specific physical and functional role a protein plays as part of the biological system. In this chapter, we describe different in silico tools that can quickly and conveniently retrieve data from existing data repositories and discuss how the available tools are best utilized for different purposes. While emphasizing protein-protein interaction databases (e.g., BioGrid and IntAct), we also introduce metasearch platforms such as STRING and GeneMANIA, pathway databases (e.g., BioCarta and Pathway Commons), text mining approaches (e.g., PubMed and Chilibot), and resources for drug-protein interactions, genetic information for model organisms and gene expression information based on microarray data mining. Furthermore, we provide a simple step-by-step protocol to building customized protein-protein interaction networks in Cytoscape, a powerful network assembly and visualization program, integrating data retrieved from these various databases. As we illustrate, generation of composite interaction networks enables investigators to extract significantly more information about a given biological system than utilization of a single database or sole reliance on primary literature. PMID:24233784
Maier, Dieter; Kalus, Wenzel; Wolff, Martin; Kalko, Susana G; Roca, Josep; Marin de Mas, Igor; Turan, Nil; Cascante, Marta; Falciani, Francesco; Hernandez, Miguel; Villà-Freixa, Jordi; Losko, Sascha
2011-03-05
To enhance our understanding of complex biological systems like diseases we need to put all of the available data into context and use this to detect relations, pattern and rules which allow predictive hypotheses to be defined. Life science has become a data rich science with information about the behaviour of millions of entities like genes, chemical compounds, diseases, cell types and organs, which are organised in many different databases and/or spread throughout the literature. Existing knowledge such as genotype-phenotype relations or signal transduction pathways must be semantically integrated and dynamically organised into structured networks that are connected with clinical and experimental data. Different approaches to this challenge exist but so far none has proven entirely satisfactory. To address this challenge we previously developed a generic knowledge management framework, BioXM™, which allows the dynamic, graphic generation of domain specific knowledge representation models based on specific objects and their relations supporting annotations and ontologies. Here we demonstrate the utility of BioXM for knowledge management in systems biology as part of the EU FP6 BioBridge project on translational approaches to chronic diseases. From clinical and experimental data, text-mining results and public databases we generate a chronic obstructive pulmonary disease (COPD) knowledge base and demonstrate its use by mining specific molecular networks together with integrated clinical and experimental data. We generate the first semantically integrated COPD specific public knowledge base and find that for the integration of clinical and experimental data with pre-existing knowledge the configuration based set-up enabled by BioXM reduced implementation time and effort for the knowledge base compared to similar systems implemented as classical software development projects. The knowledgebase enables the retrieval of sub-networks including protein-protein interaction, pathway, gene--disease and gene--compound data which are used for subsequent data analysis, modelling and simulation. Pre-structured queries and reports enhance usability; establishing their use in everyday clinical settings requires further simplification with a browser based interface which is currently under development.
Processing SPARQL queries with regular expressions in RDF databases.
Lee, Jinsoo; Pham, Minh-Duc; Lee, Jihwan; Han, Wook-Shin; Cho, Hune; Yu, Hwanjo; Lee, Jeong-Hoon
2011-03-29
As the Resource Description Framework (RDF) data model is widely used for modeling and sharing a lot of online bioinformatics resources such as Uniprot (dev.isb-sib.ch/projects/uniprot-rdf) or Bio2RDF (bio2rdf.org), SPARQL - a W3C recommendation query for RDF databases - has become an important query language for querying the bioinformatics knowledge bases. Moreover, due to the diversity of users' requests for extracting information from the RDF data as well as the lack of users' knowledge about the exact value of each fact in the RDF databases, it is desirable to use the SPARQL query with regular expression patterns for querying the RDF data. To the best of our knowledge, there is currently no work that efficiently supports regular expression processing in SPARQL over RDF databases. Most of the existing techniques for processing regular expressions are designed for querying a text corpus, or only for supporting the matching over the paths in an RDF graph. In this paper, we propose a novel framework for supporting regular expression processing in SPARQL query. Our contributions can be summarized as follows. 1) We propose an efficient framework for processing SPARQL queries with regular expression patterns in RDF databases. 2) We propose a cost model in order to adapt the proposed framework in the existing query optimizers. 3) We build a prototype for the proposed framework in C++ and conduct extensive experiments demonstrating the efficiency and effectiveness of our technique. Experiments with a full-blown RDF engine show that our framework outperforms the existing ones by up to two orders of magnitude in processing SPARQL queries with regular expression patterns.
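For readers unfamiliar with the query pattern at issue, the sketch below runs a SPARQL query with a regular-expression FILTER over a tiny in-memory RDF graph using rdflib. It shows what such queries look like; the paper's contribution is making them efficient inside RDF engines, which this plain rdflib example does not attempt. All URIs and data are illustrative.

    from rdflib import Graph, Literal, Namespace, RDF

    EX = Namespace("http://example.org/")
    g = Graph()
    g.add((EX.p1, RDF.type, EX.Protein))
    g.add((EX.p1, EX.name, Literal("kinase ABC1")))
    g.add((EX.p2, RDF.type, EX.Protein))
    g.add((EX.p2, EX.name, Literal("phosphatase XYZ")))

    # SPARQL with a regular-expression pattern: find proteins whose name
    # starts with "kin", case-insensitively.
    q = """
    PREFIX ex: <http://example.org/>
    SELECT ?s ?name WHERE {
      ?s a ex:Protein ;
         ex:name ?name .
      FILTER regex(?name, "^kin", "i")
    }
    """
    for row in g.query(q):
        print(row.s, row.name)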
Yang, Wei; Xie, Yanming; Zhuang, Yan
2011-10-01
There are many kinds of Chinese traditional patent medicine used in clinical practice, and many adverse events have been reported by clinical professionals. The safety problems of Chinese patent medicines are of great concern to patients and physicians. At present, many researchers inside and outside China have studied methods for the post-marketing re-evaluation of Chinese medicine safety. However, re-evaluating the safety of post-marketing Chinese traditional patent medicines using data from hospital information systems (HIS) is rare. Real-world HIS databases are a good resource, rich in information, for researching medicine safety. This study planned to analyze HIS data selected from ten top general hospitals in Beijing, forming a large real-world HIS database with a capacity of 1 000 000 cases in total after a series of data cleaning and integration procedures. This study could be a new project that uses such information to evaluate traditional Chinese medicine safety based on a HIS database. A clear protocol has been completed as the first step of the whole study, as follows. First, separate each of the Chinese traditional patent medicines existing in the total HIS database into a single database. Second, select related laboratory test indexes as the safety-evaluation outcomes, such as routine blood, routine urine, feces routine, conventional coagulation, liver function, kidney function and other tests. Third, use data mining methods to analyze which of the selected safety outcomes changed abnormally before and after use of the Chinese patent medicine. Finally, judge the relationship between those abnormal changes and the Chinese patent medicine. We hope this method can provide useful information to Chinese medicine researchers interested in the safety evaluation of traditional Chinese medicine.
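A hedged sketch of the protocol's core comparison, with an illustrative schema (a real HIS schema would differ): for one medicine, a laboratory index is compared before and after first exposure, and suspicious rises are flagged. The table names, columns, and threshold are assumptions for illustration only.

    import sqlite3

    con = sqlite3.connect(":memory:")
    con.executescript("""
    CREATE TABLE exposure (patient_id INTEGER, drug TEXT, start_day INTEGER);
    CREATE TABLE lab      (patient_id INTEGER, day INTEGER, alt REAL);
    INSERT INTO exposure VALUES (1, 'medicine_X', 10);
    INSERT INTO lab VALUES (1, 8, 30.0), (1, 15, 95.0);
    """)
    rows = con.execute("""
    SELECT e.patient_id,
           MAX(CASE WHEN l.day <  e.start_day THEN l.alt END) AS alt_before,
           MAX(CASE WHEN l.day >= e.start_day THEN l.alt END) AS alt_after
    FROM exposure e JOIN lab l USING (patient_id)
    WHERE e.drug = 'medicine_X'
    GROUP BY e.patient_id
    HAVING alt_after > 2 * alt_before   -- flag suspicious post-exposure rises
    """).fetchall()
    print(rows)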
[Establishment of a comprehensive database for laryngeal cancer related genes and the miRNAs].
Li, Mengjiao; E, Qimin; Liu, Jialin; Huang, Tingting; Liang, Chuanyu
2015-09-01
By collecting and analyzing laryngeal cancer related genes and miRNAs, we aimed to build a comprehensive database of laryngeal cancer related genes which, unlike current biological information databases with complex and unwieldy structures, focuses on the theme of genes and miRNAs, making research and teaching more convenient and efficient. Based on a B/S (browser/server) architecture, using Apache as the web server, MySQL as the database management system and PHP as the web scripting language, a comprehensive database for laryngeal cancer related genes was established, providing gene tables, protein tables, miRNA tables and clinical information tables for patients with laryngeal cancer. The established database contains 207 laryngeal cancer related genes, 243 proteins and 26 miRNAs, with detailed information such as mutations, methylations, differential expression, and the empirical references of laryngeal cancer relevant molecules. The database can be accessed and operated via the Internet, through which browsing and retrieval of the information are performed. The database is maintained and updated regularly. The database for laryngeal cancer related genes is resource-integrated and user-friendly, providing a genetic information query tool for the study of laryngeal cancer.
Nelson, Michelle L A; McKellar, Kaileah A; Yi, Juliana; Kelloway, Linda; Munce, Sarah; Cott, Cheryl; Hall, Ruth; Fortin, Martin; Teasell, Robert; Lyons, Renee
2017-07-01
Most strokes occur in the context of other medical diagnoses. Currently, stroke rehabilitation evidence reviews have not synthesized or presented evidence with a focus on comorbidities, and correspondingly may not align with the current patient population. The purpose of this review was to determine the extent and nature of randomized controlled trial stroke rehabilitation evidence that included patients with multimorbidity. A systematic scoping review was conducted. Electronic databases were searched using a combination of terms related to "stroke" and "rehabilitation." Selection criteria captured inpatient rehabilitation studies. Methods were modified to account for the amount of literature, classified by study design, and randomized controlled trials (RCTs) were abstracted. The database search yielded 10771 unique articles. Screening resulted in 428 included RCTs. Three studies explicitly included patients with a comorbid condition. Fifteen percent of articles did not specify which additional conditions were excluded. Impaired cognition was the most commonly excluded condition. Approximately 37% of articles excluded patients who had experienced a previous stroke. Twenty-four percent excluded patients with one or more Charlson Index conditions, and 83% excluded patients with at least one other medical condition. This review represents a first attempt to map the literature on stroke rehabilitation related to co/multimorbidity and identify gaps in existing research. Existing evidence on stroke rehabilitation often excluded individuals with comorbidities. This is problematic, as the evidence that is used to generate clinical guidelines may not match the patient typically seen in practice. Alternate research methods are therefore needed for studying the care of individuals with stroke and multimorbidity.
NASA Astrophysics Data System (ADS)
Lebedeva, Liudmila; Semenova, Olga
2013-04-01
One widely claimed problem in modern hydrological modelling is the lack of available information for investigating hydrological processes and improving their representation in models. In spite of this, one can hardly say with confidence that existing "traditional" data sources have already been fully analyzed and made use of. In the USSR there existed a network of research watersheds, called water-balance stations, where comprehensive and extensive hydrometeorological measurements were conducted according to a more or less uniform program over the last 40-60 years. The program (where it has not ceased) includes observations of discharge in several, often nested and homogeneous, small watersheds, meteorological elements, evaporation, soil temperature and moisture, snow depth, etc. The network covered different climatic and landscape zones and was established in the middle of the last century with the aim of investigating runoff formation in different conditions. Until recently, the long-term observational data, accompanied by descriptions and maps, existed only in hard copy. This partly explains why these datasets are still insufficiently exploited and have very rarely, or even never, been used for hydrological modelling, although they seem much more promising than the introduction of completely new measuring techniques, without detracting from the importance of the latter. The goal of the presented work is the development of a database of observational data and supporting materials from small research watersheds across the territory of the former Soviet Union. The first version of the database will include the following information for 12 water-balance stations across Russia, Ukraine, Kazakhstan and Turkmenistan: daily values of discharge (one or several watersheds), air temperature, humidity, precipitation (one or several gauges), soil and snow state variables, and soil and snow evaporation. The stations will cover desert and semi-desert, steppe and forest-steppe, forest, permafrost and mountainous zones. Supporting material will include maps of watershed boundaries and locations of observational sites. Text descriptions of the data, measuring techniques and hydrometeorological conditions related to each water-balance station will accompany the datasets. The database is expected to be expanded over time in the number of stations (by 20) and in the data series available for each of them. It will be uploaded to the internet with open access for everyone interested. Such a database allows one to test hydrological models and separate modules for their adequacy and workability in different conditions, and can serve as a basis for model comparison and evaluation. Models that rely not on calibration but on adequate process representation and the use of observable parameters will benefit especially from the database. One such model, the process-based Hydrograph model, will be tested against the data from every watershed in the developed database. The aim of applying the Hydrograph model to as many data-rich research watersheds in different climatic zones as possible is both to amend the algorithms and to create and adjust model parameters that allow the model to be used across the geographic spectrum.
Jordan, Kelvin; Clarke, Alexandra M; Symmons, Deborah PM; Fleming, Douglas; Porcheret, Mark; Kadam, Umesh T; Croft, Peter
2007-01-01
Background Primary care consultation data are an important source of information on morbidity prevalence. It is not known how reliable such figures are. Aim To compare annual consultation prevalence estimates for musculoskeletal conditions derived from four general practice consultation databases. Design of study Retrospective study of general practice consultation records. Setting Three national general practice consultation databases: i) Fourth Morbidity Statistics from General Practice (MSGP4, 1991/92), ii) Royal College of General Practitioners Weekly Returns Service (RCGP WRS, 2001), and iii) General Practice Research Database (GPRD, 1991 and 2001); and one regional database (Consultations in Primary Care Archive, 2001). Method Age-sex standardised annual prevalence rates of persons consulting for musculoskeletal conditions overall, rheumatoid arthritis, osteoarthritis and arthralgia were derived for patients aged 15 years and over. Results GPRD prevalence of any musculoskeletal condition, rheumatoid arthritis and osteoarthritis was lower than that of the other databases. This is likely to be due to GPs not needing to record every consultation made for a chronic condition. MSGP4 gave the highest prevalence for osteoarthritis but a low prevalence of arthralgia, which reflects encouragement for GPs to use diagnostic rather than symptom codes. Conclusion Considerable variation exists in consultation prevalence estimates for musculoskeletal conditions. Researchers and health service planners should be aware that estimates of disease occurrence based on consultations will be influenced by the choice of database. This is likely to be true for other chronic diseases and where alternative symptom labels exist for a disease. RCGP WRS may give the most reliable prevalence figures for musculoskeletal and other chronic diseases. PMID:17244418
Federated web-accessible clinical data management within an extensible neuroimaging database.
Ozyurt, I Burak; Keator, David B; Wei, Dingying; Fennema-Notestine, Christine; Pease, Karen R; Bockholt, Jeremy; Grethe, Jeffrey S
2010-12-01
Managing vast datasets collected throughout multiple clinical imaging communities has become critical with the ever increasing and diverse nature of datasets. Development of data management infrastructure is further complicated by technical and experimental advances that drive modifications to existing protocols and acquisition of new types of research data to be incorporated into existing data management systems. In this paper, an extensible data management system for clinical neuroimaging studies is introduced: The Human Clinical Imaging Database (HID) and Toolkit. The database schema is constructed to support the storage of new data types without changes to the underlying schema. The complex infrastructure allows management of experiment data, such as image protocol and behavioral task parameters, as well as subject-specific data, including demographics, clinical assessments, and behavioral task performance metrics. Of significant interest, embedded clinical data entry and management tools enhance both consistency of data reporting and automatic entry of data into the database. The Clinical Assessment Layout Manager (CALM) allows users to create on-line data entry forms for use within and across sites, through which data is pulled into the underlying database via the generic clinical assessment management engine (GAME). Importantly, the system is designed to operate in a distributed environment, serving both human users and client applications in a service-oriented manner. Querying capabilities use a built-in multi-database parallel query builder/result combiner, allowing web-accessible queries within and across multiple federated databases. The system along with its documentation is open-source and available from the Neuroimaging Informatics Tools and Resource Clearinghouse (NITRC) site.
HTAPP: High-Throughput Autonomous Proteomic Pipeline
Yu, Kebing; Salomon, Arthur R.
2011-01-01
Recent advances in the speed and sensitivity of mass spectrometers and in analytical methods, the exponential acceleration of computer processing speeds, and the availability of genomic databases from an array of species and protein information databases have led to a deluge of proteomic data. The development of a lab-based automated proteomic software platform for the automated collection, processing, storage, and visualization of expansive proteomic datasets is critically important. The high-throughput autonomous proteomic pipeline (HTAPP) described here is designed from the ground up to provide critically important flexibility for diverse proteomic workflows and to streamline the total analysis of a complex proteomic sample. This tool is comprised of software that controls the acquisition of mass spectral data along with automation of post-acquisition tasks such as peptide quantification, clustered MS/MS spectral database searching, statistical validation, and data exploration within a user-configurable lab-based relational database. The software design of HTAPP focuses on accommodating diverse workflows and providing missing software functionality to a wide range of proteomic researchers to accelerate the extraction of biological meaning from immense proteomic data sets. Although individual software modules in our integrated technology platform may have some similarities to existing tools, the true novelty of the approach described here is in the synergistic and flexible combination of these tools to provide an integrated and efficient analysis of proteomic samples. PMID:20336676
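HTAPP itself is a full platform; as a caricature of the chained post-acquisition processing it describes, here is a sketch where each stage is a function applied in order (the stage names and outputs are invented for illustration, not HTAPP's modules):

```python
# Caricature of a post-acquisition pipeline: each stage takes and returns
# a dict of results; stage names are illustrative, not HTAPP's modules.
def quantify_peptides(data):
    data["quantified"] = True
    return data

def search_spectra(data):
    data["matches"] = ["PEPTIDEK"]  # stub database-search result
    return data

def validate_statistics(data):
    data["fdr"] = 0.01  # stub statistical validation
    return data

PIPELINE = [quantify_peptides, search_spectra, validate_statistics]

def run_pipeline(raw):
    for stage in PIPELINE:
        raw = stage(raw)
    return raw

print(run_pipeline({"run_id": "ms_run_001"}))
```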
CMD: a Cotton Microsatellite Database resource for Gossypium genomics
Blenda, Anna; Scheffler, Jodi; Scheffler, Brian; Palmer, Michael; Lacape, Jean-Marc; Yu, John Z; Jesudurai, Christopher; Jung, Sook; Muthukumar, Sriram; Yellambalase, Preetham; Ficklin, Stephen; Staton, Margaret; Eshelman, Robert; Ulloa, Mauricio; Saha, Sukumar; Burr, Ben; Liu, Shaolin; Zhang, Tianzhen; Fang, Deqiu; Pepper, Alan; Kumpatla, Siva; Jacobs, John; Tomkins, Jeff; Cantrell, Roy; Main, Dorrie
2006-01-01
Background The Cotton Microsatellite Database (CMD) is a curated and integrated web-based relational database providing centralized access to publicly available cotton microsatellites, an invaluable resource for basic and applied research in cotton breeding. Description At present CMD contains publication, sequence, primer, mapping and homology data for nine major cotton microsatellite projects, collectively representing 5,484 microsatellites. In addition, CMD displays data for three of the microsatellite projects that have been screened against a panel of core germplasm. The standardized panel consists of 12 diverse genotypes including genetic standards, mapping parents, BAC donors, subgenome representatives, unique breeding lines, exotic introgression sources, and contemporary Upland cottons with significant acreage. A suite of online microsatellite data mining tools are accessible at CMD. These include an SSR server which identifies microsatellites, primers, open reading frames, and GC-content of uploaded sequences; BLAST and FASTA servers providing sequence similarity searches against the existing cotton SSR sequences and primers, a CAP3 server to assemble EST sequences into longer transcripts prior to mining for SSRs, and CMap, a viewer for comparing cotton SSR maps. Conclusion The collection of publicly available cotton SSR markers in a centralized, readily accessible and curated web-enabled database provides a more efficient utilization of microsatellite resources and will help accelerate basic and applied research in molecular breeding and genetic mapping in Gossypium spp. PMID:16737546
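CMD's SSR server is a web tool, but the core task it performs, spotting microsatellites as short tandem repeats in a sequence, can be sketched with a regular expression; the unit-length and repeat-count thresholds below are illustrative assumptions, not CMD's actual settings:

```python
import re

# Find simple sequence repeats (SSRs): motifs of 2-6 bp repeated at least
# 5 times. Thresholds are illustrative, not CMD's actual settings.
def find_ssrs(seq, min_unit=2, max_unit=6, min_repeats=5):
    pattern = re.compile(
        r"([ACGT]{%d,%d}?)\1{%d,}" % (min_unit, max_unit, min_repeats - 1)
    )
    hits = []
    for m in pattern.finditer(seq):
        unit = m.group(1)
        hits.append((m.start(), unit, len(m.group(0)) // len(unit)))
    return hits

print(find_ssrs("TTTTATATATATATATGGC"))  # -> [(4, 'AT', 6)]
```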
Pseudonymisation of radiology data for research purposes
NASA Astrophysics Data System (ADS)
Noumeir, Rita; Lemay, Alain; Lina, Jean-Marc
2005-04-01
Medical image processing methods and algorithms developed by researchers need to be validated and tested. Test data should ideally be real clinical data, especially when that clinical data is varied and exists in large volume. Nowadays, clinical data is accessible electronically and has important value for researchers. However, the use of clinical data for research purposes must respect data confidentiality, the patient's right to privacy, and patient consent. Clinical data is nominative, given that it contains information about the patient such as name, age and identification number. Evidently, clinical data should be de-identified before being exported to research databases. However, the same patient is usually followed over a long period of time, and the disease progression and diagnostic evolution represent extremely valuable information for researchers as well. Our objective is to build a research database from de-identified clinical data while enabling the database to be easily incremented by exporting new pseudonymous data acquired over a long period of time. Pseudonymisation is data de-identification such that data belonging to the same individual in the clinical environment bear the same relation to each other in the de-identified research version. In this paper, we propose a software architecture that enables the implementation of a research database that can be incremented in time. We also evaluate its security and discuss its security pitfalls.
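The paper proposes an architecture rather than prescribing an algorithm; one common way to realize pseudonymisation, so that the same clinical identifier always maps to the same research pseudonym across successive exports, is a keyed hash. A minimal sketch under that assumption:

```python
import hmac, hashlib

# Keyed-hash pseudonymisation: deterministic (same input -> same pseudonym)
# but not invertible without the secret key. The key must be managed inside
# the clinical environment and never exported with the data.
SECRET_KEY = b"replace-with-securely-stored-key"

def pseudonymize(patient_id: str) -> str:
    digest = hmac.new(SECRET_KEY, patient_id.encode("utf-8"), hashlib.sha256)
    return digest.hexdigest()[:16]  # shortened for readability

# The same patient exported in 2004 and again in 2005 receives the same
# pseudonym, so longitudinal follow-up is preserved in the research database.
assert pseudonymize("HOSP-0042") == pseudonymize("HOSP-0042")
print(pseudonymize("HOSP-0042"))
```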
A review of affecting factors on sexual satisfaction in women.
Shahhosseini, Zohreh; Gardeshi, Zeinab Hamzeh; Pourasghar, Mehdi; Salehi, Fariba
2014-12-01
Sex is a complex, important and sensitive aspect of human life, interwoven with the whole of human existence. Given the serious changes in attitudes, function and behavior related to sex, the need to address sexual function, and especially sexual satisfaction, is keenly felt. Sexual satisfaction plays a very important role in creating marital satisfaction, and any deficit in sexual satisfaction is significantly associated with risky sexual behaviors, serious mental illness, social crimes and, ultimately, divorce. The aim of this study was to explore the factors affecting sexual satisfaction in women based on an overview of scientific databases. In this narrative review, the researchers searched the MEDLINE database, Google Scholar and Science Direct, as well as Persian databases such as the Scientific Information Database, using the search terms "sexual satisfaction" and "sexual function", restricted to English/Persian language publications from the past 20 years. Articles written by renowned experts were then selected. In this regard, 57 articles were reviewed, of which 30 related to this research were extracted. The findings were divided into four categories: demographic factors, pathophysiological factors, psychological factors and sociocultural factors. Sexuality, and especially sexual intimacy, is a sophisticated and yet subtle affair that different persons define and experience differently. Discrepancies in the results of the studies show that analysis of the factors affecting sexual satisfaction without regard to women's sociocultural context, religious beliefs and personal attitudes is undoubtedly inefficient, unscientific and irrational.
Time and Space Efficient Algorithms for Two-Party Authenticated Data Structures
NASA Astrophysics Data System (ADS)
Papamanthou, Charalampos; Tamassia, Roberto
Authentication is increasingly relevant to data management. Data is being outsourced to untrusted servers and clients want to securely update and query their data. For example, in database outsourcing, a client's database is stored and maintained by an untrusted server. Also, in simple storage systems, clients can store very large amounts of data but at the same time, they want to assure their integrity when they retrieve them. In this paper, we present a model and protocol for two-party authentication of data structures. Namely, a client outsources its data structure and verifies that the answers to the queries have not been tampered with. We provide efficient algorithms to securely outsource a skip list with logarithmic time overhead at the server and client and logarithmic communication cost, thus providing an efficient authentication primitive for outsourced data, both structured (e.g., relational databases) and semi-structured (e.g., XML documents). In our technique, the client stores only a constant amount of space, which is optimal. Our two-party authentication framework can be deployed on top of existing storage applications, thus providing an efficient authentication service. Finally, we present experimental results that demonstrate the practical efficiency and scalability of our scheme.
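The authors' construction authenticates a skip list with logarithmic overhead; the underlying two-party pattern, in which the client keeps only a constant-size digest and verifies each answer against it, can be illustrated with a plain Merkle tree (a simplification for exposition, not the paper's scheme):

```python
import hashlib

def H(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()

def build_tree(leaves):
    # level 0 = hashed leaves; odd levels padded by duplicating the last node.
    level = [H(x) for x in leaves]
    levels = [level]
    while len(level) > 1:
        if len(level) % 2:
            level = level + [level[-1]]
        level = [H(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
        levels.append(level)
    return levels

def prove(levels, index):
    # Server-side: collect sibling hashes along the path to the root.
    proof = []
    for level in levels[:-1]:
        if len(level) % 2:
            level = level + [level[-1]]
        proof.append((level[index ^ 1], index % 2))  # (sibling, is_right_child)
        index //= 2
    return proof

def verify(root, leaf, proof):
    # Client-side: recompute the root from the answer plus the proof.
    h = H(leaf)
    for sibling, is_right in proof:
        h = H(sibling + h) if is_right else H(h + sibling)
    return h == root

records = [b"row1", b"row2", b"row3", b"row4"]
levels = build_tree(records)          # server side
root = levels[-1][0]                  # client keeps only this digest
proof = prove(levels, 2)              # server answers a query for row index 2
print(verify(root, b"row3", proof))   # client checks: True
```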
PhyloExplorer: a web server to validate, explore and query phylogenetic trees
Ranwez, Vincent; Clairon, Nicolas; Delsuc, Frédéric; Pourali, Saeed; Auberval, Nicolas; Diser, Sorel; Berry, Vincent
2009-01-01
Background Many important problems in evolutionary biology require molecular phylogenies to be reconstructed. Phylogenetic trees must then be manipulated for subsequent inclusion in publications or analyses such as supertree inference and tree comparisons. However, no tool is currently available to facilitate the management of tree collections providing, for instance: standardisation of taxon names among trees with respect to a reference taxonomy; selection of relevant subsets of trees or sub-trees according to a taxonomic query; or simply computation of descriptive statistics on the collection. Moreover, although several databases of phylogenetic trees exist, there is currently no easy way to find trees that are both relevant and complementary to a given collection of trees. Results We propose a tool to facilitate assessment and management of phylogenetic tree collections. Given an input collection of rooted trees, PhyloExplorer provides facilities for obtaining statistics describing the collection, correcting invalid taxon names, extracting taxonomically relevant parts of the collection using a dedicated query language, and identifying related trees in the TreeBASE database. Conclusion PhyloExplorer is a simple and interactive website implemented through underlying Python libraries and MySQL databases. It is available at: and the source code can be downloaded from: . PMID:19450253
What do data used to develop ground-motion prediction equations tell us about motions near faults?
Boore, David M.
2014-01-01
A large database of ground motions from shallow earthquakes occurring in active tectonic regions around the world, recently developed in the Pacific Earthquake Engineering Center’s NGA-West2 project, has been used to investigate what such a database can say about the properties and processes of crustal fault zones. There are a relatively small number of near-rupture records, implying that few recordings in the database are within crustal fault zones, but the records that do exist emphasize the complexity of ground-motion amplitudes and polarization close to individual faults. On average over the whole data set, however, the scaling of ground motions with magnitude at a fixed distance, and the distance dependence of the ground motions, seem to be largely consistent with simple seismological models of source scaling, path propagation effects, and local site amplification. The data show that ground motions close to large faults, as measured by elastic response spectra, tend to saturate and become essentially constant for short periods. This saturation seems to be primarily a geometrical effect, due to the increasing size of the rupture surface with magnitude, and not due to a breakdown in self similarity.
How to explain variations in sea cliff erosion rate?
NASA Astrophysics Data System (ADS)
Prémaillon, Melody; Regard, Vincent; Dewez, Thomas
2017-04-01
Every rocky coast in the world is eroding, at rates (cliff retreat rates) that vary from site to site. Erosion is caused by a complex interaction of multiple marine and weather factors. While numerous local studies exist and explain erosion processes at specific sites, global studies are lacking. We have begun to compile many of these local studies and analyse their results from a global point of view in order to quantify the various parameters influencing erosion rates. In other words: is erosion greater in more energetic seas? Do chalk cliffs erode faster in rainy environments? etc. To this end, we built a database based on the literature and national erosion databases. It now contains 80 publications, representing 2,500 studied cliffs and more than 3,500 erosion rate estimates. A statistical analysis was conducted on this database. To a first approximation, cliff lithology is the only clear signal explaining variation in erosion rate: hard lithologies erode at 1 cm/y or less, whereas unconsolidated lithologies commonly erode faster than 10 cm/y. No clear statistical relations were found between erosion rate and external parameters such as sea energy (swell, tide) or weather conditions, even for cliffs of similar lithology.
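A sketch of the first-order analysis described, median retreat rate stratified by lithology class, using pandas; the column names and numbers are invented for illustration:

```python
import pandas as pd

# Toy records in the spirit of the compiled database; values are invented.
df = pd.DataFrame({
    "lithology": ["hard", "hard", "chalk", "unconsolidated", "unconsolidated"],
    "retreat_m_per_yr": [0.005, 0.01, 0.1, 0.15, 0.3],
})

# First-order signal reported by the study: lithology stratifies erosion rate.
print(df.groupby("lithology")["retreat_m_per_yr"].median())
```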
Smith, W Brad; Cuenca Lara, Rubí Angélica; Delgado Caballero, Carina Edith; Godínez Valdivia, Carlos Isaías; Kapron, Joseph S; Leyva Reyes, Juan Carlos; Meneses Tovar, Carmen Lourdes; Miles, Patrick D; Oswalt, Sonja N; Ramírez Salgado, Mayra; Song, Xilong Alex; Stinson, Graham; Villela Gaytán, Sergio Armando
2018-05-21
Forests cannot be managed sustainably without reliable data to inform decisions. National Forest Inventories (NFI) tend to report national statistics, with sub-national stratification based on domestic ecological classification systems. It is becoming increasingly important to be able to report statistics on ecosystems that span international borders, as global change and globalization expand stakeholders' spheres of concern. The state of a transnational ecosystem can only be properly assessed by examining the entire ecosystem. In global forest resource assessments, it may be useful to break national statistics down by ecosystem, especially for large countries. The Inventory and Monitoring Working Group (IMWG) of the North American Forest Commission (NAFC) has begun developing a harmonized North American Forest Database (NAFD) for managing forest inventory data, enabling consistent, continental-scale forest assessment supporting ecosystem-level reporting and relational queries. The first iteration of the database contains data describing 1.9 billion ha, including 677.5 million ha of forest. Data harmonization is made challenging by the existence of definitions and methodologies tailored to suit national circumstances, emerging from each country's professional forestry development. This paper reports the methods used to synchronize three national forest inventories, starting with a small suite of variables and attributes.
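Harmonization of this kind ultimately rests on crosswalks from national attribute definitions to shared classes; a toy sketch of the idea (all codes and classes below are invented, not NAFD variables):

```python
# Toy crosswalk from national land-cover codes to a harmonized class.
# All codes and classes here are invented for illustration.
CROSSWALK = {
    ("CAN", "treed_wetland"): "forest",
    ("USA", "timberland"): "forest",
    ("MEX", "bosque_templado"): "forest",
    ("USA", "rangeland"): "non-forest",
}

def harmonize(country, national_class):
    try:
        return CROSSWALK[(country, national_class)]
    except KeyError:
        raise ValueError(f"no harmonized mapping for {country}/{national_class}")

print(harmonize("MEX", "bosque_templado"))  # -> forest
```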
Web application and database modeling of traffic impact analysis using Google Maps
NASA Astrophysics Data System (ADS)
Yulianto, Budi; Setiono
2017-06-01
Traffic impact analysis (TIA) is a traffic study that aims to identify the impact of traffic generated by development or a change in land use. In addition to identifying the traffic impact, TIA also includes mitigation measures to minimize the resulting impact. TIA has become increasingly important since it was defined in legislation as one of the requirements for a Building Permit proposal. The legislation has encouraged a number of TIA studies in various cities in Indonesia, including Surakarta. For that reason, it is necessary to study the development of TIA by adopting the concept of Transportation Impact Control (TIC) in the implementation of the standard TIA document and multimodal modeling. This includes TIA standardization for technical guidelines, a database, and inspection by providing TIA checklists, monitoring and evaluation. The research was undertaken by collecting historical data on junctions, modeling the data as a relational database, and building a user interface for CRUD (Create, Read, Update and Delete) operations on the TIA data as a web application with Google Maps libraries. The result of the research is a system that provides information supporting the improvement and repair of existing TIA documents, making them more transparent, reliable and credible.
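The paper does not specify its server stack; assuming a Python/Flask backend purely for illustration (our substitution), the CRUD layer over junction records might look like the following, with invented field names:

```python
from flask import Flask, jsonify, request

app = Flask(__name__)
junctions = {}  # in-memory stand-in for the relational database
next_id = 1

@app.post("/junctions")
def create():
    global next_id
    junctions[next_id] = request.get_json()  # e.g. {"lat":..., "lng":..., "flow":...}
    next_id += 1
    return jsonify(id=next_id - 1), 201

@app.get("/junctions/<int:jid>")
def read(jid):
    return jsonify(junctions[jid])

@app.put("/junctions/<int:jid>")
def update(jid):
    junctions[jid] = request.get_json()
    return jsonify(ok=True)

@app.delete("/junctions/<int:jid>")
def delete(jid):
    junctions.pop(jid, None)
    return jsonify(ok=True)

# The front end would plot these records as map markers via the Google Maps
# JavaScript API; app.run() starts the development server.
```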
Automatic image database generation from CAD for 3D object recognition
NASA Astrophysics Data System (ADS)
Sardana, Harish K.; Daemi, Mohammad F.; Ibrahim, Mohammad K.
1993-06-01
The development and evaluation of multiple-view 3-D object recognition systems is based on a large set of model images. Due to the various advantages of using CAD, it is becoming more and more practical to use existing CAD data in computer vision systems. Current PC-level CAD systems are capable of physical image modelling and rendering involving positional variations in cameras, light sources, etc. We have formulated a modular scheme for the automatic generation of various aspects (views) of the objects in a model-based 3-D object recognition system. These views are generated at desired orientations on the unit Gaussian sphere. With a suitable network file sharing system (NFS), the images can be stored directly in a database located on a file server. This paper presents the image modelling solutions using CAD in relation to the multiple-view approach. Our modular scheme for data conversion and automatic image database storage for such a system is discussed. We have used this approach in 3-D polyhedron recognition. An overview of the results, advantages and limitations of using CAD data, and conclusions on using such a scheme are also presented.
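One standard way to place view directions roughly uniformly on the unit (Gaussian) sphere is the Fibonacci spiral; a sketch of generating candidate camera orientations follows (the paper does not state that it used this particular sampling):

```python
import math

def fibonacci_sphere(n):
    """Return n roughly uniform unit vectors (candidate camera directions)."""
    points = []
    golden = math.pi * (3.0 - math.sqrt(5.0))  # golden angle
    for i in range(n):
        z = 1.0 - 2.0 * (i + 0.5) / n          # slice the sphere evenly in z
        r = math.sqrt(1.0 - z * z)
        theta = golden * i
        points.append((r * math.cos(theta), r * math.sin(theta), z))
    return points

for v in fibonacci_sphere(8):
    print("render CAD model with camera at %.2f %.2f %.2f" % v)
```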
What Do Data Used to Develop Ground-Motion Prediction Equations Tell Us About Motions Near Faults?
NASA Astrophysics Data System (ADS)
Boore, David M.
2014-11-01
A large database of ground motions from shallow earthquakes occurring in active tectonic regions around the world, recently developed in the Pacific Earthquake Engineering Center's NGA-West2 project, has been used to investigate what such a database can say about the properties and processes of crustal fault zones. There are a relatively small number of near-rupture records, implying that few recordings in the database are within crustal fault zones, but the records that do exist emphasize the complexity of ground-motion amplitudes and polarization close to individual faults. On average over the whole data set, however, the scaling of ground motions with magnitude at a fixed distance, and the distance dependence of the ground motions, seem to be largely consistent with simple seismological models of source scaling, path propagation effects, and local site amplification. The data show that ground motions close to large faults, as measured by elastic response spectra, tend to saturate and become essentially constant for short periods. This saturation seems to be primarily a geometrical effect, due to the increasing size of the rupture surface with magnitude, and not due to a breakdown in self similarity.
LitVar: a semantic search engine for linking genomic variant data in PubMed and PMC.
Allot, Alexis; Peng, Yifan; Wei, Chih-Hsuan; Lee, Kyubum; Phan, Lon; Lu, Zhiyong
2018-05-14
The identification and interpretation of genomic variants play a key role in the diagnosis of genetic diseases and related research. These tasks increasingly rely on accessing relevant manually curated information from domain databases (e.g. SwissProt or ClinVar). However, due to the sheer volume of medical literature and high cost of expert curation, curated variant information in existing databases are often incomplete and out-of-date. In addition, the same genetic variant can be mentioned in publications with various names (e.g. 'A146T' versus 'c.436G>A' versus 'rs121913527'). A search in PubMed using only one name usually cannot retrieve all relevant articles for the variant of interest. Hence, to help scientists, healthcare professionals, and database curators find the most up-to-date published variant research, we have developed LitVar for the search and retrieval of standardized variant information. In addition, LitVar uses advanced text mining techniques to compute and extract relationships between variants and other associated entities such as diseases and chemicals/drugs. LitVar is publicly available at https://www.ncbi.nlm.nih.gov/CBBresearch/Lu/Demo/LitVar.
A Relational Database System for Student Use.
ERIC Educational Resources Information Center
Fertuck, Len
1982-01-01
Describes an APL implementation of a relational database system suitable for use in a teaching environment in which database development and database administration are studied, and discusses the functions of the user and the database administrator. An appendix illustrating system operation and an eight-item reference list are attached. (Author/JL)
Historical hydrology and database on flood events (Apulia, southern Italy)
NASA Astrophysics Data System (ADS)
Lonigro, Teresa; Basso, Alessia; Gentile, Francesco; Polemio, Maurizio
2014-05-01
Historical data about floods represent an important tool for understanding hydrological processes, and for estimating hazard scenarios as a basis for Civil Protection purposes and rational land-use management, especially in karstic areas where time series of river flows are not available and river drainage is rare. This research shows the importance of improving an existing flood database through a historical approach aimed at collecting records of past floods, in order to better assess the occurrence and trend of floods, here for the Apulia region (southern Italy). The main source of records of flood events for Apulia was the AVI database (the acronym stands for Italian damaged areas), an existing Italian database that collects data concerning damaging floods from 1918 to 1996. The database was expanded by consulting newspapers, publications, and technical reports from 1996 to 2006. To extend the temporal range further, data were collected by searching the archives of regional libraries: about 700 useful news items from 17 different local newspapers were found for the period from 1876 to 1951. From a critical analysis of the news items collected from 1876 to 1952, only 437 proved useful for the implementation of the Apulia database. Screening these items showed the occurrence of about 122 flood events in the entire region. The district of Bari, the regional capital, is the area in which the greatest number of events occurred; the historical analysis confirms this area as flood-prone. There is an overlapping period (1918-1952) between the old AVI database and the new historical dataset obtained from newspapers. For this period, the historical research highlighted new flood events not reported in the existing AVI database and also allowed more details to be added to events already recorded. This study shows that the database is a dynamic instrument, which allows continuous updating of the data, even in real time. More details on previous results of this research activity have been published recently (Polemio, 2010; Basso et al., 2012; Lonigro et al., 2013). References: Basso A., Lonigro T. and Polemio M. (2012) "The improvement of historical database on damaging hydrogeological events in the case of Apulia (Southern Italy)". Rendiconti Online della Società Geologica Italiana, 21: 379-380; Lonigro T., Basso A. and Polemio M. (2013) "Historical database on damaging hydrogeological events in Apulia region (Southern Italy)". Rendiconti Online della Società Geologica Italiana, 24: 196-198; Polemio M. (2010) "Historical floods and a recent extreme rainfall event in the Murgia karstic environment (Southern Italy)". Zeitschrift für Geomorphologie, 54(2): 195-219.
Popova, Svetlana; Yaltonskaya, Aleksandra; Yaltonsky, Vladimir; Kolpakov, Yaroslav; Abrosimov, Ilya; Pervakov, Kristina; Tanner, Valeria; Rehm, Jürgen
2014-01-01
Aims: Although Russia has one of the highest rates of alcohol consumption and alcohol-attributable burden of disease, little is known about the existing research on prenatal alcohol exposure (PAE) and Fetal Alcohol Spectrum Disorders (FASDs) in this country. The objective of this study was to locate and review published and unpublished studies related to any aspect of PAE and FASD conducted in or using study populations from Russia. Methods: A systematic literature search was conducted in multiple English and Russian electronic bibliographic databases. In addition, a manual search was conducted in several major libraries in Moscow. Results: The search revealed a small pool of existing research studies related to PAE and/or FASD in Russia (126: 22 in English and 104 in Russian). Existing epidemiological data indicate a high prevalence of PAE and FASD, which underlines the strong negative impact that alcohol has on mortality, morbidity and disability in Russia. High levels of alcohol consumption by women of childbearing age, low levels of contraception use, and low levels of knowledge by health and other professionals regarding the harmful effects of PAE put this country at great risk of further alcohol-affected pregnancies. Conclusions: Alcohol preventive measures in Russia warrant immediate attention. More research focused on alcohol prevention and policy is needed in order to reduce alcohol-related harm, especially in the field of FASD. PMID:24158024
Makridis, Kostas G; Tosounidis, Theodoros; Giannoudis, Peter V
2013-01-01
Implant-related sepsis is a relatively unusual complication of intramedullary (IM) nail fixation of long bone fractures. Depending on the extent of infection, the timing of diagnosis and the progress of fracture union, different treatment strategies have been developed. The aim of this review article is to collect and analyze the existing evidence on the incidence and management of infection following IM nailing of long bone fractures and to recommend treatment algorithms that could be valuable in everyday clinical practice. A search of the PubMed/Medline databases yielded 1270 articles related to the topic from the last 20 years. The final review included 28 articles that fulfilled the inclusion criteria. Only a few prospective studies exist that report on the management of infection following IM nailing of long-bone fractures. In general, stage I (early) infections only require antibiotic administration with/without debridement. Stage II (delayed) infections can be successfully treated with debridement, IM reaming, antibiotic nails, and administration of antibiotics. Infected non-unions are best treated with exchange nailing and antibiotic administration and, once infection has been eradicated, with graft implantation if needed. Debridement, exchange nailing and systemic administration of antibiotics are the treatment of choice for stage III (late) infections, while stage III infected non-unions can successfully be treated with nail removal and an Ilizarov frame, especially when large bone defects exist. PMID:23919097
Narayanan, Shrikanth; Toutios, Asterios; Ramanarayanan, Vikram; Lammert, Adam; Kim, Jangwon; Lee, Sungbok; Nayak, Krishna; Kim, Yoon-Chul; Zhu, Yinghua; Goldstein, Louis; Byrd, Dani; Bresch, Erik; Ghosh, Prasanta; Katsamanis, Athanasios; Proctor, Michael
2014-01-01
USC-TIMIT is an extensive database of multimodal speech production data, developed to complement existing resources available to the speech research community and with the intention of being continuously refined and augmented. The database currently includes real-time magnetic resonance imaging data from five male and five female speakers of American English. Electromagnetic articulography data have also been collected from four of these speakers. The two modalities were recorded in two independent sessions while the subjects produced the same 460-sentence corpus used previously in the MOCHA-TIMIT database. In both cases the audio signal was recorded and synchronized with the articulatory data. The database and companion software are freely available to the research community. PMID:25190403
Milc, Justyna; Sala, Antonio; Bergamaschi, Sonia; Pecchioni, Nicola
2011-01-01
The CEREALAB database aims to store genotypic and phenotypic data obtained by the CEREALAB project and to integrate them with already existing data sources in order to create a tool for plant breeders and geneticists. The database can help them in unravelling the genetics of economically important phenotypic traits; in identifying and choosing molecular markers associated to key traits; and in choosing the desired parentals for breeding programs. The database is divided into three sub-schemas corresponding to the species of interest: wheat, barley and rice; each sub-schema is then divided into two sub-ontologies, regarding genotypic and phenotypic data, respectively. Database URL: http://www.cerealab.unimore.it/jws/cerealab.jnlp PMID:21247929
NASA Astrophysics Data System (ADS)
Gentry, Jeffery D.
2000-05-01
A relational database is a powerful tool for collecting and analyzing the vast amounts of interrelated data associated with the manufacture of composite materials. A relational database contains many individual database tables that store data related in some fashion. Manufacturing process variables as well as quality assurance measurements can be collected and stored in database tables indexed according to lot number, part type or individual serial number. Relationships between the manufacturing process and product quality can then be examined over a wide range of product types and process variations. This paper presents details on how relational databases are used to collect, store, and analyze process variables and quality assurance data associated with the manufacture of advanced composite materials. Important considerations are covered, including how the various types of data are organized and how relationships between the data are defined. Employing relational database techniques to establish correlative relationships between process variables and quality assurance measurements is then explored. Finally, the benefits of database techniques such as data warehousing, data mining and web-based client/server architectures are discussed in the context of composite material manufacturing.
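A toy version of the lot-indexed linkage described, with process variables in one table and quality assurance measurements in another joined by lot number; the schema and values are invented for illustration:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE process (lot TEXT, cure_temp_c REAL, cure_time_min REAL);
CREATE TABLE qa      (lot TEXT, void_content_pct REAL);
INSERT INTO process VALUES ('L001', 177, 120), ('L002', 182, 110);
INSERT INTO qa      VALUES ('L001', 0.8), ('L002', 1.9);
""")

# Correlate manufacturing conditions with quality outcomes across lots.
for row in conn.execute("""
    SELECT p.lot, p.cure_temp_c, q.void_content_pct
    FROM process p JOIN qa q ON p.lot = q.lot
    ORDER BY q.void_content_pct"""):
    print(row)
```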
Is electroconvulsive therapy during pregnancy safe?
Jiménez-Cornejo, Magdalena; Zamorano-Levi, Natalia; Jeria, Álvaro
2016-12-07
Therapeutic options for psychiatric conditions are limited during pregnancy because many drugs are restricted or contraindicated. Electroconvulsive therapy constitutes an alternative; however, there is controversy over its safety. Using the Epistemonikos database, which is maintained by searching multiple databases, we found five systematic reviews, together including 81 studies describing case series or individual cases. Data were extracted from the identified reviews and summary tables of the results were prepared using the GRADE method. We concluded that it is not clear what the risks associated with electroconvulsive therapy during pregnancy are, because the certainty of the existing evidence is very low. Likewise, existing systematic reviews and international clinical guidelines differ in their conclusions and recommendations.
New meteor showers – yes or not?
NASA Astrophysics Data System (ADS)
Koukal, Jakub
2018-01-01
The development of meteor astronomy associated with advances in CCD technology is reflected in a huge increase in the size of meteor orbit databases. Never before in the history of meteor astronomy has it been possible to examine the properties of meteors and meteor showers on such a scale. Existing methods for detecting new meteor showers appear inadequate under these circumstances. The spontaneous discovery of new meteor showers leads to ambiguous specifications: already discovered showers are duplicated, and existing showers are split on the basis of each author's own criteria. The analysis in this article examines some of the new meteor showers in the IAU MDC database.
Joseph‐Williams, Natalie; Edwards, Adrian; Elwyn, Glyn
2011-01-01
Abstract Background or context Regret is a common consequence of decisions, including those decisions related to individuals’ health. Several assessment instruments have been developed that attempt to measure decision regret. However, recent research has highlighted the complexity of regret. Given its relevance to shared decision making, it is important to understand its conceptualization and the instruments used to measure it. Objectives To review current conceptions of regret. To systematically identify instruments used to measure decision regret and assess whether they capture recent conceptualizations of regret. Search strategy Five electronic databases were searched in 2008. Search strategies used a combination of MeSH terms (or database equivalent) and free text searching under the following key headings: ‘Decision’ and ‘regret’ and ‘measurement’. Follow‐up manual searches were also performed. Inclusion criteria Articles were included if they reported the development and psychometric testing of an instrument designed to measure decision regret, or the use of a previously developed and tested instrument. Main results Thirty‐two articles were included: 10 report the development and validation of an instrument that measures decision regret and 22 report the use of a previously developed and tested instrument. Content analysis found that existing instruments for the measurement of regret do not capture current conceptualizations of regret and they do not enable the construct of regret to be measured comprehensively. Conclusions Existing instrumentation requires further development. There is also a need to clarify the purpose for using regret assessment instruments as this will, and should, focus their future application. PMID:20860776
Using the TIGR gene index databases for biological discovery.
Lee, Yuandan; Quackenbush, John
2003-11-01
The TIGR Gene Index web pages provide access to analyses of ESTs and gene sequences for nearly 60 species, as well as a number of resources derived from these. Each species-specific database is presented using a common format with a homepage. A variety of methods exist that allow users to search each species-specific database. Methods implemented currently include nucleotide or protein sequence queries using WU-BLAST, text-based searches using various sequence identifiers, searches by gene, tissue and library name, and searches using functional classes through Gene Ontology assignments. This protocol provides guidance for using the Gene Index Databases to extract information.
Centralized database for interconnection system design. [for spacecraft
NASA Technical Reports Server (NTRS)
Billitti, Joseph W.
1989-01-01
A database application called DFACS (Database, Forms and Applications for Cabling and Systems) is described. The objective of DFACS is to improve the speed and accuracy of interconnection system information flow during the design and fabrication stages of a project, while simultaneously supporting both the horizontal (end-to-end wiring) and the vertical (wiring by connector) design stratagems used by the Jet Propulsion Laboratory (JPL) project engineering community. The DFACS architecture is centered around a centralized database and program methodology which emulates the manual design process hitherto used at JPL. DFACS has been tested and successfully applied to existing JPL hardware tasks with a resulting reduction in schedule time and costs.
Toward a view-oriented approach for aligning RDF-based biomedical repositories.
Anguita, A; García-Remesal, M; de la Iglesia, D; Graf, N; Maojo, V
2015-01-01
This article is part of the Focus Theme of Methods of Information in Medicine on "Managing Interoperability and Complexity in Health Systems". The need for complementary access to multiple RDF databases has fostered new lines of research, but also entailed new challenges due to data representation disparities. While several approaches for RDF-based database integration have been proposed, those focused on schema alignment have become the most widely adopted. All state-of-the-art solutions for aligning RDF-based sources resort to a simple technique inherited from legacy relational database integration methods. This technique, known as element-to-element (e2e) mapping, is based on establishing 1:1 mappings between single primitive elements (e.g. concepts, attributes, relationships) belonging to the source and target schemas. However, due to the intrinsic nature of RDF, a representation language based on defining tuples <subject, predicate, object>, one may find RDF elements whose semantics vary dramatically when combined into a view involving other RDF elements; that is, their meaning depends on their context. Such elements cannot be adequately represented in the target schema by the traditional e2e approach. These approaches fail to properly address this issue without explicitly modifying the target ontology, and thus lack the expressiveness required to reflect the intended semantics in the alignment information. Our objective is to enhance existing RDF schema alignment techniques with a mechanism to properly represent elements with context-dependent semantics, thus enabling users to perform more expressive alignments, including scenarios that cannot be adequately addressed by existing approaches. Instead of establishing 1:1 correspondences between single primitive elements of the schemas, we propose adopting a view-based approach. The latter is targeted at establishing mapping relationships between RDF subgraphs, which can be regarded as the equivalent of views in traditional databases, rather than between single schema elements. This approach enables users to represent scenarios defined by context-dependent RDF elements that cannot be properly represented with the currently existing approaches. We developed a software tool implementing our view-based strategy. Our tool is currently being used in the context of the European Commission funded p-medicine project, targeted at creating a technological framework to integrate clinical and genomic data to facilitate the development of personalized drugs and therapies for cancer, based on the genetic profile of the patient. We used our tool to integrate different RDF-based databases, including different repositories of clinical trials and DICOM images, using the Health Data Ontology Trunk (HDOT) ontology as the target schema. The importance of database integration methods and tools in the context of biomedical research has been widely recognized. Modern research in this area, e.g. the identification of disease biomarkers or the design of personalized therapies, heavily relies on the availability of a technical framework enabling researchers to uniformly access disparate repositories. We present a method and a tool implementing a novel alignment approach specifically designed to support and enhance the integration of RDF-based data sources at the schema (metadata) level. This approach provides an increased level of expressiveness compared to other existing solutions and allows solving heterogeneity scenarios that cannot be properly represented using other state-of-the-art techniques.
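To give a flavor of view-based (subgraph-to-subgraph) mapping as opposed to 1:1 element mapping, here is a sketch using a SPARQL CONSTRUCT query run with rdflib; the vocabulary URIs are invented, and HDOT's real terms differ:

```python
from rdflib import Graph

g = Graph()
g.parse(data="""
@prefix src: <http://example.org/src#> .
src:p1 src:hasMeasurement src:m1 .
src:m1 src:testName "glucose" ; src:value "5.4" .
""", format="turtle")

# The pair (hasMeasurement + testName="glucose") only means "glucose level"
# in context, so it is mapped as a subgraph (a view), not element by element.
view_mapping = """
PREFIX src: <http://example.org/src#>
PREFIX tgt: <http://example.org/tgt#>
CONSTRUCT { ?patient tgt:glucoseLevel ?v . }
WHERE {
  ?patient src:hasMeasurement ?m .
  ?m src:testName "glucose" ; src:value ?v .
}
"""
target = Graph()
for triple in g.query(view_mapping):
    target.add(triple)
print(target.serialize(format="turtle"))
```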
A Relational Algebra Query Language for Programming Relational Databases
ERIC Educational Resources Information Center
McMaster, Kirby; Sambasivam, Samuel; Anderson, Nicole
2011-01-01
In this paper, we describe a Relational Algebra Query Language (RAQL) and Relational Algebra Query (RAQ) software product we have developed that allows database instructors to teach relational algebra through programming. Instead of defining query operations using mathematical notation (the approach commonly taken in database textbooks), students…
ERIC Educational Resources Information Center
Brown, Cecelia
2003-01-01
Discusses the growth in use and acceptance of Web-based genomic and proteomic databases (GPD) in scholarly communication. Confirms the role of GPD in the scientific literature cycle, suggests GPD are a storage and retrieval mechanism for molecular biology information, and recommends that existing models of scientific communication be updated to…
Enhancements to the Redmine Database Metrics Plug in
2017-08-01
The Redmine project management web application has been adopted within the US Army Research Laboratory's Computational and Information Sciences Directorate as a database...
ERIC Educational Resources Information Center
Micco, Mary; Popp, Rich
Techniques for building a world-wide information infrastructure by reverse engineering existing databases to link them in a hierarchical system of subject clusters to create an integrated database are explored. The controlled vocabulary of the Library of Congress Subject Headings is used to ensure consistency and group similar items. Each database…
ERIC Educational Resources Information Center
Justice, Laura M.; Breit-Smith, Allison; Rogers, Margaret
2010-01-01
Purpose: This clinical forum was organized to provide a means for informing the research and clinical communities of one mechanism through which research capacity might be enhanced within the field of speech-language pathology. Specifically, forum authors describe the process of conducting secondary analyses of extant databases to answer questions…
Kim, Jong Hyun; Hong, Hyung Gil; Park, Kang Ryoung
2017-05-08
Because intelligent surveillance systems have recently undergone rapid growth, research on accurately detecting humans in videos captured at a long distance is growing in importance. The existing research using visible light cameras has mainly focused on methods of human detection for daytime hours when there is outside light, but human detection during nighttime hours when there is no outside light is difficult. Thus, methods that employ additional near-infrared (NIR) illuminators and NIR cameras or thermal cameras have been used. However, in the case of NIR illuminators, there are limitations in terms of the illumination angle and distance. There are also difficulties because the illuminator power must be adaptively adjusted depending on whether the object is close or far away. In the case of thermal cameras, their cost is still high, which makes it difficult to install and use them in a variety of places. Because of this, research has been conducted on nighttime human detection using visible light cameras, but this has focused on objects at a short distance in an indoor environment or the use of video-based methods to capture multiple images and process them, which causes problems related to the increase in the processing time. To resolve these problems, this paper presents a method that uses a single image captured at night on a visible light camera to detect humans in a variety of environments based on a convolutional neural network. Experimental results using a self-constructed Dongguk night-time human detection database (DNHD-DB1) and two open databases (Korea advanced institute of science and technology (KAIST) and computer vision center (CVC) databases), as well as high-accuracy human detection in a variety of environments, show that the method has excellent performance compared to existing methods.
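As a stand-in for the paper's custom network (whose design and weights are not reproduced here), an off-the-shelf pretrained detector from torchvision can illustrate single-image person detection; the score cut-off and file name below are arbitrary choices:

```python
import torch
from torchvision.models.detection import fasterrcnn_resnet50_fpn
from torchvision.io import read_image
from torchvision.transforms.functional import convert_image_dtype

# Generic substitute for the paper's CNN: a pretrained Faster R-CNN.
model = fasterrcnn_resnet50_fpn(weights="DEFAULT").eval()

img = convert_image_dtype(read_image("night_scene.jpg"), torch.float)
with torch.no_grad():
    out = model([img])[0]

# COCO class 1 is "person"; keep confident detections only.
for box, label, score in zip(out["boxes"], out["labels"], out["scores"]):
    if label.item() == 1 and score.item() > 0.5:
        print("person at", [round(v) for v in box.tolist()],
              "score", round(score.item(), 2))
```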
NASA Astrophysics Data System (ADS)
Sailor, David J.; Georgescu, Matei; Milne, Jeffrey M.; Hart, Melissa A.
2015-10-01
Given the increasing utility of numerical models for examining urban impacts on meteorology and climate, there exists an urgent need for accurate representation of seasonally and diurnally varying anthropogenic heating data, an important component of the urban energy budget for cities across the world. Incorporation of anthropogenic heating data as inputs to existing climate modeling systems has direct societal implications ranging from improved prediction of energy demand to health assessment, but such data are lacking for most cities. To address this deficiency we have applied a standardized procedure to develop a national database of seasonally and diurnally varying anthropogenic heating profiles for 61 of the largest cities in the United States (U.S.). Recognizing the importance of spatial scale, the anthropogenic heating database includes both the city scale and the accompanying greater metropolitan area. Our analysis reveals that a single profile function can adequately represent anthropogenic heating during summer, but two profile functions are required in winter: one for warm-climate cities and another for cold-climate cities. On average, although anthropogenic heating is 40% larger in winter than in summer, the electricity sector's contribution peaks during summer and is smallest in winter. Because such data are similarly required for international cities where urban climate assessments are also ongoing, we have made a simple adjustment accounting for different international energy consumption rates relative to the U.S. to generate seasonally and diurnally varying anthropogenic heating profiles for a range of global cities. The methodological approach presented here is flexible and straightforwardly applicable to cities not modeled because of presently unavailable data. Because of the anticipated increase in global urban populations for many decades to come, characterizing this fundamental aspect of the urban environment, anthropogenic heating, is an essential element toward continued progress in urban climate assessment.
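The shape of such profiles can be sketched as a diurnal function scaled by a city's energy-use magnitude; the functional form and numbers below are illustrative only, not the published profile functions:

```python
import math

def anthropogenic_heat(hour, q_mean=30.0, amplitude=0.5, peak_hour=14):
    """Toy diurnal profile, W m^-2: a mean level modulated by a daily cycle.

    q_mean, amplitude and peak_hour are illustrative; the published
    profiles use separate summer and winter (warm/cold city) forms.
    """
    phase = 2.0 * math.pi * (hour - peak_hour) / 24.0
    return q_mean * (1.0 + amplitude * math.cos(phase))

# Crude international adjustment: scale by per-capita energy use vs. the US.
def adjusted(hour, energy_ratio_vs_us):
    return anthropogenic_heat(hour) * energy_ratio_vs_us

print([round(anthropogenic_heat(h), 1) for h in (3, 9, 14, 21)])
```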
A Novel Concept for the Search and Retrieval of the Derwent Markush Resource Database.
Barth, Andreas; Stengel, Thomas; Litterst, Edwin; Kraut, Hans; Matuszczyk, Henry; Ailer, Franz; Hajkowski, Steve
2016-05-23
The representation of and search for generic chemical structures (Markush) remains a continuing challenge. Several research groups have addressed this problem, and over time a limited number of practical solutions have been proposed. Today there are two large commercial providers of Markush databases: Chemical Abstracts Service (CAS) and Thomson Reuters. The Thomson Reuters "Derwent" Markush database is currently offered via the online services Questel and STN and as a data feed for in-house use. The aim of this paper is to briefly review the existing Markush systems (databases plus search engines) and to describe our new approach for the implementation of the Derwent Markush Resource on STN. Our new approach demonstrates the integration of the Derwent Markush Resource database into the existing chemistry-focused STN platform without loss of detail. This provides compatibility with other structure and Markush databases on STN and at the same time makes it possible to deploy the specific features and functions of the Derwent approach. It is shown that the different Markush languages developed by CAS and Derwent can be combined into a single general Markush description. In this concept the generic nodes are grouped together in a unique hierarchy where all chemical elements and fragments can be integrated. As a consequence, both systems are searchable using a single structure query. Moreover, the presented concept could serve as a promising starting point for a common generalized description of Markush structures.
A dynamic clinical dental relational database.
Taylor, D; Naguib, R N G; Boulton, S
2004-09-01
The traditional approach to relational database design is based on the logical organization of data into a number of related normalized tables. One assumption is that the nature and structure of the data is known at the design stage. In the case of designing a relational database to store historical dental epidemiological data from individual clinical surveys, the structure of the data is not known until the data is presented for inclusion into the database. This paper addresses the issues concerned with the theoretical design of a clinical dynamic database capable of adapting the internal table structure to accommodate clinical survey data, and presents a prototype database application capable of processing, displaying, and querying the dental data.
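One common way to let a relational schema absorb survey variables that are unknown at design time is an entity-attribute-value (EAV) layout; a sketch of that general idea follows (the paper describes its own design, which need not be exactly this):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE survey  (survey_id INTEGER PRIMARY KEY, name TEXT, year INTEGER);
CREATE TABLE finding (survey_id INTEGER, subject_id TEXT,
                      attribute TEXT, value TEXT);  -- EAV triple
""")
conn.execute("INSERT INTO survey VALUES (1, 'Region A schools', 1998)")
# New clinical variables need no schema change -- just new attribute names.
conn.executemany("INSERT INTO finding VALUES (1, ?, ?, ?)", [
    ("child17", "dmft_score", "3"),
    ("child17", "fluorosis_index", "0.5"),   # added later, no ALTER TABLE
])
for row in conn.execute("SELECT * FROM finding WHERE attribute='dmft_score'"):
    print(row)
```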
Macfarlane, P.A.
2009-01-01
Regional aquifers in thick sequences of continentally derived heterolithic deposits, such as the High Plains of the North American Great Plains, are difficult to characterize hydrostratigraphically because of their framework complexity and the lack of high-quality subsurface information from drill cores and geophysical logs. However, using a database of carefully evaluated drillers' and sample logs and commercially available visualization software, it is possible to qualitatively characterize these complex frameworks based on the concept of relative permeability. Relative permeability is the permeable fraction of a deposit expressed as a percentage of its total thickness. In this methodology, uncemented coarse and fine sediments are arbitrarily set at relative permeabilities of 100% and 0%, respectively, with allowances made for log entries containing descriptions of mixed lithologies, heterolithic strata, and cementation. To better understand the arrangement of high- and low-permeability domains within the High Plains aquifer, a pilot study was undertaken in southwest Kansas to create three-dimensional visualizations of relative permeability using a database of >3000 logs. Aggregate relative permeability ranges up to 99% with a mean of 51%. Laterally traceable, thick domains of >80% relative permeability embedded within a lower relative permeability matrix strongly suggest that preferred pathways for lateral and vertical water transmission exist within the aquifer. Similarly, domains with relative permeabilities of <45% are traceable laterally over appreciable distances in the sub-surface and probably act as leaky confining layers. This study shows that the aquifer does not consist solely of local, randomly distributed, hydrostratigraphic units, as suggested by previous studies. ?? 2009 Geological Society of America.
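The aggregate measure reduces to a thickness-weighted average over log intervals; a sketch using the stated end-member values, with the mixed-lithology adjustment simplified to a single assumed 50% weight:

```python
# Relative permeability end-members from the method: uncemented coarse
# sediment = 100%, fine sediment = 0%. The 50% value for heterolithic
# (mixed) intervals is our simplifying assumption.
RP = {"coarse": 100.0, "fine": 0.0, "mixed": 50.0}

def aggregate_relative_permeability(intervals):
    """intervals: list of (lithology, thickness_m) from a driller's log."""
    total = sum(t for _, t in intervals)
    permeable = sum(RP[lith] / 100.0 * t for lith, t in intervals)
    return 100.0 * permeable / total

log = [("coarse", 12.0), ("fine", 6.0), ("mixed", 4.0)]
print(round(aggregate_relative_permeability(log), 1), "%")  # 63.6 %
```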
XML: James Webb Space Telescope Database Issues, Lessons, and Status
NASA Technical Reports Server (NTRS)
Detter, Ryan; Mooney, Michael; Fatig, Curtis
2003-01-01
This paper will present the current concept of using the eXtensible Markup Language (XML) as the underlying structure for the James Webb Space Telescope (JWST) database. The purpose of using XML is to provide a JWST database that is independent of any portion of the ground system, yet still compatible with the various systems using a variety of different structures. The testing of the JWST Flight Software (FSW) started in 2002, yet the launch is scheduled for 2011 with a planned 5-year mission and a 5-year follow-on option. The initial database and ground system elements, including the commands, telemetry, and ground system tools, will be used for 19 years, plus post-mission activities. During the Integration and Test (I&T) phases of the JWST development, 24 distinct laboratories, each geographically dispersed, will have local database tools with an XML database. Each of these laboratories' database tools will be used for exporting and importing data both locally and to a central database system, providing inputs to the database certification process, and producing various reports. A centralized certified database repository will be maintained by the Space Telescope Science Institute (STScI) in Baltimore, Maryland, USA. One of the challenges for the database is to be flexible enough to allow individual items to be upgraded, added, or changed without affecting the entire ground system. Using XML should also allow the altering of the import and export formats needed by the various elements, the tracking of the verification/validation of each database item, database inputs from many organizations, and the merging of the many existing database processes into one central database structure throughout the JWST program. Many National Aeronautics and Space Administration (NASA) projects have attempted to take advantage of open source and commercial technology. Often this leads to a greater reliance on Commercial-Off-The-Shelf (COTS) software, which can be limiting. In our review of the database requirements and the COTS software available, only very expensive COTS software would meet 90% of the requirements. Even with the high projected initial cost of COTS, the development and support costs for custom code over the 19-year mission period were forecast to be higher than the total licensing costs. A group also looked at reusing existing database tools and formats. If the JWST database had already been in a mature state, reuse would have made sense; but with the database still needing to handle the addition of different types of command and telemetry structures, the definition of new spacecraft systems, and input from and export to systems that have not yet been defined, XML provided the desired flexibility. It remains to be determined whether the XML database will reduce the overall cost for the JWST mission.
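The flavor of an XML-backed command/telemetry definition, and of how a ground tool might consume it, can be sketched with ElementTree; the element and attribute names below are invented and are not the JWST schema:

```python
import xml.etree.ElementTree as ET

# Invented structure for illustration; the actual JWST database schema
# is project-defined and far richer.
xml_doc = """
<database version="0.3">
  <command name="PWR_ON" opcode="0x1A">
    <param name="unit" type="uint8"/>
  </command>
  <telemetry name="BATT_V" units="V" type="float32"/>
</database>
"""

root = ET.fromstring(xml_doc)
for cmd in root.iter("command"):
    params = [p.get("name") for p in cmd.iter("param")]
    print("command", cmd.get("name"), "opcode", cmd.get("opcode"), "params", params)
for tlm in root.iter("telemetry"):
    print("telemetry", tlm.get("name"), "in", tlm.get("units"))
```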
Yang, Mei; Wang, Danhua; Yu, Lingxiang; Guo, Chaonan; Guo, Xiaodong; Lin, Na
2013-01-01
Aim To screen novel markers for hepatocellular carcinoma (HCC) by a combination of expression profiling, interaction network analysis and clinical validation. Methods HCC-significant molecules, which are differentially expressed or carry genetic variations in HCC tissues, were obtained from five existing HCC-related databases (OncoDB.HCC, HCC.net, dbHCCvar, EHCO and Liverome). Then, the protein-protein interaction (PPI) network of these molecules was constructed. Three topological features of the network ('Degree', 'Betweenness', and 'Closeness') and the k-core algorithm were used to screen candidate HCC markers that play crucial roles in HCC tumorigenesis. Furthermore, the clinical significance of two candidate HCC markers, growth factor receptor-bound 2 (GRB2) and GRB2-associated-binding protein 1 (GAB1), was validated. Results In total, 6179 HCC-significant genes and 977 HCC-significant proteins were collected from existing HCC-related databases. After network analysis, 331 candidate HCC markers were identified. Notably, GAB1 had the highest k-coreness, suggesting a central position in the HCC-related network, and the interaction between GRB2 and GAB1 had the largest edge-betweenness, implying it may be biologically important to the function of the HCC-related network. In the clinical validation, the expression levels of both GRB2 and GAB1 proteins were significantly higher in HCC tissues than in their adjacent nonneoplastic tissues. More importantly, the combined GRB2 and GAB1 protein expression was significantly associated with aggressive tumor progression and poor prognosis in patients with HCC. Conclusion This study provided an integrative analysis combining expression profiling and interaction network analysis to identify a list of biologically significant HCC-related markers and pathways. Further experimental validation indicated that the aberrant expression of GRB2 and GAB1 proteins may be strongly related to tumor progression and prognosis in patients with HCC. The overexpression of GRB2 in combination with upregulation of GAB1 may be an unfavorable prognostic factor for HCC. PMID:24391994
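The topological screening step can be illustrated with networkx; the toy edge list below is invented and far smaller than the real PPI network, but the function calls mirror the three features and the k-core measure named in the abstract.

```python
# Rank nodes of a toy protein-protein interaction graph by the
# topological features used in the screening step.
import networkx as nx

G = nx.Graph()
G.add_edges_from([
    ("GRB2", "GAB1"), ("GRB2", "EGFR"), ("GAB1", "PIK3R1"),
    ("EGFR", "SHC1"), ("SHC1", "GRB2"), ("PIK3R1", "AKT1"),
])

degree = nx.degree_centrality(G)            # 'Degree'
betweenness = nx.betweenness_centrality(G)  # 'Betweenness'
closeness = nx.closeness_centrality(G)      # 'Closeness'
coreness = nx.core_number(G)                # k-coreness per node

# Edge betweenness, the measure used to flag the GRB2-GAB1 interaction.
edge_bt = nx.edge_betweenness_centrality(G)
key = ("GRB2", "GAB1")
score = edge_bt[key] if key in edge_bt else edge_bt[(key[1], key[0])]

print(max(coreness, key=coreness.get), score)
```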
Bioinformatic flowchart and database to investigate the origins and diversity of Clan AA peptidases
Llorens, Carlos; Futami, Ricardo; Renaud, Gabriel; Moya, Andrés
2009-01-01
Background Clan AA of aspartic peptidases relates the family of pepsin monomers evolutionarily with all dimeric peptidases encoded by eukaryotic LTR retroelements. Recent findings describing various pools of single-domain nonviral host peptidases, in prokaryotes and eukaryotes, indicate that the diversity of clan AA is larger than previously thought. The natural approach to investigating this enzyme group is to study its phylogeny. However, clan AA is a difficult case to study due to the low sequence similarity and the different rates of evolution among its families. This work is an ongoing attempt to investigate the different clan AA families to understand the cause of their diversity. Results In this paper, we describe an in-progress database and bioinformatic flowchart designed to characterize the clan AA protein domain across all possible protein families through ancestral reconstructions, sequence logos, and hidden Markov models (HMMs). The flowchart includes the characterization of a major consensus sequence based on 6 amino acid patterns corresponding to Andreeva's model, the structural template describing the clan AA peptidase fold. This set of tools is a work in progress that we have organized in a database within the GyDB project, referred to as the Clan AA Reference Database. Conclusion The pre-existing classification combined with the evolutionary history of LTR retroelements permits a consistent taxonomical collection of sequence logos and HMMs. This set is useful for gene annotation and also serves as a reference for evaluating the diversity of, and the relationships among, the different families. Comparisons among HMMs suggest a common ancestor for all dimeric clan AA peptidases that is halfway between single-domain nonviral peptidases and those coded by Ty3/Gypsy LTR retroelements. Sequence logos reveal how all clan AA families follow a similar protein domain architecture related to the peptidase fold. In particular, each family nucleates a particular consensus motif in the sequence position related to the flap. The different motifs constitute a network in which an alanine-asparagine-like variable motif predominates, instead of the canonical flap of the HIV-1 peptidase and its closer relatives. Reviewers This article was reviewed by Daniel H. Haft, Vladimir Kapitonov (nominated by Jerry Jurka), and Ben M. Dunn (nominated by Claus Wilke). PMID:19173708
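The family-specific consensus motifs mentioned in the conclusion can be illustrated with a toy majority-vote calculation; the aligned flap-region fragments below are invented, and a real sequence logo would additionally weight each column by information content.

```python
# Derive a consensus motif from a slice of an alignment by taking the
# most frequent residue in each column.
from collections import Counter

aligned_flap = ["DTGAN", "DTGAD", "DSGAN", "DTGSN", "DTGAN"]

def consensus(seqs: list[str]) -> str:
    out = []
    for i in range(len(seqs[0])):
        column = Counter(seq[i] for seq in seqs)
        out.append(column.most_common(1)[0][0])
    return "".join(out)

print(consensus(aligned_flap))  # -> DTGAN
```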
MitoRes: a resource of nuclear-encoded mitochondrial genes and their products in Metazoa.
Catalano, Domenico; Licciulli, Flavio; Turi, Antonio; Grillo, Giorgio; Saccone, Cecilia; D'Elia, Domenica
2006-01-24
Mitochondria are sub-cellular organelles that have a central role in energy production and in other metabolic pathways of all eukaryotic respiring cells. In the last few years, with more and more genomes being sequenced, a huge amount of data has been generated, providing an unprecedented opportunity to use the comparative analysis approach in studies of evolution and functional genomics, with the aim of shedding light on molecular mechanisms regulating mitochondrial biogenesis and metabolism. In this context, the problem of the optimal extraction of representative datasets of genomic and proteomic data assumes a crucial importance. Specialised resources for nuclear-encoded mitochondria-related proteins already exist; however, no mitochondrial database is currently available with the same features as MitoRes, which is an update of the MitoNuc database extensively modified in its structure, data sources and graphical interface. It contains data on nuclear-encoded mitochondria-related products for any metazoan species for which this type of data is available and also provides comprehensive sequence datasets (gene, transcript and protein) as well as useful tools for their extraction and export. MitoRes http://www2.ba.itb.cnr.it/MitoRes/ consolidates information from publicly available external sources and automatically annotates it in a relational database. Additionally, it clusters proteins on the basis of their sequence similarity and interconnects them with genomic data. The search engine and sequence management tools allow the query/retrieval of the database content and the extraction and export of sequences (gene, transcript, protein) and related sub-sequences (intron, exon, UTR, CDS, signal peptide and gene flanking regions) ready to be used for in silico analysis. The tool we describe here has been developed to support lab scientists and bioinformaticians alike in the characterization of molecular features and evolution of mitochondrial targeting sequences. The way it provides for the retrieval and extraction of sequences allows the user to overcome the obstacles encountered in the integrative use of different bioinformatic resources, and the completeness of the sequence collection allows intra- and interspecies comparison at different biological levels (gene, transcript and protein).
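The gene-transcript-protein linkage and sub-sequence extraction described above can be sketched with sqlite3; the schema, identifiers, and sequences below are hypothetical stand-ins, not the actual MitoRes design.

```python
# A toy relational linkage of gene, transcript and protein records, with
# a sub-sequence (the CDS) extracted ready for in silico analysis.
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE gene(id TEXT PRIMARY KEY, species TEXT, seq TEXT);
CREATE TABLE transcript(id TEXT PRIMARY KEY,
                        gene_id TEXT REFERENCES gene(id),
                        cds_start INT, cds_end INT, seq TEXT);
CREATE TABLE protein(id TEXT PRIMARY KEY,
                     transcript_id TEXT REFERENCES transcript(id), seq TEXT);
""")
con.execute("INSERT INTO gene VALUES ('g1', 'Homo sapiens', 'ATGGCC...')")
con.execute("INSERT INTO transcript VALUES ('t1', 'g1', 1, 9, 'ATGGCCAAA')")
con.execute("INSERT INTO protein VALUES ('p1', 't1', 'MAK')")

row = con.execute("""
SELECT substr(t.seq, t.cds_start, t.cds_end - t.cds_start + 1)
FROM transcript t JOIN gene g ON g.id = t.gene_id
WHERE g.species = 'Homo sapiens'
""").fetchone()
print(row[0])  # -> ATGGCCAAA
```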
Lee, Ken Ka-Yin; Tang, Wai-Choi; Choi, Kup-Sze
2013-04-01
Clinical data are dynamic in nature, often arranged hierarchically and stored as free text and numbers. Effective management of clinical data and the transformation of the data into a structured format for data analysis are therefore challenging issues in electronic health records development. Despite the popularity of relational databases, the scalability of the NoSQL database model and the document-centric data structure of XML databases appear to be promising features for effective clinical data management. In this paper, three database approaches--NoSQL, XML-enabled and native XML--are investigated to evaluate their suitability for structured clinical data. The database query performance is reported, together with our experience in developing the databases. The results show that the NoSQL database is the best choice for query speed, whereas the XML databases are advantageous in terms of scalability, flexibility and extensibility, which are essential to cope with the characteristics of clinical data. While NoSQL and XML technologies are relatively new compared to the conventional relational database, both demonstrate the potential to become a key database technology for clinical data management as the technology further advances. Copyright © 2012 Elsevier Ireland Ltd. All rights reserved.
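The contrast between the document-centric and relational views of a hierarchical clinical record can be sketched in a few lines of Python; the field names are invented for illustration.

```python
# Document model: the visit keeps its natural hierarchy, as in an XML or
# NoSQL store; querying walks the structure directly.
patient_doc = {
    "patient_id": "P001",
    "visits": [
        {"date": "2012-03-01",
         "observations": [
             {"code": "BP_SYS", "value": 128, "unit": "mmHg"},
             {"code": "NOTE", "value": "patient reports dizziness"},
         ]},
    ],
}

sys_bp = [obs["value"]
          for visit in patient_doc["visits"]
          for obs in visit["observations"] if obs["code"] == "BP_SYS"]
print(sys_bp)  # -> [128]

# Relational model: the same data must be flattened into joined tables;
# here just the observation rows (patient_id, date, code, value, unit).
observation_rows = [
    ("P001", "2012-03-01", "BP_SYS", "128", "mmHg"),
    ("P001", "2012-03-01", "NOTE", "patient reports dizziness", None),
]
```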
DOT National Transportation Integrated Search
2009-07-01
"Considerable data exists for soils that were tested and documented, both for native properties and : properties with pozzolan stabilization. While the data exists there was no database for the Nebraska : Department of Roads to retrieve this data for...
Challenges in developing medicinal plant databases for sharing ethnopharmacological knowledge.
Ningthoujam, Sanjoy Singh; Talukdar, Anupam Das; Potsangbam, Kumar Singh; Choudhury, Manabendra Dutta
2012-05-07
Major research contributions in ethnopharmacology have generated a vast amount of data associated with medicinal plants. Computerized databases facilitate data management and analysis, making coherent information available to researchers, planners and other users. Web-based databases also facilitate knowledge transmission and feed the circle of information exchange between ethnopharmacological studies and the public audience. However, despite the development of many medicinal plant databases, a lack of uniformity is still discernible. This calls for defining a common standard to achieve the common objectives of ethnopharmacology. The aim of the study is to review the diversity of approaches in storing ethnopharmacological information in databases and to provide some minimal standards for these databases. A survey of articles on medicinal plant databases was carried out on the Internet using selected keywords. Grey literature and printed materials were also searched for information. The resources identified were critically analyzed for their approaches to content type, focus area and software technology. There is a clear need for the rapid incorporation of traditional knowledge by compiling primary data. While citation collection is a common approach to information compilation, it cannot fully assimilate the local literature that reflects traditional knowledge. Standards are also needed for systematic evaluation and for checking the quality and authenticity of the data. Databases focusing on thematic areas, viz. traditional medicine systems, regional coverage, disease and phytochemical information, are analyzed. Issues pertaining to data standards, data linking and unique identification need to be addressed, in addition to general issues such as lack of updates and sustainability. Against this background, suggestions have been made on some minimum standards for the development of medicinal plant databases. In spite of variations in approaches, the existence of many overlapping features indicates redundancy of resources and efforts. As the development of global data in a single database may not be possible in view of culture-specific differences, efforts can be directed to specific regional areas. The existing scenario calls for a collaborative approach to defining a common standard for medicinal plant databases for knowledge sharing and scientific advancement. Copyright © 2012 Elsevier Ireland Ltd. All rights reserved.
A computational platform to maintain and migrate manual functional annotations for BioCyc databases.
Walsh, Jesse R; Sen, Taner Z; Dickerson, Julie A
2014-10-12
BioCyc databases are an important resource for information on biological pathways and genomic data. Such databases represent the accumulation of biological data, some of which has been manually curated from the literature. An essential feature of these databases is continuing data integration as new knowledge is discovered. As functional annotations are improved, scalable methods are needed for curators to manage annotations without detailed knowledge of the specific design of the BioCyc database. We have developed CycTools, a software tool which allows curators to maintain functional annotations in a model organism database. This tool builds on existing software to improve and simplify the import of user-provided annotation data into BioCyc databases. Additionally, CycTools automatically resolves synonyms and alternate identifiers contained within the database into the appropriate internal identifiers. Automating steps in the manual data entry process can improve curation efforts for major biological databases. The functionality of CycTools is demonstrated by transferring GO term annotations from MaizeCyc to matching proteins in CornCyc, both maize metabolic pathway databases available at MaizeGDB, and by creating strain-specific databases for metabolic engineering.
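The synonym-resolution step can be sketched as a lookup over the identifier variants known to the database; the table below is invented, and the real tool queries the BioCyc database rather than a hard-coded dictionary.

```python
# Resolve user-supplied names or alternate identifiers to the database's
# internal frame identifier before importing an annotation.
synonyms = {
    "pyruvate kinase": "PK-MONOMER",
    "pykF": "PK-MONOMER",
    "b1676": "PK-MONOMER",
}

def resolve(identifier: str) -> str | None:
    if identifier in synonyms:
        return synonyms[identifier]
    # Fall back to a case-insensitive match.
    lowered = {k.lower(): v for k, v in synonyms.items()}
    return lowered.get(identifier.lower())

print(resolve("PYKF"))  # -> PK-MONOMER
```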
SORTEZ: a relational translator for NCBI's ASN.1 database.
Hart, K W; Searls, D B; Overton, G C
1994-07-01
The National Center for Biotechnology Information (NCBI) has created a database collection that includes several protein and nucleic acid sequence databases, a biosequence-specific subset of MEDLINE, as well as value-added information such as links between similar sequences. Information in the NCBI database is modeled in Abstract Syntax Notation 1 (ASN.1), an Open Systems Interconnection protocol designed for exchanging structured data between software applications rather than as a data model for database systems. While the NCBI database is distributed with an easy-to-use information retrieval system, ENTREZ, the ASN.1 data model currently lacks an ad hoc query language for general-purpose data access. For that reason, we have developed a software package, SORTEZ, that transforms the ASN.1 database (or other databases with nested data structures) to a relational data model and subsequently to a relational database management system (Sybase), where information can be accessed through the relational query language, SQL. Because the need to transform data from one data model and schema to another arises naturally in several important contexts, including efficient execution of specific applications, access to multiple databases, and adaptation to database evolution, this work also serves as a practical study of the issues involved in the various stages of database transformation. We show that transformation from the ASN.1 data model to a relational data model can be largely automated, but that schema transformation and data conversion require considerable domain expertise and would greatly benefit from additional support tools.
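The core of such a transformation is flattening nested structures into parent and child tables linked by keys; a minimal sketch, with an invented ASN.1-like record rather than the real NCBI model:

```python
# Flatten one nested entry into rows for a parent table and a child table,
# preserving nesting order with a position column.
nested = {
    "id": "U00001",
    "title": "example sequence entry",
    "references": [
        {"authors": ["Smith J", "Jones K"], "year": 1993},
        {"authors": ["Lee P"], "year": 1994},
    ],
}

entry_rows = [(nested["id"], nested["title"])]
reference_rows = [
    (nested["id"], pos, ref["year"], "; ".join(ref["authors"]))
    for pos, ref in enumerate(nested["references"])
]

print(entry_rows)      # -> [('U00001', 'example sequence entry')]
print(reference_rows)  # one row per reference, keyed back to the entry
```

Automating this structural mapping is the easy part, as the abstract notes; deciding how author lists or optional fields should be normalized is where domain expertise enters.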
Comparative Bacterial Proteomics: Analysis of the Core Genome Concept
Callister, Stephen J.; McCue, Lee Ann; Turse, Joshua E.; Monroe, Matthew E.; Auberry, Kenneth J.; Smith, Richard D.; Adkins, Joshua N.; Lipton, Mary S.
2008-01-01
While comparative bacterial genomic studies commonly predict a set of genes indicative of common ancestry, experimental validation of the existence of this core genome requires extensive measurement and is typically not undertaken. Enabled by an extensive proteome database developed over six years, we have experimentally verified the expression of proteins predicted from genomic ortholog comparisons among 17 environmental and pathogenic bacteria. More exclusive relationships were observed among the expressed protein content of phenotypically related bacteria, which is indicative of the specific lifestyles associated with these organisms. Although genomic studies can establish relative orthologous relationships among a set of bacteria and propose a set of ancestral genes, our proteomics study establishes expressed lifestyle differences among conserved genes and proposes a set of expressed ancestral traits. PMID:18253490
OptoBase: A web platform for molecular optogenetics.
Kolar, Katja; Knobloch, Christian; Stork, Hendrik; Žnidarič, Matej; Weber, Wilfried
2018-06-18
OptoBase is an online platform for molecular optogenetics. At its core is a hand-annotated and ontology-supported database that aims to cover all existing optogenetic switches and publications, which is further complemented with a collection of convenient optogenetics-related web tools. OptoBase is meant both for expert optogeneticists, to easily keep track of the field, and for all researchers who find optogenetics inviting as a powerful tool to address their biological questions of interest. It is available at https://www.optobase.org. This work also presents an OptoBase-based analysis of trends in molecular optogenetics.
New Zealand's National Landslide Database
NASA Astrophysics Data System (ADS)
Rosser, B.; Dellow, S.; Haubrook, S.; Glassey, P.
2016-12-01
Since 1780, landslides have caused an average of about 3 deaths a year in New Zealand and have cost the economy an average of at least NZ$250M/a (0.1% GDP). To understand the risk posed by landslide hazards to society, a thorough knowledge of where, when and why different types of landslides occur is vital. The main objective for establishing the database was to provide a centralised, national-scale, publicly available database to collate landslide information that could be used for landslide hazard and risk assessment. Design of a national landslide database for New Zealand required consideration of both existing landslide data, stored in a variety of digital formats, and future data yet to be collected. Pre-existing databases were developed and populated with data reflecting the needs of the landslide or hazard project, and the database structures of the time. Bringing these data into a single unified database required a new structure capable of storing and delivering data at a variety of scales and accuracies and with different attributes. A "unified data model" was developed to enable the database to hold old and new landslide data irrespective of scale and method of capture. The database contains information on landslide locations and, where available: 1) the timing of landslides and the events that may have triggered them; 2) the type of landslide movement; 3) the volume and area; 4) the source and debris tail; and 5) the impacts caused by the landslide. Information from a variety of sources, including aerial photographs (and other remotely sensed data), field reconnaissance and media accounts, has been collated and is presented for each landslide along with metadata describing the data sources and quality. There are currently nearly 19,000 landslide records in the database, which include point locations, polygons of landslide source and deposit areas, and linear features. Several large datasets are awaiting upload, which will bring the total number of landslides to over 100,000. The geospatial database is publicly available via the Internet. Software components, including the underlying database (PostGIS), Web Map Server (GeoServer) and web application, use open-source software. The hope is that others will add relevant information to the database as well as download the data contained in it.
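A spatial query against such a PostGIS-backed database might look like the SQL string below (it needs a live connection to run); for a self-contained illustration, an equivalent bounding-box filter is applied to toy records in pure Python. Table and column names are invented.

```python
# PostGIS form of the query (requires a live database connection):
POSTGIS_QUERY = """
SELECT id, movement_type
FROM landslide
WHERE ST_Within(geom, ST_MakeEnvelope(174.5, -41.5, 175.5, -40.5, 4326));
"""

# Pure-Python equivalent on toy point records.
records = [
    {"id": 1, "lon": 174.8, "lat": -41.3, "movement_type": "debris flow"},
    {"id": 2, "lon": 172.6, "lat": -43.5, "movement_type": "rock fall"},
]

def within(r, lon_min, lat_min, lon_max, lat_max):
    return lon_min <= r["lon"] <= lon_max and lat_min <= r["lat"] <= lat_max

hits = [r["id"] for r in records if within(r, 174.5, -41.5, 175.5, -40.5)]
print(hits)  # -> [1]
```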
Using SQL Databases for Sequence Similarity Searching and Analysis.
Pearson, William R; Mackey, Aaron J
2017-09-13
Relational databases can integrate diverse types of information and manage large sets of similarity search results, greatly simplifying genome-scale analyses. By focusing on taxonomic subsets of sequences, relational databases can reduce the size and redundancy of sequence libraries and improve the statistical significance of homologs. In addition, by loading similarity search results into a relational database, it becomes possible to explore and summarize the relationships between all of the proteins in an organism and those in other biological kingdoms. This unit describes how to use relational databases to improve the efficiency of sequence similarity searching and demonstrates various large-scale genomic analyses of homology-related data. It also describes the installation and use of a simple protein sequence database, seqdb_demo, which is used as a basis for the other protocols. The unit also introduces search_demo, a database that stores sequence similarity search results. The search_demo database is then used to explore the evolutionary relationships between E. coli proteins and proteins in other organisms in a large-scale comparative genomic analysis. © 2017 by John Wiley & Sons, Inc.
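A minimal sketch of the load-then-summarize pattern, using sqlite3 and invented table and column names (the unit's seqdb_demo and search_demo schemas are not reproduced here):

```python
# Load similarity-search hits into a relational table, then summarize
# significant homologs per query and taxon with one GROUP BY.
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("""CREATE TABLE hit(
    query TEXT, subject TEXT, taxon TEXT, evalue REAL)""")
con.executemany("INSERT INTO hit VALUES (?,?,?,?)", [
    ("ecoli_thrA", "styph_thrA", "Salmonella", 1e-120),
    ("ecoli_thrA", "human_AASS", "Homo sapiens", 2e-8),
    ("ecoli_lacZ", "kleb_lacZ", "Klebsiella", 3e-95),
])

for row in con.execute("""
    SELECT query, taxon, COUNT(*) AS n
    FROM hit WHERE evalue < 1e-6
    GROUP BY query, taxon ORDER BY query"""):
    print(row)
```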
Using Web Ontology Language to Integrate Heterogeneous Databases in the Neurosciences
Lam, Hugo Y.K.; Marenco, Luis; Shepherd, Gordon M.; Miller, Perry L.; Cheung, Kei-Hoi
2006-01-01
Integrative neuroscience involves the integration and analysis of diverse types of neuroscience data involving many different experimental techniques. This data will increasingly be distributed across many heterogeneous databases that are web-accessible. Currently, these databases do not expose their schemas (database structures) and their contents to web applications/agents in a standardized, machine-friendly way. This limits database interoperation. To address this problem, we describe a pilot project that illustrates how neuroscience databases can be expressed using the Web Ontology Language, which is a semantically-rich ontological language, as a common data representation language to facilitate complex cross-database queries. In this pilot project, an existing tool called “D2RQ” was used to translate two neuroscience databases (NeuronDB and CoCoDat) into OWL, and the resulting OWL ontologies were then merged. An OWL-based reasoner (Racer) was then used to provide a sophisticated query language (nRQL) to perform integrated queries across the two databases based on the merged ontology. This pilot project is one step toward exploring the use of semantic web technologies in the neurosciences. PMID:17238384
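The merge-then-query pattern can be sketched with rdflib; the triples below are invented stand-ins for D2RQ-translated database content, and plain SPARQL is used in place of the Racer reasoner's nRQL.

```python
# Merge two small RDF graphs and run one query across both.
from rdflib import Graph

TTL_A = """
@prefix : <http://example.org/neuro#> .
:PurkinjeCell :hasCurrent :IhCurrent .
"""
TTL_B = """
@prefix : <http://example.org/neuro#> .
:IhCurrent :describedIn :CoCoDatRecord42 .
"""

g = Graph()
g.parse(data=TTL_A, format="turtle")
g.parse(data=TTL_B, format="turtle")  # merging = parsing into one graph

QUERY = """
PREFIX : <http://example.org/neuro#>
SELECT ?record WHERE {
  :PurkinjeCell :hasCurrent ?i .
  ?i :describedIn ?record .
}
"""
for row in g.query(QUERY):
    print(row.record)  # -> http://example.org/neuro#CoCoDatRecord42
```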
SAMMD: Staphylococcus aureus microarray meta-database.
Nagarajan, Vijayaraj; Elasri, Mohamed O
2007-10-02
Staphylococcus aureus is an important human pathogen, causing a wide variety of diseases ranging from superficial skin infections to severe life-threatening infections. S. aureus is one of the leading causes of nosocomial infections. Its ability to resist multiple antibiotics poses a growing public health problem. In order to understand the mechanism of pathogenesis of S. aureus, several global expression profiles have been developed. These transcriptional profiles included regulatory mutants of S. aureus and growth of the wild type under different growth conditions. The abundance of these profiles has generated a large amount of data without a uniform annotation system to comprehensively examine them. We report the development of the Staphylococcus aureus Microarray meta-database (SAMMD), which includes data from all the published transcriptional profiles. SAMMD is a web-accessible database that helps users to perform a variety of analyses against and within the existing transcriptional profiles. SAMMD is a relational database that uses MySQL as the back end and PHP/JavaScript/DHTML as the front end. The database is normalized and consists of five tables, which hold information about gene annotations, regulated gene lists, experimental details, references, and other details. SAMMD data are collected from peer-reviewed published articles. Data extraction and conversion were done using Perl scripts, while data entry was done through the phpMyAdmin tool. The database is accessible via a web interface that contains several features such as simple search by ORF ID, gene name, or gene product name; advanced search using gene lists; comparison among datasets; browsing; downloading; statistics; and help. The database is licensed under the General Public License (GPL). SAMMD is hosted and available at http://www.bioinformatics.org/sammd/. Currently there are over 9500 entries for regulated genes, from 67 microarray experiments. SAMMD will help staphylococcal scientists to analyze their expression data and understand it at a global level. It will also allow scientists to compare and contrast their transcriptome with those of the other published transcriptomes.
NASA Technical Reports Server (NTRS)
Orcutt, John M.; Brenton, James C.
2016-01-01
An accurate database of meteorological data is essential for designing any aerospace vehicle and for preparing launch commit criteria. Meteorological instrumentation was recently placed on the three Lightning Protection System (LPS) towers at Kennedy Space Center (KSC) launch complex 39B (LC-39B), providing a unique meteorological dataset at the launch complex over an extensive altitude range. Data records of temperature, dew point, relative humidity, wind speed, and wind direction are produced at 40, 78, 116, and 139 m at each tower. The Marshall Space Flight Center Natural Environments Branch (EV44) received an archive that consists of one-minute averaged measurements for the period of record of January 2011 - April 2015. However, before the received database could be used, EV44 needed to remove any erroneous data through a comprehensive quality control (QC) process. The QC process applied to the LPS towers' meteorological data is similar to other QC processes developed by EV44, which were used in the creation of meteorological databases for other towers at KSC. The QC process utilized in this study has been modified specifically for use with the LPS tower database. The QC process first includes a check of each individual sensor. This check includes removing any unrealistic data and checking the temporal consistency of each variable. Next, data from all three sensors at each height are checked against each other, checked against climatology, and checked for sensors that erroneously report a constant value. Then, a vertical consistency check of each variable at each tower is completed. Last, the upwind sensor at each level is selected to minimize the influence of the towers and other structures at LC-39B on the measurements. The selection process for the upwind sensor implemented a study of tower-induced turbulence. This paper describes in detail the QC process, QC results, and the attributes of the LPS towers meteorological database.
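Two of the per-sensor checks (the unrealistic-value screen and the temporal-consistency check) can be sketched with pandas; the thresholds below are illustrative, not the EV44 values.

```python
# Apply a range check and a jump check to one sensor's one-minute record.
import pandas as pd

df = pd.DataFrame(
    {"temp_degC": [22.1, 22.3, 75.0, 22.4, 22.5, 22.4]},
    index=pd.date_range("2012-07-01 12:00", periods=6, freq="min"),
)

# Range check: blank out physically unrealistic values.
df.loc[~df["temp_degC"].between(-20, 45), "temp_degC"] = float("nan")

# Temporal consistency: flag jumps larger than 5 degC between
# consecutive one-minute averages.
jump = df["temp_degC"].diff().abs() > 5
df.loc[jump, "temp_degC"] = float("nan")

print(df)
```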
BIOFRAG – a new database for analyzing BIOdiversity responses to forest FRAGmentation
Pfeifer, Marion; Lefebvre, Veronique; Gardner, Toby A; Arroyo-Rodriguez, Victor; Baeten, Lander; Banks-Leite, Cristina; Barlow, Jos; Betts, Matthew G; Brunet, Joerg; Cerezo, Alexis; Cisneros, Laura M; Collard, Stuart; D'Cruze, Neil; da Silva Motta, Catarina; Duguay, Stephanie; Eggermont, Hilde; Eigenbrod, Felix; Hadley, Adam S; Hanson, Thor R; Hawes, Joseph E; Heartsill Scalley, Tamara; Klingbeil, Brian T; Kolb, Annette; Kormann, Urs; Kumar, Sunil; Lachat, Thibault; Lakeman Fraser, Poppy; Lantschner, Victoria; Laurance, William F; Leal, Inara R; Lens, Luc; Marsh, Charles J; Medina-Rangel, Guido F; Melles, Stephanie; Mezger, Dirk; Oldekop, Johan A; Overal, William L; Owen, Charlotte; Peres, Carlos A; Phalan, Ben; Pidgeon, Anna M; Pilia, Oriana; Possingham, Hugh P; Possingham, Max L; Raheem, Dinarzarde C; Ribeiro, Danilo B; Ribeiro Neto, Jose D; Douglas Robinson, W; Robinson, Richard; Rytwinski, Trina; Scherber, Christoph; Slade, Eleanor M; Somarriba, Eduardo; Stouffer, Philip C; Struebig, Matthew J; Tylianakis, Jason M; Tscharntke, Teja; Tyre, Andrew J; Urbina Cardona, Jose N; Vasconcelos, Heraldo L; Wearn, Oliver; Wells, Konstans; Willig, Michael R; Wood, Eric; Young, Richard P; Bradley, Andrew V; Ewers, Robert M
2014-01-01
Habitat fragmentation studies have produced complex results that are challenging to synthesize. Inconsistencies among studies may result from variation in the choice of landscape metrics and response variables, which is often compounded by a lack of key statistical or methodological information. Collating primary datasets on biodiversity responses to fragmentation in a consistent and flexible database permits simple data retrieval for subsequent analyses. We present a relational database that links such field data to taxonomic nomenclature, spatial and temporal plot attributes, and environmental characteristics. Field assessments include measurements of the response(s) (e.g., presence, abundance, ground cover) of one or more species linked to plots in fragments within a partially forested landscape. The database currently holds 9830 unique species recorded in plots of 58 unique landscapes in six of eight realms: mammals 315, birds 1286, herptiles 460, insects 4521, spiders 204, other arthropods 85, gastropods 70, annelids 8, platyhelminthes 4, Onychophora 2, vascular plants 2112, nonvascular plants and lichens 320, and fungi 449. Three landscapes were sampled as long-term time series (>10 years). Seven hundred and eleven species are found in two or more landscapes. Consolidating the substantial amount of primary data available on biodiversity responses to fragmentation in the context of land-use change and natural disturbances is an essential part of understanding the effects of increasing anthropogenic pressures on land. The consistent format of this database facilitates testing of generalizations concerning biologic responses to fragmentation across diverse systems and taxa. It also allows the re-examination of existing datasets with alternative landscape metrics and robust statistical methods, for example, helping to address pseudo-replication problems. The database can thus help researchers in producing broad syntheses of the effects of land use. The database is dynamic and inclusive, and contributions from individual and large-scale data-collection efforts are welcome. PMID:24967073
Two centuries of French patents as documentation of musical instrument construction
NASA Astrophysics Data System (ADS)
Jean, Haury
2005-09-01
The French Patent Office I.N.P.I. has preserved the originals of ca. 12
Measurement of the local food environment: a comparison of existing data sources.
Bader, Michael D M; Ailshire, Jennifer A; Morenoff, Jeffrey D; House, James S
2010-03-01
Studying the relation between the residential environment and health requires valid, reliable, and cost-effective methods to collect data on residential environments. This 2002 study compared the level of agreement between measures of the presence of neighborhood businesses drawn from 2 common sources of data used for research on the built environment and health: listings of businesses from commercial databases and direct observations of city blocks by raters. Kappa statistics were calculated for 6 types of businesses (drugstores, liquor stores, bars, convenience stores, restaurants, and grocers) located on 1,663 city blocks in Chicago, Illinois. Logistic regressions estimated whether disagreement between measurement methods was systematically correlated with the socioeconomic and demographic characteristics of neighborhoods. Levels of agreement between the 2 sources were relatively high, with significant (P < 0.001) kappa statistics for each business type ranging from 0.32 to 0.70. Most business types were more likely to be reported by direct observations than in the commercial database listings. Disagreement between the 2 sources was not significantly correlated with the socioeconomic and demographic characteristics of neighborhoods. Results suggest that researchers should have reasonable confidence using whichever method (or combination of methods) is most cost-effective and theoretically appropriate for their research design.
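The block-level agreement statistic is Cohen's kappa; a minimal sketch for one business type, with invented presence/absence vectors:

```python
# Chance-corrected agreement between the commercial listing and direct
# observation of the same blocks (1 = business present on the block).
from sklearn.metrics import cohen_kappa_score

commercial = [1, 0, 1, 1, 0, 0, 1, 0, 1, 0]  # from the database listing
observed   = [1, 0, 1, 0, 0, 1, 1, 0, 1, 0]  # from raters on the street

print(round(cohen_kappa_score(commercial, observed), 2))  # -> 0.6
```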
System, method and apparatus for generating phrases from a database
NASA Technical Reports Server (NTRS)
McGreevy, Michael W. (Inventor)
2004-01-01
Phrase generation is a method of generating sequences of terms, such as phrases, that may occur within a database of subsets containing sequences of terms, such as text. A database is provided and a relational model of the database is created. A query is then input. The query includes a term, a sequence of terms, multiple individual terms, multiple sequences of terms, or combinations thereof. Next, several sequences of terms that are contextually related to the query are assembled from contextual relations in the model of the database. The sequences of terms are then sorted and output. Phrase generation can also be an iterative process used to produce sequences of terms from a relational model of a database.
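A much-simplified sketch of the idea (not the patented implementation): build bigram context relations from a toy corpus, then extend a query term by repeatedly following the most frequent continuation.

```python
# Generate a phrase by walking contextual (bigram) relations.
from collections import Counter, defaultdict

corpus = ("engine failure during takeoff . engine fire during climb . "
          "engine failure during climb").split()

follows = defaultdict(Counter)
for a, b in zip(corpus, corpus[1:]):
    if a != "." and b != ".":
        follows[a][b] += 1

def generate(term: str, length: int = 3) -> list[str]:
    phrase = [term]
    while len(phrase) < length and follows[phrase[-1]]:
        phrase.append(follows[phrase[-1]].most_common(1)[0][0])
    return phrase

print(generate("engine"))  # -> ['engine', 'failure', 'during']
```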
Science Across the World in Teacher Training
ERIC Educational Resources Information Center
Schoen, Lida; Weishet, Egbert; Kennedy, Declan
2007-01-01
Science Across the World is an exchange programme between schools world-wide. It has two main components: existing resources for students (age 6-10) and a database of all participating schools. The programme has existed since 1990. It is carried out in partnership with the British Association of Science Education (ASE) and international…
SPATIALLY-BALANCED SURVEY DESIGN FOR GROUNDWATER USING EXISTING WELLS
Many states have a monitoring program to evaluate the water quality of groundwater across the state. These programs rely on existing wells for access to the groundwater, due to the high cost of drilling new wells. Typically, a state maintains a database of all well locations, in...